chrtbl(1M) SYSTEM ADMINISTRATION COMMANDS chrtbl(1M)
NAME
chrtbl - generate character classification and conversion
tables
SYNOPSIS
chrtbl [file]
DESCRIPTION
The chrtbl command creates two tables containing information
on character classification, upper/lower-case conversion,
character-set width, and numeric editing. One table is an
array of (257*2) + 7 bytes that is encoded so a table lookup
can be used to determine the character classification of a
character, convert a character (see ctype(3C)), and find the
byte and screen width of a character in one of the supplementary
code sets. The other table is 2 bytes long: the
first byte specifies the decimal delimiter; the second byte
specifies the thousands delimiter.
chrtbl reads the user-defined character classification and
conversion information from file and creates three output
files in the current directory. To construct file, use the
file supplied in /usr/lib/locale/C/chrtblC as a starting
point. You may add entries, but do not change the original
values supplied with the system. For example, for other
locales you may wish to add eight-bit entries to the ASCII
definitions provided in this file.
One output file, ctype.c (a C-language source file), contains
a (257*2)+7-byte array generated from processing the
information from file. You should review the content of
ctype.c to verify that the array is set up as you had
planned. (In addition, an application program could use
ctype.c.)
The first 257 bytes of the array in ctype.c are used for
character classification. The characters used for
initializing these bytes of the array represent character
classifications that are defined in /usr/include/ctype.h;
for example, L means a character is lower case and S|B
means the character is both a spacing character and a
blank.
The second 257 bytes of the array are used for character
conversion. These bytes are initialized so that characters
for which you do not provide conversion information are
converted to themselves. When you do provide conversion
information, the first value of the pair is stored where the
second one would be stored normally, and vice versa; for
example, if you provide <0x41 0x61>, then 0x61 is stored
where 0x41 would be stored normally, and 0x41 is stored
where 0x61 would be stored normally.
The last 7 bytes are used for character width information
for up to three supplementary code sets.
The second output file (a data file) contains the same
information, but is structured for efficient use by the
character classification and conversion routines (see
ctype(3C)). The name of this output file is the value you
assign to the keyword LC_CTYPE read in from file. Before
this file can be used by the character classification and
conversion routines, it must be installed in the
/usr/lib/locale/locale directory with the name LC_CTYPE by
someone who is super-user or a member of group bin. This
file must be readable by user, group, and other (mode 0444);
no other permissions should be set. To use the character
classification and conversion tables in this file, set the
LC_CTYPE environment variable appropriately (see environ(5)
or setlocale(3C)).
The third output file (a data file) is created only if
numeric editing information is specified in the input file.
The name of this output file is the value you assign to the
keyword LC_NUMERIC read in from file. Before this file can
be used, it must be installed in the /usr/lib/locale/locale
directory with the name LC_NUMERIC by someone who is
super-user or a member of group bin. This file must be
readable by user, group, and other (mode 0444); no other
permissions should be set. To use the numeric editing
information in this file, set the LC_NUMERIC environment
variable appropriately (see environ(5) or setlocale(3C)).
The name of the locale where you install the files LC_CTYPE
and LC_NUMERIC should correspond to the conventions defined
in file. For example, if French conventions were defined,
and the name for the French locale on your system is french,
then you should install the files in /usr/lib/locale/french.
If no input file is given, or if the argument "-" is
encountered, chrtbl reads from standard input.
The syntax of file allows the user to define the names of
the data files created by chrtbl, the assignment of
characters to character classifications, the relationship between
upper and lower-case letters, byte and screen widths for up
to three supplementary code sets, and two items of numeric
editing information: the decimal delimiter and the thousands
delimiter. The keywords recognized by chrtbl are:
LC_CTYPE name of the data file created by
chrtbl to contain character classification,
conversion, and width information
isupper character codes to be classified as
upper-case letters
islower character codes to be classified as
lower-case letters
isdigit character codes to be classified as
numeric
isspace character codes to be classified as
spacing (delimiter) characters
ispunct character codes to be classified as
punctuation characters
iscntrl character codes to be classified as
control characters
isblank character code for the blank (space)
character
isxdigit character codes to be classified as
hexadecimal digits
ul relationship between upper- and
lower-case characters
cswidth byte and screen width information (by
default, each is one character wide)
LC_NUMERIC name of the data file created by
chrtbl to contain numeric editing
information
decimal_point decimal delimiter
thousands_sep thousands delimiter
Any lines with the number sign (#) in the first column are
treated as comments and are ignored. Blank lines are also
ignored.
Characters for isupper, islower, isdigit, isspace, ispunct,
iscntrl, isblank, isxdigit, and ul can be represented as a
hexadecimal or octal constant (for example, the letter a can
be represented as 0x61 in hexadecimal or 0141 in octal).
Hexadecimal and octal constants may be separated by one or
more space and/or tab characters.
The dash character (-) may be used to indicate a range of
consecutive numbers. Zero or more space characters may be
used for separating the dash character from the numbers.
The backslash character (\) is used for line continuation.
Only a carriage return is permitted after the backslash
character.
The relationship between upper- and lower-case letters (ul)
is expressed as ordered pairs of octal or hexadecimal
constants: <upper-case_character lower-case_character>. These
two constants may be separated by one or more space
characters. Zero or more space characters may be used for
separating the angle brackets (< >) from the numbers.
The following is the format of an input specification for
cswidth:
n1:s1,n2:s2,n3:s3
where,
n1 byte width for supplementary code set 1, required
s1 screen width for supplementary code set 1
n2 byte width for supplementary code set 2
s2 screen width for supplementary code set 2
n3 byte width for supplementary code set 3
s3 screen width for supplementary code set 3
EXAMPLE
The following is an example of an input file used to create
the ASCII code set definition table in a file named ascii.
LC_CTYPE ascii
isupper 0x41 - 0x5a
islower 0x61 - 0x7a
isdigit 0x30 - 0x39
isspace 0x20 0x9 - 0xd
ispunct 0x21 - 0x2f 0x3a - 0x40 \
0x5b - 0x60 0x7b - 0x7e
iscntrl 0x0 - 0x1f 0x7f
isblank 0x20
isxdigit 0x30 - 0x39 0x61 - 0x66 \
0x41 - 0x46
ul <0x41 0x61> <0x42 0x62> <0x43 0x63> \
<0x44 0x64> <0x45 0x65> <0x46 0x66> \
<0x47 0x67> <0x48 0x68> <0x49 0x69> \
<0x4a 0x6a> <0x4b 0x6b> <0x4c 0x6c> \
<0x4d 0x6d> <0x4e 0x6e> <0x4f 0x6f> \
<0x50 0x70> <0x51 0x71> <0x52 0x72> \
<0x53 0x73> <0x54 0x74> <0x55 0x75> \
<0x56 0x76> <0x57 0x77> <0x58 0x78> \
<0x59 0x79> <0x5a 0x7a>
cswidth 1:1,0:0,0:0
LC_NUMERIC num_ascii
decimal_point .
thousands_sep ,
FILES
/usr/lib/locale/locale/LC_CTYPE
data files containing character classification,
conversion, and character-set width
information created by chrtbl
/usr/lib/locale/locale/LC_NUMERIC
data files containing numeric editing information
created by chrtbl
/usr/include/ctype.h
header file containing information used by
character classification and conversion routines
/usr/lib/locale/C/chrtblC
input file used to construct LC_CTYPE and
LC_NUMERIC in the default locale.
SEE ALSO
environ(5).
ctype(3C), setlocale(3C) in the Programmer's Reference
Manual.
DIAGNOSTICS
The error messages produced by chrtbl are intended to be
self-explanatory. They indicate errors in the command line
or syntactic errors encountered within the input file.
WARNING
Changing the files in /usr/lib/locale/C will cause the
system to behave unpredictably.