chrtbl(1M) — mips UMIPS RISC/os 5.01



CHRTBL(1M)          RISC/os Reference Manual           CHRTBL(1M)



NAME
     chrtbl - generate character classification and conversion
          tables

SYNOPSIS
     chrtbl [file]

DESCRIPTION
     The chrtbl command creates two tables containing information
     on character classification, upper/lower-case conversion,
     character-set width, and numeric formatting.  One table is
     an array of (257*2) + 7 bytes that is encoded so a table
     lookup can be used to determine the character classification
     of a character, convert a character [see ctype(3C)], and
     find the byte and screen width of a character in one of the
     supplementary code sets.  The other table contains informa-
     tion about the format of non-monetary numeric quantities:
     the first byte specifies the decimal delimiter; the second
     byte specifies the thousands delimiter; and the remaining
     bytes comprise a null terminated string indicating the
     grouping (each element of the string is taken as an integer
     that indicates the number of digits that comprise the
     current group in a formatted non-monetary numeric quantity).
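
     As a rough illustration of the numeric-format table layout
     described above, the following Python sketch packs and unpacks
     the three fields.  The names pack_numeric and unpack_numeric
     are hypothetical, and the sketch assumes single-byte
     delimiters; it is not the code that chrtbl itself uses.

```python
def pack_numeric(decimal_point, thousands_sep, grouping):
    """Pack the fields in the order the text describes.

    decimal_point and thousands_sep are single-byte characters;
    grouping is a sequence of small positive integers.
    """
    if len(decimal_point) != 1 or len(thousands_sep) != 1:
        raise ValueError("delimiters must be single characters")
    if any(not 0 < g < 256 for g in grouping):
        raise ValueError("group sizes must fit in one non-zero byte")
    return (decimal_point.encode("latin-1")      # byte 0
            + thousands_sep.encode("latin-1")    # byte 1
            + bytes(grouping)                    # one byte per group
            + b"\0")                             # null terminator


def unpack_numeric(table):
    """Split a packed table back into its three fields."""
    end = table.index(b"\0", 2)   # first null at or after byte 2
    return (chr(table[0]), chr(table[1]), list(table[2:end]))
```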

     chrtbl reads the user-defined character classification and
     conversion information from file and creates up to three
     output files in the current directory; the third is created
     only if file specifies numeric formatting information.  To
     construct file, use the file supplied in
     /usr/lib/locale/C/chrtbl_C as a starting point.  You may add
     entries, but do not change the original values supplied with
     the system.  For example, for other locales you may wish to
     add eight-bit entries to the ASCII definitions provided in
     this file.

     One output file, ctype.c (a C-language source file), con-
     tains a (257*2)+7-byte array generated from processing the
     information from file.  You should review the content of
     ctype.c to verify that the array is set up as you had
     planned.  (In addition, an application program could use
     ctype.c.)  The first 257 bytes of the array in ctype.c are
     used for character classification.  The characters used for
     initializing these bytes of the array represent character
     classifications that are defined in /usr/include/ctype.h;
     for example, _L means a character is lower case and _S|_B
     means the character is both a spacing character and a blank.
     The second 257 bytes of the array are used for character
     conversion.  These bytes of the array are initialized so
     that characters for which you do not provide conversion
     information will be converted to themselves.  When you do
     provide conversion information, the first value of the pair
     is stored where the second one would be stored normally, and
     vice versa; for example, if you provide <0x41 0x61>, then
     0x61 is stored where 0x41 would be stored normally, and 0x41
     is stored where 0x61 would be stored normally.  The last 7
     bytes are used for character width information for up to
     three supplementary code sets.



                        Printed 11/19/92                   Page 1
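
     The conversion half of the array can be modelled with a short
     Python sketch.  The manual page does not say how the 257
     entries are indexed, so the sketch ASSUMES that slot 0 is a
     reserved entry and that character code c occupies slot c + 1.
     The names HALF, WIDTH_BYTES, TABLE_SIZE, and build_conversion
     are hypothetical.

```python
HALF = 257                      # entries in each half of the array
WIDTH_BYTES = 7                 # trailing width information
TABLE_SIZE = HALF * 2 + WIDTH_BYTES


def build_conversion(pairs):
    """Return the 257-entry conversion half for the given ul pairs.

    ASSUMPTION: slot 0 is a reserved entry and character code c
    lives at slot c + 1. The manual page does not state this.
    """
    conv = [0] + list(range(256))        # every code maps to itself
    for upper, lower in pairs:
        conv[upper + 1] = lower          # lower stored in upper's slot
        conv[lower + 1] = upper          # upper stored in lower's slot
    return conv
```

     With the pair <0x41 0x61>, the slot for 0x41 holds 0x61 and
     the slot for 0x61 holds 0x41; every other slot still holds its
     own code.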

     The second output file (a data file) contains the same
     information, but is structured for efficient use by the
     character classification and conversion routines (see
     ctype(3C)).  The name of this output file is the value you
     assign to the keyword LC_CTYPE read in from file.  Before
     this file can be used by the character classification and
     conversion routines, it must be installed in the
     /usr/lib/locale/locale directory (where the final component,
     locale, is the name of the locale) with the name LC_CTYPE by
     someone who is super-user or a member of group bin.  This
     file must be readable by user, group, and other; no other
     permissions should be set.  To use the character classifica-
     tion and conversion tables in this file, set the LC_CTYPE
     environment variable appropriately (see environ(5) or
     setlocale(3C)).

     The third output file (a data file) is created only if
     numeric formatting information is specified in the input
     file.  The name of this output file is the value you assign
     to the keyword LC_NUMERIC read in from file.  Before this
     file can be used, it must be installed in the
     /usr/lib/locale/locale directory with the name LC_NUMERIC by
     someone who is super-user or a member of group bin.  This
     file must be readable by user, group, and other; no other
     permissions should be set.  To use the numeric formatting
     information in this file, set the LC_NUMERIC variable
     appropriately (see environ(5) or setlocale(3C)).

     The name of the locale where you install the files LC_CTYPE
     and LC_NUMERIC should correspond to the conventions defined
     in file.  For example, if French conventions were defined,
     and the name for the French locale on your system is french,
     then you should install the files in /usr/lib/locale/french.
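
     The installation step described above could be scripted
     roughly as follows.  This is a sketch only: the function name
     install_locale_file and its parameters are hypothetical, and
     the locale root is passed in so that the sketch can be tried
     against a scratch directory instead of /usr/lib/locale.

```python
import os
import shutil
import stat


def install_locale_file(src, locale_root, locale, category):
    """Copy src to <locale_root>/<locale>/<category> with mode 0444.

    category is "LC_CTYPE" or "LC_NUMERIC". On a real system
    locale_root would be /usr/lib/locale and the caller would need
    to be super-user or a member of group bin.
    """
    if category not in ("LC_CTYPE", "LC_NUMERIC"):
        raise ValueError("unexpected category: " + category)
    dest_dir = os.path.join(locale_root, locale)
    os.makedirs(dest_dir, exist_ok=True)
    dest = os.path.join(dest_dir, category)
    shutil.copyfile(src, dest)
    # Readable by user, group, and other; no other permissions.
    os.chmod(dest, stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH)
    return dest
```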

     If no input file is given, or if the argument "-" is encoun-
     tered, chrtbl reads from standard input.

     The syntax of file allows the user to define the names of
     the data files created by chrtbl, the assignment of charac-
     ters to character classifications, the relationship between
     upper and lower-case letters, byte and screen widths for up
     to three supplementary code sets, and three items of numeric
     formatting information: the decimal delimiter, the thousands
     delimiter and the grouping.  The keywords recognized by
     chrtbl are:

     LC_CTYPE    name of the data file created by chrtbl to con-
                 tain character classification, conversion, and
                 width information



     isupper     character codes to be classified as upper-case
                 letters

     islower     character codes to be classified as lower-case
                 letters

     isdigit     character codes to be classified as numeric

     isspace     character codes to be classified as spacing
                 (delimiter) characters

     ispunct     character codes to be classified as punctuation
                 characters

     iscntrl     character codes to be classified as control
                 characters

     isblank     character code for the blank (space) character

     isxdigit    character codes to be classified as hexadecimal
                 digits

     ul          relationship between upper- and lower-case char-
                 acters

     cswidth     byte and screen width information (by default,
                 each is one character wide)

     LC_NUMERIC  name of the data file created by chrtbl to con-
                 tain numeric formatting information

     decimal_point
                 decimal delimiter

     thousands_sep
                 thousands delimiter

     grouping    string in which each element is taken as an
                 integer that indicates the number of digits that
                 comprise the current group in a formatted non-
                 monetary numeric quantity.
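
     To make the grouping keyword above more concrete, the
     following Python sketch applies a list of group sizes to a
     string of digits.  The function name group_digits is
     hypothetical.  The manual page does not say what happens when
     the number has more digits than the grouping string covers;
     the sketch ASSUMES that the last group size is reused.

```python
def group_digits(digits, grouping, thousands_sep=","):
    """Insert thousands_sep into a string of digits.

    Groups are taken from the right-hand end. ASSUMPTION: once the
    grouping list is exhausted, its last element is reused for all
    remaining digits.
    """
    if not grouping:
        return digits
    groups = []
    index = 0
    while digits:
        size = grouping[min(index, len(grouping) - 1)]
        groups.append(digits[-size:])    # rightmost `size` digits
        digits = digits[:-size]          # what is left to group
        index += 1
    return thousands_sep.join(reversed(groups))
```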

     Any lines with the number sign (#) in the first column are
     treated as comments and are ignored.  Blank lines are also
     ignored.

     Characters for isupper, islower, isdigit, isspace, ispunct,
     iscntrl, isblank, isxdigit, and ul can be represented as a
     hexadecimal or octal constant (for example, the letter a can
     be represented as 0x61 in hexadecimal or 0141 in octal).
     Hexadecimal and octal constants may be separated by one or
     more space and/or tab characters.
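
     A minimal Python sketch of how such constants might be read is
     shown here.  The function names are hypothetical, and rejecting
     tokens without a leading 0 is an assumption based on the text,
     which mentions only hexadecimal and octal forms.

```python
def parse_constant(token):
    """Convert a hexadecimal (0x61) or octal (0141) token to an int."""
    if token[:2] in ("0x", "0X"):
        return int(token[2:], 16)
    if token.startswith("0"):
        return int(token, 8)
    raise ValueError("expected a hexadecimal or octal constant: " + token)


def parse_constants(text):
    """Parse constants separated by spaces and/or tabs."""
    return [parse_constant(token) for token in text.split()]
```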



     The dash character (-) may be used to indicate a range of
     consecutive numbers. Zero or more space characters may be
     used for separating the dash character from the numbers.
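
     Range expansion can be sketched in Python as follows.  The
     function name expand_codes is hypothetical.  The sketch
     assumes that a range includes both of its end points, which
     matches the way ranges such as 0x41 - 0x5a are used for the 26
     upper-case letters, and it treats any token without a 0x
     prefix as octal.

```python
import re


def _const(token):
    # Hexadecimal if prefixed with 0x, otherwise read as octal.
    if token.lower().startswith("0x"):
        return int(token, 16)
    return int(token, 8)


def expand_codes(text):
    """Expand constants and dash ranges into a list of character codes."""
    # Remove optional whitespace around each dash so that a range
    # becomes a single whitespace-free token.
    tokens = re.sub(r"\s*-\s*", "-", text).split()
    codes = []
    for token in tokens:
        first, _, last = token.partition("-")
        low = _const(first)
        high = _const(last) if last else low
        if high < low:
            raise ValueError("descending range: " + token)
        codes.extend(range(low, high + 1))   # inclusive of both ends
    return codes
```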

     The backslash character (\) is used for line continuation.
     Only a carriage return (that is, the end of the line) is
     permitted after the backslash character.
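
     The comment, blank-line, and continuation rules can be
     combined into one pass over the input, as in this Python
     sketch.  The function name logical_lines is hypothetical, and
     joining continued lines with a single space is a choice made
     for the sketch, not something the manual page specifies.

```python
def logical_lines(text):
    """Yield logical lines: comments and blank lines dropped,
    backslash-continued lines joined."""
    pending = ""
    for line in text.splitlines():
        if not pending and (line.startswith("#") or not line.strip()):
            continue                      # comment or blank line
        if line.endswith("\\"):
            pending += line[:-1] + " "    # continuation: keep reading
            continue
        yield pending + line
        pending = ""
    if pending:
        yield pending.rstrip()            # input ended on a continuation
```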

     The relationship between upper- and lower-case letters (ul)
     is expressed as ordered pairs of octal or hexadecimal con-
     stants:  <upper-case_character lower-case_character>.  These
     two constants may be separated by one or more space charac-
     ters.  Zero or more space characters may be used for
     separating the angle brackets (< >) from the numbers.
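
     The pair syntax can be recognized with a regular expression,
     as in the following Python sketch.  The names PAIR and
     parse_ul are hypothetical; the sketch reads any token without
     a 0x prefix as octal and silently skips text that does not
     look like a pair.

```python
import re

# "<", optional space, two whitespace-separated tokens, optional space, ">"
PAIR = re.compile(r"<\s*([^\s<>]+)\s+([^\s<>]+)\s*>")


def parse_ul(text):
    """Return a list of (upper, lower) integer pairs."""
    def const(token):
        if token.lower().startswith("0x"):
            return int(token, 16)
        return int(token, 8)

    return [(const(upper), const(lower))
            for upper, lower in PAIR.findall(text)]
```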

     The following is the format of an input specification for
     cswidth:

          n1:s1,n2:s2,n3:s3

     where,
          n1   byte width for supplementary code set 1, required
          s1   screen width for supplementary code set 1
          n2   byte width for supplementary code set 2
          s2   screen width for supplementary code set 2
          n3   byte width for supplementary code set 3
          s3   screen width for supplementary code set 3
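
     A cswidth specification can be split into its fields as in the
     Python sketch below.  The function name parse_cswidth is
     hypothetical.  The sketch assumes that the widths are written
     in decimal and that every code set present is given in full
     n:s form; the manual page states only that n1 is required.

```python
def parse_cswidth(spec):
    """Parse "n1:s1,n2:s2,n3:s3" into (byte_width, screen_width) pairs."""
    entries = spec.split(",")
    if not 1 <= len(entries) <= 3:
        raise ValueError("expected one to three code sets")
    widths = []
    for entry in entries:
        byte_width, sep, screen_width = entry.partition(":")
        if not sep:
            raise ValueError("expected n:s, got " + entry)
        widths.append((int(byte_width), int(screen_width)))
    return widths
```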

     decimal_point and thousands_sep are specified by a single
     character that gives the delimiter. grouping is specified by
     a quoted string in which each member may be in octal or hex
     representation. For example, \3 or \x3 could be used to set
     the value of a member of the string to 3.
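
     The escape forms mentioned above can be decoded as in this
     Python sketch.  The names ESCAPE and parse_grouping are
     hypothetical.  Only the \x (hexadecimal) and backslash-octal
     forms are handled; any other character inside the quotes is
     ignored, which is a simplification made for the sketch.

```python
import re

# Either \x followed by hex digits, or \ followed by 1-3 octal digits.
ESCAPE = re.compile(r"\\x([0-9a-fA-F]+)|\\([0-7]{1,3})")


def parse_grouping(spec):
    """Turn a quoted grouping string into a list of group sizes."""
    if len(spec) < 2 or spec[0] != '"' or spec[-1] != '"':
        raise ValueError("grouping must be a quoted string")
    sizes = []
    for hex_digits, octal_digits in ESCAPE.findall(spec[1:-1]):
        if hex_digits:
            sizes.append(int(hex_digits, 16))
        else:
            sizes.append(int(octal_digits, 8))
    return sizes
```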

EXAMPLE
     The following is an example of an input file used to create
     the USA-ENGLISH code set definition table in a file named
     usa and the non-monetary numeric formatting information in a
     file named num_usa.

          LC_CTYPE  usa
          isupper   0x41 - 0x5a
          islower   0x61 - 0x7a
          isdigit   0x30 - 0x39
          isspace   0x20 0x9 - 0xd
          ispunct   0x21 - 0x2f    0x3a - 0x40    \
                    0x5b - 0x60    0x7b - 0x7e
          iscntrl   0x0 - 0x1f     0x7f
          isblank   0x20
          isxdigit  0x30 - 0x39    0x61 - 0x66    \
                    0x41 - 0x46
          ul       <0x41 0x61> <0x42 0x62> <0x43 0x63>  \
                   <0x44 0x64> <0x45 0x65> <0x46 0x66>  \
                   <0x47 0x67> <0x48 0x68> <0x49 0x69>  \
                   <0x4a 0x6a> <0x4b 0x6b> <0x4c 0x6c>  \
                   <0x4d 0x6d> <0x4e 0x6e> <0x4f 0x6f>  \
                   <0x50 0x70> <0x51 0x71> <0x52 0x72>  \
                   <0x53 0x73> <0x54 0x74> <0x55 0x75>  \
                   <0x56 0x76> <0x57 0x77> <0x58 0x78>  \
                   <0x59 0x79> <0x5a 0x7a>
          cswidth        1:1,0:0,0:0
          LC_NUMERIC     num_usa
          decimal_point       .
          thousands_sep       ,
          grouping            "\3"

FILES
     /usr/lib/locale/locale/LC_CTYPE
                     data files containing character classifica-
                     tion, conversion, and character-set width
                     information created by chrtbl
     /usr/lib/locale/locale/LC_NUMERIC
                     data files containing numeric formatting
                     information created by chrtbl
     /usr/include/ctype.h
                     header file containing information used by
                     character classification and conversion rou-
                     tines
     /usr/lib/locale/C/chrtbl_C
                     input file used to construct LC_CTYPE and
                     LC_NUMERIC in the default locale.

SEE ALSO
     environ(5).
     ctype(3C), setlocale(3C) in the Programmer's Reference
     Manual.

DIAGNOSTICS
     The error messages produced by chrtbl are intended to be
     self-explanatory. They indicate errors in the command line
     or syntactic errors encountered within the input file.

WARNING
     Changing the files in /usr/lib/locale/C will cause the sys-
     tem to behave unpredictably.