⇒ chrtbl(1M) — Reliant UNIX 5.44c4

chrtbl(1M)                                                       chrtbl(1M)

NAME
     chrtbl - generate character classification and conversion tables

SYNOPSIS
     chrtbl [file]

DESCRIPTION
     The chrtbl command creates two tables containing information on char-
     acter classification, upper/lowercase conversion, character-set width,
     and numeric formatting. One table is an array of (257*2) + 7 bytes
     which is encoded to enable a table lookup to be used to determine the
     character classification of a character, convert a character [see
     ctype(3C)], and find the byte and screen width of a character in one
     of the supplementary code sets. The other table contains information
     about the format of non-monetary numeric quantities: the first byte
     specifies the decimal delimiter; the second byte specifies the
     thousands delimiter; and the remaining bytes comprise a null-
     terminated string indicating the grouping (each element of the string
     is taken as an integer that indicates the number of digits that
     comprise the current group in a formatted non-monetary numeric quan-
     tity).
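
     The byte layout of the numeric table described above can be sketched
     in Python. This is an illustration only, not the code chrtbl uses:
     pack_numeric_table and unpack_numeric_table are hypothetical helper
     names, and the sketch assumes single-byte delimiters.

```python
def pack_numeric_table(decimal_point: str, thousands_sep: str,
                       grouping: bytes) -> bytes:
    """Byte 0: decimal delimiter. Byte 1: thousands delimiter.
    Remaining bytes: the grouping string, terminated by a null byte."""
    if len(decimal_point) != 1 or len(thousands_sep) != 1:
        raise ValueError("each delimiter must be a single character")
    if 0 in grouping:
        raise ValueError("grouping elements must be non-zero")
    return (decimal_point.encode("latin-1")
            + thousands_sep.encode("latin-1")
            + grouping
            + b"\x00")


def unpack_numeric_table(table: bytes):
    """Inverse of pack_numeric_table (hypothetical helper)."""
    end = table.index(0, 2)  # locate the terminating null byte
    return chr(table[0]), chr(table[1]), list(table[2:end])


# Values taken from the EXAMPLE section: "." "," and a grouping of 3.
table = pack_numeric_table(".", ",", b"\x03")
```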

     chrtbl reads the user-defined character classification and conversion
     information from file and creates three output files in the current
     directory. To construct file, use the file supplied in
     /usr/lib/locale/C/chrtbl_C as a starting point. You may add entries,
     but do not change the original values supplied with the system. For
     example, for other locales you may wish to add eight-bit entries to
     the ASCII definitions provided in this file.

     One output file, ctype.c (a C-language source file), contains a
     (257*2)+7-byte array generated from processing the information from
     file. You should review the content of ctype.c to verify that the
     array is set up as you had planned. (In addition, an application pro-
     gram could use ctype.c.) The first 257 bytes of the array in ctype.c
     are used for character classification. The characters used for ini-
     tializing these bytes of the array represent character classifications
     that are defined in /usr/include/ctype.h; for example, S means a
     character is a spacing character and U|L means the character is
     uppercase or lowercase. The second 257 bytes of the array are used for
     character conversion. These bytes of the array are initialized so that
     characters for which you do not provide conversion information will be
     converted to themselves. When you do provide conversion information,
     the first value of the pair is stored where the second one would be
     stored normally, and vice versa; for example, if you provide
     <0x41 0x61>, then 0x61 is stored where 0x41 would be stored normally,
     and 0x41 is stored where 0x61 would be stored normally. The last 7
     bytes are used for character width information for up to three supple-
     mentary code sets.
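
     The size arithmetic and the swap rule for conversion pairs can be
     sketched in Python. This is a simplified model, not the generated
     ctype.c: build_conversion is a hypothetical helper, and it maps
     character codes 0 through 255 directly, leaving out the exact byte
     offsets within each 257-byte region, which this page does not spell
     out.

```python
CLASS_BYTES = 257   # character classification region
CONV_BYTES = 257    # character conversion region
WIDTH_BYTES = 7     # width information for supplementary code sets
TABLE_SIZE = CLASS_BYTES + CONV_BYTES + WIDTH_BYTES  # (257*2) + 7


def build_conversion(pairs):
    """Return a 256-entry conversion map.

    Characters without conversion information convert to themselves.
    For each pair, each value is stored in the other's slot.
    """
    conv = list(range(256))
    for upper, lower in pairs:
        conv[upper] = lower
        conv[lower] = upper
    return conv


conv = build_conversion([(0x41, 0x61)])
```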






Page 1                       Reliant UNIX 5.44                Printed 11/98

     The second output file (a data file) contains the same information,
     but is structured for efficient use by the character classification
     and conversion routines [see ctype(3C)]. The name of this output file
     is the value you assign to the keyword LC_CTYPE read in from file.
     Before this file can be used by the character classification and
     conversion routines, it must be installed in the
     /usr/lib/locale/locale directory with the name LC_CTYPE by someone who
     is superuser or a member of group bin. This file must be readable by
     user, group, and other; no other permissions should be set. To use the
     character classification and conversion tables in this file, set the
     LC_CTYPE environment variable appropriately [see environ(5) or
     setlocale(3C)].

     The third output file (a data file) is created only if numeric format-
     ting information is specified in the input file. The name of this out-
     put file is the value you assign to the keyword LC_NUMERIC read in
     from file. Before this file can be used, it must be installed in the
     /usr/lib/locale/locale directory with the name LC_NUMERIC by a
     superuser or a member of group bin. This file must be readable by
     user, group, and other; no other permissions should be set. To use the
     numeric formatting information in this file, set the LC_NUMERIC envi-
     ronment variable appropriately [see environ(5) or setlocale(3C)].
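
     The permission requirement for both data files (readable by user,
     group, and other, with nothing else set) corresponds to mode 0444. A
     minimal Python sketch of setting and checking that mode follows. It
     works on a scratch file in a temporary directory rather than under
     /usr/lib/locale, the file name datafile is a stand-in, and
     install_mode_ok is a hypothetical helper.

```python
import os
import stat
import tempfile

# Read permission for user, group, and other; no other bits set.
READ_ONLY_ALL = stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH


def install_mode_ok(path: str) -> bool:
    """True if the file's permission bits are exactly 0444."""
    return stat.S_IMODE(os.stat(path).st_mode) == READ_ONLY_ALL


with tempfile.TemporaryDirectory() as scratch:
    path = os.path.join(scratch, "datafile")  # stand-in for a data file
    with open(path, "wb") as handle:
        handle.write(b"")
    os.chmod(path, READ_ONLY_ALL)
    mode_ok = install_mode_ok(path)
```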

     The name of the locale where you install the files LC_CTYPE and
     LC_NUMERIC should correspond to the conventions defined in file. For
     example, if French conventions were defined, and the name for the
     French locale on your system is french, then you should install the
     files in /usr/lib/locale/french.

     If no input file is given, or if the argument "-" is encountered,
     chrtbl reads from standard input.

     The syntax of file allows the user to define the names of the data
     files created by chrtbl, the assignment of characters to character
     classifications, the relationship between uppercase and lowercase
     letters, byte and screen widths for up to three supplementary code
     sets, and three items of numeric formatting information: the decimal
     delimiter, the thousands delimiter and the grouping. The keywords
     recognized by chrtbl are:

     LC_CTYPE          Name of the data file created by chrtbl to contain
                       character classification, conversion, and width
                       information.

     isupper           Character codes to be classified as uppercase
                       letters.

     islower           Character codes to be classified as lowercase
                       letters.

     isdigit           Character codes to be classified as numeric.




     isspace           Character codes to be classified as spacing (delim-
                       iter) characters.

     ispunct           Character codes to be classified as punctuation
                       characters.

     iscntrl           Character codes to be classified as control charac-
                       ters.

     isblank           Character code for the space character.

     isxdigit          Character codes to be classified as hexadecimal
                       digits.

     ul                Relationship between uppercase and lowercase charac-
                       ters.

     cswidth           Byte and screen width information (by default, each
                       is one character wide).

     LC_NUMERIC        Name of the data file created by chrtbl to contain
                       numeric formatting information.

     decimal_point     Decimal delimiter.

     thousands_sep     Thousands delimiter.

     grouping          String in which each element is taken as an integer
                       that indicates the number of digits that comprise
                       the current group in a formatted non-monetary
                       numeric quantity.

     Any lines with the number sign (#) in the first column are treated as
     comments and are ignored. Blank lines are also ignored.

     Characters for isupper, islower, isdigit, isspace, ispunct, iscntrl,
     isblank, isxdigit, and ul can be represented as a hexadecimal or octal
     constant (for example, the letter a can be represented as 0x61 in hex-
     adecimal or 0141 in octal). Hexadecimal and octal constants may be
     separated by one or more space and/or tab characters.

     The dash character (-) may be used to indicate a range of consecutive
     numbers. Zero or more space characters may be used for separating the
     dash character from the numbers.
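
     The constant and range notation can be sketched as a small Python
     parser. This is an approximation of the rules stated above, not the
     parser chrtbl uses; parse_const and parse_codes are hypothetical
     names, and the sketch does no error reporting.

```python
import re

# A hexadecimal (0x...) or octal (leading 0) constant.
_CONST = r"0[xX][0-9a-fA-F]+|0[0-7]*"
# A single constant, or two constants joined by a dash to form a range.
_ITEM = re.compile(rf"({_CONST})(?:\s*-\s*({_CONST}))?")


def parse_const(token: str) -> int:
    """Convert a hexadecimal or octal constant to an integer."""
    if token[:2].lower() == "0x":
        return int(token[2:], 16)
    return int(token, 8)


def parse_codes(spec: str):
    """Expand constants and dash ranges into a sorted list of codes."""
    codes = set()
    for match in _ITEM.finditer(spec):
        first = parse_const(match.group(1))
        last = parse_const(match.group(2)) if match.group(2) else first
        codes.update(range(first, last + 1))
    return sorted(codes)
```

     For instance, parse_codes("0x20 0x9 - 0xd") expands the isspace line
     of the EXAMPLE section into the codes 9 through 13 plus 32.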

     The backslash character (\) is used for line continuation. Only a car-
     riage return is permitted after the backslash character.








     The relationship between uppercase and lowercase letters (ul) is
     expressed as ordered pairs of octal or hexadecimal constants:
     <uppercase_character lowercase_character>. These two constants may be
     separated by one or more space characters. Zero or more space charac-
     ters may be used for separating the angle brackets (< >) from the
     numbers.
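
     Reading the ordered pairs of a ul entry can be sketched in Python as
     follows. parse_ul is a hypothetical helper that only illustrates the
     bracket and spacing rules above; it is not taken from chrtbl.

```python
import re

# "<" constant, whitespace, constant, ">" with optional inner spacing.
_PAIR = re.compile(r"<\s*([0-9A-Za-z]+)\s+([0-9A-Za-z]+)\s*>")


def parse_const(token: str) -> int:
    """Convert a hexadecimal (0x...) or octal (leading 0) constant."""
    if token[:2].lower() == "0x":
        return int(token[2:], 16)
    return int(token, 8)


def parse_ul(spec: str):
    """Return a list of (uppercase, lowercase) integer pairs."""
    return [(parse_const(upper), parse_const(lower))
            for upper, lower in _PAIR.findall(spec)]


# One pair in hexadecimal, one in octal with spaces inside the brackets.
pairs = parse_ul("<0x41 0x61> < 0102 0142 >")
```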

     The following is the format of an input specification for cswidth:
     n1:s1,n2:s2,n3:s3 where,

          n1   byte width for supplementary code set 1, required

          s1   screen width for supplementary code set 1

          n2   byte width for supplementary code set 2

          s2   screen width for supplementary code set 2

          n3   byte width for supplementary code set 3

          s3   screen width for supplementary code set 3
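
     A cswidth specification can be split into its per-code-set fields
     with a few lines of Python. This sketch assumes every field present
     is written in full n:s form; how chrtbl treats omitted fields is not
     described here, so that case is not modeled. parse_cswidth is a
     hypothetical name.

```python
def parse_cswidth(spec: str):
    """Return [(byte_width, screen_width), ...] for up to three
    supplementary code sets, in the order n1:s1,n2:s2,n3:s3."""
    widths = []
    for field in spec.split(","):
        byte_width, _, screen_width = field.partition(":")
        widths.append((int(byte_width), int(screen_width)))
    if not 1 <= len(widths) <= 3:
        raise ValueError("expected one to three n:s fields")
    return widths


# The value used in the EXAMPLE section.
widths = parse_cswidth("1:1,0:0,0:0")
```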

     decimal_point and thousands_sep are specified by a single character
     that gives the delimiter. grouping is specified by a quoted string in
     which each member may be in octal or hex representation. For example,
     \3 or \x3 could be used to set the value of a member of the string to
     3.
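
     How a grouping string shapes a formatted number can be illustrated in
     Python. The sketch applies each element, from the right, as the size
     of the next digit group. Reusing the last element for all remaining
     digits is an assumption borrowed from the usual localeconv(3C)
     convention; it is not stated on this page. group_digits is a
     hypothetical helper.

```python
def group_digits(digits: str, grouping: bytes, sep: str) -> str:
    """Insert sep into a string of digits according to grouping.

    grouping[0] is the size of the rightmost group, grouping[1] the
    next one, and so on. ASSUMPTION: the last element repeats.
    """
    if not grouping:
        return digits
    groups = []
    rest = digits
    index = 0
    while True:
        size = grouping[min(index, len(grouping) - 1)]
        if size <= 0 or len(rest) <= size:
            break
        groups.append(rest[-size:])  # peel one group off the right
        rest = rest[:-size]
        index += 1
    groups.append(rest)
    return sep.join(reversed(groups))


# A grouping of "\3" with "," as the thousands delimiter.
formatted = group_digits("1234567", b"\x03", ",")
```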

EXAMPLE
     The following is an example of an input file used to create the USA-
     ENGLISH code set definition table in a file named usa and the non-
     monetary numeric formatting information in a file named num_usa.

          LC_CTYPE  usa
          isupper   0x41 - 0x5a
          islower   0x61 - 0x7a
          isdigit   0x30 - 0x39
          isspace   0x20 0x9 - 0xd
          ispunct   0x21 - 0x2f   0x3a - 0x40   0x5b - 0x60   0x7b - 0x7e
          iscntrl   0x0 - 0x1f     0x7f
          isblank   0x20
          isxdigit  0x30 - 0x39   0x61 - 0x66   0x41 - 0x46
          ul       <0x41 0x61> <0x42 0x62> <0x43 0x63>  \
                   <0x44 0x64> <0x45 0x65> <0x46 0x66>  \
                   <0x47 0x67> <0x48 0x68> <0x49 0x69>  \
                   <0x4a 0x6a> <0x4b 0x6b> <0x4c 0x6c>  \
                   <0x4d 0x6d> <0x4e 0x6e> <0x4f 0x6f>  \
                   <0x50 0x70> <0x51 0x71> <0x52 0x72>  \
                   <0x53 0x73> <0x54 0x74> <0x55 0x75>  \
                   <0x56 0x76> <0x57 0x77> <0x58 0x78>  \
                   <0x59 0x79> <0x5a 0x7a>
          cswidth         1:1,0:0,0:0


          LC_NUMERIC      num_usa
          decimal_point   .
          thousands_sep   ,
          grouping        "\3"

LOCALE
     The LC_MESSAGES environment variable governs the language in which
     message texts are displayed.

     If LC_MESSAGES is undefined or is defined as the null string, it
     defaults to the value of LANG. If LANG is likewise undefined or null,
     the system acts as if it were not internationalized.

     If any of the locale variables has an invalid value, the system acts
     as if none of the variables was set.

     The LC_ALL environment variable governs the entire locale. LC_ALL
     takes precedence over all the other environment variables which affect
     internationalization.
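
     The precedence among these variables can be sketched in Python. The
     function below models only the lookup order (LC_ALL, then
     LC_MESSAGES, then LANG); it does not model the rule about invalid
     values, and message_locale is a hypothetical name.

```python
def message_locale(env):
    """Return the locale that governs message texts, or None if the
    system should behave as if it were not internationalized."""
    for name in ("LC_ALL", "LC_MESSAGES", "LANG"):
        value = env.get(name)
        if value:  # skips variables that are undefined or null
            return value
    return None
```

     In a real program the argument would typically be os.environ; a plain
     dictionary works equally well for experimenting.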

DIAGNOSTICS
     The error messages produced by chrtbl are intended to be self-
     explanatory. They indicate errors in the command line or syntactic
     errors encountered within the input file.

WARNING
     Changing the files in /usr/lib/locale/C will cause the system to
     behave unpredictably.

FILES
     /usr/lib/locale/locale/LC_CTYPE
          data files containing character classification, conversion, and
          character-set width information created by chrtbl

     /usr/lib/locale/locale/LC_NUMERIC
          data files containing numeric formatting information created by
          chrtbl

     /usr/include/ctype.h
          header file containing information used by character classifica-
          tion and conversion routines

     /usr/lib/locale/C/chrtbl_C
          input file used to construct LC_CTYPE and LC_NUMERIC in the
          default locale.

SEE ALSO
     ctype(3C), setlocale(3C), environ(5).





