wchrtbl(1M), UnixWare 2.01







       wchrtbl(1M)                                              wchrtbl(1M)


       NAME
             wchrtbl - generate tables for ASCII and supplementary code
             sets

       SYNOPSIS
             wchrtbl [file]

       DESCRIPTION
             wchrtbl creates tables containing information on character
             classification, character conversion, character set width, and
             numeric editing.  The first table is a byte-sized array
             encoded such that a table lookup can be used to determine the
             character classification of a character, convert a character
             [see ctype(3C) and wctype(3C)], and find the byte and screen
             width of a character in one of the supplementary code sets.
             The size of the array is (257*2) + 7 bytes: 257 bytes are
             required for the 8-bit code set character classification
             table, 257 bytes for the upper- to lowercase and lower- to
             uppercase conversion table, and 7 bytes for character set
             width information.  The second table is 2 bytes long and is
             encoded such that the first byte specifies the decimal
             delimiter and the second byte the thousands delimiter.  If
             supplementary code sets are specified, additional
             variable-sized tables are generated for multibyte character
             classification and conversion.
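
             The size arithmetic above can be sketched as follows.  This
             is a minimal illustration in Python, not the layout code used
             by wchrtbl itself; the constant and function names are
             hypothetical, and the ordering (classification, conversion,
             width) follows the order in which this page describes the
             array.

```python
# Sizes stated in the DESCRIPTION; all names here are hypothetical.
CLASS_BYTES = 257   # 8-bit code set character classification table
CONV_BYTES = 257    # upper/lowercase conversion table
WIDTH_BYTES = 7     # character set width information

CTYPE_TABLE_BYTES = CLASS_BYTES + CONV_BYTES + WIDTH_BYTES  # (257*2) + 7

# Offsets of each region, assuming the regions are laid out in this order.
CLASS_START = 0
CONV_START = CLASS_START + CLASS_BYTES
WIDTH_START = CONV_START + CONV_BYTES


def numeric_table(decimal_point: str, thousands_sep: str) -> bytes:
    """Build the 2-byte numeric editing table: decimal delimiter first."""
    return bytes([ord(decimal_point), ord(thousands_sep)])


print(CTYPE_TABLE_BYTES, CONV_START, WIDTH_START)  # 521 257 514
print(numeric_table(".", ","))                     # b'.,'
```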

             wchrtbl reads the user-defined character classification and
             conversion information from file and creates up to three
             output files in the current directory.  One output file,
             wctype.c (a C language source file), contains the
             variable-sized array generated from processing the
             information in file.  You should review the contents of
             wctype.c to verify that the array is set up as you had
             planned.  The first 257 bytes of the array in wctype.c are
             used for character classification of single-byte characters.
             The values used for initializing these bytes of the array
             represent character classifications that are defined in
             ctype.h; for example, _L means a character is lowercase and
             _S|_B means the character is both a spacing character and a
             blank.  The second 257 bytes of the array are
             used for character conversion.  These bytes of the array are
             initialized so that characters for which you do not provide
             conversion information will be converted to themselves.  When
             you do provide conversion information, the first value of the
             pair is stored where the second one would be stored normally,
             and vice versa.  For example, if you provide <0x41 0x61>, then
             0x61 is stored where 0x41 would be stored normally, and 0x41
             is stored where 0x61 would be stored normally.  The last 7
             bytes are used for character width information.  Up to three
             supplementary code sets can be specified.
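
             The swap rule for conversion pairs can be sketched as
             follows.  This is a simplified model, assuming a 256-entry
             table indexed directly by character code; the real array has
             257 entries per region, and the extra slot is not modelled
             here.  The function name is hypothetical.

```python
def build_conversion(pairs):
    """Return a conversion table for single-byte codes.

    Every code starts out converting to itself; each (first, second)
    pair is then stored crosswise, so that each member of the pair
    converts to the other.
    """
    table = list(range(256))      # identity: unlisted codes map to themselves
    for first, second in pairs:
        table[first] = second     # second stored where first would be
        table[second] = first     # first stored where second would be
    return table


conv = build_conversion([(0x41, 0x61)])
print(hex(conv[0x41]), hex(conv[0x61]), hex(conv[0x42]))  # 0x61 0x41 0x42
```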

            For supplementary code sets, there are three sets of tables.
            The first set is three pointer arrays that point to the
            supplementary code set information tables.  If the
            information for a supplementary code set is not specified,
            the corresponding pointers are zero.  The second set is three
            supplementary code set information tables.  Each table
            contains the minimum and maximum code values to be classified
            and converted, and pointers to the character classification
            and conversion tables.  If there is no corresponding table,
            the pointers are zero.  The third set is the character
            classification and conversion tables, which contain the same
            information as the single-byte table except that the codes
            are represented as process codes and the table size is
            variable.  The values used for initializing the character
            classification table represent character classifications that
            are defined in ctype.h and wctype.h.  _E1 through _E8 are for
            international use and _E9 through _E24 are for
            language-dependent use.
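
            One way to picture the information tables is the sketch
            below.  It is an assumption-laden model, not the structure
            emitted in wctype.c: the class name, field names, and the
            UPPER flag value are hypothetical, None stands in for a zero
            pointer, and the classification table is assumed to hold one
            entry per code between the minimum and maximum values.

```python
from dataclasses import dataclass
from typing import List, Optional

UPPER = 0x01  # hypothetical flag; real bit values come from ctype.h/wctype.h


@dataclass
class CodeSetInfo:
    min_code: int                      # lowest code classified or converted
    max_code: int                      # highest code classified or converted
    class_table: Optional[List[int]]   # None models a zero pointer
    conv_table: Optional[List[int]]    # None models a zero pointer


def classify(info: Optional[CodeSetInfo], code: int) -> int:
    """Return the classification bits for code, or 0 if none are defined."""
    if info is None or info.class_table is None:
        return 0
    if not info.min_code <= code <= info.max_code:
        return 0
    return info.class_table[code - info.min_code]


# Code set 1 covers only the 26 uppercase codes used in EXAMPLES;
# code sets 2 and 3 are left unspecified.
set1 = CodeSetInfo(0xA3C1, 0xA3DA, [UPPER] * 26, None)
code_sets = [set1, None, None]

print(classify(code_sets[0], 0xA3C1))  # 1
print(classify(code_sets[0], 0xA3E1))  # 0 (outside min..max)
print(classify(code_sets[1], 0xA1))    # 0 (code set not specified)
```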

            The second output file (a data file) contains the same
            information, but is structured for efficient use by the
            character classification and conversion routines [see
            ctype(3C) and wctype(3C)].  The name of this output file is
            the value given for the LC_CTYPE keyword in file.  This
            output file must be copied to the
            /usr/lib/locale/locale/LC_CTYPE file, where the second locale
            is the name of the locale, by the super-user or a member of
            group bin.  This file must be readable by user, group, and
            other; no other permissions should be set.  To use the
            character classification and conversion tables in this file,
            set the LC_CTYPE category of setlocale [see setlocale(3C)]
            appropriately.

            The third output file (a data file) is created only if numeric
            editing information is specified in the input file.  The name
            of the file is the value given for the LC_NUMERIC keyword in
            file.  This output file must be copied to the
            /usr/lib/locale/locale/LC_NUMERIC file by the super-user or a
            member of group bin.  This file must be readable by user,
            group, and other; no other permissions should be set.  To use
            the numeric editing information in this file, set the
            LC_NUMERIC category of setlocale appropriately.
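
            The installation step for either data file can be sketched
            as below.  This is an illustration only: the function name is
            hypothetical, and the demonstration copies into a temporary
            directory rather than /usr/lib/locale.  "Readable by user,
            group, and other; no other permissions" corresponds to mode
            0444.

```python
import os
import shutil
import stat
import tempfile


def install_locale_file(src, locale_root, locale, category):
    """Copy src to <locale_root>/<locale>/<category> with mode 0444."""
    dest_dir = os.path.join(locale_root, locale)
    os.makedirs(dest_dir, exist_ok=True)
    dest = os.path.join(dest_dir, category)
    shutil.copyfile(src, dest)
    os.chmod(dest, 0o444)   # r--r--r--: read-only for user, group, other
    return dest


# Demonstration against a scratch directory instead of /usr/lib/locale.
with tempfile.TemporaryDirectory() as root:
    src = os.path.join(root, "LC_CTYPE.out")
    with open(src, "wb") as f:
        f.write(b"\0")
    dest = install_locale_file(src, os.path.join(root, "locale"),
                               "JAPAN", "LC_CTYPE")
    print(oct(stat.S_IMODE(os.stat(dest).st_mode)))  # 0o444
```

            On a real system the copy must be made by the super-user or
            a member of group bin, as stated above.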


             If no input file is given, or if the argument - is
             encountered, wchrtbl reads from standard input.

             The syntax of file allows the user to define the name of the
             data file created by wchrtbl, the assignment of characters to
             character classifications, the relationship between conversion
             letters, and byte and screen widths for up to three
             supplementary code sets.  The keywords recognized by wchrtbl
             are:

             LC_CTYPE        name of the first data file to be created by
                             wchrtbl

             isupper         character codes to be classified as uppercase
                             letters

             islower         character codes to be classified as lowercase
                             letters

             isdigit         character codes to be classified as numeric

             isspace         character codes to be classified as spacing
                             (delimiter) characters

             ispunct         character codes to be classified as
                             punctuation characters

             iscntrl         character codes to be classified as control
                             characters

             isblank         character code for the space character

             isxdigit        character codes to be classified as
                             hexadecimal digits

             ul              relationship between conversion characters

             cswidth         byte and screen width information

             LC_NUMERIC      name of the second data file created by
                             wchrtbl

             decimal_point   decimal delimiters

             thousands_sep   thousands delimiters

            LC_CTYPE1       indicates that the specification of
                            supplementary code set 1 follows

            LC_CTYPE2       indicates that the specification of
                            supplementary code set 2 follows

            LC_CTYPE3       indicates that the specification of
                            supplementary code set 3 follows

            isphonogram(iswchar1)
                            character codes to be classified as phonograms
                            in supplementary code sets

            isideogram(iswchar2)
                            character codes to be classified as ideograms
                            in supplementary code sets

            isenglish(iswchar3)
                            character codes to be classified as English
                            letters in supplementary code sets

            isnumber(iswchar4)
                            character codes to be classified as numeric in
                            supplementary code sets

            isspecial(iswchar5)
                            character codes to be classified as special
                            letters in supplementary code sets

            iswchar6        character codes to be classified as other
                            printable letters in supplementary code sets

            iswchar7 - iswchar8
                            reserved for international use

            iswchar9 - iswchar24
                            character codes to be classified as language-
                            dependent letters/characters

            grouping        string in which each element is taken as an
                            integer that indicates the number of digits
                            that comprise the current group in a formatted
                            non-monetary numeric quantity.
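
            The effect of a grouping value can be sketched as follows.
            The description above only says that each element gives the
            size of the current group; this sketch additionally assumes
            that groups are counted from the right and that the last
            element repeats for the remaining digits, as in the ISO C
            localeconv() convention.  The function name is hypothetical.

```python
def group_digits(digits, grouping, sep):
    """Insert sep into digits according to a list of group sizes."""
    parts = []
    rest = digits
    i = 0
    while rest:
        # Past the end of the list, keep reusing the last group size.
        size = grouping[min(i, len(grouping) - 1)] if grouping else 0
        if size <= 0 or size >= len(rest):
            parts.append(rest)      # remaining digits form the final group
            break
        parts.append(rest[-size:])  # peel one group off the right-hand end
        rest = rest[:-size]
        i += 1
    return sep.join(reversed(parts))


print(group_digits("1234567", [3], ","))     # 1,234,567
print(group_digits("1234567", [3, 2], ","))  # 12,34,567
print(group_digits("1234567", [], ","))      # 1234567
```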



             The keywords iswchar1 through iswchar24 correspond to bit
             names _E1 through _E24 defined in wctype.h.

             Any lines with the number sign (#) in the first column are
             treated as comments and are ignored.  Blank lines are also
             ignored.

             Characters for isupper, islower, isdigit, isspace, ispunct,
             iscntrl, isblank, isxdigit, ul, isphonogram, isideogram,
             isenglish, isnumber, isspecial, and iswchar1 through iswchar24
             can be represented as hexadecimal or octal constants (for
             example, the letter a can be represented as 0x61 in
             hexadecimal or 0141 in octal) and must be process codes of
             at most two bytes.  Hexadecimal and octal constants may be
             separated by one or more space or tab characters.
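
             Reading such constants can be sketched as below.  This is a
             minimal illustration, not the parser used by wchrtbl; the
             function name is hypothetical, and rejecting anything that
             is neither hexadecimal nor octal is an assumption.

```python
def parse_code(token):
    """Convert a hexadecimal (0x...) or octal (0...) constant to an int."""
    t = token.lower()
    if t.startswith("0x"):
        value = int(t[2:], 16)
    elif t.startswith("0"):
        value = int(t, 8)
    else:
        raise ValueError("not a hexadecimal or octal constant: " + token)
    if value > 0xFFFF:
        raise ValueError("code does not fit in two bytes: " + token)
    return value


print(parse_code("0x61"), parse_code("0141"))  # 97 97
print(hex(parse_code("0xa3c1")))               # 0xa3c1
```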

             The following is the format of an input specification for
             cswidth (byte widths for supplementary code sets 2 and 3 are
             exclusive of the single shift characters):
                   cswidth n1[[:s1][,n2[:s2][,n3[:s3]]]]

             where,
                   n1    byte width for supplementary code set 1
                   s1    screen width for supplementary code set 1
                   n2    byte width for supplementary code set 2
                   s2    screen width for supplementary code set 2
                   n3    byte width for supplementary code set 3
                   s3    screen width for supplementary code set 3
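
             The cswidth format can be read as sketched below.  The
             function name is hypothetical, and an omitted screen width
             is reported as None because this page does not state what
             value is used when a screen width is omitted.

```python
def parse_cswidth(spec):
    """Parse 'n1[:s1][,n2[:s2][,n3[:s3]]]' into (byte, screen) pairs."""
    fields = [f.strip() for f in spec.split(",")]
    if not 1 <= len(fields) <= 3:
        raise ValueError("cswidth describes one to three code sets")
    widths = []
    for field in fields:
        byte_width, _, screen_width = field.partition(":")
        widths.append((int(byte_width),
                       int(screen_width) if screen_width else None))
    return widths


print(parse_cswidth("2:2,1:1,2:2"))  # [(2, 2), (1, 1), (2, 2)]
print(parse_cswidth("2"))            # [(2, None)]
```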

             The dash character (-) may be used to indicate a range of
             consecutive numbers (inclusive of the characters delimiting
             the range).  Zero or more space characters may be used for
             separating the dash character from the numbers.
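
             Range expansion can be sketched as follows.  This is an
             illustration with hypothetical function names; it accepts
             the hexadecimal and octal forms described above and treats
             the space around the dash as optional.

```python
import re


def _num(token):
    """Hexadecimal if prefixed with 0x, otherwise octal."""
    return int(token, 16) if token.lower().startswith("0x") else int(token, 8)


def expand_codes(text):
    """Expand a keyword's value list into individual character codes."""
    codes = []
    # Close up any spaces around dashes so each range is a single item.
    for item in re.sub(r"\s*-\s*", "-", text).split():
        low, _, high = item.partition("-")
        first = _num(low)
        last = _num(high) if high else first
        codes.extend(range(first, last + 1))   # ranges are inclusive
    return codes


print(expand_codes("0x20   0x9 - 0xd"))  # [32, 9, 10, 11, 12, 13]
```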

             The backslash character (\) is used for line continuation.
             Only a carriage return is permitted after the backslash
             character.
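
             Comment handling and line continuation can be sketched
             together as below.  The function name is hypothetical, and
             the order of operations is an assumption: comment and blank
             lines are dropped first, then lines ending in a backslash
             are joined to the line that follows.

```python
def logical_lines(text):
    """Drop comment and blank lines, then join continued lines."""
    result = []
    pending = ""
    for line in text.splitlines():
        if line.startswith("#") or not line.strip():
            continue                        # comment or blank line
        if line.endswith("\\"):
            pending += line[:-1] + " "      # continued on the next line
        else:
            result.append(pending + line)
            pending = ""
    if pending:
        result.append(pending.rstrip())     # dangling continuation at EOF
    return result


spec = "\n".join([
    "# punctuation",
    "ispunct 0x21 - 0x2f \\",
    "        0x3a - 0x40",
    "",
    "isblank 0x20",
])
for line in logical_lines(spec):
    print(line.split())
```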

             The relationship between conversion letters (ul) is expressed
             as ordered pairs of octal or hexadecimal constants:
             <converting-character converted-character>.  These two
             constants must be process codes of at most two bytes and may
             be separated by one or more space characters.  Zero or more
             space characters may be used for separating the angle
             brackets (< >) from the numbers.
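
             Extracting the ordered pairs from a ul value can be sketched
             as follows.  This is an illustration with hypothetical
             names, not the parser used by wchrtbl.

```python
import re

# One <first second> pair; spaces inside the brackets are optional.
_PAIR = re.compile(r"<\s*([0-9A-Fa-fxX]+)\s+([0-9A-Fa-fxX]+)\s*>")


def _num(token):
    """Hexadecimal if prefixed with 0x, otherwise octal."""
    return int(token, 16) if token.lower().startswith("0x") else int(token, 8)


def parse_ul(text):
    """Return the (converting, converted) pairs found in a ul value."""
    return [(_num(a), _num(b)) for a, b in _PAIR.findall(text)]


pairs = parse_ul("<0x41 0x61> < 0102  0142 >")
print([(hex(a), hex(b)) for a, b in pairs])
# [('0x41', '0x61'), ('0x42', '0x62')]
```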

      EXAMPLES
            The following is an example of an input file used to create
            the JAPAN code set definition tables in files named LC_CTYPE
            and LC_NUMERIC.
            #
            # locale JAPAN
            #
            LC_CTYPE    LC_CTYPE
            #
            # specification for single byte characters
            #
            isupper       0x41 - 0x5a
            islower       0x61 - 0x7a
            isdigit       0x30 - 0x39
            isspace       0x20   0x9 - 0xd
            ispunct       0x21 - 0x2f     0x3a - 0x40  \
                          0x5b - 0x60     0x7b - 0x7e
            iscntrl       0x0 - 0x1f      0x7f - 0x9f
            isblank       0x20
            isxdigit  0x30 - 0x39   0x61 - 0x66 0x41 - 0x46
            ul          <0x41 0x61> <0x42 0x62> <0x43 0x63> \
                        <0x44 0x64> <0x45 0x65> <0x46 0x66> \
                        <0x47 0x67> <0x48 0x68> <0x49 0x69> \
                        <0x4a 0x6a> <0x4b 0x6b> <0x4c 0x6c> \
                        <0x4d 0x6d> <0x4e 0x6e> <0x4f 0x6f> \
                        <0x50 0x70> <0x51 0x71> <0x52 0x72> \
                        <0x53 0x73> <0x54 0x74> <0x55 0x75> \
                        <0x56 0x76> <0x57 0x77> <0x58 0x78> \
                        <0x59 0x79> <0x5a 0x7a>
            cswidth           2:2,1:1,2:2
            LC_NUMERIC  LC_NUMERIC
            decimal_point     .
            thousands_sep
            #
            # specification for supplementary code set 1
            #
            LC_CTYPE1
            isupper           0xa3c1 - 0xa3da
            islower           0xa3e1 - 0xa3fa
            isdigit           0xa3b0 - 0xa3b9
            isspace           0xa1a1
            isphonogram 0xa4a1 - 0xa4f3 0xa5a1 - 0xa5f6
            isideogram  0xb0a1 - 0xb0fe 0xb1a1 - 0xb1fe 0xb2a1 - 0xb2fe \
                        0xb3a1 - 0xb3fe 0xb4a1 - 0xb4fe 0xb5a1 - 0xb5fe \
                         0xb6a1 - 0xb6fe 0xb7a1 - 0xb7fe 0xb8a1 - 0xb8fe \
                         0xb9a1 - 0xb9fe 0xbaa1 - 0xbafe 0xbba1 - 0xbbfe \
                         0xbca1 - 0xbcfe 0xbda1 - 0xbdfe 0xbea1 - 0xbefe \
                         0xbfa1 - 0xbffe 0xc0a1 - 0xc0fe 0xc1a1 - 0xc1fe \
                         0xc2a1 - 0xc2fe 0xc3a1 - 0xc3fe 0xc4a1 - 0xc4fe \
                         0xc5a1 - 0xc5fe 0xc6a1 - 0xc6fe 0xc7a1 - 0xc7fe \
                         0xcca1 - 0xccfe 0xcda1 - 0xcdfe 0xcea1 - 0xcefe \
                         0xcfa1 - 0xcffe 0xd0a1 - 0xd0fe 0xd1a1 - 0xd1fe \
                         0xd2a1 - 0xd2fe 0xd3a1 - 0xd3fe 0xd4a1 - 0xd4fe \
                         0xd5a1 - 0xd5fe 0xd6a1 - 0xd6fe 0xd7a1 - 0xd7fe \
                         0xd8a1 - 0xd8fe 0xd9a1 - 0xd9fe 0xdaa1 - 0xdafe \
                         0xdba1 - 0xdbfe 0xdca1 - 0xdcfe 0xdda1 - 0xddfe \
                         0xdea1 - 0xdefe 0xdfa1 - 0xdffe 0xe0a1 - 0xe0fe \
                         0xe1a1 - 0xe1fe 0xe2a1 - 0xe2fe 0xe3a1 - 0xe3fe \
                         0xe4a1 - 0xe4fe 0xe5a1 - 0xe5fe 0xe6a1 - 0xe6fe \
                         0xe7a1 - 0xe7fe 0xe8a1 - 0xe8fe 0xe9a1 - 0xe9fe \
                         0xeaa1 - 0xeafe 0xeba1 - 0xebfe 0xeca1 - 0xecfe \
                         0xeda1 - 0xedfe 0xeea1 - 0xeefe 0xefa1 - 0xeffe \
                         0xf0a1 - 0xf0fe 0xf1a1 - 0xf1fe 0xf2a1 - 0xf2fe \
                         0xf3a1 - 0xf3fe 0xf4a1 - 0xf4fe 0xf5a1 - 0xf5fe \
                         0xf6a1 - 0xf6fe 0xf7a1 - 0xf7fe 0xf8a1 - 0xf8fe \
                         0xf9a1 - 0xf9fe 0xfaa1 - 0xfafe 0xfba1 - 0xfbfe \
                         0xfca1 - 0xfcfe 0xfda1 - 0xfdfe 0xfea1 - 0xfefe
             isenglish   0xa3c1 - 0xa3da 0xa3e1 - 0xa3fa
             isnumber    0xa3b0 - 0xa3b9
             isspecial   0xa1a2 - 0xa1fe 0xa2a1 - 0xa2ae 0xa2ba - 0xa2c1 \
                         0xa2ca - 0xa2d0 0xa2dc - 0xa2ea 0xa2f2 - 0xa2f9 \
                         0xa2fe
             iswchar6    0xa6a1 - 0xa6b8 0xa6c1 - 0xa6d8 0xa7a1 - 0xa7c1 \
                         0xa7d1 - 0xa7f1
             #
             #           JIS X0208 whole code set
             #
             iswchar9    0xa1a1 - 0xa1fe 0xa2a1 - 0xa2fe 0xa3a1 - 0xa3fe \
                         0xa4a1 - 0xa4fe 0xa5a1 - 0xa5fe 0xa6a1 - 0xa6fe \
                         0xa7a1 - 0xa7fe 0xa8a1 - 0xa8fe 0xa9a1 - 0xa9fe \
                         0xaaa1 - 0xaafe 0xaba1 - 0xabfe 0xaca1 - 0xacfe \
                         0xada1 - 0xadfe 0xaea1 - 0xaefe 0xafa1 - 0xaffe \
                         0xb0a1 - 0xb0fe 0xb1a1 - 0xb1fe 0xb2a1 - 0xb2fe \
                         0xb3a1 - 0xb3fe 0xb4a1 - 0xb4fe 0xb5a1 - 0xb5fe \
                         0xb6a1 - 0xb6fe 0xb7a1 - 0xb7fe 0xb8a1 - 0xb8fe \
                         0xb9a1 - 0xb9fe 0xbaa1 - 0xbafe 0xbba1 - 0xbbfe \
                         0xbca1 - 0xbcfe 0xbda1 - 0xbdfe 0xbea1 - 0xbefe \
                         0xbfa1 - 0xbffe 0xc0a1 - 0xc0fe 0xc1a1 - 0xc1fe \
                         0xc2a1 - 0xc2fe 0xc3a1 - 0xc3fe 0xc4a1 - 0xc4fe \
                         0xc5a1 - 0xc5fe 0xc6a1 - 0xc6fe 0xc7a1 - 0xc7fe \
                        0xc8a1 - 0xc8fe 0xc9a1 - 0xc9fe 0xcaa1 - 0xcafe \
                        0xcba1 - 0xcbfe 0xcca1 - 0xccfe 0xcda1 - 0xcdfe \
                        0xcea1 - 0xcefe 0xcfa1 - 0xcffe 0xd0a1 - 0xd0fe \
                        0xd1a1 - 0xd1fe 0xd2a1 - 0xd2fe 0xd3a1 - 0xd3fe \
                        0xd4a1 - 0xd4fe 0xd5a1 - 0xd5fe 0xd6a1 - 0xd6fe \
                        0xd7a1 - 0xd7fe 0xd8a1 - 0xd8fe 0xd9a1 - 0xd9fe \
                        0xdaa1 - 0xdafe 0xdba1 - 0xdbfe 0xdca1 - 0xdcfe \
                        0xdda1 - 0xddfe 0xdea1 - 0xdefe 0xdfa1 - 0xdffe \
                        0xe0a1 - 0xe0fe 0xe1a1 - 0xe1fe 0xe2a1 - 0xe2fe \
                        0xe3a1 - 0xe3fe 0xe4a1 - 0xe4fe 0xe5a1 - 0xe5fe \
                        0xe6a1 - 0xe6fe 0xe7a1 - 0xe7fe 0xe8a1 - 0xe8fe \
                        0xe9a1 - 0xe9fe 0xeaa1 - 0xeafe 0xeba1 - 0xebfe \
                        0xeca1 - 0xecfe 0xeda1 - 0xedfe 0xeea1 - 0xeefe \
                        0xefa1 - 0xeffe 0xf0a1 - 0xf0fe 0xf1a1 - 0xf1fe \
                        0xf2a1 - 0xf2fe 0xf3a1 - 0xf3fe 0xf4a1 - 0xf4fe \
                        0xf5a1 - 0xf5fe 0xf6a1 - 0xf6fe 0xf7a1 - 0xf7fe \
                        0xf8a1 - 0xf8fe 0xf9a1 - 0xf9fe 0xfaa1 - 0xfafe \
                        0xfba1 - 0xfbfe 0xfca1 - 0xfcfe 0xfda1 - 0xfdfe \
                        0xfea1 - 0xfefe
            #
            #           JIS X0208 parentheses
            #
            iswchar10   0xa1c6 - 0xa1db
            #
            #           JIS X0208 hiragana
            #
            iswchar11   0xa4a1 - 0xa4f3
            #
            #           JIS X0208 katakana
            #
            iswchar12   0xa5a1 - 0xa5f6
            #
            #           JIS X0208 other characters
            #
            iswchar13   0xa6a1 - 0xa6b8 0xa6c1 - 0xa6d8 0xa7a1 - 0xa7c1 \
                        0xa7d1 - 0xa7f1 0xa8a1 - 0xa8bf
            #
            #           English letter translation table
            #
            ul          <0xa3c1 0xa3e1> <0xa3c2 0xa3e2> <0xa3c3 0xa3e3> \
                        <0xa3c4 0xa3e4> <0xa3c5 0xa3e5> <0xa3c6 0xa3e6> \
                        <0xa3c7 0xa3e7> <0xa3c8 0xa3e8> <0xa3c9 0xa3e9> \
                        <0xa3ca 0xa3ea> <0xa3cb 0xa3eb> <0xa3cc 0xa3ec> \
                        <0xa3cd 0xa3ed> <0xa3ce 0xa3ee> <0xa3cf 0xa3ef> \
                        <0xa3d0 0xa3f0> <0xa3d1 0xa3f1> <0xa3d2 0xa3f2> \
                        <0xa3d3 0xa3f3> <0xa3d4 0xa3f4> <0xa3d5 0xa3f5> \
                         <0xa3d6 0xa3f6> <0xa3d7 0xa3f7> <0xa3d8 0xa3f8> \
                         <0xa3d9 0xa3f9> <0xa3da 0xa3fa> \
             #
             #           kana translation table
             #
                         <0xa4a1 0xa5a1> <0xa4a2 0xa5a2> <0xa4a3 0xa5a3> \
                         <0xa4a4 0xa5a4> <0xa4a5 0xa5a5> <0xa4a6 0xa5a6> \
                         <0xa4a7 0xa5a7> <0xa4a8 0xa5a8> <0xa4a9 0xa5a9> \
                         <0xa4aa 0xa5aa> <0xa4ab 0xa5ab> <0xa4ac 0xa5ac> \
                         <0xa4ad 0xa5ad> <0xa4ae 0xa5ae> <0xa4af 0xa5af> \
                         <0xa4b0 0xa5b0> <0xa4b1 0xa5b1> <0xa4b2 0xa5b2> \
                         <0xa4b3 0xa5b3> <0xa4b4 0xa5b4> <0xa4b5 0xa5b5> \
                         <0xa4b6 0xa5b6> <0xa4b7 0xa5b7> <0xa4b8 0xa5b8> \
                         <0xa4b9 0xa5b9> <0xa4ba 0xa5ba> <0xa4bb 0xa5bb> \
                         <0xa4bc 0xa5bc> <0xa4bd 0xa5bd> <0xa4be 0xa5be> \
                         <0xa4bf 0xa5bf> <0xa4c0 0xa5c0> <0xa4c1 0xa5c1> \
                         <0xa4c2 0xa5c2> <0xa4c3 0xa5c3> <0xa4c4 0xa5c4> \
                         <0xa4c5 0xa5c5> <0xa4c6 0xa5c6> <0xa4c7 0xa5c7> \
                         <0xa4c8 0xa5c8> <0xa4c9 0xa5c9> <0xa4ca 0xa5ca> \
                         <0xa4cb 0xa5cb> <0xa4cc 0xa5cc> <0xa4cd 0xa5cd> \
                         <0xa4ce 0xa5ce> <0xa4cf 0xa5cf> <0xa4d0 0xa5d0> \
                         <0xa4d1 0xa5d1> <0xa4d2 0xa5d2> <0xa4d3 0xa5d3> \
                         <0xa4d4 0xa5d4> <0xa4d5 0xa5d5> <0xa4d6 0xa5d6> \
                         <0xa4d7 0xa5d7> <0xa4d8 0xa5d8> <0xa4d9 0xa5d9> \
                         <0xa4da 0xa5da> <0xa4db 0xa5db> <0xa4dc 0xa5dc> \
                         <0xa4dd 0xa5dd> <0xa4de 0xa5de> <0xa4df 0xa5df> \
                         <0xa4e0 0xa5e0> <0xa4e1 0xa5e1> <0xa4e2 0xa5e2> \
                         <0xa4e3 0xa5e3> <0xa4e4 0xa5e4> <0xa4e5 0xa5e5> \
                         <0xa4e6 0xa5e6> <0xa4e7 0xa5e7> <0xa4e8 0xa5e8> \
                         <0xa4e9 0xa5e9> <0xa4ea 0xa5ea> <0xa4eb 0xa5eb> \
                         <0xa4ec 0xa5ec> <0xa4ed 0xa5ed> <0xa4ee 0xa5ee> \
                         <0xa4ef 0xa5ef> <0xa4f0 0xa5f0> <0xa4f1 0xa5f1> \
                         <0xa4f2 0xa5f2> <0xa4f3 0xa5f3>
             #
             # specification for supplementary code set 2
             #
             LC_CTYPE2
             iswchar6    0xa1 - 0xdf
             iswchar14   0xa1 - 0xdf

       FILES
             /usr/lib/locale/locale/LC_CTYPE
                             data files containing character classification
                             and conversion tables and character set width
                             information created by chrtbl or wchrtbl
            /usr/lib/locale/locale/LC_NUMERIC
                            data files containing numeric editing
                            information
            /usr/include/ctype.h
                            header file containing information used by
                            character classification and conversion
                            routines for single byte characters.
            /usr/include/wctype.h
                            header file containing information used by
                            international character classification and
                            conversion routines for supplementary code
                            sets.
            /usr/include/xctype.h
                            header file containing information used by
                            language-dependent character classification
                            and conversion routines for supplementary code
                            sets.

      REFERENCES
            ctype(3C), environ(5), setlocale(3C), wctype(3C)

      Copyright 1994 Novell, Inc.