ctab(1) — AIX PS/2 1.2.1



CTAB(1,C)                   AIX Commands Reference                    CTAB(1,C)



-------------------------------------------------------------------------------
ctab



PURPOSE

Produces a collating table.

SYNTAX


        +- -i ctab.in -+   +- -o ctab.out -+   +---------------+
ctab ---|              |---|               |---|               |--|
        +- -i infile --+   +- -o outfile --+   +- -c cputype --+


DESCRIPTION

The ctab command takes an input file (by default, named ctab.in and found in
the current directory) and produces a binary file (by default, named ctab.out)
containing a collating table.  By convention, these output files are stored in
a standard directory (see FILES).  Programs that need the current collating and
case information use the NLCTAB environment variable to access that
information.

The following conventions are used to make it easier to set up a table file:

  o One line of information is present for each character explicitly named.

  o A line beginning with the word option serves to change one or more of the
    default conditions or metacharacters built into the ctab command.  An
    option line contains a set of name/value pairs, with each half of each pair
    delimited by tab or space characters.  The following is a list of
    recognized names:

    eclass     Turns the use of equivalence classes on or off globally.  The
               assigned value must be on (the default) or off.

    sep        Uses the assigned value as the field separator character.  The
               default value is : (colon).

    trans      Uses the assigned value as the "translate" indicator in subject
               character fields.  The default value is | (vertical bar).

    repeat     Uses the assigned value as the "same as last line" indicator in
               subject character fields.  The default value is ^ (circumflex).

    comment    Uses the assigned value as the comment character.  The default
               value is the # character.

  o The order of the per-character input lines specifies the collating
    sequence.



Processed November 8, 1990         CTAB(1,C)                                  1

  o By default, fields on a line are separated by colons.  Tabs or spaces may
    surround fields or separators.  You can change the separator character with
    an option line.

  o Use an octal or hex escape sequence to name a non-printable character.  An
    octal escape sequence begins with "\" (backslash) or "\0", and a hex
    escape sequence begins with "\0x" or "\0X".  A backslash character
    that does not form part of a valid escape sequence serves to strip the
    following character, including a second backslash, of any special meaning
    it otherwise would have.  For example, to include the colon character in
    the collating sequence, use the following line:

      \::

  o The input file format includes a comment convention:  the remainder of a
    line following a # character is ignored.  The comment character can be
    changed with an option line.
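
The conventions above can be sketched in code.  The Python fragment below is
an illustration only, not the ctab implementation.  The function names
(decode_escape, split_fields, parse_option_line) are hypothetical, and two
behaviors are assumptions because the text does not specify them:  escape
digits are read greedily, and an escaped character is simply passed through as
a literal.

```python
HEX_DIGITS = "0123456789abcdefABCDEF"
OCT_DIGITS = "01234567"


def _run(text, start, allowed):
    """Return the index just past the run of allowed characters at start."""
    end = start
    while end < len(text) and text[end] in allowed:
        end += 1
    return end


def decode_escape(line, i):
    """Decode the backslash sequence at line[i]; return (char, next_index)."""
    start = i + 1
    # Hex form: \0x or \0X followed by hex digits (digit count is assumed).
    if line[start:start + 2] in ("0x", "0X"):
        end = _run(line, start + 2, HEX_DIGITS)
        if end > start + 2:
            return chr(int(line[start + 2:end], 16)), end
    # Octal form: \ or \0 followed by octal digits (digit count is assumed).
    end = _run(line, start, OCT_DIGITS)
    if end > start:
        return chr(int(line[start:end], 8)), end
    # Not a valid escape: the following character loses any special meaning.
    if start < len(line):
        return line[start], start + 1
    return "\\", start


def split_fields(line, sep=":", comment="#"):
    """Split one input line into fields, honoring escapes and comments."""
    fields, current, i = [], [], 0
    while i < len(line):
        ch = line[i]
        if ch == "\\":
            decoded, i = decode_escape(line, i)
            current.append(decoded)
            continue
        if ch == comment:
            break  # the remainder of the line is a comment
        if ch == sep:
            # Tabs or spaces may surround fields or separators.
            fields.append("".join(current).strip(" \t"))
            current = []
        else:
            current.append(ch)
        i += 1
    fields.append("".join(current).strip(" \t"))
    return fields


def parse_option_line(line):
    """Return the name/value pairs of an option line as a dict."""
    words = line.split()
    if not words or words[0] != "option":
        raise ValueError("not an option line")
    return dict(zip(words[1::2], words[2::2]))
```

Applied to the line \:: from the example, split_fields returns a first field
that holds a literal colon.  This sketch forgets which characters were escaped
once they are decoded, so a fuller parser would need to carry that information
into later stages (for instance, to tell an escaped vertical bar from the
translate indicator).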

Note:  The input file must be pre-processed by the dd command prior to its use
       in the ctab command.

INPUT FILE SPECIFICATION

Use the following rules to build infile, entering field information for each
line:

  1. The first field on a line contains the subject character, a character to
    be inserted into the collating sequence at that point.

      o This subject character definition can include a translation mechanism:

          - Instead of a single character, this field may contain two or more
            characters that are to be collated as a single unit, or

          - The single subject character may be followed by a vertical bar (|)
            and a single- or multiple-character string.  The vertical bar
            indicates that the subject character will be translated to that
            string before being collated.

            For example, to treat "é" (e acute) as equivalent to the
            character "e", use the following line:

              é|e

          - One restriction is placed on the translation mechanism:  the
            subject character cannot be contained in the translated string of
            characters.  For example, the following line is illegal:

              o|oe

      o Any form of the first field may contain a trailing circumflex (^) to
        indicate that the current character is to collate to the same value as
        the preceding one.  However, a circumflex following a translation
        string is illegal because the subject character to be translated has no
        inherent collating value.

      o If the subject field contains a string of multiple characters (to
        collate as a unit), its first character must be declared elsewhere to
        establish the default collating sequence of that character.

      o The translate and repeat ("same as last line") indicator characters
        can be changed with option lines.

  2. The second and third fields specify whether a character is alphabetic and
    what its lowercase and uppercase equivalents are:

      o If a subject character is to be treated as a lowercase alphabetic, the
        second field on its line is its uppercase equivalent, and the third
        field must be l or L.

      o If a subject character is to be treated as an uppercase alphabetic, the
        second field on its line is its lowercase equivalent, and the third
        field must be u or U.

      o If a subject character is to be treated as a control character or a
        space character, the third field must be c, C, s, or S.

      o Each explicitly named character whose line contains a non-null second
        field is considered alphabetic (that is, matched by isalpha).
        Characters that do not have an uppercase or lowercase equivalent (that
        is, that have a null second field) but that you wish to be considered
        alphabetic should contain a third field that is l, L, u, or U.

  3. The fourth field on a line explicitly specifies the first character in
    the equivalence class of the subject character.  The members of one
    equivalence class must be listed consecutively in the input file.

      o There cannot be any gaps within a particular equivalence class.  For
        example, the following lines put the characters a, b, and c in the same
        equivalence class:

          a:A:l:a
          b:B:l:a
          c:C:l:a

      o As a convenience, if the fourth field is not specified, consecutive
        characters with blank fourth fields are placed in the same
        equivalence class, provided that they are all based on the same Roman
        alphabetic character.  To reiterate, only characters with the same
        base are placed into the same equivalence class by default.  If you
        wish to have many characters from different bases belong to one
        equivalence class, as in the example above, the first character of the
        equivalence class has to be specified in the fourth field for every
        character specified.

      o It is illegal to specify an equivalence character that comes later in
        the collating sequence.  The fourth field can refer only to characters
        that have already been mentioned.

      o By default, each international character support character that is
        not based on a Roman alphabetic character is the sole member of its
        equivalence class.
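
Several of the rules above are restrictions that can be checked mechanically.
The Python sketch below shows one way to do so.  It is illustrative only:  the
names parse_subject and assign_equivalence are hypothetical, fields are assumed
to be already split and unescaped, and a blank fourth field is treated as
starting a new class (the same-base default grouping is not modeled, because it
depends on character data not given here).

```python
def parse_subject(field, trans="|", repeat="^"):
    """Split a subject field into (subject, translation, same_as_previous)."""
    if trans in field:
        subject, translation = field.split(trans, 1)
        # A circumflex following a translation string is illegal.
        if translation.endswith(repeat):
            raise ValueError("repeat indicator may not follow a translation")
        # The subject character cannot be contained in its translation.
        if subject in translation:
            raise ValueError("subject may not appear in its own translation")
        return subject, translation, False
    # A trailing circumflex means "collate to the same value as the
    # preceding character".
    if len(field) > 1 and field.endswith(repeat):
        return field[:-1], None, True
    return field, None, False


def assign_equivalence(records):
    """Map each subject to the first character of its equivalence class.

    records is a list of (subject, fourth_field) pairs in collating order.
    """
    leaders = {}      # subject -> first character of its class
    closed = set()    # classes whose run of consecutive members has ended
    previous = None
    for subject, fourth in records:
        if not fourth or fourth == subject:
            leader = subject          # assumption: blank starts a new class
        elif fourth in leaders:
            leader = leaders[fourth]
        else:
            # The fourth field can refer only to characters already mentioned.
            raise ValueError(f"{fourth!r} is not defined before {subject!r}")
        if leader != previous:
            # Members of one class must be listed consecutively.
            if leader in closed:
                raise ValueError(f"class of {leader!r} is not consecutive")
            if previous is not None:
                closed.add(previous)
            previous = leader
        leaders[subject] = leader
    return leaders
```

With the three-line example above, assign_equivalence maps a, b, and c to the
class that begins with a.  The line o|oe is rejected by parse_subject because
the subject character appears in its own translation.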

Characters not named in the table file that have an ordinal value (that is, a
value as a char) below the ordinal value of the lowest-valued character named
are put into the collating sequence below the first character in the table
file.  All other characters not named in the table file are put into the
collating sequence above the last character in the table file.
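
As a rough illustration of this placement rule, the following Python sketch
builds a complete ordering from the named characters.  The function name
full_collating_order is hypothetical, and two points are assumptions:  the
character set is taken to have 256 values, and unnamed characters are assumed
to keep their ordinal order among themselves, which the text does not state.

```python
def full_collating_order(named, charset_size=256):
    """Return every character in collating order.

    named is the list of single characters in the order they appear in the
    table file.
    """
    named_set = set(named)
    lowest = min(ord(c) for c in named)
    # Unnamed characters below the lowest-valued named character go first.
    below = [chr(v) for v in range(lowest)]
    # All other unnamed characters go after the last named character.
    above = [chr(v) for v in range(lowest, charset_size)
             if chr(v) not in named_set]
    return below + list(named) + above
```

For a table that names b, a, and d in that order, the result starts with the
97 characters whose values are below that of a, continues with b, a, d, and
then lists c, e, and the remaining unnamed characters.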

The standard characters for decimal and hexadecimal digits are always marked as
digits (to be matched by isdigit and isxdigit).  All other printable characters
not marked as alphabetic are marked as punctuation.
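
The classification rules can be summarized in a short Python sketch.  It is a
reading of the text above, not the behavior of the ctab command itself.  The
function name classify and the flag names are hypothetical, and mapping c to
"control" and s to "space" is an assumption, since the text lists the four
letters together.

```python
import string


def classify(char, second="", third=""):
    """Return the set of class flags for one single character.

    second and third are the second and third fields of the character's
    line, or empty strings if the character is not named in the table.
    """
    flags = set()
    kind = third.lower()
    if kind == "l":
        flags.update({"alpha", "lower"})
    elif kind == "u":
        flags.update({"alpha", "upper"})
    elif kind == "c":
        flags.add("control")      # assumed meaning of c or C
    elif kind == "s":
        flags.add("space")        # assumed meaning of s or S
    # A non-null second field marks the character as alphabetic.
    if second:
        flags.add("alpha")
    # Standard digit characters are always marked, named or not.
    if char in string.digits:
        flags.add("digit")
    if char in string.hexdigits:
        flags.add("xdigit")
    # Any other printable character with no class is punctuation.
    if not flags and char.isprintable():
        flags.add("punct")
    return flags
```

For example, classify("a", "A", "l") marks the character as a lowercase
alphabetic that is also a hexadecimal digit, while an unnamed "!" is marked
only as punctuation.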

FLAGS

-c cputype    Specifies the CPU type of the output file.  In clusters with both
              i386 and i370 machines, the output file is created as a hidden
              directory.  In clusters with one CPU type, the output is a file.

-i infile     Specifies the name of the input file (ctab.in, by default).

-o outfile    Specifies the name of the output file (ctab.out, by default).

FILES

/usr/lib/mbcs/*.ctab        Contains the various default and locale-dependent
                            collation table sources and binaries.  Collation
                            source files have the ctab extension.

RELATED INFORMATION

See the isalpha, isdigit, isxdigit, nls, and getenv subroutines in AIX
Operating System Technical Reference.

See "Introduction to International Character Support" in Managing the AIX
Operating System.