CTAB(1,C) AIX Commands Reference CTAB(1,C)
-------------------------------------------------------------------------------
ctab
PURPOSE
Produces a collating table.
SYNTAX
        +- -i ctab.in -+   +- -o ctab.out -+   +---------------+
ctab ---|              |---|               |---|               |--|
        +- -i infile --+   +- -o outfile --+   +- -c cputype --+
DESCRIPTION
The ctab command takes an input file (by default, named ctab.in and found in
the current directory) and produces a binary file (by default, named ctab.out)
containing a collating table. These output files are stored in a conventional
directory. Programs that need the current collating and case information use
the NLCTAB environment variable to access that information.
The following conventions are used to make it easier to set up a table file:
o One line of information is present for each character explicitly named.
o A line beginning with the word option serves to change one or more of the
default conditions or metacharacters built into the ctab command. An
option line contains a set of name/value pairs, with each half of each pair
delimited by tab or space characters. The following is a list of
recognized names:
eclass Turns the use of equivalence classes on or off globally. The
assigned value must be on (the default) or off.
sep Uses the assigned value as the field separator character. The
default value is : (colon).
trans Uses the assigned value as the "translate" indicator in subject
character fields. The default character is | (vertical bar).
repeat Uses the assigned value as the "same as last line" indicator in
subject character fields. The default value is ^ (circumflex).
comment Uses the assigned value as the comment character. The default
value is the # character.
o The order of the per-character input lines specifies the collating
sequence.
Processed November 8, 1990 CTAB(1,C) 1
o By default, fields on a line are separated by colons. Tabs or spaces may
surround fields or separators. You can change the separator character with
an option line.
o Use an octal or hex escape sequence to name a non-printable character. An
octal escape sequence is preceded by a "\" (backslash) or "\0", and a hex
escape sequence is preceded by a "\0x" or "\0X". A backslash character
that does not form part of a valid escape sequence serves to strip the
following character, including a second backslash, of any special meaning
it otherwise would have. For example, to include the colon character in
the collating sequence, use the following line:
\::
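The escape rules above can be sketched in Python. This is an illustration only, not the ctab implementation. The function name decode_escapes is hypothetical, and the digit limits (at most two hex digits, at most four octal digits with a value no greater than 0377) are assumptions, because this page does not state them.

```python
HEX = set("0123456789abcdefABCDEF")
OCT = set("01234567")

def decode_escapes(text):
    """Return (character, was_escaped) pairs for one field of a table line."""
    out = []
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        # Ordinary character, or a lone trailing backslash (kept literally).
        if ch != "\\" or i + 1 == n:
            out.append((ch, False))
            i += 1
            continue
        j = i + 1
        if text[j:j + 2] in ("0x", "0X") and text[j + 2:j + 3] in HEX:
            # Hex escape: \0x or \0X, then hex digits (assumed: at most two).
            k = j + 2
            while k < n and k < j + 4 and text[k] in HEX:
                k += 1
            out.append((chr(int(text[j + 2:k], 16)), True))
            i = k
        elif text[j] in OCT:
            # Octal escape: \ or \0, then octal digits (assumed: the value
            # must fit in one byte, and at most four digits are read).
            value, k = 0, j
            while (k < n and k < j + 4 and text[k] in OCT
                   and value * 8 + int(text[k]) <= 0xFF):
                value = value * 8 + int(text[k])
                k += 1
            out.append((chr(value), True))
            i = k
        else:
            # Any other backslash strips the next character of special meaning.
            out.append((text[j], True))
            i = j + 1
    return out
```

With this sketch, the line \:: decodes to an escaped colon (the subject character) followed by an ordinary colon (a field separator).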
The input file format includes a comment convention, namely that the
remainder of the line following a # character is ignored. The comment
character can be changed with an option line.
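The comment and option conventions can be sketched in Python as follows. The names DEFAULTS, strip_comment, and apply_option_line are hypothetical. The sketch assumes that a backslash protects a comment character, and that an option line is the word option followed by whitespace-delimited name/value pairs; it is not the parser used by ctab.

```python
# Default metacharacters and conditions, as listed in this description.
DEFAULTS = {"eclass": "on", "sep": ":", "trans": "|",
            "repeat": "^", "comment": "#"}

def strip_comment(line, comment="#"):
    """Drop everything from the first unescaped comment character onward."""
    i = 0
    while i < len(line):
        if line[i] == "\\":
            i += 2          # assumption: a backslash protects the next character
            continue
        if line[i] == comment:
            return line[:i]
        i += 1
    return line

def apply_option_line(line, settings):
    """Update settings from an option line; return False for any other line."""
    words = line.split()    # halves of each pair are delimited by tabs or spaces
    if not words or words[0] != "option":
        return False
    pairs = words[1:]
    if len(pairs) % 2 != 0:
        raise ValueError("option name without a value")
    for name, value in zip(pairs[0::2], pairs[1::2]):
        if name not in settings:
            raise ValueError("unrecognized option name: " + name)
        settings[name] = value
    return True
```

For example, after apply_option_line("option sep ; comment %", settings), later lines would be split on semicolons and have everything after a % character ignored.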
Note: The input file must be pre-processed by the dd command prior to its use
in the ctab command.
INPUT FILE SPECIFICATION
Use the following rules to build infile, entering field information for each
line:
1. The first field on a line contains the subject character, a character to
be inserted into the collating sequence at that point.
o This subject character definition can include a translation mechanism:
- Instead of a single character, this field may contain two or more
characters that are to be collated as a single unit, or
- The single subject character may be followed by a vertical bar (|)
and a single- or multiple-character string. The vertical bar
indicates that the subject character is translated to that
string before being collated.
For example, to treat "é" (e acute) as equivalent to the
character "e", use the following line:
é|e
- One restriction is placed on the translation mechanism: the
subject character cannot be contained in the translated string of
characters. For example, the following line is illegal:
o|oe
o Any form of the first field may contain a trailing circumflex (^) to
indicate that the current character is to collate to the same value as
the preceding one. However, a circumflex following a translation
string is illegal because the subject character to be translated has no
inherent collating value.
o If the subject field contains a string of multiple characters (to
collate as a unit), its first character must be declared elsewhere to
establish the default collating sequence of that character.
o The translate and collating no-change characters can be changed with
option lines.
2. The second and third fields specify whether a character is alphabetic and
what its lowercase and uppercase equivalents are:
o If a subject character is to be treated as a lowercase alphabetic, the
second field on its line is its uppercase equivalent, and the third
field must be l or L.
o If a subject character is to be treated as an uppercase alphabetic, the
second field on its line is its lowercase equivalent, and the third
field must be u or U.
o If a subject character is to be treated as a control character or a
space character, the third field must be c, C, s, or S.
o Each explicitly named character whose line contains a non-null second
field is considered alphabetic (that is, matched by isalpha).
Characters that do not have an uppercase or lowercase equivalent (that
is, that have a null second field) but that you wish to be considered
alphabetic should contain a third field that is l, L, u, or U.
3. The fourth field on a line is used explicitly to specify the first
character in the equivalence class of the subject character. The members
of one equivalence class must be consecutively listed in the input file.
o There cannot be any gaps within a particular equivalence class. For
example, the following lines put the characters a, b, and c in the same
equivalence class:
a:A:l:a
b:B:l:a
c:C:l:a
o As a convenience, if the fourth field is not specified, the group of
consecutive characters with blank fourth fields, provided that they are
all based on the same Roman alphabetic character, are placed in the
same equivalence class. To reiterate, only characters with the same
base are placed into the same equivalence class by default. If you
wish to have many characters from different bases belong to one
equivalence class, as in the example above, the first character of the
equivalence class has to be specified in the fourth field for every
character specified.
o It is illegal to specify an equivalence character that comes later in
the collating sequence. The fourth field can refer only to characters
that have already been mentioned.
o By default, each international character support character that is not
based on a Roman alphabetic character is the sole member of its own
equivalence class.
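The field rules above can be sketched in Python. The names parse_entry and equivalence_classes are hypothetical. For brevity the sketch ignores backslash escapes and comments, and it treats every blank fourth field as the start of a new equivalence class instead of grouping characters by their Roman base, so it is a simplification of the rules and not the ctab implementation.

```python
def parse_entry(line, sep=":", trans="|", repeat="^"):
    """Split one per-character line into its four fields and check rule 1."""
    fields = [f.strip() for f in line.split(sep)]   # spaces may surround fields
    fields += [""] * (4 - len(fields))              # missing fields are null
    subject, case_equiv, kind, eq_first = fields[:4]

    # A trailing repeat indicator: collate to the same value as the last line.
    same_as_previous = len(subject) > 1 and subject.endswith(repeat)
    if same_as_previous:
        subject = subject[:-1]

    translation = None
    if trans in subject[1:]:
        if same_as_previous:
            raise ValueError("repeat indicator after a translation string")
        subject, translation = subject.split(trans, 1)
        if len(subject) != 1:
            raise ValueError("only a single character can be translated")
        if subject in translation:
            raise ValueError("subject character appears in its translation")

    return {
        "subject": subject,
        "translation": translation,
        "same_as_previous": same_as_previous,
        "case_equiv": case_equiv,
        "kind": kind,
        "eq_first": eq_first,
        # Rule 2: a non-null second field, or l/L/u/U, marks an alphabetic.
        "is_alpha": bool(case_equiv) or kind.lower() in ("l", "u"),
    }

def equivalence_classes(entries):
    """Map each subject to its class leader from (subject, eq_first) pairs."""
    leader_of = {}
    previous_leader = None
    for subject, eq_first in entries:
        if not eq_first or eq_first == subject:
            leader = subject            # simplification: blank starts a class
        elif eq_first not in leader_of:
            raise ValueError(repr(eq_first) + " has not been mentioned yet")
        elif leader_of[eq_first] != previous_leader:
            raise ValueError("gap in the equivalence class of " + repr(eq_first))
        else:
            leader = previous_leader
        leader_of[subject] = leader
        previous_leader = leader
    return leader_of
```

Applied to the three lines a:A:l:a, b:B:l:a, and c:C:l:a, equivalence_classes places a, b, and c in one class led by a, while parse_entry rejects the line o|oe.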
Characters not named in the table file that have an ordinal value (that is, a
value as a char) below the ordinal value of the lowest-valued character named
are put into the collating sequence below the first character in the table
file. All other characters not named in the table file are put into the
collating sequence above the last character in the table file.
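The placement of unnamed characters can be sketched in Python. The name full_collating_order is hypothetical. The sketch assumes a 256-value character set and assumes that unnamed characters keep ascending ordinal order among themselves; this page states neither detail. It also ignores multiple-character collating units and the "same as last line" indicator.

```python
def full_collating_order(named, charset_size=256):
    """Return every ordinal value in collating order.

    named is the list of ordinal values in the order their lines appear
    in the table file.
    """
    named_set = set(named)
    lowest = min(named)
    # Values below the lowest named value are unnamed by definition and
    # collate before the first character in the table file.
    below = list(range(lowest))
    # All other unnamed values collate after the last character in the file.
    above = [c for c in range(lowest, charset_size) if c not in named_set]
    return below + list(named) + above
```

For a table file that names b, a, and c in that order, the values 0 through 96 come first, then b, a, and c, then every remaining value from 100 upward.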
The standard characters for decimal and hexadecimal digits are always marked as
digits (to be matched by isdigit and isxdigit). All other printable characters
not marked as alphabetic are marked as punctuation.
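The marking rules can be sketched in Python. The name char_classes and the class labels are hypothetical. The sketch approximates "printable" with the Python str.isprintable method and excludes the space character from punctuation, which are assumptions and not behavior stated on this page.

```python
DECIMAL_DIGITS = set("0123456789")
HEX_LETTERS = set("abcdefABCDEF")

def char_classes(ch, case_equiv="", kind=""):
    """Return the set of class labels for one character.

    case_equiv and kind are the second and third fields of the character's
    line, or empty strings if the character is not named in the table file.
    """
    classes = set()
    # Standard digits are always marked, whatever the table file says.
    if ch in DECIMAL_DIGITS:
        classes |= {"digit", "xdigit"}
    elif ch in HEX_LETTERS:
        classes.add("xdigit")

    k = kind.lower()
    if case_equiv or k in ("l", "u"):
        classes.add("alpha")
    if k == "l":
        classes.add("lower")
    elif k == "u":
        classes.add("upper")
    elif k == "c":
        classes.add("control")
    elif k == "s":
        classes.add("space")

    # Every other printable character is punctuation (assumption: the space
    # character itself is not counted as punctuation).
    if not classes and ch.isprintable() and ch != " ":
        classes.add("punct")
    return classes
```

Under these assumptions, 7 is marked as a digit and a hexadecimal digit, a named with the line a:A:l is alphabetic, lowercase, and a hexadecimal digit, and an unnamed ! is punctuation.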
FLAGS
-c cputype Specifies the CPU type of the output file. In clusters with both
i386 and i370 machines, the output file is created as a hidden
directory. In clusters with one CPU type, the output is a file.
-i infile Specifies the name of the input file (ctab.in, by default).
-o outfile Specifies the name of the output file (ctab.out, by default).
FILES
/usr/lib/mbcs/*.ctab Contains the various default and locale-dependent
collation table sources and binaries. Collation
source files have the ctab extension.
RELATED INFORMATION
See the isalpha, isdigit, isxdigit, nls, and getenv subroutines in AIX
Operating System Technical Reference.
See "Introduction to International Character Support" in Managing the AIX
Operating System.