ctab
PURPOSE
Produces a collating table.
SYNOPSIS
ctab [ -i infile ] [ -o outfile ]
DESCRIPTION
The ctab command takes an input file (by default a file
named ctab.in in the current directory) and produces a
binary file (by default named ctab.out) containing a
collating table. These output files are stored in a
conventional directory. Programs that need the current
collating and case information locate it through the
NLCTAB environment variable.
The following conventions are used to make it easier to
set up a table file:
o One line of information is present for each character
explicitly named.
o A line beginning with the word option changes one or
more of the default conditions or metacharacters built
into ctab. An option line contains a set of name/value
pairs; names and values are separated by tab or space
characters. The following names are recognized:
eclass Turns the use of equivalence classes on or
off globally. The assigned value must be on
(the default) or off.
sep Uses the assigned value as the field separator
character. The default value is : (colon).
trans Uses the assigned value as the "translate"
indicator in subject character fields. The
default character is | (vertical bar).
repeat Uses the assigned value as the "same as last
line" indicator in subject character fields.
The default value is ^ (circumflex).
comment Uses the assigned value as the comment
character. The default value is the #
character.
o The order of the per-character input lines specifies
the collating sequence.
o By default, fields on a line are separated by colons.
Tabs or spaces may surround fields or separators.
You can change the separator character with an option
line.
o Use an octal escape sequence in the ASCII range to
name a nonprintable character (for example, \007 for
the bell character). A backslash that does not form
part of a valid escape sequence strips the following
character, including a second backslash, of any
special meaning it would otherwise have. For example,
to include the colon character in the collating
sequence, use the following line:
\::
The input file format includes a comment convention,
namely that the remainder of the line following a #
character is ignored. The comment character can be
changed with an option line.
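The line conventions above (option lines, field separators, surrounding whitespace, backslash and octal escapes, and comments) can be illustrated with a small tokenizer. This is a minimal sketch under stated assumptions, not ctab's actual implementation: the helper names (parse_option_line, tokenize, text) are hypothetical, octal escapes are assumed to be one to three octal digits, and option lines are assumed to carry no trailing comment. Each field is kept as (character, literal) pairs so that an escaped | or ^ is not later mistaken for the translate or repeat indicator.

```python
import re

# Defaults built into ctab, as listed for the option names above.
DEFAULTS = {"eclass": "on", "sep": ":", "trans": "|", "repeat": "^", "comment": "#"}


def parse_option_line(line, opts):
    """Apply an option line to opts; return False if line is not one.

    Hypothetical helper. Assumes option lines carry no trailing comment.
    """
    words = line.split()  # names and values are separated by tabs or spaces
    if not words or words[0] != "option":
        return False
    pairs = words[1:]
    if len(pairs) % 2:
        raise ValueError("option line needs name/value pairs")
    for name, value in zip(pairs[0::2], pairs[1::2]):
        if name not in DEFAULTS:
            raise ValueError(f"unknown option name {name!r}")
        if name == "eclass" and value not in ("on", "off"):
            raise ValueError("eclass must be on or off")
        if name != "eclass" and len(value) != 1:
            raise ValueError(f"{name} needs a single character")
        opts[name] = value
    return True


def _strip(field):
    # Drop unescaped tabs and spaces that surround a field.
    while field and not field[0][1] and field[0][0] in " \t":
        field = field[1:]
    while field and not field[-1][1] and field[-1][0] in " \t":
        field = field[:-1]
    return field


def tokenize(line, opts):
    """Split a per-character line into fields of (char, literal) pairs.

    literal is True for characters produced by an escape, so later stages
    know not to treat them as the translate or repeat indicator.
    """
    fields, cur, i = [], [], 0
    while i < len(line):
        ch = line[i]
        if ch == "\\" and i + 1 < len(line):
            m = re.match(r"[0-7]{1,3}", line[i + 1:])
            if m:  # octal escape such as \007 (assumed 1 to 3 digits)
                code = int(m.group(), 8)
                if code > 0o177:
                    raise ValueError("octal escape outside the ASCII range")
                cur.append((chr(code), True))
                i += 1 + len(m.group())
            else:  # backslash strips the next character of special meaning
                cur.append((line[i + 1], True))
                i += 2
            continue
        if ch == opts["comment"]:
            break  # the remainder of the line is a comment
        if ch == opts["sep"]:
            fields.append(_strip(cur))
            cur = []
        else:
            cur.append((ch, False))
        i += 1
    fields.append(_strip(cur))
    return fields


def text(field):
    # Hypothetical convenience: the plain string a field spells.
    return "".join(ch for ch, _ in field)
```

For example, with the defaults, `a : A : l # lowercase a` yields the fields a, A, and l, and the line `\::` yields a single literal colon followed by an empty field. After `option sep ; comment %`, the same entry would be written `a;A;l % note`.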
INPUT FILE SPECIFICATION
Use the following rules to build infile, entering field
information for each line:
1. The first field on a line contains the subject
character, a character to be inserted into the
collating sequence at that point.
o This subject character definition can include a
translation mechanism:
- Instead of a single character, this field may
contain two or more characters that are to be
collated as a single unit, or
- The single subject character may be followed
by a vertical bar (|) and a single- or
multiple-character string. The vertical bar
indicates that the first character will be
translated to the second string before being
collated.
For example, to treat é (e acute) as
equivalent to the character e, use the
following line:
é|e
- One restriction is placed on the translation
mechanism: the subject character cannot be
contained in its translated string of
characters. For example, the following line is
illegal:
o|oe
o Any form of the first field may contain a
trailing circumflex (^) to indicate that the
current character collates to the same value as
the preceding one. However, a circumflex
following a translation string is illegal,
because a subject character that is translated
has no inherent collating value of its own.
o If the subject field contains a string of
multiple characters (to collate as a unit), its
first character must also be named by itself
elsewhere in the file to establish that
character's default collating position.
o The translate (|) and repeat (^) characters can
be changed with option lines (trans and repeat).
2. The second and third fields specify whether or not a
character is alphabetic and what its lower- and
uppercase equivalents are:
o If a subject character is to be treated as a
lowercase alphabetic, the second field on its line
is its uppercase equivalent, and the third field
must be l or L.
o If a subject character is to be treated as an
uppercase alphabetic, the second field on its
line is its lowercase equivalent, and the third
field must be u or U.
o If a subject character is to be treated as a
control character or a space character, the third
field must be c, C, s, or S.
o Each explicitly named character whose line
contains a nonnull second field is considered
alphabetic (that is, it is matched by NCisalpha).
A character that has no uppercase or lowercase
equivalent (that is, whose second field is null)
but that should still be considered alphabetic
needs only a third field of l, L, u, or U.
3. The fourth field on a line explicitly specifies
the first character in the equivalence class of the
subject character. The members of one equivalence
class must be listed consecutively in the input
file.
o There cannot be any gaps within a particular
equivalence class. For example, the following
lines will put the characters a, b, and c in the
same equivalence class:
a:A:l:a
b:B:l:a
c:C:l:a
o As a convenience, if the fourth field is omitted,
a group of consecutive characters with blank
fourth fields is placed in one equivalence class,
provided that they are all based on the same
Roman alphabetic character. To reiterate, only
characters with the same base are placed in the
same equivalence class by default. To put
characters with different bases into one
equivalence class, as in the example above,
specify the first character of the class in the
fourth field of every member.
o It is illegal to specify an equivalence character
that comes later in the collating sequence. The
fourth field can refer only to characters that
have already been mentioned.
o By default, each International Character Support
character that is not based on a Roman alphabetic
character is the sole member of its own
equivalence class.
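The rules for the four fields can be summarized as a validation pass that assigns collating values and equivalence classes. The sketch below is hypothetical and assumes each row has already been split into plain strings with no escaped special characters. It approximates "based on a Roman alphabetic character" with the first code point of a Unicode canonical decomposition, which is an assumption: ctab itself works on NLchar values. The names build, Entry, and roman_base are illustrative only.

```python
import unicodedata
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Entry:
    subject: str                # character or multi-character collating unit
    weight: Optional[int]       # collating value; None for translated characters
    translation: Optional[str]  # string the subject is translated to, if any
    alpha: bool                 # would be matched by NCisalpha
    eclass: Optional[str]       # first member of its equivalence class


def roman_base(ch: str) -> Optional[str]:
    # Assumption: approximate the Roman base letter via Unicode NFD.
    base = unicodedata.normalize("NFD", ch)[0].lower()
    return base if "a" <= base <= "z" else None


def build(rows: List[List[str]], trans: str = "|", repeat: str = "^") -> List[Entry]:
    entries: List[Entry] = []
    cls_of: Dict[str, str] = {}   # single characters named so far -> their class
    closed = set()                # classes already ended (gaps are illegal)
    prev: Optional[Entry] = None  # previous entry that has a collating value
    prev_blank = False            # did prev have an empty fourth field?
    weight = 0
    for row in rows:
        subj, second, third, fourth = (list(row) + ["", "", "", ""])[:4]
        if not subj:
            raise ValueError("empty subject field")
        same = len(subj) > 1 and subj.endswith(repeat)
        if same:
            subj = subj[:-1]
        if trans in subj:
            src, dst = subj.split(trans, 1)
            if same:
                raise ValueError(f"{row[0]!r}: repeat after a translation")
            if len(src) != 1 or not dst or src in dst:
                raise ValueError(f"{row[0]!r}: illegal translation")
            entries.append(Entry(src, None, dst, False, None))
            continue
        if same and prev is None:
            raise ValueError(f"{row[0]!r}: no preceding character to repeat")
        if not same:
            weight += 1  # a repeated character keeps the previous value
        if third not in ("", "l", "L", "u", "U", "c", "C", "s", "S"):
            raise ValueError(f"{row[0]!r}: bad third field {third!r}")
        alpha = third in ("l", "L", "u", "U")
        if second and not alpha:
            raise ValueError(f"{row[0]!r}: a case equivalent needs l, L, u, or U")
        if fourth:  # explicit class: may refer only to characters already named
            if fourth == subj:
                eclass = subj
            elif fourth in cls_of:
                eclass = cls_of[fourth]
            else:
                raise ValueError(f"{row[0]!r}: {fourth!r} not named yet")
        elif (prev is not None and prev_blank and roman_base(subj[0]) is not None
              and roman_base(subj[0]) == roman_base(prev.subject[0])):
            eclass = prev.eclass  # default grouping by common Roman base
        else:
            eclass = subj
        if prev is not None and eclass != prev.eclass:
            closed.add(prev.eclass)
        if eclass in closed:
            raise ValueError(f"{row[0]!r}: gap in equivalence class {eclass!r}")
        entry = Entry(subj, weight, None, alpha, eclass)
        entries.append(entry)
        if len(subj) == 1:
            cls_of[subj] = eclass
        prev, prev_blank = entry, not fourth
    valued = {e.subject for e in entries if e.weight is not None and len(e.subject) == 1}
    for e in entries:
        if len(e.subject) > 1 and e.subject[0] not in valued:
            raise ValueError(f"{e.subject!r}: first character never declared")
    return entries
```

With the a, b, c rows from the equivalence-class example, all three entries end up in class a with values 1, 2, and 3, while a row such as o|oe is rejected because the subject appears in its own translation.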
Characters not named in the table file that have an
ordinal value (that is, a value as an NLchar) below the
ordinal value of the lowest-valued character named are
put into the collating sequence below the first character
in the table file. All other characters not named in the
table file are put into the collating sequence above the
last character in the table file.
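The placement of unnamed characters can be sketched as follows. This assumes Python ordinals stand in for NLchar values and that unnamed characters keep ascending ordinal order within each group; the text above does not say how they are ordered among themselves, and collating_order is a hypothetical name.

```python
def collating_order(named: str, universe: range) -> list:
    """Return every character of `universe` in collating order (hypothetical).

    named: the table's characters, in the order listed in the file.
    universe: ordinal values of all characters the table must cover.
    """
    named_set = set(named)
    lowest = min(ord(c) for c in named)
    # Assumption: unnamed characters stay in ascending ordinal order.
    unnamed = [chr(i) for i in universe if chr(i) not in named_set]
    below = [c for c in unnamed if ord(c) < lowest]  # before the first entry
    above = [c for c in unnamed if ord(c) > lowest]  # after the last entry
    return below + list(named) + above
```

For instance, a table naming only b and then a, over the ordinals 0x5F through 0x63, would order the characters as _, `, b, a, c.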
The standard characters for decimal and hexadecimal
digits are always marked as digits (to be matched by
NCisdigit and NCisxdigit). All other printable
characters not marked as alphabetic are marked as
punctuation.
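The classification rule can be read as the following sketch. It assumes single-character input, treats the set of alphabetic characters as coming from the table's third fields, and, by analogy with C's ispunct, assumes space characters are never marked as punctuation; char_flags and the flag names are hypothetical.

```python
import string


def char_flags(ch: str, alphabetic: set) -> set:
    """Hypothetical classification of one character (ch must be length 1)."""
    flags = set()
    if ch in alphabetic:
        flags.add("alpha")   # matched by NCisalpha
    if ch in string.digits:
        flags.add("digit")   # matched by NCisdigit
    if ch in string.hexdigits:
        flags.add("xdigit")  # matched by NCisxdigit
    # Every other printable, non-space character becomes punctuation.
    if not flags and ch.isprintable() and not ch.isspace():
        flags.add("punct")
    return flags
```

Under these assumptions, a letter such as a that the table marks as alphabetic is both alpha and xdigit, while % is punctuation and a control character such as \007 gets no flag.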
FLAGS
-i infile Specifies the name of the input file
(ctab.in by default).
-o outfile Specifies the name of the output file
(ctab.out by default).
FILES
/usr/lib/nls/ascii.in Input file listing the ASCII
range of characters.
/usr/lib/nls/iso.in Input file listing the ISO
Collating Sequence.
RELATED INFORMATION
The NCisalpha, NCisdigit, NCisxdigit, nls, and NLgetenv
subroutines in AIX Operating System Technical Reference.
The "Overview of International Character Support" in IBM
RT PC Managing the AIX Operating System.