ctab(1) — AIX/RT 2.2.1

ctab

PURPOSE

     Produces a collating table.

SYNOPSIS
     ctab [ -i infile ] [ -o outfile ]


DESCRIPTION

     The ctab command  takes an input file (by  default a file
     named  ctab.in  found  in   the  current  directory)  and
     produces a  binary file (by default  named ctab.out) con-
     taining a collating table.  These output files are stored
     in  a conventional  directory.   Programs  that need  the
     current  collating and  case information  use the  NLCTAB
     environment variable to access that information.

     The following conventions  are used to make  it easier to
     set up a table file:

     o   One line of information is present for each character
         explicitly named.
     o   A  line  beginning with  the  word  option serves  to
         change  one  or more  of  the  default conditions  or
         metacharacters built into ctab.   An option line con-
         tains a  set of name/value  pairs, with each  half of
         each pair delimited by  tab or space characters.  The
         following is a list of recognized names:
         eclass   Turns the  use of equivalence classes  on or
                  off globally.  The assigned value must be on
                  (the default) or off.
         sep      Uses the  assigned value as the  field sepa-
                  rator  character.  The  default  value is  :
                  (colon).
         trans    Uses the  assigned value as the  "translate"
                  indicator in subject  character fields.  The
                  default character is | (vertical bar).
         repeat   Uses the assigned value as the "same as last
                  line" indicator in subject character fields.
                  The default value is ^ (circumflex).
         comment  Uses the assigned value as the comment char-
                  acter.   The default  value is  the #  char-
                  acter.
     o   The order of the  per-character input lines specifies
         the collating sequence.
     o   By default, fields on a line are separated by colons.
         Tabs  or spaces  may surround  fields or  separators.
         You can change the separator character with an option
         line.
     o   Use an  octal escape sequence  in the ASCII  range to
         name a nonprintable character.  A backslash character
         that does  not form part  of a valid  escape sequence
         serves to strip the  following character, including a
         second backslash, of any special meaning it otherwise
         would have.  For example,  to include the colon char-
         acter in  the collating  sequence, use  the following
         line:

           \::

     o   The input file format  includes a comment convention:
         the remainder  of a line  following a #  character is
         ignored.  The comment  character can  be changed with
         an option line.
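
     The line-level conventions above can be illustrated with
     a short Python sketch.  It is not the ctab implementation.
     The function names (apply_option, split_fields), the
     reading of an octal escape as a backslash followed by one
     to three octal digits, and the rejection of unrecognized
     option names are assumptions made for illustration.

```python
# Illustrative sketch only; ctab's real parser may differ in detail.
DEFAULTS = {"eclass": "on", "sep": ":", "trans": "|",
            "repeat": "^", "comment": "#"}


def apply_option(line, settings):
    """Update settings from an 'option' line of name/value pairs."""
    words = line.split()          # pairs are delimited by tabs or spaces
    if not words or words[0] != "option" or len(words) % 2 == 0:
        raise ValueError("malformed option line")
    for name, value in zip(words[1::2], words[2::2]):
        if name not in DEFAULTS:  # assumption: unknown names are errors
            raise ValueError("unknown option: " + name)
        settings[name] = value
    return settings


def split_fields(line, settings=DEFAULTS):
    """Split one table line (no trailing newline) into fields."""
    sep, comment = settings["sep"], settings["comment"]
    fields, cur = [], []
    i = 0
    while i < len(line):
        ch = line[i]
        if ch == "\\" and i + 1 < len(line):
            # Assumed octal escape: backslash plus 1 to 3 octal digits.
            j = i + 1
            while j < min(i + 4, len(line)) and line[j] in "01234567":
                j += 1
            if j > i + 1:
                cur.append((chr(int(line[i + 1:j], 8)), True))
                i = j
            else:
                # Otherwise the backslash makes the next character literal.
                cur.append((line[i + 1], True))
                i += 2
            continue
        if ch == comment:
            break                 # the rest of the line is a comment
        if ch == sep:
            fields.append(cur)
            cur = []
        else:
            cur.append((ch, False))
        i += 1
    fields.append(cur)
    return [_trim(f) for f in fields]


def _trim(tokens):
    """Drop unescaped tabs and spaces surrounding a field."""
    blank = ((" ", False), ("\t", False))
    while tokens and tokens[0] in blank:
        tokens.pop(0)
    while tokens and tokens[-1] in blank:
        tokens.pop()
    return "".join(c for c, _ in tokens)
```

     Under these assumptions, split_fields("a:A:l:a") returns
     the four fields a, A, l, and a, and the escaped-colon
     line shown earlier yields a first field containing a
     literal colon.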

INPUT FILE SPECIFICATION

     Use the  following rules to build  infile, entering field
     information for each line:

     1.  The first field on a  line contains the subject char-
         acter, a character to  be inserted into the collating
         sequence at that point.
         o   This subject  character definition can  include a
             translation mechanism:
             -   Instead of a single character, this field may
                 contain two or more characters that are to be
                 collated as a single unit, or
             -   The single subject  character may be followed
                 by  a  vertical  bar  (|) and  a  single-  or
                 multiple-character string.   The vertical bar
                 indicates  that the  first character  will be
                 translated to the  second string before being
                 collated.

                 For example, to treat é (e acute) as
                 equivalent to the character e, use the
                 following line:

                   é|e

             -   One restriction is  placed on the translation
                 mechanism:  the  subject character  cannot be
                 contained in the translated string of charac-
                 ters.   For example,  the  following line  is
                 illegal:

                   o|oe

         o   Any  form  of  the  first  field  may  contain  a
             trailing  circumflex (^)  to  indicate  that  the
             current character is to collate to the same value
             as the preceding one.  However, a circumflex fol-
             lowing  a translation  string is  illegal because
             the  subject character  to be  translated has  no
             inherent collating value.
         o   If the  subject field  contains a string  of mul-
             tiple  characters (to  collate  as  a unit),  its
             first  character must  be  declared elsewhere  to
             establish the default  collating sequence of that
             character.
         o   The translate (|)  and repeat (^)  indicator char-
             acters can be changed with option lines.
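
         The rules for the first field can be sketched in
         Python as follows.  This is a hypothetical helper,
         not part of ctab: the name parse_subject, the
         returned tuple, and the use of ValueError are
         assumptions, and the field is assumed to contain no
         escaped indicator characters.

```python
# Illustrative sketch only; names and error handling are assumptions.
def parse_subject(field, trans="|", repeat="^"):
    """Return (subject, translation, same_as_previous) for a first field."""
    # A trailing repeat indicator means "collate like the preceding line".
    same = len(field) > 1 and field.endswith(repeat)
    if same:
        field = field[:-len(repeat)]
    subject, bar, translation = field.partition(trans)
    if not bar:
        # One character, or several characters collated as a single unit.
        return subject, None, same
    if same:
        raise ValueError("repeat indicator after a translation string")
    if len(subject) != 1 or not translation:
        raise ValueError("expected one character, indicator, then a string")
    if subject in translation:
        raise ValueError("subject character appears in its own translation")
    return subject, translation, False
```

         With this sketch, the illegal line o|oe shown above
         raises an error, while a two-character field such as
         ch is returned unchanged as a single collating unit.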

     2.  The second and third fields  specify whether or not a
         character  is  alphabetic  and what  its  lower-  and
         uppercase equivalents are:
         o   If a subject character is to be treated as a low-
             ercase alphabetic,  the second field on  its line
             is its uppercase equivalent,  and the third field
             must be l or L.
         o   If a  subject character  is to  be treated  as an
             uppercase  alphabetic, the  second  field on  its
             line is  its lowercase equivalent, and  the third
             field must be u or U.
         o   If  a subject  character is  to be  treated as  a
             control character or a space character, the third
             field must be c, C, s, or S.
         o   Each character  explicitly named whose  line con-
             tains a  nonnull second field will  be considered
             alphabetic  (that  is,   matched  by  NCisalpha).
             Characters that do not  have an uppercase or low-
             ercase  equivalent (that  is,  that  have a  null
             second field) but that  you wish to be considered
             alphabetic  should simply  contain a  third field
             that is l, L, u, or U.
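
         A minimal Python sketch of how the second and third
         fields might be interpreted follows.  The function
         name classify and the returned dictionary are
         hypothetical, and mapping c to "control" and s to
         "space" is an assumption drawn from the order in
         which the text lists them.

```python
# Illustrative sketch only; not taken from the ctab sources.
def classify(second, third):
    """Interpret the second (case partner) and third (type) fields."""
    flag = third.lower()          # l/L, u/U, c/C, and s/S are accepted
    if flag not in ("", "l", "u", "c", "s"):
        raise ValueError("unrecognized third field: " + third)
    return {
        # A nonnull second field, or an l/u flag, marks the character
        # as alphabetic.
        "alphabetic": bool(second) or flag in ("l", "u"),
        "lower": flag == "l",
        "upper": flag == "u",
        "control": flag == "c",   # assumed meaning of c
        "space": flag == "s",     # assumed meaning of s
        "case_partner": second or None,
    }
```

         For instance, classify("A", "l") describes a lower-
         case letter whose uppercase equivalent is A, and
         classify("", "U") describes an uppercase alphabetic
         with no lowercase equivalent.
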
     3.  The  fourth field  on a  line is  used explicitly  to
         specify the first character  in the equivalence class
         of the subject character.   The members of one equiv-
         alence  class must  be  consecutively  listed in  the
         input file.
         o   There  cannot be  any  gaps  within a  particular
             equivalence  class.  For  example, the  following
             lines will put the characters  a, b, and c in the
             same equivalence class:

               a:A:l:a
               b:B:l:a
               c:C:l:a

         o   As  a convenience,  if  the fourth  field is  not
             specified, then the  group of consecutive charac-
             ters with blank fourth fields, provided that they
             are all based on  the same Roman alphabetic char-
             acter,  will be  placed in  the same  equivalence
             class.   To reiterate,  only characters  with the
             same  base will  be placed  into the  same equiv-
             alence  class by  default.  If  you wish  to have
             many  characters from  different bases  belong to
             one equivalence  class, as in the  example above,
             the first character of  the equivalence class has
             to  be specified  in the  fourth field  for every
             character specified.

         o   It is illegal to specify an equivalence character
             that comes later in  the collating sequence.  The
             fourth field  can refer  only to  characters that
             have already been mentioned.
         o   By default,  each international character support
             character that is not based on a Roman alphabetic
             character is  the sole member of  its equivalence
             class.
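
         The checks on explicit fourth fields can be sketched
         in Python as shown here.  The helper assign_classes
         is hypothetical.  As a deliberate simplification, it
         treats every blank fourth field as starting a new
         class, because the default grouping by Roman base
         character requires data that this page does not
         list.

```python
# Illustrative sketch only; default grouping by base letter is omitted.
def assign_classes(entries):
    """Map each subject to the first member of its equivalence class.

    entries is a list of (subject, fourth_field) pairs in table order.
    """
    leader_of = {}
    closed = set()                # classes whose run has already ended
    prev = None
    for subject, fourth in entries:
        if fourth and fourth != subject:
            if fourth not in leader_of:
                raise ValueError("fourth field names a later character")
            leader = leader_of[fourth]
        else:
            leader = subject      # simplification: blank starts a class
        if leader != prev:
            if leader in closed:
                raise ValueError("gap in equivalence class " + leader)
            if prev is not None:
                closed.add(prev)
        leader_of[subject] = leader
        prev = leader
    return leader_of
```

         Applied to the three example lines for a, b, and c,
         this sketch maps all three characters to a.  A line
         for some other class placed between b and c would be
         reported as a gap.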

     Characters  not named  in  the table  file  that have  an
     ordinal value (that  is, a value as an  NLchar) below the
     ordinal value  of the  lowest-valued character  named are
     put into the collating sequence below the first character
     in the table file.  All other characters not named in the
     table file are put into  the collating sequence above the
     last character in the table file.
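
     The placement of unnamed characters can be sketched in
     Python as follows.  The function full_sequence is hypo-
     thetical.  It uses Python's ord() as a stand-in for the
     NLchar ordinal value and assumes that unnamed characters
     keep their ordinal order among themselves, which this
     page does not state.

```python
# Illustrative sketch only; ord() stands in for the NLchar ordinal value.
def full_sequence(named, universe):
    """Place unnamed characters around the named collating sequence.

    named    -- subjects in table order (multi-character units are
                passed through but ignored when finding the lowest
                ordinal)
    universe -- every single character, in ordinal order
    """
    singles = {c for c in named if len(c) == 1}
    lowest = min(ord(c) for c in singles)
    unnamed = [c for c in universe if c not in singles]
    below = [c for c in unnamed if ord(c) < lowest]
    above = [c for c in unnamed if ord(c) >= lowest]
    return below + list(named) + above
```

     Note that an unnamed character whose ordinal lies between
     two named characters sorts after the whole table under
     this rule, not between them.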

     The  standard  characters  for  decimal  and  hexadecimal
     digits  are always  marked as  digits (to  be matched  by
     NCisdigit and  NCisxdigit).  All other  printable charac-
     ters not marked as alphabetic are marked as punctuation.

FLAGS

     -i  infile   Specifies  the   name  of  the   input  file
                  (ctab.in by default).
     -o  outfile  Specifies  the  name   of  the  output  file
                  (ctab.out by default).

FILES

     /usr/lib/nls/ascii.in     Input  file  listing the  ASCII
                               range of characters.
     /usr/lib/nls/iso.in       Input file listing the ISO
                               Collating Sequence.

RELATED INFORMATION

     The NCisalpha,  NCisdigit, NCisxdigit, nls,  and NLgetenv
     subroutines in AIX Operating System Technical Reference.

     The "Overview of International  Character Support" in IBM
     RT PC Managing the AIX Operating System.