⇒ colltbl(1M) — Dell System V Release 4 Issue 2.2


colltbl(1M)     UNIX System V(System Administration Utilities)      colltbl(1M)


NAME
      colltbl - create collation database

SYNOPSIS
      colltbl [ file | - ]

DESCRIPTION
      The colltbl command takes as input a specification file, file, that
      describes the collating sequence for a particular language and creates a
      database that can be read by strxfrm(3C) and strcoll(3C).  strxfrm(3C)
      transforms its first argument and places the result in its second
      argument. The transformed string is such that it can be correctly ordered
      with other transformed strings by using strcmp(3C), strncmp(3C) or
      memcmp(3C).  strcoll(3C) transforms its arguments and does a comparison.
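
      As an illustration only, the relationship between the two routines can
      be sketched with Python's locale module, whose strxfrm and strcoll
      functions wrap the C library routines.  The sketch assumes the "C"
      locale so that the result is reproducible; the word list is invented
      for the example.

```python
import functools
import locale

# Assumption: the "C" locale, chosen only because it is always available.
locale.setlocale(locale.LC_COLLATE, "C")

words = ["relocate", "Zebra", "apple", "re-locate"]

# strxfrm: transform each string once, then compare the transformed keys
# with ordinary comparison.
by_xfrm = sorted(words, key=locale.strxfrm)

# strcoll: compare the untransformed strings pairwise.
by_coll = sorted(words, key=functools.cmp_to_key(locale.strcoll))

print(by_xfrm == by_coll)
```

      Both approaches should produce the same order.  Transforming once is
      usually the better fit when the same strings are compared many times,
      as in a sort.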

      If no input file is supplied, stdin is read.

      The output file produced contains the database with collating sequence
      information in a form usable by system commands and routines.  The name
      of this output file is the value you assign to the keyword codeset read
      in from file.  Before this file can be used, it must be installed in the
      /usr/lib/locale/locale directory with the name LC_COLLATE by someone who
      is super-user or a member of group bin.  locale corresponds to the
      language area whose collation sequence is described in file.  This file
      must be readable by user, group, and other; no other permissions should
      be set.  To use the collating sequence information in this file, set the
      LC_COLLATE environment variable appropriately (see environ(5) or
      setlocale(3C)).
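
      The installation step can be sketched as follows.  This is a minimal
      illustration, not part of colltbl: the function name
      install_collation and its parameters are hypothetical, and the mode
      0444 is one reading of "readable by user, group, and other; no other
      permissions".

```python
import os
import shutil
import stat


def install_collation(db_path, locale_name, locale_root="/usr/lib/locale"):
    """Copy a colltbl output file into place as LC_COLLATE, mode 0444.

    Hypothetical helper: db_path is the file named by the codeset
    statement, locale_name is the locale directory to install into.
    """
    target_dir = os.path.join(locale_root, locale_name)
    os.makedirs(target_dir, exist_ok=True)
    target = os.path.join(target_dir, "LC_COLLATE")
    shutil.copyfile(db_path, target)
    # Readable by user, group, and other; no other permission bits set.
    os.chmod(target, stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH)
    return target

# Example call (needs suitable privileges; "xx" is a placeholder locale):
#     install_collation("telephone", "xx")
```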

      The colltbl command can support languages whose collating sequence can be
      completely described by the following cases:

      ⊕   Ordering of single characters within the codeset.  For example, in
          Swedish, V is sorted after U, before X and with W (V and W are
          considered identical as far as sorting is concerned).

      ⊕   Ordering of "double characters" in the collation sequence.  For
          example, in Spanish, ch and ll are collated after c and l,
          respectively.

      ⊕   Ordering of a single character as if it consists of two characters.
          For example, in German, the "sharp s", ß, is sorted as ss.  This is a
          special instance of the next case below.

      ⊕   Substitution of one character string with another character string.
          In the example above, the string ß is replaced with ss during
          sorting.

      ⊕   Ignoring certain characters in the codeset during collation.  For
          example, if - were ignored during collation, then the strings
          re-locate and relocate would be equal.

      ⊕   Secondary ordering between characters.  In the case where two
          characters are sorted together in the collation sequence (i.e., they
          have the same "primary" ordering), there is sometimes a secondary
          ordering that is used if two strings are identical except for
          characters that have the same primary ordering.  For example, in
          French, the letters e and è have the same primary ordering but e
          comes before è in the secondary ordering.  Thus the word lever would
          be ordered before lèver, but lèver would be sorted before levitate.
          (Note that if e came before è in the primary ordering, then lèver
          would be sorted after levitate.)
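
      The effect of primary and secondary ordering, and of ignored
      characters, can be sketched with a two-level sort key.  This is an
      illustration of the idea only, not the format colltbl writes: the
      WEIGHTS table and sort_key function are hypothetical, and the accented
      letter is written here as è.

```python
# Hypothetical weight table: symbol -> (primary, secondary).
WEIGHTS = {ch: (i, 0) for i, ch in enumerate("abcdefghijklmnopqrstuvwxyz")}
# The accented e shares the primary weight of e but sorts after it
# at the secondary level.
WEIGHTS["è"] = (WEIGHTS["e"][0], 1)
# "-" is deliberately absent from the table, so it is ignored.


def sort_key(word):
    """Return (primary weights, secondary weights) for word."""
    known = [WEIGHTS[ch] for ch in word if ch in WEIGHTS]
    primary = tuple(p for p, _ in known)
    secondary = tuple(s for _, s in known)
    # Secondary weights are consulted only when the primaries are equal.
    return (primary, secondary)


words = ["levitate", "lèver", "lever", "re-locate"]
ordered = sorted(words, key=sort_key)
print(ordered)
```

      With this table, sort_key("re-locate") and sort_key("relocate") are
      equal, because the hyphen contributes nothing to the key.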

      The specification file consists of three types of statements:

      1.  codeset     filename

          filename is the name of the output file to be created by colltbl.

      2.  order is    order_list

          order_list is a list of symbols, separated by semicolons, that
          defines the collating sequence.  The special symbol, ..., specifies
          symbols that are lexically sequential in a short-hand form.  For
          example,
               order is     a;b;c;d;...;x;y;z

          would specify the list of lower-case letters.  Of course, this could
          be further compressed to just a;...;z.
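
          One way to read the ... shorthand is as "every character code
          between the neighbouring symbols".  The sketch below assumes that
          reading and assumes ... always sits between two single-character
          symbols; expand_ellipsis is a hypothetical name, not part of
          colltbl.

```python
def expand_ellipsis(order_list):
    """Expand "..." into the run of characters between its neighbours."""
    symbols = order_list.split(";")
    expanded = []
    for i, sym in enumerate(symbols):
        if sym == "...":
            # Fill in the characters strictly between the previous and
            # next symbols; the neighbours themselves are already listed.
            start, end = symbols[i - 1], symbols[i + 1]
            expanded.extend(chr(c) for c in range(ord(start) + 1, ord(end)))
        else:
            expanded.append(sym)
    return expanded


print(expand_ellipsis("a;...;z") == expand_ellipsis("a;b;c;d;...;x;y;z"))
```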

          A symbol can be up to two bytes in length and can be represented in
          any one of the following ways:

          ⊕   the symbol itself (e.g., a for the lower-case letter a),

          ⊕   in octal representation (e.g., \141 or 0141 for the letter a), or

          ⊕   in hexadecimal representation (e.g., \x61 or 0x61 for the letter
              a).

          Any combination of these may be used as well.
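
          The three notations can be decoded as sketched below.  The
          function parse_symbol is hypothetical, handles single-byte values
          only, and guesses at how a literal symbol is told apart from a
          numeric one; the source does not say how colltbl resolves that.

```python
def parse_symbol(token):
    """Decode one order_list symbol written literally, in octal, or in hex."""
    if token.startswith(("\\x", "0x")):
        # Hexadecimal: \x61 or 0x61
        return chr(int(token[2:], 16))
    if len(token) > 1 and token[0] in ("\\", "0"):
        # Octal: \141 or 0141
        return chr(int(token[1:], 8))
    # Anything else is taken as the symbol itself, e.g. "a" or "ch".
    return token


for token in ["a", r"\141", "0141", r"\x61", "0x61"]:
    print(token, "->", parse_symbol(token))
```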

          The backslash character, \ , is used for continuation.  No characters
          are permitted after the backslash character.

          Symbols enclosed in parentheses are assigned the same primary
          ordering but different secondary ordering.  Symbols enclosed in curly
          brackets are assigned only the same primary ordering.  For example,


                order is    a;b;c;ch;d;(e;è);f;...;z;\
                           {1;...;9};A;...;Z

          In the above example, e and è are assigned the same primary ordering
          and different secondary ordering, digits 1 through 9 are assigned the
          same primary ordering and no secondary ordering.  Only primary
          ordering is assigned to the remaining symbols.  Notice how double
          letters can be specified in the collating sequence (letter ch comes
          between c and d).

          If a character is not included in the order is statement it is
          excluded from the ordering and will be ignored during sorting.
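
          The grouping rules can be sketched as an assignment of (primary,
          secondary) weights.  This is an illustration under assumptions: the
          function assign_weights is hypothetical, the order list is a
          shortened variant of the one above with the ... runs already
          written out, and the numeric weights are arbitrary.

```python
import re


def assign_weights(order_list):
    """Map each symbol to a (primary, secondary) weight pair."""
    weights = {}
    primary = 0
    # Split into (...) groups, {...} groups, and plain symbols.
    for group in re.findall(r"\([^)]*\)|\{[^}]*\}|[^;]+", order_list):
        primary += 1
        if group[0] == "(":
            # Same primary weight, increasing secondary weights.
            for secondary, sym in enumerate(group[1:-1].split(";")):
                weights[sym] = (primary, secondary)
        elif group[0] == "{":
            # Same primary weight, no secondary distinction.
            for sym in group[1:-1].split(";"):
                weights[sym] = (primary, 0)
        else:
            weights[group] = (primary, 0)
    return weights


w = assign_weights("a;b;c;ch;d;(e;è);f;g;{1;2;3};A")
print(w["c"], w["ch"], w["d"], w["e"], w["è"], w["1"], w["3"])
```

          A symbol that never appears in the list, such as "-" here, has no
          entry in the table, which corresponds to being ignored during
          sorting.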

      3.  substitute string with repl

          The substitute statement substitutes the string string with the
          string repl.  This can be used, for example, to provide rules to sort
          the abbreviated month names numerically:

                substitute "Jan" with "01"
                substitute "Feb" with "02"
                      .
                      .
                      .
                substitute "Dec" with "12"

          A simpler use of the substitute statement, mentioned above, is to
          substitute a single character with two characters, as with the
          substitution of ß with ss in German.
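
          The effect of the month-name substitutions can be sketched as a
          rewrite applied before comparison.  Only the three months spelled
          out above are included; apply_substitutions and the sample dates
          are hypothetical, and plain sequential replacement is an assumption
          about how substitution behaves.

```python
# Only the substitutions shown in the text; the elided months are omitted.
SUBSTITUTIONS = {"Jan": "01", "Feb": "02", "Dec": "12"}


def apply_substitutions(text, table):
    """Replace each occurrence of a key in text with its replacement."""
    for old, new in table.items():
        text = text.replace(old, new)
    return text


dates = ["Dec 25", "Feb 14", "Jan 01"]
ordered = sorted(dates, key=lambda d: apply_substitutions(d, SUBSTITUTIONS))
print(ordered)
```

          Without the substitutions the same list would sort alphabetically,
          with Dec first.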

      The substitute statement is optional.  The order is and codeset
      statements must appear in the specification file.

      Any lines in the specification file with a # in the first column are
      treated as comments and are ignored.  Empty lines are also ignored.

EXAMPLE
      The following example shows the collation specification required to
      support a hypothetical telephone book sorting sequence.

      The sorting sequence is defined by the following rules:

      a.    Upper and lower case letters must be sorted together, but upper
            case letters have precedence over lower case letters.

      b.    All special characters and punctuation should be ignored.

      c.    Digits must be sorted as their alphabetic counterparts (e.g., 0 as
            zero, 1 as one).

      d.    The Ch, ch, CH combinations must be collated between C and D.

      e.    V and W, v and w must be collated together.

      The input specification file to colltbl will contain:


                 codeset      telephone

                 order is     A;a;B;b;C;c;CH;Ch;ch;D;d;E;e;F;f;\
                              G;g;H;h:I;i;J;j;K;k;L;l;M;m;N;n;O;o;P;p;\
                              Q;q;R;r;S;s;T;t;U;u;{V;W};{v;w};X;x;Y;y;Z;z

                 substitute "0" with "zero"
                 substitute "1" with "one"
                 substitute "2" with "two"
                 substitute "3" with "three"
                 substitute "4" with "four"
                 substitute "5" with "five"
                 substitute "6" with "six"
                 substitute "7" with "seven"
                 substitute "8" with "eight"
                 substitute "9" with "nine"
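
      The behaviour this specification describes can be approximated in a few
      lines.  The sketch below is a simulation of the rules, not of colltbl
      or of the database it writes: sort_key and the sample names are
      hypothetical, and weights follow the order is list exactly as written.

```python
import string

# Rebuild the order list: each letter as upper then lower case, CH;Ch;ch
# after C;c, and V/W (v/w) sharing a position.
ORDER = []
for upper in string.ascii_uppercase:
    if upper == "W":
        continue  # W is grouped with V below
    if upper == "V":
        ORDER += [("V", "W"), ("v", "w")]
    else:
        ORDER += [(upper,), (upper.lower(),)]
    if upper == "C":
        ORDER += [("CH",), ("Ch",), ("ch",)]

WEIGHT = {sym: rank for rank, group in enumerate(ORDER) for sym in group}

DIGITS = {"0": "zero", "1": "one", "2": "two", "3": "three", "4": "four",
          "5": "five", "6": "six", "7": "seven", "8": "eight", "9": "nine"}


def sort_key(entry):
    """Apply the substitutions, then map symbols to their weights."""
    for digit, word in DIGITS.items():
        entry = entry.replace(digit, word)
    key, i = [], 0
    while i < len(entry):
        pair = entry[i:i + 2]
        if pair in WEIGHT:            # two-character symbol such as "Ch"
            key.append(WEIGHT[pair])
            i += 2
        elif entry[i] in WEIGHT:      # single-character symbol
            key.append(WEIGHT[entry[i]])
            i += 1
        else:                         # not in the order list: ignored
            i += 1
    return key


names = ["Dunn", "Chen", "Cruz"]
ordered = sorted(names, key=sort_key)
print(ordered)
```

      Under these rules O'Brien and OBrien produce the same key, as do
      "4 Seasons" and "four Seasons", and Chen sorts between Cruz and Dunn.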

FILES
      /lib/locale/locale/LC_COLLATE
                      LC_COLLATE database for locale

      /usr/lib/locale/C/colltbl_C
                      input file used to construct LC_COLLATE in the default
                      locale.

SEE ALSO
      memory(3C), setlocale(3C), strcoll(3C), string(3C), strxfrm(3C),
      environ(5) in the Programmer's Reference Manual.

10/89                                                                    Page 5