⇒ colltbl(1M) — UnixWare 2.01


       colltbl(1M)                                              colltbl(1M)


       NAME
             colltbl - create collation database

       SYNOPSIS
             colltbl [file | -]

       DESCRIPTION
             The colltbl command takes as input a specification file, file,
             that describes the collating sequence for a particular
             language and creates a database that can be read by
             strxfrm(3C) and strcoll(3C).  strxfrm(3C) transforms its second
             argument and places the result in its first argument.  The
             transformed string is such that it can be correctly ordered
             with other transformed strings by using strncmp [see
             string(3C)].  strcoll(3C) transforms its arguments and does a
             comparison.

             If no input file is supplied, stdin is read.

             The output file produced contains the database with collating
             sequence information in a form usable by system commands and
             routines.  The name of this output file is the value you
             assign to the keyword codeset read in from file.  Before this
             file can be used, it must be installed in the
             /usr/lib/locale/locale directory with the name LC_COLLATE by
             someone who is super-user or a member of group bin.  locale
             corresponds to the language area whose collation sequence is
             described in file.  This file must be readable by user, group,
             and other; no other permissions should be set.  To use the
             collating sequence information in this file, set the
             LC_COLLATE environment variable appropriately [see environ(5)
             or setlocale(3C)].

             The colltbl command can support languages whose collating
             sequence can be completely described by the following cases:

                   Ordering of single characters within the code set.  For
                   example, in Swedish, V is sorted after U, before X, and
                   with W (V and W are considered identical as far as
                   sorting is concerned).

                   Ordering of ``double characters'' in the collation
                   sequence.  For example, in Spanish, ch and ll are
                   collated after c and l, respectively.

                  Ordering of a single character as if it consists of two
                  characters.  For example, in German, the ``sharp s,'' ß,
                  is sorted as ss.  This is a special instance of the next
                  case below.

                  Substitution of one character string with another
                  character string.  In the example above, the string ß is
                  replaced with ss during sorting.

                  Ignoring certain characters in the code set during
                  collation.  For example, if - were ignored during
                  collation, then the strings re-locate and relocate would
                  compare as equal.

                  Secondary ordering between characters.  In the case
                  where two characters are sorted together in the
                  collation sequence (that is, they have the same
                  "primary" ordering), there is sometimes a secondary
                  ordering that is used if two strings are identical
                  except for characters that have the same primary
                  ordering.  For example, in French, the letters e and è
                  have the same primary ordering but e comes before è in
                  the secondary ordering.  Thus the word lever would be
                  ordered before lèver, but lèver would be sorted before
                  levitate.  (Note that if e came before è in the primary
                  ordering, then lèver would be sorted after levitate.)

            The specification file consists of three types of statements:

            1. codeset filename

               filename is the name of the output file to be created by
               colltbl.

            2. order is order_list

               order_list is a list of symbols, separated by semicolons,
               that defines the collating sequence.  The special symbol,
               ..., specifies symbols that are lexically sequential in a
               short-hand form.  For example,
                      order is a;b;c;d;...;x;y;z

               would specify the list of lowercase letters.  Of course,
               this could be further compressed to just a;...;z.

                A symbol can be up to two bytes in length and can be
                represented in any one of the following ways:

                             the symbol itself (for example, a for the
                             lowercase letter a),

                             in octal representation (for example, \141 or
                             0141 for the letter a), or

                             in hexadecimal representation (for example,
                             \x61 or 0x61 for the letter a).

                Any combination of these may be used as well.

                The backslash character, \ , is used for continuation.  No
                characters are permitted after the backslash character.

                Symbols enclosed in parentheses are assigned the same
                primary ordering but different secondary ordering.  Symbols
                enclosed in curly brackets are assigned only the same
                primary ordering.  For example,
                       order is a;b;c;ch;d;(e;è);f;...;z;\
                                {1;...;9};A;...;Z

                In the above example, e and è are assigned the same primary
                ordering and different secondary ordering; digits 1 through
                9 are assigned the same primary ordering and no secondary
                ordering.  Only primary ordering is assigned to the
                remaining symbols.  Notice how double letters can be
                specified in the collating sequence (letter ch comes
                between c and d).

                If a character is not included in the order is statement,
                it is excluded from the ordering and will be ignored during
                sorting.

             3. substitute string with repl

                The substitute statement substitutes the string string with
                the string repl.  This can be used, for example, to provide
                rules to sort the abbreviated month names numerically:
                       substitute "Jan" with "01"
                       substitute "Feb" with "02"
                             .
                             .
                             .
                       substitute "Dec" with "12"

               A simpler use of the substitute statement would be to
               substitute a single character with two characters, as with
               the substitution of ß with ss in German.

            The substitute statement is optional.  The order is and
            codeset statements must appear in the specification file.

            Any lines in the specification file with a # in the first
            column are treated as comments and are ignored.  Empty lines
            are also ignored.

      EXAMPLES
            The following example shows the collation specification
            required to support a hypothetical telephone book sorting
            sequence.

            The sorting sequence is defined by the following rules:

            a. Upper- and lowercase letters must be sorted together, but
               uppercase letters have precedence over lowercase letters.

            b. All special characters and punctuation should be ignored.

            c. Digits must be sorted as their alphabetic counterparts (for
               example, 0 as zero, 1 as one).

            d. The Ch, ch, CH combinations must be collated between C and
               D.

            e. V and W, v and w must be collated together.

            The input specification file to colltbl will contain:

                  codeset     telephone
                  order is    A;a;B;b;C;c;CH;Ch;ch;D;d;E;e;F;f;\
                              G;g;H;h:I;i;J;j;K;k;L;l;M;m;N;n;O;o;P;p;\
                              Q;q;R;r;S;s;T;t;U;u;{V;W};{v;w};X;x;Y;y;Z;z
                  substitute "0" with "zero"
                  substitute "1" with "one"
                  substitute "2" with "two"
                  substitute "3" with "three"
                  substitute "4" with "four"
                  substitute "5" with "five"
                  substitute "6" with "six"
                  substitute "7" with "seven"
                  substitute "8" with "eight"
                  substitute "9" with "nine"

       FILES
             /usr/lib/locale/locale/LC_COLLATE
                   LC_COLLATE database for locale

             /usr/lib/locale/C/colltbl_C
                   input file used to construct LC_COLLATE in the default
                   locale.

       REFERENCES
             environ(5), memory(3C), setlocale(3C), strcoll(3C),
             string(3C), strxfrm(3C)

                           Copyright 1994 Novell, Inc.