colltbl(1M) - Reliant UNIX 5.44c4

colltbl(1M)                                                     colltbl(1M)

NAME
     colltbl - create collation database

SYNOPSIS
     colltbl [file|-]

DESCRIPTION
     The colltbl command takes as input a specification file, file, that
     describes the collating sequence for a particular language and creates
     a database that can be read by strxfrm(3C) and strcoll(3C).
     strxfrm(3C) transforms its second argument and places the result in
     its first argument. The transformed string is such that it can be
     correctly ordered with other transformed strings by using strcmp(3C),
     strncmp(3C) or memcmp(3C). strcoll(3C) compares its two arguments
     directly according to the collating sequence.
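     The relationship between the two routines can be sketched in C. The
     helper names make_key and cmp_by_key are illustrative, not part of any
     library; the sketch assumes only the standard strxfrm(), strcoll() and
     strcmp() interfaces. Transforming each string once and comparing the
     keys with strcmp() should give the same sign as calling strcoll() on
     the original strings.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a malloc'd strxfrm() key for s.
   strxfrm() with a null buffer and size 0 reports the key length
   (excluding the terminating NUL). The caller frees the key. */
static char *make_key(const char *s)
{
    size_t n = strxfrm(NULL, s, 0);
    char *key = malloc(n + 1);

    if (key == NULL)
        abort();
    strxfrm(key, s, n + 1);
    return key;
}

/* Hypothetical helper: compare a and b through their transformed keys.
   Returns -1, 0 or 1; the sign should match strcoll(a, b). */
static int cmp_by_key(const char *a, const char *b)
{
    char *ka, *kb;
    int r;

    assert(a != NULL && b != NULL);
    ka = make_key(a);
    kb = make_key(b);
    r = strcmp(ka, kb);
    free(ka);
    free(kb);
    return (r > 0) - (r < 0);
}
```

     When many strings are sorted, computing each key once and reusing it
     avoids repeating the transformation inside every strcoll() call.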

     If file is omitted or given as -, the specification is read from
     standard input.

     The output file produced contains the database with collating sequence
     information in a form usable by system commands and routines. The name
     of this output file is the value assigned to the keyword codeset in
     file. Before this file can be used, it must be installed in the
     /usr/lib/locale/locale directory with the name LC_COLLATE by someone
     who is superuser or a member of group bin. Here, locale is the name of
     the language area whose collating sequence is described in file. The
     installed file must be readable by user, group, and other; no other
     permissions should be set. To use the collating sequence information
     in this file, set the LC_COLLATE environment variable appropriately
     [see environ(5) or setlocale(3C)].
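     A C program starts in the "C" locale, so it only uses the installed
     table if it selects the collation category from the environment. A
     minimal sketch, assuming the standard setlocale(3C) interface and
     POSIX-style lookup; use_env_collation and collation_is are
     hypothetical helper names:

```c
#include <assert.h>
#include <locale.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: adopt the collation named by the environment
   (LC_ALL, then LC_COLLATE, then LANG), falling back to the "C"
   collation if that locale cannot be loaded. Returns the name of the
   collation now in effect. */
static const char *use_env_collation(void)
{
    if (setlocale(LC_COLLATE, "") == NULL) {
        fprintf(stderr, "collation from environment unavailable; using C\n");
        setlocale(LC_COLLATE, "C");
    }
    return setlocale(LC_COLLATE, NULL);   /* query only, no change */
}

/* Hypothetical helper: nonzero if the current collation is `name`. */
static int collation_is(const char *name)
{
    const char *cur = setlocale(LC_COLLATE, NULL);

    assert(name != NULL);
    return cur != NULL && strcmp(cur, name) == 0;
}
```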

     The colltbl command can support languages whose collating sequence can
     be completely described by the following cases:

     -  Ordering of single characters within the codeset. For example, in
        Swedish, V is sorted after U, before X and with W (V and W are con-
        sidered identical as far as sorting is concerned).

     -  Ordering of "double characters" in the collation sequence. For
        example, in Spanish, ch and ll are collated after c and l, respec-
        tively.

     -  Ordering of a single character as if it consists of two characters.
        For example, in German, the "sharp s", ß, is sorted as ss. This is
        a special instance of the next case below.

     -  Substitution of one character string with another character string.
        In the example above, the string ß is replaced with ss during sort-
        ing.

     -  Ignoring certain characters in the codeset during collation. For
        example, if the hyphen "-" were ignored during collation, then the
        strings re-locate and relocate would be equal.

Page 1                       Reliant UNIX 5.44                Printed 11/98

     -  Secondary ordering between characters. When two characters sort
        together in the collation sequence (that is, they have the same
        "primary" ordering), there is sometimes a secondary ordering that
        is used if two strings are identical except for characters that
        have the same primary ordering. For example, in French, the
        letters e and è have the same primary ordering, but e comes before
        è in the secondary ordering. Thus the word des is ordered before
        dès.

     The specification file consists of three types of statements:

     1. codeset filename

        filename is the name of the output file to be created by colltbl.

     2. order is orderlist

        orderlist is a list of symbols, separated by semicolons, that
        defines the collating sequence. The special symbol ". . ." is a
        shorthand for a run of symbols that are consecutive in the codeset.
        For example,

             order is a;b;c;d; . . . ;x;y;z

        would specify the list of lowercase letters. Of course, this could
        be further compressed to just a; . . . ;z.

        A symbol can be up to two bytes in length and can be represented in
        any one of the following ways:

        -  the symbol itself (e.g. a for the lowercase letter a),

        -  in octal representation (e.g. \141 or 0141 for the letter a), or

        -  in hexadecimal representation (e.g. \x61 or 0x61 for the letter
           a).

        Any combination of these may be used as well.

        The backslash character (\) is used for continuation. No characters
        are permitted after the backslash character.

        Symbols enclosed in parentheses are assigned the same primary
        ordering but different secondary orderings. Symbols enclosed in
        curly brackets are assigned the same primary ordering and no
        secondary ordering. For example,

             order is a;b;c;ch;d;(e;è);f; . . . ;z;\
                      {1; . . . ;9};A; . . . ;Z

        In the above example, e and è are assigned the same primary order-
        ing and different secondary ordering, digits 1 through 9 are
        assigned the same primary ordering and no secondary ordering. Only
        primary ordering is assigned to the remaining symbols. Notice how
        double letters can be specified in the collating sequence (letter
        ch comes between c and d).

        If a character is not included in the order is statement, it is
        excluded from the ordering and is ignored during sorting.

     3. substitute string with repl

        The substitute statement substitutes the string string with the
        string repl. This can be used, for example, to provide rules to
        sort the abbreviated month names numerically:

             substitute "Jan" with "01"
             substitute "Feb" with "02"
             . . .
             substitute "Dec" with "12"

        A simpler use of the substitute statement, mentioned earlier, is to
        replace a single character with two characters, as when ß is
        replaced with ss in German.
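     The effect of substitute statements can be modeled as a rewriting pass
     applied to a string before its characters are weighted. The C sketch
     below assumes that, at each position, the first matching rule in file
     order wins; this manual page does not say how colltbl resolves
     overlapping rules, and struct subst and apply_substitutions are
     hypothetical names.

```c
#include <assert.h>
#include <string.h>

/* One hypothetical "substitute from with to" rule. */
struct subst {
    const char *from;
    const char *to;
};

/* Copy in to out (capacity cap), replacing text that matches a rule.
   At each position the first matching rule in table order wins.
   Returns 0 on success or -1 if out is too small. */
static int apply_substitutions(const char *in, const struct subst *rules,
                               size_t nrules, char *out, size_t cap)
{
    size_t n = 0;

    assert(in != NULL && out != NULL && cap > 0);
    while (*in != '\0') {
        const char *rep = NULL;
        size_t skip = 1, rlen, i;

        for (i = 0; i < nrules; i++) {
            size_t flen = strlen(rules[i].from);

            if (flen > 0 && strncmp(in, rules[i].from, flen) == 0) {
                rep = rules[i].to;
                skip = flen;
                break;
            }
        }
        rlen = rep != NULL ? strlen(rep) : 1;
        if (n + rlen >= cap)
            return -1;                 /* leave room for the NUL */
        if (rep != NULL)
            memcpy(out + n, rep, rlen);
        else
            out[n] = *in;              /* no rule matched: copy as is */
        n += rlen;
        in += skip;
    }
    out[n] = '\0';
    return 0;
}
```

     With rules for the month names, "Feb 3" becomes "02 3" before
     weighting, so dates compare numerically by month.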

     The substitute statement is optional. The order is and codeset state-
     ments must appear in the specification file.
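     The symbol notations and the ". . ." shorthand accepted in an order is
     statement can be sketched in C. This sketch handles single-byte symbols
     only (a two-byte symbol such as ch makes it fail), treats any token of
     two or more characters made only of dots and spaces as the shorthand,
     and uses hypothetical function names; it is not colltbl's own parser.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Parse one single-byte symbol written as itself ("a"), in octal
   ("\141" or "0141") or in hexadecimal ("\x61" or "0x61").
   Returns the byte value, or -1 if the token is not understood. */
static int parse_symbol(const char *tok)
{
    const char *digits;
    char *end;
    long v;
    int base;

    if (strlen(tok) == 1)
        return (unsigned char)tok[0];          /* the symbol itself */
    if ((tok[0] == '\\' || tok[0] == '0') && tok[1] == 'x') {
        digits = tok + 2;                      /* \x61 or 0x61 */
        base = 16;
    } else if (tok[0] == '\\' || tok[0] == '0') {
        digits = tok + 1;                      /* \141 or 0141 */
        base = 8;
    } else {
        return -1;                             /* e.g. two-byte "ch" */
    }
    v = strtol(digits, &end, base);
    if (end == digits || *end != '\0' || v < 0 || v > 255)
        return -1;
    return (int)v;
}

/* Expand an orderlist such as "a;b; . . . ;e" into out[], one byte
   value per entry. Returns the number of entries, or -1 if a token
   cannot be parsed. Input beyond 255 bytes is truncated. */
static int expand_orderlist(const char *list, int *out, size_t max)
{
    char buf[256], *tok, *end;
    size_t n = 0;
    int pending_range = 0, v, c;

    assert(list != NULL && out != NULL);
    strncpy(buf, list, sizeof buf - 1);
    buf[sizeof buf - 1] = '\0';
    for (tok = strtok(buf, ";"); tok != NULL; tok = strtok(NULL, ";")) {
        while (*tok == ' ')                    /* trim surrounding blanks */
            tok++;
        end = tok + strlen(tok);
        while (end > tok && end[-1] == ' ')
            *--end = '\0';
        if (strlen(tok) > 1 && strspn(tok, ". ") == strlen(tok)) {
            pending_range = 1;                 /* the ". . ." shorthand */
            continue;
        }
        if ((v = parse_symbol(tok)) < 0)
            return -1;
        if (pending_range && n > 0)            /* fill in the skipped run */
            for (c = out[n - 1] + 1; c < v && n < max; c++)
                out[n++] = c;
        pending_range = 0;
        if (n < max)
            out[n++] = v;
    }
    return (int)n;
}
```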

     Any lines in the specification file with a # in the first column are
     treated as comments and are ignored. Blank lines are also ignored.
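     The line-level rules (comments, blank lines and backslash continuation)
     can be sketched as a small C preprocessing step. The names is_ignored
     and add_line are hypothetical, lines containing only blanks or tabs are
     assumed to count as blank, and the continuation rule follows the
     description of the order is statement: the backslash must be the last
     character on the line.

```c
#include <assert.h>
#include <string.h>

/* Nonzero if a physical line is a comment (# in the first column)
   or blank (assumed: nothing but spaces, tabs or a newline). */
static int is_ignored(const char *line)
{
    if (line[0] == '#')
        return 1;
    return line[strspn(line, " \t\n")] == '\0';
}

/* Append one physical line to the statement being built in acc
   (capacity cap). Returns 0 if the line ended with a backslash and
   the statement continues, 1 if acc now holds a complete statement,
   or -1 if acc is too small. */
static int add_line(char *acc, size_t cap, const char *line)
{
    size_t len = strcspn(line, "\n");          /* drop the newline */
    int continued = len > 0 && line[len - 1] == '\\';
    size_t keep = continued ? len - 1 : len;   /* drop the backslash */
    size_t used;

    assert(acc != NULL && cap > 0);
    used = strlen(acc);
    if (used + keep >= cap)
        return -1;
    memcpy(acc + used, line, keep);
    acc[used + keep] = '\0';
    return continued ? 0 : 1;
}
```

     A reader would call is_ignored() on each line first, pass the remaining
     lines to add_line(), and hand acc to the statement parser (then reset
     it to an empty string) whenever add_line() returns 1.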

EXAMPLE
     The following example shows the collation specification required to
     support a hypothetical telephone book sorting sequence.

     The sorting sequence is defined by the following rules:

     a) Uppercase and lowercase letters must be sorted together, but upper-
        case letters have precedence over lowercase letters.

     b) All special characters and punctuation should be ignored.

     c) Digits must be sorted as their alphabetic counterparts (e.g. 0 as
        zero, 1 as one).

     d) The Ch, ch, CH combinations must be collated between C and D.

     e) V and W, v and w must be collated together.

     The input specification file to colltbl will contain:

               codeset telephone

               order is A;a;B;b;C;c;CH;Ch;ch;D;d;E;e;F;f;\
                        G;g;H;h;I;i;J;j;K;k;L;l;M;m;N;n;O;o;P;p;\
                        Q;q;R;r;S;s;T;t;U;u;{V;W};{v;w};X;x;Y;y;Z;z

               substitute "0" with "zero"
               substitute "1" with "one"
               substitute "2" with "two"
               substitute "3" with "three"
               substitute "4" with "four"
               substitute "5" with "five"
               substitute "6" with "six"
               substitute "7" with "seven"
               substitute "8" with "eight"
               substitute "9" with "nine"
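     To see how these statements realize rules a) to e), the following C
     sketch hard-codes the same table as integer weights. It is a model for
     illustration, not colltbl output: it assumes ASCII letters, assumes
     that substitutions are applied before the two-character symbols are
     matched, compares only the resulting weight sequences, and uses
     hypothetical names throughout.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Weight of the collating element starting at s under the telephone
   order list; *len receives its length in bytes. Weight 0 means the
   character is not in the list and is ignored. */
static int weight_of(const char *s, size_t *len)
{
    static const char *const pairs[] = { "CH", "Ch", "ch" };
    unsigned char c = (unsigned char)s[0];
    int i, idx, lower, w;

    for (i = 0; i < 3; i++) {
        if (strncmp(s, pairs[i], 2) == 0) {
            *len = 2;
            return 7 + i;                    /* CH=7, Ch=8, ch=9 */
        }
    }
    *len = 1;
    if (c >= 'A' && c <= 'Z') {
        idx = c - 'A';
        lower = 0;
    } else if (c >= 'a' && c <= 'z') {
        idx = c - 'a';
        lower = 1;
    } else {
        return 0;                            /* not in the list: ignored */
    }
    if (idx == 'W' - 'A')
        idx = 'V' - 'A';                     /* {V;W} and {v;w} share weights */
    w = idx * 2 + 1 + lower;                 /* A=1, a=2, ..., C=5, c=6 */
    if (idx >= 'D' - 'A')
        w += 3;                              /* D and later follow CH;Ch;ch */
    return w;
}

/* Apply the digit substitutions ("0" -> "zero", ..., "9" -> "nine").
   Output longer than cap - 1 bytes is silently truncated. */
static void substitute_digits(const char *in, char *out, size_t cap)
{
    static const char *const words[10] = { "zero", "one", "two", "three",
        "four", "five", "six", "seven", "eight", "nine" };
    size_t n = 0, l;

    for (; *in != '\0' && n + 6 < cap; in++) {   /* longest word is 5 */
        if (isdigit((unsigned char)*in)) {
            l = strlen(words[*in - '0']);
            memcpy(out + n, words[*in - '0'], l);
            n += l;
        } else {
            out[n++] = *in;
        }
    }
    out[n] = '\0';
}

/* Build the sequence of nonzero weights for s. */
static size_t build_key(const char *s, int *key, size_t max)
{
    char buf[256];
    const char *p;
    size_t n = 0, len;
    int w;

    substitute_digits(s, buf, sizeof buf);
    for (p = buf; *p != '\0' && n < max; p += len)
        if ((w = weight_of(p, &len)) != 0)
            key[n++] = w;
    return n;
}

/* Compare two names the way the telephone table would order them. */
static int phone_cmp(const char *a, const char *b)
{
    int ka[256], kb[256];
    size_t na, nb, i;

    assert(a != NULL && b != NULL);
    na = build_key(a, ka, 256);
    nb = build_key(b, kb, 256);
    for (i = 0; i < na && i < nb; i++)
        if (ka[i] != kb[i])
            return ka[i] < kb[i] ? -1 : 1;
    return (na > nb) - (na < nb);
}
```

     Under this model every name beginning with C or c sorts before every
     name beginning with CH, Ch or ch, punctuation such as the apostrophe
     in O'Neil has no effect, and Vaal and Waal compare equal.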

LOCALE
     The LC_MESSAGES environment variable governs the language in which
     message texts are displayed.

     The LC_CTYPE environment variable governs character classes, character
     conversion (shifting), and the behavior of character classes in
     regular expressions.

     If LC_MESSAGES or LC_CTYPE is undefined or is defined as the null
     string, it defaults to the value of LANG. If LANG is likewise
     undefined or null, the system acts as if it were not internationalized.

     If any of the locale variables has an invalid value, the system acts
     as if none of the variables was set.

     The LC_ALL environment variable governs the entire locale. LC_ALL
     takes precedence over all the other environment variables which affect
     internationalization.
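     The precedence rules above can be restated as a small C function. This
     is a sketch of the documented lookup order only: governing_value and
     governing_from_env are hypothetical names, "C" stands for the
     non-internationalized behavior, and the rule for invalid values is not
     modeled.

```c
#include <assert.h>
#include <stdlib.h>

/* Return the value that governs one locale category: LC_ALL if set
   and non-empty, else the category's own variable (for example
   LC_MESSAGES or LC_CTYPE), else LANG, else "C". */
static const char *governing_value(const char *lc_all, const char *lc_cat,
                                   const char *lang)
{
    if (lc_all != NULL && *lc_all != '\0')
        return lc_all;
    if (lc_cat != NULL && *lc_cat != '\0')
        return lc_cat;
    if (lang != NULL && *lang != '\0')
        return lang;
    return "C";
}

/* Same lookup, reading the real environment. */
static const char *governing_from_env(const char *category_var)
{
    assert(category_var != NULL);
    return governing_value(getenv("LC_ALL"), getenv(category_var),
                           getenv("LANG"));
}
```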

FILES
     /lib/locale/locale/LC_COLLATE
          LC_COLLATE database for locale

     /usr/lib/locale/C/colltbl_C
          input file used to construct LC_COLLATE in the default locale

SEE ALSO
     memory(3C), setlocale(3C), strcoll(3C), string(3C), strxfrm(3C),
     environ(5).

Typewritten Software • bear@typewritten.org • Edmonds, WA 98026