colltbl(1M)                    DG/UX R4.11MU05                   colltbl(1M)


NAME
       colltbl - create collation database

SYNOPSIS
       colltbl [ file | - ]
       colltbl -d [ file ]

DESCRIPTION
       The colltbl command without the -d option takes as input a
       specification file, file, that describes the collating sequence for a
       particular language and creates a database that can be read by
       strxfrm(3C) and strcoll(3C).  strxfrm(3C) transforms its second
       argument and places the result in its first argument. The
       transformed string is such that it can be correctly ordered with
       other transformed strings by using strcmp(3C), strncmp(3C) or
       memcmp(3C).  strcoll(3C) transforms its arguments and does a
       comparison.
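
       As a rough illustration of the relationship described above, the
       following C sketch transforms two strings with strxfrm(3C) and
       compares the results with strcmp(3C); the outcome should agree in
       sign with strcoll(3C) under whatever LC_COLLATE table is in effect.
       The helper names xfrm_dup and compare_transformed are hypothetical,
       and the length query strxfrm(NULL, s, 0) assumes an ISO C conforming
       library.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a malloc'd transformed copy of s,
 * or NULL if memory cannot be allocated.  The caller frees it. */
char *xfrm_dup(const char *s)
{
    assert(s != NULL);
    /* With a zero size, strxfrm only reports the length it needs. */
    size_t need = strxfrm(NULL, s, 0) + 1;
    char *out = malloc(need);
    if (out != NULL)
        strxfrm(out, s, need);
    return out;
}

/* Hypothetical helper: order a and b by comparing their transformed
 * forms with strcmp.  Falls back to strcoll if allocation fails. */
int compare_transformed(const char *a, const char *b)
{
    char *ta = xfrm_dup(a);
    char *tb = xfrm_dup(b);
    int r = (ta != NULL && tb != NULL) ? strcmp(ta, tb) : strcoll(a, b);
    free(ta);
    free(tb);
    return r;
}
```

       Transforming once and comparing many times tends to pay off when the
       same strings are compared repeatedly, for example as sort keys; for
       a single comparison strcoll(3C) is simpler.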

       If no input file is supplied, stdin is read.

       The output file produced contains the database with collating
       sequence information in a form usable by system commands and
       routines.  The name of this output file is the value assigned to the
       codeset keyword in file.  Before this file can be used, it must be
       installed in the /usr/lib/locale/locale directory under the name
       LC_COLLATE by a user with appropriate privilege or by a member of
       group bin.  locale corresponds to the language area whose collation
       sequence is described in file.  The installed file must be readable
       by user, group, and other; no other permissions should be set.  To
       use the collating sequence information in this file, set the
       LC_COLLATE or LANG environment variable appropriately (see
       environ(5) or setlocale(3C)).

       With the -d option, colltbl dumps to its standard output a text
       version of the LC_COLLATE collation table in file file.  If no input
       file is specified, the collation table in use for the current locale
       is dumped.  You can modify the resulting text file, and use it as
       input to colltbl, to produce a modified LC_COLLATE collation table
       file.  This file may be used to either replace the existing
       LC_COLLATE file in an existing locale, or to create a new locale.
       However, you must never modify any of the files (including
       LC_COLLATE) in /usr/lib/locale/C, the C locale.

       The colltbl command can support languages whose collating sequence
       can be completely described by the following cases:

       ·   Ordering of single characters within the codeset.  For example,
           in Swedish, V is sorted after U, before X and with W (V and W are
           considered identical as far as sorting is concerned).

       ·   Ordering of "double characters" in the collation sequence.  For
           example, in Spanish, ch and ll are collated after c and l,
           respectively.

       ·   Ordering of a single character as if it consists of two
           characters.  For example, in German, the "sharp s", ß, is sorted
           as ss.  This is a special instance of the next case below.

       ·   Substitution of one character string with another character
           string.  In the example above, the string ß is replaced with ss
           during sorting.

       ·   Ignoring certain characters in the codeset during collation.  For
           example, if - were ignored during collation, then the strings
           re-locate and relocate would be equal.

       ·   Secondary ordering between characters.  In the case where two
           characters are sorted together in the collation sequence, (i.e.,
           they have the same "primary" ordering), there is sometimes a
           secondary ordering that is used if two strings are identical
           except for characters that have the same primary ordering.  For
           example, in French, the letters e and è have the same primary
           ordering but e comes before è in the secondary ordering.  Thus
           the word lever would be ordered before lèver, but lèver would be
           sorted before levitate.  (Note that if e came before è in the
           primary ordering, then lèver would be sorted after levitate.)

       The specification file consists of three types of statements:

       1.  codeset   filename

           filename is the name of the output file to be created by colltbl.

       2.  order is  orderlist

           orderlist is a list of symbols, separated by semicolons, that
           defines the collating sequence.  The special symbol, ...,
           specifies symbols that are lexically sequential in a short-hand
           form.  For example,
                order is  a;b;c;d;...;x;y;z

           would specify the list of lower-case letters. Of course, this
           could be further compressed to just a;...;z.

           A symbol can be up to two bytes in length and can be represented
           in any one of the following ways:

           ·   the symbol itself (e.g., a for the lower-case letter a),

           ·   in octal representation (e.g., \141 or 0141 for the letter
               a), or

           ·   in hexadecimal representation (e.g., \x61 or 0x61 for the
               letter a).

           Any combination of these may be used as well.

           The backslash character, \ , is used for continuation.  No
           characters are permitted after the backslash character.

           Symbols enclosed in parentheses are assigned the same primary
           ordering but different secondary ordering.  Symbols enclosed in
           curly brackets are assigned only the same primary ordering.  For
           example,

                order is  a;b;c;ch;d;(e;è);f;...;z;\
                          {1;...;9};A;...;Z

           In the above example, e and è are assigned the same primary
           ordering and different secondary orderings, while the digits 1
           through 9 are assigned the same primary ordering and no
           secondary ordering.  Only primary ordering is assigned to the
           remaining symbols.
           Notice how double letters can be specified in the collating
           sequence (letter ch comes between c and d).

           If a character is not included in the order is statement it is
           excluded from the ordering and will be ignored during sorting.

       3.  substitute string with repl

           The substitute statement substitutes the string string with the
           string repl.  This can be used, for example, to provide rules to
           sort the abbreviated month names numerically:

                  substitute "Jan" with "01"
                  substitute "Feb" with "02"
                       .
                       .
                       .
                  substitute "Dec" with "12"


           A simpler use of the substitute statement, mentioned above, is to
           substitute a single character with two characters, as with the
           substitution of ß with ss in German.
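
       The effect of the substitute statement can be approximated in C as a
       rewrite pass applied to a string before it is ordered.  The sketch
       below uses the month-name rules from the example above.  The struct
       and function names are hypothetical, and the matching policy
       (leftmost position, first matching rule, no rescanning of the
       replacement) is an assumption; this page does not specify how
       colltbl resolves overlapping rules.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct subst { const char *from; const char *to; };

/* Rules corresponding to: substitute "Jan" with "01", and so on. */
static const struct subst month_rules[] = {
    { "Jan", "01" }, { "Feb", "02" }, { "Mar", "03" }, { "Apr", "04" },
    { "May", "05" }, { "Jun", "06" }, { "Jul", "07" }, { "Aug", "08" },
    { "Sep", "09" }, { "Oct", "10" }, { "Nov", "11" }, { "Dec", "12" },
};

/* Copy in to out, replacing each occurrence of a rule's from-string
 * with its to-string.  Returns 0 on success, -1 if out is too small. */
int apply_substitutions(const char *in, char *out, size_t outsz,
                        const struct subst *rules, size_t nrules)
{
    size_t used = 0;

    assert(in != NULL && out != NULL);
    if (outsz == 0)
        return -1;
    while (*in != '\0') {
        const char *piece = in;     /* default: copy one byte unchanged */
        size_t piece_len = 1, advance = 1;

        for (size_t i = 0; i < nrules; i++) {
            size_t flen = strlen(rules[i].from);
            if (flen > 0 && strncmp(in, rules[i].from, flen) == 0) {
                piece = rules[i].to;
                piece_len = strlen(rules[i].to);
                advance = flen;
                break;              /* assumed policy: first rule wins */
            }
        }
        if (used + piece_len + 1 > outsz)
            return -1;              /* no room for piece plus the NUL */
        memcpy(out + used, piece, piece_len);
        used += piece_len;
        in += advance;
    }
    out[used] = '\0';
    return 0;
}
```

       After this pass the rewritten strings compare in calendar order,
       which is the point of the month-name rules.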

       The substitute statement is optional.  The order is and codeset
       statements must appear in the specification file.

       Any lines in the specification file with a # in the first column are
       treated as comments and are ignored.  Empty lines are also ignored.
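
       Returning to the order is statement, the following C sketch expands
       a simplified orderlist into a table of weights.  It handles only
       single-byte symbols written as themselves and the ... shorthand; it
       does not handle octal or hexadecimal forms, two-byte symbols,
       parentheses, curly brackets, or continuation lines.  The function
       name is hypothetical, "lexically sequential" is assumed to mean
       consecutive byte values, and using weight 0 to mark a character that
       is absent from the list (and therefore ignored) is a convention of
       this sketch, not the format of the real database.

```c
#include <assert.h>
#include <string.h>

/* Fill weight[256] from a simplified orderlist such as "a;...;z".
 * Listed symbols get weights 1, 2, 3, ... in order of appearance;
 * unlisted bytes keep weight 0, meaning "ignored during sorting".
 * Returns the number of symbols ordered, or -1 if the list is bad. */
int build_weights(const char *orderlist, int weight[256])
{
    int next = 1;            /* next weight to hand out */
    int prev = -1;           /* most recent symbol, -1 if none yet */
    int pending_range = 0;   /* set after "..." until the end symbol */
    const char *p = orderlist;

    assert(orderlist != NULL && weight != NULL);
    memset(weight, 0, 256 * sizeof weight[0]);

    while (*p != '\0') {
        if (strncmp(p, "...", 3) == 0) {
            if (prev < 0)
                return -1;   /* a range needs a starting symbol */
            pending_range = 1;
            p += 3;
        } else {
            int c = (unsigned char)*p++;
            if (pending_range) {
                if (c <= prev)
                    return -1;   /* range must run upward */
                for (int k = prev + 1; k < c; k++)
                    weight[k] = next++;
                pending_range = 0;
            }
            weight[c] = next++;
            prev = c;
        }
        if (*p == ';')
            p++;             /* separator between symbols */
        else if (*p != '\0')
            return -1;       /* multi-byte symbols are not handled */
    }
    return pending_range ? -1 : next - 1;
}
```

       Both forms of the lower-case list shown earlier, a;b;c;d;...;x;y;z
       and a;...;z, produce the same 26 weights under these assumptions.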

EXAMPLE
       The following example shows the collation specification required to
       support a hypothetical telephone book sorting sequence.

       The sorting sequence is defined by the following rules:

       a.     Upper and lower case letters must be sorted together, but
              upper case letters have precedence over lower case letters.

       b.     All special characters and punctuation should be ignored.

       c.     Digits must be sorted as their alphabetic counterparts (e.g.,
              0 as zero, 1 as one).

       d.     The Ch, ch, CH combinations must be collated between C and D.

       e.     V and W, v and w must be collated together.

       The input specification file to colltbl will contain:

                   codeset   telephone

                   order is  A;a;B;b;C;c;CH;Ch;ch;D;d;E;e;F;f;\
                             G;g;H;h;I;i;J;j;K;k;L;l;M;m;N;n;O;o;P;p;\
                             Q;q;R;r;S;s;T;t;U;u;{V;W};{v;w};X;x;Y;y;Z;z

                   substitute "0" with "zero"
                   substitute "1" with "one"
                   substitute "2" with "two"
                   substitute "3" with "three"
                   substitute "4" with "four"
                   substitute "5" with "five"
                   substitute "6" with "six"
                   substitute "7" with "seven"
                   substitute "8" with "eight"
                   substitute "9" with "nine"
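
       Once a table like this has been built and installed as LC_COLLATE
       for some locale, a program would normally pick it up through
       setlocale(3C) and strcoll(3C) rather than reading it directly.  The
       C sketch below sorts an array of names that way.  The function names
       are hypothetical, and which locale (if any) carries the telephone
       table on a given system is an assumption left to the environment.

```c
#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* qsort comparator: order two strings by the current LC_COLLATE. */
static int by_collation(const void *pa, const void *pb)
{
    const char *a = *(const char *const *)pa;
    const char *b = *(const char *const *)pb;
    return strcoll(a, b);
}

/* Hypothetical helper: take the collation locale from the environment
 * (LC_COLLATE or LANG).  Returns 0 on success, -1 on failure. */
int use_environment_collation(void)
{
    return setlocale(LC_COLLATE, "") != NULL ? 0 : -1;
}

/* Hypothetical helper: sort n names in place using strcoll. */
void sort_names(const char **names, size_t n)
{
    assert(names != NULL);
    qsort(names, n, sizeof names[0], by_collation);
}
```

       Under the rules above one would expect a name beginning with Ch to
       sort after every name beginning with C followed by another letter,
       since Ch collates between C and D; in the C locale the same call
       simply orders by byte value.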

FILES
       /lib/locale/locale/LC_COLLATE
                       LC_COLLATE database for locale

       /usr/lib/locale/C/colltbl_C
                       input file used to construct LC_COLLATE in the
                       default locale.

DIAGNOSTICS
       Exit code is 0 on successful completion, and >0 if an error occurs.

SEE ALSO
       memory(3C), setlocale(3C), strcoll(3C), string(3C), strxfrm(3C),
       regexpr(3G), environ(5).


Licensed material--property of copyright holder(s)
