⇒ colltbl(1a) — NEWS-os 5.0.1



colltbl(1M)      SYSTEM ADMINISTRATION COMMANDS       colltbl(1M)



NAME
     colltbl - create collation database

SYNOPSIS
     colltbl [ file | - ]

DESCRIPTION
     The colltbl command takes as input a specification file,
     file, that describes the collating sequence for a particular
     language and creates a database that can be read by
     strxfrm(3C) and strcoll(3C).  strxfrm(3C) transforms its
     second argument and places the result in its first argument.
     The transformed string is such that it can be correctly
     ordered with other transformed strings by using strcmp(3C),
     strncmp(3C) or memcmp(3C).  strcoll(3C) transforms its
     arguments and does a comparison.

     If no input file is supplied, stdin is read.  The output
     file produced contains the database with collating sequence
     information in a form usable by system commands and
     routines.  The name of this output file is the value you
     assign to the keyword codeset read in from file.

     Before this file can be used, it must be installed in the
     /usr/lib/locale/locale directory with the name LC_COLLATE by
     someone who is super-user or a member of group bin.  locale
     corresponds to the language area whose collation sequence is
     described in file.  This file must be readable by user,
     group, and other; no other permissions should be set.  To
     use the collating sequence information in this file, set the
     LC_COLLATE environment variable appropriately (see
     environ(5) or setlocale(3C)).

     The colltbl command can support languages whose collating
     sequence can be completely described by the following cases:

     ⊕   Ordering of single characters within the  codeset.   For
         example,  in  Swedish, V is sorted after U, before X and
         with W (V and W are considered identical as far as sort-
         ing is concerned).

     ⊕   Ordering  of  "double  characters"  in   the   collation
         sequence.   For  example, in Spanish, ch and ll are col-
         lated after c and l, respectively.

     ⊕   Ordering of a single character as if it consists of two
         characters.  For example, in German, the "sharp s", ß,
         is sorted as ss.  This is a special instance of the next
         case below.

     ⊕   Substitution of one character string with another
         character string.  In the example above, the string ß is
         replaced with ss during sorting.

     ⊕   Ignoring certain characters in the codeset during
         collation.  For example, if - were ignored during
         collation, then the strings re-locate and relocate would
         be equal.

     ⊕   Secondary ordering between characters.  In the case
         where two characters are sorted together in the collation
         sequence (i.e., they have the same "primary" ordering),
         there is sometimes a secondary ordering that is used if
         two strings are identical except for characters that have
         the same primary ordering.  For example, in French, the
         letters e and è have the same primary ordering but e comes
         before è in the secondary ordering.  Thus the word lever
         would be ordered before lèver, but lèver would be sorted
         before levitate.  (Note that if e came before è in the
         primary ordering, then lèver would be sorted after
         levitate.)
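
     The primary and secondary ordering described in the last case
     can be sketched as a two-level sort key.  This is only an
     illustration, not the format of the database that colltbl
     writes: the names PRIMARY, SECONDARY, BASE and collation_key
     are hypothetical, and the weights are chosen just for this
     example.

```python
# Hypothetical weights for illustration only.
# Primary weight: position in the alphabet; e and è share one weight.
PRIMARY = {ch: i for i, ch in enumerate("abcdefghijklmnopqrstuvwxyz")}
BASE = {"è": "e"}          # è takes the primary weight of e
SECONDARY = {"è": 1}       # e (default 0) comes before è (1)


def collation_key(word):
    """Return (primary weights, secondary weights) for a word."""
    primary = tuple(PRIMARY[BASE.get(ch, ch)] for ch in word)
    secondary = tuple(SECONDARY.get(ch, 0) for ch in word)
    # Secondary weights only matter when the primary weights tie.
    return (primary, secondary)


words = ["levitate", "lèver", "lever"]

# Two-level ordering: lever, lèver, levitate
print(sorted(words, key=collation_key))

# Plain code-point ordering puts è after every ASCII letter,
# so lèver lands after levitate: lever, levitate, lèver
print(sorted(words))
```

     The second call mirrors the parenthetical note above: when è
     simply follows e in a single-level ordering, lèver sorts
     after levitate.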

     The specification file consists of  three  types  of  state-
     ments:

     1.  codeset   filename

         filename is the name of the output file to be created by
         colltbl.

     2.  order is  order_list

         order_list is a list  of  symbols,  separated  by  semi-
         colons,  that  defines the collating sequence.  The spe-
         cial symbol, ..., specifies symbols that  are  lexically
         sequential in a short-hand form.  For example,
              order is  a;b;c;d;...;x;y;z

         would specify the list of lower-case letters.  Of course,
         this could be further compressed to just a;...;z.

         A symbol can be up to two bytes in  length  and  can  be
         represented in any one of the following ways:

         ⊕   the symbol itself (e.g., a for the lower-case letter
             a),

         ⊕   in octal representation (e.g., \141 or 0141 for  the
             letter a), or

         ⊕   in hexadecimal representation (e.g.,  \x61  or  0x61
             for the letter a).

         Any combination of these may be used as well.

         The backslash character, \, is used for continuation.
         No characters are permitted after the backslash
         character.

         Symbols enclosed in parentheses are assigned the same
         primary ordering but different secondary ordering.
         Symbols enclosed in curly brackets are assigned the same
         primary ordering and no secondary ordering.  For example,


              order is  a;b;c;ch;d;(e;è);f;...;z;\
                        {1;...;9};A;...;Z

         In the above example, e and è are assigned the same
         primary ordering and different secondary ordering; digits
         1 through 9 are assigned the same primary ordering and no
         secondary ordering.  Only primary ordering is assigned
         to the remaining symbols.  Notice how double letters can
         be specified in the collating sequence (letter ch comes
         between c and d).

         If a character is not included in the order is
         statement, it is excluded from the ordering and will be
         ignored during sorting.

     3.  substitute string with repl

         The substitute statement substitutes the  string  string
         with the string repl.  This can be used, for example, to
         provide rules to sort the abbreviated month names numer-
         ically:


              substitute "Jan" with "01"
              substitute "Feb" with "02"
                   .
                   .
                   .
              substitute "Dec" with "12"

         A simpler use of the substitute statement, mentioned
         above, is to substitute a single character with two
         characters, as with the substitution of ß with ss in
         German.
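
         The effect of substitute on sorting can be sketched as a
         key function that rewrites each string before it is
         compared.  This is an illustration only: the names RULES
         and apply_substitutions are hypothetical, only the three
         month rules spelled out above are included, and the
         left-to-right, first-match scan is an assumption, since
         this page does not say how colltbl resolves overlapping
         rules.

```python
# Only the rules written out in the example above; the elided ones are omitted.
RULES = [("Jan", "01"), ("Feb", "02"), ("Dec", "12")]


def apply_substitutions(text, rules):
    """Scan left to right, replacing the first rule that matches at each position."""
    out = []
    i = 0
    while i < len(text):
        for old, new in rules:
            if text.startswith(old, i):
                out.append(new)
                i += len(old)
                break
        else:
            # No rule matched here: copy the character unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)


dates = ["Feb 03", "Dec 25", "Jan 14"]

# Without substitution the months sort alphabetically.
print(sorted(dates))

# With substitution they sort in calendar order.
print(sorted(dates, key=lambda s: apply_substitutions(s, RULES)))
```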

     The substitute statement is  optional.   The  order  is  and
     codeset statements must appear in the specification file.

     Any lines in the specification file with a #  in  the  first
     column are treated as comments and are ignored.  Empty lines
     are also ignored.
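
     The rules above for the specification file can be sketched
     as a small reader.  This is not how colltbl itself is
     implemented; the function names and the sample text are
     hypothetical.  The sketch assumes that ... only appears
     between single-character symbols, treats a token such as
     0141 or 0x61 as a numeric form only when it has at least one
     digit after the prefix, and does not handle the ( ) and { }
     groupings.

```python
import re


def logical_lines(text):
    """Yield statements: skip '#' comments and empty lines, join '\\' continuations."""
    pending = ""
    for raw in text.splitlines():
        if raw.startswith("#") or not raw.strip():
            continue
        if raw.endswith("\\"):
            # Continuation: nothing may follow the backslash.
            pending += raw[:-1].strip()
            continue
        yield pending + raw.strip()
        pending = ""


def decode_symbol(token):
    """Accept a literal symbol, an octal form, or a hexadecimal form."""
    if re.fullmatch(r"(\\x|0x)[0-9A-Fa-f]+", token):
        return chr(int(token[2:], 16))
    if re.fullmatch(r"(\\|0)[0-7]+", token):
        return chr(int(token[1:], 8))
    return token


def expand_order(order_list):
    """Split on ';' and expand '...' into the characters between its neighbours."""
    tokens = [t.strip() for t in order_list.split(";")]
    symbols = []
    for i, tok in enumerate(tokens):
        if tok == "...":
            lo = ord(symbols[-1])
            hi = ord(decode_symbol(tokens[i + 1]))
            symbols.extend(chr(c) for c in range(lo + 1, hi))
        else:
            symbols.append(decode_symbol(tok))
    return symbols


def parse_spec(text):
    """Return the codeset name, expanded order list, and substitute rules."""
    spec = {"codeset": None, "order": None, "substitute": []}
    for line in logical_lines(text):
        if m := re.fullmatch(r"codeset\s+(\S+)", line):
            spec["codeset"] = m.group(1)
        elif m := re.fullmatch(r"order\s+is\s+(.+)", line):
            spec["order"] = expand_order(m.group(1))
        elif m := re.fullmatch(r'substitute\s+"(.*)"\s+with\s+"(.*)"', line):
            spec["substitute"].append((m.group(1), m.group(2)))
        else:
            raise ValueError(f"unrecognized statement: {line!r}")
    if spec["codeset"] is None or spec["order"] is None:
        raise ValueError("codeset and order is statements are required")
    return spec


# Hypothetical sample, not taken from a real locale.
EXAMPLE = r"""
# comment lines and empty lines are skipped
codeset   demo

order is  \141;...;0x63;\
          x;y;z
substitute "Jan" with "01"
"""

print(parse_spec(EXAMPLE))
```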

EXAMPLE
     The following example shows the collation specification
     required to support a hypothetical telephone book sorting
     sequence.

     The sorting sequence is defined by the following rules:

     a.   Upper and lower case letters must be  sorted  together,
          but  upper case letters have precedence over lower case
          letters.

     b.   All  special  characters  and  punctuation  should   be
          ignored.

     c.   Digits must be sorted as their alphabetic  counterparts
          (e.g., 0 as zero, 1 as one).

     d.   The Ch, ch, CH combinations must be collated between  C
          and D.

     e.   V and W, v and w must be collated together.

     The input specification file to colltbl will contain:


               codeset   telephone

               order is  A;a;B;b;C;c;CH;Ch;ch;D;d;E;e;F;f;\
                         G;g;H;h;I;i;J;j;K;k;L;l;M;m;N;n;O;o;P;p;\
                         Q;q;R;r;S;s;T;t;U;u;{V;W};{v;w};X;x;Y;y;Z;z

               substitute "0" with "zero"
               substitute "1" with "one"
               substitute "2" with "two"
               substitute "3" with "three"
               substitute "4" with "four"
               substitute "5" with "five"
               substitute "6" with "six"
               substitute "7" with "seven"
               substitute "8" with "eight"
               substitute "9" with "nine"
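
     The effect of this specification on sorting can be
     approximated with a key function.  This is a sketch under
     stated assumptions, not the behavior of the installed
     database: build_weights and telephone_key are hypothetical
     names, each symbol in the order list gets its own weight in
     list order, the symbols inside { } share one weight, and
     anything not in the order list is skipped.

```python
import string

DIGIT_WORDS = ["zero", "one", "two", "three", "four",
               "five", "six", "seven", "eight", "nine"]


def build_weights():
    """Assign weights in the order A;a;B;b;C;c;CH;Ch;ch;D;d;...;{V;W};{v;w};...;Z;z."""
    weights = {}
    rank = 0
    for upper in string.ascii_uppercase:
        lower = upper.lower()
        if upper == "W":
            # {V;W} and {v;w}: W and w reuse the weights of V and v.
            weights["W"] = weights["V"]
            weights["w"] = weights["v"]
            continue
        weights[upper] = rank
        weights[lower] = rank + 1
        rank += 2
        if upper == "C":
            # The double characters come between C/c and D/d.
            for digraph in ("CH", "Ch", "ch"):
                weights[digraph] = rank
                rank += 1
    return weights


WEIGHTS = build_weights()


def telephone_key(name):
    """Substitute digits, then map symbols to weights, skipping unlisted characters."""
    for digit, word in enumerate(DIGIT_WORDS):
        name = name.replace(str(digit), word)
    key = []
    i = 0
    while i < len(name):
        pair = name[i:i + 2]
        if len(pair) == 2 and pair in WEIGHTS:
            key.append(WEIGHTS[pair])
            i += 2
        elif name[i] in WEIGHTS:
            key.append(WEIGHTS[name[i]])
            i += 1
        else:
            i += 1  # punctuation, spaces, and other unlisted characters
    return key


# Ch collates after C and before D, so Cruz precedes Chavez.
print(sorted(["Davis", "Chavez", "Cruz"], key=telephone_key))
# → ['Cruz', 'Chavez', 'Davis']
```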

FILES
     /lib/locale/locale/LC_COLLATE
                     LC_COLLATE database for locale

     /usr/lib/locale/C/colltbl_C
                     input file used to construct LC_COLLATE in
                     the default locale.

SEE ALSO
     memory(3C),    setlocale(3C),    strcoll(3C),    string(3C),
     strxfrm(3C),   environ(5)   in  the  Programmer's  Reference
     Manual.





