colltbl(1M) colltbl(1M)
NAME
colltbl - create collation database
SYNOPSIS
colltbl [file | -]
DESCRIPTION
The colltbl command takes as input a specification file, file,
that describes the collating sequence for a particular
language and creates a database that can be read by
strxfrm(3C) and strcoll(3C). strxfrm(3C) transforms its first
argument and places the result in its second argument. The
transformed string is such that it can be correctly ordered
with other transformed strings by using strncmp [see
string(3C)]. strcoll(3C) transforms its arguments and does a
comparison.
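The relationship between the two routines can be sketched as follows. This is an illustration only: it uses Python's standard locale module, whose strxfrm and strcoll functions are named after the C library routines, and it selects the "C" locale because that locale is always available. The word list is made up for the example.

```python
import functools
import locale

# Select the "C" collation rules explicitly; this locale always exists.
locale.setlocale(locale.LC_COLLATE, "C")

words = ["relocate", "Zebra", "apple", "re-locate"]

# strxfrm: transform each string once, then compare the transformed strings.
by_xfrm = sorted(words, key=locale.strxfrm)

# strcoll: compare the original strings pairwise under the collation rules.
by_coll = sorted(words, key=functools.cmp_to_key(locale.strcoll))

print(by_xfrm)
print(by_xfrm == by_coll)
```

Both approaches should yield the same ordering. Transforming first is usually preferable when the same strings are compared many times.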
If no input file is supplied, stdin is read.
The output file produced contains the database with collating
sequence information in a form usable by system commands and
routines. The name of this output file is the value you
assign to the keyword codeset read in from file. Before this
file can be used, it must be installed in the
/usr/lib/locale/locale directory with the name LC_COLLATE by
someone who is super-user or a member of group bin. locale
corresponds to the language area whose collation sequence is
described in file. This file must be readable by user, group,
and other; no other permissions should be set. To use the
collating sequence information in this file, set the
LC_COLLATE environment variable appropriately [see environ(5)
or setlocale(3C)].
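The installation step can be sketched as below. The helper name install_collation_db and its parameters are hypothetical. On a real system the locale root would be /usr/lib/locale and the copy would need the privileges described above; the sketch takes the root as an argument so that it can be tried in a scratch directory.

```python
import os
import shutil
import stat


def install_collation_db(db_path, locale_root, locale_name):
    """Copy a colltbl output file to <locale_root>/<locale_name>/LC_COLLATE."""
    target_dir = os.path.join(locale_root, locale_name)
    os.makedirs(target_dir, exist_ok=True)
    target = os.path.join(target_dir, "LC_COLLATE")
    shutil.copyfile(db_path, target)
    # Readable by user, group, and other; no other permissions (mode 0444).
    os.chmod(target, stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH)
    return target
```

After installation, a process would select the database by setting LC_COLLATE to the locale name before calling setlocale(3C).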
The colltbl command can support languages whose collating
sequence can be completely described by the following cases:
Ordering of single characters within the code set. For
example, in Swedish, V is sorted after U, before X, and
with W (V and W are considered identical as far as
sorting is concerned).
Ordering of ``double characters'' in the collation
sequence. For example, in Spanish, ch and ll are
collated after c and l, respectively.
Copyright 1994 Novell, Inc. Page 1
Ordering of a single character as if it consists of two
characters. For example, in German, the ``sharp s,'' ß,
is sorted as ss. This is a special instance of the next
case below.
Substitution of one character string with another
character string. In the example above, the string ß is
replaced with ss during sorting.
Ignoring certain characters in the code set during
collation. For example, if - were ignored during
collation, then the strings re-locate and relocate would
compare as equal.
Secondary ordering between characters. Where two
characters are sorted together in the collation sequence
(that is, they have the same "primary" ordering), there
is sometimes a secondary ordering that is used if two
strings are identical except for characters that have
the same primary ordering. For example, in French, the
letters e and è have the same primary ordering but e
comes before è in the secondary ordering. Thus the word
lever would be ordered before lèver, but lèver would be
sorted before levitate. (Note that if e came before è in
the primary ordering, then lèver would be sorted after
levitate.)
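Three of the cases above (substitution, ignored characters, and secondary ordering) can be modelled with a small sort-key function. This is a toy model for illustration, not the database format that colltbl writes. The tables SUBSTITUTIONS, IGNORED, and WEIGHTS are assumptions made up for the sketch, and double characters are not handled.

```python
# Toy model of the cases above; NOT the colltbl database format.
SUBSTITUTIONS = {"ß": "ss"}   # one string replaced by another
IGNORED = {"-"}               # characters skipped during collation

# (primary, secondary) weights: e and è share a primary weight.
WEIGHTS = {ch: (i, 0) for i, ch in enumerate("abcdefghijklmnopqrstuvwxyz")}
WEIGHTS["è"] = (WEIGHTS["e"][0], 1)


def sort_key(text):
    """Return a key that orders by primary weights, then secondary weights."""
    for old, new in SUBSTITUTIONS.items():
        text = text.replace(old, new)
    weights = [WEIGHTS[ch] for ch in text if ch not in IGNORED]
    primary = [w[0] for w in weights]
    secondary = [w[1] for w in weights]
    # Secondary weights matter only when the primary weights are identical.
    return (primary, secondary)


print(sorted(["levitate", "lèver", "lever"], key=sort_key))
```

Because the whole primary list is compared before any secondary weight, lèver sorts after lever yet still before levitate.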
The specification file consists of three types of statements:
1. codeset filename
filename is the name of the output file to be created by
colltbl.
2. order is order_list
order_list is a list of symbols, separated by semicolons,
that defines the collating sequence. The special symbol ...
is shorthand for a run of symbols that are lexically
sequential. For example,
order is a;b;c;d;...;x;y;z
would specify the list of lowercase letters. Of course,
this could be further compressed to just a;...;z.
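The shorthand can be pictured as a simple range expansion. The function below is a hypothetical sketch, not code from colltbl. It assumes that the ellipsis always sits between two single-character symbols and that the list contains no parentheses or curly brackets.

```python
def expand_order_list(order_list):
    """Expand 'a;...;z' style ranges into an explicit list of symbols."""
    items = order_list.split(";")
    symbols = []
    for i, item in enumerate(items):
        if item != "...":
            symbols.append(item)
            continue
        # The ellipsis fills the gap between its two neighbours.
        first, last = items[i - 1], items[i + 1]
        symbols.extend(chr(c) for c in range(ord(first) + 1, ord(last)))
    return symbols


print(expand_order_list("a;...;z") == expand_order_list("a;b;c;d;...;x;y;z"))
```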
A symbol can be up to two bytes in length and can be
represented in any one of the following ways:
the symbol itself (for example, a for the
lowercase letter a),
in octal representation (for example, \141 or
0141 for the letter a), or
in hexadecimal representation (for example,
\x61 or 0x61 for the letter a).
Any combination of these may be used as well.
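A decoder for the three notations might look like the sketch below. The name parse_symbol is hypothetical, and the rule used to tell a literal symbol from a number (a leading backslash, or a leading 0 followed by more characters) is an assumption; the text above does not say how colltbl itself resolves that ambiguity.

```python
def parse_symbol(token):
    """Return the character(s) named by one order-list symbol."""
    if token.startswith("\\x") or token.startswith("0x"):
        value = int(token[2:], 16)      # hexadecimal: \x61 or 0x61
    elif token.startswith("\\"):
        value = int(token[1:], 8)       # octal: \141
    elif len(token) > 1 and token.startswith("0"):
        value = int(token[1:], 8)       # octal: 0141
    else:
        return token                    # the symbol itself, e.g. a or ch
    return chr(value)


print([parse_symbol(t) for t in ["a", "\\141", "0141", "\\x61", "0x61"]])
```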
The backslash character, \, continues a statement onto the
next line. No characters are permitted after the backslash
on that line.
Symbols enclosed in parentheses are assigned the same
primary ordering but different secondary ordering. Symbols
enclosed in curly brackets are assigned only the same
primary ordering. For example,
order is a;b;c;ch;d;(e;è);f;...;z;\
{1;...;9};A;...;Z
In the above example, e and è are assigned the same primary
ordering and different secondary ordering. Digits 1 through
9 are assigned the same primary ordering and no secondary
ordering. Only primary ordering is assigned to the
remaining symbols. Notice how double letters can be
specified in the collating sequence (letter ch comes
between c and d).
If a character is not included in the order is statement,
it is excluded from the ordering and will be ignored during
sorting.
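The grouping rules can be illustrated by assigning each symbol a pair of weights. The sketch below is an assumption about how such weights could be represented, not a description of the database colltbl produces. The function name assign_weights is hypothetical, and the ... shorthand is not expanded here.

```python
import re


def assign_weights(order_list):
    """Map each symbol to a (primary, secondary) weight pair."""
    weights = {}
    primary = 0
    # Tokens: a (...) group, a {...} group, or a single symbol.
    for token in re.findall(r"\([^)]*\)|\{[^}]*\}|[^;(){}]+", order_list):
        primary += 1
        if token[0] == "(":
            # Same primary weight, distinct secondary weights.
            for secondary, sym in enumerate(token[1:-1].split(";"), start=1):
                weights[sym] = (primary, secondary)
        elif token[0] == "{":
            # Same primary weight, no secondary distinction.
            for sym in token[1:-1].split(";"):
                weights[sym] = (primary, 0)
        else:
            weights[token] = (primary, 0)
    return weights


weights = assign_weights("a;b;c;ch;d;(e;è);f;{1;2;3};A")
print(weights["e"], weights["è"], weights["1"], weights["3"])
```

A character that never appears in the list, such as - in this example, has no entry in the table, which corresponds to being ignored during sorting.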
3. substitute string with repl
The substitute statement substitutes the string string with
the string repl. This can be used, for example, to provide
rules to sort the abbreviated month names numerically:
substitute "Jan" with "01"
substitute "Feb" with "02"
.
.
.
substitute "Dec" with "12"
A simpler use of the substitute statement would be to
substitute a single character with two characters, as with
the substitution of ß with ss in German.
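The effect of the month substitutions on sorting can be sketched as follows. The table holds only the three entries spelled out above, and the helper apply_substitutions is hypothetical; it replaces strings one rule at a time, which may differ from how colltbl applies its rules.

```python
# Only the three substitutions written out in the text are listed here.
MONTH_SUBSTITUTIONS = [("Jan", "01"), ("Feb", "02"), ("Dec", "12")]


def apply_substitutions(text, substitutions):
    """Replace each string with its replacement before comparison."""
    for old, new in substitutions:
        text = text.replace(old, new)
    return text


dates = ["Dec 25", "Feb 14", "Jan 01"]
ordered = sorted(dates, key=lambda d: apply_substitutions(d, MONTH_SUBSTITUTIONS))
print(ordered)
```

Without the substitutions the same list would sort alphabetically, placing Dec before Feb and Jan.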
The substitute statement is optional. The order is and
codeset statements must appear in the specification file.
Any lines in the specification file with a # in the first
column are treated as comments and are ignored. Empty lines
are also ignored.
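The statement rules above (comments, empty lines, continuation, and the two required statements) can be checked with a small reader. This is a hypothetical sketch of a parser for the format as described here, not the parser colltbl uses, and it does not validate the contents of the order list.

```python
def parse_spec(text):
    """Split a colltbl-style specification into its statements."""
    spec = {"codeset": None, "order": None, "substitute": []}
    pending = ""
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue                            # empty line or comment
        if line.endswith("\\"):
            pending += line[:-1].lstrip()       # continuation: join next line
            continue
        statement = (pending + line.lstrip()).rstrip()
        pending = ""
        if statement.startswith("codeset "):
            spec["codeset"] = statement.split(None, 1)[1]
        elif statement.startswith("order is "):
            spec["order"] = statement[len("order is "):].strip()
        elif statement.startswith("substitute "):
            old, new = statement[len("substitute "):].split(" with ")
            spec["substitute"].append((old.strip('"'), new.strip('"')))
        else:
            raise ValueError("unrecognized statement: " + statement)
    if spec["codeset"] is None or spec["order"] is None:
        raise ValueError("codeset and order is statements are required")
    return spec
```

A specification that omits either required statement raises ValueError in this sketch, while one with no substitute statements is accepted.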
EXAMPLES
The following example shows the collation specification
required to support a hypothetical telephone book sorting
sequence.
The sorting sequence is defined by the following rules:
a. Upper- and lowercase letters must be sorted together, but
uppercase letters have precedence over lowercase letters.
b. All special characters and punctuation should be ignored.
c. Digits must be sorted as their alphabetic counterparts (for
example, 0 as zero, 1 as one).
d. The Ch, ch, CH combinations must be collated between C and
D.
e. V and W, v and w must be collated together.
The input specification file to colltbl will contain:
codeset telephone
order is A;a;B;b;C;c;CH;Ch;ch;D;d;E;e;F;f;\
G;g;H;h;I;i;J;j;K;k;L;l;M;m;N;n;O;o;P;p;\
Q;q;R;r;S;s;T;t;U;u;{V;W};{v;w};X;x;Y;y;Z;z
substitute "0" with "zero"
substitute "1" with "one"
substitute "2" with "two"
substitute "3" with "three"
substitute "4" with "four"
substitute "5" with "five"
substitute "6" with "six"
substitute "7" with "seven"
substitute "8" with "eight"
substitute "9" with "nine"
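The behaviour this specification is meant to produce can be approximated with the sketch below. It is a simulation written for illustration, not the output of colltbl: the names build_weights and phone_key are hypothetical, each symbol receives a single weight taken from its position in the order list, and double characters are matched before single ones.

```python
import re

DIGIT_WORDS = {"0": "zero", "1": "one", "2": "two", "3": "three",
               "4": "four", "5": "five", "6": "six", "7": "seven",
               "8": "eight", "9": "nine"}

ORDER = ("A;a;B;b;C;c;CH;Ch;ch;D;d;E;e;F;f;G;g;H;h;I;i;J;j;K;k;L;l;M;m;"
         "N;n;O;o;P;p;Q;q;R;r;S;s;T;t;U;u;{V;W};{v;w};X;x;Y;y;Z;z")


def build_weights(order):
    """Give each symbol its position; symbols in {...} share a position."""
    weights = {}
    for rank, token in enumerate(re.findall(r"\{[^}]*\}|[^;{}]+", order)):
        for sym in token.strip("{}").split(";"):
            weights[sym] = rank
    return weights


WEIGHTS = build_weights(ORDER)


def phone_key(name):
    """Build a sort key following rules a. through e. above."""
    for digit, word in DIGIT_WORDS.items():
        name = name.replace(digit, word)        # rule c
    key, i = [], 0
    while i < len(name):
        pair = name[i:i + 2]
        if len(pair) == 2 and pair in WEIGHTS:  # rule d: CH, Ch, ch
            key.append(WEIGHTS[pair])
            i += 2
        elif name[i] in WEIGHTS:
            key.append(WEIGHTS[name[i]])
            i += 1
        else:
            i += 1                              # rule b: not listed, ignored
    return key


print(sorted(["Dahl", "Chase", "Czerny", "Cuban"], key=phone_key))
```

In this model Chase sorts after every other name beginning with C and before Dahl, and a name such as O'Brien produces the same key as OBrien because the apostrophe is not in the order list.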
FILES
/lib/locale/locale/LC_COLLATE
LC_COLLATE database for locale
/usr/lib/locale/C/colltbl_C
input file used to construct LC_COLLATE in the default
locale.
REFERENCES
environ(5), memory(3C), setlocale(3C), strcoll(3C),
string(3C), strxfrm(3C)