COLLTBL(1) RISC/os Reference Manual COLLTBL(1)
NAME
colltbl - create collation database
SYNOPSIS
colltbl [ file | - ]
DESCRIPTION
The colltbl command takes as input a specification file,
file, that describes the collating sequence for a particular
language and creates a database that can be read by
strxfrm(3C) and strcoll(3C). strxfrm(3C) transforms its
second argument and places the result in its first argument.
The transformed string is such that it can be correctly
ordered with other transformed strings by using strcmp(3C),
strncmp(3C) or memcmp(3C). strcoll(3C) transforms its argu-
ments and does a comparison.
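The relationship between the two routines can be sketched in C. This is
only an illustration, assuming an ISO C library; the helper names
compare_via_strxfrm and sign are hypothetical and are not part of any
system library. Under a given LC_COLLATE setting, the sign of the result
is expected to agree with that of strcoll(3C).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: compare a and b by transforming both strings with
 * strxfrm() and comparing the transformed copies with strcmp().
 * Returns <0, 0 or >0 like strcmp(). *ok is set to 0 (and 0 is returned)
 * if memory for the transformed copies cannot be allocated. */
int compare_via_strxfrm(const char *a, const char *b, int *ok)
{
    assert(a != NULL && b != NULL && ok != NULL);

    /* With a zero-length buffer, strxfrm() only reports the space needed. */
    size_t na = strxfrm(NULL, a, 0) + 1;
    size_t nb = strxfrm(NULL, b, 0) + 1;
    char *ta = malloc(na);
    char *tb = malloc(nb);
    int result = 0;

    *ok = (ta != NULL && tb != NULL);
    if (*ok) {
        strxfrm(ta, a, na);   /* transformed form of a goes into ta */
        strxfrm(tb, b, nb);   /* transformed form of b goes into tb */
        result = strcmp(ta, tb);
    }
    free(ta);
    free(tb);
    return result;
}

/* Reduce a comparison result to -1, 0 or 1. */
int sign(int x)
{
    return (x > 0) - (x < 0);
}
```

Transforming once and comparing many times is the usual reason to prefer
strxfrm(3C) over repeated strcoll(3C) calls, for example when sorting a
large list.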
If no input file is supplied, stdin is read.
The output file produced contains the database with collat-
ing sequence information in a form usable by system commands
and routines. The name of this output file is the value you
assign to the keyword codeset read in from file. Before
this file can be used, it must be installed in the
/usr/lib/locale/locale directory with the name LC_COLLATE by
someone who is super-user or a member of group bin. locale
corresponds to the language area whose collation sequence is
described in file. This file must be readable by user,
group, and other; no other permissions should be set. To
use the collating sequence information in this file, set the
LC_COLLATE environment variable appropriately (see
environ(5) or setlocale(3C)).
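A program picks up the installed collating sequence by selecting the
locale before it compares strings. The following is a minimal sketch,
assuming an ISO C library; the function name sort_with_locale is
hypothetical. Passing "" as the locale name asks setlocale(3C) to use the
setting found in the environment.

```c
#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* qsort() callback: compare two elements of an array of char pointers
 * using the collating sequence of the current LC_COLLATE locale. */
static int by_collation(const void *pa, const void *pb)
{
    const char *a = *(const char *const *)pa;
    const char *b = *(const char *const *)pb;
    return strcoll(a, b);
}

/* Hypothetical helper: select the collation locale, then sort words in
 * place. Returns 0 on success, or -1 if the locale could not be selected,
 * in which case the array is left unsorted. */
int sort_with_locale(const char *locale_name, const char **words, size_t count)
{
    assert(locale_name != NULL && words != NULL);

    if (setlocale(LC_COLLATE, locale_name) == NULL)
        return -1;
    qsort(words, count, sizeof words[0], by_collation);
    return 0;
}
```

A caller would typically use sort_with_locale("", words, count) so that
the LC_COLLATE value set by the user decides the ordering.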
The colltbl command can support languages whose collating
sequence can be completely described by the following cases:
⊕ Ordering of single characters within the codeset. For
example, in Swedish, V is sorted after U, before X and
with W (V and W are considered identical as far as sort-
ing is concerned).
⊕ Ordering of "double characters" in the collation
sequence. For example, in Spanish, ch and ll are col-
lated after c and l, respectively.
⊕ Ordering of a single character as if it consists of two
characters. For example, in German, the "sharp s", ß,
is sorted as ss. This is a special instance of the next
case below.
⊕ Substitution of one character string with another
character string. In the example above, the string ß is
replaced with ss during sorting.
⊕ Ignoring certain characters in the codeset during colla-
tion. For example, if - were ignored during collation,
then the strings re-locate and relocate would be equal.
⊕ Secondary ordering between characters. In the case
where two characters are sorted together in the colla-
tion sequence, (i.e., they have the same "primary" ord-
ering), there is sometimes a secondary ordering that is
used if two strings are identical except for characters
that have the same primary ordering. For example, in
French, the letters e and è have the same primary
ordering but e comes before è in the secondary ordering.
Thus the word lever would be ordered before lèver, but
lèver would be sorted before levitate. (Note that if e
came before è in the primary ordering, then lèver would
be sorted after levitate.)
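The interaction of primary ordering, secondary ordering and ignored
characters can be modelled with a small comparison routine. This is a toy
sketch and not the algorithm used by colltbl or strcoll(3C): the function
names, the weight values, the ISO 8859-1 code for è (0xE8) and the ASCII
layout of a through z are all assumptions made for the illustration.

```c
#include <assert.h>
#include <stddef.h>

#define E_GRAVE 0xE8  /* e with grave accent, assuming ISO 8859-1 */

/* Primary weight. 0 means "ignored during collation". Lower-case letters
 * get 1..26 (assuming ASCII) and e-grave shares the weight of e. */
static int primary(unsigned char c)
{
    if (c == E_GRAVE)
        return 'e' - 'a' + 1;
    if (c >= 'a' && c <= 'z')
        return c - 'a' + 1;
    return 0;
}

/* Secondary weight: orders characters that share a primary weight. */
static int secondary(unsigned char c)
{
    return c == E_GRAVE ? 1 : 0;
}

/* Return the next non-ignored character of *s and advance *s past it,
 * or return 0 once the end of the string is reached. */
static unsigned char next_significant(const unsigned char **s)
{
    while (**s != '\0') {
        unsigned char c = *(*s)++;
        if (primary(c) != 0)
            return c;
    }
    return 0;
}

/* Compare by primary weights first; consult the first secondary
 * difference only when all primary weights are equal. */
int toy_collate(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);

    const unsigned char *pa = (const unsigned char *)a;
    const unsigned char *pb = (const unsigned char *)b;
    int secondary_diff = 0;

    for (;;) {
        unsigned char ca = next_significant(&pa);
        unsigned char cb = next_significant(&pb);

        if (primary(ca) != primary(cb))
            return primary(ca) < primary(cb) ? -1 : 1;
        if (ca == 0)
            break;                      /* both strings are exhausted */
        if (secondary_diff == 0)
            secondary_diff = secondary(ca) - secondary(cb);
    }
    return (secondary_diff > 0) - (secondary_diff < 0);
}
```

With these weights, lever sorts before lèver, lèver sorts before
levitate, and re-locate compares equal to relocate because the hyphen has
no primary weight.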
The specification file consists of three types of state-
ments:
1. codeset filename
filename is the name of the output file to be created by
colltbl.
2. order is order_list
order_list is a list of symbols, separated by semi-
colons, that defines the collating sequence. The spe-
cial symbol, ..., specifies symbols that are lexically
sequential in a short-hand form. For example,
order is a;b;c;d;...;x;y;z
would specify the list of lower-case letters. Of course,
this could be further compressed to just a;...;z.
A symbol can be up to two bytes in length and can be
represented in any one of the following ways:
⊕ the symbol itself (e.g., a for the lower-case letter
a),
⊕ in octal representation (e.g., \141 or 0141 for the
letter a), or
⊕ in hexadecimal representation (e.g., \x61 or 0x61
for the letter a).
Any combination of these may be used as well.
The backslash character, \ , is used for continuation.
No characters are permitted after the backslash charac-
ter.
Symbols enclosed in parentheses are assigned the same
primary ordering but different secondary ordering. Sym-
bols enclosed in curly brackets are assigned only the
same primary ordering. For example,
order is a;b;c;ch;d;(e;è);f;...;z;\
{1;...;9};A;...;Z
In the above example, e and è are assigned the same primary
ordering and different secondary ordering, digits 1
through 9 are assigned the same primary ordering and no
secondary ordering. Only primary ordering is assigned
to the remaining symbols. Notice how double letters can
be specified in the collating sequence (letter ch comes
between c and d).
If a character is not included in the order is statement,
it is excluded from the ordering and is ignored during
sorting.
3. substitute string with repl
The substitute statement substitutes the string string
with the string repl. This can be used, for example, to
provide rules to sort the abbreviated month names numer-
ically:
substitute "Jan" with "01"
substitute "Feb" with "02"
.
.
.
substitute "Dec" with "12"
A simpler use of the substitute statement that was men-
tioned above was to substitute a single character with
two characters, as with the substitution of ß with ss in
German.
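The effect of substitute statements on a string can be pictured as a
rewrite pass that runs before ordering is applied. The sketch below is an
assumption about the visible behavior only, not the colltbl
implementation; struct substitution and apply_substitutions are
hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One rule of the form: substitute "from" with "to". */
struct substitution {
    const char *from;   /* string to look for */
    const char *to;     /* replacement string */
};

/* Copy src to dst, replacing each occurrence of a rule's from string
 * with its to string. Rules are tried in table order at each position.
 * Returns 0 on success, or -1 if dst (dst_size bytes) is too small. */
int apply_substitutions(const struct substitution *rules, size_t nrules,
                        const char *src, char *dst, size_t dst_size)
{
    assert(rules != NULL && src != NULL && dst != NULL);

    size_t out = 0;

    while (*src != '\0') {
        const char *piece = src;    /* default: copy one character */
        size_t piece_len = 1;
        size_t consumed = 1;

        for (size_t i = 0; i < nrules; i++) {
            size_t from_len = strlen(rules[i].from);
            if (from_len > 0 && strncmp(src, rules[i].from, from_len) == 0) {
                piece = rules[i].to;
                piece_len = strlen(rules[i].to);
                consumed = from_len;
                break;
            }
        }
        if (out + piece_len >= dst_size)
            return -1;              /* no room for piece plus terminator */
        memcpy(dst + out, piece, piece_len);
        out += piece_len;
        src += consumed;
    }
    if (out >= dst_size)
        return -1;
    dst[out] = '\0';
    return 0;
}
```

With rules for the month names, "Feb 14" becomes "02 14" and "Dec 1"
becomes "12 1", so a plain comparison of the rewritten strings orders the
months numerically.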
The substitute statement is optional. The order is and
codeset statements must appear in the specification file.
Any lines in the specification file with a # in the first
column are treated as comments and are ignored. Empty lines
are also ignored.
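The three notations accepted for a symbol in the order is statement
(the symbol itself, octal, and hexadecimal) can be decoded as sketched
below. This is an illustration only: decode_symbol and parse_digits are
hypothetical names, the sketch handles single-byte symbols only, and
treating a one-character token such as 0 as a literal is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Parse s as digits in the given base (8 or 16). Fails on an empty
 * string, on a digit that is invalid for the base, or on a value that
 * does not fit in one byte. */
static int parse_digits(const char *s, unsigned base, unsigned *value)
{
    unsigned v = 0;

    if (*s == '\0')
        return -1;
    for (; *s != '\0'; s++) {
        unsigned d;
        if (*s >= '0' && *s <= '9')
            d = (unsigned)(*s - '0');
        else if (*s >= 'a' && *s <= 'f')
            d = (unsigned)(*s - 'a') + 10;
        else if (*s >= 'A' && *s <= 'F')
            d = (unsigned)(*s - 'A') + 10;
        else
            return -1;
        if (d >= base)
            return -1;
        v = v * base + d;
        if (v > 0xFF)
            return -1;
    }
    *value = v;
    return 0;
}

/* Decode one single-byte symbol token: a literal character, \141 or 0141
 * (octal), or \x61 or 0x61 (hexadecimal). Returns 0 on success, -1 if
 * the token is not in one of these forms. */
int decode_symbol(const char *tok, unsigned char *out)
{
    unsigned v;

    assert(tok != NULL && out != NULL);
    if (tok[0] == '\0')
        return -1;
    if (tok[1] == '\0') {               /* the symbol itself */
        *out = (unsigned char)tok[0];
        return 0;
    }
    if ((tok[0] == '\\' || tok[0] == '0') && tok[1] == 'x') {
        if (parse_digits(tok + 2, 16, &v) != 0)
            return -1;
    } else if (tok[0] == '\\' || tok[0] == '0') {
        if (parse_digits(tok + 1, 8, &v) != 0)
            return -1;
    } else {
        return -1;                      /* e.g. a two-byte symbol like ch */
    }
    *out = (unsigned char)v;
    return 0;
}
```

All five spellings of the letter a used in the text decode to the same
byte value, 0x61.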
EXAMPLE
The following example shows the collation specification
required to support a hypothetical telephone book sorting
sequence.
The sorting sequence is defined by the following rules:
a. Upper and lower case letters must be sorted together,
but upper case letters have precedence over lower case
letters.
b. All special characters and punctuation should be
ignored.
c. Digits must be sorted as their alphabetic counterparts
(e.g., 0 as zero, 1 as one).
d. The Ch, ch, CH combinations must be collated between C
and D.
e. V and W, v and w must be collated together.
The input specification file to colltbl will contain:
codeset telephone
order is A;a;B;b;C;c;CH;Ch;ch;D;d;E;e;F;f;\
G;g;H;h;I;i;J;j;K;k;L;l;M;m;N;n;O;o;P;p;\
Q;q;R;r;S;s;T;t;U;u;{V;W};{v;w};X;x;Y;y;Z;z
substitute "0" with "zero"
substitute "1" with "one"
substitute "2" with "two"
substitute "3" with "three"
substitute "4" with "four"
substitute "5" with "five"
substitute "6" with "six"
substitute "7" with "seven"
substitute "8" with "eight"
substitute "9" with "nine"
FILES
/lib/locale/locale/LC_COLLATE
LC_COLLATE database for locale
/usr/lib/locale/C/colltbl_C
input file used to construct LC_COLLATE in
the default locale.
SEE ALSO
memory(3C), setlocale(3C), strcoll(3C), string(3C),
strxfrm(3C), environ(5) in the Programmer's Reference
Manual.
Printed 11/19/92 Page 5