colldef(8), SunOS 4.1.4

COLLDEF(8)  —  MAINTENANCE COMMANDS

NAME

colldef − convert collation sequence source definition

SYNOPSIS

/usr/etc/colldef filename

DESCRIPTION

colldef converts a collation sequence source definition into a format usable by the strxfrm() and strcoll() functions (see strcoll(3)).  It is used to define the many ways in which strings can be ordered and collated.  strxfrm() transforms its second argument and places the result in its first argument.  The transformed string is such that it can be correctly ordered with other transformed strings by using strcmp(), strncmp(), or memcmp() (see string(3) and memory(3)).  strcoll() transforms both of its arguments and compares the results.
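The relationship between the two functions can be illustrated with a minimal C sketch, assuming an ANSI C library in which strxfrm(s1, s2, n) writes the transform of s2 into s1.  Under the same LC_COLLATE setting, comparing two transforms with strcmp() should give the same sign as strcoll() on the original strings.  The helper names xfrm_dup() and xfrm_compare() are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a malloc'd strxfrm() transform of s, or NULL. */
static char *xfrm_dup(const char *s)
{
    size_t n = strxfrm(NULL, s, 0) + 1;   /* length of transform, plus the NUL */
    char *out = malloc(n);
    if (out != NULL)
        strxfrm(out, s, n);               /* transform s (2nd arg) into out (1st) */
    return out;
}

/* Hypothetical helper: order a and b by comparing their transforms with
 * strcmp(); the sign should match strcoll(a, b) in the same locale.
 * Returns 0 if either allocation fails. */
static int xfrm_compare(const char *a, const char *b)
{
    char *ta = xfrm_dup(a);
    char *tb = xfrm_dup(b);
    int r = (ta != NULL && tb != NULL) ? strcmp(ta, tb) : 0;
    free(ta);
    free(tb);
    return r;
}
```

Transforming each string once with strxfrm() is usually preferable when the same strings are compared many times, as in a sort; strcoll() is simpler for a single comparison.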

colldef reads the collation sequence source definition from the standard input and stores the converted definition in filename.  The output file contains the collating sequence database in a form usable by system commands and library routines.

The collation sequence definition specifies a set of collating elements and the rules that define how strings containing them should be ordered.  This is most useful for defining the sorting conventions of different languages.

The colldef command can support languages whose mapping and collating sequences can be described by the following cases:

•Ordering of single characters within the codeset.  For example, in Swedish, V is sorted after U and before X, together with W (V and W are considered identical as far as sorting is concerned).

•Equivalence class definition.  A collection of characters is defined to have the same primary sorting value. 

•Ordering of "double characters" in the collation sequence.  For example, in Spanish, ch and ll are collated after c and l, respectively. 

•Ordering of a single character as if it consists of two characters.  For example, in German, the "sharp s", ß, is sorted as ss.  This is a special instance of the next case below.

•Substitution of one character with a character string, that is, a one-to-many mapping.  In the example above, the character ß is replaced with ss during sorting.

•Ignoring certain characters in the codeset during collation.  For example, if ‘-’ is not specified in the specification table, then the strings re-locate and relocate are equal.

•Null character mapping.  A character is mapped to a null collating element, and is ignored in sorting sequences. 

•Secondary ordering between characters.  When two characters are sorted together in the collation sequence (that is, they have the same primary ordering), there is sometimes a secondary ordering that is used if two strings are identical except for characters that have the same primary ordering.  For example, in French, the letters e and è have the same primary ordering, but e comes before è in the secondary ordering.  Thus the word lever would be ordered before lèver, but lèver would be sorted before levitate.  Note: if e came before è in the primary ordering, then lèver would be sorted after levitate.
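The two-level rule in the last case can be sketched in C.  This is only an illustration, not how colldef or strcoll() work internally: it assumes single-byte ISO 8859-1 text (è is 0xE8) and a hypothetical weight table in which è shares the primary weight of e but carries a higher secondary weight.  collate2() and its helpers are invented names.

```c
/* Hypothetical weights: every byte is its own primary weight, except
 * e-grave (0xE8 in ISO 8859-1), which shares the primary weight of 'e'. */
static int primary(unsigned char c)   { return c == 0xE8 ? 'e' : c; }

/* Secondary weight: only distinguishes e-grave from plain e. */
static int secondary(unsigned char c) { return c == 0xE8 ? 1 : 0; }

/* Compare a and b position by position using one level of weights. */
static int compare_level(const char *a, const char *b,
                         int (*weight)(unsigned char))
{
    const unsigned char *p = (const unsigned char *)a;
    const unsigned char *q = (const unsigned char *)b;
    while (*p != '\0' && *q != '\0') {
        int d = weight(*p++) - weight(*q++);
        if (d != 0)
            return d;
    }
    return (*p != '\0') - (*q != '\0');   /* a proper prefix sorts first */
}

/* Primary ordering decides; secondary ordering only breaks ties. */
int collate2(const char *a, const char *b)
{
    int d = compare_level(a, b, primary);
    return d != 0 ? d : compare_level(a, b, secondary);
}
```

With these weights, lever sorts before lèver (the tie is broken at the secondary level), and lèver sorts before levitate (decided at the primary level by e against i).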

USAGE

The specification file can consist of three statements: charmap, substitute, and order.  Of these, only the order statement is required.  When charmap or substitute is supplied, these statements must appear in the order listed above.  Any statements after the order statement are ignored.

Lines in the specification file beginning with a # are treated as comments and are ignored.  Blank lines are also ignored. 
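The statement-ordering rules above can be expressed as a small checker.  This C sketch is hypothetical (check_statement_order() is not part of colldef) and looks only at the keyword that begins each line; it does not parse statement arguments or continuation lines.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Rank of the keyword that starts s: 0 charmap, 1 substitute, 2 order,
 * or -1 for anything else. */
static int statement_rank(const char *s)
{
    static const char *const keywords[] = { "charmap", "substitute", "order" };
    for (int i = 0; i < 3; i++) {
        size_t n = strlen(keywords[i]);
        if (strncmp(s, keywords[i], n) == 0 &&
            (s[n] == '\0' || !isalnum((unsigned char)s[n])))
            return i;
    }
    return -1;
}

/* Hypothetical checker: return 0 if the lines follow the charmap,
 * substitute, order sequence and contain an order statement, else -1.
 * Comment and blank lines are skipped; nothing after the first order
 * statement is examined, since such statements are ignored. */
int check_statement_order(const char *const lines[], size_t n)
{
    int last = 0;
    for (size_t i = 0; i < n; i++) {
        const char *s = lines[i];
        if (s[0] == '#')
            continue;                      /* comment line */
        while (isspace((unsigned char)*s))
            s++;
        if (*s == '\0')
            continue;                      /* blank line */
        int rank = statement_rank(s);
        if (rank < last)
            return -1;                     /* unknown or out-of-order statement */
        if (rank == 2)
            return 0;                      /* order found; the rest is ignored */
        last = rank;
    }
    return -1;                             /* order statement is required */
}
```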

charmap charmapfile
charmap defines where a mapping of the character and collating element symbols to the actual character encoding can be found.  The charmapfile filename cannot be a keyword (for example, substitute, order, or with) or a special symbol (for example, ..., ;, <, >, or ,).

The format of charmapfile is shown below.  Symbol names are separated from their values by TAB or SPACE characters.  symbol-value can be specified in a hexadecimal (\x??) or octal (\???) representation, and can be only one character in length. 

symbol-name1    symbol-value1
symbol-name2    symbol-value2
...

The following sample charmapfile maps the symbol names, c, h, H, and A-grave, to their respective symbol values. 

c          \x63
h          \x68
H          \110
A-grave    \300
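A parser for lines in this format might look like the following C sketch.  parse_charmap_line() is a hypothetical name, and the sketch assumes each line holds exactly one name and one escaped single-byte value, as in the sample above.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Value of digit c in the given base (8 or 16), or -1. */
static int digit_value(char c, int base)
{
    int d;
    if (c >= '0' && c <= '9')
        d = c - '0';
    else if (c >= 'a' && c <= 'f')
        d = c - 'a' + 10;
    else if (c >= 'A' && c <= 'F')
        d = c - 'A' + 10;
    else
        return -1;
    return d < base ? d : -1;
}

/* Hypothetical parser for one charmapfile line: a symbol name, then TAB or
 * SPACE characters, then a one-byte value written as \x?? or \???.
 * Copies the name (truncated to namesz - 1 bytes) and stores the byte.
 * Returns 0 on success, -1 if the line is malformed. */
int parse_charmap_line(const char *line, char *name, size_t namesz,
                       unsigned char *value)
{
    size_t n = 0;
    while (line[n] != '\0' && line[n] != ' ' && line[n] != '\t')
        n++;
    if (n == 0 || namesz == 0)
        return -1;
    size_t copy = n < namesz - 1 ? n : namesz - 1;
    memcpy(name, line, copy);
    name[copy] = '\0';

    const char *p = line + n;
    while (*p == ' ' || *p == '\t')
        p++;
    if (*p++ != '\\')
        return -1;                         /* value must start with a backslash */
    int base = 8;
    if (*p == 'x') {                       /* \x?? is hex, otherwise octal */
        base = 16;
        p++;
    }
    unsigned v = 0;
    int ndigits = 0, d;
    while ((d = digit_value(*p, base)) >= 0) {
        v = v * (unsigned)base + (unsigned)d;
        if (v > 0xFF)
            return -1;                     /* more than one character */
        p++;
        ndigits++;
    }
    while (isspace((unsigned char)*p))     /* tolerate a trailing newline */
        p++;
    if (ndigits == 0 || *p != '\0')
        return -1;
    *value = (unsigned char)v;
    return 0;
}
```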

The symbol names defined in charmapfile can be used in order statements by enclosing the symbol name in angle brackets, <symbol-name>.  For example,

order(a, <A-grave>);b;<c>;...;<h>;<H>;i;...;z

This statement is equivalent to,

order(a, À);b;c;...;h;H;i;...;z
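The equivalence above amounts to replacing each <symbol-name> with its byte from the charmap.  A minimal C sketch, assuming the charmap has already been loaded into a hypothetical struct charmap_entry table and that a literal < appears in the order list only to open a symbol name:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory form of one charmapfile entry. */
struct charmap_entry {
    const char *name;
    unsigned char value;
};

/* Copy src to dst, replacing each <symbol-name> by its byte from map.
 * Returns 0 on success, or -1 for an unknown or unterminated name or if
 * dst (dstsz bytes) is too small. */
int expand_symbols(const char *src, char *dst, size_t dstsz,
                   const struct charmap_entry *map, size_t nmap)
{
    size_t o = 0;
    if (dstsz == 0)
        return -1;
    while (*src != '\0') {
        char out = *src;
        if (*src == '<') {
            const char *close = strchr(src + 1, '>');
            if (close == NULL)
                return -1;                 /* unterminated <name */
            size_t len = (size_t)(close - (src + 1));
            size_t i = 0;
            while (i < nmap && !(strlen(map[i].name) == len &&
                                 strncmp(map[i].name, src + 1, len) == 0))
                i++;
            if (i == nmap)
                return -1;                 /* name not in the charmap */
            out = (char)map[i].value;
            src = close + 1;
        } else {
            src++;
        }
        if (o + 1 >= dstsz)
            return -1;                     /* no room for byte plus NUL */
        dst[o++] = out;
    }
    dst[o] = '\0';
    return 0;
}
```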

Symbol names cannot be specified in substitute fields.  Symbol names also cannot be combined with any other representation, such as <c>h, c<h>, <c>\x68, or <c><h>.  Symbol names can be used with primary and secondary ordering as in the following example.

order  a;b;c;(<c>,<h>);d;...;z;\
A;...;G;{H,<H>};I;...;Z

The charmap statement is optional. 

substitute char with repl

The substitute statement substitutes the character char with the string repl.

The simplest use of the substitute statement, mentioned above, replaces a single character with two characters, as with the substitution of ß with ss in German.

substitute "ß" with "ss"

This statement can also be used to specify characters to be ignored by mapping them to the null string. 

substitute "m" with ""

This is convenient for simplifying order statements.  When used with the statement below, the lower-case m is ignored — even though it is implicitly included in the order statement. 

order a;...;z

Without the null string mapping statement above, this would be specified as,

order a;...;l;n;...;z
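The effect of substitute statements can be sketched as a pass over a string before it is ordered.  This hypothetical C helper assumes single-byte ISO 8859-1 text, where ß is 0xDF, and one replacement string per character; it is not colldef's implementation.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical form of one substitute statement: char with repl. */
struct substitution {
    unsigned char ch;
    const char *repl;                      /* "" maps ch to the null string */
};

/* Write src to dst with every rule applied; characters without a rule are
 * copied unchanged.  Returns 0, or -1 if dst (dstsz bytes) is too small. */
int apply_substitutions(const char *src, char *dst, size_t dstsz,
                        const struct substitution *rules, size_t nrules)
{
    size_t o = 0;
    if (dstsz == 0)
        return -1;
    for (; *src != '\0'; src++) {
        const char *repl = NULL;
        for (size_t i = 0; i < nrules; i++)
            if (rules[i].ch == (unsigned char)*src) {
                repl = rules[i].repl;
                break;
            }
        size_t len = repl != NULL ? strlen(repl) : 1;
        if (o + len >= dstsz)
            return -1;
        if (repl != NULL)
            memcpy(dst + o, repl, len);
        else
            dst[o] = *src;
        o += len;
    }
    dst[o] = '\0';
    return 0;
}
```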

The substitute statement is optional. 

order order_list

order_list is a list of symbols, separated by semicolons, that defines the collating sequence.  The special symbol, ..., specifies, in a short-hand form, symbols that are sequential in machine code order.  The following example specifies the list of lower-case letters. 

order a;b;c;d;...;x;y;z

Of course, this could be further compressed to just a;...;z. 
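Expansion of the ... shorthand can be sketched in C.  The sketch assumes single-byte symbols and takes ... to stand for every byte strictly between its two neighbours, which are themselves listed; expand_ellipsis() is a hypothetical name.

```c
#include <stddef.h>
#include <string.h>

/* Expand an order_list of single-character symbols separated by ';'.
 * Writes the resulting characters to out (NUL-terminated) and returns how
 * many there are, or -1 on malformed input or if out is too small. */
int expand_ellipsis(const char *list, char *out, size_t outsz)
{
    size_t o = 0;
    int pending = 0;                       /* saw "..." and await its upper bound */
    const char *p = list;
    if (outsz == 0)
        return -1;
    while (*p != '\0') {
        const char *end = strchr(p, ';');
        size_t len = end != NULL ? (size_t)(end - p) : strlen(p);
        if (len == 3 && strncmp(p, "...", 3) == 0) {
            if (o == 0 || pending)
                return -1;                 /* "..." needs a lower bound */
            pending = 1;
        } else if (len == 1) {
            unsigned char c = (unsigned char)*p;
            if (pending) {                 /* fill the gap in machine code order */
                unsigned char lo = (unsigned char)out[o - 1];
                if (c <= lo)
                    return -1;
                for (unsigned x = lo + 1u; x < c; x++) {
                    if (o + 1 >= outsz)
                        return -1;
                    out[o++] = (char)x;
                }
                pending = 0;
            }
            if (o + 1 >= outsz)
                return -1;
            out[o++] = (char)c;
        } else {
            return -1;                     /* multi-character symbols not handled */
        }
        p = end != NULL ? end + 1 : p + len;
    }
    if (pending)
        return -1;                         /* "..." needs an upper bound */
    out[o] = '\0';
    return (int)o;
}
```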

A symbol can be up to two characters in length and can be represented in any one of the following ways:

•The symbol itself (for example, a for the lower-case letter a). 

•In octal representation (for example, \141 for the letter a). 

•In hexadecimal representation (for example, \x61 for the letter a). 

•The symbol name as defined in the charmap file. 

Any combination of these may be used as well. 
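Decoding a symbol written in these forms can be sketched in C.  The digit limits (at most three octal digits after \ and two hex digits after \x, matching the \??? and \x?? forms shown for charmapfile) are an assumption of this sketch, as is the name decode_symbol(); charmap names are left to a separate lookup.

```c
#include <stddef.h>

/* Value of digit c in the given base (8 or 16), or -1. */
static int digit_value(char c, int base)
{
    int d;
    if (c >= '0' && c <= '9')
        d = c - '0';
    else if (c >= 'a' && c <= 'f')
        d = c - 'a' + 10;
    else if (c >= 'A' && c <= 'F')
        d = c - 'A' + 10;
    else
        return -1;
    return d < base ? d : -1;
}

/* Hypothetical decoder for one order-list symbol whose characters may be
 * literal, \ooo octal or \xhh hex, in any combination (e.g. "\x63h" for
 * "ch").  Stores up to two bytes in out[] and returns how many, or -1 if
 * the symbol is malformed or longer than two characters. */
int decode_symbol(const char *s, unsigned char out[2])
{
    int n = 0;
    while (*s != '\0') {
        unsigned v = 0;
        if (*s == '\\') {
            int base = 8, maxdigits = 3, ndigits = 0, d;
            s++;
            if (*s == 'x') {
                base = 16;
                maxdigits = 2;
                s++;
            }
            while (ndigits < maxdigits && (d = digit_value(*s, base)) >= 0) {
                v = v * (unsigned)base + (unsigned)d;
                s++;
                ndigits++;
            }
            if (ndigits == 0 || v > 0xFF)
                return -1;                 /* empty escape or not one byte */
        } else {
            v = (unsigned char)*s++;       /* the symbol itself */
        }
        if (n == 2)
            return -1;                     /* at most two characters */
        out[n++] = (unsigned char)v;
    }
    return n;
}
```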

The backslash character, \, is used for continuation.  When it is used this way, it must be the last character on the line; no characters may follow it.

Symbols enclosed in parentheses are assigned the same primary ordering but different secondary ordering.  Symbols enclosed in curly brackets are assigned only the same primary ordering.  For example,

order a;b;c;ch;d;(e,è);f;...;z;\
{1,2,3,4,5,6,7,8,9};A;...;Z

In the above example, e and è are assigned the same primary ordering and different secondary ordering, and digits 1 through 9 are assigned the same primary ordering and no secondary ordering.  Note that the ellipsis cannot be specified within curly brackets.  Only primary ordering is assigned to the remaining symbols.  Notice how double letters can be specified in the collating sequence (the letter pair ch comes between c and d).

If a character is not included in the order statement it is excluded from the ordering and will be ignored during sorting. 
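The weight assignment implied by parentheses and curly brackets can be sketched in C.  This hypothetical build_weights() accepts only single-byte symbols without ..., escapes, or charmap names, and uses primary weight 0 to mark characters absent from the order list, which are then ignored.

```c
#include <string.h>

/* Hypothetical weight tables indexed by byte value.  A primary weight of 0
 * means the byte is not in the order list and is ignored when sorting. */
struct weights {
    int primary[256];
    int secondary[256];
};

/* Assign weights from an order_list such as "a;b;(e,E);{1,2,3};z".
 * Each ';'-separated item takes the next primary weight.  Members of ( )
 * share it and get increasing secondary weights; members of { } share it
 * with no secondary distinction (all 0).  Returns 0, or -1 if malformed. */
int build_weights(const char *list, struct weights *w)
{
    int prim = 0;
    memset(w, 0, sizeof *w);
    while (*list != '\0') {
        prim++;
        if (*list == '(' || *list == '{') {
            int paren = (*list == '(');
            char close = paren ? ')' : '}';
            int sec = 0;
            list++;
            for (;;) {
                unsigned char c = (unsigned char)*list;
                if (c == '\0' || c == ',' || c == (unsigned char)close)
                    return -1;             /* empty group member */
                w->primary[c] = prim;
                w->secondary[c] = paren ? sec++ : 0;
                list++;
                if (*list == ',') {
                    list++;
                } else if (*list == close) {
                    list++;
                    break;
                } else {
                    return -1;             /* missing ',' or closing bracket */
                }
            }
        } else {
            unsigned char c = (unsigned char)*list;
            if (c == ';')
                return -1;                 /* empty item */
            w->primary[c] = prim;          /* secondary stays 0 */
            list++;
        }
        if (*list == ';')
            list++;
        else if (*list != '\0')
            return -1;                     /* items must be one symbol long */
    }
    return 0;
}
```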

EXAMPLES

The following example shows the collation specification required to support a hypothetical telephone book sorting sequence. 

The sorting sequence is defined by the following rules:

•Upper and lower case letters must be sorted together, but upper case letters have precedence over lower case letters. 

•All special characters and punctuation should be ignored. 

•Digits must be sorted as their alphabetic counterparts (for example, 0 as zero, 1 as one). 

•The CH, Ch, ch combinations must be collated between c and D. 

•V and W, v and w must be collated together. 

The input specification file for this example contains:

substitute "0" with "zero"
substitute "1" with "one"
substitute "2" with "two"
substitute "3" with "three"
substitute "4" with "four"
substitute "5" with "five"
substitute "6" with "six"
substitute "7" with "seven"
substitute "8" with "eight"
substitute "9" with "nine"
order A;a;B;b;C;c;CH;Ch;ch;D;d;E;e;F;f;\
      G;g;H;h;I;i;J;j;K;k;L;l;M;m;N;n;O;o;P;p;\
      Q;q;R;r;S;s;T;t;U;u;{V,W};{v,w};X;x;Y;y;Z;z
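The rules of this example can be hand-coded to check what the specification should produce.  The C sketch below is hypothetical: it transcribes the substitute and order statements above directly into tables rather than reading a colldef output file, assumes ASCII input, and ignores every character the order statement does not list.

```c
#include <string.h>

/* Weights transcribed from the order statement above:
 * A;a;B;b;C;c;CH;Ch;ch;D;d;...;U;u;{V,W};{v,w};X;x;Y;y;Z;z
 * Assumes ASCII, where 'A'..'Z' and 'a'..'z' are contiguous. */
static int upper_w[26], lower_w[26];
static int w_CH, w_Ch, w_ch;

static void init_weights(void)
{
    int w = 1;
    for (int i = 0; i < 26; i++) {
        if (i == 'W' - 'A') {              /* {V,W} and {v,w}: reuse V's weights */
            upper_w[i] = upper_w['V' - 'A'];
            lower_w[i] = lower_w['V' - 'A'];
            continue;
        }
        upper_w[i] = w++;
        lower_w[i] = w++;
        if (i == 'C' - 'A') {              /* CH;Ch;ch sit between c and D */
            w_CH = w++;
            w_Ch = w++;
            w_ch = w++;
        }
    }
}

static const char *const digit_words[10] = {
    "zero", "one", "two", "three", "four",
    "five", "six", "seven", "eight", "nine"
};

/* Fill key[] (512 entries) with the collating weights of s and return how
 * many.  Input whose expanded form exceeds 511 bytes is truncated. */
static int make_key(const char *s, int key[])
{
    char buf[512];
    size_t n = 0;
    for (; *s != '\0'; s++) {              /* substitute "0" with "zero", ... */
        const char *rep = (*s >= '0' && *s <= '9') ? digit_words[*s - '0'] : NULL;
        size_t len = rep != NULL ? strlen(rep) : 1;
        if (n + len >= sizeof buf)
            break;
        if (rep != NULL)
            memcpy(buf + n, rep, len);
        else
            buf[n] = *s;
        n += len;
    }
    buf[n] = '\0';

    int k = 0;
    for (const char *p = buf; *p != '\0'; p++) {
        int w;
        if (p[0] == 'C' && p[1] == 'H')      { w = w_CH; p++; }
        else if (p[0] == 'C' && p[1] == 'h') { w = w_Ch; p++; }
        else if (p[0] == 'c' && p[1] == 'h') { w = w_ch; p++; }
        else if (*p >= 'A' && *p <= 'Z')     w = upper_w[*p - 'A'];
        else if (*p >= 'a' && *p <= 'z')     w = lower_w[*p - 'a'];
        else continue;                       /* not in the order list: ignored */
        key[k++] = w;
    }
    return k;
}

/* Compare two names in phone-book order; returns <0, 0 or >0 like strcmp(). */
int phonebook_cmp(const char *a, const char *b)
{
    int ka[512], kb[512];
    if (upper_w[0] == 0)
        init_weights();
    int na = make_key(a, ka), nb = make_key(b, kb);
    for (int i = 0; i < na && i < nb; i++)
        if (ka[i] != kb[i])
            return ka[i] - kb[i];
    return na - nb;
}
```

Because A and a are separate entries in the order statement, Apple sorts before apple, and both sort before Banana.  V and W carry the same weight, so Vance follows Wade on the strength of n against d.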

EXIT STATUS

colldef exits with the following values:

0 No errors were found and the output was successfully created. 

>0 Errors were found. 

FILES

/etc/locale/LC_COLLATE/locale
standard private location for collation orders under the locale locale

/usr/share/lib/locale/LC_COLLATE/locale
standard shared location for collation orders under the locale locale

SEE ALSO

memory(3), strcoll(3), string(3)

System Services Overview
 
 

Sun Release 4.1  —  Last change: 30 May 1991

Typewritten Software • bear@typewritten.org • Edmonds, WA 98026