kbdcomp(1M) 04 Jun 1992 kbdcomp(1M)
NAME
kbdcomp - compile kbd tables
SYNOPSIS
kbdcomp [-vrR] [-o outfile] [infile]
DESCRIPTION
kbdcomp compiles tables for use with the kbd STREAMS module, a programmable string-translation module. The module has two separate abilities, each of which may be used alone or in combination.
The first ability, lookup, performs simple byte-for-byte substitution in an input stream. It is based on a 256-entry lookup table, one entry for each of the 256 possible values of a byte. As input is received, each byte is looked up in the translation table, and the table value for that byte replaces the original byte. The process is fast and can be performed in place on each STREAMS message, with no message copying or duplication.
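As an illustration only, the lookup pass can be sketched in C as an in-place table substitution. The functions kbd_identity and kbd_lookup are hypothetical and are not part of the kbd module's interface; the sketch shows why no copying is needed: each byte is simply overwritten by its own table entry.

```c
#include <stddef.h>

/* Fill table with the identity mapping: every byte maps to itself,
   so bytes that are never remapped pass through unchanged. */
void kbd_identity(unsigned char table[256])
{
    for (int i = 0; i < 256; i++)
        table[i] = (unsigned char)i;
}

/* Lookup pass: replace each byte of buf with its table entry, in place,
   so the message buffer never has to be copied or duplicated. */
void kbd_lookup(const unsigned char table[256], unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++)
        buf[i] = table[buf[i]];
}
```

For example, with y and z swapped in the table, the bytes lazy become layz.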
The second ability, mapping, searches an input stream for occurrences of specified strings of bytes (or individual bytes) and substitutes other strings (or bytes) for them as they are recognized. There are three kinds of mapping, distinguished by the relationship between the number of bytes in the input and the number of bytes in the output. One-many mapping substitutes several bytes for a single input byte. Many-one mapping substitutes a single byte for several input bytes. Many-many mapping includes the other two types as special cases, but also includes the substitution of several input bytes with several output bytes. The kbd module can perform all three types of mapping. The lookup ability described in the previous paragraph (which amounts to one-one mapping) is a common special case, useful enough to be provided separately. By combining lookup and mapping, one can solve a larger class of input translation and conversion problems than with either alone.
During operation, processing occurs in two major passes: the lookup table pass always precedes string mapping. String mapping is non-recursive for a given table, and there is no feedback mechanism: input is scanned in order as received, and output is not re-scanned for occurrences of recognizable input strings. As an example of mapping, suppose one wishes to translate all occurrences of the string this in an input stream into the string there. As each byte is received, the module recognizes and buffers the string th; if the following character is i, it is also buffered. If x is then received, a mismatch is recognized and no translation occurs. If instead the character received after thi is s, a match is recognized, the buffered this is discarded, and the string there replaces it.
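The following C sketch models this non-recursive string mapping over a complete buffer rather than a live stream of STREAMS messages. The struct kbd_pair type and the kbd_map function are hypothetical. Trying each input string at the current position and, when none matches in full, emitting one byte and moving on gives the same result as the module's buffer-and-rescan approach, assuming the input strings obey the uniqueness rule stated in the next paragraph.

```c
#include <stddef.h>
#include <string.h>

/* One string mapping: occurrences of in are replaced by out.
   Input strings are assumed to be non-empty and prefix-free. */
struct kbd_pair { const char *in; const char *out; };

/* Map in[] into out[] (at most outsz bytes including the terminator).
   Returns the number of bytes written, not counting the terminator. */
size_t kbd_map(const struct kbd_pair *pairs, size_t npairs,
               const char *in, char *out, size_t outsz)
{
    size_t n = 0;
    while (*in != '\0') {
        const char *src = in;          /* default: pass one byte through */
        size_t srclen = 1, consumed = 1;
        for (size_t k = 0; k < npairs; k++) {
            size_t len = strlen(pairs[k].in);
            if (strncmp(in, pairs[k].in, len) == 0) {   /* full match */
                src = pairs[k].out;
                srclen = strlen(pairs[k].out);
                consumed = len;
                break;
            }
        }
        for (size_t i = 0; i < srclen && n + 1 < outsz; i++)
            out[n++] = src[i];
        in += consumed;                /* replacement text is never re-scanned */
    }
    if (outsz > 0)
        out[n] = '\0';
    return n;
}
```

With the single mapping this to there, the input this thin thix becomes there thin thix. With the mappings x to this and this to there, the input x becomes this and not there, because output is not fed back into the scanner.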
Page 1 Reliant UNIX 5.44 6, 194
Both input and output strings can be of any non-zero length (but see LIMITATIONS below). Each string to be recognized and translated must be unique, and no complete input string may be the leading substring of any other (for example, one may not define abc and ab simultaneously, but may define abc, abd, and abxy together).
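A compiler has to reject input strings that break this rule. One way to check it is sketched below in C; kbd_prefix_free is a hypothetical helper, and the pairwise comparison is chosen for clarity rather than speed (sorting the strings, or building a trie, would scale better for large tables).

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical check of the uniqueness rule: returns false if any input
   string equals, or is a leading substring of, another input string. */
bool kbd_prefix_free(const char *const *inputs, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        size_t len = strlen(inputs[i]);
        for (size_t j = 0; j < n; j++) {
            /* inputs[i] is a prefix of (or equal to) inputs[j] */
            if (i != j && strncmp(inputs[i], inputs[j], len) == 0)
                return false;
        }
    }
    return true;
}
```

Applied to the examples above, the set abc, ab is rejected and the set abc, abd, abxy is accepted.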
kbdcomp reads the named input file (or standard input if no file is named) and compiles the tables it contains into the output file specified by the -o option. If -o is not supplied, output is written to the file kbd.out. The -v option causes parsing and verification only: no output file is produced, and if no error messages are printed, the input file is syntactically correct. The -r option causes the compiler to check for and report byte values that cannot be generated in a table (see Building & Debugging below). The -R option is equivalent to -r, but prints printable characters as themselves rather than in octal.
Input Language
Source files for kbdcomp are a series of table declarations. Within
each table declaration are a number of definitions and functions. A
table declaration is one of the forms map, link, or extern:
map type ( name ) { expressions }
link ( string )
extern ( string )
The link and extern forms are described below. The name of a map must be a simple token, that is, a sequence of alphabetic and/or numeric characters with no embedded punctuation, white space, or special symbols (in particular, no colons, commas, quotes, or spaces). The optional type field may be either of the keywords full or sparse; if it is omitted, the type defaults to sparse. The effect of this field is described below. The expressions contained in a map declaration have one of the following forms. The word before the parentheses is a reserved keyword (keylist, define, string, strlist, error), except in the word form, where word stands for a name previously introduced with define; timed is also a reserved keyword, and everything inside the parentheses is an argument supplied by the user:
keylist ( string string )
define ( word value )
word ( extension result )
string ( word word )
strlist ( string string )
error ( string )
timed
The keylist form defines lookup table entries; the remaining forms concern string mapping.
The definition form (define) allows a mnemonic word (the first argu-
ment) to be associated with a string (the second argument). It is
useful for replacing complicated sequences (for example, those con-
taining special symbols or control characters) with mnemonic words to
facilitate the design and readability of tables.
Using the word form (where word must be a previously defined sequence)
in a manner similar to a C function call results in the value of word
being concatenated with extension; when the combination is recognized
at runtime, it is mapped to result. The value may be a string of
characters or a single byte. The following is an illustration (not
intended to be complete):
map (some_accents) {
define(acute '\047')
define(grave '`' )
acute(a '\341') # same as string("\047a" "\341")
grave(a '\340')
# ...et cetera...
keylist("zyZY" "yzYZ")
}
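This map defines the single quote and reverse quote keys as dead keys: when either is followed by a, it produces the corresponding accented character from the ISO 8859-1 codeset (\341 is a with acute accent, \340 is a with grave accent). The keylist form additionally swaps y and z, in both lower and upper case. The definition, extension, and result need not be single bytes; they may be arbitrary strings.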
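To make the relationship between define, the word form, and string concrete, the following C sketch expands a word form into the input string of the equivalent string form, as the comment on acute(a '\341') in the example indicates. The struct kbd_define type and the kbd_expand_word function are hypothetical illustrations, not part of kbdcomp; the result argument of the word form would be used unchanged as the output string.

```c
#include <stdio.h>
#include <string.h>

/* A define(word value) entry. */
struct kbd_define { const char *word; const char *value; };

/* Expand word(extension result): look up the value bound to word by an
   earlier define and append the extension, giving the input string of
   the equivalent string form.  Returns 0 on success, or -1 if the word
   is undefined or the input buffer is too small. */
int kbd_expand_word(const struct kbd_define *defs, size_t ndefs,
                    const char *word, const char *extension,
                    char *input, size_t insz)
{
    for (size_t i = 0; i < ndefs; i++) {
        if (strcmp(defs[i].word, word) == 0) {
            int n = snprintf(input, insz, "%s%s", defs[i].value, extension);
            return (n >= 0 && (size_t)n < insz) ? 0 : -1;
        }
    }
    return -1;                          /* word was never defined */
}
```

With define(acute '\047'), the form acute(a '\341') expands to the input string 'a (single quote followed by a), which maps to \341.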
Strings in definitions and arguments may generally be entered either
without quotation or between double quotes. Byte constants may like-
wise be entered unquoted or between single quotes. The only time quo-
tation is strictly required is when the string contains parentheses,
spaces, tab characters, or other special symbols. The language makes
no real distinction between byte constants and string constants: both
are treated as null-terminated strings; the choice of whether to use a
one-character string or a byte constant is thus a matter of taste.
Most quoting conventions of C are recognized, except that octal con-
stants must be exactly three digits long. Octal constants may be used
in strings as well. In the example above, the arguments to keylist
need not be quoted, as they contain no special symbols. The following
example illustrates some situations where strings must be quoted:
string(abc "two words") # literal space
keylist("[{}]" "(())") # brackets/parentheses
define(esc_seq "\033\t(") # tab and parenthesis
define(space ' ') # literal space
string(abc "keylist") # keyword used as argument
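Decoding of quoted strings might look roughly like the following C sketch, which enforces the rule that octal escapes have exactly three digits. The exact set of C escapes kbdcomp accepts is not listed here, so the set handled by the hypothetical kbd_unescape function (\n, \t, \r, \\, \" and \') is an assumption.

```c
#include <stddef.h>

/* Hypothetical decoder for the body of a quoted kbdcomp string.
   Returns the number of bytes written to out, or -1 on a malformed
   escape or if out is too small. */
int kbd_unescape(const char *s, unsigned char *out, size_t outsz)
{
    size_t n = 0;
    while (*s != '\0') {
        unsigned char c;
        if (*s != '\\') {
            c = (unsigned char)*s++;
        } else {
            s++;                                /* skip the backslash */
            switch (*s) {
            case 'n':  c = '\n'; s++; break;
            case 't':  c = '\t'; s++; break;
            case 'r':  c = '\r'; s++; break;
            case '\\': c = '\\'; s++; break;
            case '"':  c = '"';  s++; break;
            case '\'': c = '\''; s++; break;
            default: {
                /* Octal escapes need exactly three digits, e.g. \033. */
                unsigned v = 0;
                for (int i = 0; i < 3; i++) {
                    if (s[i] < '0' || s[i] > '7')
                        return -1;
                    v = v * 8 + (unsigned)(s[i] - '0');
                }
                if (v > 0377)
                    return -1;                  /* does not fit in a byte */
                c = (unsigned char)v;
                s += 3;
            }
            }
        }
        if (n >= outsz)
            return -1;
        out[n++] = c;
    }
    return (int)n;
}
```

Under these assumptions, the argument of define(esc_seq "\033\t(") decodes to three bytes (escape, tab, and an open parenthesis), while a two-digit escape such as \33 is rejected.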
Comments (inside or outside map declarations) follow the sh(1) convention: a # begins a comment that extends to the end of the line, whether it appears at the start of a line or after other text, as shown in the examples above.
The keylist form maps single bytes to other single bytes; the actions it defines are carried out by the lookup table (that is, they are performed before mapping). Any byte value that is not explicitly changed by a keylist form is, of course, left unchanged; if no keylist forms appear in a map definition, kbdcomp does not generate a lookup table for the map, and the lookup phase is skipped during module operation. Each byte in the first string argument to keylist is mapped to the byte at the same position in the second string argument. That is, given two strings X and Y as arguments, the i-th byte of X maps to the i-th byte of Y for every position i. The two
arguments must, after evaluation, be found to contain the same number
of bytes.
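A minimal C sketch of how a single keylist form could be applied to a 256-entry lookup table follows. The name kbd_keylist is hypothetical, and the table is assumed to have been initialized to the identity mapping beforehand, so that bytes not named in the form stay unchanged.

```c
#include <string.h>

/* Apply keylist(x y) to a lookup table: x[i] maps to y[i].
   Returns -1, leaving the table untouched, if the two arguments do
   not contain the same number of bytes. */
int kbd_keylist(unsigned char table[256], const char *x, const char *y)
{
    size_t n = strlen(x);
    if (strlen(y) != n)
        return -1;
    for (size_t i = 0; i < n; i++)
        table[(unsigned char)x[i]] = (unsigned char)y[i];
    return 0;
}
```

For keylist("zyZY" "yzYZ") from the earlier example, the table maps z to y, y to z, Z to Y, and Y to Z, while every other byte still maps to itself.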
The string form has a function similar to mnemonic forms defined with
define and may be used for any type of many-many mapping. The first
argument to string is mapped to the second argument (see the comment
in the sample map above).
Mappings using both keylist and string or any define forms may be com-
bined: if i is mapped to a with a keylist form, and a is used in the
sequence `a, then when the user types `i, the sequence `a is seen by
the string mapping process (because lookup is done first) and
translated accordingly.
The keylist form is intended mainly for simple keyboard rearrangement and case-conversion applications; string is for one-many mapping or for isolated instances of many-many mapping; the define form, and words defined with it, are intended for more general use in groups of related sequences. In some situations where one-one mapping with keylist seems the obvious choice, it is better to achieve the same effect with string forms, to avoid a contradictory mapping. For example, suppose one wants to translate x into y and, at the same time, y into abc. If x is mapped to y with a keylist form and y is mapped to abc with a string form, then y itself may be impossible to obtain (unless it is produced by some other sequence), even though that was not the intention: the intention was to obtain y whenever the user enters x. This is a contradictory mapping:
keylist(x y)
string(y abc) # "y" itself cannot be generated
In some cases the intention is indeed that y not be generated, but more often it should be. This problem, which is relatively common in codeset mapping, can be avoided by mapping x to y with a string form instead of a keylist form. Because string mapping is not recursive, this allows both y and abc to be generated:
string(x y)
string(y abc)
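The difference between the two tables above can be reproduced with a small C model that runs the lookup pass and then a non-recursive string pass, in that order. All names here (struct kbd_pair, kbd_translate, and the static helpers) are hypothetical, and the fixed 256-byte work buffer is a simplification of this sketch, not a property of the module.

```c
#include <stddef.h>
#include <string.h>

struct kbd_pair { const char *in; const char *out; };

/* Lookup pass (runs first); a NULL table means no keylist forms. */
static void lookup_pass(const unsigned char *table, char *s)
{
    if (table == NULL)
        return;
    for (; *s != '\0'; s++)
        *s = (char)table[(unsigned char)*s];
}

/* Non-recursive string pass: replacement text is never re-scanned. */
static void string_pass(const struct kbd_pair *p, size_t np,
                        const char *in, char *out)
{
    while (*in != '\0') {
        size_t k;
        for (k = 0; k < np; k++)
            if (strncmp(in, p[k].in, strlen(p[k].in)) == 0)
                break;
        if (k < np) {
            strcpy(out, p[k].out);
            out += strlen(p[k].out);
            in += strlen(p[k].in);
        } else {
            *out++ = *in++;
        }
    }
    *out = '\0';
}

/* Both passes in order.  Input longer than 255 bytes is truncated in
   this sketch, and out must be large enough for the mapped result. */
void kbd_translate(const unsigned char *table, const struct kbd_pair *p,
                   size_t np, const char *in, char *out)
{
    char tmp[256];
    strncpy(tmp, in, sizeof tmp - 1);
    tmp[sizeof tmp - 1] = '\0';
    lookup_pass(table, tmp);
    string_pass(p, np, tmp, out);
}
```

With keylist(x y) and string(y abc), the input xy yields abcabc, so y cannot be produced; with string(x y) and string(y abc), the same input yields yabc.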
Entering a large number of one-one mappings with string can be somewhat tedious, so the strlist form is provided to make this easier. The two string arguments to strlist are interpreted in the same manner as the arguments to keylist (that is, as one-one mappings), except that they are processed as string mappings rather than by the lookup table. In the following example, the three string definitions can be reduced to the strlist form that follows them:
string(a b)
string(c d)
string(e f)
strlist(ace bdf)
It is important to recognize the difference between string and
strlist: with string, the two arguments are a single mapping
definition (which may be of any type), whereas with strlist, one or more one-one string mappings are defined at once. A set of mappings defined with a combination of string and strlist does not exhibit the kind of contradiction described above for keylist and string.
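As a sketch of the expansion described above, strlist can be thought of as producing one single-byte string mapping per position, each handled by the string-mapping pass. The struct kbd_pair1 type and the kbd_strlist function below are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* A one-byte string mapping, stored as two NUL-terminated strings. */
struct kbd_pair1 { char in[2]; char out[2]; };

/* Expand strlist(x y) into string(x[i] y[i]) for every position i.
   Returns the number of mappings produced, or -1 if the arguments
   differ in length or do not fit in max entries. */
int kbd_strlist(const char *x, const char *y, struct kbd_pair1 *out, size_t max)
{
    size_t n = strlen(x);
    if (strlen(y) != n || n > max)
        return -1;
    for (size_t i = 0; i < n; i++) {
        out[i].in[0] = x[i];
        out[i].in[1] = '\0';
        out[i].out[0] = y[i];
        out[i].out[1] = '\0';
    }
    return (int)n;
}
```

Applied to strlist(ace bdf), this yields the three mappings a to b, c to d, and e to f, matching the string forms in the example.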
Some further aspects of module processing can now be described. When a partial match is detected in the input during string processing, it is buffered; if at some point the match no longer succeeds, the first byte of the buffer is normally sent to the neighboring module, and the rest of the buffered input is scanned again to see whether it begins another sequence. The error entry allows one to send a string (or byte) constant, called a fallback character, instead of the byte that began the failed sequence. This is particularly useful in codeset mapping and conversion applications, where the character that failed to be translated might not occur, or might have some other meaning, in the target codeset. The following (somewhat contrived) example illustrates the use of the error form:
# turn arrow keys into vi commands
map (vi_map) {
string("\033[A" k) # up
string("\033[B" j) # down
error("!")
}
Given input of the escape character followed by [A or [B, a single character (k or j, respectively) is generated. Given the sequence escape-[Q, the module produces the sequence ![Q: the error string ! replaces escape because the sequence failed to match when Q was received. The remaining characters are re-scanned, and neither [ nor Q begins a recognized sequence, so both are passed through unchanged.
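The fallback behavior can be modeled in C as follows. This is a hypothetical sketch (kbd_map_err and struct kbd_pair are not part of kbd): it works on a complete buffer, and it treats the end of the input as a mismatch, which is an assumption, since the real module would instead wait for more input. Bytes that do not begin any sequence are passed through unchanged and never replaced by the fallback.

```c
#include <stddef.h>
#include <string.h>

struct kbd_pair { const char *in; const char *out; };

/* Does some input string begin with byte c? */
static int begins_sequence(const struct kbd_pair *p, size_t np, char c)
{
    for (size_t k = 0; k < np; k++)
        if (p[k].in[0] == c)
            return 1;
    return 0;
}

/* String mapping with an error (fallback) string.  When a byte begins a
   sequence but the full sequence does not match, the fallback replaces
   that byte and scanning resumes at the next byte.  A NULL fallback
   passes the byte on unchanged.  out must be large enough. */
void kbd_map_err(const struct kbd_pair *p, size_t np, const char *fallback,
                 const char *in, char *out)
{
    while (*in != '\0') {
        size_t k;
        for (k = 0; k < np; k++)
            if (strncmp(in, p[k].in, strlen(p[k].in)) == 0)
                break;
        if (k < np) {                           /* full match */
            strcpy(out, p[k].out);
            out += strlen(p[k].out);
            in += strlen(p[k].in);
        } else if (fallback != NULL && begins_sequence(p, np, *in)) {
            strcpy(out, fallback);              /* failed partial match */
            out += strlen(fallback);
            in++;
        } else {
            *out++ = *in++;                     /* ordinary byte */
        }
    }
    *out = '\0';
}
```

With the two vi_map sequences and error("!"), the input escape-[A escape-[B yields kj, and escape-[Q yields ![Q, as described above.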
One-one mapping with string or other defined forms (rather than with the keylist lookup table) is generally performed with a linear search when looking for the bytes that begin sequences. If the table is declared full, however, this initial step is indexed rather than searched linearly, so processing is much faster when there are many entries. This should be kept in mind in codeset mapping applications, where nearly all characters are mapped and many (or most) mappings are one-one. If only a few characters are mapped with string functions, the small gain in processing speed from a full table must be weighed against the space needed to store its index.
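The trade-off between sparse and full tables can be illustrated with the following C sketch. It assumes, purely for illustration, that entries sharing a first byte are stored next to each other; the real table layout is not documented here, and kbd_find_linear and kbd_build_index are hypothetical names. The index costs 256 entries of storage regardless of how many mappings the table contains.

```c
#include <stddef.h>

struct kbd_pair { const char *in; const char *out; };

/* Sparse table: scan every entry for one that starts with byte c.
   Cost grows with the number of entries. */
long kbd_find_linear(const struct kbd_pair *p, size_t np, unsigned char c)
{
    for (size_t k = 0; k < np; k++)
        if ((unsigned char)p[k].in[0] == c)
            return (long)k;
    return -1;
}

/* Full table: build, once, a 256-entry index giving the first entry
   that begins with each byte value (or -1).  Each later lookup is a
   single array access. */
void kbd_build_index(const struct kbd_pair *p, size_t np, long index[256])
{
    for (int c = 0; c < 256; c++)
        index[c] = -1;
    for (size_t k = np; k-- > 0; )        /* backwards: first entry wins */
        index[(unsigned char)p[k].in[0]] = (long)k;
}
```

Both functions locate the same entry; the full-table version simply trades the index storage for not having to scan.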
The link form is used to produce a composite table. A composite table
is really a form of linkage that allows several tables to be used
together in sequence as if the sequence were a single table. The
string argument to link is of the following form:
composite:component1,component2,...,componentN
The target composite name is followed by a colon, and the ordered
component list is comma-separated. If the string argument contains spaces or special characters, it must be quoted. (kbdcomp does not interpret this string; it is left intact in the output file and interpreted by the module at runtime.) Using a composite table has an effect similar to pushing several instances of the kbd module, in that the component tables operate in sequence, but everything is done within a single instance of the module. The output of each component table is processed by the next component, and so forth, until the final result emerges from the last one. (Any combination of full and sparse tables may be used in a composite.)
Composite tables simplify complex mapping situations by modularizing the processing, and they make tables more reusable across different mapping applications. Tables that primarily implement codeset mappings may be linked to other tables that primarily implement compose-key or dead-key sequences. Given a single table implementing a common codeset mapping, several different composite tables combining that codeset mapping with various compose-key layouts can be built. A typical configuration uses one table to map from an external to an internal codeset, followed by one or more separate tables that work in the internal codeset to provide compose-key or dead-key functionality. In the following example, the table 646Sp-8859 maps from an ISO 646 variant (Spanish) external codeset to ISO 8859-1; it is combined with two other tables, which implement ISO 8859-1 input by compose sequences and by dead-key sequences, respectively:
link("composed:646Sp-8859,8859-1-cmp")
link("deadkey:646Sp-8859,8859-1-dk")
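Although the link string is interpreted by the module rather than by kbdcomp, its syntax is simple enough to sketch. The hypothetical C function kbd_parse_link below splits a link argument into the composite name and its ordered components, and rejects composites with fewer than two components; the fixed limits KBD_MAXCOMP and KBD_NAMELEN are assumptions of the sketch, not documented limits.

```c
#include <stddef.h>
#include <string.h>

#define KBD_MAXCOMP 8     /* assumed maximum number of components */
#define KBD_NAMELEN 64    /* assumed maximum name length, with NUL */

struct kbd_link {
    char name[KBD_NAMELEN];
    char comp[KBD_MAXCOMP][KBD_NAMELEN];
    int ncomp;
};

/* Parse "composite:component1,component2,...".  Returns 0 on success,
   or -1 on a missing colon, an empty or over-long name, too many
   components, or fewer than two components. */
int kbd_parse_link(const char *s, struct kbd_link *l)
{
    const char *colon = strchr(s, ':');
    if (colon == NULL || colon == s || (size_t)(colon - s) >= KBD_NAMELEN)
        return -1;
    memcpy(l->name, s, (size_t)(colon - s));
    l->name[colon - s] = '\0';
    l->ncomp = 0;
    const char *p = colon + 1;
    for (;;) {
        const char *end = strchr(p, ',');
        size_t len = end ? (size_t)(end - p) : strlen(p);
        if (len == 0 || len >= KBD_NAMELEN || l->ncomp == KBD_MAXCOMP)
            return -1;
        memcpy(l->comp[l->ncomp], p, len);      /* components keep their order */
        l->comp[l->ncomp][len] = '\0';
        l->ncomp++;
        if (end == NULL)
            break;
        p = end + 1;
    }
    return l->ncomp >= 2 ? 0 : -1;
}
```

For the first example, this gives the composite name composed with the components 646Sp-8859 and 8859-1-cmp, in that order.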
Composite tables can also be built from the kbdload command line while the module is running [see kbdload(1M) for details]. The component tables are linked and processed in the order given (left to right). Because the link argument is parsed at runtime by the kbd module, it is not an error to refer to tables that are not contained in the file currently being compiled; it is, however, an error to specify a composite table made up of fewer than two tables. An error is generated when the file is loaded if any component of a link is not present in memory at that time.
The extern form declares an external function managed by the alp module. That module keeps external functions in a list, and they can be used as if they were simple tables in kbd. External functions are not downloaded: they are resident in the kernel and merely accessed by the kbd module [see alp(7) for more information]. Such functions can also be declared dynamically when needed [see kbdload(1M)].
The directive timed may appear any place within a map declaration. If
used, it causes the table in which it appears to be interpreted in
timeout mode. In this mode, string mappings do not match if more than
a specified amount of time elapses after receipt of the first byte of
a sequence and before it is fully received and mapped. Given a timed map in which abc is to be mapped to xyz and a timeout value of 30, if the user types ab and then waits longer than 30 time units before typing c, the sequence is not translated. In this case the sequence is treated as any other mismatch would be: a is passed to the neighboring module, and b is checked to see whether it begins a sequence. The timer is reset when a mismatch occurs, so if bc is also defined and c has just been received, bc is mapped as expected. The default timeout is typically 1/5 to 1/3 of a second [see kbd(7) for details].
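The timing rules above can be modeled in C as follows. In this hypothetical sketch, each byte carries an arrival time in arbitrary units, and kbd_map_timed applies the rules as stated: a sequence must be completed within the timeout measured from its first byte, and after a mismatch caused by a late byte the timer restarts at that byte's arrival. How the real module measures time is not described here, and the struct kbd_pair type is also an assumption.

```c
#include <limits.h>
#include <stddef.h>
#include <string.h>

struct kbd_pair { const char *in; const char *out; };

/* Timed string mapping: s[i] arrives at time t[i].  out must be large
   enough for the result. */
void kbd_map_timed(const struct kbd_pair *p, size_t np, const char *s,
                   const long *t, long timeout, char *out)
{
    size_t n = strlen(s), i = 0;
    long reset = LONG_MIN;              /* time of the last timer reset */
    while (i < n) {
        /* The timer starts at this byte, or at the last reset if later. */
        long start = t[i] > reset ? t[i] : reset;
        int matched = 0;
        for (size_t k = 0; k < np && !matched; k++) {
            size_t len = strlen(p[k].in);
            if (n - i < len || strncmp(s + i, p[k].in, len) != 0)
                continue;               /* bytes do not match this entry */
            size_t j = i;
            while (j < i + len && t[j] - start <= timeout)
                j++;                    /* stop at the first late byte */
            if (j == i + len) {         /* whole sequence arrived in time */
                strcpy(out, p[k].out);
                out += strlen(p[k].out);
                i += len;
                matched = 1;
            } else {
                reset = t[j];           /* timed-out mismatch: reset timer */
            }
        }
        if (!matched)
            *out++ = s[i++];            /* pass the first byte on */
    }
    *out = '\0';
}
```

With abc mapped to xyz and bc mapped to yz, and a timeout of 30, typing abc at times 0, 10, and 20 yields xyz, while typing it at times 0, 10, and 50 yields ayz.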
Timeout mode is generally useful where terminal function keys are being interpreted, to distinguish a string typed by the user from a function key string sent by the terminal. It is not intended for "batch" applications such as the iconv command, nor generally for use in pipelines [see pipe(2)]. In a composite table, some components may be timed and others not, which makes the mode useful for combinations of codeset mapping and function key mapping.
Timing depends on several factors, including the terminal baud rate, system load, and the user's typing speed. If the timeout value is too long, typed sequences that happen to match function key strings are erroneously mapped; if it is too short, function keys may be missed under heavy system load or on low-speed devices. See kbdset(1) for information on how to change the timeout value, and kbd(7) for how an administrator may change the default timeout value. The timed directive should never be used in tables that implement codeset mapping, as it makes the results quite unpredictable. Long timeouts, on the order of seconds, may be useful in some contexts.
Building & Debugging
Users who intend to build their own tables may study the source tables
supplied with the distribution in the directory /usr/lib/kbd.
If characters other than alphanumerics are to be used, quoted strings are preferred to unquoted ones; quotation is required for some characters, as described above. Map names and the first arguments of define should be alphanumeric tokens.
The report generated by the -r option may be useful for debugging complex tables. The report, written to standard error, consists of two lists of octal byte values. One list contains byte values that cannot be generated by the lookup table (if keylist forms are used). The other contains byte values that cannot be generated in any way: values that are used in other sequences but are neither part of any "result text" (the output of string mappings) nor generated by the lookup table (if there is one). The report does not exhaustively list unreachable paths, but it may indicate whether any exist and help pinpoint them.
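The lookup-table list can be computed by inverting the table, as in the following C sketch: a value is unreachable if no input byte maps to it. The function name kbd_unreachable is hypothetical, and the exact output format of the real report is not reproduced; printing each value with the printf format "%03o" would give three-digit octal values.

```c
#include <stddef.h>

/* Collect the byte values that no input byte produces through the
   lookup table (values outside the table's image).  Writes them to
   missing[] in ascending order and returns how many there are. */
size_t kbd_unreachable(const unsigned char table[256], unsigned char missing[256])
{
    unsigned char seen[256] = {0};
    size_t n = 0;
    for (int i = 0; i < 256; i++)
        seen[table[i]] = 1;
    for (int v = 0; v < 256; v++)
        if (!seen[v])
            missing[n++] = (unsigned char)v;
    return n;
}
```

For the contradictory mapping keylist(x y) discussed earlier, the only unreachable value is x, since x now maps to y and nothing maps to x.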
Output Files
The files produced by kbdcomp begin with a header containing the magic string kbd!map and a version number. The header is immediately followed by the tables themselves (a file can contain more than one table). The lines below can be added to the /etc/magic file so that the file(1) command recognizes kbd files.
0 string kbd!map kbd map file
>8 byte >0 Ver %d:
>10 short >0 with %d table(s)
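A program can perform the same check as these /etc/magic entries. The following hypothetical C sketch, kbd_check_header, tests only the fields named above: the magic string at offset 0, a version byte at offset 8, and a table count at offset 10. The count is read as a short in the machine's native byte order, consistent with the machine-dependence noted under LIMITATIONS; the rest of the header is not described and is not assumed.

```c
#include <stddef.h>
#include <string.h>

/* Check the fields the magic entries test.  Returns 0 and fills in
   *version and *ntables if buf looks like a kbd map file, or -1. */
int kbd_check_header(const unsigned char *buf, size_t len,
                     int *version, int *ntables)
{
    short count;
    if (len < 10 + sizeof count || memcmp(buf, "kbd!map", 7) != 0)
        return -1;
    *version = buf[8];                      /* byte at offset 8 */
    memcpy(&count, buf + 10, sizeof count); /* native-order short at 10 */
    *ntables = count;
    return 0;
}
```

The memcpy avoids an unaligned read of the short.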
LIMITATIONS
Input strings are limited to a maximum of 128 bytes and output strings to 256 bytes. The total space consumed by a single table is limited to around 65,000 bytes. Different versions of the output format are strictly incompatible, and compiled ("object") tables are machine-dependent in their byte order and structure sizes. Thus, while source files are portable, the output of kbdcomp is not. This implies that when remote devices are used across a network between heterogeneous machines, tables must be loaded on the machine where the module is actually pushed (that is, the remote side).
FILES
/usr/lib/kbd directory containing system standard map files
/usr/lib/kbd/*.map source for some system map files
SEE ALSO
iconv(1), kbdset(1), kbdload(1M), alp(7), kbd(7)