KBDCOMP(1M) RISC/os Reference Manual KBDCOMP(1M)
NAME
kbdcomp - compile kbd tables
SYNOPSIS
kbdcomp [-vrR] [-o outfile ] [ infile ]
DESCRIPTION
kbdcomp compiles tables for use with the kbdstrm(7) STREAMS
module, a programmable string-translation module. The module
has two separate abilities, each of which may be used alone
or in combination.
The first ability, lookup, is that of performing simple sub-
stitution of bytes in an input Stream. This ability is
based on a simple 256-entry lookup table (as there are 256
possible bit combinations for a byte). As input is received,
each byte is looked up in the translation table, and the
table value for that byte is substituted in place of the
original byte. The process is quick, and can be performed
on each STREAMS message with no message copying or duplica-
tion.
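The lookup pass can be pictured with a short Python sketch. It is an illustration of the technique only, not the module's code; the function name lookup_pass and the two table entries are assumptions made for the example.

```python
# Illustrative sketch only; not the kbdstrm implementation.
table = bytearray(range(256))        # identity: every byte maps to itself
table[ord("z")] = ord("y")           # hypothetical entries: swap y and z
table[ord("y")] = ord("z")

def lookup_pass(message, table):
    """Replace each byte of `message` (a bytearray) in place via the table."""
    for i, b in enumerate(message):
        message[i] = table[b]
    return message

msg = bytearray(b"lazy")
lookup_pass(msg, table)              # msg is modified in place, no copy made
```

Because the table has one entry per possible byte value, each substitution is a single index operation and the message length never changes.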
The second ability, mapping, allows searching for
occurrences of specified strings of bytes (or individual
bytes) in an input Stream, and substituting other strings
(or bytes) for them as they are recognized. There are three
kinds of mapping that are differentiated by the relationship
between the number of bytes in the input and the number of
bytes in the output. One-many mapping means that for a given
byte in the input, many bytes are substituted. Many-one mapping
means that for many bytes in the input, one byte is substituted.
Many-many mapping includes the other two types as
a proper subset, but also includes substitution of many
bytes in the input with many bytes of output. kbdstrm can
perform all three types of mapping. The lookup ability
described in the previous paragraph (i.e., what amounts to
one-one mapping) is a common special case useful enough to
be included separately. By using combinations of both lookup
and mapping, a larger class of input translation and conver-
sion problems can be solved than can be solved by the use of
either alone.
During operation, processing occurs in two major passes: the
lookup table pass always precedes string mapping. The
string mapping procedure is nonrecursive for a given table
and there is no feedback mechanism (that is, input is
scanned in order as received and output is not re-scanned
for occurrences of recognizable input strings). As an exam-
ple of mapping, suppose one wishes to translate all
occurrences of the string this in an input Stream into the
string there. The module recognizes and buffers occurrences
of the string th (as each byte is received); if the
Printed 11/19/92 Page 1
following character is i, it will also be buffered, but if x
is then received, a mismatch is recognized and no transla-
tion occurs. Assuming thi has been buffered, if the next
character seen is s, a match is recognized, the buffer con-
taining this is discarded, and the string there replaces it.
Both input and output strings can
be of any non-zero length (see, however, the LIMITATIONS section
below). Each string to be recognized and translated
must be unique, and no complete input string may constitute
the leading substring of any other (e.g., one may not define
abc and ab simultaneously, but may so define abc, abd, and
abxy).
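The leading-substring rule can be checked mechanically. The following Python sketch is one possible way to do so; the function name prefix_conflicts is hypothetical and is not part of kbdcomp. Duplicate strings are collapsed rather than reported in this sketch.

```python
def prefix_conflicts(inputs):
    """Return (short, long) pairs where `short` is a leading substring of `long`."""
    ordered = sorted(set(inputs))
    conflicts = []
    for i, short in enumerate(ordered):
        for longer in ordered[i + 1:]:
            if not longer.startswith(short):
                break    # in sorted order no later string can start with `short`
            conflicts.append((short, longer))
    return conflicts

bad = prefix_conflicts([b"abc", b"ab"])            # ab is a prefix of abc
good = prefix_conflicts([b"abc", b"abd", b"abxy"])  # no conflicts
```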
Given a filename (or stdin if no name is supplied), kbdcomp
will compile tables into the output file specified by the -o
option. If the -o option is not supplied, output is to the
file kbd.out.
The -v option causes parsing and verification only, and no output
file is produced; if no error messages are printed, then the
input file is syntactically correct. The -r option causes
the compiler to check for and report on byte values that
cannot be generated in a table (see the description below).
The option -R is equivalent to -r but it tries to print
printable characters as themselves rather than in octal for-
mat.
Input Language
Source files for kbdcomp are a series of table declarations.
Within each table declaration are a number of definitions
and functions. A table declaration is one of the forms map
or link:
map type ( name ) { expressions }
link ( string )
The link form is described below. The name of a
map must be a simple token not containing any colons, com-
mas, quotes, or spaces. (For our purposes, a simple token is
a sequence of alphabetic and/or numeric characters with no
embedded punctuation, whitespace, or special symbols.) The
type field is an optional field that may be either of the
keywords full or sparse. If omitted, the type defaults to
sparse. The effect of this field is described in more
detail below. The expressions contained in the map declara-
tion are one of the following forms. Reserved keywords are
printed in constant-width font, variables in italics:
keylist ( string string )
define ( word value )
word ( extension result )
string ( word word )
strlist ( string string )
error ( string )
timed
The keylist form is for defining lookup table entries while
the remaining forms are the separate string functions.
The definition form (define) allows a mnemonic word (the
first argument) to be associated with a string (the second
argument). It is useful for replacing complicated sequences
(e.g., those containing special symbols or control charac-
ters) with mnemonic words to facilitate the design and rea-
dability of tables.
Using the word form (where word must be a previously defined
sequence) in a manner similar to a C function call results
in the value of word being concatenated with extension; when
the combination is recognized, it is mapped to result. The
value may be a string of characters or a single byte. The
following is an illustration (not intended to be complete):
map (some_accents) {
        define(acute '\047')
        define(grave '`')
        acute(a '\341')    # same as string("\047a" "\341")
        grave(a '\340')
        # ...et cetera ...
        keylist("zyZY" "yzYZ")
}
This map defines the single quote and reverse quote keys as
dead-keys which, when followed by the letter a, produce a character
from the ISO 8859-1 codeset (a with acute accent and a with grave
accent, respectively). It is not necessary for the definition,
extension, or result to be a single byte; they may be
arbitrary strings.
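The way a defined word expands into an ordinary string mapping can be sketched in Python. This is a model of the description above, not compiler code; the helper expand_word and the dictionary layout are assumptions made for illustration.

```python
def expand_word(defines, word, extension, result):
    """Model word(extension result): the value of `word` is concatenated
    with `extension`, and the combination maps to `result`."""
    return (defines[word] + extension, result)

# define(acute '\047') and define(grave '`') from the sample map
defines = {"acute": b"\047", "grave": b"`"}

mappings = [
    expand_word(defines, "acute", b"a", b"\341"),   # acute(a '\341')
    expand_word(defines, "grave", b"a", b"\340"),   # grave(a '\340')
]
```

Each resulting pair is what the equivalent string form would express directly, for example string("\047a" "\341") for the first entry.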
Strings in definitions and arguments may generally be
entered either without quotation or between double quotes.
Byte constants may likewise be entered unquoted or between
single quotes. The only time quotation is strictly required
is when the string contains parentheses, spaces, tab charac-
ters, or other special symbols. The language makes no real
distinction between byte constants and string constants:
both are treated as null-terminated strings; the choice of
whether to use a one-character string or a byte constant is
thus a matter of taste. Most quoting conventions of C are
recognized, except that octal constants must be exactly
three digits long. Octal constants may be used in strings as
well. In the example above, the arguments to keylist need
not be quoted, as they contain no special symbols. The fol-
lowing example illustrates some situations where strings
must be quoted:
string(abc "two words") # literal space
keylist("[{}]" "(())") # brackets/parenthesis
define(esc_seq "\033\t(") # tab and parenthesis
define(space ' ') # literal space
string(abc "keylist") # keyword used as argument
Comments in files (inside or outside of map declarations)
may be entered in the same manner as for sh(1); that is,
after a # at the end of a line, or on a line beginning with
#, as shown in the above examples.
The keylist form allows
single bytes to be mapped to other single bytes; it defines
actions that are treated in the lookup table (i.e., are per-
formed before mapping). Any byte value that is not expli-
citly changed by being included in a keylist form will, of
course, be left unchanged; if no keylist forms appear in a
map definition, then kbdcomp does not generate a lookup
table for the map, and the lookup phase is skipped during
module operation. Each byte in the first string argument to
keylist is mapped to the byte at the same position in the
second string argument. That is, given two strings X and Y
as arguments: X[1] maps to Y[1]; X[j] maps to Y[j], and so forth.
The two arguments must evaluate to strings containing the same
number of bytes.
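The positional pairing performed by keylist can be modeled as follows. This is a Python sketch under the assumption that the lookup table is a plain 256-entry array; the function name keylist is borrowed from the input language for readability and does not correspond to a real API.

```python
def keylist(table, src, dst):
    """Map each byte of `src` to the byte at the same position in `dst`."""
    if len(src) != len(dst):
        raise ValueError("keylist arguments must contain the same number of bytes")
    for s, d in zip(src, dst):
        table[s] = d

table = bytearray(range(256))          # bytes not mentioned stay unchanged
keylist(table, b"zyZY", b"yzYZ")       # keylist("zyZY" "yzYZ")
```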
The string form has a function similar to mnemonic forms
defined with define and may be used for any type of many-
many mapping. The first argument to string is mapped to the
second argument (see the comment in the sample map above).
Mappings using both keylist and string or any define forms
may be combined: if i is mapped to a with a keylist form,
and a is used in the sequence ` a, then when the user types
` i, the sequence ` a is seen by the string mapping process
(because lookup is done first) and translated accordingly.
The keylist form is intended mainly for use in simple key-
board rearrangement and case-conversion applications; string
is for one-many mapping or for isolated instances of many-
many mapping; the define form and words defined with it are
intended for more general use in groups of related
sequences. In some situations, while a one-one mapping with
keylist may be an obvious choice, the same effect may be
achieved with string forms to avoid having a contradictory
mapping. For example, suppose one desires, simultaneously,
to translate x into y and y into abc. If x is mapped to y
via a keylist form and y is mapped to abc via a string form,
then it may be impossible to obtain y itself (unless defined
in another sequence), even though that was not the intention
- the intention was to obtain y whenever the user enters x.
This is a contradictory mapping:
keylist(x y)
string(y abc) # "y" itself cannot be generated
There are cases where the intention is that y not be gen-
erated, but most often the intention is to generate it.
This problem (a relatively common one in codeset mapping)
can be "solved" by using a string form to map x to y ini-
tially rather than using a keylist form. This allows both y
and abc to be generated:
string(x y)
string(y abc)
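The difference between the two arrangements can be traced with a small Python model of the two passes. It is a sketch only: the helper map_single_bytes is hypothetical and handles just the one-byte input strings used in this example.

```python
def map_single_bytes(data, mappings):
    """String-mapping pass for one-byte input strings.
    Output is appended as-is and never re-scanned."""
    out = bytearray()
    for b in data:
        key = bytes([b])
        out += mappings.get(key, key)
    return bytes(out)

# keylist(x y) + string(y abc): the lookup pass runs first, so every
# x has already become y before string mapping sees it.
x_to_y = bytes.maketrans(b"x", b"y")
contradictory = map_single_bytes(b"xy".translate(x_to_y), {b"y": b"abc"})

# string(x y) + string(y abc): no lookup table, and the y produced
# from x is not re-scanned.
resolved = map_single_bytes(b"xy", {b"x": b"y", b"y": b"abc"})
```

In the first case the input xy yields abcabc and y can never appear in the output; in the second it yields yabc.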
Entering a large number of one-one mappings with string can
be somewhat tedious. To make things easier, the strlist form
is provided. The two string arguments to strlist are inter-
preted in the same manner as arguments to keylist (i.e.,
they are one-one mappings), except that they are not done by
the lookup table, but are processed as string mappings. In
the following example, the first three string definitions
can be reduced to the strlist form which follows:
string(a b)
string(c d)
string(e f)
strlist(ace bdf)
It is important to recognize the difference between string
and strlist: with string, the two arguments are a single
mapping definition (which may be of any type) whereas with
strlist, one or more one-one string mappings are defined
simultaneously. A set of mappings defined with a combination
of string and strlist does not exhibit the same type of
incompatibility described above between keylist and string.
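The expansion of strlist into individual one-one string mappings can be sketched in Python. The function below is a model, not compiler code, and the equal-length check is an assumption carried over from the keylist rule.

```python
def strlist(src, dst):
    """Expand strlist(src dst) into a list of one-byte string mappings."""
    if len(src) != len(dst):
        raise ValueError("strlist arguments must contain the same number of bytes")
    return [(bytes([s]), bytes([d])) for s, d in zip(src, dst)]

pairs = strlist(b"ace", b"bdf")    # strlist(ace bdf)
```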
Some further aspects of module processing can now be
presented. When a partial match in an input sequence is
detected during string processing, it is buffered; if at
some point the match no longer succeeds, the first byte of
the matched buffer is normally sent to the neighboring
module. The rest of the input is left in the buffer and
scanned again to see if it matches the beginning of another
sequence. The error entry allows one to send a string (or
byte) constant (called a fallback character) instead of the
byte that began the previous sequence; this is particularly
useful in codeset mapping and conversion applications where
the character which failed to be translated might be one
which does not occur or has some other meaning in the target
codeset. The following (somewhat contrived) example illus-
trates use of the error form:
# turn arrow keys into vi commands
map (vi_map) {
        string("\033[A" k)    # up
        string("\033[B" j)    # down
        error("!")
}
Given input of the escape character followed by [A or [B, a
single character (k or j, respectively) is generated. If presented with
the sequence escape-[Q, the module will produce the sequence
![Q. The error string ! replaces escape because the sequence
failed to match when Q was received. The remaining charac-
ters are re-scanned, and neither [ nor Q is found to begin a
recognized sequence.
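The buffering, mismatch, and fallback behavior described above can be modeled in Python. This is a batch sketch of a streaming process, not the module's algorithm; the name map_strings is hypothetical, and flushing an unfinished partial match unchanged at end of input is an assumption (the real module would keep it buffered).

```python
def map_strings(data, mappings, fallback=None):
    """Scan `data` once; `mappings` is {input bytes: output bytes}."""
    out = bytearray()
    i = 0
    while i < len(data):
        n = 0                                   # length of the buffered partial match
        replaced = False
        while i + n < len(data):
            chunk = data[i:i + n + 1]
            if chunk in mappings:               # complete match: substitute
                out += mappings[chunk]
                i += n + 1
                replaced = True
                break
            if not any(k.startswith(chunk) for k in mappings):
                break                           # mismatch
            n += 1                              # still a partial match: buffer it
        if replaced:
            continue
        if n > 0 and i + n < len(data) and fallback is not None:
            out += fallback                     # replaces the byte that began the sequence
        else:
            out.append(data[i])                 # pass the first byte through
        i += 1                                  # the rest is scanned again
    return bytes(out)

vi_map = {b"\033[A": b"k", b"\033[B": b"j"}
up = map_strings(b"\033[A", vi_map, b"!")
failed = map_strings(b"\033[Q", vi_map, b"!")
```

With the vi_map table, escape-[A produces k, and escape-[Q produces ![Q because neither [ nor Q begins a sequence when re-scanned.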
One-one mapping with strings or other defined forms (rather
than via a keylist lookup table) is generally performed with
a linear search operation when looking for bytes which begin
sequences. However, if the table is specified as a full
table, it is initially indexed rather than searched
linearly, and thus processed much more quickly when there
are a large number of entries. This should be kept in mind
in codeset mapping applications where nearly all characters
are mapped, and many (or most) are one-one mappings. If only
a very few characters are mapped with string functions, one
must decide on whether to trade a small gain in processing
speed for the space needed to store the index if a table is
made full.
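The trade-off between sparse and full tables can be illustrated with a Python sketch. The internal layout of compiled tables is not described here, so the list and index structures below are assumptions chosen only to contrast a linear search with an indexed one.

```python
def candidates_sparse(entries, first_byte):
    """Linear search over every entry, as for a sparse table."""
    return [e for e in entries if e[0][0] == first_byte]

def build_index(entries):
    """256-slot index keyed by the first input byte, as for a full table."""
    index = [[] for _ in range(256)]
    for entry in entries:
        index[entry[0][0]].append(entry)
    return index

entries = [(b"a", b"b"), (b"ca", b"d"), (b"cx", b"y")]
index = build_index(entries)       # costs 256 slots even for three entries
```

The index answers "which sequences begin with this byte" in one step, at the cost of space for all 256 slots.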
The link form is used to produce a composite table. A com-
posite table is really a form of linkage that allows several
tables to be used together in sequence as if the sequence
were a single table. The string argument to link is of the
following form:
composite:component1,component2,componentn
The target composite name is followed by a colon, and the
ordered component list is comma-separated. If the string
argument contains spaces or special characters, it must be
quoted. (This string is not interpreted by kbdcomp, but is
left intact in the output file; it is interpreted by the
module at run time.) When a composite table is used, the
effect is similar to pushing more than one instance of the
kbdstrm(7) module in the sense that the component tables
function sequentially but it is accomplished within a single
instance of the module. As output is produced by processing
with one table in the composite, the data is subsequently
processed by the next component and so forth until the final
result emerges at the end of the sequence. (There is no res-
triction on the use of any combination of full and sparse
tables in a composite.)
Composite tables are useful for simplifying complex mapping
situations by modularizing the processing and for increasing
the re-usability of tables or different mapping applica-
tions. Tables primarily implementing codeset mappings may be
linked to other tables primarily implementing compose- or
dead-key sequences. With a single table implementing a com-
mon codeset mapping, several different tables implementing
combinations of codeset mapping and compose-key layouts may
be built. A typical configuration might use one table for mapping
from an external to internal codeset, then use one or
more separate tables working in the internal codeset to pro-
vide compose- or dead-key functionality, as in the following
example. One table, 646Sp-8859, maps from an ISO 646 variant
(Spanish) external codeset to ISO 8859-1; this is combined
with two other tables respectively implementing 8859-1 by
compose sequences, and by dead-key sequences:
link("composed:646Sp-8859,8859-1-cmp")
link("deadkey:646Sp-8859,8859-1-dk")
Composite tables can also be built while the module is run-
ning from the kbdload(1M) command line; details are in the
kbdload(1M) manual page. The component tables are linked and
processed in the given order (left-to-right). Because the
link argument is actually parsed at runtime (by kbdstrm(7)),
it is not an error to refer to tables that are not contained
in the file currently being compiled. An error will be gen-
erated when the file is loaded if any component of a link is
not present in memory at that time.
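The handling of a link string can be sketched in Python: the string is split into a composite name and an ordered component list, each component must already be loaded, and data then flows through the components left to right. The function names and the use of callables as stand-ins for tables are assumptions for illustration, not the kbdstrm(7) interface.

```python
def parse_link(spec):
    """Split "composite:comp1,comp2" into (name, [components])."""
    name, sep, rest = spec.partition(":")
    components = rest.split(",")
    if not sep or not name or not all(components):
        raise ValueError("expected composite:component1,component2,...")
    return name, components

def build_composite(spec, loaded):
    """`loaded` maps table names to callables taking and returning bytes."""
    name, components = parse_link(spec)
    missing = [c for c in components if c not in loaded]
    if missing:
        raise LookupError("tables not loaded: " + ", ".join(missing))
    stages = [loaded[c] for c in components]

    def composite(data):
        for stage in stages:           # processed in the given order
            data = stage(data)
        return data
    return name, composite

loaded = {                             # stand-ins for real tables
    "upper": lambda d: d.upper(),
    "exclaim": lambda d: d.replace(b"HI", b"HI!"),
}
name, table = build_composite("shout:upper,exclaim", loaded)
result = table(b"hi")
```

Reversing the component order changes the result, since each stage sees only the output of the one before it.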
The directive timed may appear any place within a map
declaration. If used, it causes the table within which it is
defined to be interpreted in timeout mode. In this mode,
string mappings are considered to not match if more than a
certain amount of time elapses after receipt of the first
byte of a sequence without its being fully received
and mapped. Given a timed map in which abc is to be mapped
to xyz and the timeout value is 30, if the user types ab,
then waits for longer than 30 time units before typing c,
the entire sequence will not be translated. In this case the
sequence is treated as any other mismatch would be: a is
passed to the neighboring module, and b is checked to see if
it begins a sequence. The timer is reset when a mismatch
occurs, so that if bc is defined in this situation and c has
just been received, it will be mapped as expected. The
default timeout is typically 1/5 to 1/3 of a second (see
kbdstrm(7) for details).
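The timeout test itself amounts to comparing elapsed time since the first byte of a sequence with the timeout value. A minimal Python sketch, assuming arrival times are available in the same units as the timeout (the function name sequence_expired is hypothetical):

```python
def sequence_expired(arrivals, timeout):
    """`arrivals` holds the arrival time of each byte of a candidate
    sequence, first byte first. True means treat it as a mismatch."""
    return arrivals[-1] - arrivals[0] > timeout

# abc typed with a long pause before c, timeout value 30
slow = sequence_expired([0, 10, 45], 30)
# abc typed promptly
fast = sequence_expired([0, 10, 20], 30)
```

In the slow case the sequence is handled like any other mismatch: a is passed on and b is checked to see whether it begins a sequence.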
Timeout mode is generally useful in situations where termi-
nal function keys are being interpreted, to distinguish
between a string typed by the user and a function key string
sent by the terminal; it is not intended for use with
"batch" applications such as kbdpipe(1M). In a composite
table, some components may be timed and some not, making the
mode useful for combinations of codeset mapping and function
key mapping.
Timing depends on several factors, including terminal baud-
rate, system load, and the user's typing speed. If the
timeout value is too long, then typed sequences that happen
to be the same as function keys will be erroneously mapped;
if the value is too short, then function keys may be missed
under a heavy system load or with low speed devices. See
kbdset(1) for information on how to change the timeout
value, and kbdstrm(7) for information on how an administra-
tor may change the default timeout value. This directive
should never be used in tables that implement codeset map-
ping, as it makes the results quite unpredictable. Long
timeouts, on the order of seconds, may be useful in some
contexts.
Building and Debugging
Users who intend to build their own tables may study the
source tables supplied with the distribution in the files
/usr/lib/kbd/*.map.
If characters other than alpha-numerics are to be used,
quoted strings are preferred to unquoted strings; quotation
is required for some characters, as mentioned above. Map
names and the first arguments of define should be alpha-
numeric tokens.
The report generated by the -r option may be useful for
debugging complex tables. The report (produced on stderr)
consists of two octal lists. One list contains byte values
that cannot be generated from the lookup table (if keylist
forms are used). The other list contains byte values that
cannot be generated in any way; in other words, values that
are neither parts of ``result text'' (i.e., products of
string mappings) nor generated by the lookup table (if there
is one), but that are used in other sequences. The report
does not exhaustively list unreachable paths, but may indi-
cate whether they exist and help pinpoint them.
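One reading of the two lists in the report can be expressed as a Python sketch. This is an interpretation of the description above, not the compiler's actual analysis; the function name report and the data layout are assumptions.

```python
def report(table, mappings):
    """Return (bytes the lookup table cannot produce,
               bytes used in input sequences that nothing can produce)."""
    lookup_out = set(table)                        # values the lookup pass can emit
    result_bytes = {b for out in mappings.values() for b in out}
    input_bytes = {b for inp in mappings for b in inp}
    not_from_lookup = sorted(set(range(256)) - lookup_out)
    unreachable = sorted(input_bytes - lookup_out - result_bytes)
    return not_from_lookup, unreachable

table = bytes.maketrans(b"x", b"y")                # models keylist(x y)
mappings = {b"xq": b"z"}                           # models string(xq z)
first, second = report(table, mappings)
octal = [format(b, "03o") for b in second]         # the report prints octal
```

Here x appears in both lists: the lookup table always turns it into y, so the sequence xq can never be recognized.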
Output Files
The files produced by kbdcomp begin with a header. The magic
string is kbd!map, with a version number. This header is
immediately followed by the tables themselves. (A file can
contain more than one table.) The lines below can be added
to the /etc/magic file for the file(1) command to recognize
kbdstrm files.
0       string  kbd!map kbd map file
>8      byte    >0      Ver %d:
>10     short   >0      with %d table(s)
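The same test can be made from a program. The Python sketch below relies only on the offsets given in the magic entries above (magic string at 0, version byte at 8, table count as a short at 10); the rest of the header layout is not described here and is ignored. Native byte order is assumed, since output files are machine-dependent, and the function name describe is hypothetical.

```python
import struct

MAGIC = b"kbd!map"

def describe(header):
    """Return (version, table_count) or None if this is not a kbd map file."""
    if len(header) < 12 or not header.startswith(MAGIC):
        return None
    version = header[8]                               # byte at offset 8
    (tables,) = struct.unpack_from("=h", header, 10)  # short at offset 10, native order
    return version, tables

# A made-up 12-byte header for demonstration; padding bytes are arbitrary.
sample = MAGIC + b"\0" + bytes([2]) + b"\0" + struct.pack("=h", 3)
info = describe(sample)
```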
LIMITATIONS
A maximum length of 128 bytes for input strings and 256
bytes for output strings is imposed. The total amount of
space consumed by a single table is limited to around 65,000
bytes. Versions are strictly incompatible; "object" tables
are machine-dependent in their byte order and structure
size. Thus, while source files are portable, the output of
kbdcomp is not. This implies that when using remote devices
across a network between heterogeneous machines, tables must
be loaded on the machine where the module is actually pushed
(i.e., the remote site).
FILES
/usr/lib/kbd          directory containing system standard map files.
/usr/lib/kbd/*.map    sources for kbd files.
SEE ALSO
kbdload(1M), kbdset(1), kbdstrm(7)