kbdcomp(1M)                      DG/UX R4.11                     kbdcomp(1M)


NAME
       kbdcomp - compile att_kbd tables

SYNOPSIS
       kbdcomp [-vrR] [-o outfile] [infile]

DESCRIPTION
       kbdcomp compiles tables for use with the attkbd STREAMS module, a
       programmable string-translation module.  The module has two separate
       abilities, each of which may be used alone or in combination.

       The first ability, lookup, is that of performing simple substitution
       of bytes in an input stream.  This ability is based on a simple
       256-entry lookup table (as there are 256 possible bit combinations
       for a byte).  As input is received, each byte is looked up in the
       translation table, and the table value for that byte is substituted
       in place of the original byte.  The process is quick, and can be
       performed on each STREAMS message with no message copying or
       duplication.

       The second ability, mapping, allows searching for occurrences of
       specified strings of bytes (or individual bytes) in an input stream,
       and substituting other strings (or bytes) for them as they are
       recognized.  There are three kinds of mapping that are differentiated
       by the relationship between the number of bytes in the input and the
       number of bytes in the output.  One-many mapping means that for a
       given byte in the input, many bytes are substituted.  Many-one
       mapping means that for many bytes in the input one byte is
       substituted.  Many-many mapping includes the other two types as a
       proper subset, but also includes substitution of many bytes in the
       input with many bytes of output.  attkbd can perform all three types
       of mapping.  The lookup ability described in the previous paragraph
       (that is, what amounts to one-one mapping) is a common special case
       useful enough to be included separately.  By using combinations of
       both lookup and mapping, a larger class of input translation and
       conversion problems can be solved than can be solved by the use of
       either alone.

       During operation, processing occurs in two major passes:  the lookup
       table pass always precedes string mapping.  The string mapping
       procedure is non-recursive for a given table and there is no feedback
       mechanism (that is, input is scanned in order as received and output
       is not re-scanned for occurrences of recognizable input strings).  As
       an example of mapping, suppose one wishes to translate all
       occurrences of the string this in an input stream into the string
       there.  The module recognizes and buffers occurrences of the string
       th (as each byte is received); if the following character is i, it
       will also be buffered, but if x is then received, a mismatch is
       recognized and no translation occurs.  Assuming thi has been
       buffered, if the next character seen is s, a match is recognized, the
       buffer containing this is discarded, and the string there replaces
       it.
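
       The this/there example can be approximated in Python.  This is a
       minimal sketch, not the module's implementation: it works on a
       complete byte string, so byte-at-a-time buffering is replaced by a
       look-ahead at the current position.  The name map_strings is
       hypothetical.

```python
def map_strings(data, mappings):
    """Replace each occurrence of a mapped input string in `data`.

    `mappings` is a dict of bytes -> bytes.  Output is appended to `out`
    and never scanned again, so the mapping is non-recursive.
    """
    out = bytearray()
    i = 0
    while i < len(data):
        for src, dst in mappings.items():
            if data.startswith(src, i):   # full match at position i
                out += dst
                i += len(src)
                break
        else:
            out.append(data[i])           # no match: pass the byte through
            i += 1
    return bytes(out)


print(map_strings(b"is this thin", {b"this": b"there"}))  # b'is there thin'
```

       Because the result text is not re-scanned, a mapping such as a to aa
       terminates instead of expanding forever.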

       Both input and output strings can be of any non-zero length (see,
       however, the LIMITATIONS section below).
       Each string to be recognized and translated must be unique, and no
       complete input string may constitute the leading substring of any
       other (for example, one may not define abc and ab simultaneously, but
       may so define abc, abd, and abxy).
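
       The leading-substring rule can be checked mechanically.  The helper
       below is a hypothetical illustration in Python, not the check that
       kbdcomp itself performs; it reports every pair in which one input
       string begins another.

```python
def prefix_conflicts(inputs):
    """Return (shorter, longer) pairs where `shorter` begins `longer`."""
    return [(a, b) for a in inputs for b in inputs
            if a != b and b.startswith(a)]


print(prefix_conflicts([b"abc", b"ab"]))            # [(b'ab', b'abc')]
print(prefix_conflicts([b"abc", b"abd", b"abxy"]))  # []
```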

       Given a filename (or standard input if no name is supplied), kbdcomp
       will compile tables into the output file specified by the -o option.
       If the -o option is not supplied, output is to the file kbd.out.

       The -v option causes parsing and verification--no output file is
       produced; if no error messages are printed, then the input file is
       syntactically correct.  The -r option causes the compiler to check
       for and report on byte values that cannot be generated in a table
       (see the description below).  The option -R is equivalent to -r but
       it tries to print printable characters as themselves rather than in
       octal format.

   Input Language
       Source files for kbdcomp are a series of table declarations.  Within
       each table declaration are a number of definitions and functions.  A
       table declaration is one of the forms map, link, or extern:
            map type ( name ) { expressions }
            link ( string )
            extern ( string )

       The link and extern forms are described below.  The name of
       a map must be a simple token not containing any colons, commas,
       quotes, or spaces.  (For our purposes, a simple token is a sequence
       of alphabetic and/or numeric characters with no embedded punctuation,
       white space, or special symbols.)  The type field is an optional
       field that may be either of the keywords full or sparse.  If omitted,
       the type defaults to sparse.  The effect of this field is described
       in more detail below.  The expressions contained in the map
       declaration are one of the following forms.  Reserved keywords are
       printed in constant-width font, variables in italics:
            keylist ( string string )
            define ( word  value )
            word ( extension  result )
            string ( word word )
            strlist ( string string )
            error ( string )
            timed
       The keylist form is for defining lookup table entries while the
       remaining forms are the separate string functions.

       The definition form (define) allows a mnemonic word (the first
       argument) to be associated with a string (the second argument).  It
       is useful for replacing complicated sequences (for example, those
       containing special symbols or control characters) with mnemonic words
       to facilitate the design and readability of tables.

       Using the word form (where word must be a previously defined
       sequence) in a manner similar to a C function call results in the
       value of word being concatenated with extension; when the combination
       is recognized at runtime, it is mapped to result.  The value may be a
       string of characters or a single byte.  The following is an
       illustration (not intended to be complete):
            map (someaccents) {
                 define(acute '\047')
                 define(grave '`' )
                 acute(a '\341')       # same as string("\047a" "\341")
                 grave(a '\340')
                 # ...et cetera...
                 keylist("zyZY" "yzYZ")
            }

       This map defines the single quote and reverse quote keys as
       dead-keys which, when followed by a, produce a character from the
       ISO 8859-1 codeset.  It is not necessary for the definition,
       extension, or result to be a single byte; they may be arbitrary
       strings.

       Strings in definitions and arguments may generally be entered either
       without quotation or between double quotes.  Byte constants may
       likewise be entered unquoted or between single quotes.  The only time
       quotation is strictly required is when the string contains
       parentheses, spaces, tab characters, or other special symbols.  The
       language makes no real distinction between byte constants and string
       constants: both are treated as null-terminated strings; the choice of
       whether to use a one-character string or a byte constant is thus a
       matter of taste.  Most quoting conventions of C are recognized,
       except that octal constants must be exactly three digits long.  Octal
       constants may be used in strings as well.  In the example above, the
       arguments to keylist need not be quoted, as they contain no special
       symbols.  The following example illustrates some situations where
       strings must be quoted:
            string(abc "two words")         # literal space
            keylist("[{}]" "(())")          # brackets/parentheses
            define(escseq "\033\t(") # tab and parenthesis
            define(space ' ')              # literal space
            string(abc "keylist")           # keyword used as argument

       Comments in files (inside or outside of map declarations) may be
       entered in the same manner as for sh(1); that is, after a # at the
       end of a line, or on a line beginning with #, as shown in the above
       examples.

       The keylist form allows single bytes to be mapped to other single
       bytes; it defines actions that are treated in the lookup table (that
       is, are performed before mapping).  Any byte value that is not
       explicitly changed by being included in a keylist form will, of
       course, be left unchanged; if no keylist forms appear in a map
       definition, then kbdcomp does not generate a lookup table for the
       map, and the lookup phase is skipped during module operation.  Each
       byte in the first string argument to keylist is mapped to the byte at
       the same position in the second string argument.  That is, given two
       strings X and Y as arguments, the byte at position i of X maps to
       the byte at position i of Y, for every position i.  The two
       arguments must, after evaluation, be found to contain the same
       number of bytes.
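
       The effect of a keylist form can be imitated with Python's standard
       bytes.maketrans and bytes.translate, which also use a 256-entry
       table.  This is only a sketch of the lookup pass; the function name
       keylist is borrowed from the input language for illustration.

```python
def keylist(src, dst):
    """Build a 256-entry lookup table mapping src[i] to dst[i]."""
    if len(src) != len(dst):
        raise ValueError("keylist arguments must have the same length")
    return bytes.maketrans(src, dst)   # unlisted bytes map to themselves


table = keylist(b"zyZY", b"yzYZ")      # the swap from the sample map
print(len(table))                      # 256
print(b"Lazy Yak".translate(table))    # b'Layz Zak'
```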

       The string form has a function similar to mnemonic forms defined with
       define and may be used for any type of many-many mapping.  The first
       argument to string is mapped to the second argument (see the comment
       in the sample map above).

       Mappings using both keylist and string or any define forms may be
       combined: if i is mapped to a with a keylist form, and a is used in
       the sequence `a, then when the user types `i, the sequence `a is seen
       by the string mapping process (because lookup is done first) and
       translated accordingly.

       The keylist form is intended mainly for use in simple keyboard
       rearrangement and case-conversion applications; string is for one-many
       mapping or for isolated instances of many-many mapping; the define
       form and words defined with it are intended for more general use in
       groups of related sequences.  In some situations while a one-one
       mapping with keylist may be an obvious choice, the same effect may be
       achieved with string forms to avoid having a contradictory mapping.
       For example, suppose one desires, simultaneously, to translate x into
       y and y into abc.  If x is mapped to y via a keylist form and y is
       mapped to abc via a string form, then it may be impossible to obtain
       y itself (unless defined in another sequence), even though that was
       not the intention--the intention was to obtain y whenever the user
       enters x.  This is a contradictory mapping:
            keylist(x y)
            string(y abc)      # "y" itself cannot be generated

       There are cases where the intention is that y not be generated, but
       most often the intention is to generate it.  This problem (a
       relatively common one in codeset mapping) can be ``solved'' by using
       a string form to map x to y initially rather than using a keylist
       form.  This allows both y and abc to be generated:
            string(x y)
            string(y abc)
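
       The difference between the two tables can be traced with a small
       Python model of the two passes.  It is a sketch under simplifying
       assumptions: every string mapping here has a one-byte input, so the
       mapping pass reduces to a per-byte dictionary lookup.  The helper
       names are hypothetical.

```python
def lookup(data, src, dst):
    """Lookup pass: byte-for-byte substitution, as with keylist."""
    return data.translate(bytes.maketrans(src, dst))


def map_bytes(data, strings):
    """Mapping pass for one-byte inputs; the output is not re-scanned."""
    return b"".join(strings.get(bytes([b]), bytes([b])) for b in data)


# keylist(x y) + string(y abc): lookup runs first, so x becomes y and
# every y is then expanded; y itself can never appear in the output.
contradictory = map_bytes(lookup(b"xy", b"x", b"y"), {b"y": b"abc"})

# string(x y) + string(y abc): the y produced from x is result text and
# is not scanned again.
solved = map_bytes(b"xy", {b"x": b"y", b"y": b"abc"})

print(contradictory)  # b'abcabc'
print(solved)         # b'yabc'
```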

       Entering a large number of one-one mappings with string can be
       somewhat tedious.  To make things easier, the strlist form is
       provided.  The two string arguments to strlist are interpreted in the
       same manner as arguments to keylist (that is, they are one-one
       mappings) except that they are not done by the lookup table, but are
       processed as string mappings.  In the following example, the first
       three string definitions can be reduced to the strlist form which
       follows:
            string(a b)
            string(c d)
            string(e f)
            strlist(ace bdf)
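
       In Python terms, strlist can be thought of as expanding into one
       single-byte string mapping per position.  The function below is an
       illustrative sketch, not kbdcomp's internal representation.

```python
def strlist(src, dst):
    """Expand two equal-length byte strings into one-one string mappings."""
    if len(src) != len(dst):
        raise ValueError("strlist arguments must have the same length")
    return {bytes([s]): bytes([d]) for s, d in zip(src, dst)}


print(strlist(b"ace", b"bdf"))  # {b'a': b'b', b'c': b'd', b'e': b'f'}
```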

       It is important to recognize the difference between string and
       strlist: with string, the two arguments are a single mapping
       definition (which may be of any type) whereas with strlist, one or
       more one-one string mappings are defined simultaneously.  A set of
       mappings defined with a combination of string and strlist do not
       exhibit the same type of incompatibility described above between
       keylist and string.

       Some further aspects of module processing can now be presented.  When
       a partial match in an input sequence is detected during string
       processing, it is buffered; if at some point the match no longer
       succeeds, the first byte of the matched buffer is normally sent to
       the neighboring module.  The rest of the input is left in the buffer
       and scanned again to see if it matches the beginning of another
       sequence.  The error entry allows one to send a string (or byte)
       constant (called a fallback character) instead of the byte that began
       the previous sequence; this is particularly useful in codeset mapping
       and conversion applications where the character which failed to be
       translated might be one which does not occur or has some other
       meaning in the target codeset.  The following (somewhat contrived)
       example illustrates use of the error form:
            # turn arrow keys into vi commands
            map (vimap) {
                 string("\033[A" k) # up
                 string("\033[B" j) # down
                 error("!")
            }

       Given input of the escape character followed by [A or [B, a single
       character (k or j, respectively) is generated.  If presented with
       the sequence escape-[Q, the module will produce the sequence ![Q.
       The error string ! replaces escape because the sequence failed to
       match when Q was received.  The remaining characters are re-scanned,
       and neither [ nor Q is found to begin a recognized sequence.
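
       The buffering and fallback behavior described above can be modeled
       in Python.  This is a sketch, not the attkbd implementation: the
       name map_stream is hypothetical, and a partial match still buffered
       at end of input is simply flushed unchanged.

```python
from collections import deque


def map_stream(data, mappings, fallback=None):
    """Byte-at-a-time string mapping with an optional error fallback."""
    out = bytearray()
    pending = deque(data)        # bytes not yet scanned
    buf = bytearray()            # current partial match
    while pending:
        buf.append(pending.popleft())
        key = bytes(buf)
        if key in mappings:                       # complete match
            out += mappings[key]
            buf.clear()
        elif any(src.startswith(key) for src in mappings):
            continue                              # still a partial match
        elif len(buf) == 1:
            out += buf                            # byte begins no sequence
            buf.clear()
        else:
            # Mismatch: send the fallback (or the first buffered byte) and
            # re-scan the rest of the buffer in its original order.
            out += fallback if fallback is not None else buf[:1]
            pending.extendleft(reversed(buf[1:]))
            buf.clear()
    out += buf
    return bytes(out)


vimap = {b"\033[A": b"k", b"\033[B": b"j"}
print(map_stream(b"\033[A\033[B", vimap, fallback=b"!"))  # b'kj'
print(map_stream(b"\033[Q", vimap, fallback=b"!"))        # b'![Q'
```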

       One-one mapping with strings or other defined forms (rather than via
       a keylist lookup table) is generally performed with a linear search
       operation when looking for bytes which begin sequences.  However, if
       the table is specified as a full table, it is initially indexed
       rather than searched linearly, and thus processed much more quickly
       when there are a large number of entries.  This should be kept in
       mind in codeset mapping applications where nearly all characters are
       mapped, and many (or most) are one-one mappings.  If only a very few
       characters are mapped with string functions, one must decide on
       whether to trade a small gain in processing speed for the space
       needed to store the index if a table is made full.
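
       The trade-off can be pictured as follows.  The Python sketch assumes
       that a full table keeps a 256-slot index keyed by the first byte of
       each input string; the actual layout of a compiled table is not
       described here, and both helper names are hypothetical.

```python
def find_sparse(first, entries):
    """Sparse table: linear scan for entries beginning with `first`."""
    return [src for src, _ in entries if src[0] == first]


def build_index(entries):
    """Full table: one slot per possible first byte (256 slots)."""
    index = [[] for _ in range(256)]
    for src, _ in entries:
        index[src[0]].append(src)
    return index


entries = [(b"`a", b"\xe0"), (b"'a", b"\xe1"), (b"'e", b"\xe9")]
index = build_index(entries)

# Both approaches find the same candidates; the index finds them without
# scanning the whole entry list, at the cost of 256 slots of storage.
print(find_sparse(ord("'"), entries))  # [b"'a", b"'e"]
print(index[ord("'")])                 # [b"'a", b"'e"]
```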

       The link form is used to produce a composite table.  A composite
       table is really a form of linkage that allows several tables to be
       used together in sequence as if the sequence were a single table.
       The string argument to link is of the following form:
            composite:component1,component2,componentn

       The target composite name is followed by a colon, and the ordered
       component list is comma-separated.  If the string argument contains
       spaces or special characters, it must be quoted.  (This string is not
       interpreted by kbdcomp, but is left intact in the output file; it is
       interpreted by the module at runtime.)  When a composite table is
       used, the effect is similar to pushing more than one instance of the
       attkbd module in the sense that the component tables function
       sequentially but it is accomplished within a single instance of the
       module.  As output is produced by processing with one table in the
       composite, the data is subsequently processed by the next component
       and so forth until the final result emerges at the end of the
       sequence.  (There is no restriction on the use of any combination of
       full and sparse tables in a composite.)

       Composite tables are useful for simplifying complex mapping
       situations by modularizing the processing and for increasing the
       reusability of tables for different mapping applications.  Tables
       primarily implementing codeset mappings may be linked to other tables
       primarily implementing compose- or dead-key sequences.  With a single
       table implementing a common codeset mapping, several different tables
       implementing combinations of codeset mapping and compose-key layouts
       may be built.  A typical configuration might use one table for
       mapping from an external to internal codeset, then use one or more
       separate tables working in the internal codeset to provide compose-
       or dead-key functionality, as in the following example.  One table,
       646Sp-8859 maps from an ISO 646 variant (Spanish) external codeset to
       ISO 8859-1; this is combined with two other tables respectively
       implementing ISO 8859-1 by compose-sequences, and by dead-key
       sequences:
                 link("composed:646Sp-8859,8859-1-cmp")
                 link("deadkey:646Sp-8859,8859-1-dk")
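
       How a link string is split and applied can be sketched in Python.
       The model below is an assumption-laden illustration: each component
       table is represented by a plain function from bytes to bytes, the
       toy tables first and second are invented for the example, and whole
       buffers are passed between components rather than a stream.

```python
def parse_link(spec):
    """Split 'composite:comp1,comp2,...' into (name, [components])."""
    name, _, components = spec.partition(":")
    return name, components.split(",")


def run_composite(data, components, tables):
    """Feed the output of each component into the next, left to right."""
    for component in components:
        if component not in tables:
            raise KeyError("component not loaded: " + component)
        data = tables[component](data)
    return data


print(parse_link("composed:646Sp-8859,8859-1-cmp"))
# ('composed', ['646Sp-8859', '8859-1-cmp'])

# Toy stand-ins for loaded tables, to show that order matters.
tables = {
    "first": lambda d: d.replace(b"x", b"y"),
    "second": lambda d: d.replace(b"y", b"z"),
}
print(run_composite(b"x", ["first", "second"], tables))  # b'z'
print(run_composite(b"x", ["second", "first"], tables))  # b'y'
```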

       Composite tables can also be built while the module is running from
       the kbdload command line [see kbdload(1M) for details].  The
       component tables are linked and processed in the given order
       (left-to-right).  Because the link argument is actually parsed at
       run-time by the attkbd module, it is not an error to refer to tables
       that are not contained in the file currently being compiled.  An
       error will be generated when the file is loaded if any component of
       a link is not present in memory at that time.

       The extern form can be used to declare an external function managed
       by the alp module.  External functions are managed in a list by that
       module, and are available for use as if they were simple tables in
       attkbd.  External functions are not downloaded; they are resident in
       the kernel and merely accessed by the attkbd module [see alp(7) for
       more information].  Such functions can also be declared dynamically
       when needed [see kbdload(1M)].

       The directive timed may appear any place within a map declaration.
       If used, it causes the table within which it is defined to be
       interpreted in timeout mode.  In this mode, string mappings are
       considered to not match if more than a specified amount of time
       elapses after receipt of the first byte of a sequence without its
       being fully received and mapped.  Given a timed map in which abc is
       to be mapped to xyz and the timeout value is 30, if the user types
       ab, then waits for longer than 30 time units before typing c, the
       entire sequence will not be translated.  In this case the sequence is
       treated as any other mismatch would be: a is passed to the
       neighboring module, and b is checked to see if it begins a sequence.
       The timer is reset when a mismatch occurs, so that if bc is defined
       in this situation and c has just been received, it will be mapped as
       expected.  The default timeout is typically 1/5 to 1/3 of a second
       [see attkbd(7) for details].
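
       The abc example can be traced with a small Python simulation.  It is
       a sketch under stated assumptions: input arrives as (time, byte)
       pairs in abstract time units, the timeout is checked only when the
       next byte arrives (the real module presumably uses a timer), and the
       name map_timed is hypothetical.

```python
def map_timed(events, mappings, timeout):
    """String mapping in which a stalled partial match is a mismatch."""
    out = bytearray()
    buf = bytearray()
    started = None                    # time associated with buf[0]

    def mismatch(now):
        rest = bytes(buf[1:])
        out.append(buf[0])            # pass the first byte through
        buf.clear()
        for byte in rest:             # re-scan the rest; timer is reset
            feed(byte, now)

    def feed(byte, now):
        nonlocal started
        if not buf:
            started = now
        buf.append(byte)
        key = bytes(buf)
        if key in mappings:
            out.extend(mappings[key])
            buf.clear()
        elif not any(src.startswith(key) for src in mappings):
            mismatch(now)

    for now, byte in events:
        if buf and now - started > timeout:
            mismatch(now)             # partial match timed out
        feed(byte, now)
    out.extend(buf)                   # flush whatever is still buffered
    return bytes(out)


slow = [(0, ord("a")), (5, ord("b")), (50, ord("c"))]
fast = [(0, ord("a")), (5, ord("b")), (10, ord("c"))]
print(map_timed(fast, {b"abc": b"xyz"}, 30))               # b'xyz'
print(map_timed(slow, {b"abc": b"xyz"}, 30))               # b'abc'
print(map_timed(slow, {b"abc": b"xyz", b"bc": b"Q"}, 30))  # b'aQ'
```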

       Timeout mode is generally useful in situations where terminal
       function keys are being interpreted, to distinguish between a string
       typed by the user and a function key string sent by the terminal; it
       is not intended for use with ``batch'' applications such as the iconv
       command, nor generally in pipelines [see pipe(2)].  In a composite
       table, some components may be timed and some not, making the mode
       useful for combinations of codeset mapping and function key mapping.

       Timing depends on several factors, including terminal baud-rate,
       system load, and the user's typing speed.  If the timeout value is
       too long, then typed sequences that happen to be the same as function
       keys will be erroneously mapped; if the value is too short, then
       function keys may be missed under a heavy system load or with low
       speed devices.  See kbdset(1) for information on how to change the
       timeout value, and attkbd(7) for information on how an administrator
       may change the default timeout value.  This directive should never be
       used in tables that implement codeset mapping, as it makes the
       results quite unpredictable.  Long timeouts, on the order of seconds,
       may be useful in some contexts.

   Building & Debugging
       Users who intend to build their own tables may study the source
       tables supplied with the distribution in the directory /usr/lib/kbd.

       If characters other than alpha-numerics are to be used, quoted
       strings are preferred to unquoted strings; quotation is required for
       some characters, as mentioned above.  Map names and the first
       arguments of define should be alpha-numeric tokens.

       The report generated by the -r option may be useful for debugging
       complex tables.  The report (produced on standard error) consists of
       two octal lists.  One list contains byte values that cannot be
       generated from the lookup table (if keylist forms are used).  The
       other list contains byte values that cannot be generated in any way;
       in other words, values that are neither parts of ``result text''
       (that is, products of string mappings) nor generated by the lookup
       table (if there is one), but that are used in other sequences.  The
       report does not exhaustively list unreachable paths, but may indicate
       whether they exist and help pinpoint them.
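
       One reading of the two lists can be sketched in Python.  The code
       below is an interpretation of the description above, not a
       reimplementation of the -r option: the second list is taken to be
       the bytes that the lookup table cannot produce, that appear in no
       result text, and that are nevertheless used in some input sequence.

```python
def unreachable_report(table, strings):
    """Return (not generated by lookup, not generated in any way)."""
    no_lookup = sorted(set(range(256)) - set(table))
    results = {b for dst in strings.values() for b in dst}
    used = {b for src in strings for b in src}
    nowhere = [b for b in no_lookup if b not in results and b in used]
    return no_lookup, nowhere


# keylist(a b) with string(ax Z): a can never reach the mapping pass, so
# the sequence ax is unreachable.
table = bytes.maketrans(b"a", b"b")
no_lookup, nowhere = unreachable_report(table, {b"ax": b"Z"})
print(["%03o" % b for b in no_lookup])  # ['141']
print(["%03o" % b for b in nowhere])    # ['141']
```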

   Output Files
       The files produced by kbdcomp begin with a header.  The magic string
       is kbd!map, with a version number.  This header is immediately
       followed by the tables themselves.  (A file can contain more than one
       table.)  The lines below can be added to the /etc/magic file for the
       file command to recognize attkbd files.
            0      string      kbd!map     attkbd map file
            >8     byte        >0          Ver %d:
            >10    short       >0          with %d table(s)
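
       The same three fields can be read with a few lines of Python.  The
       offsets are taken only from the magic entries above; nothing else
       about the header layout is assumed, and the short is read in native
       byte order because the files are machine-dependent.  The function
       name read_header is hypothetical.

```python
import struct

MAGIC = b"kbd!map"


def read_header(data):
    """Return (version, table_count), or None if `data` is not a map file."""
    if len(data) < 12 or not data.startswith(MAGIC):
        return None
    version = data[8]                              # byte at offset 8
    (tables,) = struct.unpack_from("=h", data, 10) # short at offset 10
    return version, tables


# Synthetic header for demonstration; bytes 7 and 9 are filler here.
sample = MAGIC + b"\0" + bytes([2]) + b"\0" + struct.pack("=h", 3)
print(read_header(sample))        # (2, 3)
print(read_header(b"not a map"))  # None
```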

LIMITATIONS
       A maximum length of 128 bytes for input strings and 256 bytes for
       output strings is imposed.  The total amount of space consumed by a
       single table is limited to around 65,000 bytes.  Versions are
       strictly incompatible; ``object'' tables are machine-dependent in
       their byte order and structure size.  Thus, while source files are
       portable, the output of kbdcomp is not.  This implies that when using
       remote devices across a network between heterogeneous machines,
       tables must be loaded on the machine where the module is actually
       pushed (that is, the remote side).

FILES
       /usr/lib/kbd          directory containing system standard map files

       /usr/lib/kbd/*.map    source for some system map files

SEE ALSO
       kbdload(1M), kbdset(1), iconv(1), attkbd(7), alp(7), cpz(4M).


Licensed material--property of copyright holder(s)
