kji(7) 6/4/92 kji(7)
NAME
kji: euctojis, jistoeuc, euctosjis, sjistoeuc - kanji code conversion
algorithms
DESCRIPTION
The kanji code conversion algorithms euctojis, jistoeuc, euctosjis and
sjistoeuc are a set of kernel resident functions accessible through
the kbd(7) STREAMS module. When inserted into a stream, they translate
multi-byte kanji characters from one code set into another, according
to their names. For example, the euctosjis algorithm translates
EUC-encoded kanji characters into their Shift-JIS equivalents.
Typically a pair of complementary algorithms is used, one on the
input side of a stream and the other on the output side.
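As an illustration of what a translation such as euctosjis involves,
the following Python sketch converts two-byte EUC kanji (codeset 1)
to Shift-JIS using the usual JIS X 0208 row/column arithmetic. It is
not the kernel implementation: the function names are hypothetical,
and codesets 2 and 3 and malformed input are simply copied through.

```python
def euc_pair_to_sjis(c1, c2):
    """Map one two-byte EUC kanji (both bytes 0xA1-0xFE) to Shift-JIS."""
    j1, j2 = c1 & 0x7F, c2 & 0x7F      # strip the high bits -> 7-bit JIS
    if j1 & 1:                         # odd JIS row: lower trail-byte range
        s2 = j2 + 0x1F
        if s2 >= 0x7F:                 # Shift-JIS never uses 0x7F as a trail byte
            s2 += 1
    else:                              # even JIS row: upper trail-byte range
        s2 = j2 + 0x7E
    s1 = ((j1 - 0x21) >> 1) + 0x81     # two JIS rows share one lead byte
    if s1 > 0x9F:                      # skip the 0xA0-0xDF gap
        s1 += 0x40
    return s1, s2


def euctosjis_sketch(data):
    """Convert codeset 1 kanji in an EUC byte string; copy everything else."""
    out = bytearray()
    i = 0
    while i < len(data):
        b = data[i]
        is_lead = 0xA1 <= b <= 0xFE
        if is_lead and i + 1 < len(data) and 0xA1 <= data[i + 1] <= 0xFE:
            out += bytes(euc_pair_to_sjis(b, data[i + 1]))
            i += 2
        else:
            out.append(b)              # ASCII or anything this sketch ignores
            i += 1
    return bytes(out)
```

For instance, euctosjis_sketch(b"A\xb4\xc1") returns b"A\x8a\xbf".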
When a stream is first opened, the algorithms assume it to be in a
shifted-out state. Similarly, detaching an algorithm from a stream
flushes any unconverted data and returns the stream to a shifted-out
state. Thus the detach operation may cause a small amount of data to
be output to the stream.
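The shift-state behaviour described above can be pictured with a
small stateful converter. The Python sketch below is an assumption
about how such a converter might be organised, not the kernel code:
the class and method names are hypothetical, the escape sequences are
the New-JIS ones given under CONFIGURATION, and the second byte of a
kanji is not validated.

```python
KI = b"\x1b$B"   # kanji shift-in (New-JIS)
KO = b"\x1b(J"   # kanji shift-out (JIS-Roman)


class EucToJisSketch:
    def __init__(self):
        self.shifted_in = False   # a newly opened stream is shifted out
        self.pending = b""        # first byte of a kanji split across calls

    def convert(self, chunk):
        out = bytearray()
        data = self.pending + chunk
        self.pending = b""
        i = 0
        while i < len(data):
            b = data[i]
            if 0xA1 <= b <= 0xFE:             # lead byte of a two-byte kanji
                if i + 1 == len(data):        # trail byte not here yet
                    self.pending = bytes([b])
                    break
                if not self.shifted_in:
                    out += KI
                    self.shifted_in = True
                out += bytes([b & 0x7F, data[i + 1] & 0x7F])
                i += 2
            else:                             # single-byte (ASCII) data
                if self.shifted_in:
                    out += KO
                    self.shifted_in = False
                out.append(b)
                i += 1
        return bytes(out)

    def detach(self):
        """Return to the shifted-out state and flush unconverted data."""
        out = bytearray()
        if self.shifted_in:
            out += KO
            self.shifted_in = False
        out += self.pending                   # passed through unchanged
        self.pending = b""
        return bytes(out)
```

If the last character converted was a kanji, detach() here returns
the three-byte KO sequence, which is the kind of small trailing
output the detach operation may produce.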
Conversion of input data may fail under some conditions. The
algorithms handle these errors by passing the erroneous data through
unchanged and, if the DEBUG compilation switch was used, reporting a
diagnostic message to the system trace logger (see strace(1M)). The
trace message gives the stream ID involved and the error code, as
found in <mlx-j/kanji.h>. The error conditions are:
KJERROR
An invalid byte or byte sequence was detected in the input.
Typically, this happens when a byte which introduces a kanji
character is not followed by a byte sequence which correctly
completes it, or when a partial escape sequence or character is
found at EOF.
KJGAIJI
EUC codeset 3 (gaiji) characters are not part of Shift-JIS or any
of the 7-bit codes. When converting from EUC to any other code
set, gaiji characters are passed through in shifted-out mode, and
KJGAIJI is reported.
KJESC
When converting from EUC to 7-bit JIS, it is possible that a
shift-in or shift-out sequence is present in the ASCII portion of
the input. These escape sequences cannot be passed through to the
output since they would change the shift state. Such escape
sequences in the input are skipped and the return value is set to
KJESC.
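To make the three conditions concrete, the Python sketch below scans
an EUC byte string bound for 7-bit JIS and lists where each condition
would arise. It is illustrative only: the function and class names
are hypothetical, the enum values are placeholders rather than the
constants in <mlx-j/kanji.h>, and treating every unmatched byte of
0x80 or above as KJERROR is an assumption of this sketch.

```python
from enum import Enum

KI = b"\x1b$B"            # kanji shift-in, as given under CONFIGURATION
KO = b"\x1b(J"            # kanji shift-out
SS2, SS3 = 0x8E, 0x8F     # EUC single-shift bytes for codesets 2 and 3


class Kj(Enum):
    # Names are from this page; the values are placeholders only.
    KJERROR = "invalid or incomplete sequence"
    KJGAIJI = "codeset 3 (gaiji) character"
    KJESC = "escape sequence in ASCII text"


def in_range(data, i, lo, hi):
    """True if byte i exists and lies in lo..hi."""
    return i < len(data) and lo <= data[i] <= hi


def check_euc_for_jis(data):
    """Return a list of (offset, Kj) pairs for the conditions found."""
    problems = []
    i = 0
    while i < len(data):
        b = data[i]
        if data.startswith(KI, i) or data.startswith(KO, i):
            problems.append((i, Kj.KJESC))       # would alter the shift state
            i += 3
        elif (b == SS3 and in_range(data, i + 1, 0xA1, 0xFE)
              and in_range(data, i + 2, 0xA1, 0xFE)):
            problems.append((i, Kj.KJGAIJI))     # no 7-bit JIS equivalent
            i += 3
        elif b == SS2 and in_range(data, i + 1, 0xA1, 0xDF):
            i += 2                               # half-size katakana
        elif 0xA1 <= b <= 0xFE and in_range(data, i + 1, 0xA1, 0xFE):
            i += 2                               # well-formed two-byte kanji
        elif b >= 0x80:
            problems.append((i, Kj.KJERROR))     # lead byte never completed
            i += 1
        else:
            i += 1                               # plain ASCII
    return problems
```

A kanji lead byte at the very end of the input, as in b"ok\xb4", is
reported as KJERROR at offset 2.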
CONFIGURATION
The euctojis and jistoeuc algorithms are concerned with translations
into and from the 7-bit JIS code set. There are a number of variations
of the JIS code set in use. These algorithms have been configured for
compatibility with the New-JIS standard, in which the kanji shift-in
sequence, KI, is coded as "<ESC>$B" and the kanji shift-out sequence,
KO, is coded as "<ESC>(J" (i.e. the JIS-Roman shift-out sequence).
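The reverse direction shows how the two escape sequences drive the
shift state. The Python sketch below is a simplified picture of a
jistoeuc-style translation under the New-JIS configuration described
above; the function name is hypothetical, and other JIS variants and
error reporting are deliberately left out.

```python
KI = b"\x1b$B"   # kanji shift-in: following byte pairs are kanji
KO = b"\x1b(J"   # kanji shift-out: following bytes are single-byte


def jistoeuc_sketch(data):
    out = bytearray()
    shifted_in = False                 # input is assumed to start shifted out
    i = 0
    while i < len(data):
        if data.startswith(KI, i):
            shifted_in = True          # consume the sequence, emit nothing
            i += 3
        elif data.startswith(KO, i):
            shifted_in = False
            i += 3
        elif (shifted_in and i + 1 < len(data)
              and 0x21 <= data[i] <= 0x7E
              and 0x21 <= data[i + 1] <= 0x7E):
            # Setting the high bit of both bytes gives EUC codeset 1.
            out += bytes([data[i] | 0x80, data[i + 1] | 0x80])
            i += 2
        else:
            out.append(data[i])        # single-byte data is copied as-is
            i += 1
    return bytes(out)
```

For example, jistoeuc_sketch(b"A\x1b$B4A\x1b(J!") returns
b"A\xb4\xc1!".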
The EUC and Shift-JIS code sets both contain 63 half-size katakana
characters that have full-size (two-byte) equivalents. The euctosjis
and sjistoeuc algorithms are configured so that half-size katakana
characters are not expanded into their full-size equivalents during
translation.
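Leaving half-size katakana unexpanded amounts to a one-to-one byte
mapping: EUC carries each such character as the single-shift byte
0x8E followed by a byte in the range 0xA1-0xDF, while Shift-JIS uses
that same byte alone. The Python sketch below shows only this
mapping; the function names are hypothetical, and the input is
assumed to contain nothing but ASCII and half-size katakana.

```python
SS2 = 0x8E                       # EUC single-shift 2 (codeset 2 prefix)
KANA_LO, KANA_HI = 0xA1, 0xDF    # 63 half-size katakana code points


def kana_euc_to_sjis(data):
    out = bytearray()
    i = 0
    while i < len(data):
        if (data[i] == SS2 and i + 1 < len(data)
                and KANA_LO <= data[i + 1] <= KANA_HI):
            out.append(data[i + 1])    # drop the prefix, keep the kana byte
            i += 2
        else:
            out.append(data[i])
            i += 1
    return bytes(out)


def kana_sjis_to_euc(data):
    out = bytearray()
    for b in data:
        if KANA_LO <= b <= KANA_HI:
            out += bytes([SS2, b])     # restore the single-shift prefix
        else:
            out.append(b)
    return bytes(out)
```

With this mapping the character stays half-size in both directions:
kana_euc_to_sjis(b"\x8e\xb1") returns b"\xb1", and kana_sjis_to_euc
turns it back into b"\x8e\xb1".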
SEE ALSO
strchg(1), strconf(1), kbdload(1M), strace(1M), kbdset(1), kconv(3K),
kbd(7), alp(7).
Programmer's Guide: STREAMS.
Page 2 Reliant UNIX 5.44 6, 194