kconv(3K) 2/4/92 kconv(3K)
NAME
kconv - kanji code converter
SYNOPSIS
cc [flag...] file... -lkanji [library...]
#include <mlx-j/kanji.h>
int kconv(kjbuf *bp, int ifmt, int ofmt, int ko, int exp);
DESCRIPTION
kconv handles arbitrary conversions between EUC, Shift-JIS, JIS, Old
JIS and NEC-JIS encoded text. The routine supports all of the
characters defined in JIS X 0208-1990, as well as EUC codeset 3
(gaiji) characters.
The ifmt and ofmt arguments determine the input and output code and
may take the following values:
KJJIS - (New) JIS
KJOJIS - Old JIS
KJNECJIS - NEC-JIS
KJSJIS - Shift-JIS
KJEUC - Extended UNIX Code
If ofmt is KJJIS or KJOJIS, ko must be set to KJTOROMAN or
KJTOASCII. It determines the shift-out sequence to be placed in the
output (JIS-Roman or ASCII). ko is ignored for other values of ofmt.
The exp argument must be set to KJEXP or KJNOEXP. KJEXP indicates
that half-size katakana should be expanded to their full-size
equivalents; KJNOEXP causes half-size katakana to be preserved in
the output. If the output is in one of the 7-bit codes, the value of
exp is ignored and expansion always takes place, since these codes do
not include half-size katakana.
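The rules for ko and exp can be summarized in a few predicates. The
following is only a sketch: the enum values and the sk_ names are
hypothetical stand-ins (the real constants come from <mlx-j/kanji.h>
and their values are not given here), and the set of 7-bit codes is
assumed to be JIS, Old JIS and NEC-JIS.

```c
#include <assert.h>

/* Hypothetical stand-ins for KJJIS, KJOJIS, KJNECJIS, KJSJIS, KJEUC.
 * The real values are defined in <mlx-j/kanji.h>. */
enum sk_fmt { SK_JIS, SK_OJIS, SK_NECJIS, SK_SJIS, SK_EUC };

/* Assumed reading: the 7-bit codes are JIS, Old JIS and NEC-JIS. */
int sk_is_7bit(enum sk_fmt ofmt)
{
    return ofmt == SK_JIS || ofmt == SK_OJIS || ofmt == SK_NECJIS;
}

/* ko is consulted only when the output is JIS or Old JIS. */
int sk_ko_is_used(enum sk_fmt ofmt)
{
    return ofmt == SK_JIS || ofmt == SK_OJIS;
}

/* Nonzero when half-size katakana end up expanded: either the caller
 * asked for it, or a 7-bit output code forces it. */
int sk_expands(enum sk_fmt ofmt, int exp_requested)
{
    assert(exp_requested == 0 || exp_requested == 1);
    return exp_requested || sk_is_7bit(ofmt);
}
```

With these helpers, sk_expands(SK_JIS, 0) is nonzero even though
expansion was not requested, which mirrors the rule that exp is
ignored for 7-bit output.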
bp is a pointer to a structure of type kjbuf, defined in
<mlx-j/kanji.h> as follows:
typedef struct {
    unsigned char *kjin;
    size_t kjisz;
    size_t kjicnt;
    unsigned char *kjout;
    size_t kjosz;
    size_t kjocnt;
    int kjshift;
    int kjeof;
} kjbuf;
kjin and kjout are pointers to the input and output buffers,
respectively. The buffer sizes must be at least KJMINIBUF and
KJMINOBUF (4 and 9 bytes, respectively). For efficiency, however, the
buffers should be considerably larger, for example BUFSIZ bytes. The
input and output buffers must not overlap in memory.
kjisz indicates the number of bytes of input present in kjin, and
kjosz must be set to the size of the output buffer.
When kconv returns, kjicnt is set to the number of bytes consumed
from kjin, and kjocnt is set to the number of bytes placed into
kjout.
kjshift is used to keep track of the shift state for 7-bit codes. It
must be set to 0 for the initial call and must not be changed between
invocations of kconv.
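The field rules above can be collected into one setup step. This is a
minimal sketch, not part of the library: the structure is redeclared
under the hypothetical name kjbuf_sketch so that the example compiles
without <mlx-j/kanji.h>, the helper kjbuf_setup is hypothetical, and
the SK_ constants simply repeat the minimum sizes quoted above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for kjbuf, copied from the declaration above. */
typedef struct {
    unsigned char *kjin;
    size_t kjisz;
    size_t kjicnt;
    unsigned char *kjout;
    size_t kjosz;
    size_t kjocnt;
    int kjshift;
    int kjeof;
} kjbuf_sketch;

/* Minimum sizes as quoted for KJMINIBUF and KJMINOBUF. */
enum { SK_MINIBUF = 4, SK_MINOBUF = 9 };

/* Hypothetical helper: prepare *bp for the first conversion call.
 * Returns 0 on success, -1 if either buffer is below the minimum.
 * The caller must still ensure that the buffers do not overlap. */
int kjbuf_setup(kjbuf_sketch *bp, unsigned char *in, size_t insz,
                unsigned char *out, size_t outsz)
{
    assert(bp != NULL && in != NULL && out != NULL);
    if (insz < SK_MINIBUF || outsz < SK_MINOBUF)
        return -1;
    memset(bp, 0, sizeof *bp);
    bp->kjin = in;
    bp->kjisz = 0;        /* no input present until the buffer is filled */
    bp->kjout = out;
    bp->kjosz = outsz;    /* capacity of the output buffer */
    bp->kjshift = 0;      /* required for the initial call */
    bp->kjeof = 0;
    return 0;
}
```

Note that kjisz is the number of input bytes present, not the capacity
of the input buffer, so it stays 0 until the caller has read data.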
The kjeof field is used to handle partial characters or escape
sequences at the end of the input buffer. For example, if the first
byte of a 2-byte kanji character is the last byte in the input buffer,
kconv does not immediately convert that byte, since it cannot yet
decide how to do the conversion. If kjeof is 0, the call returns with
kjicnt one less than kjisz. The caller is responsible for moving the
unconverted byte to the front of the input buffer, refilling the
remaining space with more input, and calling kconv again. The first
byte of the character is now at the front of the input buffer and the
character is converted correctly by the second call.
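The step of moving unconverted bytes to the front of the input buffer
might look as follows. This is a sketch only; kj_carry_over is a
hypothetical helper name, and its arguments correspond to the kjisz
and kjicnt values of the call that just returned.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Move the bytes that the converter left unconsumed to the front of
 * the input buffer.  `filled` is the byte count that was passed in
 * (kjisz); `consumed` is the count reported back (kjicnt).
 * Returns the number of bytes carried over; the caller refills the
 * buffer starting at that offset. */
size_t kj_carry_over(unsigned char *buf, size_t filled, size_t consumed)
{
    assert(consumed <= filled);
    size_t left = filled - consumed;
    if (left > 0 && consumed > 0)
        memmove(buf, buf + consumed, left);  /* regions may overlap */
    return left;
}
```

The next read then stores new input at buf + left, and kjisz for the
following call is left plus the number of bytes read.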
However, if a partial character is present at EOF, this approach does
not work, since no more bytes are available to do the conversion. In
this case, one more call must be made to kconv, with kjeof set to a
non-zero value. This forces kconv to convert whatever input is left.
This is particularly important when expanding half-size katakana. Two
adjacent half-size katakana, such as a base character followed by a
voiced sound mark, can combine to form a single full-size character,
whereas in another context the same characters translate to two
separate full-size ones. If such a half-size katakana is found at EOF,
the final call with kjeof set to a non-zero value ensures the correct
conversion.
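The complete calling pattern, including the final call with kjeof
set, can be sketched as a loop. Everything below is illustrative:
kjbuf_sketch mirrors the declaration above, convert_all is a
hypothetical driver, and toy_conv is a toy stand-in (not kconv) that
only imitates the holding back of a partial 2-byte character so the
loop can be run without libkanji. Because the source is in memory,
the sketch knows in advance when no more input will arrive; a caller
reading from a file typically learns this only from a read that
returns no data, and then makes the extra call described above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for kjbuf, copied from the declaration above. */
typedef struct {
    unsigned char *kjin;  size_t kjisz; size_t kjicnt;
    unsigned char *kjout; size_t kjosz; size_t kjocnt;
    int kjshift; int kjeof;
} kjbuf_sketch;

/* Same argument list as kconv, so a real converter could be passed. */
typedef int (*conv_fn)(kjbuf_sketch *, int, int, int, int);

/* Toy converter, NOT kconv: copies bytes through and treats any byte
 * >= 0x80 as the first half of a 2-byte character.  A lone first half
 * at the end of the input is held back unless kjeof is set. */
int toy_conv(kjbuf_sketch *bp, int ifmt, int ofmt, int ko, int exp)
{
    size_t i = 0, o = 0;
    int rc = 0;
    (void)ifmt; (void)ofmt; (void)ko; (void)exp;
    while (i < bp->kjisz) {
        size_t n = bp->kjin[i] >= 0x80 ? 2 : 1;
        if (i + n > bp->kjisz) {       /* partial character */
            if (!bp->kjeof)
                break;                 /* wait for more input */
            n = 1;
            rc = 1;                    /* forced conversion, flag it */
        }
        if (o + n > bp->kjosz)
            break;                     /* output buffer full */
        memcpy(bp->kjout + o, bp->kjin + i, n);
        i += n;
        o += n;
    }
    bp->kjicnt = i;
    bp->kjocnt = o;
    return rc;
}

/* Drive `conv` over src[0..srclen) through a small input window.
 * Returns the first non-zero code seen, or -1 if dst is too small or
 * the converter makes no progress. */
int convert_all(conv_fn conv, const unsigned char *src, size_t srclen,
                unsigned char *dst, size_t dstcap, size_t *dstlen)
{
    unsigned char ibuf[8], obuf[16];
    kjbuf_sketch b = { ibuf, 0, 0, obuf, sizeof obuf, 0, 0, 0 };
    size_t have = 0, pos = 0;
    int first = 0;
    *dstlen = 0;
    for (;;) {
        size_t room = sizeof ibuf - have;
        size_t n = srclen - pos < room ? srclen - pos : room;
        memcpy(ibuf + have, src + pos, n);   /* refill behind the carry */
        pos += n;
        have += n;
        b.kjisz = have;
        b.kjeof = (pos == srclen);           /* no more input will come */
        int rc = conv(&b, 0, 0, 0, 0);       /* format arguments unused here */
        if (rc != 0 && first == 0)
            first = rc;
        if (*dstlen + b.kjocnt > dstcap)
            return -1;
        memcpy(dst + *dstlen, obuf, b.kjocnt);
        *dstlen += b.kjocnt;
        have -= b.kjicnt;
        memmove(ibuf, ibuf + b.kjicnt, have); /* carry unconverted bytes */
        if (b.kjeof && have == 0)
            return first;
        if (b.kjicnt == 0 && b.kjocnt == 0)
            return -1;
    }
}
```

kjshift is initialized to 0 once and then left alone, as required
between invocations.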
RETURN VALUE
Successful conversion returns a value of 0. Otherwise, one of the
error codes below is returned. Note that all error values indicate
non-fatal conditions, that is, conversion does not stop when an error
condition is detected.
KJERROR
An invalid byte or byte sequence was detected in the input.
Typically, this happens when a byte that introduces a kanji
character is not followed by a byte sequence that validly completes
it, or when a partial escape sequence or character is found
at EOF.
KJGAIJI
EUC codeset 3 (gaiji) characters are not part of Shift-JIS or any
of the 7-bit codes. This means that gaiji characters are handled
correctly only when converting from EUC to EUC (to expand
half-size katakana). For other output codes, gaiji characters are
passed through (in shifted-out mode), and KJGAIJI is returned.
KJNEW
In 1983, four kanji were appended to JIS X 0208 level 2, followed
by another two kanji in 1990. These characters are not part of
Old JIS or NEC-JIS, and are passed through in shifted-out mode
for output codes of KJOJIS and KJNECJIS, with a return value
of KJNEW.
KJESC
When converting from EUC or Shift-JIS to one of the 7-bit codes,
it is possible that a shift-in or shift-out sequence is present
in the ASCII portion of the input. These escape sequences cannot
be passed through to the output since they would change the shift
state. Such escape sequences in the input are skipped and the
return value is set to KJESC.
NOTES
The input and output are assumed to be in shifted-out mode initially.
If the output code is one of the 7-bit codes and the input ended with
a kanji character, the function also forces a shift-out sequence.
Consequently, if kconv was called with kjeof set and kjocnt indicates
that the output buffer was filled to within KJMAXKO (3 bytes) of its
capacity, the function must be called one more time, with kjeof set
and a kjisz value of 0, to obtain the final shift-out sequence.
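The check for the extra call can be sketched as below. The helper
name and SK_MAXKO are hypothetical (SK_MAXKO repeats the 3 bytes
quoted for KJMAXKO), and the exact boundary is an assumption: the
sketch asks for another call when fewer than SK_MAXKO bytes of the
output buffer remained free.

```c
#include <assert.h>
#include <stddef.h>

enum { SK_MAXKO = 3 };  /* size quoted for KJMAXKO */

/* After a call made with kjeof set: returns 1 if the output buffer may
 * have lacked room for the final shift-out sequence, so that one more
 * call (kjeof set, kjisz 0) is advisable.  Assumes "lacked room" means
 * fewer than SK_MAXKO free bytes. */
int sk_needs_final_call(size_t kjosz, size_t kjocnt)
{
    assert(kjocnt <= kjosz);
    return kjosz - kjocnt < SK_MAXKO;
}
```

Making the extra call unconditionally is a simpler alternative when
the cost of one additional invocation does not matter.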
Invalid byte sequences in the input are copied through to the output
unchanged (in shifted-out mode).
BUGS
If more than one error condition is detected during conversion of a
single input buffer, the return code always indicates the first
problem that was found.
Invalid byte sequences in the input may cause kconv to lose
synchronization with kanji character boundaries. If this happens, all
converted output following the error is likely to be garbage.
FILES
/usr/lib/libkanji.a
SEE ALSO
fkconv(3K), fkcode(3K), fkverify(3K), keuctoibmj(3K), kibmjtoeuc(3K).
Page 3 Reliant UNIX 5.44 2, 194