keuctoibmj(3K) 3/11/92 keuctoibmj(3K)
NAME
keuctoibmj - EUC to IBM host code kanji converter
SYNOPSIS
cc [flag...] file... -lkanji [library...]
#include <mlx-j/kanji.h>
int keuctoibmj(kjbuf *bp, int exp);
DESCRIPTION
keuctoibmj converts EUC encoded kanji text into IBM Japanese
(Katakana) Kanji Mixed Host Code (IBM CCSIDs 13218 and 09122).
The exp argument must be set to KJEXP or KJNOEXP. KJEXP indicates
that half-size katakana should be expanded to their full-size
equivalents; KJNOEXP causes half-size katakana to be preserved in
the output.
bp is a pointer to a structure of type kjbuf, defined in
<mlx-j/kanji.h> as follows:
typedef struct {
    unsigned char *kjin;
    size_t kjisz;
    size_t kjicnt;
    unsigned char *kjout;
    size_t kjosz;
    size_t kjocnt;
    int kjshift;
    int kjeof;
} kjbuf;
kjin and kjout are pointers to the input and output buffers,
respectively. The input buffer must be at least KJMINIBUF (4) bytes
and the output buffer at least KJMINOBUF (9) bytes. For efficiency,
however, both should be considerably larger, for example BUFSIZ
bytes. The input and output buffers must not overlap in memory.
kjisz indicates the number of bytes of input present in kjin, and
kjosz must be set to the size of the output buffer.
When keuctoibmj returns, kjicnt is set to the number of bytes
consumed from kjin, and kjocnt is set to the number of bytes placed
into kjout.
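The field rules above can be gathered into a small setup helper. The
following is only a sketch: kj_init is a hypothetical name, the kjbuf
definition is copied from this page so that the fragment compiles
without <mlx-j/kanji.h>, and the KJMINIBUF and KJMINOBUF values are
stand-ins taken from the text above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for <mlx-j/kanji.h>; the values 4 and 9 come from this page. */
#ifndef KJMINIBUF
#define KJMINIBUF 4
#define KJMINOBUF 9
#endif

/* Copied from the DESCRIPTION; normally supplied by <mlx-j/kanji.h>. */
typedef struct {
    unsigned char *kjin;
    size_t kjisz;
    size_t kjicnt;
    unsigned char *kjout;
    size_t kjosz;
    size_t kjocnt;
    int kjshift;
    int kjeof;
} kjbuf;

/*
 * Hypothetical helper: prepare bp for the initial call.
 * incap/outcap are the buffer capacities, inlen the bytes already in 'in'.
 * Returns 0 on success, -1 if a buffer is too small or the buffers overlap.
 */
static int kj_init(kjbuf *bp, unsigned char *in, size_t incap, size_t inlen,
                   unsigned char *out, size_t outcap)
{
    /* Compare addresses as integers; the arrays may be unrelated objects. */
    uintptr_t a = (uintptr_t)in, b = (uintptr_t)out;

    assert(bp != NULL && in != NULL && out != NULL);
    if (incap < KJMINIBUF || outcap < KJMINOBUF || inlen > incap)
        return -1;
    if (a < b + outcap && b < a + incap)
        return -1;                  /* buffers must not overlap */

    bp->kjin = in;
    bp->kjisz = inlen;              /* bytes of input present */
    bp->kjicnt = 0;
    bp->kjout = out;
    bp->kjosz = outcap;             /* size of the output buffer */
    bp->kjocnt = 0;
    bp->kjshift = 0;                /* required for the initial call only */
    bp->kjeof = 0;
    return 0;
}
```

Because the helper resets kjshift, it is meant for the first call only;
later calls should update kjisz and leave kjshift untouched.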
kjshift is used to keep track of the shift state. It must be set
to 0 for the initial call and must not be changed between invocations
of keuctoibmj.
The kjeof field is used to handle partial characters at the end of
the input buffer. For example, if the first byte of a 2-byte kanji
character is the last byte in the input buffer, keuctoibmj does not
immediately convert that byte, since it cannot yet decide how to do
the conversion. If kjeof is 0, the call returns with kjicnt one less
than kjisz. The caller is responsible for moving the unconverted byte
to the front of the input buffer, refilling the remaining space with
more input, and calling keuctoibmj again. The first byte of the
character is now at the front of the input buffer and the character is
converted correctly by the second call.
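The carry-over step described above might look like the following
sketch. kj_carry is a hypothetical helper name, and the kjbuf
definition is repeated from this page so that the fragment stands
alone.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copied from the DESCRIPTION; normally supplied by <mlx-j/kanji.h>. */
typedef struct {
    unsigned char *kjin;
    size_t kjisz;
    size_t kjicnt;
    unsigned char *kjout;
    size_t kjosz;
    size_t kjocnt;
    int kjshift;
    int kjeof;
} kjbuf;

/*
 * Hypothetical helper: after a call has returned, move the bytes that
 * were not consumed to the front of the input buffer. Returns the number
 * of bytes carried over; kjisz is set to that number so the caller can
 * append fresh input at bp->kjin + kjisz and then increase kjisz.
 */
static size_t kj_carry(kjbuf *bp)
{
    size_t left;

    assert(bp->kjicnt <= bp->kjisz);
    left = bp->kjisz - bp->kjicnt;
    if (left > 0 && bp->kjicnt > 0)
        memmove(bp->kjin, bp->kjin + bp->kjicnt, left);
    bp->kjisz = left;
    bp->kjicnt = 0;
    return left;
}
```

A caller would typically follow kj_carry with a read of at most
(capacity - kjisz) bytes into kjin + kjisz, add the number of bytes read
to kjisz, and call keuctoibmj again without touching kjshift.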
However, if a partial character is present at EOF, this approach does
not work, since no more bytes are available to do the conversion. In
this case, one more call must be made to keuctoibmj, with kjeof set
to a non-zero value. This forces keuctoibmj to convert whatever input
is left. This is particularly important when expanding half-size
katakana. Two adjacent half-size katakana can combine to form a single
full-size character, but the same characters can also translate to two
separate full-size ones, depending on the context. If such a half-size
katakana is found at EOF, the final call with kjeof set to a non-zero
value ensures the correct conversion.
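The final call at EOF can be sketched as below. Everything here apart
from the kjbuf layout is an assumption made for illustration: kj_finish
and kj_convert_fn are hypothetical names, and toy_convert is a toy
stand-in that merely copies bytes so that the fragment can be compiled
and run without libkanji. It does not perform any real conversion.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copied from the DESCRIPTION; normally supplied by <mlx-j/kanji.h>. */
typedef struct {
    unsigned char *kjin;
    size_t kjisz;
    size_t kjicnt;
    unsigned char *kjout;
    size_t kjosz;
    size_t kjocnt;
    int kjshift;
    int kjeof;
} kjbuf;

/* Hypothetical: a pointer type matching the prototype of keuctoibmj. */
typedef int (*kj_convert_fn)(kjbuf *bp, int exp);

/*
 * Toy stand-in, NOT the real converter: copies input to output, treating
 * a byte with the high bit set as the start of a 2-byte character. An
 * incomplete character at the end is left unconsumed unless kjeof is set.
 */
static int toy_convert(kjbuf *bp, int exp)
{
    size_t i = 0, o = 0;

    (void)exp;
    while (i < bp->kjisz) {
        size_t len = (bp->kjin[i] & 0x80) ? 2 : 1;
        if (i + len > bp->kjisz) {
            if (!bp->kjeof)
                break;              /* wait for the rest of the character */
            len = bp->kjisz - i;    /* forced: take what is left */
        }
        if (o + len > bp->kjosz)
            break;                  /* output buffer full */
        memcpy(bp->kjout + o, bp->kjin + i, len);
        i += len;
        o += len;
    }
    bp->kjicnt = i;
    bp->kjocnt = o;
    return 0;
}

/*
 * Hypothetical helper: once the input source is exhausted, move any
 * unconsumed bytes to the front, set kjeof, and make the final call.
 * The caller must have emptied kjout beforehand. Returns whatever the
 * converter returns.
 */
static int kj_finish(kj_convert_fn conv, kjbuf *bp, int exp)
{
    size_t left;

    assert(bp->kjicnt <= bp->kjisz);
    left = bp->kjisz - bp->kjicnt;
    memmove(bp->kjin, bp->kjin + bp->kjicnt, left);
    bp->kjisz = left;
    bp->kjeof = 1;                  /* force conversion of what is left */
    return conv(bp, exp);
}
```

With the real library, the call would presumably be
kj_finish(keuctoibmj, &buf, KJEXP) or the KJNOEXP equivalent.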
RETURN VALUE
Successful conversion returns a value of 0. Otherwise, one of the
error codes below is returned. Note that all error values indicate
non-fatal conditions, that is, conversion does not stop when an error
condition is detected.
KJERROR
An invalid byte or byte sequence was detected in the input.
Typically, this happens when a byte which introduces a kanji
character is not followed by a byte sequence to validly complete
it, or when a partial character is found at EOF.
KJGAIJI
An EUC codeset 3 (gaiji) character was found in the input. Gaiji
characters are not part of IBM host code and are skipped.
KJESC
When converting from EUC to IBM host code, it is possible that a
shift-in or shift-out byte is present in the ASCII portion of the
input. These bytes cannot be passed through to the output since
they would change the shift state. Instead, such bytes in the
input are skipped and the return value is set to KJESC.
KJNOMAP
An EUC kanji character which does not exist in IBM host code was
present in the input. Such characters are skipped.
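Since every code listed above is non-fatal, a caller will usually log
the condition and carry on. The sketch below shows one way to do that.
The numeric values are placeholders chosen purely so that the fragment
compiles on its own; the real values come from <mlx-j/kanji.h> and are
not given in this page. kj_describe and kj_dropped_input are
hypothetical helper names.

```c
#include <stddef.h>

/* Placeholder values, for illustration ONLY; use <mlx-j/kanji.h>. */
#ifndef KJERROR
#define KJERROR 1
#define KJGAIJI 2
#define KJESC   3
#define KJNOMAP 4
#endif

/* Hypothetical helper: short description of a keuctoibmj return value. */
static const char *kj_describe(int rc)
{
    if (rc == 0)
        return "conversion succeeded";
    if (rc == KJERROR)
        return "invalid byte or byte sequence in input";
    if (rc == KJGAIJI)
        return "gaiji (EUC codeset 3) character skipped";
    if (rc == KJESC)
        return "shift-in or shift-out byte in ASCII input skipped";
    if (rc == KJNOMAP)
        return "kanji with no IBM host code mapping skipped";
    return "unknown return value";
}

/*
 * Hypothetical helper: returns 1 if the code means that input was
 * skipped rather than written to the output. Per this page, KJGAIJI,
 * KJESC and KJNOMAP skip input, while invalid sequences (KJERROR) are
 * copied through unchanged.
 */
static int kj_dropped_input(int rc)
{
    return rc == KJGAIJI || rc == KJESC || rc == KJNOMAP;
}
```

Only the first problem in a buffer is reported (see BUGS), so a return
value of, say, KJNOMAP does not rule out other conditions in the same
buffer.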
NOTES
The function forces a shift-out byte at the end of the output if the
input ended with a kanji character. This means that if keuctoibmj was
called with kjeof set, and kjocnt has a value which indicates that
the output buffer was filled to its capacity, it is necessary to call
the function one more time, with kjeof set, and a kjisz value of 0,
to get the final shift-out byte.
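The check for this extra call can be sketched as follows.
kj_needs_flush and kj_prepare_flush are hypothetical helper names, and
the kjbuf definition is repeated from the DESCRIPTION so that the
fragment stands alone.

```c
#include <assert.h>
#include <stddef.h>

/* Copied from the DESCRIPTION; normally supplied by <mlx-j/kanji.h>. */
typedef struct {
    unsigned char *kjin;
    size_t kjisz;
    size_t kjicnt;
    unsigned char *kjout;
    size_t kjosz;
    size_t kjocnt;
    int kjshift;
    int kjeof;
} kjbuf;

/*
 * Hypothetical helper: after a call made with kjeof set, report whether
 * the output buffer was filled to capacity, in which case one more call
 * is needed to obtain the final shift-out byte.
 */
static int kj_needs_flush(const kjbuf *bp)
{
    assert(bp != NULL);
    return bp->kjeof != 0 && bp->kjocnt == bp->kjosz;
}

/*
 * Hypothetical helper: set up bp for that extra call. The caller must
 * write out the contents of kjout first. kjshift is deliberately left
 * alone, since it must not change between invocations.
 */
static void kj_prepare_flush(kjbuf *bp)
{
    assert(bp != NULL);
    bp->kjisz = 0;      /* no further input */
    bp->kjicnt = 0;
    bp->kjeof = 1;
}
```

A caller would test kj_needs_flush after the EOF call, write out kjout,
call kj_prepare_flush, and then invoke keuctoibmj once more.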
Invalid byte sequences in the input are copied through to the output
unchanged (in shifted-out mode).
BUGS
If more than one error condition is detected during conversion of a
single input buffer, the return code always indicates the first
problem that was found.
Invalid byte sequences in the input may cause keuctoibmj to lose
synchronization with kanji character boundaries. If this happens, all
converted output following the error is likely to be garbage.
FILES
/usr/lib/libkanji.a
SEE ALSO
fkeuctoibmj(3K), seuctoibmj(3K), ceuctoibmj(3K), kibmjtoeuc(3K),
kconv(3K).
Page 3 Reliant UNIX 5.44 3, 1911