fkcode(3K) 2/4/92 fkcode(3K)
NAME
fkcode - kanji code detector
SYNOPSIS
cc [flag...] file... -lkanji [library...]
#include <mlx-j/kanji.h>
int fkcode(FILE *stream, FILE *save, unsigned limit);
DESCRIPTION
fkcode examines an input stream for kanji characters and determines
the type of encoding used. The routine recognizes kanji characters in
EUC, Shift-JIS, JIS, Old JIS and NEC-JIS codes.
stream is the input stream to be examined. It must be open for reading
and must be aligned on a character boundary.
save is a stream open for writing. Bytes consumed from stream to
determine the input encoding are written to this file. This is useful
if the input is connected to a pipe or a device which cannot be
rewound. save may be set to NULL if consumed input does not need to be
saved.
limit caps the number of bytes read from stream while determining the
input encoding. If limit is set to zero, lookahead is limited only by
EOF.
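The interplay of stream, save and limit can be illustrated with a
minimal sketch. scan_for_8bit below is a hypothetical stand-in with the
same parameter shape as fkcode; it is not part of libkanji and does not
detect any encoding. It only shows, under those assumptions, how
consumed bytes might be copied to save and how a limit of zero removes
the cap.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical stand-in, NOT fkcode. Reads bytes from stream until a
 * byte with the high bit set is seen, EOF is reached, or limit bytes
 * have been read (limit == 0 means no cap). Every byte consumed is
 * copied to save unless save is NULL.
 * Returns 1 if an 8-bit byte was seen, 0 if not, -1 on an I/O error.
 */
static int scan_for_8bit(FILE *stream, FILE *save, unsigned limit)
{
    unsigned nread = 0;
    int c;

    assert(stream != NULL);
    while ((limit == 0 || nread < limit) && (c = getc(stream)) != EOF) {
        nread++;
        if (save != NULL && putc(c, save) == EOF)
            return -1;          /* write error on save */
        if (c & 0x80)
            return 1;           /* stop right after the deciding byte */
    }
    if (ferror(stream))
        return -1;              /* read error on stream */
    return 0;
}
```

With input that cannot be rewound, a caller would typically rewind
save afterwards, process the saved bytes first, and then continue
reading from stream.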
RETURN VALUE
fkcode returns -1 if an I/O error occurs on stream or save. Otherwise,
the return value is one of the constants defined in <mlx-j/kanji.h>
which indicates an input encoding:
KJJIS - (New) JIS
KJOJIS - Old JIS
KJNECJIS - NEC-JIS
KJSJIS - Shift-JIS
KJEUC - Extended UNIX Code
KJEUCORSJIS - ambiguous input, either EUC or Shift-JIS
KJASCII - input contains only 7-bit ASCII bytes
KJUNKNOWN - unknown encoding
NOTES
Neither stream nor save is rewound when fkcode returns. Both file
pointers are left on a character boundary. If the input code is one of
the 7-bit codes, the stream file pointer points at the byte following
the first shift-in sequence, that is, stream is positioned in
shifted-in mode.
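This page does not list the shift-in sequences involved. As an
illustration only, the sketch below uses the escape sequences commonly
documented for the 7-bit codes: ESC $ B for New JIS, ESC $ @ for Old
JIS and ESC K for NEC-JIS. Whether fkcode checks exactly these
sequences is an assumption. The enum labels and find_kanji_in are
hypothetical names local to the sketch; they are not the KJ* constants
from <mlx-j/kanji.h>.

```c
#include <assert.h>
#include <stddef.h>

/* Labels for this sketch only; not the KJ* constants of libkanji. */
enum sketch_code { SK_NONE, SK_NEW_JIS, SK_OLD_JIS, SK_NEC_JIS };

/*
 * Scans buf for the first kanji-in escape sequence. On a match, stores
 * in *next the offset of the byte that follows the sequence, which is
 * where a reader would continue in shifted-in mode.
 */
static enum sketch_code find_kanji_in(const unsigned char *buf,
                                      size_t len, size_t *next)
{
    size_t i;

    assert(buf != NULL && next != NULL);
    for (i = 0; i < len; i++) {
        if (buf[i] != 0x1B)                     /* not ESC */
            continue;
        if (i + 2 < len && buf[i + 1] == '$' && buf[i + 2] == 'B') {
            *next = i + 3;
            return SK_NEW_JIS;                  /* ESC $ B */
        }
        if (i + 2 < len && buf[i + 1] == '$' && buf[i + 2] == '@') {
            *next = i + 3;
            return SK_OLD_JIS;                  /* ESC $ @ */
        }
        if (i + 1 < len && buf[i + 1] == 'K') {
            *next = i + 2;
            return SK_NEC_JIS;                  /* ESC K */
        }
    }
    return SK_NONE;
}
```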
A return value of KJEUCORSJIS is most commonly caused by an input
file that contains only half-width (half-size) katakana. The byte
values used by EUC and Shift-JIS overlap completely for these.
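The overlap can be made concrete with a rough sketch. The two checkers
below use the byte ranges commonly documented for Shift-JIS and for
EUC (JIS X 0208 plane and SS2 katakana only); the ranges are not taken
from this page, and the functions are hypothetical helpers, not the
method fkcode uses. A run of Shift-JIS half-width katakana bytes
(0xA1-0xDF) of even length also parses as a run of two-byte EUC
characters, so neither encoding can be ruled out.

```c
#include <assert.h>
#include <stddef.h>

static int in_range(unsigned char b, unsigned lo, unsigned hi)
{
    return b >= lo && b <= hi;
}

/* Returns 1 if buf parses as Shift-JIS under the assumed ranges. */
static int parses_as_sjis(const unsigned char *buf, size_t len)
{
    size_t i = 0;

    assert(buf != NULL);
    while (i < len) {
        unsigned char b = buf[i];

        if (b < 0x80 || in_range(b, 0xA1, 0xDF)) {
            i++;                /* ASCII or single-byte katakana */
        } else if ((in_range(b, 0x81, 0x9F) || in_range(b, 0xE0, 0xEF)) &&
                   i + 1 < len &&
                   (in_range(buf[i + 1], 0x40, 0x7E) ||
                    in_range(buf[i + 1], 0x80, 0xFC))) {
            i += 2;             /* lead byte plus trail byte */
        } else {
            return 0;
        }
    }
    return 1;
}

/* Returns 1 if buf parses as EUC under the assumed ranges. */
static int parses_as_euc(const unsigned char *buf, size_t len)
{
    size_t i = 0;

    assert(buf != NULL);
    while (i < len) {
        unsigned char b = buf[i];

        if (b < 0x80) {
            i++;                /* ASCII */
        } else if (b == 0x8E && i + 1 < len &&
                   in_range(buf[i + 1], 0xA1, 0xDF)) {
            i += 2;             /* SS2 plus half-width katakana */
        } else if (in_range(b, 0xA1, 0xFE) && i + 1 < len &&
                   in_range(buf[i + 1], 0xA1, 0xFE)) {
            i += 2;             /* two-byte character */
        } else {
            return 0;
        }
    }
    return 1;
}
```

For example, the four bytes 0xB1 0xB2 0xB3 0xB4 pass both checks,
while 0x82 0xA0 passes only the Shift-JIS check and 0xB0 0xE1 passes
only the EUC check.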
Page 1 Reliant UNIX 5.44 2, 194
A return value of KJUNKNOWN indicates that the input was not in any
recognized encoding.
BUGS
fkcode determines the input encoding by looking for the first byte
sequence in the input stream which can be uniquely assigned to a
codeset. If the input file contains more than one encoding, the
encoding used first is reported.
EUC and Shift-JIS cannot always be distinguished. This means that in
the worst case, with a limit value of 0, fkcode consumes all input to
EOF (and possibly writes all of the input to save) before reporting
failure. The same is true if the input does not contain any kanji
characters.
FILES
/usr/lib/libkanji.a
SEE ALSO
fkconv(3K), fkverify(3K).