kconv(3K) — Reliant UNIX 5.44c4

kconv(3K)                         2/4/92                          kconv(3K)

NAME
     kconv - kanji code converter

SYNOPSIS
     cc [flag...] file... -lkanji [library...]

     #include <mlx-j/kanji.h>

     int kconv(kjbuf *bp, int ifmt, int ofmt, int ko, int exp);

DESCRIPTION
     kconv handles arbitrary conversions between EUC, Shift-JIS,  JIS,  Old
     JIS  and  NEC-JIS  encoded  text.  The  routine  supports  all  of the
     characters defined  in  JIS X 0208-1990,  as  well  as  EUC  codeset 3
     (gaiji) characters.

     The ifmt and ofmt arguments determine the input and  output  code  and
     may take the following values:

          KJJIS       - (New) JIS
          KJOJIS      - Old JIS
          KJNECJIS    - NEC-JIS
          KJSJIS      - Shift-JIS
          KJEUC       - Extended UNIX Code

     If ofmt is KJJIS or  KJOJIS,  ko  must  be  set  to  KJTOROMAN  or
     KJTOASCII.  It determines the shift-out sequence to be placed in the
     output (JIS-Roman or ASCII). ko is ignored for other values of ofmt.

     The exp argument must be set to KJEXP or KJNOEXP. KJEXP indicates
     that half-size katakana should be expanded to their full-size
     equivalents; KJNOEXP causes half-size katakana to be preserved in
     the output. If the output is in one of the 7-bit codes, the value of
     exp is ignored and expansion always takes place, since these codes do
     not include half-size katakana.

     bp  is  a  pointer  to  a  structure  of  type  kjbuf,   defined   in
     <mlx-j/kanji.h> as follows:

          typedef struct {
                  unsigned char   *kjin;
                  size_t          kjisz;
                  size_t          kjicnt;
                  unsigned char   *kjout;
                  size_t          kjosz;
                  size_t          kjocnt;
                  int             kjshift;
                  int             kjeof;
          } kjbuf;

     kjin and kjout are pointers to the input and output buffers,
     respectively. The buffers must be at least KJMINIBUF (4 bytes) and
     KJMINOBUF (9 bytes) in size. For efficiency, however, larger buffers
     should be used, for example BUFSIZ bytes each. The input and output
     buffers must not overlap in memory.

     kjisz indicates the number of bytes of input present  in  kjin,  and
     kjosz must be set to the size of the output buffer.

     When kconv returns, kjicnt is set to the  number  of  bytes  consumed
     from  kjin,  and  kjocnt  is  set to the number of bytes placed into
     kjout.

     kjshift is used to keep track of the shift state for 7-bit codes. It
     must be set to 0 for the initial call and must not be changed between
     invocations of kconv.
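
     The field descriptions above can be summarised in a small setup
     routine. The following is a minimal sketch, not part of libkanji: the
     helper name kjbuf_init is hypothetical, and the typedef is repeated
     locally only so that the fragment compiles on its own (real code would
     include <mlx-j/kanji.h> instead).

```c
#include <assert.h>
#include <stddef.h>

/* Local copy of the structure shown above, so the sketch compiles
 * without <mlx-j/kanji.h>; real code should use the header instead. */
typedef struct {
        unsigned char   *kjin;
        size_t          kjisz;
        size_t          kjicnt;
        unsigned char   *kjout;
        size_t          kjosz;
        size_t          kjocnt;
        int             kjshift;
        int             kjeof;
} kjbuf;

/* Hypothetical helper (not part of libkanji): prepare bp for the first
 * call.  The caller owns both buffers, which must not overlap. */
static void
kjbuf_init(kjbuf *bp, unsigned char *in, size_t isz,
           unsigned char *out, size_t osz)
{
        assert(bp != NULL && in != NULL && out != NULL);
        bp->kjin = in;
        bp->kjisz = isz;        /* bytes of input present in kjin */
        bp->kjicnt = 0;         /* set by kconv on return */
        bp->kjout = out;
        bp->kjosz = osz;        /* capacity of the output buffer */
        bp->kjocnt = 0;         /* set by kconv on return */
        bp->kjshift = 0;        /* must be 0 for the initial call */
        bp->kjeof = 0;          /* more input may follow */
}
```

     For later calls only kjisz (and, at the end of the input, kjeof) would
     be updated; kjshift is left untouched.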

     The kjeof field is  used  to  handle  partial  characters  or  escape
     sequences  at  the  end of the input buffer. For example, if the first
     byte of a 2-byte kanji character is the last byte in the input buffer,
     kconv  does  not  immediately  convert  that byte, since it cannot yet
     decide how to do the conversion. If kjeof is 0, the call returns with
     kjicnt one less than kjisz. The caller is responsible for moving the
     unconverted byte to the front  of  the  input  buffer,  refilling  the
     remaining  space  with  more input, and calling kconv again. The first
     byte of the character is now at the front of the input buffer and  the
     character is converted correctly by the second call.

     However, if a partial character is present at EOF, this approach  does
     not  work,  since no more bytes are available to do the conversion. In
     this case, one more call must be made to kconv, with kjeof set  to  a
     non-zero  value.  This forces kconv to convert whatever input is left.
     Note that this is particularly important when expanding half-size
     katakana. Two adjacent half-size katakana can combine to form a single
     full-size character, but, depending on the context, the same
     characters can also translate to two separate full-size ones. If such
     a half-size katakana is found at EOF, the final call with kjeof set to
     a non-zero value ensures the correct conversion.
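
     The refill procedure described above can be sketched as a loop over
     two stdio streams. This is an illustration under stated assumptions,
     not the library's own code: convert_stream and kconv_fn are
     hypothetical names, the typedef is a local copy of the structure
     above, and fake_kconv is only a stand-in that copies bytes through
     (treating any byte >= 0x80 as the first of two) so that the loop can
     be tried without libkanji. The extra call described under NOTES for
     the final shift-out sequence is not covered here.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Local copy of the structure above, for a standalone build. */
typedef struct {
        unsigned char   *kjin;
        size_t          kjisz;
        size_t          kjicnt;
        unsigned char   *kjout;
        size_t          kjosz;
        size_t          kjocnt;
        int             kjshift;
        int             kjeof;
} kjbuf;

/* Same signature as kconv in the SYNOPSIS. */
typedef int (*kconv_fn)(kjbuf *, int, int, int, int);

/* Stand-in for kconv (NOT the real converter): copies input unchanged
 * and holds back an incomplete 2-byte character unless kjeof is set. */
static int
fake_kconv(kjbuf *bp, int ifmt, int ofmt, int ko, int exp)
{
        size_t i = 0, o = 0;

        (void)ifmt; (void)ofmt; (void)ko; (void)exp;
        while (i < bp->kjisz) {
                size_t n = bp->kjin[i] >= 0x80 ? 2 : 1;

                if (i + n > bp->kjisz) {
                        if (!bp->kjeof)
                                break;          /* wait for the second byte */
                        n = bp->kjisz - i;      /* EOF: pass the fragment on */
                }
                if (o + n > bp->kjosz)
                        break;                  /* output buffer full */
                memcpy(bp->kjout + o, bp->kjin + i, n);
                i += n;
                o += n;
        }
        bp->kjicnt = i;
        bp->kjocnt = o;
        return 0;
}

/* Convert all of in to out.  Returns 0 on success, -1 on an I/O error
 * or if the converter stops making progress.  *status receives the
 * first non-zero value returned by conv, or 0. */
static int
convert_stream(kconv_fn conv, int ifmt, int ofmt, int ko, int exp,
               FILE *in, FILE *out, int *status)
{
        unsigned char ibuf[BUFSIZ], obuf[BUFSIZ];
        kjbuf b;
        size_t have = 0;        /* unconverted bytes at the front of ibuf */
        int eof = 0;

        assert(conv != NULL && in != NULL && out != NULL && status != NULL);
        memset(&b, 0, sizeof b);        /* kjshift is 0 for the first call */
        b.kjin = ibuf;
        b.kjout = obuf;
        b.kjosz = sizeof obuf;
        *status = 0;
        while (!eof || have > 0) {
                int rc;

                if (!eof && have < sizeof ibuf) {
                        size_t got = fread(ibuf + have, 1,
                            sizeof ibuf - have, in);

                        if (got == 0) {
                                if (ferror(in))
                                        return -1;
                                eof = 1;
                        }
                        have += got;
                }
                b.kjisz = have;
                b.kjeof = eof;  /* non-zero forces conversion of leftovers */
                rc = conv(&b, ifmt, ofmt, ko, exp);
                if (rc != 0 && *status == 0)
                        *status = rc;   /* error codes are non-fatal */
                if (fwrite(obuf, 1, b.kjocnt, out) != b.kjocnt)
                        return -1;
                if (b.kjicnt == 0 && b.kjocnt == 0 &&
                    (eof || have == sizeof ibuf))
                        return have == 0 ? 0 : -1;      /* done, or stalled */
                have -= b.kjicnt;
                memmove(ibuf, ibuf + b.kjicnt, have);   /* keep the remainder */
        }
        return 0;
}
```

     With libkanji, conv would be kconv itself and the format arguments
     would be constants from <mlx-j/kanji.h>, such as KJSJIS and KJEUC.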

RETURN VALUE
     Successful conversion returns a value of  0.  Otherwise,  one  of  the
     error  codes  below  is  returned. Note that all error values indicate
     non-fatal conditions, that is, conversion does not stop when an  error
     condition is detected.

     KJERROR
          An invalid byte or byte  sequence  was  detected  in  the  input.
          Typically,  this  happens  when  a  byte which introduces a kanji
          character is not followed by a byte sequence to validly  complete
          it,  or  when  a  partial  escape  sequence or character is found
          at EOF.

     KJGAIJI
          EUC codeset 3 (gaiji) characters are not part of Shift-JIS or any
          of  the 7-bit codes. This means that gaiji characters are handled
          correctly only  when  converting  from  EUC  to  EUC  (to  expand
          half-size katakana). For other output codes, gaiji characters are
          passed through (in shifted-out mode), and KJGAIJI is returned.

     KJNEW
          In 1983, four kanji were appended to JIS X 0208 level 2, followed
          by  another  two  kanji in 1990. These characters are not part of
          Old JIS and NEC-JIS, and are passed through in  shifted-out  mode
          for  output  codes of KJOJIS and KJNECJIS, with a return value
          of KJNEW.

     KJESC
          When converting from EUC or Shift-JIS to one of the 7-bit  codes,
          it  is  possible that a shift-in or shift-out sequence is present
          in the ASCII portion of the input. These escape sequences  cannot
          be passed through to the output since they would change the shift
          state. Such escape sequences in the input  are  skipped  and  the
          return value is set to KJESC.
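
     Because every error code is non-fatal, a caller will usually just
     report the condition and carry on. A minimal sketch of such a report
     follows; the function name kconv_strstatus is hypothetical, and the
     numeric values defined here are placeholders that only let the
     fragment compile without <mlx-j/kanji.h>. The real values come from
     the header and may differ.

```c
/* Placeholder values (assumptions, NOT taken from the header). */
#ifndef KJERROR
#define KJERROR 1
#define KJGAIJI 2
#define KJNEW   3
#define KJESC   4
#endif

/* Hypothetical helper: describe a kconv return value. */
static const char *
kconv_strstatus(int rc)
{
        switch (rc) {
        case 0:
                return "no error";
        case KJERROR:
                return "invalid byte sequence in input";
        case KJGAIJI:
                return "gaiji character passed through";
        case KJNEW:
                return "post-1978 kanji passed through";
        case KJESC:
                return "escape sequence in input skipped";
        default:
                return "unknown status";
        }
}
```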

NOTES
     It is assumed that the input and output are initially in shifted-out
     mode. If the output code is one of the 7-bit codes and the input ends
     with a kanji character, the function also forces a shift-out sequence.
     Consequently, if kconv was called with kjeof set and the value of
     kjocnt shows that the output buffer was filled to within KJMAXKO
     (3 bytes) of its capacity, the function must be called once more, with
     kjeof set and a kjisz value of 0, to get the final shift-out sequence.
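
     The check for this extra call can be sketched as follows. The helper
     name kj_need_final_call is hypothetical, and the sketch reads the
     condition as "fewer than KJMAXKO bytes were left free in the output
     buffer"; the exact boundary is an assumption and should be confirmed
     against the header or by experiment. The fallback definition of
     KJMAXKO uses the size given above and applies only when the header is
     not included.

```c
#include <stddef.h>

#ifndef KJMAXKO
#define KJMAXKO 3       /* size stated in the text; header value wins */
#endif

/* Hypothetical helper: after a call made with kjeof set, report whether
 * the output buffer had fewer than KJMAXKO bytes free, in which case
 * the final shift-out sequence may not have fit.  Expects
 * kjocnt <= kjosz. */
static int
kj_need_final_call(size_t kjosz, size_t kjocnt)
{
        return kjosz - kjocnt < KJMAXKO;
}
```

     If the helper returns non-zero, the caller would write out the
     current output, set kjisz to 0, leave kjeof set, and call kconv again.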

     Invalid byte sequences in the input are copied through to  the  output
     unchanged (in shifted-out mode).

BUGS
     If more than one error condition is detected during  conversion  of  a
     single  input  buffer,  the  return  code  always  indicates the first
     problem that was found.

     Invalid  byte  sequences  in  the  input  may  cause  kconv  to   lose
     synchronization  with kanji character boundaries. If this happens, all
     converted output following the error is likely to be garbage.

FILES
     /usr/lib/libkanji.a

SEE ALSO
     fkconv(3K), fkcode(3K), fkverify(3K), keuctoibmj(3K), kibmjtoeuc(3K).







