EUC(4) RISC/os Reference Manual EUC(4)
NAME
EUC - Extended UNIX Code
DESCRIPTION
Up to four code sets can be used concurrently at both the
file and process level, enabling the use of languages that
require characters outside the US ASCII range. An external
code set defines the set of characters that can be used, and
many different external code sets are in use around the
world. Each of these code sets is mapped to its own internal
code set representation, which the RISC/os system then uses
during processing. This internal code set scheme is called
the Extended UNIX Code, or EUC.
EUC comprises a primary code set (code set 0), which is
always assigned to the 7-bit US ASCII character set, and
three supplementary code sets (code sets 1 through 3), which
can be assigned to other character sets.
The EUC code sets are distinguished by the value of the most
significant bit (MSB) of each byte of the EUC representation
and by single-shift characters. This combination defines the
internal coding template for each of the four code sets. The
MSB of each byte is the left-most bit in the standard
representation of a byte.
The representation of the single-byte primary code set has
the MSB set to zero. The three supplementary code sets have
the MSB of each byte set to one. Code sets 2 and 3 are
further distinguished by single-shift character 2 (SS2) and
single-shift character 3 (SS3), respectively. This coding
scheme conforms to the International Standard ISO 2022.
     Code Set      EUC Representation
     Code Set 0    0xxxxxxx
     Code Set 1    1xxxxxxx [ 1xxxxxxx [...] ]
     Code Set 2    SS2 1xxxxxxx [ 1xxxxxxx [...] ]
     Code Set 3    SS3 1xxxxxxx [ 1xxxxxxx [...] ]
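The templates above can be illustrated with a minimal C
sketch that builds the EUC form of one character. The
function name euc_encode and its interface are hypothetical
and are not part of RISC/os. The sketch assumes the caller
supplies the character's bytes with only the low 7 bits
significant, and it uses the SS2 and SS3 values (hexadecimal
8E and 8F) given in this entry.

```c
#include <stddef.h>

#define EUC_SS2 0x8E   /* single-shift 2: prefixes a code set 2 character */
#define EUC_SS3 0x8F   /* single-shift 3: prefixes a code set 3 character */

/*
 * Hypothetical helper, for illustration only.
 * Writes the EUC representation of one character into out.
 *   codeset  0 through 3
 *   raw      the character's bytes (low 7 bits significant)
 *   n        number of bytes in raw
 * Returns the number of bytes written, or 0 on error.
 */
size_t euc_encode(int codeset, const unsigned char *raw, size_t n,
                  unsigned char *out, size_t outsize)
{
    size_t need, i, pos = 0;

    if (codeset < 0 || codeset > 3 || n == 0)
        return 0;
    if (codeset == 0 && n != 1)
        return 0;                       /* code set 0 is single-byte */

    need = n + (codeset >= 2 ? 1 : 0);  /* room for SS2 or SS3 */
    if (outsize < need)
        return 0;

    if (codeset == 2)
        out[pos++] = EUC_SS2;
    else if (codeset == 3)
        out[pos++] = EUC_SS3;

    for (i = 0; i < n; i++) {
        if (codeset == 0)
            out[pos++] = raw[i] & 0x7F; /* MSB clear: 0xxxxxxx */
        else
            out[pos++] = raw[i] | 0x80; /* MSB set:   1xxxxxxx */
    }
    return pos;
}
```

For example, passing the two bytes 30 21 (hexadecimal) with
code set 1 yields B0 A1, and passing the single byte 31 with
code set 2 yields 8E B1.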
A single-shift character is a single byte which indicates a
temporary shift for the next character to code set 2 or 3.
SS2 is represented in hexadecimal by 8E, and SS3 by 8F. The
usage and definition of these shift codes conform to the
International Standards ISO 2022 and ISO 6937/3.
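Reading in the other direction, the first byte of a
character is enough to tell which code set it belongs to. The
following C sketch shows one way this test might be written.
The function name euc_codeset is hypothetical, and the
choice to report C1 control bytes other than SS2 and SS3 as
-1 is an assumption of this sketch, based on the C1 range
listed later in this entry.

```c
#define EUC_SS2 0x8E   /* single-shift 2 */
#define EUC_SS3 0x8F   /* single-shift 3 */

/*
 * Hypothetical helper, for illustration only.
 * Returns the code set (0 through 3) selected by the first
 * byte of a character, or -1 if the byte is a C1 control
 * character other than SS2 or SS3.
 */
int euc_codeset(unsigned char lead)
{
    if ((lead & 0x80) == 0)
        return 0;              /* MSB clear: primary code set */
    if (lead == EUC_SS2)
        return 2;              /* next character is in code set 2 */
    if (lead == EUC_SS3)
        return 3;              /* next character is in code set 3 */
    if ((lead & 0xE0) == 0x80)
        return -1;             /* 100xxxxx: other C1 control */
    return 1;                  /* MSB set, no single shift */
}
```

Because the shift is temporary, a caller would apply the
result to one character only and test the lead byte again
for the next character.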
In addition to the primary and supplementary code sets, the
internal EUC representations also include the space and
delete characters, two control character sets, and
unassigned codes, as follows.
Printed 11/19/92 Page 1
     Code Set                        EUC Representation
     Space                           00100000
     Delete                          01111111
     Control Character Set 0 (C0)    000xxxxx
     Control Character Set 1 (C1)    100xxxxx
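The bit patterns in this table translate directly into mask
tests. The C sketch below is one possible rendering. The
enum and function names are hypothetical, and bytes that
match none of the four listed patterns are simply reported
as EUC_OTHER, since this entry does not enumerate the
unassigned codes.

```c
/* Hypothetical names, for illustration only. */
enum euc_byte_class {
    EUC_SPACE,    /* 00100000 */
    EUC_DELETE,   /* 01111111 */
    EUC_C0,       /* 000xxxxx */
    EUC_C1,       /* 100xxxxx */
    EUC_OTHER     /* any byte not matching a pattern above */
};

/* Classifies one byte against the patterns listed in the table. */
enum euc_byte_class euc_classify_byte(unsigned char b)
{
    if (b == 0x20)
        return EUC_SPACE;
    if (b == 0x7F)
        return EUC_DELETE;
    if ((b & 0xE0) == 0x00)
        return EUC_C0;         /* top three bits 000 */
    if ((b & 0xE0) == 0x80)
        return EUC_C1;         /* top three bits 100 */
    return EUC_OTHER;
}
```

Note that SS2 (8E) and SS3 (8F) both match the C1 pattern,
so this function reports them as EUC_C1.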
SEE ALSO
Internationalized RISC/os Guide.