⇒ hpnls(5) — HP-UX 6.20

HPNLS(5)  —  HP-UX

NAME

hpnls - HP Native Language Support (NLS) Model

SYNOPSIS

ls /usr/lib/nls/*

DESCRIPTION

The HP Native Language Support (NLS) model includes several capabilities that reduce or eliminate the barriers that would otherwise make HP-UX difficult to use in a non-English language.  The three main categories, Character Set Support, Local Customs, and Messages, are subdivided into smaller categories to adequately reflect the extent of the Native Language Support.

Character set support
A major NLS objective is to provide capabilities for adapting character sequences to local language needs.

Character code size
The length of the character code governs the number of distinct characters that can be included in the character set.

7-bit
The ASCII character set consists of 33 control characters (including DEL), a space, and 94 printable characters.  (See ascii(5).) This is sufficient to span the Latin alphabet, uppercase and lowercase, plus punctuation and special symbols.  Seven bits of information are sufficient to distinguish the characters in such a set.

8-bit
The use of an 8-bit character code allows 67 control codes, a space, and 188 printable characters.  In the case of European characters, this provides sufficient space for accented vowels, consonants with special forms, and other special symbols.  (See roman8(5).) This is also sufficient to hold the phonetic Japanese character set Katakana.  (See kana8(5).)

16-bit
A number of languages have very large character sets that require more than the 188 printable characters provided by the 8-bit character codes.  Sixteen-bit character codes are available for these languages.  To simplify processing, 16-bit printable characters are formed from pairs of 8-bit printable characters (in which neither byte can contain a control code or a space).  This allows representation of up to 35344 characters (188 x 188).
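The counts quoted above can be checked with a few lines of arithmetic.  The following is only an illustrative sketch in Python; the constant and function names are invented for this example and are not part of NLS.

```python
# Character counts as stated in the text above.
ASCII_CONTROL, ASCII_SPACE, ASCII_PRINTABLE = 33, 1, 94
EIGHT_BIT_CONTROL, EIGHT_BIT_SPACE, EIGHT_BIT_PRINTABLE = 67, 1, 188


def code_space(bits):
    """Number of distinct code values available with the given code size."""
    return 2 ** bits


def sixteen_bit_capacity(printable_per_byte=EIGHT_BIT_PRINTABLE):
    """Characters representable as a pair of 8-bit printable characters."""
    return printable_per_byte ** 2


if __name__ == "__main__":
    print(code_space(7), code_space(8), sixteen_bit_capacity())
```

The 7-bit and 8-bit counts each add up to the full code space (128 and 256), and the 16-bit capacity is the square of the 188 printable 8-bit codes.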

Character typing
Character processing that depends on character type must account for the fact that the type assigned to a code value varies with the character set in use.  For example, a code value that is an alphabetic character in the ROMAN8 character set can be a punctuation character in the KANA8 set.

Shifting
While the ROMAN8 character set has uppercase and lowercase for most alphabetic characters, some languages discard accents when characters are shifted to uppercase. Other alphabetic characters may not be shifted at all, when there is no notion of "case" in the underlying language.

Collating
The ASCII collation order, while generally tolerated, is not adequate for American dictionary usage.  Different languages sort characters from the ROMAN8 set in different orders.  Some languages require that character pairs, such as "ch" and "ll" in Spanish, be sorted as single characters.  Ideographic character sets can have multiple orderings.  For example, Japanese characters can be sorted in phonetic order; in a different order based on the number of strokes in the ideogram; or according, first, to the radical (root) of the character and, second, to the number of strokes added to the radical.
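The Spanish case can be sketched as a sort key that first splits a word into collation elements, treating "ch" and "ll" as single letters.  This is a minimal Python illustration, not the HP-UX collation routine: the alphabet table and the function names are assumptions made for the example, and accented vowels are not handled.

```python
# Traditional Spanish alphabet in which "ch" and "ll" are single letters.
# This table is an assumption for illustration only.
SPANISH_ORDER = [
    "a", "b", "c", "ch", "d", "e", "f", "g", "h", "i", "j", "k", "l", "ll",
    "m", "n", "ñ", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z",
]
RANK = {element: index for index, element in enumerate(SPANISH_ORDER)}
DIGRAPHS = {"ch", "ll"}


def collation_elements(word):
    """Split a word into collation elements, keeping digraphs together."""
    word = word.lower()
    elements = []
    i = 0
    while i < len(word):
        if word[i:i + 2] in DIGRAPHS:
            elements.append(word[i:i + 2])
            i += 2
        else:
            elements.append(word[i])
            i += 1
    return elements


def spanish_key(word):
    """Sort key: rank of each element; unknown characters sort last."""
    return [RANK.get(e, len(SPANISH_ORDER)) for e in collation_elements(word)]


if __name__ == "__main__":
    words = ["dama", "chico", "cuna", "llama", "luz", "loma"]
    print(sorted(words))                   # code-value order
    print(sorted(words, key=spanish_key))  # digraph-aware order
```

With this key, "chico" sorts after "cuna" and "llama" sorts after "luz", whereas a plain code-value sort places both digraph words first within their initial letter.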

Directionality
Two properties of text files and native languages must be understood to process text in non-Western languages: the mode of the language and the order of the characters.

Mode refers to the direction in which a language is naturally read.  European languages read from left to right, some Middle Eastern languages read from right to left, and Far Eastern languages usually use vertical columns, beginning from the right.

Order describes the sequence in which characters are written, stored in a file, or displayed.  Keyboard order refers to the order of keystrokes by a user.  Screen order refers to the order in which characters are displayed on a terminal screen or printed.

Screen order can differ from keyboard order when using a terminal that supports mixing Latin and non-Latin text, each requiring different directionality.  In the following example, the text mode is right to left; n represents a non-Latin character, l represents a Latin character, and the numbers represent the order in which the sequence is typed. 

In keyboard order, the letters would be stored in a file as follows:

n1 n2 n3 l4 l5 l6

In screen order, the letters would be stored in a file as follows:

n1 n2 n3 l6 l5 l4

However, both screen-order and keyboard-order sequences would look identical on the screen, because the terminal would be configured to display the characters according to the directionality requirements of both the Latin and non-Latin languages.
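The relationship between the two orders in this example can be sketched as reversing each run of Latin characters while leaving the non-Latin characters in place.  The Python below is a hedged illustration that assumes right-to-left text mode and uses the n/l token labels from the example; the function name is hypothetical and does not correspond to any HP-UX routine.

```python
def keyboard_to_screen(tokens, is_latin):
    """Convert keyboard order to screen order, assuming right-to-left mode.

    Each consecutive run of Latin tokens is reversed; non-Latin tokens
    keep their positions.
    """
    result = []
    latin_run = []
    for token in tokens:
        if is_latin(token):
            latin_run.append(token)
        else:
            result.extend(reversed(latin_run))
            latin_run = []
            result.append(token)
    result.extend(reversed(latin_run))
    return result


if __name__ == "__main__":
    keyboard = ["n1", "n2", "n3", "l4", "l5", "l6"]
    print(keyboard_to_screen(keyboard, lambda t: t.startswith("l")))
```

Applying the same function to a screen-order sequence yields keyboard order again, because reversing a run twice restores it.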

Coding Scheme Considerations
Although most HP-supported 8-bit character sets preserve the ASCII codes in the range of 0 to 127, 16-bit character sets can use these byte values in 2-byte characters.  Software that assigns special meaning to bytes (metacharacters) in this range must distinguish between 1-byte and 2-byte characters.  In multilingual environments, standard escape code sequences are used to indicate a change to an alternate character set.  Since these sequences are not usually printed or displayed, the number of characters output is usually less than the number of bytes in the sequence.  Any software that must locate a character within a sequence must accommodate this.
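The metacharacter problem can be illustrated with a scan that skips the second byte of each 2-byte character.  This Python sketch rests on a loud assumption: that any byte with the high bit set begins a 2-byte character.  Real character sets define their own first-byte ranges, so the is_lead_byte rule and the function name here are hypothetical.

```python
def find_metachar(data, meta, is_lead_byte=lambda b: b >= 0x80):
    """Return the byte index of the first 1-byte occurrence of meta, or -1.

    ASSUMPTION: is_lead_byte identifies the first byte of a 2-byte
    character; the default (high bit set) is a placeholder rule only.
    """
    i = 0
    while i < len(data):
        if is_lead_byte(data[i]):
            i += 2  # skip both bytes of the 2-byte character
            continue
        if data[i] == meta:
            return i
        i += 1
    return -1


if __name__ == "__main__":
    # 'A', a 2-byte character whose second byte equals '/', 'B', then '/'.
    data = bytes([0x41, 0x90, 0x2F, 0x42, 0x2F])
    print(data.find(b"/"))                 # naive byte search
    print(find_metachar(data, ord("/")))   # 2-byte-aware search
```

The naive search stops inside the 2-byte character, while the aware scan reports only the real 1-byte metacharacter.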

Local Customs
Some aspects of Native Language Support relate more to local customs of a particular geographic location than to the characters used to write the language.  These can include representation of numbers, currency, date and time.

Representation of numbers
The character used to denote the radix of a decimal number varies for different regions.  Similarly, the use of a "thousands" indicator, or the grouping of (usually three) digits, can vary with local custom.  Characters used to represent digits can also vary for different regions.
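The radix and grouping conventions can be sketched as parameters of a formatting function.  The Python below is only an illustration: the function and parameter names are invented for this example, and alternate digit characters are not covered.

```python
def format_number(value, radix=".", group_sep=",", group_size=3, decimals=2):
    """Format a number using a caller-supplied radix and grouping custom."""
    sign = "-" if value < 0 else ""
    whole, _, fraction = f"{abs(value):.{decimals}f}".partition(".")
    groups = []
    while len(whole) > group_size:
        groups.insert(0, whole[-group_size:])
        whole = whole[:-group_size]
    groups.insert(0, whole)
    text = sign + group_sep.join(groups)
    if fraction:
        text += radix + fraction
    return text


if __name__ == "__main__":
    print(format_number(1234567.5))
    print(format_number(1234567.5, radix=",", group_sep="."))
    print(format_number(-1234.5, radix=",", group_sep=" "))
```

The same value prints as 1,234,567.50 or 1.234.567,50 depending only on the two separator arguments.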

Currency representation
The symbol for currency varies from country to country.  The symbol can either precede or follow the numeric value.  Some currencies allow decimal fractions while others use alternate methods of representing smaller monetary values.

Date and time representation
Month and weekday names vary with language (if they are not omitted entirely). Abbreviations can be other than three characters, or might not be allowed at all. Even when a strictly numeric representation is used, the order of year, month and day, as well as the delimiters that separate them, is not universal.
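For the strictly numeric case, the field order and delimiter can be treated as data rather than fixed in code.  A minimal Python sketch follows; the function name and the "MDY" style order strings are assumptions for this example.

```python
def format_numeric_date(year, month, day, order="MDY", delimiter="/"):
    """Format a date numerically with a locale-chosen order and delimiter."""
    fields = {
        "Y": f"{year % 100:02d}",  # two-digit year
        "M": f"{month:02d}",
        "D": f"{day:02d}",
    }
    return delimiter.join(fields[code] for code in order)


if __name__ == "__main__":
    print(format_numeric_date(1988, 3, 7))
    print(format_numeric_date(1988, 3, 7, order="DMY", delimiter="."))
    print(format_numeric_date(1988, 3, 7, order="YMD", delimiter="-"))
```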

Date and time adjustments
The HP-UX system clock runs on Greenwich Mean Time (GMT).  Corrections to local time zones consist of adding or subtracting whole or fractional hours from GMT.  The Gregorian calendar is most common, but some locales use different methods for determining meridian day and year, usually based on seasonal, astronomical, or historical events.
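The GMT correction amounts to adding a signed offset that may include a fraction of an hour.  The Python sketch below uses the standard datetime module rather than any HP-UX interface; the function name is hypothetical, and daylight-saving rules are deliberately left out.

```python
from datetime import datetime, timedelta, timezone


def gmt_to_local(gmt_time, offset_hours):
    """Convert a naive GMT datetime to local time for a fixed offset.

    offset_hours may be fractional (for example 5.5) or negative.
    """
    local_zone = timezone(timedelta(hours=offset_hours))
    return gmt_time.replace(tzinfo=timezone.utc).astimezone(local_zone)


if __name__ == "__main__":
    noon_gmt = datetime(1988, 3, 7, 12, 0)
    print(gmt_to_local(noon_gmt, 5.5))   # half-hour offset east of GMT
    print(gmt_to_local(noon_gmt, -8))    # whole-hour offset west of GMT
```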

Messages
The need for messages to be readable by users is perhaps the most significant justification for implementing Native Language Support.

Message content
Error messages, prompts, expected responses, and mnemonic command names should be based on the user’s native language.

Message structure
Messages must often be built from segments.  To accommodate grammatical differences, it may be necessary to change the order in which the fragments are connected.
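One way to allow segments to be reordered is to keep a whole message template per language and substitute named segments into it, instead of concatenating fragments in a fixed order.  The Python sketch below is illustrative only: the catalog contents, language keys, and function name are hypothetical and do not describe the HP-UX message catalog system.

```python
# Hypothetical catalog: each entry is a full template with named segments,
# so a translation is free to place the segments in a different order.
CATALOG = {
    "source_first": "Cannot copy {source} to {target}",
    "target_first": "To {target}, {source} cannot be copied",
}


def build_message(language, **segments):
    """Build a message by filling the named segments into the template."""
    return CATALOG[language].format(**segments)


if __name__ == "__main__":
    print(build_message("source_first", source="a.txt", target="b.txt"))
    print(build_message("target_first", source="a.txt", target="b.txt"))
```

The calling code passes the same segments in both cases; only the template decides the order in which they appear.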

EXAMPLES

A "fully localized" version of pr(1) meets the following criteria:

It preserves the eighth bit of a character code.
It properly formats the date in each page header.
It accounts for non-printing escape sequences.
It uses the message catalog system to select user error messages.

FILES

/usr/lib/nls/*

AUTHOR

Hpnls was developed by HP. 

SEE ALSO

date(1), sort(1), conv(3C), ctime(3C), ctype(3C), ecvt(3C), nl_init(3C), string(3C), strtod(3C), printf(3S), scanf(3S), ascii(5), kana8(5), roman8(5). 

Native Language Support, manual in HP-UX Concepts and Tutorials:  Device I/O and User Interfacing. 

For additional information, see the INTERNATIONAL SUPPORT section on other manual pages of commands and library routines. 

Hewlett-Packard Company  —  May 11, 2021
