BUILDLANG(1M) Domain/OS SysV BUILDLANG(1M)
NAME
buildlang - generate or display a locale.def file
SYNOPSIS
buildlang [-n] input_file
buildlang -d [fc|fd|fo|fx] locale_name
DESCRIPTION
The buildlang utility takes source files containing collation and
character classification information, and compiles them into binary
objects, called locale.def files.
Without the -d option, buildlang automatically sets up the language
environment as specified by input_file. The buildlang utility reads a
buildlang script specified in the input_file, creates a file called
locale.def, and installs the file in the appropriate directory.
The -d option causes buildlang to display the contents of the locale.def
file associated with the locale_name in the format of a buildlang-script,
so that the output can be modified and used as an input file to buildlang
to generate a modified locale.def file. If a character code is printable,
as defined by the current setting of the LC_CTYPE environment variable in
the user's environment, buildlang always outputs the character code in
the character-constant form. If a character code is not printable, use
one of the -fc, -fd, -fo, or -fx arguments to specify the form in which
the character code is displayed.
The -d option used with -fc, -fd, -fo, or -fx causes buildlang to output
each non-printable code in the character-constant, decimal-constant,
octal-constant, or hexadecimal-constant form, respectively (see the
Constants section below).
If the -d option is specified without -fc, -fd, -fo, or -fx, buildlang
displays each printable character code in the character-constant form and
non-printable character code in the hexadecimal-constant form.
There are six categories of data in the locale.def file, recognized by
setlocale(3C), which make up a language definition. They are:
LC_COLLATE
Affects the behavior of regular expressions and the I18N string
collation functions (strcoll and strxfrm; see string(3C)).
LC_CTYPE Affects the behavior of the character conversion and character
handling functions (except for the isdigit and isxdigit
functions; see ctype(3C)).
LC_MONETARY
Affects the monetary formatting information returned by the
localeconv(3C) function.
LC_NUMERIC
Affects the decimal-point character for the formatted
input/output functions and the string conversion functions, as
well as the nonmonetary formatting information returned by the
localeconv(3C) and nl_langinfo(3C) functions.
LC_TIME Affects the behavior of the time conversion functions; see
ctime(3C) and nl_langinfo(3C) functions.
NOTE: This affects only the strftime function (not other time
functions).
LC_ALL Contains language-specific information that does not belong to
any of the other categories (for example, yesstr/nostr).
A buildlang script also consists of the same six locale categories. The
beginning of each category is identified by a category tag, which has the
form of LC_category, where category is one of the following: COLLATE,
CTYPE, MONETARY, NUMERIC, TIME, and ALL. The end of each category is
identified by an END_LC tag. The order of the categories in the
buildlang script is irrelevant and all category specifications are
optional. If a category is not specified, setlocale sets up the default
"C" locale for that category.
Each category is composed of one or more statements. Each statement
begins with a keyword, followed by one or more expressions. An expression
is a set of well-formed metacharacters, strings, and constants. The
buildlang utility also recognizes comments and separators.
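As a minimal sketch of this layout, a script declares the two required
keywords (langname and langid, described below) and then places each
category's statements between its category tag and END_LC. The locale
name and the values shown here are hypothetical and chosen only for
illustration.

```text
# Hypothetical user-defined locale; langid is taken from the 901 - 999 range.
langname "xx_YY.iso88591"
langid 901

LC_NUMERIC                 # category tag
decimal_point ","          # keyword followed by one expression (a string)
thousands_sep "."
END_LC                     # end of the category
```

Categories left out of such a script fall back to the "C" locale, as
described above.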
More than one definition can be specified for each category. If a
category contains more than one definition, each additional definition
must be named via the modifier keyword described below. The first set of
specifications is the default definition which may or may not have a
modifier name.
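The fragment below sketches a category that holds two definitions,
following the layout used in the EXAMPLE section. The modifier names and
the abbreviated sequences are illustrative only, and the required
langname and langid statements are omitted.

```text
LC_COLLATE
modifier "fold"                     # names the first (default) definition
sequence ( 'A' 'a' ) ( 'B' 'b' )    # abbreviated for illustration
modifier "nofold"                   # each additional definition must be named
sequence 'A' 'B' 'a' 'b'            # abbreviated for illustration
END_LC
```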
The following is a list of category tags, keywords, and subsequent
expressions that are recognized by buildlang. The order of keywords
within a category is irrelevant with the exception of the modifier
keyword. All keyword specifications are optional with the exception of
the langname and langid keywords. (Note the convention: tags use
uppercase characters; keywords use lowercase.)
Category Tags and Keywords
The following keywords do not belong to any category.
langname String identifying the name of the language; follows the naming
conventions of LANG environment variable:
language[_territory][.codeset]. This keyword is required.
langid Decimal number identifying the language id. This keyword is
required. The language id specified should be in the range of 1
- 999, and any user-defined language should assign its language
id to a value in the range of 901 - 999.
The following keyword can be used in any category; it must be used to
name a definition when a category contains more than one definition.
modifier String identifying the name of the modifier; must come before
any keyword in a set of specifications because it associates a
modifier with a set of specifications.
LC_ALL
The following keywords belong to the LC_ALL category and should come
between the category tag LC_ALL and END_LC.
yesstr String identifying the affirmative response for yes/no
questions.
nostr String identifying the negative response for yes/no questions.
direction String indicating text direction.
context String indicating character context analysis. (Should be set to
"null" or "0" to indicate that no context analysis is required.)
LC_CTYPE
The following keywords belong to the LC_CTYPE category and should come
between the category tag LC_CTYPE and END_LC.
isupper Uppercase letters.
iscntrl Control characters.
isdigit Numeric characters.
islower Lowercase letters.
ispunct Punctuation characters.
isxdigit Hexadecimal digits.
ul Relationships between uppercase and lowercase characters. Used
for languages that have a one-to-one relationship between
lowercase and uppercase characters.
tolower Uppercase to lowercase relationships.
toupper Lowercase to uppercase relationships.
isspace Space characters.
isblank Blank characters.
bytes_char
String containing the maximum number of bytes per character for
the character set used for a specified language.
alt_punct String mapped into the ASCII equivalent string.
LC_COLLATE
The following keywords belong to the LC_COLLATE category and should come
between the category tag LC_COLLATE and END_LC.
sequence Sequence of character codes for collation.
modifier Allows you to select locale-specific collation.
LC_MONETARY
The following keywords belong to the LC_MONETARY category and should come
between the category tag LC_MONETARY and END_LC. These keywords, except
crncystr, are identical to the members in struct lconv defined in
<locale.h>.
int_curr_symbol international currency symbol
currency_symbol (local) currency symbol
mon_decimal_point monetary decimal point
mon_thousands_sep monetary thousands separator
mon_grouping monetary grouping
positive_sign positive sign
negative_sign negative sign
int_frac_digits international fractional digits
frac_digits fractional digits
p_cs_precedes positive currency symbol precedes
p_sep_by_space positive currency symbol separated by space
p_sign_posn positive sign position
n_cs_precedes negative currency symbol precedes
n_sep_by_space negative currency symbol separated by space
n_sign_posn negative sign position
crncystr String for specifying the currency
LC_NUMERIC
The following keywords belong to the LC_NUMERIC category and should come
between the category tag LC_NUMERIC and END_LC. These keywords, except
alt_digit, are identical to the members in struct lconv defined in
<locale.h>.
decimal_point
decimal point (or, RADIXCHAR)
thousands_sep
thousands separator
grouping grouping
alt_digit String mapped into the ASCII equivalent string
"0123456789b+-,eE", where b is a blank.
LC_TIME
The following keywords belong to the LC_TIME category and should come
between the category tag LC_TIME and END_LC. These keywords, except era, are
identical to the members defined in <langinfo.h>.
d_t_fmt
d_fmt
t_fmt
am_str
pm_str
day_1 to day_7
abday_1 to abday_7
mon_1 to mon_12
abmon_1 to abmon_12
year_unit
mon_unit
day_unit
hour_unit
min_unit
sec_unit
era_fmt
Expressions
Expressions consist of character-code constants, strings, and
metacharacters. There are four types of legal expressions: ctype, shift,
collate, and info.
ctype Ctype expressions follow the keywords isupper, islower,
iscntrl, isdigit, isspace, ispunct, isxdigit, isblank, isfirst,
and issecond. They can include either a single character-code
constant or a character-code range consisting of a constant
followed by a dash followed by another constant. At least one
separator must appear between the constants and dash. The
constant preceding the dash must have a smaller code value than
the constant following the dash. A range represents a set of
consecutive character codes.
shift Shift expressions follow the keywords ul, toupper, and tolower,
and must consist of two character-code constants enclosed by
left and right angle brackets. For ul and tolower, the first
constant represents an uppercase character and the second the
corresponding lowercase character. For toupper, the first
constant represents a lowercase character and the second the
corresponding uppercase character.
collate Collate expressions that follow the keyword sequence represent
a sequence of character codes that define a collation order.
Each character code in the series is assigned an ascending
sequence number. Collate expressions include single character-
code constants, character-code ranges, character-code priority
sets, two-to-one character-code pairs, one-to-two character-
code pairs, and character-code don't care sets.
A character-code priority set is a collection of one or more
constants or other collate expressions enclosed by left and
right parentheses. Constants or expressions within a priority
set have the same collation sequence number but different
priorities, which account for case and accent differences.
A two-to-one character-code pair is represented by two
character-code constants enclosed by left and right angle
brackets. Two-to-one characters are two adjacent characters
that occupy one position in the collating sequence. For example,
the expression sequence ('C' 'c') (<'C' 'h'> <'c' 'h'>) ('D' 'd')
instructs buildlang to treat the character combinations Ch and
ch as single characters that collate between lowercase c and
uppercase D.
A one-to-two character-code pair is represented by two
character-code constants enclosed by left and right square
brackets. A one-to-two character is a single character that
occupies two adjacent positions in the collating sequence. For
example, suppose the character 'X' represents a one-to-two
character that collates as 'AE'. This information can be
expressed as ('A' ['X' 'E'] 'a'). The character 'X' has the same
primary sequence number as 'A' and 'a', a priority that lies
between 'A' and 'a' and a secondary sequence number that is the
same as 'E'.
A character-code don't care set is a collection of one or more
constants or other collate expressions enclosed by left and
right curly brackets. Constants or expressions within a don't
care set are ignored in character comparisons.
info Info expressions follow all lconv-type keywords and era
keywords. Each expression is a string. (See Strings section.)
The expressions following the langinfo-type keywords define the
strings associated with the items in langinfo. Each expression
consists of a string to be associated with the item identified
by the keyword.
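The expression types above can be sketched side by side as follows. The
fragment reuses constants from the EXAMPLE section where possible. The
don't-care set at the end is an assumption based on the curly-bracket
description above and does not appear in that example. The sequence is
heavily abbreviated, and langname and langid are omitted.

```text
LC_CTYPE
isdigit  '0' - '9'                  # ctype: a range of consecutive codes
isspace  0x9 - 0xd ' '              # ctype: a range plus a single constant
ul       < 'A' 'a' > < 'B' 'b' >    # shift: uppercase first, lowercase second
toupper  < 0xff 'Y' >               # shift: lowercase first, uppercase second
END_LC

LC_COLLATE
sequence '0' - '9'                                  # range
         ( 'A' 'a' )                                # priority set
         ( 'C' 'c' ) ( < 'C' 'h' > < 'c' 'h' > )    # two-to-one pairs
         ( 'S' [ 0xdf 'S' ] 's' )                   # one-to-two pair
         { 0x0 - 0x1f }                             # don't-care set (assumed form)
END_LC

LC_NUMERIC
decimal_point "."                   # info: a string
END_LC
```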
Constants
Constants represent character codes in the ctype, shift, and collate
expressions. C programming language integer and character constants can
be used as character codes, including decimal constants, octal constants,
hexadecimal constants, and character constants.
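For instance, assuming the C conventions named above, each of the
following statements describes the same character code, uppercase A (65
in ASCII), in one of the four forms. The lines are alternatives shown for
comparison, not a script to be used as is.

```text
isupper 'A'      # character constant
isupper 65       # decimal constant
isupper 0101     # octal constant
isupper 0x41     # hexadecimal constant
```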
Strings
Strings are used in info expressions. A string is a sequence of zero or
more characters surrounded by double quotes. Within a string, the
double-quote character must be preceded by a backslash (\). The \ can
also be used to include special characters in the string. These special
characters are defined as follows:
\n Inserts a newline character.
\t Inserts a horizontal tab character.
\b Inserts a backspace character.
\r Inserts a carriage-return character.
\f Inserts a formfeed character.
\\ Inserts a \ (backslash) character.
\ddd Inserts the single-byte character associated with the octal value
represented by the valid octal digits ddd. One, two, or three octal
digits can be specified; however, you must include leading zeros if
the characters following the octal digits are also valid octal
digits. For example, the octal value for $ is 44. To display $, use
\044.
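A few strings written with these escapes are sketched below. The values
are illustrative only and are not meant as meaningful locale data.

```text
yesstr          "\"yes\""     # embedded double quotes, each preceded by a backslash
currency_symbol "\044"        # octal escape; equivalent to "$"
crncystr        "\0445"       # $ followed by the digit 5; the leading zero
                              # keeps the 5 from being read as part of the escape
```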
Metacharacters
Metacharacters are characters having a special meaning to buildlang in
ctype, shift, and collate expressions. To escape the metacharacters,
surround them with single quotes. Included are:
- Represents a range of consecutive character codes.
< When used with the ul, toupper, and tolower keywords, indicates
the beginning of an uppercase-lowercase character code
relationship. When used with the sequence keyword, indicates
the beginning of a two-to-one character pair.
> When used with the ul, toupper, and tolower keywords, indicates
the end of an uppercase-lowercase character code relationship.
When used with the sequence keyword, indicates the end of a
two-to-one character pair.
[ Indicates the beginning of a one-to-two character pair.
] Indicates the end of a one-to-two character pair.
( Indicates the beginning of a group of character code constants
or expressions having the same collation sequence number, but
different priorities.
) Indicates the end of a group of character code constants or
expressions having the same collation sequence number, but
different priorities.
{ Indicates the beginning of a group of character code constants
or expressions belonging to the same set of collation
don't-care characters.
} Indicates the end of a group of character code constants or
expressions belonging to the same set of collation don't-care
characters.
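The quoting rule can be seen in the EXAMPLE section, where punctuation
that doubles as a metacharacter is written as a character constant. A
short sketch along the same lines:

```text
ispunct '-' '<' '>' '[' ']' '(' ')' '{' '}'   # quoted: ordinary character codes
isdigit '0' - '9'                             # unquoted dash: a range
```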
Comments
Comments are all characters between a pound sign (#) and a carriage
return, except when used in the character code constants and strings.
Comments and blank lines are ignored.
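A brief sketch of how comments interact with statements and constants:

```text
# A comment line; ignored by buildlang.
isdigit '0' - '9'    # trailing comment after a statement
ispunct '#'          # the quoted '#' is a character constant, not a comment
```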
Separators
Separator characters include blanks and tabs. Any number of separators
can be used to delimit the keywords, metacharacters, constants, and
strings that comprise a buildlang script.
NOTES
LC_CTYPE determines the printable characters when the -d option is
specified.
If LC_CTYPE is not specified in the environment or is set to the empty
string, a default of "C" is used instead of LC_CTYPE.
Single-byte character code sets are supported.
EXAMPLE
## Domain_OS defined @(#) Domain_OS $revision: 66.1 $ SR10.4 $
# Language: american
# Codeset: ISO88591
langname "en_US.iso88591"
langid 101
##################################################
# LC_ALL category
LC_ALL
yesstr "yes" # yes string
nostr "no" # no string
direction "" # left-to-right orientation
context ""
END_LC
##################################################
# LC_COLLATE category
LC_COLLATE
modifier "fold" # @modifier: indicates upper and lower case characters
# are collated together
sequence ' ' 0xa0 '0' - '9'
( 'A' [ 0xc6 'E' ] 'a' [ 0xe6 'E' ] 0xc1 0xe1 0xc0 0xe0 0xc2 0xe2 0xc4 0xe4
0xc5 0xe5 0xc3 0xe3 )
( 'B' 'b' )
( 'C' 'c' 0xc7 0xe7 )
( 'D' 'd' 0xd0 0xf0 )
( 'E' 'e' 0xc9 0xe9 0xc8 0xe8 0xca 0xea 0xcb 0xeb )
( 'F' 'f' )
( 'G' 'g' )
( 'H' 'h' )
( 'I' 'i' 0xcd 0xed 0xcc 0xec 0xce 0xee 0xcf 0xef )
( 'J' 'j' )
( 'K' 'k' )
( 'L' 'l' )
( 'M' 'm' )
( 'N' 'n' 0xd1 0xf1 )
( 'O' 'o' 0xd3 0xf3 0xd2 0xf2 0xd4 0xf4 0xd6 0xf6 0xd5 0xf5
0xd8 0xf8 )
( 'P' 'p' )
( 'Q' 'q' )
( 'R' 'r' )
( 'S' [ 0xdf 'S' ] 's' )
( 'T' 't' )
( 'U' 'u' 0xda 0xfa 0xd9 0xf9 0xdb 0xfb 0xdc 0xfc )
( 'V' 'v' )
( 'W' 'w' )
( 'X' 'x' )
( 'Y' 'y' 0xdd 0xfd 0xff )
( 'Z' 'z' )
( 0xde 0xfe ) '(' ')' '[' ']' '{' '}' 0xab 0xbb '<' '>' '`' '\''
'=' '+' '-' 0xd7 0xf7 0xb1 0xac 0xbc 0xbd 0xbe '*' '.'
',' ';' ':' '"' 0xbf '?' 0xa1 '!' '/' '\\' '|' 0xa6
0xb6 0xa7 '@' '&' 0xb0 '%' '#' '$' 0xa2 0xa3 0xa5 0xa4
0xb5 '^' '~' 0xb4 0xa8 0xb8 0xb7 0xaf '_' 0xad 0xaa 0xba
0xb9 0xb2 0xb3 0xa9 0xae 0x0 - 0x1f 0x80 - 0x9f 0x7f
modifier "nofold" # @modifier: indicates upper and lower case characters
# are collated separately
sequence ' ' 0xa0 '0' - '9'
( 'A' [ 0xc6 'E' ] 'a' [ 0xe6 'E' ] 0xc1 0xe1 0xc0 0xe0 0xc2 0xe2 0xc4 0xe4
0xc5 0xe5 0xc3 0xe3 )
( 'B' 'b' )
( 'C' 'c' 0xc7 0xe7 )
( 'D' 'd' 0xd0 0xf0 )
( 'E' 'e' 0xc9 0xe9 0xc8 0xe8 0xca 0xea 0xcb 0xeb )
( 'F' 'f' )
( 'G' 'g' )
( 'H' 'h' )
( 'I' 'i' 0xcd 0xed 0xcc 0xec 0xce 0xee 0xcf 0xef )
( 'J' 'j' )
( 'K' 'k' )
( 'L' 'l' )
( 'M' 'm' )
( 'N' 'n' 0xd1 0xf1 )
( 'O' 'o' 0xd3 0xf3 0xd2 0xf2 0xd4 0xf4 0xd6 0xf6 0xd5 0xf5
0xd8 0xf8 )
( 'P' 'p' )
( 'Q' 'q' )
( 'R' 'r' )
( 'S' [ 0xdf 'S' ] 's' )
( 'T' 't' )
( 'U' 'u' 0xda 0xfa 0xd9 0xf9 0xdb 0xfb 0xdc 0xfc )
( 'V' 'v' )
( 'W' 'w' )
( 'X' 'x' )
( 'Y' 'y' 0xdd 0xfd 0xff )
( 'Z' 'z' )
( 0xde 0xfe ) '(' ')' '[' ']' '{' '}' 0xab 0xbb '<' '>' '`' '\''
'=' '+' '-' 0xd7 0xf7 0xb1 0xac 0xbc 0xbd 0xbe '*' '.'
',' ';' ':' '"' 0xbf '?' 0xa1 '!' '/' '\\' '|' 0xa6
0xb6 0xa7 '@' '&' 0xb0 '%' '#' '$' 0xa2 0xa3 0xa5 0xa4
0xb5 '^' '~' 0xb4 0xa8 0xb8 0xb7 0xaf '_' 0xad 0xaa 0xba
0xb9 0xb2 0xb3 0xa9 0xae 0x0 - 0x1f 0x80 - 0x9f 0x7f
END_LC
##################################################
# LC_CTYPE category
LC_CTYPE
isupper 'A' - 'Z' # true if an uppercase character
0xc0 - 0xd6
0xd8 - 0xde
islower 'a' - 'z' # true if a lowercase character
0xdf - 0xf6
0xf8 - 0xff
isdigit '0' - '9' # true if a digit
isspace 0x9 - 0xd ' ' 0xa0 # true if a space
ispunct '!' '"' '#' '$' '%' # true if a punctuation character
'&' '\'' '(' ')' '*' '+' ',' '-' '.' '/' ':' ';' '<' '='
'>' '?' '@' '[' '\\' ']' '^' '_' '`' '{' '|' '}' '~'
0xa1 - 0xbf 0xd7 0xf7
iscntrl 0x0 - 0x1f # true if a control character
0x7f 0x80 - 0x9f
isblank ' ' 0xa0
isxdigit '0' - '9' # true if a hex digit
'A' - 'F' 'a' - 'f'
ul # <upper lower >
< 'A' 'a' > < 'B' 'b' > < 'C' 'c' > < 'D' 'd' >
< 'E' 'e' > < 'F' 'f' > < 'G' 'g' > < 'H' 'h' >
< 'I' 'i' > < 'J' 'j' > < 'K' 'k' > < 'L' 'l' >
< 'M' 'm' > < 'N' 'n' > < 'O' 'o' > < 'P' 'p' >
< 'Q' 'q' > < 'R' 'r' > < 'S' 's' > < 'T' 't' >
< 'U' 'u' > < 'V' 'v' > < 'W' 'w' > < 'X' 'x' >
< 'Y' 'y' > < 'Z' 'z' > < 0xc0 0xe0 > < 0xc1 0xe1 >
< 0xc2 0xe2 > < 0xc3 0xe3 > < 0xc4 0xe4 > < 0xc5 0xe5 >
< 0xc6 0xe6 > < 0xc7 0xe7 > < 0xc8 0xe8 > < 0xc9 0xe9 >
< 0xca 0xea > < 0xcb 0xeb > < 0xcc 0xec > < 0xcd 0xed >
< 0xce 0xee > < 0xcf 0xef > < 0xd0 0xf0 > < 0xd1 0xf1 >
< 0xd2 0xf2 > < 0xd3 0xf3 > < 0xd4 0xf4 > < 0xd5 0xf5 >
< 0xd6 0xf6 > < 0xd8 0xf8 > < 0xd9 0xf9 > < 0xda 0xfa >
< 0xdb 0xfb > < 0xdc 0xfc > < 0xdd 0xfd > < 0xde 0xfe >
toupper < 0xff 'Y' > # special toupper relationship
bytes_char "1" # maximum number of bytes per char
alt_punct "" # no alternate punctuation
END_LC
##################################################
# LC_MONETARY category
LC_MONETARY
int_curr_symbol "USD "
currency_symbol "$"
mon_decimal_point "."
mon_thousands_sep ","
mon_grouping "\003"
positive_sign ""
negative_sign "-"
int_frac_digits "2"
frac_digits "2"
p_cs_precedes "1"
p_sep_by_space "0"
n_cs_precedes "1"
n_sep_by_space "0"
p_sign_posn "1"
n_sign_posn "1"
crncystr "-US$"
END_LC
##################################################
# LC_NUMERIC category
LC_NUMERIC
decimal_point "."
thousands_sep ","
grouping "\003"
alt_digit ""
END_LC
##################################################
# LC_TIME category
LC_TIME
d_t_fmt "%a, %b %d, %Y %I:%M:%S %p"
d_fmt "%a, %b %d, %Y"
t_fmt "%I:%M:%S %p"
day_1 "Sunday"
day_2 "Monday"
day_3 "Tuesday"
day_4 "Wednesday"
day_5 "Thursday"
day_6 "Friday"
day_7 "Saturday"
abday_1 "Sun"
abday_2 "Mon"
abday_3 "Tue"
abday_4 "Wed"
abday_5 "Thu"
abday_6 "Fri"
abday_7 "Sat"
mon_1 "January"
mon_2 "February"
mon_3 "March"
mon_4 "April"
mon_5 "May"
mon_6 "June"
mon_7 "July"
mon_8 "August"
mon_9 "September"
mon_10 "October"
mon_11 "November"
mon_12 "December"
abmon_1 "Jan"
abmon_2 "Feb"
abmon_3 "Mar"
abmon_4 "Apr"
abmon_5 "May"
abmon_6 "Jun"
abmon_7 "Jul"
abmon_8 "Aug"
abmon_9 "Sep"
abmon_10 "Oct"
abmon_11 "Nov"
abmon_12 "Dec"
am_str "AM"
pm_str "PM"
year_unit ""
mon_unit ""
day_unit ""
hour_unit ""
min_unit ""
sec_unit ""
era_fmt ""
END_LC
ERRORS
If buildlang detects any error, it terminates with an error message and
does not generate a locale.def file.
SEE ALSO
setlocale(3C)