⇒ buildlang(1M) — sys5 — Apollo Domain/OS SR10.4.1


BUILDLANG(1M)                   Domain/OS SysV                   BUILDLANG(1M)



NAME
     buildlang - generate or display a locale.def file

SYNOPSIS
     buildlang [-n] input_file
     buildlang -d [-fc|-fd|-fo|-fx] locale_name

DESCRIPTION
     The buildlang utility takes source files containing collation and
     character classification information, and compiles them into binary
     objects, called locale.def files.

     Without the -d option, buildlang automatically sets up the language
     environment as specified by input_file.  The buildlang utility reads a
     buildlang script specified in the input_file, creates a file called
     locale.def, and installs the file in the appropriate directory.

     The -d option causes buildlang to display the contents of the locale.def
     file associated with the locale_name in the format of a buildlang-script,
     so that the output can be modified and used as an input file to buildlang
     to generate a modified locale.def file. If a character code is printable,
     as defined by the current setting of the LC_CTYPE environment variable in
     the user's environment, buildlang always outputs the character code in
     the character constant form. If a character code is not printable, use
     the -f argument to specify the form in which the character code will be
     displayed.

     The -d option used with -fc, -fd, -fo, or -fx causes buildlang to output
     each non-printable code in the character-constant, decimal-constant,
     octal-constant, or hexadecimal-constant form, respectively (see the
     Constants section below).

     If the -d option is specified without -fc, -fd, -fo, or -fx, buildlang
     displays each printable character code in the character-constant form and
     non-printable character code in the hexadecimal-constant form.

     There are six categories of data in the locale.def file, recognized by
     setlocale(3C), which make up a language definition. They are:

     LC_COLLATE
               Affects the behavior of regular expressions and the I18N string
               collation functions (strcoll and strxfrm; see string(3C)).

     LC_CTYPE  Affects the behavior of the character conversion and character
               handling functions (except for the isdigit and isxdigit
               functions; see ctype(3C)).

     LC_MONETARY
               Affects the monetary formatting information returned by the
               localeconv(3C) function.

     LC_NUMERIC
               Affects the decimal-point character for the formatted
               input/output functions and the string conversion functions, as
               well as the nonmonetary formatting information returned by the
               localeconv(3C) and nl_langinfo(3C) functions.

     LC_TIME   Affects the behavior of the time conversion functions; see
               ctime(3C) and nl_langinfo(3C) functions.

               NOTE: This affects only the strftime function (not other time
               functions).

     LC_ALL    Contains language-specific information that does not belong to
               any of the other categories (for example, yesstr/nostr).

     A buildlang script also consists of the same six locale categories.  The
     beginning of each category is identified by a category tag, which has the
     form of LC_category, where category is one of the following: COLLATE,
     CTYPE, MONETARY, NUMERIC, TIME, and ALL. The end of each category is
     identified by an END_LC tag. The order of the categories in the
     buildlang script is irrelevant and all category specifications are
     optional. If a category is not specified, setlocale sets up the default
     "C" locale for that category.
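
     As an illustration of this layout, the following is a minimal sketch of
     a buildlang script that defines a single category. The language name
     and language id are hypothetical values chosen for the example, not an
     existing locale; the categories that are left out would fall back to the
     default "C" locale.

     ```
     langname  "xx_XX.iso88591"    # hypothetical language name
     langid    901                 # hypothetical user-defined language id

     LC_ALL                        # category tag opens the category
     yesstr    "yes"
     nostr     "no"
     END_LC                        # END_LC closes the category
     ```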

     Each category is composed of one or more statements. Each statement
     begins with a keyword, followed by one or more expressions. An expression
     is a set of well-formed metacharacters, strings, and constants. The
     buildlang utility also recognizes comments and separators.

     More than one definition can be specified for each category. If a
     category contains more than one definition, each additional definition
     must be named via the modifier keyword described below. The first set of
     specifications is the default definition which may or may not have a
     modifier name.
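
     A sketch of a category holding two definitions, modeled on the
     LC_COLLATE layout in the EXAMPLE section, might look as follows. The
     shortened sequences are illustrative only and do not describe a real
     locale.

     ```
     LC_COLLATE
     modifier  "fold"              # names the first (default) definition
     sequence  ( 'A' 'a' ) ( 'B' 'b' ) ( 'C' 'c' )

     modifier  "nofold"            # each additional definition must be named
     sequence  'A' 'B' 'C' 'a' 'b' 'c'
     END_LC
     ```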

     The following is a list of category tags, keywords, and subsequent
     expressions that are recognized by buildlang. The order of keywords
     within a category is irrelevant with the exception of the modifier
     keyword. All keyword specifications are optional with the exception of
     the langname and langid keywords.  (Note the convention: tags use
     uppercase characters; keywords use lowercase.)

     Category Tags and Keywords
     The following keywords do not belong to any category.

     langname  String identifying the name of the language; follows the naming
               conventions of LANG environment variable:
               language[_territory][.codeset]. This keyword is required.

     langid    Decimal number identifying the language id. This keyword is
               required. The language id specified should be in the range of 1
               - 999, and any user-defined language should assign its language
               id to a value in the range of 901 - 999.
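
     For example, the header of a script for a user-defined language might
     be sketched as follows. The names "xx" and "YY" are placeholders
     standing for a language and a territory, and the id 950 is an arbitrary
     value picked from the user-defined range.

     ```
     # Forms allowed by language[_territory][.codeset]:
     #   "xx"                  language only
     #   "xx_YY"               language and territory
     #   "xx_YY.iso88591"      language, territory, and codeset
     langname  "xx_YY.iso88591"
     langid    950             # user-defined ids lie in the range 901 - 999
     ```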

     The following keyword can be used in any category; it must be used to
     name a definition when a category contains more than one definition.

     modifier  String identifying the name of the modifier; must come before
               any keyword in a set of specifications because it associates a
               modifier with a set of specifications.

     LC_ALL
     The following keywords belong to the LC_ALL category and should come
     between the category tag LC_ALL and END_LC.

     yesstr    String identifying the affirmative response for yes/no
               questions.

     nostr     String identifying the negative response for yes/no questions.

     direction String indicating text direction.

     context   String indicating character context analysis. (Should be set to
               "null" or "0" to indicate no context analysis required.)

     LC_CTYPE
     The following keywords belong to the LC_CTYPE category and should come
     between the category tag LC_CTYPE and END_LC.

     isupper   Uppercase letters.

     iscntrl   Control characters.

     isdigit   Numeric characters.

     islower   Lowercase letters.

     ispunct   Punctuation characters.

     isxdigit  Hexadecimal digits.

     ul        Relationships between uppercase and lowercase characters. Used
               for languages that have a one-to-one relationship between
               lowercase and uppercase characters.

     tolower   Uppercase to lowercase relationships.

     toupper   Lowercase to uppercase relationships.

     isspace   Space characters.

     isblank   Blank characters.

     bytes_char
               String containing the maximum number of bytes per character for
               the character set used for a specified language.

     alt_punct String mapped into the ASCII equivalent string.

     LC_COLLATE
     The following keywords belong to the LC_COLLATE category and should come
     between the category tag LC_COLLATE and END_LC.

     sequence  Sequence of character codes for collation.

     modifier  Allows you to select locale-specific collation.

     LC_MONETARY
     The following keywords belong to the LC_MONETARY category and should come
     between the category tag LC_MONETARY and END_LC. These keywords, except
     crncystr, are identical to the members in struct lconv defined in
     <locale.h>.

     int_curr_symbol     international currency symbol

     currency_symbol     (local) currency symbol

     mon_decimal_point   monetary decimal point

     mon_thousands_sep   monetary thousands separator

     mon_grouping        monetary grouping

     positive_sign       positive sign

     negative_sign       negative sign

     int_frac_digits     international fractional digits

     frac_digits         fractional digits

     p_cs_precedes       positive currency symbol precedes

     p_sep_by_space      positive currency symbol separated by space

     p_sign_posn         positive sign position

     n_cs_precedes       negative currency symbol precedes

     n_sep_by_space      negative currency symbol separated by space

     n_sign_posn         negative sign position

     crncystr            String for specifying the currency

     LC_NUMERIC
     The following keywords belong to the LC_NUMERIC category and should come
     between the category tag LC_NUMERIC and END_LC. These keywords, except
     alt_digit, are identical to the members in struct lconv defined in
     <locale.h>.

     decimal_point
               decimal point (or, RADIXCHAR)

     thousands_sep
               thousands separator

     grouping  grouping

     alt_digit String mapped into the ASCII equivalent string "0123456789b+-
               ,eE", where b is a blank.

     LC_TIME
     The following keywords belong to the LC_TIME category and should come
     between the category tag LC_TIME and END_LC. These keywords, except era,
     are identical to the members defined in <langinfo.h>.

     d_t_fmt

     d_fmt

     t_fmt

     am_str

     pm_str

     day_1  to day_7

     abday_1 to abday_7

     mon_1 to mon_12

     abmon_1 to abmon_12

     year_unit

     mon_unit

     day_unit

     hour_unit

     min_unit

     sec_unit

     era_fmt

     Expressions

     Expressions consist of character-code constants, strings, and
     metacharacters. There are four types of legal expressions: ctype, shift,
     collate, and info.

     ctype     Ctype expressions follow the keywords isupper, islower,
               iscntrl, isdigit, isspace, ispunct, isxdigit, isblank, isfirst,
               and issecond.  They can include either a single character-code
               constant or a character-code range consisting of a constant
               followed by a dash followed by another constant. At least one
               separator must appear between the constants and dash. The
               constant preceding the dash must have a smaller code value than
               the constant following the dash. A range represents a set of
               consecutive character codes.

     shift     Shift expressions follow the keywords ul, toupper, and tolower,
               and must consist of two character-code constants enclosed by
               left and right angle brackets. For ul and tolower, the first
               constant represents an uppercase character and the second the
               corresponding lowercase character. For toupper, the first
               constant represents a lowercase character and the second the
               corresponding uppercase character.

     collate   Collate expressions that follow the keyword sequence represent
               a sequence of character codes that define a collation order.
               Each character code in the series is assigned an ascending
               sequence number. Collate expressions include single character-
               code constants, character-code ranges, character-code priority
               sets, two-to-one character-code pairs, one-to-two character-
               code pairs, and character-code don't care sets.

               A character-code priority set is a collection of one or more
               constants or other collate expressions enclosed by left and
               right parentheses. Constants or expressions within a priority
               set have the same collation sequence number but different
               priorities, which account for case and accent differences.

               A two-to-one character-code pair is represented by two
               character-code constants enclosed by left and right angle
               brackets. Two-to-one characters are two adjacent characters
               that occupy one position in the collating sequence. For
               example, the expression

                    sequence ('C' 'c') (<'C' 'h'> <'c' 'h'>) ('D' 'd')

               instructs buildlang to treat the character combinations Ch and
               ch as single characters that collate between lowercase c and
               uppercase D.

               A one-to-two character-code pair is represented by two
               character-code constants enclosed by left and right square
               brackets. A one-to-two character is a single character that
               occupies two adjacent positions in the collating sequence. For
               example, suppose the character 'X' represents a one-to-two
               character that collates as 'AE'. This information can be
               expressed as ('A' ['X' 'E'] 'a'). The character 'X' has the same
               primary sequence number as 'A' and 'a', a priority that lies
               between 'A' and 'a', and a secondary sequence number that is the
               same as 'E'.

               A character-code don't care set is a collection of one or more
               constants or other collate expressions enclosed by left and
               right curly brackets. Constants or expressions within a don't
               care set are ignored in character comparisons.

     info      Info expressions follow all lconv-type keywords and era
               keywords.  Each expression is a string. (See Strings section.)
               The expressions following the langinfo-type keywords define the
               strings associated with the items in langinfo. Each expression
               consists of a string to be associated with the item identified
               by the keyword.
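
     The four expression types can be seen side by side in the following
     sketch. It is an illustrative fragment assembled from the rules above
     and from the EXAMPLE section, not a complete or tested script; the use
     of 0xc6 as a one-to-two character follows the EXAMPLE section, and the
     placement of the don't-care set is an assumption.

     ```
     LC_CTYPE
     isdigit   '0' - '9'                      # ctype: a character-code range
     isblank   ' ' 0xa0                       # ctype: single constants
     ul        < 'A' 'a' > < 'B' 'b' >        # shift: < upper lower >
     toupper   < 0xff 'Y' >                   # shift: < lower upper >
     END_LC

     LC_COLLATE
     sequence  '0' - '9'                      # collate: a range
               ( 'A' [ 0xc6 'E' ] 'a' )       # priority set with a one-to-two pair
               ( 'C' 'c' )
               ( < 'C' 'h' > < 'c' 'h' > )    # two-to-one pairs
               { '-' }                        # don't-care set
     END_LC

     LC_TIME
     am_str    "AM"                           # info: a string
     END_LC
     ```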

     Constants

     Constants represent character codes in the ctype, shift, and collate
     expressions. C programming language integer and character constants can
     be used as character codes, including decimal constants, octal
     constants, hexadecimal constants, and character constants.
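
     For instance, each of the following statements names the same range of
     character codes, once in each form. This is a sketch for comparison
     only and assumes an ASCII-compatible code set; a real script would
     carry just one of the four lines.

     ```
     isxdigit  'A' - 'F'           # character constants
     isxdigit  65 - 70             # decimal constants
     isxdigit  0101 - 0106         # octal constants
     isxdigit  0x41 - 0x46         # hexadecimal constants
     ```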

     Strings

     Strings are used in info expressions. A string is a sequence of zero or
     more characters surrounded by double quotes. Within a string, the
     double-quote character must be preceded by a backslash (\).  The \ can be
     used to include special characters in the string. These special
     characters are defined as follows:

     \n   Inserts a newline character.

     \t   Inserts a horizontal tab character.

     \b   Inserts a backspace character.

     \r   Inserts a carriage-return character.

     \f   Inserts a formfeed character.

     \\   Inserts a \ (backslash) character.

     \ddd Inserts the single-byte character associated with the octal value
          represented by the valid octal digits ddd. One, two, or three octal
          digits can be specified; however, you must include leading zeros if
          the characters following the octal digits are also valid octal
          digits. For example, the octal value for $ is 44. To display $, use
          \044.
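
     A few info statements written with these escapes might look like the
     sketch below. The category tags are omitted for brevity, and the
     quoted yesstr value is a made-up illustration rather than a
     recommended setting.

     ```
     currency_symbol  "\044"       # octal escape, same as "$"
     crncystr         "-US\044"    # same string as "-US$"
     yesstr           "\"yes\""    # embedded double quotes need a backslash
     d_fmt            "\0441"      # leading zero kept: $ followed by 1
     ```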

     Metacharacters

     Metacharacters are characters having a special meaning to buildlang in
     ctype, shift, and collate expressions. To escape the metacharacters,
     surround them with single quotes. Included are:


     -         Represents a range of consecutive character codes.

     <         When used with the ul, toupper, and tolower keywords, indicates
               the beginning of an uppercase-lowercase character code
               relationship. When used with the sequence keyword, indicates
               the beginning of a two-to-one character pair.

     >         When used with the ul, toupper, and tolower keywords, indicates
               the end of an uppercase-lowercase character code relationship.
               When used with the sequence keyword, indicates the end of a
               two-to-one character pair.

     [         Indicates the beginning of a one-to-two character pair.

     ]         Indicates the end of a one-to-two character pair.

     (         Indicates the beginning of a group of character code constants
               or expressions having the same collation sequence number, but
               different priorities.

     )         Indicates the end of a group of character code constants or
               expressions having the same collation sequence number, but
               different priorities.

     {         Indicates the beginning of a group of character code constants
               or expressions belonging to the same set of collation don't-
               care characters.

     }         Indicates the end of a group of character code constants or
               expressions belonging to the same set of collation don't-care
               characters.
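
     The difference between a metacharacter and its quoted form can be
     sketched as follows, in the manner of the ispunct statement in the
     EXAMPLE section. The particular selection of characters is only an
     illustration.

     ```
     isdigit   '0' - '9'                   # unquoted - marks a range
     ispunct   '-' '<' '>' '[' ']'         # quoted: ordinary character codes
               '(' ')' '{' '}'
     ```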

     Comments

     Comments are all characters between a pound sign (#) and a carriage
     return, except when used in the character code constants and strings.
     Comments and blank lines are ignored.

     Separators

     Separator characters include blanks and tabs. Any number of separators
     can be used to delimit the keywords, metacharacters, constants, and
     strings that comprise a buildlang script.
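
     The comment and separator rules might be exercised as in this sketch.
     The statements are borrowed from different categories purely for
     illustration, and the string value is hypothetical.

     ```
     # A whole-line comment is ignored.

     isdigit   '0' - '9'           # a trailing comment is ignored as well
     ispunct   '#' '$'             # the quoted # is a character constant
     yesstr    "yes #1"            # the # inside a string is part of the string
     isspace   0x9   -   0xd       # extra blanks or tabs between items are allowed
     ```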

NOTES
     LC_CTYPE determines the printable characters when the -d option is
     specified.

     If LC_CTYPE is not specified in the environment or is set to the empty
     string, a default of "C" is used instead of LC_CTYPE.

     Single-byte character code sets are supported.

EXAMPLE
     ## Domain_OS defined  @(#) Domain_OS $revision: 66.1 $ SR10.4 $

     # Language:    american
     # Codeset:     ISO88591

     langname  "en_US.iso88591"
     langid         101

     ##################################################
     # LC_ALL category

     LC_ALL
     yesstr         "yes"          # yes string
     nostr          "no"      # no string
     direction ""        # left-to-right orientation
     context        ""
     END_LC

     ##################################################
     # LC_COLLATE category

     LC_COLLATE
     modifier  "fold"         # @modifier: indicates upper and lower case characters
                         # are collated together
     sequence  ' ' 0xa0 '0' - '9'

               ( 'A'  [ 0xc6 'E'  ] 'a'  [ 0xe6 'E'  ] 0xc1 0xe1 0xc0 0xe0 0xc2 0xe2 0xc4 0xe4
               0xc5 0xe5 0xc3 0xe3 )
               ( 'B' 'b' )
               ( 'C' 'c' 0xc7 0xe7 )
               ( 'D' 'd' 0xd0 0xf0 )
               ( 'E' 'e' 0xc9 0xe9 0xc8 0xe8 0xca 0xea 0xcb 0xeb )
               ( 'F' 'f' )
               ( 'G' 'g' )
               ( 'H' 'h' )
               ( 'I' 'i' 0xcd 0xed 0xcc 0xec 0xce 0xee 0xcf 0xef )
               ( 'J' 'j' )
               ( 'K' 'k' )
               ( 'L' 'l' )
               ( 'M' 'm' )
               ( 'N' 'n' 0xd1 0xf1 )
               ( 'O' 'o' 0xd3 0xf3 0xd2 0xf2 0xd4 0xf4 0xd6 0xf6 0xd5 0xf5
               0xd8 0xf8 )
               ( 'P' 'p' )
               ( 'Q' 'q' )
               ( 'R' 'r' )
               ( 'S'  [ 0xdf 'S'  ] 's' )
               ( 'T' 't' )
               ( 'U' 'u' 0xda 0xfa 0xd9 0xf9 0xdb 0xfb 0xdc 0xfc )
               ( 'V' 'v' )
               ( 'W' 'w' )
               ( 'X' 'x' )
               ( 'Y' 'y' 0xdd 0xfd 0xff )
               ( 'Z' 'z' )
               ( 0xde 0xfe ) '(' ')' '[' ']' '{' '}' 0xab 0xbb '<' '>' '`' '\''
               '=' '+' '-' 0xd7 0xf7 0xb1 0xac 0xbc 0xbd 0xbe '*' '.'
               ',' ';' ':' '"' 0xbf '?' 0xa1 '!' '/' '\\' '|' 0xa6
               0xb6 0xa7 '@' '&' 0xb0 '%' '#' '$' 0xa2 0xa3 0xa5 0xa4
               0xb5 '^' '~' 0xb4 0xa8 0xb8 0xb7 0xaf '_' 0xad 0xaa 0xba
               0xb9 0xb2 0xb3 0xa9 0xae 0x0 - 0x1f 0x80 - 0x9f 0x7f

     modifier  "nofold"  # @modifier: indicates upper and lower case characters
                         # are collated separately
     sequence  ' ' 0xa0 '0' - '9'

               ( 'A'  [ 0xc6 'E'  ] 'a'  [ 0xe6 'E'  ] 0xc1 0xe1 0xc0 0xe0 0xc2 0xe2 0xc4 0xe4
               0xc5 0xe5 0xc3 0xe3 )
               ( 'B' 'b' )
               ( 'C' 'c' 0xc7 0xe7 )
               ( 'D' 'd' 0xd0 0xf0 )
               ( 'E' 'e' 0xc9 0xe9 0xc8 0xe8 0xca 0xea 0xcb 0xeb )
               ( 'F' 'f' )
               ( 'G' 'g' )
               ( 'H' 'h' )
               ( 'I' 'i' 0xcd 0xed 0xcc 0xec 0xce 0xee 0xcf 0xef )
               ( 'J' 'j' )
               ( 'K' 'k' )
               ( 'L' 'l' )
               ( 'M' 'm' )
               ( 'N' 'n' 0xd1 0xf1 )
               ( 'O' 'o' 0xd3 0xf3 0xd2 0xf2 0xd4 0xf4 0xd6 0xf6 0xd5 0xf5
               0xd8 0xf8 )
               ( 'P' 'p' )
               ( 'Q' 'q' )
               ( 'R' 'r' )
               ( 'S'  [ 0xdf 'S'  ] 's' )
               ( 'T' 't' )
               ( 'U' 'u' 0xda 0xfa 0xd9 0xf9 0xdb 0xfb 0xdc 0xfc )
               ( 'V' 'v' )
               ( 'W' 'w' )
               ( 'X' 'x' )
               ( 'Y' 'y' 0xdd 0xfd 0xff )
               ( 'Z' 'z' )
               ( 0xde 0xfe ) '(' ')' '[' ']' '{' '}' 0xab 0xbb '<' '>' '`' '\''
               '=' '+' '-' 0xd7 0xf7 0xb1 0xac 0xbc 0xbd 0xbe '*' '.'
               ',' ';' ':' '"' 0xbf '?' 0xa1 '!' '/' '\\' '|' 0xa6
               0xb6 0xa7 '@' '&' 0xb0 '%' '#' '$' 0xa2 0xa3 0xa5 0xa4
               0xb5 '^' '~' 0xb4 0xa8 0xb8 0xb7 0xaf '_' 0xad 0xaa 0xba
               0xb9 0xb2 0xb3 0xa9 0xae 0x0 - 0x1f 0x80 - 0x9f 0x7f
     END_LC


     ##################################################
     # LC_CTYPE category


     LC_CTYPE

     isupper        'A' - 'Z'           # true if an uppercase character
               0xc0 - 0xd6
               0xd8 - 0xdf

     islower        'a' - 'z'           # true if a lowercase character
               0xdf - 0xf6
               0xf8 - 0xff

     isdigit        '0' - '9'           # true if a digit

     isspace        0x9 - 0xd ' ' 0xa0  # true if a space

     ispunct        '!' '"' '#' '$' '%'      # true if a punctuation character
               '&' '\'' '(' ')' '*' '+' ',' '-' '.' '/' ':' ';' '<' '='
               '>' '?' '@' '[' '\\' ']' '^' '_' '`' '{' '|' '}' '~'
               0xa1 - 0xbf 0xd7 0xf7

     iscntrl        0x0 - 0x1f          # true if a control character
               0x7f 0x80 - 0x9f

     isblank        ' ' 0xa0

     isxdigit  '0' - '9'      # true if a hex digit
               'A' - 'F' 'a' - 'f'

     ul                       # <upper lower >
               < 'A' 'a' > < 'B' 'b' > < 'C' 'c' > < 'D' 'd' >
               < 'E' 'e' > < 'F' 'f' > < 'G' 'g' > < 'H' 'h' >
               < 'I' 'i' > < 'J' 'j' > < 'K' 'k' > < 'L' 'l' >
               < 'M' 'm' > < 'N' 'n' > < 'O' 'o' > < 'P' 'p' >
               < 'Q' 'q' > < 'R' 'r' > < 'S' 's' > < 'T' 't' >
               < 'U' 'u' > < 'V' 'v' > < 'W' 'w' > < 'X' 'x' >
               < 'Y' 'y' > < 'Z' 'z' > < 0xc0 0xe0 > < 0xc1 0xe1 >
               < 0xc2 0xe2 > < 0xc3 0xe3 > < 0xc4 0xe4 > < 0xc5 0xe5 >
               < 0xc6 0xe6 > < 0xc7 0xe7 > < 0xc8 0xe8 > < 0xc9 0xe9 >
               < 0xca 0xea > < 0xcb 0xeb > < 0xcc 0xec > < 0xcd 0xed >
               < 0xce 0xee > < 0xcf 0xef > < 0xd0 0xf0 > < 0xd1 0xf1 >
               < 0xd2 0xf2 > < 0xd3 0xf3 > < 0xd4 0xf4 > < 0xd5 0xf5 >
               < 0xd6 0xf6 > < 0xd8 0xf8 > < 0xd9 0xf9 > < 0xda 0xfa >
               < 0xdb 0xfb > < 0xdc 0xfc > < 0xdd 0xfd > < 0xde 0xfe >

     toupper        < 0xff 'Y' >        # special toupper relationship

     bytes_char     "1"            # maximum number of bytes per char
     alt_punct ""             # no alternate punctuation
     END_LC


     ##################################################
     # LC_MONETARY category

     LC_MONETARY
     int_curr_symbol          "USD "
     currency_symbol          "$"
     mon_decimal_point   "."
     mon_thousands_sep   ","
     mon_grouping        "\003"
     positive_sign       ""
     negative_sign       "-"
     int_frac_digits          "2"
     frac_digits         "2"
     p_cs_precedes       "1"
     p_sep_by_space      "0"
     n_cs_precedes       "1"
     n_sep_by_space      "0"
     p_sign_posn         "1"
     n_sign_posn         "1"
     crncystr       "-US$"
     END_LC


     ##################################################
     # LC_NUMERIC category

     LC_NUMERIC
     decimal_point       "."
     thousands_sep       ","
     grouping       "\003"
     alt_digit      ""
     END_LC


     ##################################################
     # LC_TIME category

     LC_TIME
     d_t_fmt        "%a, %b %d, %Y %I:%M:%S %p"
     d_fmt          "%a, %b %d, %Y"
     t_fmt          "%I:%M:%S %p"
     day_1          "Sunday"
     day_2          "Monday"
     day_3          "Tuesday"
     day_4          "Wednesday"
     day_5          "Thursday"
     day_6          "Friday"
     day_7          "Saturday"
     abday_1        "Sun"
     abday_2        "Mon"
     abday_3        "Tue"
     abday_4        "Wed"
     abday_5        "Thu"
     abday_6        "Fri"
     abday_7        "Sat"
     mon_1          "January"
     mon_2          "February"
     mon_3          "March"
     mon_4          "April"
     mon_5          "May"
     mon_6          "June"
     mon_7          "July"
     mon_8          "August"
     mon_9          "September"
     mon_10         "October"
     mon_11         "November"
     mon_12         "December"
     abmon_1        "Jan"
     abmon_2        "Feb"
     abmon_3        "Mar"
     abmon_4        "Apr"
     abmon_5        "May"
     abmon_6        "Jun"
     abmon_7        "Jul"
     abmon_8        "Aug"
     abmon_9        "Sep"
     abmon_10  "Oct"
     abmon_11  "Nov"
     abmon_12  "Dec"
     am_str         "AM"
     pm_str         "PM"
     year_unit ""
     mon_unit  ""
     day_unit  ""
     hour_unit ""
     min_unit  ""
     sec_unit  ""
     era_fmt        ""
     END_LC

ERRORS
     If buildlang detects any error, it terminates with an error message and
     does not generate a locale.def file.

SEE ALSO
     setlocale(3)