⇒ lex(1) — Reliant UNIX 5.44c4

lex(1)                                                               lex(1)

NAME
     lex - generate scanner

SYNOPSIS
     lex [-ctV] [-n|-v] [-Q[o]] [--] [file ...]

DESCRIPTION
     lex generates a C program from a file containing the "lex source
     text", which the user writes for a particular problem. A lex source
     text consists of up to three sections: definitions, rules and user
     functions. The rules specify which patterns to search for in an
     input text and which actions to perform when a pattern is found.
     The rules section is mandatory; the definitions and user functions
     are optional. This produces the following structure for the lex
     source file:

     [Definitions]
     %%
     Rules
     [%%
     User functions]

     Multiple files are treated as a single file. If no files are specified
     or - is specified as the filename, standard input is used.

     lex generates a file named lex.yy.c. When lex.yy.c is compiled and
     linked with the lex library, the generated program copies the standard
     input to the standard output, except where the input matches a
     pattern specified in the file. In that case, the corresponding
     action is executed. The text which matched the pattern is held in
     the yytext variable. If more than one pattern matches, the longest
     match is chosen; among matches of equal length, the pattern which
     appears first in the input file takes precedence.
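
     The three sections described above can be sketched in a small lex
     source text. This is an illustration only, not taken from the
     Reliant UNIX documentation: the %{ ... %} block in the definitions
     section, the yywrap() and main() functions and the suggested file
     name upper.l are assumptions based on common lex practice.

```lex
%{
/* Definitions section: C text between %{ and %} is copied to lex.yy.c. */
#include <ctype.h>
#include <stdio.h>
%}
%%
[a-z]+    {
              /* Rule: print each run of lowercase letters in uppercase. */
              int i;
              for (i = 0; yytext[i] != '\0'; i++)
                  putchar(toupper((unsigned char)yytext[i]));
          }
%%
/* User functions section. */
int yywrap(void) { return 1; }   /* no further input files */
int main(void)   { yylex(); return 0; }
```

     Assuming the text is stored in upper.l, it would typically be built
     with "lex upper.l" followed by "cc lex.yy.c -ll". Input which does
     not match the rule, such as digits and blanks, is copied to the
     standard output unchanged.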

OPTIONS
     -c   Indicates C actions (as opposed to other programming languages
          such as Fortran) and is the default.

     -t   Causes the program to be written to the standard output, not to
          the file lex.yy.c.

     -v   Provides a two-line summary of statistics.

     -n   Suppresses the -v summary.

     -V   Prints version information on standard error.

     -Q[o]
          Specifies whether version information is written to the
          output file lex.yy.c.

          o stands for the yes/no specification of the current language
          environment. In an English-language environment, specify -Qy
          to write version information to lex.yy.c and -Qn to suppress
          it. In a German-language environment, for example, specify
          -Qj or -Qn.

          By default, no version information is output.

     --   If the first filename begins with a dash (-), the end of the
          command-line options must be marked with --.

Page 1                       Reliant UNIX 5.44                Printed 11/98

   Definitions

     Substitutions can be defined in the definitions section. Each
     definition must start at the beginning of a line and has the
     following format:

     name substitute
               The character string substitute replaces {name} wherever
               {name} appears in the rules section.
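
     A short sketch of how such substitutions might be used follows. The
     names DIGIT and LETTER and the printed messages are hypothetical
     and chosen only for illustration.

```lex
%{
#include <stdio.h>
%}
DIGIT     [0-9]
LETTER    [A-Za-z_]
%%
{LETTER}({LETTER}|{DIGIT})*    { printf("identifier: %s\n", yytext); }
{DIGIT}+                       { printf("number: %s\n", yytext); }
```

     In the rules section, {DIGIT} behaves as if [0-9] had been written
     in its place, so a frequently used pattern needs to be spelled out
     only once.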

     The type of yytext can be declared in the definitions section:

     %array    yytext is a character array

     %pointer  yytext is a character pointer

     The start states of lex are also defined here:

     %s name or %S name
               simple start state

     %x name or %X name
               exclusive start state

     If the created scanner is in an exclusive state, only patterns
     specified for this state are taken into consideration. In a simple
     state, patterns without a state specification are taken into
     consideration as well.
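
     The following sketch shows one possible use of an exclusive start
     state: discarding C-style comments. It assumes the BEGIN macro and
     the predefined INITIAL state of standard lex, neither of which is
     described on this page, and writes the declaration with a space
     between %x and the state name. The state name COMMENT is
     hypothetical.

```lex
%x COMMENT
%%
"/*"              { BEGIN COMMENT; }
<COMMENT>"*/"     { BEGIN INITIAL; }
<COMMENT>(.|\n)   { /* inside a comment: discard the character */ }
```

     Because COMMENT is exclusive, the first rule, which has no state
     specification, is not considered while the scanner is inside a
     comment.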

     The default table sizes may be too small for large lex source
     texts. The table sizes for the resulting finite state machine can be
     set in the definitions section (default values in parentheses):

     %p n      number of positions is n (2500)
     %n n      number of states is n (500)
     %e n      number of parse tree nodes is n (1000)
     %a n      number of transitions is n (2000)
     %k n      number of packed character classes is n (2500)
     %o n      size of output array is n (3000)

     The use of one or more of the above automatically implies the -v
     option, unless the -n option is used.
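
     As an illustration, a definitions section which enlarges three of
     these tables might look as follows. The values are arbitrary
     assumptions, not recommendations; suitable values depend on the
     size of the lex source text.

```lex
%p 5000
%n 1000
%a 4000
%%
```

     Since table sizes are specified here, lex prints the -v summary
     unless -n is given. With no rules after %%, the generated scanner
     simply copies its input to the output.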

   Rules

     The rules section of the file begins with the delimiting symbol %%. In
     this rules section, the user can declare local variables for yylex().
     Every line in the rules section which begins with a space or tab
     character, or which is enclosed in %{ and %}, and which is positioned
     before the first rule is copied to the start of the yylex() function,
     immediately following the opening brace.
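
     A minimal sketch of a local variable declared in this way follows.
     The variable name words is hypothetical. The example prints the
     number of words on each input line.

```lex
%{
#include <stdio.h>
%}
%%
        int words = 0;    /* indented line: becomes a local variable of yylex() */
[^ \t\n]+       { words++; }
[ \t]+          { /* skip blanks and tabs */ }
\n              { printf("%d\n", words); words = 0; }
```

     Since none of the actions returns, yylex() runs until the end of
     the input and the variable keeps its value from one rule to the
     next.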



     Each rule consists of a regular expression, which defines the
     character pattern to be found, and the actions to be executed when
     the pattern is found. Input text which matches no pattern is copied
     unchanged to the output by the generated scanner.

     lex supports extended regular expressions [see expressions(5)] with
     the following exceptions and extensions:

     "xy"      the literal string xy, even if x and/or y are lex operators
               (excluding \)

     ^x        x at the beginning of the line (only at the start of a
               pattern)

     <y>x or <y1,y2,...>x
               x if lex is in the y state or in one of the states y1, y2
               etc.

     x$        x at the end of the line (only at the end of a pattern)

     x/y       x, but only if y follows; y is not included in yytext

     {xx}      Substitution for xx from the definitions section

     \octal    Character with the octal code octal

     \xhexadecimal
               Character with the hexadecimal code hexadecimal

     \character
               character itself, unless \character is one of the following
               escape sequences: \\, \a, \b, \f, \n, \r, \t, \v
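
     Some of these extensions are illustrated by the sketch below. The
     patterns and messages are hypothetical examples and are not taken
     from the Reliant UNIX documentation.

```lex
%{
#include <stdio.h>
%}
%%
^"#".*          { /* line beginning with #: discard up to the end of the line */ }
[0-9]+/"%"      { printf("percentage: %s\n", yytext); }
[ \t]+$         { /* blanks and tabs at the end of a line: discard */ }
```

     In the second rule only the digits are placed in yytext. The %
     character remains in the input and is scanned again by the rules
     which follow.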

     The precedence of the operators in regular expressions deviates
     from the standard sequence in some respects. The following table
     lists the operators in descending order of precedence:

     classes / character sets    [==] [::] [..]
     quoted characters           \character
     bracketed expressions       [ ]
     expressions in ""           "..."
     grouping                    ( )
     definitions                 {name}
     repeat                      * + ?
     concatenation               xy
     intervals                   {m,n}
     or                          |

     Within the action for a rule, it is possible to carry out special
     tasks. lex provides the following macros for this purpose:

     input()   reads another character from the input stream
     unput()   pushes back a character, for a later read pass
     output()  writes a character to the output stream
     ECHO      writes yytext to the output stream

     The user can redefine these macros to take control of input and
     output. However, care must be taken to ensure that the definitions
     remain consistent with one another.
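
     The following sketch shows how input() and unput() might be used
     together in an action. It discards everything from // up to the end
     of the line. The sketch assumes that input() returns 0 at the end
     of the input, as in traditional lex; the check for EOF is included
     only as a precaution for other implementations.

```lex
%{
#include <stdio.h>
%}
%%
"//"    {
            int c;
            /* read and discard characters up to the end of the line */
            while ((c = input()) != '\n' && c != 0 && c != EOF)
                ;
            if (c == '\n')
                unput(c);   /* push the newline back for a later read pass */
        }
```

     The newline is pushed back so that it is scanned in the normal way
     and, since no rule matches it, copied to the output.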

     Apart from saving the matched text in yytext, lex provides further
     functions for processing the text which has been found:

     yymore()  Newly recognized characters are appended to those already
               in yytext (normally, yytext is overwritten by the next
               characters which are found).

     yyless(n) Only the first n characters in yytext are retained; the
               remaining characters are returned to the input and scanned
               again.

     REJECT    Allows character strings which overlap, or which are
               partially contained in another character string, to be
               processed. REJECT abandons the current match and jumps
               directly to the next rule which matches the input.
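
     The use of REJECT for overlapping strings can be sketched with the
     classic example of counting the occurrences of "she" and "he",
     where every "she" also contains a "he". The counter names and the
     main() and yywrap() functions are assumptions made for this
     illustration.

```lex
%{
#include <stdio.h>
int she_count = 0;
int he_count = 0;
%}
%%
she     { she_count++; REJECT; }
he      { he_count++; REJECT; }
\n      |
.       { /* discard all other input */ }
%%
int yywrap(void) { return 1; }
int main(void)
{
    yylex();
    printf("she: %d, he: %d\n", she_count, he_count);
    return 0;
}
```

     After "she" has been counted, REJECT causes the scanner to fall
     back to the last rule, which consumes only the "s". The remaining
     "he" is then found and counted by the second rule.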

LOCALE
     The language of the message texts and yes/no specifications is
     governed by the environment variable LC_ALL, LC_MESSAGES or LANG.

     When the default is set, the system behaves as if it were not
     internationalized, i.e. the message texts are in English and yes/no
     specifications must also be made in English (y or n). You must
     change one of these variables in order to change the language of
     the message texts.

     In regular expressions the environment variable LC_COLLATE
     determines the meaning of character ranges, equivalence classes and
     character units, and the environment variable LC_CTYPE determines
     the meaning of character classes. If these variables are not set,
     the value of LANG is used. If LANG is not defined or is empty, the
     system behaves as if it were not internationalized.

     Detailed information on the dependencies of the environment variables
     and on internationalization in general can be found in the manual
     "Programmer's Guide: Internationalization - Localization". Refer also
     to environ(5) for information on setting the user environment.

SEE ALSO
     yacc(1), expressions(5).

     Chapter on "lex" in the "Guide to Tools for Programming in C".