lex(1) lex(1)
NAME
lex - generate scanner
SYNOPSIS
lex [-ctV] [-n|-v] [-Q[o]] [--] [file]
DESCRIPTION
lex generates a C program from a file containing the "lex source
text", which the user writes for a particular problem. A lex source
text consists of up to three sections: definitions, rules and user
functions. The rules specify which patterns to search for in an input
text and which actions to perform when a pattern is found. The rules
must be specified; the definitions and user functions are optional.
This produces the following structure for the lex source file:
   [Definitions]
   %%
   Rules
   [%%
   User functions]
Multiple files are treated as a single file. If no files are specified
or - is specified as the filename, standard input is used.
lex generates a file named lex.yy.c. When lex.yy.c is compiled and
linked with the lex library, the generated program copies the standard
input to the standard output, except where a pattern specified in the
file is found. In that case, the action associated with the pattern is
executed. The text that matched the pattern is held in the yytext
variable. Patterns are tried in the order in which they appear in the
input file.
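As a rough illustration of how a scanner chooses among several patterns
that match at the same input position, the sketch below assumes the usual
lex convention, which is not spelled out above, that the longest match
wins and that a tie goes to the rule listed first. The function name
pick_rule and the array of match lengths are hypothetical; this is
hand-written C, not code produced by lex.

```c
#include <stddef.h>

/* match_len[i] is the number of input characters that rule i would match
 * at the current position (0 means no match). Rules are indexed in the
 * order in which they appear in the lex source file.
 * Returns the index of the chosen rule, or -1 if no rule matches. */
int pick_rule(const size_t *match_len, size_t nrules)
{
    int best = -1;
    size_t best_len = 0;

    for (size_t i = 0; i < nrules; i++) {
        /* Strictly greater: an equally long later rule never displaces
         * an earlier one. */
        if (match_len[i] > best_len) {
            best_len = match_len[i];
            best = (int)i;
        }
    }
    return best;
}
```

When pick_rule returns -1, the corresponding input character would simply
be copied to the output, as described above.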
OPTIONS
-c Indicates C actions (as opposed to other programming languages
such as Fortran) and is the default.
-t Causes the program to be written to the standard output, not to
the file lex.yy.c.
-v Provides a two-line summary of statistics.
-n Suppresses the -v summary.
-V Prints version information on standard error.
-Q[o]
Specifies whether version information is to be written to the
output file lex.yy.c. o stands for the yes or no response of the
current language environment. In an English-language environment,
-Qy should be specified to write version information to
lex.yy.c, and -Qn to suppress this. In a German-language
environment, for example, -Qj or -Qn should be specified.
By default, no version information is output.
Page 1 Reliant UNIX 5.44 Printed 11/98
-- If the first filename begins with a dash (-), the end of the
command-line options must be marked with --.
Definitions
Substitution strings can be defined in the definitions section. Each is
specified in the following format and must start at the beginning of
the line:
   name substitute
Wherever {name} appears in the rules section, it is replaced by the
character string substitute.
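The replacement of {name} can be pictured as plain text substitution. The
following is a minimal sketch under that assumption; the names
struct definition and expand_definitions are hypothetical, and the sketch
ignores quoting and escapes, which a real lex implementation has to
respect.

```c
#include <stddef.h>
#include <string.h>

/* One "name substitute" line from the definitions section. */
struct definition {
    const char *name;
    const char *substitute;
};

/* Copies pattern to out (capacity outsz), replacing every {name} whose
 * name is found in defs by the corresponding substitute. Braces that do
 * not enclose a known name, such as the interval {2,3}, are copied as
 * they are. Returns 0 on success, -1 if out is too small. */
int expand_definitions(const char *pattern,
                       const struct definition *defs, size_t ndefs,
                       char *out, size_t outsz)
{
    size_t o = 0;

    if (outsz == 0)
        return -1;
    while (*pattern) {
        const char *text = pattern;  /* default: copy one character */
        size_t len = 1;
        size_t skip = 1;

        if (*pattern == '{') {
            const char *close = strchr(pattern, '}');
            if (close) {
                size_t namelen = (size_t)(close - pattern - 1);
                for (size_t i = 0; i < ndefs; i++) {
                    if (strlen(defs[i].name) == namelen &&
                        strncmp(defs[i].name, pattern + 1, namelen) == 0) {
                        text = defs[i].substitute;
                        len = strlen(text);
                        skip = namelen + 2;  /* name plus both braces */
                        break;
                    }
                }
            }
        }
        if (o + len >= outsz)
            return -1;
        memcpy(out + o, text, len);
        o += len;
        pattern += skip;
    }
    out[o] = '\0';
    return 0;
}
```

With the definitions "D [0-9]" and "L [a-z]", the rule pattern
{L}({L}|{D})* expands to [a-z]([a-z]|[0-9])* in this model.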
The type of yytext can be declared in the definitions section:
   %array     yytext is a character array
   %pointer   yytext is a pointer to a character string
The start states of lex are also defined here:
   %s name or %S name   simple start state
   %x name or %X name   exclusive start state
If the generated scanner is in an exclusive state, only patterns
specified for this state are considered. In a simple state, patterns
without a state specification are considered as well.
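The difference between simple and exclusive start states comes down to
one decision per rule. The sketch below models that decision in
hand-written C; struct start_state and rule_is_active are hypothetical
names and are not part of the code that lex generates.

```c
#include <stdbool.h>
#include <stddef.h>

/* A start state as declared with %s (simple) or %x (exclusive). */
struct start_state {
    int id;
    bool exclusive;
};

/* rule_states lists the ids named in the <y1,y2,...> prefix of a rule;
 * nstates is 0 for a rule without a state specification.
 * Returns true if the rule is considered while the scanner is in the
 * state given by current. */
bool rule_is_active(const int *rule_states, size_t nstates,
                    struct start_state current)
{
    if (nstates == 0)
        return !current.exclusive;  /* unprefixed rules are skipped only
                                       in exclusive states */
    for (size_t i = 0; i < nstates; i++)
        if (rule_states[i] == current.id)
            return true;
    return false;
}
```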
Some default table sizes may be too small for large scanners. The table
sizes for the resulting finite state machine can be set in the
definitions section:
   %p n   number of positions is n (default 2500)
   %n n   number of states is n (default 500)
   %e n   number of parse tree nodes is n (default 1000)
   %a n   number of transitions is n (default 2000)
   %k n   number of packed character classes is n (default 2500)
   %o n   size of output array is n (default 3000)
Using one or more of the above automatically implies the -v option,
unless the -n option is used.
Rules
The rules section of the file begins with the delimiting symbol %%. In
this section, the user can declare local variables for yylex(). Every
line in the rules section that begins with a space or tab character,
or that is enclosed in %{ and %}, and that is positioned before the
first rule is copied to the start of the yylex() function,
immediately following its opening brace.
Each rule consists of a regular expression, which defines the character
pattern to be found, and the action to be executed when the pattern is
found. Input text that matches no pattern is copied unchanged by the
generated scanner to the output.
lex supports extended regular expressions [see expressions(5)] with
the following exceptions and extensions:
   "xy"            the string xy, even if x and/or y are lex operators
                   (except \)
   ^x              x at the beginning of a line (only at the start of a
                   pattern)
   <y>x or <y1,y2,...>x
                   x if lex is in the y state or in one of the states
                   y1, y2, etc.
   x$              x at the end of a line (only at the end of a pattern)
   x/y             x if y follows
   {xx}            substitution for xx from the definitions section
   \octal          character with the octal code octal
   \xhexadecimal   character with the hexadecimal code hexadecimal
   \character      the character itself, unless \character is one of the
                   following escape sequences: \\, \a, \b, \f, \n, \r,
                   \t, \v
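The three backslash forms can be decoded with one small function. The
sketch below is an illustration only: decode_escape is a hypothetical
name, and the limits of three octal digits and two hexadecimal digits
follow common C practice and are assumptions, since the text above does
not state how many digits lex accepts.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* s points just past the backslash. Stores in *used the number of
 * characters consumed after the backslash and returns the character
 * value that the escape stands for. */
int decode_escape(const char *s, size_t *used)
{
    static const char names[]  = "abfnrtv\\";
    static const char values[] = "\a\b\f\n\r\t\v\\";
    int value = 0;
    size_t n = 0;

    if (s[0] == 'x' && isxdigit((unsigned char)s[1])) {
        /* \xhexadecimal: assumed limit of two digits */
        for (n = 1; n <= 2 && isxdigit((unsigned char)s[n]); n++) {
            int c = tolower((unsigned char)s[n]);
            value = value * 16 + (isdigit(c) ? c - '0' : c - 'a' + 10);
        }
    } else if (s[0] >= '0' && s[0] <= '7') {
        /* \octal: assumed limit of three digits */
        for (n = 0; n < 3 && s[n] >= '0' && s[n] <= '7'; n++)
            value = value * 8 + (s[n] - '0');
    } else if (s[0] != '\0') {
        /* named escape sequence, or else the character itself */
        const char *p = strchr(names, s[0]);
        value = p ? values[p - names] : (unsigned char)s[0];
        n = 1;
    }
    *used = n;
    return value;
}
```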
The precedence of the regular expression operators differs in some
respects from the standard order. The following table lists the
operators in descending order of precedence:
   classes / character sets    [==] [::] [..]
   quoted characters           \characters
   bracketed expressions       [ ]
   expressions in ""           "..."
   grouping                    ( )
   definitions                 {name}
   repeat                      * + ?
   concatenation               xy
   intervals                   {m,n}
   or                          |
Special tasks can be carried out in the action of a rule. lex provides
the following macros for this purpose:
   input()     reads the next character from the input stream
   unput(c)    pushes the character c back, to be read again later
   output(c)   writes the character c to the output stream
   ECHO        writes yytext to the output stream
The user can redefine these macros to control input and output
directly. However, care must be taken to keep the definitions
consistent with one another.
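Consistency matters because a character pushed back must be the next one
read. The following hand-written model shows that relationship for a
pair of replacement routines reading from a string; struct char_source,
src_input and src_unput are hypothetical names, and returning 0 at the
end of the input follows the traditional lex convention, which is an
assumption here.

```c
#include <stddef.h>

#define PUSHBACK_MAX 8

/* Input source with a small pushback stack. */
struct char_source {
    const char *text;            /* characters still to be read */
    size_t pos;                  /* index of the next unread character */
    char pushed[PUSHBACK_MAX];   /* characters given back */
    size_t npushed;
};

/* Gives c back to the source. Returns 0 on success, -1 if the pushback
 * stack is full. */
int src_unput(struct char_source *s, char c)
{
    if (s->npushed == PUSHBACK_MAX)
        return -1;
    s->pushed[s->npushed++] = c;
    return 0;
}

/* Returns the next character, taking pushed-back characters first (most
 * recent first), or 0 at the end of the input. */
int src_input(struct char_source *s)
{
    if (s->npushed > 0)
        return (unsigned char)s->pushed[--s->npushed];
    if (s->text[s->pos] == '\0')
        return 0;
    return (unsigned char)s->text[s->pos++];
}
```

If only one of the two routines were replaced, pushed-back characters
could be lost or read twice, which is the inconsistency to avoid.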
Apart from making the matched text available in yytext, lex provides
further ways of processing it:
   yymore()    The next text recognized is appended to the text already
               in yytext (normally, yytext is overwritten by the next
               match).
   yyless(n)   Only the first n characters in yytext are retained; the
               remaining characters are returned to the input.
   REJECT      Allows character strings that overlap or are partially
               contained in another character string to be processed.
               REJECT jumps directly to the next rule, without altering
               the contents of yytext.
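The effect of yymore() and yyless(n) on yytext can be modelled with a
buffer and an input position. This is a simplified sketch, not the code
lex generates: struct scan_state, take_match and model_yyless are
hypothetical names, the 64-byte buffer is arbitrary, and the caller is
assumed to pass a match length that does not run past the end of the
input.

```c
#include <stddef.h>
#include <string.h>

struct scan_state {
    const char *input;   /* the whole input text */
    size_t pos;          /* index of the next unread character */
    char yytext[64];     /* text of the current match */
    size_t yyleng;       /* length of yytext */
};

/* Records a match of len characters at the current position. With
 * append set (as after yymore()), the match is added to the end of
 * yytext; otherwise yytext is overwritten. */
void take_match(struct scan_state *s, size_t len, int append)
{
    size_t start = append ? s->yyleng : 0;

    if (start + len >= sizeof s->yytext)        /* keep room for '\0' */
        len = sizeof s->yytext - 1 - start;
    memcpy(s->yytext + start, s->input + s->pos, len);
    s->yyleng = start + len;
    s->yytext[s->yyleng] = '\0';
    s->pos += len;
}

/* Models yyless(n): keeps the first n characters of yytext and returns
 * the rest to the input so that they are scanned again. */
void model_yyless(struct scan_state *s, size_t n)
{
    if (n > s->yyleng)
        return;
    s->pos -= s->yyleng - n;
    s->yyleng = n;
    s->yytext[n] = '\0';
}
```

For the input "foobar baz", a six-character match followed by
model_yyless(&s, 3) leaves "foo" in yytext and makes "bar" the next text
to be scanned.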
LOCALE
The language of the message texts and yes/no specifications is
governed by the environment variable LC_ALL, LC_MESSAGES or LANG.
When the default is set, the system behaves as if it were not
internationalized, i.e. the message texts are in English and yes/no
specifications must also be made in English (y or n). You must change
one of these variables in order to change the language of the message
texts.
In regular expressions the environment variable LC_COLLATE determines
the meaning of character ranges, equivalence classes and character
units, and the environment variable LC_CTYPE determines the meaning of
character classes. If these variables are not set, the value of LANG
is used. If LANG is not defined or is empty, the system behaves as if
it were not internationalized.
Detailed information on the dependencies of the environment variables
and on internationalization in general can be found in the manual
"Programmer's Guide: Internationalization - Localization". Refer also
to environ(5) for information on setting the user environment.
SEE ALSO
yacc(1), expressions(5).
Chapter on "lex" in the "Guide to Tools for Programming in C".