lex
PURPOSE
Generates a C Language program that matches patterns for
simple lexical analysis of an input stream.
SYNOPSIS
lex [ -tvn ] [ file ] ...
DESCRIPTION
The lex command reads file or standard input, generates a
C Language program, and writes it to a file named
lex.yy.c. This file, lex.yy.c, is a compilable C Lan-
guage program.
The lex command uses rules and actions contained in file
to generate a program, lex.yy.c, which can be compiled
with the cc command. It can then receive input, break
the input into the logical pieces defined by the rules in
file, and run program fragments contained in the actions
in file. For a more detailed discussion of lex and its
operation, see AIX Operating System Programming Tools and
Interfaces.
The generated program is a C Language function called
yylex. lex stores yylex in a file named lex.yy.c. You
can use yylex alone to recognize simple, one-word input,
or you can use it with other C Language programs to
perform more difficult input analysis functions. For
example, you can use lex to generate a program that sim-
plifies an input stream before sending it to a parser
program generated by the yacc command.
The function yylex analyzes the input stream using a
program structure called a "finite state machine." This
structure allows the program to exist in only one state
(or condition) at a time. There is a finite number of
states allowed. The rules in file determine how the
program moves from one state to another.
If you do not specify a file, lex reads standard input.
It treats multiple files as a single file.
Note: Since lex uses fixed names for intermediate and
output files, you can have only one lex-generated program
in a given directory.
Input File Format (file)
The input file can contain three sections: definitions,
rules, and user subroutines. Each section must be separated
from the others by a line containing only the
delimiter, %%. The format is:
definitions
%%
rules
%%
user subroutines
The purpose and format of each are described in the fol-
lowing sections.
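As an illustration of this layout, the following is a minimal
sketch of an input file with all three sections. The counter
name nlines and the message text are assumptions made for this
example only; they are not part of lex.

```lex
%{
/* Definitions section: C declarations copied into lex.yy.c. */
int nlines = 0;		/* hypothetical counter */
%}
%%
\n	{ nlines++; printf("end of line %d\n", nlines); }
.	;
%%
/* User subroutines section: empty in this sketch, so the */
/* main( ) in the lex library is expected to call yylex.  */
```

The first rule runs once for each new-line character; the second
rule discards every other character.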
DEFINITIONS
: If you want to use variables in your rules, you must
define them in this section. The variables make up the
left column, and their definitions make up the right
column. For example, to define D as a numerical digit,
you would write:
D [0-9]
You can use a defined variable in the rules section by
enclosing the variable name in braces ("{D}").
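A short sketch of definitions used in rules follows. The second
variable, L, and the printed labels are assumptions added for
illustration.

```lex
D	[0-9]
L	[a-zA-Z]
%%
{D}+		printf("number: %s\n", yytext);
{L}({L}|{D})*	printf("name: %s\n", yytext);
```

Here {D}+ matches one or more digits, and the second pattern
matches a letter followed by any mix of letters and digits.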
In the definitions section, you can set table sizes for
the resulting finite state machine. The default sizes
are large enough for small programs. You may want to set
larger sizes for more complex programs.
%p n Number of positions is n (default 2000)
%n n Number of states is n (default 500)
%t n Number of parse tree nodes is n (default 1000)
%a n Number of transitions is n (default 3000)
If extended characters appear in regular expression
strings, you may need to reset the output array size with
the %o parameter (possibly to array sizes in the range
10,000 to 20,000). This reset reflects the much larger
number of characters relative to the number of ASCII
characters.
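The table size parameters go in the definitions section, one per
line. The sketch below assumes a specification large enough to
need bigger tables; the numbers are illustrative guesses, not
recommended values.

```lex
%p 4000
%n 1000
%a 6000
%o 10000
%%
[a-z]+	printf("word: %s\n", yytext);
```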
RULES
: Once you have defined your terms, you can write the
rules section. It contains the strings and expressions that
yylex matches against its input, and the C commands to
execute when a match is made. This section is required, and it must
be preceded by the delimiter %%, whether or not you have
a definitions section. The lex command does not recog-
nize your rules without this delimiter.
In this section, the left column contains the pattern that
yylex is to recognize in its input. The right
column contains the C program fragment executed when that
pattern is recognized. Patterns can include extended
characters with one exception: these characters may not
appear in range specifications within character class
expressions surrounded by square brackets. The columns
are separated by a tab. For example, if you want to
search files for the keyword "KEY", you might write:
(KEY)	printf("found KEY");
If you include this rule in file, the lexical analyzer
yylex matches the pattern "KEY" and runs the printf
command.
Each pattern may have a corresponding action, a C command
to execute when the pattern is matched. Each statement
must end with a semicolon. If you use more than one
statement in an action, you must enclose all of them in
braces. A second delimiter, %%, must follow the rules
section if you have a user subroutine section.
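An action with more than one statement might look like the
following sketch. The counter name keys is a hypothetical name
chosen for this example.

```lex
%{
int keys = 0;	/* hypothetical counter */
%}
%%
KEY	{
		keys++;
		printf("found KEY (%d so far)\n", keys);
	}
```

The braces make both statements part of the single action for
the pattern KEY.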
When yylex matches a string in the input stream, it
copies the matched string to an external character array,
yytext, before it executes any commands in the rules
section.
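Because yytext is an ordinary character array, an action can
examine it one character at a time. The sketch below, which
assumes you want lowercase words written in uppercase, relies on
the matched string in yytext being terminated by a null
character.

```lex
%{
#include <ctype.h>
%}
%%
[a-z]+	{
		int i;
		/* Walk the matched string held in yytext. */
		for (i = 0; yytext[i] != '\0'; i++)
			putchar(toupper((unsigned char) yytext[i]));
	}
```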
You can use the following operators to form patterns that
you want to match:
x Matches the character written. x matches the
literal character x.
[ ] Matches any one character in the enclosed range
([.-.]) or the enclosed list ([...]). [abcx-z]
matches a, b, c, x, y, or z.
" " Matches the enclosed character or string even if
it is an operator. "$" prevents lex from interpreting
the character $ as an operator.
\ Acts the same as " ". \$ also prevents lex
from interpreting the character $ as an
operator.
* Matches zero or more occurrences of the character
immediately preceding it. x* matches zero or
more repeated occurrences of x.
+ Matches one or more occurrences of the character
immediately preceding it.
? Matches either zero or one occurrences of the
character immediately preceding it.
^ Matches the character only at the beginning of a
line. ^"x" matches an x at the beginning of a
line.
[^] Matches any character but those following the ^.
[^x] matches any character but x.
. Matches any character except the new-line char-
acter.
$ Matches the end of a line.
| Matches either of two characters. x|y matches
either x or y.
/ Matches one character only when followed by a
second character. It reads only the first char-
acter into yytext. x/y matches x when it is fol-
lowed by y, and reads x into yytext.
( ) Matches the pattern in the parentheses. This is
used for grouping. It reads the whole pattern
into yytext. A group in parentheses can be used
in place of any single character in any other
pattern. "(xyz123)" matches the pattern "xyz123"
and reads the whole string into yytext.
{} Matches the character as you defined it in the
definitions section. If you defined D to be
numerical digits, "{D}" matches all numerical
digits.
{m,n} Matches m to n occurrences of the character.
x{2,4} matches 2, 3, or 4 occurrences of x.
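Several of these operators are combined in the sketch below.
The patterns and the printed messages are assumptions chosen
only to show the syntax.

```lex
%%
^#.*$		printf("line starting with #\n");
-?[0-9]+	printf("integer: %s\n", yytext);
ab/cd		printf("ab followed by cd\n");
(yes|no)	printf("answer: %s\n", yytext);
x{2,4}		printf("two to four x: %s\n", yytext);
"a+b"		printf("literal a+b\n");
\$[A-Z]+	printf("dollar name: %s\n", yytext);
```

In the third rule only ab is read into yytext; the characters cd
remain in the input to be matched again.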
If a line begins with a blank, lex copies it to the
output file, lex.yy.c. If the line is in the definitions
section of file, lex copies it to the declarations
section of lex.yy.c. If the line is in the rules
section, lex copies it to the program code section of
lex.yy.c.
USER SUBROUTINES
: The lex library has three subroutines, defined as
macros, that you can use in the rules.
input( ) Reads a character from yyin.
unput( ) Returns a character to the input stream after
it has been read.
output( ) Writes an output character to yyout.
You can override these three macros by writing your own
code for these routines in the user subroutines section.
If you write your own, you must first undefine these macros
in the definitions section as follows:
%{
#undef input
#undef unput
#undef output
%}
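A sketch of replacement routines follows, assuming that the
input should come from a string in memory rather than from
yyin. The names text, next, pushed, and npushed are
hypothetical. The sketch applies to the macro-based lex
described here; other implementations, such as flex, redirect
input by a different mechanism. The replacement input( ) must
return 0 at the end of the input, and unput( ) must hand its
characters back to input( ), or lookahead will not work.

```lex
%{
#undef input
#undef unput
#undef output
int input(void);
void unput(int c);
void output(int c);
%}
%%
[a-z]+	printf("word: %s\n", yytext);
.|\n	;
%%
static char text[] = "sample input\n";	/* hypothetical source */
static char *next = text;
static char pushed[64];		/* characters given back by unput( ) */
static int npushed = 0;

int input(void)
{
	if (npushed > 0)
		return (unsigned char) pushed[--npushed];
	if (*next == '\0')
		return 0;	/* 0 tells yylex the input has ended */
	return (unsigned char) *next++;
}

void unput(int c)
{
	if (npushed < (int) sizeof(pushed))
		pushed[npushed++] = (char) c;
}

void output(int c)
{
	putchar(c);
}
```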
There is no main( ) in lex.yy.c because the lex library
contains the main( ) that calls yylex. Therefore, if you
do not include main( ) in the user subroutines section,
you must link the lex library when you compile lex.yy.c by
entering cc lex.yy.c -ll, where -ll links in the lex library.
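If you prefer to supply main( ) yourself, it can go in the user
subroutines section, as in the sketch below. The sketch also
defines yywrap, a standard lex routine that this page does not
describe and that the lex library normally supplies; yylex calls
it at the end of the input, and a return value of 1 means there
is no more input.

```lex
%%
KEY	printf("found KEY\n");
.|\n	;
%%
int yywrap()
{
	return 1;	/* no further input files */
}

int main()
{
	yylex();
	return 0;
}
```

With both routines defined in file, many implementations let
you compile lex.yy.c without the lex library, but this is an
assumption to check against your system.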
External names generated by lex all begin with the
prefix yy, as in yyin, yyout, yylex, and yytext.
FLAGS
-n Suppresses the statistics summary. When you set your
own table sizes for the finite state machine (see
"DEFINITIONS"), lex automatically produces this summary if
you do not select this flag.
-t Writes lex.yy.c to standard output instead of to a
file.
-v Provides a one-line summary of the generated finite-
state-machine statistics.
FILES
/usr/lib/libl.a Run-time library.
RELATED INFORMATION
The following command: "yacc."
The description of lex in AIX Operating System Program-
ming Tools and Interfaces.
"Overview of International Character Support" in Managing
the AIX Operating System.