lex(1) USER COMMANDS lex(1)
NAME
lex - generate programs for simple lexical tasks
SYNOPSIS
lex [-ctvn -V -Q[y|n]] [file ...]
DESCRIPTION
The lex command generates programs to be used in simple lexical
analysis of text.
The input files (standard input by default) contain strings and
expressions to be searched for and C text to be executed
when these strings are found.
lex generates a file named lex.yy.c. When lex.yy.c is compiled
and linked with the lex library, the resulting program copies
its input to its output except where a string specified in the
file is found. When a specified string is found, the
corresponding program text is executed. The matched string is
left in yytext, an external character array.
Matching is done in order of the patterns in the file. The
patterns may contain square brackets to indicate character
classes, as in [abx-z] to indicate a, b, x, y, and z. The
operators *, +, and ? mean, respectively, zero or more, one
or more, and zero or one occurrence of the preceding
character or character class. Thus, [a-zA-Z]+ matches a
string of letters. The character . is the class of all ASCII
characters except new-line. Parentheses for grouping and
vertical bar for alternation are also supported. The notation
r{d,e} in a rule indicates between d and e instances of
regular expression r. It has higher precedence than |, but
lower than *, ?, +, and concatenation. The character ^ at the
beginning of an expression permits a successful match only
immediately after a new-line, and the character $ at the end
of an expression requires a trailing new-line. The character
/ in an expression indicates trailing context; only the part
of the expression up to the slash is returned in yytext, but
the remainder of the expression must follow in the input
stream. An operator character may be used as an ordinary
symbol if it is within " symbols or preceded by \.
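Several of these operators can be tried outside lex. The sketch
below is not lex itself: it assumes a POSIX system and uses
regcomp(3) extended regular expressions as a stand-in, because
character classes, *, +, ?, grouping, alternation, and {d,e}
are written the same way there. The helper name matches_whole is
hypothetical. Trailing context (/) and " quoting have no
equivalent in regcomp, and regcomp applies {d,e} to the single
preceding item, so the repeated part is parenthesized explicitly.

```c
#include <assert.h>
#include <regex.h>
#include <stdio.h>

/* Return 1 if `text`, taken as a whole, matches the extended regular
 * expression `pattern`, 0 if it does not, and -1 if the pattern is
 * too long or fails to compile. */
int matches_whole(const char *pattern, const char *text)
{
    char anchored[256];
    regex_t re;
    int n, rc;

    assert(pattern != NULL && text != NULL);

    /* Anchor at both ends so the entire string has to match. */
    n = snprintf(anchored, sizeof anchored, "^(%s)$", pattern);
    if (n < 0 || (size_t)n >= sizeof anchored)
        return -1;

    if (regcomp(&re, anchored, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}
```

With this helper, matches_whole("[a-zA-Z]+", "Hello") returns 1 and
matches_whole("[abx-z]", "c") returns 0, while
matches_whole("(ab){1,2}", "abab") returns 1 and the same pattern
applied to "ababab" returns 0.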
Three macros are expected: input() to read a character;
unput(c) to replace a character read; and output(c) to place
an output character. They are defined in terms of the standard
streams, but you can override them. The generated function is
named yylex(), and the lex library contains a main() that calls
it. The action REJECT on the right side of a rule causes this
match to be rejected and the next suitable match executed; the
function yymore() accumulates additional characters into the
same yytext; and the function yyless(n) pushes back
yyleng - n characters into the input
stream. (yyleng is an external int variable giving the
length of yytext.) The macros input and output use files
yyin and yyout to read from and write to, defaulted to stdin
and stdout, respectively.
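One reason to override input() and unput(c) is to scan text held
in memory rather than a stream. The following is a minimal sketch
of such replacements, not part of lex: the names str_source,
str_input, str_unput, and next_is are hypothetical, the pushback
limit is arbitrary, and returning 0 at end of input is an
assumption taken from traditional lex behavior. How the functions
are wired in place of the macros (typically #undef and #define in
the definitions section) depends on the implementation and is not
shown.

```c
#include <assert.h>
#include <stddef.h>

#define PUSHBACK_MAX 64

static const char *src = "";          /* unread part of the source string */
static char pushback[PUSHBACK_MAX];   /* characters handed back by str_unput */
static size_t npushed = 0;

/* Start reading from the NUL-terminated string s. */
void str_source(const char *s)
{
    assert(s != NULL);
    src = s;
    npushed = 0;
}

/* Stand-in for input(): next character, or 0 at end of input. */
int str_input(void)
{
    if (npushed > 0)
        return (unsigned char)pushback[--npushed];
    if (*src == '\0')
        return 0;
    return (unsigned char)*src++;
}

/* Stand-in for unput(c): the next str_input() call returns c again. */
void str_unput(int c)
{
    assert(npushed < PUSHBACK_MAX);
    pushback[npushed++] = (char)c;
}

/* One-character lookahead built on the pair: consume the next
 * character only if it equals `expected`. */
int next_is(int expected)
{
    int c = str_input();

    if (c == expected)
        return 1;
    if (c != 0)
        str_unput(c);   /* not the one we wanted; leave it for the caller */
    return 0;
}
```

After str_source("ab"), a call to next_is('x') returns 0 and leaves
the a unread, so the following str_input() still returns 'a'.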
Any line beginning with a blank is assumed to contain only C
text and is copied; if it precedes %%, it is copied into the
external definition area of the lex.yy.c file. All rules
should follow a %%, as in yacc. Lines preceding %% that
begin with a non-blank character define the string on the
left to be the remainder of the line; it can be called out
later by surrounding it with {}. In this section, C code
(and preprocessor statements) can also be included between
%{ and %}. Note that curly brackets do not imply
parentheses; only string substitution is done.
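The consequence of plain string substitution can be seen with a
small model. The sketch below is an illustration only, not how lex
is implemented; the function name expand_definition is
hypothetical. It replaces {name} in a rule with the definition
text verbatim, which is the behavior described above.

```c
#include <assert.h>
#include <string.h>

/* Copy `rule` into `out` (capacity `cap`), replacing each occurrence
 * of "{name}" with `body` verbatim.  Returns 0 on success, or -1 if
 * the result does not fit. */
int expand_definition(const char *rule, const char *name,
                      const char *body, char *out, size_t cap)
{
    size_t nlen = strlen(name), blen = strlen(body), used = 0;

    assert(cap > 0);
    while (*rule != '\0') {
        if (rule[0] == '{' && strncmp(rule + 1, name, nlen) == 0
                && rule[1 + nlen] == '}') {
            if (used + blen >= cap)
                return -1;
            memcpy(out + used, body, blen);   /* no parentheses added */
            used += blen;
            rule += nlen + 2;                 /* skip "{name}" */
        } else {
            if (used + 1 >= cap)
                return -1;
            out[used++] = *rule++;
        }
    }
    out[used] = '\0';
    return 0;
}
```

With a definition AB of a|b, the rule {AB}+ expands to a|b+, in
which + applies to b alone. Writing the definition as (a|b), or the
rule as ({AB})+, gives (a|b)+. Implementations differ on this
point, so parenthesizing the definition itself is the portable
choice.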
EXAMPLE
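D       [0-9]
%{
/* Skip the rest of a C comment; the opening delimiter is already matched. */
void
skipcommnts(void)
{
    int c;

    for (;;) {
        /* Scan to the next '*'; input() yields 0 at end of input. */
        while ((c = input()) != '*')
            if (c == 0)
                return;
        /* A '/' closes the comment; otherwise push the character */
        /* that was actually read back and rescan from there.     */
        if ((c = input()) == '/' || c == 0)
            return;
        unput(c);
    }
}
%}
%%
if      printf("IF statement\n");
[a-z]+  printf("tag, value %s\n", yytext);
0{D}+   printf("octal number %s\n", yytext);
{D}+    printf("decimal number %s\n", yytext);
"++"    printf("unary op\n");
"+"     printf("binary op\n");
"\n"    ;   /* no action */
"/*"    skipcommnts();
%%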
The external names generated by lex all begin with the prefix
yy or YY.
The flags must appear before any files.
-c Indicates C actions and is the default.
-t Writes the generated program to standard output instead
of lex.yy.c.
-v Provides a two-line summary of statistics.
-n Suppresses the -v summary.
-V Prints version information on standard error.
-Q[y|n] With -Qy, writes version information to the output
file lex.yy.c. With -Qn, the default, no version
information is written.
Multiple files are treated as a single file. If no files
are specified, standard input is used.
Certain default table sizes are too small for some users.
The table sizes for the resulting finite state machine can
be set in the definitions section:
%p n    number of positions is n (default 2500)
%n n    number of states is n (default 500)
%e n    number of parse tree nodes is n (default 1000)
%a n    number of transitions is n (default 2000)
%k n    number of packed character classes is n (default 2500)
%o n    size of output array is n (default 3000)
The use of one or more of the above automatically implies
the -v option, unless the -n option is used.
SEE ALSO
yacc(1)
See the ``lex'' chapter in the Programmer's Guide: ANSI C
and Programming Support Tools.