LEX(1,C) AIX Commands Reference LEX(1,C)
-------------------------------------------------------------------------------
lex
PURPOSE
Generates a C language program that matches patterns for simple lexical
analysis of an input stream.
SYNTAX
lex [-t] [-n | -v] [file ...]
Note: This command does not have MBCS support.
DESCRIPTION
The lex command reads file (or standard input), generates a C language program,
and writes it to a file named lex.yy.c. This file is a compilable C language
program.
The lex command uses the rules and actions contained in file to generate the
program lex.yy.c, which can be compiled with the cc command. The compiled
program can then receive input, break the input into the logical pieces defined
by the rules in file, and run the program fragments contained in the actions in
file. For a more detailed discussion of the lex command and its operation, see
AIX Operating System Programming Tools and Interfaces.
The generated program is a C language function called yylex, which the lex
command stores in the file lex.yy.c. You can use the yylex function alone to
recognize simple, one-word input, or you can use it with other C language
programs to perform more complex input analysis.
For example, you can use the lex command to generate a program that simplifies
an input stream before sending it to a parser program generated by the yacc
command.
The yylex function analyzes the input stream using a program structure called a
"finite state machine". At any moment the program is in exactly one of a
finite number of states (or conditions), and the rules in file determine how it
moves from one state to another.
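As an illustration only, the following C sketch hand-codes a tiny finite state
machine. It is not the code that lex generates (lex builds its machine as
tables), and the names classify and step and the token kinds are hypothetical.
The machine is always in exactly one state, each input character moves it to
the next state, and the final state decides whether the input was a number, an
identifier, or neither.

```c
#include <ctype.h>

/* Hypothetical token kinds reported by this hand-written machine. */
enum kind { T_NONE, T_NUMBER, T_IDENT };

/* States of the machine; the program is always in exactly one of them. */
enum state { S_START, S_NUMBER, S_IDENT, S_DEAD };

/* Transition function: move from state s to the next state on character c. */
static enum state step(enum state s, int c)
{
    switch (s) {
    case S_START:
        if (isdigit(c))
            return S_NUMBER;
        if (isalpha(c) || c == '_')
            return S_IDENT;
        return S_DEAD;
    case S_NUMBER:
        return isdigit(c) ? S_NUMBER : S_DEAD;
    case S_IDENT:
        return (isalnum(c) || c == '_') ? S_IDENT : S_DEAD;
    default:
        return S_DEAD;          /* once dead, the machine stays dead */
    }
}

/* Run the machine over the whole string; the final state decides the kind. */
enum kind classify(const char *s)
{
    enum state st = S_START;

    for (; *s != '\0'; s++)
        st = step(st, (unsigned char)*s);

    switch (st) {
    case S_NUMBER: return T_NUMBER;
    case S_IDENT:  return T_IDENT;
    default:       return T_NONE;   /* start or dead state: not accepted */
    }
}
```

For example, classify("123") returns T_NUMBER, classify("x1") returns T_IDENT,
and classify("1x") returns T_NONE.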
If you do not specify a file, the lex command reads standard input. It treats
multiple files as a single file.
Processed November 8, 1990 LEX(1,C) 1
Note: Because the lex command uses fixed names for its intermediate and output
files, you can have only one lex-generated program in a given directory at a
time.
Input File Format (file)
The input file can contain three sections: definitions, rules, and user
subroutines. The sections must be separated from one another by a line
containing only the delimiter %%. The format is:
definitions
%%
rules
%%
user subroutines
The purpose and format of each section are described in the following sections.
DEFINITIONS
If you want to use variables (named patterns) in your rules, you must define
them in this section. The names go in the left column and their definitions in
the right column. For example, to define "D" as a numerical digit, you would
write:
D [0-9]
You can use a defined variable in the rules section by enclosing the variable
name in braces ("{D}").
In the definitions section, you can also set table sizes for the resulting
finite state machine. The default sizes are large enough for small programs;
you may need larger sizes for more complex programs.
%p n Number of positions is n (default 2000)
%n n Number of states is n (default 500)
%e n Number of parse tree nodes is n (default 1000)
%a n Number of transitions is n (default 3000)
If extended characters appear in regular expression strings, you may need to
reset the output array size with the %o parameter, possibly to a value in the
range 10,000 to 20,000. The larger size is needed because the extended
character set is much larger than the ASCII character set.
RULES
Once you have defined your terms, you can write the rules section. It contains
the patterns (strings and expressions) that the yylex function is to match in
its input, and the C statements to execute when a match is made. This section
is required, and it
must be preceded by the delimiter %%, whether or not you have a definitions
section. The lex command does not recognize your rules without this delimiter.
In this section, the left column contains the pattern that yylex is to
recognize in its input, and the right column contains the C program fragment
that is executed when that pattern is recognized. Patterns can include extended
characters with one exception: these characters may not appear in range
specifications within character class expressions surrounded by square
brackets. The columns are separated by a tab. For example, if you wanted to
search files for the keyword "KEY", you might write:
(KEY) printf("found KEY");
If you include this rule in file, the lexical analyzer yylex matches the
pattern "KEY" and executes the printf statement.
Each pattern may have a corresponding action: C statements to execute when the
pattern is matched. Each statement must end with a semicolon. If an action
contains more than one statement, you must enclose them all in braces. A second
delimiter, %%, must follow the rules section if you have a user subroutines
section.
When the yylex function matches a string in the input stream, it copies the
matched string to an external character array, yytext, before it executes the
action for that pattern.
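The following C sketch imitates this copy step for one pattern, [0-9]+. The
array name yytext mirrors the lex external, but the function scan_digits, the
buffer size YYLMAX used here, and the decision to stop at the buffer limit are
assumptions for illustration; the real yylex handles all of your patterns at
once.

```c
#include <ctype.h>
#include <string.h>

#define YYLMAX 200      /* assumed buffer size for this sketch */

char yytext[YYLMAX];    /* external array that receives the matched text */

/* Hypothetical scan step for the single pattern [0-9]+.
   Copies the longest run of digits at the start of *input (up to
   YYLMAX - 1 characters) into yytext, advances *input past it, and
   returns its length, or 0 if there is no match. An action would
   run only after this copy has been made. */
int scan_digits(const char **input)
{
    const char *p = *input;
    size_t n = 0;

    while (n < YYLMAX - 1 && isdigit((unsigned char)p[n]))
        n++;
    memcpy(yytext, p, n);
    yytext[n] = '\0';
    *input = p + n;
    return (int)n;
}
```

For an input of "2048abc", the first call returns 4, leaves "2048" in yytext,
and leaves "abc" unread.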
You can use the following operators to form patterns that you want to match:
x Matches the literal character x.
[ ] Matches any one character in the enclosed range ([.-.]) or the enclosed
list ([...]). For example, [abcx-z] matches a, b, c, x, y, or z.
" " Matches the enclosed character or string even if it contains operators.
For example, "$" prevents lex from interpreting the character $ as an
operator.
\ Quotes the single character that follows it, as " " does. For example, \$
also prevents lex from interpreting the character $ as an operator.
* Matches zero or more occurrences of the character immediately preceding
it. For example, x* matches zero or more repeated x's.
+ Matches one or more occurrences of the character immediately preceding
it.
? Matches either zero or one occurrence of the character immediately
preceding it.
^ Matches the pattern that follows it only at the beginning of a line. For
example, ^"x" matches an x at the beginning of a line.
[^ ] Matches any character except the characters listed after the ^. For
example, [^x] matches any character except x.
. Matches any character except the new-line character.
$ Matches the end of a line.
| Matches either the expression before it or the expression after it. For
example, x|y matches either x or y.
/ Matches the expression before the / only when it is followed by the
expression after it. Only the text matched by the first expression is read
into the yytext character array. For example, x/y matches x when it is
followed by y, and reads only x into yytext.
( ) Groups the enclosed pattern. The text matched by the whole group is read
into yytext, and a group in parentheses can be used in place of any single
character in any other pattern. For example, (xyz123) matches the string
xyz123 and reads the whole string into yytext, and (ab)+ matches ab, abab, and
so on.
{} Matches the expression named inside the braces, as you defined it in the
definitions section. For example, if you defined D as [0-9], {D} matches any
single numerical digit.
{m,n} Matches m to n occurrences of the character. For example, x{2,4}
matches 2, 3, or 4 occurrences of x.
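Many of these operators have the same meaning in POSIX extended regular
expressions, so the C sketch below uses the POSIX regcomp and regexec functions
to check a few of the examples above. This is an approximation, not lex itself:
POSIX extended expressions have no trailing-context operator (/), no {name}
substitution, and no "..." quoting, and regexec searches anywhere in a string
rather than at the current input position as yylex does. The helper name
matches is hypothetical.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if the POSIX extended regular
   expression pattern matches somewhere in text, 0 if it does not,
   and -1 if the pattern fails to compile. */
int matches(const char *pattern, const char *text)
{
    regex_t re;
    int rc;

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, text, 0, NULL, 0);   /* 0 means a match was found */
    regfree(&re);
    return rc == 0;
}
```

For example, matches("^x{2,4}$", "xxx") returns 1, and
matches("^x{2,4}$", "xxxxx") returns 0 because five x's exceed the upper bound.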
If a line begins with a blank, the lex command copies the line to the output
file, lex.yy.c. If the line is in the definitions section of file, lex copies
it to the declarations section of lex.yy.c; if the line is in the rules
section, lex copies it to the program code section of lex.yy.c.
USER SUBROUTINES
The lex library provides three subroutines, defined as macros, that you can use
in the actions of the rules section.
input( ) Reads a character from yyin.
unput(c) Pushes the character c back onto the input so that it is read again.
output(c) Writes the character c to yyout.
You can override these three macros by writing your own code for these routines
in the user subroutines section. If you do, you must undefine the macros in the
definitions section as follows:
%{
#undef input
#undef unput
#undef output
%}
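As a sketch of such replacements, the C code below reads from an in-memory
string instead of yyin and collects output in a buffer instead of writing to
yyout, which is one common reason to supply your own routines. The names
scan_src, scan_pos, pushback, and out_buf, the buffer sizes, and returning 0 at
end of input are assumptions of this sketch (the 0 is modeled on the
traditional lex macros). In a real lex file these definitions go in the user
subroutines section, after the #undef lines above.

```c
#include <stddef.h>

/* Hypothetical in-memory source and sink used instead of yyin/yyout. */
const char *scan_src = "";   /* text to be scanned */
size_t scan_pos = 0;         /* next unread position in scan_src */
char pushback[64];           /* stack of characters returned by unput() */
int npushback = 0;
char out_buf[256];           /* characters written by output() */
size_t out_len = 0;

/* Return the next character: pushed-back ones first (last in, first
   out), then the string. Returns 0 at end of input (assumed convention). */
int input(void)
{
    if (npushback > 0)
        return (unsigned char)pushback[--npushback];
    if (scan_src[scan_pos] == '\0')
        return 0;
    return (unsigned char)scan_src[scan_pos++];
}

/* Push c back so that the next input() call returns it again;
   silently ignores c if the pushback stack is full. */
void unput(int c)
{
    if (npushback < (int)sizeof pushback)
        pushback[npushback++] = (char)c;
}

/* Append c to out_buf, keeping it NUL-terminated; drops c when full. */
void output(int c)
{
    if (out_len + 1 < sizeof out_buf) {
        out_buf[out_len++] = (char)c;
        out_buf[out_len] = '\0';
    }
}
```

Keeping pushed-back characters on a stack means that several unput calls made
in reverse order restore the original input sequence.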
The file lex.yy.c does not contain a main( ) function; the lex library supplies
a main( ) that calls the lexical analyzer yylex. Therefore, if you do not
include a main( ) in the user subroutines section, you must link the lex
library when you compile lex.yy.c, by entering "cc -ll lex.yy.c". The -ll flag
links the lex library, /usr/lib/libl.a.
External names generated by the lex command all begin with the prefix yy, as in
yyin, yyout, yylex, and yytext.
FLAGS
-n Suppresses the statistics summary. When you set your own table sizes for
the finite state machine (see DEFINITIONS), the lex command produces this
summary automatically unless you specify this flag.
-t Writes the file lex.yy.c to standard output instead of to a file.
-v Provides a one-line summary of the generated finite-state-machine
statistics.
FILES
/usr/lib/libl.a Run-time library.
RELATED INFORMATION
See the following command: "yacc".
See the description of the lex command and "Programming for an MBCS
Environment" in AIX Operating System Programming Tools and Interfaces.
See "Introduction to International Character Support" in Managing the AIX
Operating System.