lex(1), A/UX 2.0


lex(1) lex(1)
NAME
     lex - generate programs for simple lexical tasks

SYNOPSIS
     lex [-c] [-n] [-t] [-v] [file] ...

DESCRIPTION
     lex generates programs to be used in simple lexical analysis
     of text. The input files (standard input by default) contain
     strings and expressions to be searched for, and C text to be
     executed when they are found.

     lex produces a file, lex.yy.c, which, when compiled and linked
     with the lex library, copies its input to its output except
     when it finds a string specified in the input file; it then
     executes the corresponding program text. The string actually
     matched is left in yytext, an external character array. When
     more than one expression can match, the longest match is
     chosen; if several expressions match the same number of
     characters, the one that appears first in the file is used.

     The strings may contain square brackets to indicate character
     classes, as in [abx-z], which matches a, b, x, y, or z. The
     operators *, +, and ? match, respectively, zero or more, one
     or more, and zero or one occurrences of the preceding
     character or character class. Thus [a-zA-Z]+ matches a string
     of letters. The character . matches any ASCII character
     except newline. Parentheses for grouping and the vertical bar
     (|) for alternation are also supported.

     The notation r{d,e} in a rule matches between d and e
     instances of the regular expression r. It has higher
     precedence than |, but lower than *, ?, +, and concatenation.

     The character ^ at the beginning of an expression permits a
     successful match only immediately after a newline, that is,
     at the start of a line. The character $ at the end of an
     expression requires a trailing newline, which is not included
     in the match. The character / in an expression indicates
     trailing context: only the part of the expression up to the
     slash is returned in yytext, but the remainder of the
     expression must follow in the input stream. An operator
     character may be used as an ordinary character if it is
     enclosed in double quotes (") or preceded by a backslash (\).

     lex expects three routines, defined as macros: input() reads
     a character; unput(c) pushes the character c back onto the
     input; and output(c) writes the character c to the output.
     These macros are defined in terms of the standard I/O
     streams, but you can override them. The generated program is
     the function yylex(), and the lex library contains a main()
     that calls it.

     The action REJECT on the right side of a rule causes the
     current match to be rejected and the action of the next
     suitable match to be executed. The function yymore() causes
     the next match to be appended to the current contents of
     yytext instead of replacing them. The function yyless(p)
     pushes back the portion of the matched string beginning at p,
     which should lie between yytext and yytext+yyleng.

     The input and output macros read from yyin and write to
     yyout, which default to stdin and stdout, respectively.

     Any line beginning with a blank is assumed to contain only C
     text and is copied; if it precedes %%, it is copied into the
     external definition area of the lex.yy.c file. All rules
     should follow a %%, as in yacc(1). Lines preceding %% that
     begin with a nonblank character are definitions: the name on
     the left is defined as the remainder of the line, and it can
     be referred to later by enclosing it in braces. For example,
     after the definition D [0-9], the expression {D}+ matches one
     or more digits. Braces do not imply parentheses; only string
     substitution is done.

     The external names generated by lex all begin with the prefix
     yy or YY.

     The flag options must appear before any files:

     -c   Indicate C actions (the default).

     -n   Do not print the statistics summary.

     -t   Write the generated program to standard output instead
          of to lex.yy.c.

     -v   Print a one-line summary of statistics about the
          generated machine.

     Multiple files are treated as a single file. If no files are
     specified, standard input is used.

     Certain table sizes for the resulting finite state machine
     can be set in the definitions section:

     %p n   number of positions is n (default 2000)
     %n n   number of states is n (default 500)
     %t n   number of parse tree nodes is n (default 1000)
     %a n   number of transitions is n (default 3000)

     Using one or more of the above automatically implies the -v
     option, unless the -n option is also given.

EXAMPLES
     D       [0-9]
     %%
     if      printf("IF statement\n");
     [a-z]+  printf("tag, value %s\n", yytext);
     0{D}+   printf("octal number %s\n", yytext);
     {D}+    printf("decimal number %s\n", yytext);
     "++"    printf("unary op\n");
     "+"     printf("binary op\n");
     "/*"    {
             loop:
                     while (input() != '*')
                             ;
                     switch (input()) {
                     case '/':
                             break;          /* end of comment */
                     case '*':
                             unput('*');     /* re-examine this '*' */
                             /* fall through */
                     default:
                             goto loop;      /* still inside the comment */
                     }
             }

FILES
     /usr/bin/lex

SEE ALSO
     awk(1), grep(1), sed(1), yacc(1), malloc(3X).
     "lex Reference" in A/UX Programming Languages and Tools,
     Volume 2.

BUGS
     When given an illegal option, lex reports the error but then
     continues to run with the default options instead of stopping
     and printing a usage message.

April, 1990

Typewritten Software • bear@typewritten.org • Edmonds, WA 98026