⇒ lex(1) — bsd — Apollo Domain/OS SR10.4.1


LEX(1)                               BSD                                LEX(1)



NAME
     lex - generate programs for lexical analysis of text

SYNOPSIS
     lex [-ctvn] [-Xsecondaryn...]  [file] ...

DESCRIPTION
     lex generates programs to be used in simple lexical analysis of text.

     The input files contain strings and expressions to be searched for, and C
     text to be executed when strings are found.  Multiple files are treated
     as a single file.  If no files are specified, the standard input is used.

     A file lex.yy.c is generated which, when loaded with the library, copies
     the input to the output except when a string specified in the file is
     found; then the corresponding program text is executed.  The actual
     string matched is left in yytext, an external character array.  Matching
     is done in order of the strings in the file.  The strings can contain
     square brackets to indicate character classes, as in [abx-z] to indicate
     a, b, x, y, and z; and the operators *, +, and ?  mean respectively any
     non-negative number of, any positive number of, and either zero or one
     occurrences of, the previous character or character class.  The character
     . is the class of all ASCII characters except new-line.  Parentheses for
     grouping and vertical bar for alternation are also supported.  The
     notation r{d,e} in a rule indicates between d and e instances of regular
     expression r.  It has higher precedence than |, but lower than *, ?, +,
     and concatenation.  The character ^ at the beginning of an expression
     permits a successful match only immediately after a new-line, and the
     character $ at the end of an expression requires a trailing new-line.
     The character / in an expression indicates trailing context; only the
     part of the expression up to the slash is returned in yytext, but the
     remainder of the expression must follow in the input stream.  An operator
     character may be used as an ordinary symbol if it is enclosed between
     double quotes (") or preceded by \.  Thus [a-zA-Z]+ matches a string of
     letters.

     Three subroutines defined as macros are expected:  input() to read a
     character; unput(c) to replace a character read; and output(c) to place
     an output character.  They are defined in terms of the standard streams,
     but can be overridden.  The program generated is named yylex(), and the
     library contains a main() which calls it.  The action REJECT on the right
     side of the rule causes this match to be rejected and the next suitable
     match executed; the function yymore() accumulates additional characters
     into the same yytext; and the function yyless(p) pushes back the portion
     of the string matched beginning at p, which should be between yytext and
     yytext+yyleng.  The macros input and output use files yyin and yyout to
     read from and write to, defaulted to the standard input and the standard
     output, respectively.

     Any line beginning with a blank is assumed to contain only C text and is
     copied; if it precedes %% it is copied into the external definition area
     of the lex.yy.c file.  All rules should follow a %%, as in yacc(1).
     Lines preceding %% that begin with a non-blank character define the
     string on the left to be the remainder of the line; it can be called out
     later by surrounding it with {}.  Note that curly brackets do not imply
     parentheses; only string substitution is done.

     The flags, which must appear before any files, are as follows:

          -c        indicates C actions - this is the default;

          -t        causes the lex.yy.c program to be written instead to the
                    standard output;

          -v        provides a one-line summary of statistics for the machine
                    generated;

          -n        suppresses printing of the -v summary.

     The -Xsecondaryn option allows the sizes of certain internal lex tables
     to be reset.  secondary is one of the letters from the set {d D s S a c}
     and specifies the table; n is the new size.  Tables whose size can be
     changed by using secondary letters are:

          d         table of definitions; default = 200.

          D         table of characters in definition strings; default = 5000.

          s         table of start conditions; default = 50.

          S         table of characters in start condition names; default =
                    500.

          c         array table for storing character classes; default = 1000.

          a         right context/action array table; default = 100.

     If an array overflows, lex issues a fatal error message including a
     suggestion of which table to reset.  For example:

          Definitions too long, try -XD option

     Certain table sizes for the resulting finite state machine can be set in
     the definitions section:

          %p n      number of positions is n (default is 2500);

          %q n      number of positions for one state is n (default is 300);

          %n n      number of states is n (default is 500);

          %e n      number of parse tree nodes is n (default is 1000);

          %a n      number of transitions is n (default is 2000);

          %k n      number of packed character classes is n (default is 1000);

          %o n      size of output array is n (default is 3000).

     The use of one or more of the preceding table options automatically
     implies -v unless -n is specified.

     External names generated by lex all begin with the prefix yy or YY.

EXTERNAL INFLUENCES
   International Code Set Support
     Single-byte character code sets are supported.

EXAMPLES
             D       [0-9]
             %%
             if      printf("IF statement\n");
             [a-z]+  printf("tag, value %s\n",yytext);
             0{D}+   printf("octal number %s\n",yytext);
             {D}+    printf("decimal number %s\n",yytext);
             "++"    printf("unary op\n");
             "+"     printf("binary op\n");
             "/*"    {       loop:
                             while (input() != '*');
                             switch (input())
                                     {
                                     case '/': break;
                                     case '*': unput('*');
                                     default: goto loop;
                                     }
                             }

WARNINGS
     The token buffer in the program built by lex is of fixed length,

          yytext[YYLMAX]

     where YYLMAX is defined to be 200 characters.  Overflow of this array is
     not detected in the lex.yy.c program.

SEE ALSO
     yacc(1), malloc(3).

FILES
     /usr/lib/lex/ncform
     lex.yy.c
     /usr/lib/libl.a
