Museum

Home

Lab Overview

Retrotechnology Articles

Online Manuals

⇒ lex(1) — sysv — mips UMIPS RISC/os 4.52

Media Vault

Software Library

Restoration Projects

Artifacts Sought

Related Articles

yacc(1)



LEX(1-SysV)         RISC/os Reference Manual          LEX(1-SysV)



NAME
     lex - generate programs for simple lexical tasks

SYNOPSIS
     lex [ -rctvn ] [ file ] ...

DESCRIPTION
     The lex command generates programs to be used in simple lex-
     ical analysis of text.

     The input files (standard input default) contain strings and
     expressions to be searched for, and C text to be executed
     when strings are found.

     A file lex.yy.c is generated which, when loaded with the
     library, copies the input to the output except when a string
     specified in the file is found; then the corresponding pro-
     gram text is executed.  The actual string matched is left in
     yytext, an external character array.  Matching is done in
     order of the strings in the file.  The strings may contain
     square brackets to indicate character classes, as in [abx-z]
     to indicate a, b, x, y, and z; and the operators *, +, and ?
     mean respectively any non-negative number of, any positive
     number of, and either zero or one occurrence of, the previ-
     ous character or character class.  The character . is the
     class of all ASCII characters except new-line.  Parentheses
     for grouping and vertical bar for alternation are also sup-
     ported.  The notation r{d,e} in a rule indicates between d
     and e instances of regular expression r.  It has higher pre-
     cedence than |, but lower than *, ?, +, and concatenation.
     Thus [a-zA-Z]+ matches a string of letters.  The character ^
     at the beginning of an expression permits a successful match
     only immediately after a new-line, and the character $ at
     the end of an expression requires a trailing new-line.  The
     character / in an expression indicates trailing context;
     only the part of the expression up to the slash is returned
     in yytext, but the remainder of the expression must follow
     in the input stream.  An operator character may be used as
     an ordinary symbol if it is within " symbols or preceded by
     \.

     Three subroutines defined as macros are expected:  input()
     to read a character; unput(c) to replace a character read;
     and output(c) to place an output character.  They are
     defined in terms of the standard streams, but you can over-
     ride them.  The program generated is named yylex(), and the
     library contains a main() which calls it.  The action REJECT
     on the right side of the rule causes this match to be
     rejected and the next suitable match executed; the function
     yymore() accumulates additional characters into the same
     yytext; and the function yyless(p) pushes back the portion
     of the string matched beginning at p, which should be



                         Printed 1/15/91                   Page 1





LEX(1-SysV)         RISC/os Reference Manual          LEX(1-SysV)



     between yytext and yytext+yyleng.  The macros input and out-
     put use files yyin and yyout to read from and write to,
     defaulted to stdin and stdout, respectively.

     Any line beginning with a blank is assumed to contain only C
     text and is copied; if it precedes %% it is copied into the
     external definition area of the lex.yy.c file.  All rules
     should follow a %%, as in YACC. Lines preceding %% which
     begin with a non-blank character define the string on the
     left to be the remainder of the line; it can be called out
     later by surrounding it with {}.  Note that curly brackets
     do not imply parentheses; only string substitution is done.

EXAMPLE
             D       [0-9]
             %%
             if      printf("IF statement\n");
             [a-z]+  printf("tag, value %s\n",yytext);
             0{D}+   printf("octal number %s\n",yytext);
             {D}+    printf("decimal number %s\n",yytext);
             "++"    printf("unary op\n");
             "+"     printf("binary op\n");
             "/*"      skipcommnts();
             %%
              skipcommnts()
              {
                     for (;;)
                     {
                             while (input() != '*')
                                     ;
                             if (input() != '/')
                                     unput(yytext[yyleng-1]);
                           } else {
                                     return;
                     }
              }

     The external names generated by lex all begin with the pre-
     fix yy or YY.

     The flags must appear before any files.  The flag -r indi-
     cates RATFOR actions, -c indicates C actions and is the
     default, -t causes the lex.yy.c program to be written
     instead to standard output, -v provides a one-line summary
     of statistics, -n will not print out the -v summary.  Multi-
     ple files are treated as a single file.  If no files are
     specified, standard input is used.

     Certain table sizes for the resulting finite state machine
     can be set in the definitions section:

          %p n number of positions is n (default 2500)



 Page 2                  Printed 1/15/91





LEX(1-SysV)         RISC/os Reference Manual          LEX(1-SysV)



          %n n number of states is n (500)

          %e n number of parse tree nodes is n (1000)

          %a n number of transitions is n (2000)

          %k n number of packed character classes is n (1000)

          %o n size of output array is n (3000)

     The use of one or more of the above automatically implies
     the -v option, unless the -n option is used.

SEE ALSO
     yacc(1).
     Programmer's Guide.

ERRORS
     The -r option is not yet fully operational.




































                         Printed 1/15/91                   Page 3



Typewritten Software • bear@typewritten.org • Edmonds, WA 98026