⇒ lex(1) — svr4 — mips UMIPS RISC/os 5.01


LEX(1-SVR4)         RISC/os Reference Manual          LEX(1-SVR4)



NAME
     lex - generate programs for simple lexical tasks

SYNOPSIS
     lex [-ctvn -V -Q[y|n]] [file]

DESCRIPTION
     The lex command generates programs to be used in simple
     lexical analysis of text.

     The input files (standard input default) contain strings and
     expressions to be searched for and C text to be executed
     when these strings are found.

     lex generates a file named lex.yy.c.  When lex.yy.c is
     compiled and linked with the lex library, it copies the input
     to the output except when a string specified in the file is
     found.  When a specified string is found, the corresponding
     program text is executed.  The actual string matched is left
     in yytext, an external character array.  Matching is done in
     order of the patterns in the file.

     The patterns may contain square brackets to indicate
     character classes, as in [abx-z] to indicate a, b, x, y, and
     z.  The operators *, +, and ? mean, respectively, any
     non-negative number of, any positive number of, and either
     zero or one occurrence of the previous character or character
     class.  Thus, [a-zA-Z]+ matches a string of letters.  The
     character . is the class of all ASCII characters except
     new-line.  Parentheses for grouping and vertical bar for
     alternation are also supported.

     The notation r{d,e} in a rule indicates between d and e
     instances of regular expression r.  It has higher precedence
     than |, but lower than *, ?, +, and concatenation.

     The character ^ at the beginning of an expression permits a
     successful match only immediately after a new-line, and the
     character $ at the end of an expression requires a trailing
     new-line.  The character / in an expression indicates
     trailing context; only the part of the expression up to the
     slash is returned in yytext, but the remainder of the
     expression must follow in the input stream.

     An operator character may be used as an ordinary symbol if it
     is within " symbols or preceded by \.

     Three macros are expected:  input() to read a character;
     unput(c) to replace a character read; and output(c) to place
     an output character.  They are defined in terms of the
     standard streams, but you can override them.  The generated
     function is named yylex(), and the lex library contains a
     main() that calls it.  The action REJECT on the right side
     of a rule causes this match to be rejected and the next
     suitable match executed; the function yymore() accumulates
     additional characters into the same yytext; and the function
     yyless(n) pushes back yyleng-n characters into the input
     stream.  (yyleng is an external int variable giving the
     length of yytext.)  The macros input and output read from
     and write to the files yyin and yyout, which default to
     stdin and stdout, respectively.

     Any line beginning with a blank is assumed to contain only C
     text and is copied; if it precedes %%, it is copied into the
     external definition area of the lex.yy.c file.  All rules
     should follow a %%, as in yacc.  Lines preceding %% that
     begin with a non-blank character define the string on the
     left to be the remainder of the line; it can be called out
     later by surrounding it with {}.  In this section, C code
     (and preprocessor statements) can also be included between
     %{ and %}.  Note that curly brackets do not imply
     parentheses; only string substitution is done.

INTERNATIONAL FUNCTIONALITY
     lex can process characters from supplementary code sets as
     well as ASCII characters.

     Characters from supplementary code sets can appear in
     comments in the definitions, rules, and user subroutines.

     Characters from supplementary code sets can appear in
     strings in rule actions and in user subroutines.

     Character strings from supplementary code sets can be
     defined as tokens.

WARNING
     The input(), unput(c), and output(c) macros operate on bytes,
     not characters.  The value of yyleng is likewise in bytes,
     not in characters.

EXAMPLE
             D       [0-9]
             %{
             void
             skipcommnts(void)
             {
                     int c;

                     for(;;)
                     {
                             while(input()!='*')
                                     ;
                             /* push back the character just read,
                                so that "**/" still ends the comment */
                             if((c=input())=='/')
                                     return;
                             else
                                     unput(c);
                     }
             }
             %}
             %%
             if      printf("IF statement\n");
             [a-z]+  printf("tag, value %s\n",yytext);
             0{D}+   printf("octal number %s\n",yytext);
             {D}+    printf("decimal number %s\n",yytext);
             "++"    printf("unary op\n");
             "+"     printf("binary op\n");
             "\n"    ;/*no action */
             "/*"      skipcommnts();
             %%

     The external names generated by lex all begin with the prefix
     yy or YY.

     The flags must appear before any files.

     -c       Indicates C actions and is the default.

     -t       Writes the generated program to standard output
              instead of to lex.yy.c.

     -v       Provides a two-line summary of statistics.

     -n       Suppresses the -v summary.

     -V       Prints version information on standard error.

     -Q[y|n]  With -Qy, writes version information to the output
              file lex.yy.c.  With -Qn (the default), no version
              information is written.

     Multiple files are treated as a single file.  If no files
     are specified, standard input is used.

     Certain default table sizes are too small for some users.
     The table sizes for the resulting finite state machine can
     be set in the definitions section:

          %p n    number of positions is n (default 2500)
          %n n    number of states is n (default 500)
          %e n    number of parse tree nodes is n (default 1000)
          %a n    number of transitions is n (default 2000)
          %k n    number of packed character classes is n (default 2500)
          %o n    size of output array is n (default 3000)

     The use of one or more of the above automatically implies
     the -v option, unless the -n option is used.

SEE ALSO
     yacc(1).
     The ``lex'' chapter in the Programmer's Guide: ANSI C and
     Programming Support Tools.

 Page 4                 Printed 11/19/92


