⇒ lex(1) — Dell System V Release 4 Issue 2.2


lex(1)   UNIX System V(Extended Software Generation System Utilities)    lex(1)


NAME
      lex - generate programs for simple lexical tasks

SYNOPSIS
      lex [-ctvn -V -Q[y|n]] [file]

DESCRIPTION
      The lex command generates programs to be used in simple lexical analysis
      of text.

      The input files (standard input default) contain strings and expressions
      to be searched for and C text to be executed when these strings are
      found.

      lex generates a file named lex.yy.c.  When lex.yy.c is compiled and
      linked with the lex library, it copies the input to the output except
      when a string specified in the file is found.  When a specified string is
      found, then the corresponding program text is executed.  The actual
      string matched is left in yytext, an external character array.  Matching
      is done in order of the patterns in the file.  The patterns may contain
      square brackets to indicate character classes, as in [abx-z] to indicate
      a, b, x, y, and z; and the operators *, +, and ? mean, respectively, any
      non-negative number of, any positive number of, and either zero or one
      occurrence of, the previous character or character class.  Thus,
      [a-zA-Z]+ matches a string of letters.  The character . is the class of
      all ASCII characters except new-line.  Parentheses for grouping and
      vertical bar for alternation are also supported.  The notation r{d,e} in
      a rule indicates between d and e instances of regular expression r.  It
      has higher precedence than |, but lower than *, ?, +, and concatenation.
      The character ^ at the beginning of an expression permits a successful
      match only immediately after a new-line, and the character $ at the end
      of an expression requires a trailing new-line.  The character / in an
      expression indicates trailing context; only the part of the expression up
      to the slash is returned in yytext, but the remainder of the expression
      must follow in the input stream.  An operator character may be used as an
      ordinary symbol if it is within " symbols or preceded by \.
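
      The bracket and repetition operators described above are shared with
      POSIX extended regular expressions, so their effect can be tried from
      plain C without running lex.  The following is only a sketch under that
      assumption and is not how lex itself matches: it relies on the standard
      regcomp() and regexec() calls, and the helper name match_len is
      hypothetical.  Trailing context (/) and quoted strings are specific to
      lex and are not covered, and because the precedence of r{d,e} given
      above differs from that of extended regular expressions, the interval
      is best tried on a single character class only.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: return the length of the match of pattern at the
 * very start of text, or -1 if the pattern does not match there (or does
 * not compile).  POSIX matching is leftmost-longest, so a match that
 * starts at offset 0 is reported whenever one exists. */
int match_len(const char *pattern, const char *text)
{
    regex_t re;
    regmatch_t m;
    int len = -1;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;
    if (regexec(&re, text, 1, &m, 0) == 0 && m.rm_so == 0)
        len = (int)(m.rm_eo - m.rm_so);
    regfree(&re);
    return len;
}
```

      For instance, match_len("[a-zA-Z]+", "hello42") reports the five
      letters at the front of the text, and match_len("[0-9]{2,3}", "12345")
      stops after three digits.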

      Three macros are expected:  input() to read a character; unput(c) to
      replace a character read; and output(c) to place an output character.
      By default input() reads from stdin and output(c) writes to stdout, but
      you can override all three.  The function generated is named yylex(),
      and the lex library contains a main() that calls it.
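
      As a rough illustration of what replacements for these macros have to
      do, the sketch below writes them as ordinary functions over explicit
      stdio streams.  All four names are hypothetical.  Returning 0 at end of
      file is an assumption about the traditional behavior of input(), and
      ungetc() guarantees only one character of pushback, which a real
      scanner may exceed.

```c
#include <stdio.h>

/* Hypothetical stand-ins for input(), unput(c) and output(c). */

/* Read one character; return 0 at end of file (assumed convention). */
int sketch_input(FILE *in)
{
    int c = getc(in);
    return c == EOF ? 0 : c;
}

/* Push one character back so the next sketch_input() returns it. */
void sketch_unput(int c, FILE *in)
{
    ungetc(c, in);
}

/* Write one character. */
void sketch_output(int c, FILE *out)
{
    putc(c, out);
}

/* What a scanner with no rules does: copy input to output unchanged. */
void sketch_copy(FILE *in, FILE *out)
{
    int c;

    while ((c = sketch_input(in)) != 0)
        sketch_output(c, out);
}
```

      Calling sketch_copy(stdin, stdout) from a main() mimics the default
      copy-through behavior described in the DESCRIPTION.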

      The function yymore accumulates additional characters into the same
      yytext.  The function yyless(n) pushes back yyleng - n characters into the
      input stream.  (yyleng is an external int variable giving the length in
      bytes of yytext.)  The function yywrap is called whenever the scanner
      reaches end of file and indicates whether normal wrapup should continue.
      The action REJECT on the right side of the rule causes the match to be
      rejected and the next suitable match executed.  The action ECHO on the
      right side of the rule is equivalent to printf("%s", yytext).
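
      The arithmetic behind yyless(n) can be modeled in a few lines of C.
      This is a toy model for illustration, not the code lex generates: the
      structure toy_scan and the functions toy_match and toy_yyless are
      hypothetical, and the caller is assumed to pass lengths that fit both
      the remaining input and the 64-byte text buffer.

```c
#include <string.h>

/* Toy scanner state; text and leng play the roles of yytext and yyleng. */
struct toy_scan {
    const char *src;   /* the whole input */
    size_t pos;        /* index of the next unread character */
    char text[64];     /* most recently matched string */
    int leng;          /* its length in bytes */
};

/* Pretend a rule has just matched the next n characters of the input. */
void toy_match(struct toy_scan *s, int n)
{
    memcpy(s->text, s->src + s->pos, (size_t)n);
    s->text[n] = '\0';
    s->leng = n;
    s->pos += (size_t)n;
}

/* Like yyless(n): keep the first n characters of the match and push the
 * remaining leng - n characters back so that they are scanned again. */
void toy_yyless(struct toy_scan *s, int n)
{
    s->pos -= (size_t)(s->leng - n);
    s->text[n] = '\0';
    s->leng = n;
}
```

      With the input "abcdef", matching four characters and then calling
      toy_yyless with 1 leaves "a" as the matched text and makes "bcd" the
      start of the next match.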


10/89                                                                    Page 1


      Any line beginning with a blank is assumed to contain only C text and is
      copied; if it precedes %%, it is copied into the external definition area
      of the lex.yy.c file.  All rules should follow a %%, as in yacc.  Lines
      preceding %% that begin with a non-blank character define the string on
      the left to be the remainder of the line; it can be called out later by
      surrounding it with {}.  In this section, C code (and preprocessor
      statements) can also be included between %{ and %}.  Note that curly
      brackets do not imply parentheses; only string substitution is done.
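
      The consequence of plain string substitution is easiest to see by
      doing the substitution by hand.  The sketch below is a hypothetical
      helper, expand_def, that replaces each {name} in a pattern with the
      text of one definition, exactly as written; it illustrates the rule
      stated above and is not taken from lex itself.

```c
#include <string.h>

/* Hypothetical helper: copy pattern to out, replacing every {name} with
 * body.  The replacement is purely textual; no parentheses are added.
 * Returns 0 on success, or -1 if out (of size outsz) is too small. */
int expand_def(const char *pattern, const char *name, const char *body,
               char *out, size_t outsz)
{
    size_t nlen = strlen(name), blen = strlen(body), used = 0;
    const char *p = pattern;

    if (outsz == 0)
        return -1;
    while (*p != '\0') {
        if (*p == '{' && strncmp(p + 1, name, nlen) == 0
            && p[1 + nlen] == '}') {
            if (used + blen >= outsz)
                return -1;
            memcpy(out + used, body, blen);
            used += blen;
            p += nlen + 2;          /* skip past {name} */
        } else {
            if (used + 1 >= outsz)
                return -1;
            out[used++] = *p++;
        }
    }
    out[used] = '\0';
    return 0;
}
```

      Expanding {AB}* with AB defined as a|b gives a|b*, in which the star
      applies to b alone.  A definition meant to be used as a unit should
      therefore carry its own parentheses, as in (a|b).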

      The external names generated by lex all begin with the prefix yy or YY.

      The flags must appear before any files.

      -c       Indicates C actions and is the default.

      -t       Causes the lex.yy.c program to be written instead to standard
               output.

      -v       Provides a two-line summary of statistics.

      -n       Suppresses the -v summary.

      -V       Print out version information on standard error.

      -Q[y|n]  Print out version information to output file lex.yy.c by using
               -Qy.  The -Qn option does not print out version information and
               is the default.

      Multiple files are treated as a single file.  If no files are specified,
      standard input is used.

      Certain default table sizes are too small for some users.  The table
      sizes for the resulting finite state machine can be set in the
      definitions section:

            %p n  number of positions is n (default 2500)

            %n n  number of states is n (default 500)

            %e n  number of parse tree nodes is n (default 1000)

            %a n  number of transitions is n (default 2000)

            %k n  number of packed character classes is n (default 2500)

            %o n  size of output array is n (default 3000)

      The use of one or more of the above automatically implies the -v option,
      unless the -n option is used.

EXAMPLE
              D       [0-9]
              %{
              /* Skip the rest of a C comment whose opening was matched. */
              void
              skipcommnts(void)
              {
                      int c;

                      for(;;)
                      {
                              /* 0 or EOF from input() means end of file */
                              while((c=input())!='*' && c>0)
                                      ;
                              if(c<=0)
                                      return;
                              if((c=input())=='/' || c<=0)
                                      return;
                              /* not the end yet: rescan this character */
                              unput(c);
                      }
              }
              %}
              %%
              if      printf("IF statement\n");
              [a-z]+  printf("tag, value %s\n",yytext);
              0{D}+   printf("octal number %s\n",yytext);
              {D}+    printf("decimal number %s\n",yytext);
              "++"    printf("unary op\n");
              "+"     printf("binary op\n");
              "\n"    ;/*no action */
              "/*"    skipcommnts();
              %%

SEE ALSO
      yacc(1)
      The ``lex'' chapter in the Programmer's Guide: ANSI C and Programming
      Support Tools