lex(1) - Atari System V 1.1-06

   lex(1)        (Extended Software Generation System Utilities)        lex(1)


   NAME
         lex - generate programs for simple lexical tasks

   SYNOPSIS
         lex [-ctvn -V -Q[y|n]] [file ...]

   DESCRIPTION
         The lex command generates programs to be used in simple lexical
         analysis of text.

         The input files (standard input default) contain strings and
         expressions to be searched for and C text to be executed when these
         strings are found.

         lex generates a file named lex.yy.c.  When lex.yy.c is compiled and
         linked with the lex library, it copies the input to the output except
         when a string specified in the file is found.  When a specified
         string is found, then the corresponding program text is executed.
         The actual string matched is left in yytext, an external character
         array.  Matching is done in order of the patterns in the file.  The
         patterns may contain square brackets to indicate character classes,
         as in [abx-z] to indicate a, b, x, y, and z; and the operators *, +,
         and ? mean, respectively, any non-negative number of, any positive
         number of, and either zero or one occurrence of, the previous
         character or character class.  Thus, [a-zA-Z]+ matches a string of
         letters.  The character . is the class of all ASCII characters
         except new-line.  Parentheses for grouping and vertical bar for
         alternation are also supported.  The notation r{d,e} in a rule
         indicates between d and e instances of regular expression r.  It has
         higher precedence than |, but lower than *, ?, +, and concatenation.
         The character ^ at the beginning of an expression permits a
         successful match only immediately after a new-line, and the character
         $ at the end of an expression requires a trailing new-line.  The
         character / in an expression indicates trailing context; only the
         part of the expression up to the slash is returned in yytext, but the
         remainder of the expression must follow in the input stream.  An
         operator character may be used as an ordinary symbol if it is within
         " symbols or preceded by \.
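
         As an illustration of what a pattern such as [a-zA-Z]+ asks the
         generated scanner to do, the following is a minimal hand-written
         sketch in C.  It is not the code lex produces (lex builds a
         table-driven finite state machine); the names is_ascii_letter and
         match_letters are hypothetical and chosen only for this example.
         The function consumes as many letters as are available, which is
         how lex applies such a pattern.

```c
#include <stddef.h>

/* Hypothetical helper: true for the characters in the class [a-zA-Z]. */
static int is_ascii_letter(int c)
{
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
}

/* Return the length of the longest prefix of s that matches [a-zA-Z]+,
 * or 0 if s does not begin with a letter (the + operator requires at
 * least one occurrence, so an empty prefix is not a match). */
size_t match_letters(const char *s)
{
    size_t n = 0;

    while (is_ascii_letter((unsigned char)s[n]))
        n++;
    return n;
}
```

         For the input abc123 the function reports a match of length 3,
         and for 9x it reports no match.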

         Three macros are expected:  input() to read a character; unput(c) to
         replace a character read; and output(c) to place an output character.
         They are defined in terms of the standard streams, but you can
         override them.  The generated function is named yylex(), and the lex
         library contains a main() that calls it.  The action REJECT on the
         right side of the rule causes this match to be rejected and the next
         suitable match executed; the function yymore() accumulates additional
         characters into the same yytext; and the function yyless(n) pushes
         back yyleng - n characters into the input stream.  (yyleng is an
         external int variable giving the length of yytext.)  The macros input
         and output use files yyin and yyout to read from and write to,
         defaulted to stdin and stdout, respectively.
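
         The effect of yyless(n) and unput(c) can be pictured with a toy
         model in C.  This is only a sketch under stated assumptions: the
         struct toy_scanner and the toy_ functions are hypothetical
         stand-ins, and the real yytext, yyleng, and pushback storage live
         inside lex.yy.c.  The model keeps the first n characters of the
         match and pushes the remaining yyleng - n characters back so that
         they are read again in their original order.

```c
#include <string.h>

#define TOY_MAX 256

/* Hypothetical stand-in for the scanner state kept in lex.yy.c. */
struct toy_scanner {
    char text[TOY_MAX];     /* plays the role of yytext */
    int  leng;              /* plays the role of yyleng */
    char pushed[TOY_MAX];   /* pushback stack: last pushed is read first */
    int  npushed;
};

/* Model of unput(c): the pushed character is the next one read. */
void toy_unput(struct toy_scanner *s, char c)
{
    if (s->npushed < TOY_MAX)
        s->pushed[s->npushed++] = c;
}

/* Model of yyless(n): keep the first n characters of the match and push
 * the other leng - n back, last character first, so that they come out
 * of the stack in their original order. */
void toy_yyless(struct toy_scanner *s, int n)
{
    if (n < 0 || n > s->leng)
        return;
    while (s->leng > n)
        toy_unput(s, s->text[--s->leng]);
    s->text[s->leng] = '\0';
}

/* Model of input(): returns pushed-back characters, or 0 when none
 * remain (this toy has no underlying file to fall back on). */
int toy_input(struct toy_scanner *s)
{
    return s->npushed > 0 ? (unsigned char)s->pushed[--s->npushed] : 0;
}
```

         With text holding foobar and leng equal to 6, toy_yyless(&s, 3)
         leaves foo in text and makes b, a, and r the next three
         characters returned by toy_input().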


         Any line beginning with a blank is assumed to contain only C text and
         is copied; if it precedes %%, it is copied into the external
         definition area of the lex.yy.c file.  All rules should follow a %%,
         as in yacc.  Lines preceding %% that begin with a non-blank character
         define the string on the left to be the remainder of the line; it can
         be called out later by surrounding it with {}.  In this section, C
         code (and preprocessor statements) can also be included between %{
         and %}.  Note that curly brackets do not imply parentheses; only
         string substitution is done.
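
         The warning about curly brackets has a close analogy in the C
         preprocessor, sketched below.  By the rule just stated, a
         hypothetical definition AB a|b used in a rule as {AB}c would be
         read as a|bc, not as (a|b)c.  The macro names SUM and SUM_P and
         both functions are assumptions made for this illustration only.

```c
/* Textual substitution without implied grouping, as with lex definitions.
 * SUM is replaced by the characters 1 + 2, not by (1 + 2). */
#define SUM     1 + 2
#define SUM_P   (1 + 2)

/* Expands to 1 + 2 * 3, so the multiplication binds first. */
int without_parens(void)
{
    return SUM * 3;
}

/* Expands to (1 + 2) * 3, because the grouping is written out. */
int with_parens(void)
{
    return SUM_P * 3;
}
```

         The same remedy applies in lex: write the parentheses in the
         definition itself when grouping is intended.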

   EXAMPLE
                 D       [0-9]
                 %{
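                 /* Skip the rest of a C comment; the opening delimiter
                    has already been matched. */
                 void
                 skipcommnts(void)
                 {
                         int c;

                         for(;;)
                         {
                                 /* input() yields 0 at end of input */
                                 while((c=input())!='*')
                                         if(c==0)
                                                 return;
                                 if((c=input())=='/'||c==0)
                                         return;
                                 /* push back the character just read */
                                 unput(c);
                         }
                 }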
                 %}
                 %%
                 if      printf("IF statement\n");
                 [a-z]+  printf("tag, value %s\n",yytext);
                 0{D}+   printf("octal number %s\n",yytext);
                 {D}+    printf("decimal number %s\n",yytext);
                 "++"    printf("unary op\n");
                 "+"     printf("binary op\n");
                 "\n"    ;/*no action */
                 "/*"      skipcommnts();
                 %%
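
         The comment-skipping technique can be tried outside lex with a
         plain C model that walks a string instead of calling input() and
         unput().  This is a sketch, not part of the lex library; the name
         skip_comment and its index-based interface are hypothetical.
         Leaving a character unread here corresponds to pushing back the
         character just read, so that a star followed by another star and
         a slash still closes the comment.

```c
#include <stddef.h>

/* Given text s positioned at index i just after an opening slash-star,
 * return the index just past the closing star-slash, or the index of
 * the terminating NUL if the comment is never closed. */
size_t skip_comment(const char *s, size_t i)
{
    for (;;) {
        /* advance to the next '*', stopping at end of input */
        while (s[i] != '*') {
            if (s[i] == '\0')
                return i;
            i++;
        }
        i++;                    /* consume the '*' */
        if (s[i] == '/')
            return i + 1;       /* consume the '/' and stop */
        /* otherwise s[i] stays unread and is examined again on the
         * next pass, which is the equivalent of pushing it back */
    }
}
```

         A star that is followed by some other character, as in the
         text *a/, does not end the comment in this model.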

         The external names generated by lex all begin with the prefix yy or
         YY.

         The flags must appear before any files.

         -c       Indicates C actions and is the default.

         -t       Writes the generated program to standard output instead of
                  to lex.yy.c.

         -v       Provides a two-line summary of statistics.

         -n       Suppresses the -v summary.

         -V       Prints version information on standard error.

         -Q[y|n]  With -Qy, writes version information to the output file
                  lex.yy.c.  With -Qn, which is the default, no version
                  information is written.

         Multiple files are treated as a single file.  If no files are
         specified, standard input is used.

         Certain default table sizes are too small for some users.  The table
         sizes for the resulting finite state machine can be set in the
         definitions section:

               %p n  number of positions is n (default 2500)
               %n n  number of states is n (default 500)
               %e n  number of parse tree nodes is n (default 1000)
               %a n  number of transitions is n (default 2000)
               %k n  number of packed character classes is n (default 2500)
               %o n  size of output array is n (default 3000)

         The use of one or more of the above automatically implies the -v
         option, unless the -n option is used.

   SEE ALSO
         yacc(1)
         See the ``lex'' chapter in the Programmer's Guide: ANSI C and
         Programming Support Tools.

   8/91