⇒ lex(1) — NEWS-os 5.0.1

lex(1)                   USER COMMANDS                     lex(1)



NAME
     lex - generate programs for simple lexical tasks

SYNOPSIS
     lex [-ctvn] [-V] [-Q[y|n]] [file ...]

DESCRIPTION
     The lex command generates programs to be used in simple lexical
     analysis of text.

     The input files (standard input by default) contain strings and
     expressions to be searched for, and C text to be executed when
     these strings are found.

     lex generates a file named lex.yy.c. When lex.yy.c is compiled
     and linked with the lex library, it copies the input to the
     output except when a string specified in the file is found. When
     a specified string is found, then the corresponding program text
     is executed. The actual string matched is left in yytext, an
     external character array. Matching is done in order of the
     patterns in the file.

     The patterns may contain square brackets to indicate character
     classes, as in [abx-z] to indicate a, b, x, y, and z; and the
     operators *, +, and ? mean, respectively, any non-negative
     number of, any positive number of, and either zero or one
     occurrence of, the previous character or character class. Thus,
     [a-zA-Z]+ matches a string of letters. The character . is the
     class of all ASCII characters except new-line. Parentheses for
     grouping and vertical bar for alternation are also supported.
     The notation r{d,e} in a rule indicates between d and e
     instances of regular expression r. It has higher precedence than
     |, but lower than *, ?, +, and concatenation.

     The character ^ at the beginning of an expression permits a
     successful match only immediately after a new-line, and the
     character $ at the end of an expression requires a trailing
     new-line. The character / in an expression indicates trailing
     context; only the part of the expression up to the slash is
     returned in yytext, but the remainder of the expression must
     follow in the input stream. An operator character may be used as
     an ordinary symbol if it is within " symbols or preceded by \.
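
     As a rough illustration of the behavior described above, the
     following hand-written C sketch mimics what a scanner generated
     from the single rule [a-zA-Z]+ would do: a matched run of
     letters triggers an action, and everything else is copied
     through unchanged. The names is_letter, match_letters, and
     filter_words, and the "<W>" action, are hypothetical; this is
     not the code that lex emits.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Character class [a-zA-Z], written out explicitly. */
static int is_letter(int c)
{
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
}

/* Length of the run of letters at the start of s (the + operator:
   one or more). A result of 0 means the pattern does not match here. */
static size_t match_letters(const char *s)
{
    size_t n = 0;
    while (is_letter((unsigned char)s[n]))
        n++;
    return n;
}

/* Copy in to out, replacing each run of letters with "<W>".
   Returns 0 on success, -1 if out is too small.
   Hypothetical stand-in for a generated scanner with one rule. */
static int filter_words(const char *in, char *out, size_t outsz)
{
    size_t o = 0;
    assert(outsz > 0);
    while (*in != '\0') {
        size_t n = match_letters(in);
        if (n > 0) {                    /* rule matched: run its action */
            if (o + 3 >= outsz)
                return -1;
            memcpy(out + o, "<W>", 3);
            o += 3;
            in += n;
        } else {                        /* no rule matched: default copy */
            if (o + 1 >= outsz)
                return -1;
            out[o++] = *in++;
        }
    }
    out[o] = '\0';
    return 0;
}
```

     Calling filter_words on the text "ab 12 c" followed by a
     new-line replaces the two runs of letters and leaves the digits,
     blanks, and new-line untouched.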

     Three macros are expected: input() to read a character; unput(c)
     to replace a character read; and output(c) to place an output
     character. They are defined in terms of the standard streams,
     but you can override them. The generated function is named
     yylex(), and the lex library contains a main() that calls it.
     The action REJECT on the right side of the rule causes this
     match to be rejected and the next suitable match executed; the
     function yymore() accumulates additional characters into the
     same yytext; and the function yyless(n) pushes back yyleng-n
     characters into the input stream. (yyleng is an external int
     variable giving the length of yytext.) The macros input and
     output use files yyin and yyout to read from and write to,
     defaulted to stdin and stdout, respectively.
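
     The effect of yyless(n) can be pictured with a small model. The
     sketch below is only an illustration: struct scan_model and
     model_yyless are hypothetical names, and the fixed-size buffers
     are an assumption made for brevity. It does not reproduce the
     internals of a lex-generated scanner.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical model of scanner state; not lex's real internals. */
struct scan_model {
    char text[64];      /* stands in for yytext */
    int  leng;          /* stands in for yyleng */
    char pending[64];   /* characters waiting to be read again */
};

/* Model of yyless(n): keep the first n characters of the match and
   return the remaining leng - n characters to the front of the input. */
static void model_yyless(struct scan_model *s, int n)
{
    size_t back, old;

    assert(n >= 0 && n <= s->leng);
    back = (size_t)(s->leng - n);
    old = strlen(s->pending);
    assert(back + old < sizeof s->pending);

    memmove(s->pending + back, s->pending, old + 1);  /* make room */
    memcpy(s->pending, s->text + n, back);            /* tail goes first */
    s->text[n] = '\0';
    s->leng = n;
}
```

     With text "foobar", leng 6, and "!" already pending, a call to
     model_yyless(&s, 3) leaves "foo" as the match and "bar!" waiting
     to be read.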

     Any line beginning with a blank is assumed to contain only C
     text and is copied; if it precedes %%, it is copied into the
     external definition area of the lex.yy.c file. All rules should
     follow a %%, as in yacc. Lines preceding %% that begin with a
     non-blank character define the string on the left to be the
     remainder of the line; it can be called out later by surrounding
     it with {}. In the definitions section (before the first %%), C
     code (and preprocessor statements) can also be included between
     %{ and %}. Note that curly brackets do not imply parentheses;
     only string substitution is done.
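
     The consequence of plain string substitution is easiest to see
     with a definition that contains alternation. The C sketch below
     performs the same kind of textual replacement on a pattern
     string; expand_definition is a hypothetical helper that handles
     a single definition, and it is not how lex itself is
     implemented.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Replace every "{name}" in pattern with text, verbatim, and store
   the result in out. No parentheses are added around the inserted
   text. Returns 0 on success, -1 if out is too small.
   Hypothetical helper for illustration only. */
static int expand_definition(const char *pattern, const char *name,
                             const char *text, char *out, size_t outsz)
{
    size_t nlen = strlen(name);
    size_t tlen = strlen(text);
    size_t o = 0;

    assert(outsz > 0);
    while (*pattern != '\0') {
        if (pattern[0] == '{'
            && strncmp(pattern + 1, name, nlen) == 0
            && pattern[1 + nlen] == '}') {
            if (o + tlen >= outsz)
                return -1;
            memcpy(out + o, text, tlen);    /* substitute the definition */
            o += tlen;
            pattern += nlen + 2;            /* skip '{', name, '}' */
        } else {
            if (o + 1 >= outsz)
                return -1;
            out[o++] = *pattern++;          /* ordinary character */
        }
    }
    out[o] = '\0';
    return 0;
}
```

     With D defined as a|b, the rule pattern {D}c expands to a|bc,
     which matches either a or bc. Writing ({D})c instead expands to
     (a|b)c, which is usually what was intended.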

EXAMPLE
             D       [0-9]
             %{
             void
             skipcommnts(void)
             {
                     int c;

                     for(;;)
                     {
                             /* input() returns 0 at end of file */
                             while((c=input())!='*')
                                     if(c==0)
                                             return;
                             if((c=input())=='/')
                                     return;
                             else
                                     /* push back the character just read */
                                     unput(c);
                     }
             }
             %}
             %%
             if      printf("IF statement\n");
             [a-z]+  printf("tag, value %s\n",yytext);
             0{D}+   printf("octal number %s\n",yytext);
             {D}+    printf("decimal number %s\n",yytext);
             "++"    printf("unary op\n");
             "+"     printf("binary op\n");
             "\n"    ;/*no action */
             "/*"      skipcommnts();
             %%
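
     The comment-skipping logic in the example can be tried outside
     lex with stand-ins for input() and unput(). In the sketch below,
     model_input, model_unput, skip_comment, and rest_after_comment
     are hypothetical names, the input is a C string rather than
     yyin, and only one character of pushback is modeled. It is
     meant as an illustration of the loop, not as the helper from the
     specification itself.

```c
#include <assert.h>
#include <stddef.h>

static const char *src;     /* remaining input; stand-in for yyin */
static int pushed = -1;     /* one pushed-back character, or -1 if none */

/* Stand-in for input(): returns 0 at end of input. */
static int model_input(void)
{
    int c;
    if (pushed >= 0) {
        c = pushed;
        pushed = -1;
        return c;
    }
    if (*src == '\0')
        return 0;
    return (unsigned char)*src++;
}

/* Stand-in for unput(c): holds a single character for re-reading. */
static void model_unput(int c)
{
    assert(pushed < 0);
    pushed = c;
}

/* Skip a comment body whose opening delimiter was already consumed.
   Returns 1 if the closing delimiter was found, 0 if input ended. */
static int skip_comment(void)
{
    int c;
    for (;;) {
        while ((c = model_input()) != '*')
            if (c == 0)
                return 0;
        if ((c = model_input()) == '/')
            return 1;
        model_unput(c);     /* re-examine the character just read */
    }
}

/* Returns a pointer just past the end of the comment, or NULL if the
   comment is unterminated. */
static const char *rest_after_comment(const char *body)
{
    src = body;
    pushed = -1;
    return skip_comment() ? src : NULL;
}
```

     Pushing back the character that was actually read matters for
     input such as a star followed by another star and then a slash:
     the second star must be examined again as a possible start of
     the closing delimiter.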

     The external names generated by lex all begin with the  pre-
     fix yy or YY.

     The flags must appear before any files.

     -c       Indicates C actions and is the default.

     -t       Writes the generated program to standard output
              instead of to lex.yy.c.

     -v       Prints a two-line summary of statistics.

     -n       Suppresses the -v summary.

     -V       Prints version information on standard error.

     -Q[y|n]  With -Qy, writes version information to the output
              file lex.yy.c. With -Qn, which is the default, no
              version information is written.

     Multiple files are treated as a single file.   If  no  files
     are specified, standard input is used.

     Certain default table sizes are too small  for  some  users.
     The  table  sizes for the resulting finite state machine can
     be set in the definitions section:

          %p n    number of positions is n (default 2500)
          %n n    number of states is n (default 500)
          %e n    number of parse tree nodes is n (default 1000)
          %a n    number of transitions is n (default 2000)
          %k n    number of packed character classes is n (default 2500)
          %o n    size of output array is n (default 3000)

     The use of one or more of the  above  automatically  implies
     the -v option, unless the -n option is used.

SEE ALSO
     yacc(1)
     See the ``lex'' chapter in the Programmer's  Guide:  ANSI  C
     and Programming Support Tools.