⇒ lex(1) — UnixWare 2.01

       lex(1)                                                        lex(1)


       NAME
             lex - generate programs for simple lexical tasks

       SYNOPSIS
             lex [-ctvn -V -Q[y|n]] [file ...]

       DESCRIPTION
             The lex command generates programs to be used in simple
             lexical analysis of text.  The input files (standard input
             default) contain strings and expressions to be searched for
             and C text to be executed when these strings are found.  lex
             processes supplementary code set characters in program
             comments and strings, and single-byte supplementary code set
             characters in tokens, according to the locale specified in the
             LC_CTYPE environment variable [see LANG in environ(5)].

             lex generates a file named lex.yy.c.  When lex.yy.c is
             compiled and linked with the lex library, the resulting
             program copies its input to its output except where a string
             specified in the file is found.  When a specified string is
             found, the corresponding program text is executed.  The
             actual string matched is left in yytext, an external
             character array.
             Matching is done in order of the patterns in the file.  The
             patterns may contain square brackets to indicate character
             classes, as in [abx-z] to indicate a, b, x, y, and z; and the
             operators *, +, and ?  mean, respectively, any non-negative
             number of, any positive number of, and either zero or one
             occurrence of, the previous character or character class.
             Thus, [a-zA-Z]+ matches a string of letters.  The character .
             is the class of all characters except new-line.  Parentheses
             for grouping and vertical bar for alternation are also
             supported.  The notation r{d,e} in a rule indicates between d
             and e instances of regular expression r.  It has higher
             precedence than |, but lower than *, ?, +, and concatenation.
             The character ^ at the beginning of an expression permits a
             successful match only immediately after a new-line, and the
             character $ at the end of an expression requires a trailing
             new-line.  The character / in an expression indicates trailing
             context; only the part of the expression up to the slash is
             returned in yytext, but the remainder of the expression must
             follow in the input stream.  An operator character may be used
             as an ordinary symbol if it is within " symbols or preceded by
             \.
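
             As a hedged illustration of the operators described above,
             the following specification sketch exercises character
             classes, repetition, the r{d,e} interval, the ^ and $
             anchors, trailing context, and quoting.  The file name
             words.l, the cc command line, and the messages printed are
             assumptions made for this example.

```lex
%{
/*
 * Hypothetical file words.l.  One way to build it, assuming cc:
 *      lex words.l
 *      cc lex.yy.c -ll
 */
#include <stdio.h>
%}
%%
^#.*$           printf("comment line\n");
[0-9]{1,3}      printf("one to three digits: %s\n", yytext);
colou?r         printf("color or colour\n");
[a-zA-Z]+/:     printf("word before a colon: %s\n", yytext);
[a-zA-Z]+       printf("word: %s\n", yytext);
"a+b"           printf("the three characters a+b\n");
[ \t\n]         ;       /* discard white space */
```

             Input that no rule matches, such as the colon left behind by
             the trailing-context rule, is copied to the output unchanged.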





                           Copyright 1994 Novell, Inc.               Page 1

            Three macros are expected: input() to read a character;
            unput(c) to replace a character read; and output(c) to place
            an output character.  By default, input and output read from
            and write to stdin and stdout, respectively, but you can
            override these definitions.  The function generated is named
            yylex, and the lex library contains a main that calls it.
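
            The following sketch, offered as an illustration rather than
            a required form, supplies its own main and yywrap, so the main
            in the lex library is not used.  The counter nlines and the
            message printed are hypothetical.

```lex
%{
#include <stdio.h>
int nlines = 0;         /* hypothetical counter, not part of lex */
%}
%%
\n      { nlines++; ECHO; }
%%
/* Called by the scanner at end of input. */
int yywrap(void)
{
        return 1;       /* no further input: finish normally */
}

/* Replaces the main that the lex library would otherwise supply. */
int main(void)
{
        yylex();        /* returns once the input is exhausted */
        printf("%d lines\n", nlines);
        return 0;
}
```

            All other input is copied to the output by default, so this
            scanner echoes its input and then reports the line count.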

            The function yymore accumulates additional characters into the
            same yytext.  The function yyless(n) pushes back yyleng - n
            characters into the input stream, retaining the first n
            characters in yytext.  (yyleng is an external int variable
            giving the length in bytes of yytext.)  The function yywrap is
            called whenever the scanner reaches end of file; a nonzero
            return value indicates that normal wrapup should continue.
            The action REJECT on the right side of a rule causes the match
            to be rejected and the next suitable match executed.  The
            action ECHO on the right side of a rule is equivalent to
            printf("%s", yytext).
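
            A minimal sketch of REJECT and yywrap working together
            follows.  It counts occurrences of "she" and of "he",
            including the "he" inside each "she".  The counters nshe and
            nhe and the report format are assumptions made for this
            example.

```lex
%{
#include <stdio.h>
int nshe = 0, nhe = 0;  /* hypothetical counters */
%}
%%
she     { nshe++; REJECT; }
he      { nhe++; REJECT; }
\n      |
.       ;
%%
/* Report the totals when the scanner reaches end of input. */
int yywrap(void)
{
        printf("she: %d, he: %d\n", nshe, nhe);
        return 1;       /* nonzero: continue with normal wrapup */
}
```

            After the first rule counts "she" and rejects the match, the
            scanner falls back to the single-character rule, consumes the
            "s", and then finds "he" at the next position.  The input
            "she" is therefore counted once by each of the first two
            rules.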

            Any line beginning with a blank is assumed to contain only C
            text and is copied; if it precedes %%, it is copied into the
            external definition area of the lex.yy.c file.  All rules
            should follow a %%, as in yacc.  A line preceding %% that
            begins with a non-blank character defines the string on the
            left to be the remainder of the line; the definition can be
            called out later by surrounding its name with {}.  In this
            definitions section, C code (and preprocessor statements) can
            also be included between %{ and %}.  Note that curly brackets
            do not imply parentheses; only string substitution is done.
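
            The sketch below shows the three kinds of line that may
            precede %%, together with the substitution pitfall noted
            above.  The names DIGIT, AB, and nnum are hypothetical and
            chosen only for illustration.

```lex
%{
#include <stdio.h>      /* between %{ and %}: copied into lex.yy.c */
%}
        int nnum = 0;   /* leading blank: also copied as C text */
DIGIT   [0-9]
AB      a|b
%%
{DIGIT}+        { nnum++; printf("number %d: %s\n", nnum, yytext); }
{AB}c           printf("matched %s\n", yytext);
\n              ;
```

            Because only string substitution is done, {AB}c is read as
            a|bc, which matches "a" or "bc" but not "ac".  Writing the
            definition as (a|b) groups the alternation explicitly.  Other
            lex implementations may parenthesize the expansion themselves,
            so the explicit form is the safer assumption.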

            The external names generated by lex all begin with the prefix
            yy or YY.

            The flags must appear before any files.

            -c       Indicates C actions and is the default.

            -t       Causes the lex.yy.c program to be written instead to
                     standard output.

            -v       Provides a two-line summary of statistics.

            -n       Suppresses the -v summary.

             -V       Prints version information on standard error.

             -Q[y|n]  With -Qy, writes version information to the output
                      file lex.yy.c.  With -Qn, the default, no version
                      information is written.

             Multiple files are treated as a single file.  If no files are
             specified, standard input is used.

             Certain default table sizes are too small for some users.  The
             table sizes for the resulting finite state machine can be set
             in the definitions section:

                   %p n  number of positions is n (default 20000)
                   %n n  number of states is n (default 4000)
                   %e n  number of parse tree nodes is n (default 8000)
                   %a n  number of transitions is n (default 16000)
                   %k n  number of packed character classes is n (default 20000)
                   %o n  size of output array is n (default 24000)

             The use of one or more of the above automatically implies the
             -v option, unless the -n option is used.
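
             As a sketch of the syntax only, the definitions section
             below raises two of the table sizes.  The values are
             arbitrary assumptions, not recommendations.

```lex
%{
/* Hypothetical sizes, chosen only to show the syntax. */
%}
%p 30000
%a 20000
%%
[a-zA-Z]+       ECHO;
```

             Processing such a file prints the -v summary; running lex
             with -n suppresses it.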

       EXAMPLES
                     D       [0-9]
                     O       [0-7]
                     %{
                     void
                     skipcommnts(void)
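                     {
                             int c;

                             for(;;)
                             {
                                     /* input() yields 0 at end of input */
                                     while((c=input())!='*' && c!=0)
                                             ;
                                     if(c==0 || (c=input())=='/' || c==0)
                                             return;
                                     /* push back the character just read,
                                        which may itself be a '*' */
                                     unput(c);
                             }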
                     }
                     %}
                     %%
                     if      printf("IF statement\n");
                     [a-z]+  printf("tag, value %s\n",yytext);
                     0{O}+   printf("octal number %s\n",yytext);
                     {D}+    printf("decimal number %s\n",yytext);
                     "++"    printf("unary op\n");
                     "+"     printf("binary op\n");
                     "\n"    ; /* no action */
                     "/*"    skipcommnts();
                     %%

       REFERENCES
             yacc(1)