⇒ lex(1) — CLIX 3.1r7.6.28


  lex(1)                              CLIX                              lex(1)



  NAME

    lex - Generates programs for simple lexical tasks

  SYNOPSIS

    lex [-rctvn] [file ... ]

  FLAGS

    -r   Outputs a RATFOR program.

    -c   Outputs a C program.  This is the default.

    -t   Causes the output program to go to stdout.

    -v   Displays a summary of statistics.

    -n   Turns off the -v flag.

  DESCRIPTION

    The lex command generates programs to be used in simple lexical analysis
    of text.  The input files (stdin is the default) contain strings and
    expressions to be searched for, and C text to be executed when strings are
    found.

    The lex command generates the file lex.yy.c.  When compiled and loaded
    with the lex library, the resulting program copies its input to its
    output except when a string specified in the file is found; then the
    corresponding program text is executed.  The actual string matched is
    left in yytext, an external character array.  Matching is done in order
    of the strings in the file.

    The strings may contain square brackets to indicate character classes, as
    in [abx-z] to indicate a, b, x, y, and z.  The operators *, +, and ? mean,
    respectively, any non-negative number of, any positive number of, and
    either zero or one occurrence of the previous character or character
    class; for example, [a-zA-Z]+ matches a string of letters.  The character
    . is the class of all ASCII characters except newline.  Parentheses for
    grouping and vertical bar for alternation are also supported.  The
    notation r{d,e} in a rule indicates between d and e instances of regular
    expression r.  It has higher precedence than |, but lower than *, ?, +,
    and concatenation.

    The character ^ at the beginning of an expression permits a successful
    match only immediately after a newline, and the character $ at the end of
    an expression requires a trailing newline.  The character / in an
    expression indicates trailing context; only the part of the expression up
    to the slash is returned in yytext, but the remainder of the expression
    must follow in the input stream.  An operator character may be used as an
    ordinary symbol if it is within " symbols or preceded by \.
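
    The class and repetition operators described above can be tried without
    running lex.  The following is a minimal sketch in C, assuming a POSIX
    system that provides <regex.h>.  POSIX extended regular expressions share
    the [ ], *, +, ?, |, ( ), and {d,e} notation, but they are not lex
    itself: trailing context (/) and " quoting are not modeled here.  The
    helper name full_match is hypothetical.

```c
#include <regex.h>
#include <stdio.h>

/* Returns 1 if the whole string s matches the POSIX extended regular
 * expression in pattern, 0 if it does not, and -1 if the pattern is
 * too long or fails to compile.  full_match is a hypothetical helper
 * for this sketch; it is not part of lex. */
static int full_match(const char *pattern, const char *s)
{
    char anchored[256];
    regex_t re;
    int rc;

    /* Wrap the pattern so that it must cover the entire string. */
    if (snprintf(anchored, sizeof anchored, "^(%s)$", pattern)
            >= (int)sizeof anchored)
        return -1;
    if (regcomp(&re, anchored, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, s, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}
```

    With this helper, full_match("[abx-z]", "y") returns 1 and
    full_match("[abx-z]", "c") returns 0, since c lies outside the class.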

    Three subroutines defined as macros are expected: input() to read a
    character; unput(c) to replace a character read; and output(c) to place an
    output character.  They are defined in terms of the standard streams, but
    they can be overridden.  The program generated is named yylex(), and the
    library contains a main() function which calls it.  The action REJECT on
    the right side of the rule causes this match to be rejected and the next
    suitable match executed; the function yymore() accumulates additional
    characters into the same yytext; and the yyless(p) function pushes back
    the portion of the string matched beginning at p, which should be between
    yytext and yytext + yyleng.  The input() and output() macros read from
    the file yyin and write to the file yyout, which default to stdin and
    stdout, respectively.

    Any line beginning with a blank is assumed to contain only C text and is
    copied; if it precedes %% it is copied into the external definition area
    of the lex.yy.c file.  All rules should follow a %%, as in yacc.  Lines
    preceding %% which begin with a nonblank character define the string on
    the left to be the remainder of the line; it can be called out later by
    surrounding it with {}.  Note that curly brackets do not imply
    parentheses; only string substitution is done.
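
    The effect of plain string substitution can be modeled in a few lines of
    C.  This is only a sketch of the behavior described above, not the code
    lex uses; the function name expand_def and the definition AB used in the
    usage note are hypothetical.

```c
#include <string.h>

/* Copies rule to out, replacing every occurrence of {name} with text.
 * The replacement is purely textual: no parentheses are added.
 * Returns 0 on success, or -1 if out (capacity outsz) is too small.
 * expand_def is a hypothetical name for this sketch. */
static int expand_def(const char *rule, const char *name,
                      const char *text, char *out, size_t outsz)
{
    size_t nlen = strlen(name), tlen = strlen(text), used = 0;

    if (outsz == 0)
        return -1;
    while (*rule != '\0') {
        if (rule[0] == '{' && strncmp(rule + 1, name, nlen) == 0
                && rule[1 + nlen] == '}') {
            if (used + tlen >= outsz)
                return -1;
            memcpy(out + used, text, tlen);   /* paste the definition */
            used += tlen;
            rule += nlen + 2;                 /* skip over "{name}" */
        } else {
            if (used + 1 >= outsz)
                return -1;
            out[used++] = *rule++;            /* copy other text as is */
        }
    }
    out[used] = '\0';
    return 0;
}
```

    Given a definition AB of a|b, expanding the rule {AB}c yields a|bc, which
    reads as "a, or b followed by c" rather than "(a or b) followed by c".
    Writing the definition as (a|b) keeps the grouping.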

  EXAMPLE

    The following lex source generates a yylex() function that, when called,
    recognizes tokens from a subset of the C programming language.

    D       [0-9]
    %%
    if      printf("IF statement\n");
    [a-z]+  printf("tag, value %s\n",yytext);
    0{D}+   printf("octal number %s\n",yytext);
    {D}+    printf("decimal number %s\n",yytext);
    "++"    printf("unary op\n");
    "+"     printf("binary op\n");
    "/*"    skipcommnts();
    %%
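    skipcommnts()
    {
         int c;

         for (;;) {
              /* scan to the next '*'; input() yields 0 at end of file */
              while ((c = input()) != '*' && c != 0)
                   ;
              if (c == 0)
                   return;        /* unterminated comment */
              if ((c = input()) == '/')
                   return;        /* closing delimiter found */
              unput(c);           /* push back the character just read */
         }
    }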
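
    The comment-skipping scan can be exercised outside lex with stand-ins for
    input() and unput().  The sketch below is modeled on skipcommnts() and
    reads from a string instead of yyin.  It assumes, as in traditional lex
    implementations, that input() returns 0 at end of input.  The names src,
    pushback, npush, skip_comment, and rest_after_comment are hypothetical.
    Pushing back the character that was actually read, rather than a fixed
    '*', keeps input such as *a/ from being mistaken for the closing
    delimiter.

```c
/* Stand-ins for the lex macros, reading from a string instead of yyin.
 * src, pushback, and npush are hypothetical names for this sketch. */
static const char *src = "";
static char pushback[16];
static int npush = 0;

/* Returns the next character, or 0 at end of input. */
static int input(void)
{
    if (npush > 0)
        return (unsigned char)pushback[--npush];
    return *src != '\0' ? (unsigned char)*src++ : 0;
}

/* Pushes c back so that the next input() returns it.
 * Characters beyond the small buffer's capacity are dropped. */
static void unput(int c)
{
    if (npush < (int)sizeof pushback)
        pushback[npush++] = (char)c;
}

/* Skips to just past the closing delimiter of a comment whose opening
 * delimiter has already been consumed.  Stops at end of input if the
 * comment is never closed. */
static void skip_comment(void)
{
    int c;

    for (;;) {
        while ((c = input()) != '*' && c != 0)
            ;
        if (c == 0)
            return;               /* unterminated comment */
        if ((c = input()) == '/')
            return;               /* closing delimiter found */
        unput(c);                 /* may itself be the '*' of the close */
    }
}

/* Runs skip_comment over text and returns the unread remainder. */
static const char *rest_after_comment(const char *text)
{
    src = text;
    npush = 0;
    skip_comment();
    return src;
}
```

    For the text " x *a/ y */ z", rest_after_comment() returns a pointer to
    " z": the sequence *a/ inside the comment does not end the scan.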

    The external names generated by lex all begin with the prefix yy or YY.

    Certain table sizes for the resulting finite state machine can be set in
    the definitions section:

    %p n   The number of positions is n.  The default is 2500.

    %n n   The number of states is n.  The default is 500.

    %e n   The number of parse tree nodes is n.  The default is 1000.

    %a n   The number of transitions is n.  The default is 2000.

    %k n   The number of packed character classes is n.  The default is 1000.

    %o n   The size of output array is n.  The default is 3000.

    The use of one or more of the above automatically implies the -v flag,
    unless the -n flag is used.

  NOTES

    The -r flag is not yet fully operational.

    The lex command does not support 8-bit fonts or international characters.

  RELATED INFORMATION

    Commands:  yacc(1)


  2/94 - Intergraph Corporation
