Flex - a scanner generator - 1.19 Incompatibilities with lex and POSIX
1.19 Incompatibilities with lex and POSIX
flex is a rewrite of the AT&T Unix lex tool (the two
implementations do not share any code, though), with some
extensions and incompatibilities, both of which are of
concern to those who wish to write scanners acceptable to
either implementation. Flex is fully compliant with the
POSIX lex specification, except that when using `%pointer'
(the default), a call to `unput()' destroys the contents of
yytext, which is counter to the POSIX specification.
In this section we discuss all of the known areas of incompatibility between flex, AT&T lex, and the POSIX specification.
flex's `-l' option turns on maximum compatibility with the
original AT&T lex implementation, at the cost of a major
loss in the generated scanner's performance. We note
below which incompatibilities can be overcome using the `-l'
option.
flex is fully compatible with lex with the following
exceptions:
-
The undocumented lex scanner internal variable yylineno is
not supported unless `-l' or `%option yylineno' is used.
yylineno should be maintained on a per-buffer basis, rather
than a per-scanner (single global variable) basis. yylineno
is not part of the POSIX specification.
-
The `input()' routine is not redefinable, though it
may be called to read characters following whatever
has been matched by a rule. If `input()' encounters
an end-of-file, the normal `yywrap()' processing is
done. A "real" end-of-file is returned by `input()' as
EOF. Input is instead controlled by defining the YY_INPUT
macro. The flex restriction that `input()' cannot be
redefined is in accordance with the POSIX specification,
which simply does not specify any way of controlling the
scanner's input other than by making an initial assignment
to yyin.
-
The `unput()' routine is not redefinable. This restriction
is in accordance with POSIX.
-
flex scanners are not as reentrant as lex scanners. In
particular, if you have an interactive scanner and an
interrupt handler which long-jumps out of the scanner, and
the scanner is subsequently called again, you may get the
following message:
fatal flex scanner internal error--end of buffer missed
To reenter the scanner, first use
yyrestart( yyin );
Note that this call will throw away any buffered input;
usually this isn't a problem with an interactive scanner.
Also note that flex C++ scanner classes are reentrant, so
if using C++ is an option for you, you should use them
instead. See "Generating C++ Scanners" above for details.
-
`output()' is not supported. Output from the `ECHO'
macro is done to the file-pointer yyout (default
stdout). `output()' is not part of the POSIX
specification.
-
lex does not support exclusive start conditions (%x),
though they are in the POSIX specification.
-
When definitions are expanded, flex encloses them in
parentheses. With lex, the following:
NAME    [A-Z][A-Z0-9]*
%%
foo{NAME}?      printf( "Found it\n" );
%%
will not match the string "foo" because when the macro is
expanded the rule is equivalent to "foo[A-Z][A-Z0-9]*?" and
the precedence is such that the '?' is associated with
"[A-Z0-9]*". With flex, the rule will be expanded to
"foo([A-Z][A-Z0-9]*)?" and so the string "foo" will match.
Note that if the definition begins with `^' or ends with
`$' then it is not expanded with parentheses, to allow
these operators to appear in definitions without losing
their special meanings. But the `<s>', `/', and `<<EOF>>'
operators cannot be used in a flex definition. Using `-l'
results in the lex behavior of no parentheses around the
definition. The POSIX specification is that the definition
be enclosed in parentheses.
-
Some implementations of lex allow a rule's action to begin
on a separate line, if the rule's pattern has trailing
whitespace:
%%
foo|bar<space here>
  { foobar_action(); }
flex does not support this feature.
-
The lex `%r' (generate a Ratfor scanner) option is not
supported. It is not part of the POSIX specification.
-
After a call to `unput()', yytext is undefined until the
next token is matched, unless the scanner was built using
`%array'. This is not the case with lex or the POSIX
specification. The `-l' option does away with this
incompatibility.
-
The precedence of the `{}' (numeric range) operator is
different. lex interprets "abc{1,3}" as "match one, two,
or three occurrences of 'abc'", whereas flex interprets it
as "match 'ab' followed by one, two, or three occurrences
of 'c'". The latter is in agreement with the POSIX
specification.
-
The precedence of the `^' operator is different. lex
interprets "^foo|bar" as "match either 'foo' at the
beginning of a line, or 'bar' anywhere", whereas flex
interprets it as "match either 'foo' or 'bar' if they come
at the beginning of a line". The latter is in agreement
with the POSIX specification.
-
The special table-size declarations such as `%a' supported
by lex are not required by flex scanners; flex ignores
them.
-
The name FLEX_SCANNER is #define'd so scanners may be
written for use with either flex or lex. Scanners also
include YY_FLEX_MAJOR_VERSION and YY_FLEX_MINOR_VERSION,
indicating which version of flex generated the scanner
(for example, for the 2.5 release, these defines would be
2 and 5 respectively).
The following flex features are not included in lex or the
POSIX specification:
C++ scanners
%option
start condition scopes
start condition stacks
interactive/non-interactive scanners
yy_scan_string() and friends
yyterminate()
yy_set_interactive()
yy_set_bol()
YY_AT_BOL()
<<EOF>>
<*>
YY_DECL
YY_START
YY_USER_ACTION
YY_USER_INIT
#line directives
%{}'s around actions
multiple actions on a line
plus almost all of the flex flags. The last feature in
the list refers to the fact that with flex you can put
multiple actions on the same line, separated with
semicolons, while with lex, the following
foo handle_foo(); ++num_foos_seen;
is (rather surprisingly) truncated to
foo handle_foo();
flex does not truncate the action. Actions that are not
enclosed in braces are simply terminated at the end of the
line.