AWK Language Programming - 17. The Evolution of the awk Language
17. The Evolution of the awk Language
This book describes the GNU implementation of awk, which follows
the POSIX specification. Many awk users are only familiar
with the original awk implementation in Version 7 Unix.
(This implementation was the basis for awk in Berkeley Unix,
through 4.3-Reno. The 4.4 release of Berkeley Unix uses gawk 2.15.2
for its version of awk.) This chapter briefly describes the
evolution of the awk language, with cross references to other parts
of the book where you can find more information.
17.1 Major Changes between V7 and SVR3.1
The awk language evolved considerably between the release of
Version 7 Unix (1978) and the new version first made generally available in
System V Release 3.1 (1987). This section summarizes the changes, with
cross-references to further details.
- The requirement for `;' to separate rules on a line (see section 2.6 awk Statements Versus Lines).
- User-defined functions, and the return statement (see section 13. User-defined Functions).
- The delete statement (see section 11.6 The delete Statement).
- The do-while statement (see section 9.3 The do-while Statement).
- The built-in functions atan2, cos, sin, rand and srand (see section 12.2 Numeric Built-in Functions).
- The built-in functions gsub, sub, and match (see section 12.3 Built-in Functions for String Manipulation).
- The built-in functions close and system (see section 12.4 Built-in Functions for Input/Output).
- The ARGC, ARGV, FNR, RLENGTH, RSTART, and SUBSEP built-in variables (see section 10. Built-in Variables).
- The conditional expression using the ternary operator `?:' (see section 7.12 Conditional Expressions).
- The exponentiation operator `^' (see section 7.5 Arithmetic Operators) and its assignment operator form `^=' (see section 7.7 Assignment Expressions).
- C-compatible operator precedence, which breaks some old awk programs (see section 7.14 Operator Precedence (How Operators Nest)).
- Regexps as the value of FS (see section 5.5 Specifying How Fields are Separated), and as the third argument to the split function (see section 12.3 Built-in Functions for String Manipulation).
- Dynamic regexps as operands of the `~' and `!~' operators (see section 4.1 How to Use Regular Expressions).
- The escape sequences `\b', `\f', and `\r' (see section 4.2 Escape Sequences). (Some vendors have updated their old versions of awk to recognize `\r', `\b', and `\f', but this is not something you can rely on.)
- Redirection of input for the getline function (see section 5.8 Explicit Input with getline).
- Multiple BEGIN and END rules (see section 8.1.5 The BEGIN and END Special Patterns).
- Multi-dimensional arrays (see section 11.9 Multi-dimensional Arrays).
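As a rough illustration of several of the SVR3.1 additions listed above, here is a minimal sketch. The shell function name new_awk_demo and the awk function max are hypothetical names chosen for this example, and the script assumes that the awk on your search path is a post-V7 ("new awk") implementation.

```sh
#!/bin/sh
# Sketch only: exercises a handful of features added between V7 and SVR3.1.
# new_awk_demo and max are illustrative names, not part of awk itself.
new_awk_demo() {
    awk '
    # A user-defined function with a return statement.
    function max(a, b) {
        return (a > b) ? a : b        # the ternary ?: operator
    }
    BEGIN {
        s = "hello world"
        n = gsub(/o/, "0", s)         # gsub returns the number of substitutions
        print n, s
        if (match("foobar", /ba+/))   # match sets RSTART and RLENGTH
            print RSTART, RLENGTH
        print max(2 ^ 3, 7)           # the exponentiation operator
        grid[1, 2] = "x"              # multi-dimensional subscript (joined with SUBSEP)
        if ((1, 2) in grid)
            print "found"
        delete grid[1, 2]             # the delete statement
        if (!((1, 2) in grid))
            print "deleted"
    }'
}

new_awk_demo
```

A V7-era awk would reject this program outright, since it has no function definitions, no `?:', and no delete statement.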
17.2 Changes between SVR3.1 and SVR4
The System V Release 4 version of Unix awk added these features
(some of which originated in gawk):
- The ENVIRON variable (see section 10. Built-in Variables).
- Multiple `-f' options on the command line (see section 14.1 Command Line Options).
- The `-v' option for assigning variables before program execution begins (see section 14.1 Command Line Options).
- The `--' option for terminating command line options.
- The `\a', `\v', and `\x' escape sequences (see section 4.2 Escape Sequences).
- A defined return value for the srand built-in function (see section 12.2 Numeric Built-in Functions).
- The toupper and tolower built-in string functions for case translation (see section 12.3 Built-in Functions for String Manipulation).
- A cleaner specification for the `%c' format-control letter in the printf function (see section 6.5.2 Format-Control Letters).
- The ability to dynamically pass the field width and precision ("%*.*d") in the argument list of the printf function (see section 6.5.2 Format-Control Letters).
- The use of regexp constants such as /foo/ as expressions, where they are equivalent to using the matching operator, as in `$0 ~ /foo/' (see section 7.2 Using Regular Expression Constants).
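The following sketch shows a few of these SVR4 additions working together: the `-v' option, the ENVIRON array, toupper and tolower, and a regexp constant used as an expression. The function name svr4_demo, the variable name, and the GREETING environment variable are assumptions made up for the example.

```sh
#!/bin/sh
# Sketch only: svr4_demo, name, and GREETING are illustrative names.
svr4_demo() {
    printf 'foo bar\nbaz\n' |
    GREETING=hi awk -v name=world '
    BEGIN {
        # name was assigned by -v before the program started running.
        print toupper(name), tolower("MiXeD")
        # ENVIRON gives access to the environment of the awk process.
        print ENVIRON["GREETING"]
    }
    {
        # A bare regexp constant is shorthand for $0 ~ /foo/.
        hit = /foo/
        print (hit ? "match" : "no match")
    }'
}

svr4_demo
```

Because the assignment made with `-v' happens before the BEGIN rule runs, name is already set there; a plain command line assignment such as `name=world' given after the program text would not be.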
17.3 Changes between SVR4 and POSIX awk
The POSIX Command Language and Utilities standard for awk
introduced the following changes into the language:
- The use of `-W' for implementation-specific options.
- The use of CONVFMT for controlling the conversion of numbers to strings (see section 7.4 Conversion of Strings and Numbers).
- The concept of a numeric string, and tighter comparison rules to go with it (see section 7.10 Variable Typing and Comparison Expressions).
- More complete documentation of many of the previously undocumented features of the language.
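To make the role of CONVFMT concrete, here is a minimal sketch, assuming a POSIX-compliant awk. The function name convfmt_demo is hypothetical. It shows that CONVFMT governs the conversion of a number to a string (here forced by concatenation with the null string), while output from print is still governed by OFMT.

```sh
#!/bin/sh
# Sketch only: convfmt_demo is an illustrative name.
convfmt_demo() {
    awk 'BEGIN {
        x = 3.14159
        CONVFMT = "%.2g"
        s = x ""      # number-to-string conversion uses CONVFMT
        print s       # prints the already-converted string
        print x       # printing a number uses OFMT (default "%.6g")
    }'
}

convfmt_demo
```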
The following common extensions are not permitted by the POSIX standard:
- \x escape sequences are not recognized (see section 4.2 Escape Sequences).
- The synonym func for the keyword function is not recognized (see section 13.1 Function Definition Syntax).
- The operators `**' and `**=' cannot be used in place of `^' and `^=' (see section 7.5 Arithmetic Operators, and also see section 7.7 Assignment Expressions).
- Specifying `-Ft' on the command line does not set the value of FS to be a single tab character (see section 5.5 Specifying How Fields are Separated).
- The fflush built-in function is not supported (see section 12.4 Built-in Functions for Input/Output).
17.4 Extensions in the AT&T Bell Laboratories awk
Brian Kernighan, one of the original designers of Unix awk,
has made his version available via anonymous ftp
(see section B.8 Other Freely Available awk Implementations).
This section describes extensions in his version of awk that are
not in POSIX awk.
- The `-mf=NNN' and `-mr=NNN' command line options to set the maximum number of fields, and the maximum record size, respectively (see section 14.1 Command Line Options).
- The fflush built-in function for flushing buffered output (see section 12.4 Built-in Functions for Input/Output).
17.5 Extensions in gawk Not in POSIX awk
The GNU implementation, gawk, adds a number of features.
This section lists them in the order they were added to gawk.
They can all be disabled with either the `--traditional' or
`--posix' options
(see section 14.1 Command Line Options).
Version 2.10 of gawk introduced these features:
- The AWKPATH environment variable for specifying a path search for the `-f' command line option (see section 14.1 Command Line Options).
- The IGNORECASE variable and its effects (see section 4.5 Case-sensitivity in Matching).
- The `/dev/stdin', `/dev/stdout', `/dev/stderr', and `/dev/fd/n' file name interpretation (see section 6.7 Special File Names in gawk).
Version 2.13 of gawk introduced these features:
- The FIELDWIDTHS variable and its effects (see section 5.6 Reading Fixed-width Data).
- The systime and strftime built-in functions for obtaining and printing time stamps (see section 12.5 Functions for Dealing with Time Stamps).
- The `-W lint' option to provide source code and run time error and portability checking (see section 14.1 Command Line Options).
- The `-W compat' option to turn off these extensions (see section 14.1 Command Line Options).
- The `-W posix' option for full POSIX compliance (see section 14.1 Command Line Options).
Version 2.14 of gawk introduced these features:
- The next file statement for skipping to the next data file (see section 9.8 The nextfile Statement).
Version 2.15 of gawk introduced these features:
- The ARGIND variable, which tracks the movement of FILENAME through ARGV (see section 10. Built-in Variables).
- The ERRNO variable, which contains the system error message when getline returns -1, or when close fails (see section 10. Built-in Variables).
- The ability to use GNU-style long named options that start with `--' (see section 14.1 Command Line Options).
- The `--source' option for mixing command line and library file source code (see section 14.1 Command Line Options).
- The `/dev/pid', `/dev/ppid', `/dev/pgrpid', and `/dev/user' file name interpretation (see section 6.7 Special File Names in gawk).
Version 3.0 of gawk introduced these features:
- The next file statement became nextfile (see section 9.8 The nextfile Statement).
- The `--lint-old' option to warn about constructs that are not available in the original Version 7 Unix version of awk (see section 17.1 Major Changes between V7 and SVR3.1).
- The `--traditional' option was added as a better name for `--compat' (see section 14.1 Command Line Options).
- The ability for FS to be a null string, and for the third argument to split to be the null string (see section 5.5.3 Making Each Character a Separate Field).
- The ability for RS to be a regexp (see section 5.1 How Input is Split into Records).
- The RT variable (see section 5.1 How Input is Split into Records).
- The gensub function for more powerful text manipulation (see section 12.3 Built-in Functions for String Manipulation).
- The strftime function acquired a default time format, allowing it to be called with no arguments (see section 12.5 Functions for Dealing with Time Stamps).
- Full support for both POSIX and GNU regexps (see section 4. Regular Expressions).
- The `--re-interval' option to provide interval expressions in regexps (see section 4.3 Regular Expression Operators).
- IGNORECASE changed, now applying to string comparison as well as regexp operations (see section 4.5 Case-sensitivity in Matching).
- The `-m' option and the fflush function from the Bell Labs research version of awk (see section 14.1 Command Line Options; also see section 12.4 Built-in Functions for Input/Output).
- The use of GNU Autoconf to control the configuration process (see section B.2.1 Compiling gawk for Unix).
- Amiga support (see section B.6 Installing gawk on an Amiga).
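As a small illustration of two of the version 3.0 additions, the sketch below uses gensub and a regexp-valued RS together with RT. The function name gawk_demo is hypothetical, and the script assumes that a gawk binary is installed under that name; if it is not, the function simply says so.

```sh
#!/bin/sh
# Sketch only: gawk_demo is an illustrative name; requires gawk on the search path.
gawk_demo() {
    if ! command -v gawk >/dev/null 2>&1; then
        echo "gawk not installed"
        return 0
    fi
    # gensub: \1 and \2 in the replacement refer to parenthesized subexpressions.
    gawk 'BEGIN { print gensub(/(a+)(b+)/, "<\\2\\1>", "g", "aabb xab") }'
    # RS as a regexp: RT holds the text that actually terminated each record.
    printf 'a1b22c' | gawk 'BEGIN { RS = "[0-9]+" } { print $0 "|" RT }'
}

gawk_demo
```

Neither construct is available with `--traditional' or `--posix', so a portable program would need sub or gsub and a single-character RS instead.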