AWK Language Programming - 17. The Evolution of the awk Language


17. The Evolution of the awk Language

This book describes the GNU implementation of awk, which follows the POSIX specification. Many awk users are only familiar with the original awk implementation in Version 7 Unix. (This implementation was the basis for awk in Berkeley Unix, through 4.3-Reno. The 4.4 release of Berkeley Unix uses gawk 2.15.2 for its version of awk.) This chapter briefly describes the evolution of the awk language, with cross-references to other parts of the book where you can find more information.

17.1 Major Changes between V7 and SVR3.1

The awk language evolved considerably between the release of Version 7 Unix (1978) and the new version first made generally available in System V Release 3.1 (1987). This section summarizes the changes, with cross-references to further details.

  • The requirement for `;' to separate rules on a line (see section 2.6 awk Statements Versus Lines).
  • User-defined functions, and the return statement (see section 13. User-defined Functions).
  • The delete statement (see section 11.6 The delete Statement).
  • The do-while statement (see section 9.3 The do-while Statement).
  • The built-in functions atan2, cos, sin, rand, and srand (see section 12.2 Numeric Built-in Functions).
  • The built-in functions gsub, sub, and match (see section 12.3 Built-in Functions for String Manipulation).
  • The built-in functions close and system (see section 12.4 Built-in Functions for Input/Output).
  • The ARGC, ARGV, FNR, RLENGTH, RSTART, and SUBSEP built-in variables (see section 10. Built-in Variables).
  • The conditional expression using the ternary operator `?:' (see section 7.12 Conditional Expressions).
  • The exponentiation operator `^' (see section 7.5 Arithmetic Operators) and its assignment operator form `^=' (see section 7.7 Assignment Expressions).
  • C-compatible operator precedence, which breaks some old awk programs (see section 7.14 Operator Precedence (How Operators Nest)).
  • Regexps as the value of FS (see section 5.5 Specifying How Fields are Separated), and as the third argument to the split function (see section 12.3 Built-in Functions for String Manipulation).
  • Dynamic regexps as operands of the `~' and `!~' operators (see section 4.1 How to Use Regular Expressions).
  • The escape sequences `\b', `\f', and `\r' (see section 4.2 Escape Sequences). (Some vendors have updated their old versions of awk to recognize `\r', `\b', and `\f', but this is not something you can rely on.)
  • Redirection of input for the getline function (see section 5.8 Explicit Input with getline).
  • Multiple BEGIN and END rules (see section 8.1.5 The BEGIN and END Special Patterns).
  • Multi-dimensional arrays (see section 11.9 Multi-dimensional Arrays).
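
Several of the additions listed above can be seen together in one short program. The following is only an illustrative sketch, not code from the original awk distribution: the function name cube and the variable names are made up for the example, and it assumes that the awk on your PATH is a post-1987 ("new") awk.

```shell
# Exercise several features that a Version 7 awk would reject.
out=$(awk '
# User-defined function with a return statement.
function cube(x) {
    return x ^ 3                  # exponentiation operator
}

BEGIN {
    # do-while loop: the body always runs at least once.
    i = 1
    do {
        total += cube(i)
        i++
    } while (i <= 3)
    print "sum of cubes:", total

    # Conditional expression using the ternary operator.
    parity = (total % 2 == 0) ? "even" : "odd"
    print parity

    # gsub replaces every match and returns the number of replacements.
    s = "a-b-c"
    n = gsub(/-/, "+", s)
    print n, s

    # Multi-dimensional subscript, stored as "1" SUBSEP "2".
    grid[1, 2] = "x"
    state = ((1, 2) in grid) ? "present" : "absent"
    print state

    # The delete statement removes a single element.
    delete grid[1, 2]
    state = ((1, 2) in grid) ? "present" : "absent"
    print state
}')
printf '%s\n' "$out"
```

Running the same program under `gawk --lint-old' should flag the constructs that the original Version 7 awk lacked.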

17.2 Changes between SVR3.1 and SVR4

The System V Release 4 version of Unix awk added these features (some of which originated in gawk):

  • The ENVIRON variable (see section 10. Built-in Variables).
  • Multiple `-f' options on the command line (see section 14.1 Command Line Options).
  • The `-v' option for assigning variables before program execution begins (see section 14.1 Command Line Options).
  • The `--' option for terminating command line options.
  • The `\a', `\v', and `\x' escape sequences (see section 4.2 Escape Sequences).
  • A defined return value for the srand built-in function (see section 12.2 Numeric Built-in Functions).
  • The toupper and tolower built-in string functions for case translation (see section 12.3 Built-in Functions for String Manipulation).
  • A cleaner specification for the `%c' format-control letter in the printf function (see section 6.5.2 Format-Control Letters).
  • The ability to dynamically pass the field width and precision ("%*.*d") in the argument list of the printf function (see section 6.5.2 Format-Control Letters).
  • The use of regexp constants such as /foo/ as expressions, where they are equivalent to using the matching operator, as in `$0 ~ /foo/' (see section 7.2 Using Regular Expression Constants).
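
A few of these SVR4 additions are easy to try from the shell. The sketch below is illustrative only: GREETING is a hypothetical environment variable and name is a hypothetical awk variable, both chosen just for this example.

```shell
# GREETING and name are made-up names used only for this demonstration.
out=$(printf 'foo bar\nbaz\n' | GREETING=hello awk -v name=World '
BEGIN {
    # -v assigned name before BEGIN ran; ENVIRON exposes the environment.
    print toupper(ENVIRON["GREETING"]) ", " tolower(name)
}
# A bare regexp constant used as an expression is shorthand for $0 ~ /foo/.
{ print $0 ": " (/foo/ ? "match" : "no match") }
')
printf '%s\n' "$out"
```

Because the assignment is made with `-v', the variable is already set inside the BEGIN rule, which is not the case for assignments given as ordinary command line arguments.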

17.3 Changes between SVR4 and POSIX awk

The POSIX Command Language and Utilities standard for awk introduced the following changes into the language:

  • The use of `-W' for implementation-specific options.
  • The use of CONVFMT for controlling the conversion of numbers to strings (see section 7.4 Conversion of Strings and Numbers).
  • The concept of a numeric string, and tighter comparison rules to go with it (see section 7.10 Variable Typing and Comparison Expressions).
  • More complete documentation of many of the previously undocumented features of the language.
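
The effect of CONVFMT and of numeric strings can be observed directly. This is a minimal sketch that assumes a POSIX-conforming awk; the variable names are invented for the example.

```shell
out=$(printf '10 9\n' | awk '
BEGIN {
    x = 3.14159265
    CONVFMT = "%.2f"
    s = x ""              # number-to-string conversion uses CONVFMT
    print s
    OFMT = "%.4f"
    print x               # print formats numbers with OFMT instead
    i = 17
    t = i ""              # integral values always convert as integers
    print t
}
{
    # Fields that look like numbers are numeric strings: compared as numbers.
    field_cmp = ($1 > $2) ? "numeric" : "string"
    # String constants are always compared as strings, so "10" sorts before "9".
    const_cmp = ("10" > "9") ? "numeric" : "string"
    print field_cmp, const_cmp
}')
printf '%s\n' "$out"
```

The last line of output shows the two comparison rules side by side: the fields 10 and 9 compare numerically, while the string constants "10" and "9" compare character by character.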

The following common extensions are not permitted by the POSIX standard:

  • `\x' escape sequences are not recognized (see section 4.2 Escape Sequences).
  • The synonym func for the keyword function is not recognized (see section 13.1 Function Definition Syntax).
  • The operators `**' and `**=' cannot be used in place of `^' and `^=' (see section 7.5 Arithmetic Operators, and also see section 7.7 Assignment Expressions).
  • Specifying `-Ft' on the command line does not set the value of FS to be a single tab character (see section 5.5 Specifying How Fields are Separated).
  • The fflush built-in function is not supported (see section 12.4 Built-in Functions for Input/Output).

17.4 Extensions in the AT&T Bell Laboratories awk

Brian Kernighan, one of the original designers of Unix awk, has made his version available via anonymous ftp (see section B.8 Other Freely Available awk Implementations). This section describes extensions in his version of awk that are not in POSIX awk.

  • The `-mf=NNN' and `-mr=NNN' command line options to set the maximum number of fields, and the maximum record size, respectively (see section 14.1 Command Line Options).
  • The fflush built-in function for flushing buffered output (see section 12.4 Built-in Functions for Input/Output).

17.5 Extensions in gawk Not in POSIX awk

The GNU implementation, gawk, adds a number of features. This section lists them in the order they were added to gawk. They can all be disabled with either the `--traditional' or the `--posix' option (see section 14.1 Command Line Options).

Version 2.10 of gawk introduced these features:

  • The AWKPATH environment variable for specifying a path search for the `-f' command line option (see section 14.1 Command Line Options).
  • The IGNORECASE variable and its effects (see section 4.5 Case-sensitivity in Matching).
  • The `/dev/stdin', `/dev/stdout', `/dev/stderr', and `/dev/fd/n' file name interpretation (see section 6.7 Special File Names in gawk).

Version 2.13 of gawk introduced these features:

  • The FIELDWIDTHS variable and its effects (see section 5.6 Reading Fixed-width Data).
  • The systime and strftime built-in functions for obtaining and printing time stamps (see section 12.5 Functions for Dealing with Time Stamps).
  • The `-W lint' option to provide source code and run time error and portability checking (see section 14.1 Command Line Options).
  • The `-W compat' option to turn off these extensions (see section 14.1 Command Line Options).
  • The `-W posix' option for full POSIX compliance (see section 14.1 Command Line Options).

Version 2.14 of gawk introduced these features:

  • The next file statement for skipping to the next data file (see section 9.8 The nextfile Statement).

Version 2.15 of gawk introduced these features:

  • The ARGIND variable, which tracks the movement of FILENAME through ARGV (see section 10. Built-in Variables).
  • The ERRNO variable, which contains the system error message when getline returns -1, or when close fails (see section 10. Built-in Variables).
  • The ability to use GNU-style long named options that start with `--' (see section 14.1 Command Line Options).
  • The `--source' option for mixing command line and library file source code (see section 14.1 Command Line Options).
  • The `/dev/pid', `/dev/ppid', `/dev/pgrpid', and `/dev/user' file name interpretation (see section 6.7 Special File Names in gawk).

Version 3.0 of gawk introduced these features:

  • The next file statement became nextfile (see section 9.8 The nextfile Statement).
  • The `--lint-old' option to warn about constructs that are not available in the original Version 7 Unix version of awk (see section 17.1 Major Changes between V7 and SVR3.1).
  • The `--traditional' option was added as a better name for `--compat' (see section 14.1 Command Line Options).
  • The ability for FS to be a null string, and for the third argument to split to be the null string (see section 5.5.3 Making Each Character a Separate Field).
  • The ability for RS to be a regexp (see section 5.1 How Input is Split into Records).
  • The RT variable (see section 5.1 How Input is Split into Records).
  • The gensub function for more powerful text manipulation (see section 12.3 Built-in Functions for String Manipulation).
  • The strftime function acquired a default time format, allowing it to be called with no arguments (see section 12.5 Functions for Dealing with Time Stamps).
  • Full support for both POSIX and GNU regexps (see section 4. Regular Expressions).
  • The `--re-interval' option to provide interval expressions in regexps (see section 4.3 Regular Expression Operators).
  • IGNORECASE changed, now applying to string comparison as well as regexp operations (see section 4.5 Case-sensitivity in Matching).
  • The `-m' option and the fflush function from the Bell Labs research version of awk (see section 14.1 Command Line Options; also see section 12.4 Built-in Functions for Input/Output).
  • The use of GNU Autoconf to control the configuration process (see section B.2.1 Compiling gawk for Unix).
  • Amiga support (see section B.6 Installing gawk on an Amiga).
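
Two of the version 3.0 additions, gensub and the RT variable, can be tried as follows. This is a hedged sketch rather than an excerpt from the gawk distribution: it assumes a gawk binary named gawk is installed, and the sample strings are invented for the example.

```shell
# Requires GNU awk; the guard keeps the script harmless where gawk is absent.
if command -v gawk >/dev/null 2>&1; then
    # gensub returns the modified string and supports \1, \2 back-references.
    swapped=$(gawk 'BEGIN { print gensub(/(a+)(b+)/, "<\\2\\1>", "g", "aabb ab") }')
    printf '%s\n' "$swapped"

    # RS as a regexp: RT holds the text that actually terminated each record.
    records=$(printf 'a1b22c' | gawk 'BEGIN { RS = "[0-9]+" } { print $0 "|" RT }')
    printf '%s\n' "$records"
else
    echo "gawk not found; skipping" >&2
fi
```

Unlike sub and gsub, gensub leaves its target string unchanged and returns the result, which is why the call can operate on a string constant here.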

