AWK(1)
NAME
awk − pattern scanning and processing language
USAGE
awk [ -Fc ] [ prog ] [ parameters ] [ files ]
DESCRIPTION
Awk scans input files for lines that match any of a set of patterns specified in prog. For each pattern in prog, it performs the associated action when a line matches the pattern. The set of patterns may appear literally as prog, or in a file specified as -f file. Enclose the prog string in single quotes (' ') to protect it from the shell.
Awk reads files in the order in which they appear on the command line. If you do not specify any files, or if you use a dash (-) in place of a filename, it reads the standard input. Each line is matched against the pattern portion of every pattern-action statement; the associated action is performed for each matched pattern.
An input line is composed of fields separated by white space. This default can be changed by setting the FS variable (see below). The fields are denoted $1, $2, ...; $0 refers to the entire line.
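As a minimal sketch of field splitting, the following shell fragment feeds one line to awk and prints selected fields, then the whole line. The sample input and the shell variable name out are invented for illustration.

```shell
# Sample input is invented for illustration.
# $1 and $3 are the first and third white-space-separated fields;
# $0 is the entire line, with its original spacing preserved.
out=$(printf 'alpha  beta gamma\n' | awk '{ print $3, $1; print $0 }')
printf '%s\n' "$out"
```

Runs of white space count as a single separator, so the two blanks after alpha do not produce an empty field.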
A pattern-action statement has the following form:
pattern { action }
A missing action means print the line; a missing pattern always matches. An action is a sequence of statements. A statement is one of the following:
if ( conditional ) statement [ else statement ]
while ( conditional ) statement
for ( expression ; conditional ; expression ) statement
break
continue
{ [ statement ] ... }
variable = expression
print [ expression-list ] [ >expression ]
printf format [ , expression-list ] [ >expression ]
next    # skip remaining patterns on this input line
exit    # skip the rest of the input
Statements are terminated by semicolons, newlines, or right braces. An empty expression-list stands for the whole line. Expressions take on string or numeric values as appropriate, and are built using the operators +, -, *, /, %, and concatenation (indicated by a blank). The following C operators are also valid in expressions: ++, --, +=, -=, *=, /=, and %=. Variables may be scalars, array elements (denoted by x[i]), or fields. Variables are initialized to the null string. Array subscripts may be any string, not necessarily numeric; this allows for a form of associative memory. String constants are placed in double quotes (" ").
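The following sketch illustrates string subscripts and uninitialized variables. It assumes an invented two-column input; the names count, total, and out are hypothetical.

```shell
# Count how often each first field occurs and add up the second column.
# Input data and variable names are invented for illustration.
out=$(printf 'red 1\nblue 2\nred 3\n' | awk '
    { count[$1]++; total += $2 }    # unset variables act as 0 in arithmetic
    END { print "red:" count["red"], "blue:" count["blue"], "total:" total }
')
printf '%s\n' "$out"
```

The blank between "red:" and count["red"] is the concatenation operator; the commas separate the arguments of print.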
The print statement prints its arguments on the standard output (or on a file if >expr is present), separated by the current output field separator, and terminated by the output record separator. The printf statement formats its expression list according to the printf(3S) format.
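A small sketch of how the output separators affect print. The separator values chosen here are arbitrary, and the variable name out is hypothetical.

```shell
# OFS is placed between the arguments of print; ORS ends each record.
# The values "-" and ";" are arbitrary choices for this illustration.
out=$(printf 'a b\nc d\n' | awk 'BEGIN { OFS = "-"; ORS = ";" } { print $1, $2 }')
printf '%s\n' "$out"
```

Because ORS replaces the newline, both records appear on one output line.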
The built-in function length returns the length of its argument taken as a string, or of the whole line if no argument is given. Other built-in functions are exp, log, sqrt, and int; int truncates its argument to an integer. The function substr(s, m, n) returns the n-character substring of s that begins at position m. The function sprintf(fmt, expr, expr, ...) formats the expressions according to the printf(3S) format given by fmt and returns the resulting string.
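The built-in functions can be tried from a BEGIN action, which needs no input. This is a sketch only; the string and the format are invented for illustration.

```shell
# Exercise length, substr, int, and sprintf on invented values.
out=$(awk 'BEGIN {
    s = "processing"
    print length(s)                    # number of characters in s
    print substr(s, 4, 5)              # 5 characters starting at position 4
    print int(7.9)                     # fractional part is discarded
    print sprintf("%03d|%s", 7, "x")   # formatted string, returned rather than printed
}')
printf '%s\n' "$out"
```

Positions in substr are counted from 1, so position 4 of "processing" is the letter c.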
Patterns are arbitrary Boolean combinations ( !, ||, &&, and parentheses) of regular and relational expressions. A pattern may consist of two patterns separated by a comma; in this case, the action is performed for all lines between an occurrence of the first pattern and the next occurrence of the second.
Regular expressions must be surrounded by slashes and have the same form as in egrep, which is described under grep(1). Isolated regular expressions in a pattern apply to the entire line. Regular expressions may also occur in relational expressions.
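A sketch of Boolean combinations of regular and relational expressions follows. The input is invented; the second rule also uses the ~ (contains) match operator of relational expressions to test a single field rather than the whole line.

```shell
# Input lines and messages are invented for illustration.
out=$(printf 'apple 5\nbanana 12\ncherry x\navocado 30\n' | awk '
    # Line starts with "a" AND second field, taken as a number, exceeds 10.
    /^a/ && $2 + 0 > 10        { print "a-word over 10:", $1 }
    # Line does NOT contain "an" AND second field consists only of digits.
    !/an/ && $2 ~ /^[0-9]+$/   { print "numeric, no an:", $1 }
')
printf '%s\n' "$out"
```

Every rule is tried against every line, so a line such as the last one can trigger both actions.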
A relational expression is one of the following:
expression matchop regular-expression
expression relop expression
where a relop is any of the six relational operators in C, and a matchop is either a tilde (~) for "contains" or an exclamation point and a tilde (!~) for "does not contain". A conditional is an arithmetic expression, a relational expression, or a Boolean combination of these.
The special patterns BEGIN and END may be used to capture control before the first input line is read and after the last. BEGIN must be the first pattern, END must be the last.
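A minimal sketch of BEGIN and END framing the per-line processing. The input and the counter name n are invented for illustration.

```shell
# BEGIN runs before any input is read, END after the last line.
out=$(printf 'x\ny\nz\n' | awk '
    BEGIN { print "start" }
          { n++ }                  # no pattern: runs for every input line
    END   { print "lines:", n }
')
printf '%s\n' "$out"
```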
A single character c may be used to separate the fields by starting the program with:
BEGIN { FS = "c" }
or by using the -Fc option.
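Both ways of setting the separator can be compared side by side. This sketch assumes a colon-separated sample line in the style of a password file; the line itself and the shell variable names are invented.

```shell
# The same colon-separated line is split two ways; results should agree.
line='root:x:0:0:root:/root:/bin/sh'
a=$(printf '%s\n' "$line" | awk -F: '{ print $1, $7 }')
b=$(printf '%s\n' "$line" | awk 'BEGIN { FS = ":" } { print $1, $7 }')
printf '%s\n%s\n' "$a" "$b"
```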
Other variable names with special meanings include NF, the number of fields in the current record; NR, the ordinal number of the current record; FILENAME, the name of the current input file; OFS, the output field separator (default blank); ORS, the output record separator (default newline); and OFMT, the output format for numbers (default %.6g).
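The special variables can be inspected directly, as in this sketch. The input and the shell variable names are invented; $NF, the field whose number is NF, denotes the last field of the line.

```shell
# Print the record number, the field count, and the last field of each line.
out=$(printf 'a b c\nd e\n' | awk '{ print NR, NF, $NF }')
printf '%s\n' "$out"

# OFMT controls how print renders a non-integer number; "%.2f" is an arbitrary choice.
fmt=$(awk 'BEGIN { OFMT = "%.2f"; print 3.14159 }')
printf '%s\n' "$fmt"
```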
EXAMPLES
To print lines longer than 72 characters:
length > 72
To print the first two fields in opposite order:
{ print $2, $1 }
To add the first column, and then print the sum and average:
{ s += $1 }
END { print "sum is", s, " average is", s/NR }
To print fields in reverse order:
{ for (i = NF; i > 0; --i) print $i }
To print all lines between start/stop pairs:
/start/, /stop/
To print all lines whose first field is different from the previous one:
$1 != prev { print; prev = $1 }
To print the file, filling in page numbers starting at 5:
/Page/ { $2 = n++; }
{ print }
command line: awk -f program n=5 input
CAUTIONS
No explicit conversions exist between numbers and strings. To force an expression to be treated as a number, add zero to it. To force it to be treated as a string, concatenate the null string ("") to it.
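The effect of the two idioms shows up most clearly in comparisons. The following is a sketch with invented values; the variable names a, b, s, and out are hypothetical.

```shell
# Compare the same two values as numbers and as strings.
out=$(awk 'BEGIN {
    a = 10; b = 9
    print (a < b)               # numeric comparison: 10 is not less than 9
    print ((a "") < (b ""))     # string comparison: "10" sorts before "9"
    s = "42abc"
    print s + 0                 # adding zero keeps the leading numeric part
}')
printf '%s\n' "$out"
```

A comparison yields 1 when true and 0 when false, so the first two lines of output differ even though the operands are the same.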