awk
PURPOSE
Finds lines in files matching specified patterns and per-
forms specified actions on them.
SYNOPSIS
awk [ -Fc ] [ prog|-f progfile ] [ parameters ] [ files ]
DESCRIPTION
The awk command is a more powerful pattern-matching
command than the grep command. Instead of simply
displaying lines that match, it can perform limited
processing on the input lines. Some of the features of
awk are:
o It can perform convenient numeric processing.
o It allows variables within actions.
o It allows general selection of patterns.
o It allows control flow in the actions.
o It does not require any compiling of programs.
For a detailed discussion of awk, see AIX Operating
System Programming Tools and Interfaces.
The awk command reads files in the order stated on the
command line. If you specify a file name as - (minus) or
do not specify a file name, awk reads standard input.
The awk command searches its input line by line for
patterns. When it finds a match, it performs the associ-
ated action and writes the result to standard output.
Enclose pattern-action statements on the command line in
single quotation marks to protect them from interpreta-
tion by the shell.
The awk command first reads all pattern-action state-
ments, then it reads a line of input and compares it to
each pattern, performing the associated actions on each
match. When it has compared all patterns to the input
line, it reads the next line.
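The matching order described above can be seen in a short
sketch. The sample input is supplied by printf and is an
assumption made for illustration only:

```shell
# Each input line is compared to both patterns, in order,
# before awk reads the next line.
printf 'apple pie\nbanana split\napple tart\n' |
awk '/apple/ { print "fruit:", $0 }
     /pie/   { print "dessert:", $0 }'
```

The first line matches both patterns, so both actions run
on it before the second line is read. The second line
matches neither pattern and produces no output.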
The awk command treats input lines as fields separated by
spaces, tabs, or a field separator you set with the FS
variable. Fields are referenced as $1, $2, and so on.
$0 refers to the entire line.
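As a minimal sketch of field references, assuming the
default separator and a made-up input line:

```shell
# Runs of blanks separate the fields; $0 still holds the
# line exactly as it was read.
printf 'one two   three\n' | awk '{ print $3, $1; print $0 }'
```

The first print statement writes the third and first
fields; the second writes the unmodified input line.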
On the awk command line, you can assign values to vari-
ables as follows:
variable=value
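For instance, a threshold can be passed to the program this
way. The variable name "limit" and the sample numbers are
hypothetical; the - (minus) operand names standard input:

```shell
# limit is assigned on the command line, before the input
# that follows it is read.
printf '5\n12\n8\n' | awk '$1 > limit { print $1 }' limit=10 -
```

Only input lines whose first field is greater than 10 are
written.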
Pattern-Matching Statements
Pattern-matching statements follow the form:
pattern { action }
If a pattern lacks a corresponding action, awk writes the
entire line that contains the pattern to standard output.
If an action lacks a corresponding pattern, awk performs
the action on every line.
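Both shorthand forms can be sketched as follows, with
sample input invented for illustration:

```shell
# Pattern without an action: matching lines are written whole.
printf 'error: disk full\nok\nerror: timeout\n' | awk '/error/'

# Action without a pattern: the action runs on every line.
printf 'a b\nc d\n' | awk '{ print $1 }'
```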
ACTIONS: An action is a sequence of statements that
follow C Language syntax. These statements can include:
statement      format
if             if ( conditional ) statement [ else statement ]
while          while ( conditional ) statement
for            for ( expression ; conditional ; expression )
               statement
break
continue
               { statement . . . }
(assignment)   variable=expression
print          print [expression-list] [>expression]
printf         printf format[, expression-list] [>expression]
next
exit
Statements can end with a semicolon, a new-line
character, or the right brace enclosing the action.
If you do not supply an action, awk displays the whole
line. Expressions can have string or numeric values and
are built using the operators "+", "-", "*", "/", "%", a
blank for string concatenation, and the C operators "++",
"--", "+=", "-=", "*=", "/=", and "%=".
In statements, variables may be scalars, array elements
(denoted x[i]) or fields. Variable names may consist of
upper- and lowercase alphabetic letters, the underscore
character, the digits (0-9), and extended characters.
Variable names cannot begin with a digit. Variables are
initialized to the null string. Array subscripts may be
any string; they do not have to be numeric. This allows
for a form of associative memory. String constants in
expressions should be enclosed in double quotation marks.
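The expression and variable rules above can be sketched in
one small program. The input columns (name, shell, count)
and the variable names are hypothetical:

```shell
printf 'ann csh 3\nbob sh 4\ncat csh 5\n' |
awk '{
    count[$2]++                 # array subscripted by a string
    total[$2] += $3             # C-style assignment operator
}
END {
    label = "csh" ":"           # a blank between strings concatenates them
    print label, count["csh"], total["csh"], total["csh"] % count["csh"]
}'
```

Neither array is declared or sized in advance; an element
comes into existence, with a null value, the first time
it is referenced.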
There are several variables with special meaning to awk.
They include:
FS         Input field separator (default is a blank).
           This separator character cannot be a two-byte
           extended character.
NF         The number of fields in the current input line
           (record).
NR         The number of the current input line (record).
FILENAME   The name of the current input file.
OFS        The output field separator (default is a
           blank). This separator character cannot be a
           two-byte extended character.
ORS        The output record separator (default is a
           new-line character). This separator character
           cannot be a two-byte extended character.
OFMT       The output format for numbers (default "%.6g").
Since the actions process fields, input white space is
not preserved on the output.
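A short sketch of several of these variables, using
invented input, shows that the original spacing between
fields is replaced by OFS on output:

```shell
# NR is the line number, NF the number of fields on that line.
printf 'a   b c\nd e\n' |
awk 'BEGIN { OFS = "-" } { print NR, NF, $1, $2 }'
```

The three blanks between "a" and "b" in the input do not
appear in the output; each comma in the print statement
becomes one OFS string.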
The print statement writes its arguments to standard
output, separated by the output field separator and
terminated by the output record separator. The printf
statement formats its expression list according to the
format of the printf subroutine (see AIX Operating System
Technical Reference). You can redirect the output of
either statement to a file using the print > file or
printf > file forms.
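The following sketch contrasts formatted output with
redirected output. The scratch file created by mktemp and
the variable name "out" that carries its name are
assumptions made for this example:

```shell
tmp=$(mktemp)                     # hypothetical scratch file
printf 'widget 3.14159\n' |
awk '{
    printf "%-8s %6.2f\n", $1, $2  # formatted, to standard output
    print $1, $2 > out             # redirected to the file named in out
}' out="$tmp" -
cat "$tmp"
rm -f "$tmp"
```

The formatted line goes to standard output, while the
redirected line appears only in the scratch file.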
You have two ways to designate a character other than
white space to separate fields. You can use the -Fc flag
on the awk command line, or you can start progfile with:
BEGIN { FS = "c" }
Either action changes the field separator to c.
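As a sketch, both forms give the same result for a
colon-separated line (the sample line is invented):

```shell
# Field separator set with the -F flag.
printf 'root:x:0\n' | awk -F: '{ print $1 }'

# Field separator set in a BEGIN action.
printf 'root:x:0\n' | awk 'BEGIN { FS = ":" } { print $1 }'
```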
There are several built-in functions that can be used in
awk actions.
length                          Returns the length of the whole
                                line if there is no argument or
                                the length of its argument taken
                                as a string.
exp(n)                          Takes the exponential of its
                                argument.
log(n)                          Takes the base e logarithm of
                                its argument.
sqrt(n)                         Takes the square root of its
                                argument.
int(n)                          Takes the integer part of its
                                argument.
substr(s,m,n)                   Returns the substring n
                                characters long of s, beginning
                                at position m.
sprintf(fmt,expr,expr, . . . )  Formats the expressions according
                                to the printf format string fmt
                                and returns the resulting string.
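A brief sketch exercising each built-in function on an
invented input line; the variable name "whole" is
hypothetical:

```shell
printf 'hello world\n' |
awk '{
    whole = length                      # length of the entire line
    print whole, length($1), substr($0, 7, 5)
    print int(7.9), sqrt(16), exp(0), log(1)
    print sprintf("%03d", 7)
}'
```

Positions given to substr are counted from 1, so
substr($0, 7, 5) selects the five characters that begin at
the seventh character of the line.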
PATTERNS: Patterns are arbitrary Boolean combinations of
regular expressions and relational expressions, built
with the "!", "||", and "&&" operators and parentheses for
grouping. You must start and end regular expressions with
slashes (/). You can use regular expressions like those
allowed by the egrep command (see "grep"), including the
following special characters:
+ One or more occurrences of the pattern.
? Zero or one occurrences of the pattern.
| Either of two patterns.
( ) Grouping of expressions.
Isolated regular expressions in a pattern apply to the
entire line. Regular expressions can also occur in
relational expressions. If two patterns are separated by
a comma, the action is performed on all lines from an
occurrence of the first pattern through the next
occurrence of the second. Regular expressions can contain
extended characters with one exception: range constructs
in character class specifications using square brackets
cannot contain two-byte extended characters. Individual
instances of extended characters can appear within square
brackets; however, two-byte extended characters are
treated as two separate one-byte characters.
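The special characters and the comma form can be sketched
as follows, using invented input:

```shell
# ?, | and ( ) in one regular expression.
printf 'color\ncolour\ncolouur\ngray\n' | awk '/colou?r|gr(a|e)y/'

# Two patterns separated by a comma select a range of lines.
printf 'intro\nstart here\nbody\nstop here\noutro\n' |
awk '/start/,/stop/'
```

The first command writes "color", "colour", and "gray".
The second writes the three lines from "start here"
through "stop here", including both end lines.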
There are two types of relational expressions that you
can use. One has the form:
expression matchop pattern
where matchop is either: ~ (for "contains") or !~ (for
"does not contain"). The second has the form:
expression relop expression
where relop is any of the six C relational operators:
"<", ">", "<=", ">=", "==", and "!=". A conditional can
be an arithmetic expression, a relational expression, or
a Boolean combination of these.
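Both kinds of relational expression, combined with Boolean
operators, might look like this. The names and numbers in
the input are invented:

```shell
# First field begins with "a", or second field is at least 19.
printf 'ann 42\nbob 7\ncarl 19\n' |
awk '$1 ~ /^a/ || $2 >= 19 { print $1 }'

# First field contains no "a", and second field is not 42.
printf 'ann 42\nbob 7\ncarl 19\n' |
awk '$1 !~ /a/ && $2 != 42 { print $1 }'
```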
You can use the special patterns BEGIN and END to capture
control before the first input line is read and after the
last input line is read, respectively. BEGIN must be the
first pattern in progfile and END must be the last.
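A minimal sketch of the two special patterns, with a
hypothetical counter named "n":

```shell
printf 'x\ny\nz\n' |
awk 'BEGIN { print "start" }
           { n++ }
     END   { print "lines:", n }'
```

The BEGIN action runs before any input is read, the middle
action runs once per line, and the END action reports the
count after the last line.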
There are no explicit conversions between numbers and
strings. To force an expression to be treated as a
number, add "0" to it. To force it to be treated as a
string, append a null string ("").
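The effect of forcing each interpretation can be sketched
with two fields that compare differently as strings and as
numbers (the input is invented):

```shell
printf '10 9\n' |
awk '{
    if (($1 "") < ($2 "")) print "as strings, 10 sorts before 9"
    if (($1 + 0) > ($2 + 0)) print "as numbers, 10 is greater than 9"
}'
```

Both messages are written: appending the null string
forces a character-by-character comparison, and adding
zero forces a numeric one.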
FLAGS
-f progfile   Searches for the patterns and performs the
              actions found in the file progfile.
-Fchar        Uses char as the field separator character
              (by default a blank).
EXAMPLES
1. To display the lines of a file that are longer than
72 characters:
awk 'length > 72' chapter1
This selects each line of the file "chapter1" that is
longer than "72" characters. awk then writes these
lines to standard output because no action is speci-
fied.
2. To display all lines from a line containing "start"
through the next line containing "stop":
awk '/start/,/stop/' chapter1
3. To run an awk program ("sum2.awk") that processes a
file ("chapter1"):
awk -f sum2.awk chapter1
The following awk program computes the sum and
average of the numbers in the second column of the
input file:
{
sum += $2
}
END {
print "Sum: ", sum;
print "Average:", sum/NR;
}
The first action adds the value of the second field
of each line to the variable "sum". Because awk
initializes all variables to the null string, which is
treated as zero in arithmetic, "sum" starts at zero.
The keyword END before the second action causes awk
to perform that action after all of the input file
has been read. The variable NR, which is used to
calculate the average, is a special variable con-
taining the number of records (lines) that have been
read.
4. To print the names of the users who have the C shell
as the initial shell:
awk -F: '/csh/{print $1}' /etc/passwd
RELATED INFORMATION
The following commands: "lex," "grep," and "sed."
The printf subroutine in AIX Operating System Technical
Reference.
The "Overview of International Character Support" in Man-
aging the AIX Operating System.
The discussion of awk in AIX Operating System Programming
Tools and Interfaces.