awk(1) awk(1)
NAME
awk - scans a file for lines that match a specific pattern
SYNOPSIS
awk [-Ffield-separator] 'pattern-action...' [[-v]
variable=value]... [file]...
awk [-f awk-source-file] [-Ffield-separator] [[-v]
variable=value]... [file]...
ARGUMENTS
-f awk-source-file
Specifies the file containing the instructions that awk
should interpret.
-Ffield-separator
Specifies the character to be treated as the field
separator when awk parses a record into fields.
file Specifies the file or files containing text data to be
processed by awk.
pattern-action
Specifies an awk instruction, which is provided in the
form of a pattern followed by an action enclosed in
braces:
pattern {action}
[-v] variable=value
Specifies the value of an awk variable that is
established for use in the main sections of an awk
program, which consists of any number of pattern-action
arguments. If the -v option is present, the variable
is also available in the BEGIN (initialization) section
of an awk program.
DESCRIPTION
awk is well suited to text-parsing, report-generation, and
record-validation tasks.
awk programs typically contain a brief list of
instructions that specify text-scanning and text-
manipulation functions.
The standard operation of awk is to scan each input file
once, looking for matches between each input record and any
of a set of patterns that you supply. These pattern
instructions are accompanied by action instructions.
Sometimes the action instructions merely establish settings
that affect text processing that is undertaken by awk as
part of its standard operation, such as the parsing of
records into fields.
So that text patterns can be sought in specific positions in
an input record, awk splits the input record into fields at
every occurrence of a field-separator character. After an
input record is split into fields, each field is assigned to
a field variable, such as $1, $2, $3, and so forth. These
variables can be used to reference input fields either in
the pattern or the action portion of a pattern-action
argument.
You can obtain a measure of control over the field-parsing
function by specifying your own field separator for parsing
purposes. The default field separator is white space (tabs
or spaces). You can change this separator by making a
different assignment to the variable FS, or through the
command line by specifying a field-separator character along
with the -F option. To ensure that your own field separator
takes effect before any input records are parsed into
fields, use the -F construct or place the assignment in an
action associated with the BEGIN pattern. (See the example
at the end of ``Patterns,'' later in the ``Description''
section.)
(A regular expression can also be assigned to the FS
variable, in which case the field delimiter can be any one
of the possible values that match the regular expression.)
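As a minimal sketch of the field-separator settings described above, the following commands split sample records with -F, with an assignment to FS in a BEGIN action, and with a regular expression assigned to FS. The sample records and shell variable names are made up for illustration.

```shell
# Sample colon-separated record (hypothetical data for illustration).
record='root:x:0:0:Super User:/root:/bin/sh'

# Set the separator on the command line with -F.
via_flag=$(printf '%s\n' "$record" | awk -F: '{ print $1, $7 }')

# Set the separator in a BEGIN action, before any record is parsed.
via_begin=$(printf '%s\n' "$record" | awk 'BEGIN { FS = ":" } { print $1, $7 }')

# Assign a regular expression: one or more commas or semicolons.
via_regex=$(printf 'a,b;;c\n' | awk 'BEGIN { FS = "[,;]+" } { print NF, $3 }')

printf '%s\n' "$via_flag" "$via_begin" "$via_regex"
```

The first two commands print the same two fields; the third reports three fields because the doubled semicolon counts as a single delimiter.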
Although it looks like a field reference, $0 refers to the
entire input record, with field delimiters unstripped.
For the purposes of documenting syntax, a pattern and its
associated actions are considered one pattern-action. As
shown in the first syntax description in the ``Synopsis''
section, pattern-action arguments can be supplied directly
on the command line. Alternatively, you can specify the -f
option so that pattern-action arguments can be placed inside
an awk program file, as shown in the second syntax
description (see SYNOPSIS). In the latter case, replace
awk-source-file with the name of the program file with the
awk instructions you want to use.
Any time an input record contains a substring that is sought
as specified by pattern, awk performs the associated action.
The text of an input record that is matched by a pattern can
be accessed easily through references to the variables $0,
$1, $2, and so forth.
Input records can be acted upon immediately or handled less
directly. An example of an immediate action is the printing
of the contents of a matching input record as soon as it is
encountered. An example of a less immediate action is
storing a record in a variable when it is first encountered,
then printing it later if later conditions warrant it, such
as when the contents of subsequent records invalidate it and
an error message is desired.
A stored value persists until it is changed by another
portion of the same pattern-action or by an entirely
different pattern-action. Such assignments permit actions
to be gated not only by the text of the input record being
scanned but also through the stored text drawn from previous
input records.
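The following sketch illustrates an action that depends on stored text from an earlier record: every record is saved in a variable, and the saved copy is printed only when a later record signals an error. The input lines and the variable name prev are invented for the example.

```shell
# Hypothetical log lines; the record before the ERROR line is reported.
out=$(printf 'step one\nstep two\nERROR: disk full\nstep three\n' | awk '
    /^ERROR/ { print "before error: " prev }   # use the stored record
             { prev = $0 }                     # remember every record
')
printf '%s\n' "$out"
```

The order of the two pattern-action pairs matters here: the stored value is printed before it is overwritten by the current record.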
Command-Line Options
Either pattern-action arguments are specified inside the awk
command lines as shown in the first syntax description line,
or they are supplied in a file through specification of file
arguments along with the -f option, as shown in the second
syntax description line. When pattern-action arguments all
appear in the command line, they should be formed into one
string enclosed in single quotation marks ('). The
quotation marks protect them from being interpreted by the
shell. Refer to the awk chapter in A/UX Programming
Languages and Tools, Volume 2 for more information about
shell and awk cooperation.
Because of the single quotation marks, references to shell
variables inside the awk program are not substituted by the
shell. To pass the values of shell variables to awk, assign
them to awk variables on the command line.
Variables that are initialized on the command line provide a
means of passing parameter values between the shell and awk.
The most common use for passed parameters is to access the
values of positional variables available from within shell
scripts ($1, $2, and so forth). The format of these
assignments is similar to that of variable assignments,
except that an unescaped space cannot be used on either side
of the equal sign, as follows:
awk -f awkfile variable1=x variable2=$1 datafile
If the parameter assignment is preceded by a -v option, the
value so assigned is made available even in the BEGIN
(initialization) section of the awk program. Otherwise, the
value is not assigned to the variable until after the BEGIN
section has been evaluated.
Like input files, the passed parameters are also evaluated
in the order in which they appear: Passed parameters that
are specified after an input file will not be available
while the system is processing that input file. Passed
parameters that are specified before any number of input
files will be available when processing those input files.
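A short sketch of these rules follows, assuming an awk that accepts the -v option as a separate word before the assignment. The temporary file and the variable name tag are hypothetical.

```shell
# Hypothetical two-line input file created for the demonstration.
tmp=${TMPDIR:-/tmp}/awk_assign_demo.$$
printf 'one\ntwo\n' > "$tmp"

# With -v, the value is already set when the BEGIN action runs.
with_v=$(awk -v tag=early 'BEGIN { print "BEGIN sees [" tag "]" }' "$tmp")

# A plain assignment argument is not processed until after BEGIN.
without_v=$(awk 'BEGIN { print "BEGIN sees [" tag "]" }
                 { print tag ": " $0 }' tag=late "$tmp")

# An assignment placed between files applies only to the files after it.
between=$(awk '{ print tag ": " $0 }' tag=first "$tmp" tag=second "$tmp")

rm -f "$tmp"
printf '%s\n' "$with_v" "$without_v" "$between"
```

In the second command the BEGIN action prints empty brackets, while the records that follow carry the value late.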
If no input file is specified, the standard input is read
until exhausted. When several input files are specified,
they are read in the order in which they are specified. If
the shorthand notation for standard input (-) is specified
as one of several file arguments, the standard input is also
read in the order in which it is specified.
Patterns
The pattern portion of a pattern-action argument often
involves the scanning of text for occurrences of a
particular text pattern. These patterns are specified
through a pattern-seeking template, better known as a
regular expression. For a more detailed explanation of
regular expressions, refer to ed(1).
Regular expressions must be surrounded by slashes. The
format for a regular expression is
/character-col1... character-colN/
where character-col1 through character-colN represent the
first through last characters to seek before a substring is
considered ``matched.''
Besides supplying a normal character to replace
character-col1 and other character positions, you can use a
special or wildcard character, such as the period, which
matches any character at that position. An asterisk matches
zero or more occurrences of the character that precedes it.
Other special characters are the caret (^) and dollar sign
($), which ``match'' the beginning of a line and the end of
a line, respectively. The only sensible place to insert the
caret is at the beginning of pattern. Likewise, the only
sensible place to insert the dollar sign is at the very end
of pattern.
Besides supplying a single character to replace
character-col1 and other character positions, you can supply
a character range or a character list enclosed in brackets.
Thus,
/^[A-Z][aeiou]/
evaluates as true for all input records that start with an
uppercase character followed by a lowercase vowel.
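As an illustration of the notation just described, the sketch below runs the bracket pattern against a few invented words, and then runs a second pattern that combines both anchors with an asterisk.

```shell
# Records that start with an uppercase letter followed by a lowercase vowel.
vowel=$(printf 'Table\nTree\nsalt\nDone\n' | awk '/^[A-Z][aeiou]/')

# An asterisk applies to the preceding character: a, any number of b, then c.
star=$(printf 'ac\nabbc\nadc\n' | awk '/^ab*c$/')

printf '%s\n' "$vowel" "$star"
```

Because no action is given, each matching record is printed unchanged.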
The pattern portion of the pattern-action argument can be
any expression, including ones that do not involve pattern-
seeking. For example,
$1 > 0 { print }
is a valid pattern-action argument that prints all input
records with a first field that is greater than 0.
Pattern expressions often test for the presence of certain
text patterns, either within the entire input record or
within one or more fields in an input record. Field-scoped
searches require one of the ``pattern-seeking'' operators
and a regular expression, as follows:
$0 ~ /Employee/ { action... }
$3 ~ /Employee/ { action... }
If you search the entire input record for matching strings,
you do not have to supply the $0 ~ portion of the line,
since this portion will be assumed when a regular expression
is supplied by itself as the pattern. This convention makes
the following patterns equivalent:
$0 ~ /Employee/
/Employee/
To seek a contiguous set of input records starting from a
record that matches pattern1 and ending with a record that
matches pattern2, specify two regular expressions separated
by a comma, as follows:
/pattern1/,/pattern2/ { action... }
The action is performed for every input record from an
occurrence of the first pattern through the next occurrence
of the second pattern, inclusive.
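The following sketch shows a range pattern applied to invented input that contains two start/stop pairs. The record number is printed with each line so that the selected spans are easy to see.

```shell
# Two start/stop pairs separated by a record that belongs to neither.
out=$(printf 'a\nstart\nb\nstop\nc\nstart\nd\nstop\n' |
      awk '/start/,/stop/ { print NR ": " $0 }')
printf '%s\n' "$out"
```

Records 1 and 5 fall outside both spans and are not printed; the records that match the two patterns are printed along with those between them.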
The special patterns BEGIN and END can be used to establish
actions to be taken before the first input record is read
and after the input stream is exhausted. For example, a tab
can be made the field separator (exclusively) with
BEGIN { FS = "\t" }
Actions
A pattern-action argument has the form
pattern { action }
A missing {action} argument triggers the printing of
matching input records; a missing pattern argument causes
the associated action to be performed for every input record
(as if every input record matched the missing pattern). An
action argument is a sequence of statements. A statement
can be one of the following code fragments:
if ( conditional ) statement [ else statement ]
while ( conditional ) statement
for ( expression ; conditional ; expression ) statement
break
continue
{ [ statement ]... }
variable = expression
next
exit
Statements are terminated by semicolons, newline characters,
or right braces. Expressions take on string or numeric
values as appropriate and are built with the operators +, -,
*, /, %, and ``concatenation'' (indicated by a blank). The
C operators ++, --, +=, -=, *=, /=, and %= are also
available in expressions. Variables can be scalars, array
elements (denoted x[i]), or fields. Variables are
initialized to the null string. Array subscripts can be any
string, including strings generated automatically when
numeric expressions are used as subscripts. String
constants are enclosed in double quotation marks (").
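A brief sketch of these expression rules: an array subscripted by strings accumulates quantities, the += operator updates a scalar, and the END action builds its output by concatenation. The item names and quantities are made up.

```shell
# Hypothetical input: an item name followed by a quantity.
out=$(printf 'pear 3\nfig 2\npear 4\n' | awk '
    { qty[$1] += $2      # array element subscripted by the item name
      total += $2 }      # scalar, initially null, treated as 0 here
    END { print "pear=" qty["pear"] " fig=" qty["fig"] " total=" total }
')
printf '%s\n' "$out"
```

Neither qty nor total is declared or initialized; the null initial value is treated as 0 the first time += is applied.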
The next and exit statements affect control flow. Use exit
to terminate processing of the input. Use next to skip any
remaining actions that would have been gated for the current
input record, returning to the beginning of the awk program
so that processing can continue with the next input record.
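As a small sketch of both statements, the program below skips comment records with next and stops at a marker record with exit. The input and the STOP marker are invented for the example.

```shell
out=$(printf '# comment\nalpha\nbeta\nSTOP\ngamma\n' | awk '
    /^#/    { next }   # skip this record; the rules below are not tried
    /^STOP/ { exit }   # stop reading input at the marker record
            { print NR ": " $0 }
')
printf '%s\n' "$out"
```

The record after the marker is never read, so only the two records between the comment and the marker are printed.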
Output Functions
The output functions include the following statements:
print [expression] [[,] expression]...
printf(format-string, expr [, expr]...)
Both of these statements can print to files as well as the
standard output, as described by the more general syntax
print-command [>file]
Use the print statement to print the results of expression
arguments followed by the output record separator character
given by the variable ORS. If print is specified without
any accompanying arguments, the entire input record is
printed. If several expressions are supplied, separated by
commas, the result of each expression is printed, separated
by the output field separator given by the variable OFS.
See ``Built-in Variables'' later in the ``Description''
section for more built-in variables.
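The difference between the two output statements can be sketched as follows, using a made-up inventory record. OFS is set to a tab so that the separator inserted by print is visible.

```shell
# Hypothetical one-line inventory record: name, quantity, unit price.
out=$(printf 'widget 4 2.5\n' | awk '
    BEGIN { OFS = "\t" }
    {
        # The comma inserts OFS between values; ORS ends the line.
        print $1, $2 * $3
        # printf adds no separators of its own; the format supplies them.
        printf("%-8s|%6.2f\n", $1, $2 * $3)
    }
')
printf '%s\n' "$out"
```

Either statement could be followed by >file to send the same text to a file instead of the standard output.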
Use the printf statement to format and print the result of
expr arguments in accordance with format-string (see
printf(3S)). Another way to place data on the awk output
stream is to use the system function
system(expression)
In this case, expression must compute to a valid shell
command so that it can be executed outside the context of
awk. Any output resulting from the execution of the command
is inserted into the output of awk. This function returns
the exit status for the command so that you can test for
successful execution by testing for a 0 exit value. (This
is the case for most, but not all, commands.)
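A minimal sketch of the system function follows. It assumes that the echo and test commands are available to the shell that awk starts.

```shell
out=$(awk 'BEGIN {
    # Output of the command appears in the output of awk.
    status = system("echo hello from the shell")
    print "exit status: " status
    # A 0 exit value indicates success for most commands.
    if (system("test -d /") == 0)
        print "root directory exists"
}')
printf '%s\n' "$out"
```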
Input Functions
Besides being supplied as command-line arguments, multiple
input files are supported through the getline function. This
record-reading function can be one of the actions associated
with a BEGIN or an END pattern, as well as any other
patterns. A typical use is to associate this action with the
BEGIN pattern to initialize the contents of an array from
static data stored in an external file. Since the return
value is 1 as long as a record is read (0 at end of file and
-1 on error), you can use the following code fragment to
establish the file table:
BEGIN {
count = 0
# store each record of "table" in array[1], array[2], ...
while ( (getline line <"table") > 0 )
array[++count] = line
}
.
.
.
This command can be specified in any of four different
forms:
getline
getline variable
getline <file
getline variable <file
The first form reads the next input record. Unlike the next
statement, with this form control remains at the place where
getline occurs within the current pattern-action argument
and proceeds to any pattern-action arguments that follow,
until the end-of-file character is reached.
The second form behaves in the same way except that certain
variables ($0, $1, and so forth) are not reset and the
content of the input record is assigned to variable
unstripped of field separators.
The third and fourth forms are the same as the first and
second forms except that the input record is read from file.
If file is an explicit reference to a file, enclose it in
quotation marks to make it a string constant. (Otherwise it
is likely to be interpreted as a variable that is
dynamically initialized to an empty string.) To switch
between many different input files, use the close(file)
function before opening any new files for reading.
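The sketch below exercises the fourth form of getline and the close function. The temporary file, the awk variable table that holds its name, and the array name color are all hypothetical, and the -v option is assumed to be accepted as a separate word.

```shell
# Hypothetical lookup file created for the demonstration.
tmp=${TMPDIR:-/tmp}/awk_getline_demo.$$
printf 'red\ngreen\nblue\n' > "$tmp"

out=$(awk -v table="$tmp" 'BEGIN {
    # Read each record of the file into a variable until input runs out.
    while ((getline line < table) > 0)
        color[++n] = line
    close(table)                       # the next read starts from the top
    if ((getline first < table) > 0)   # reads the first record again
        print n " colors; first is " first "; last is " color[n]
}')

rm -f "$tmp"
printf '%s\n' "$out"
```

Here the file name is held in a variable, so it is not quoted inside the program; a literal name would be written as a string constant such as "table".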
Other String Functions
Here are the built-in functions for strings:
index(string1,string2)
Returns the index at which string2 first occurs inside
string1 or 0 if there is no match.
length(string)
Returns the length of its argument taken as a string,
or of the whole input record if no argument is
supplied.
match(string,pattern)
Returns the index at which the regular expression
pattern first occurs inside string while setting the
variables RSTART and RLENGTH. Returns 0 if there is no
match.
split(string,array,separator)
Splits string into fields that are assigned to elements
in array with subscripts 1, 2, and so on. A new field
is created at each occurrence of separator within
string. It returns the number of fields that were
parsed.
substr(string,position,length)
Returns the length-character substring of string that
begins at position position.
sprintf(format-string,expr[,expr]...)
Formats expressions in accordance with format-string
(described in printf(3S)), returning the resulting
string.
sub(pattern,replacement[,variable])
gsub(pattern,replacement[,variable])
Performs text substitution (search-and-replace)
functions either for the first matched substring (sub)
or globally for every matched substring (gsub). The
substitution is made in variable, or in the entire
input record ($0) if variable is omitted.
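As an illustrative sketch, the following program applies several of the string functions to one made-up string. The string and the variable names are assumptions chosen for the example.

```shell
out=$(awk 'BEGIN {
    s = "2024-01-15 build failed"
    print index(s, "build")                  # position of the substring
    print length(s)                          # number of characters
    print substr(s, 1, 10)                   # first ten characters
    n = split(substr(s, 1, 10), part, "-")   # break the date at each hyphen
    print n, part[1], part[3]
    if (match(s, /[a-z]+/))                  # first run of lowercase letters
        print RSTART, RLENGTH
    gsub(/-/, "/", s); print s               # replace every hyphen
    sub(/failed/, "ok", s); print s          # replace the first match only
}')
printf '%s\n' "$out"
```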
Number Functions
Here are the built-in functions for numbers:
atan2(y,x)
Returns the arctangent of y/x in radians in the range
-π to π.
cos(radians)
Returns the cosine of the angle measure.
exp(power)
Returns e raised to the power power.
int(real)
Truncates real, returning an integer.
log(x)
Returns the natural logarithm of x.
rand()
Returns a pseudo-random number between 0 and 1.
srand([seed])
Sets the seed for the random number generator to seed
or to the time of day if seed is missing.
sin(radians)
Returns the sine of the angle measure.
sqrt(x)
Returns the square root of x.
User-Defined Functions
User functions can be called just as built-in functions are,
once they are declared with
function name(arg...) { body }
Within body, the statement return expression can be used to
cause the user function to return the value of the supplied
expression.
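A minimal sketch of a user function follows. The function name max and the sample input are invented; adding 0 to each argument forces a numeric comparison.

```shell
out=$(printf '3 9\n12 5\n' | awk '
    # Return the larger of two values, compared as numbers.
    function max(a, b) { if (a + 0 > b + 0) return a; return b }
    { print max($1, $2) }
')
printf '%s\n' "$out"
```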
Expressions
This discussion of expressions applies within action
statements and within patterns. Only certain action
statements can include expressions; refer to ``Actions,''
earlier in the ``Description'' section for more information.
Parentheses can be used to establish operation precedence
for expressions containing several operators.
Expressions can be string or number constants, variables, or
field references as well as combinations of these joined by
equal (==), not equal (!=), greater-than (>), less-than (<),
greater-than-equal (>=), and less-than-equal (<=). Because
they produce Boolean results (true or false), two or more of
the preceding comparison operations can be related by means
of Boolean operators: logical AND (&&), logical OR (||), and
NOT (!).
To test for the existence of various substrings in a string,
specify the string followed by one of the pattern-seeking
operators (~ and !~) followed by a regular expression. Use ~
to test whether the string contains a substring that is
sought by the regular expression supplied. Use !~ to test
whether the string does not contain a substring that is
sought by the regular expression supplied.
The following example uses all of these types of operations:
{ if ( NR > 1 && $0 ~ /+/ ) print }
In the next line of code, which is equivalent to the one
just given, the operations have been moved into the pattern
area:
$0 ~ /+/ && NR > 1 { print }
No operation exists specifically to request conversions
between numbers and strings, or between strings and numbers.
To force an expression to be treated as a number, add 0 to
it; to force it to be treated as a string, concatenate the
null string ("") to it.
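The effect of these conversions can be sketched with two string constants that compare differently as strings and as numbers. The variable names are arbitrary.

```shell
out=$(awk 'BEGIN {
    a = "10"; b = "9"      # string constants
    if (a < b)             # compared character by character
        print "as strings: 10 sorts before 9"
    if (a + 0 > b + 0)     # adding 0 forces a numeric comparison
        print "as numbers: 10 is greater than 9"
    n = 3
    s = n ""               # concatenating the null string yields a string
    print s s
}')
printf '%s\n' "$out"
```

The last line prints 33 rather than 6 because the two values are joined by concatenation, not added.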
Built-in Variables
Other variable names with special meanings include
NF the number of fields in the current record
NR the ordinal number of the current record
FNR the ordinal number of the current record relative to
the beginning of the current input file
FILENAME
the name of the current input file
OFS the output field separator (blank by default)
ORS the output record separator (the newline character by
default)
OFMT the output format for numbers (%.6g by default)
ARGC a variable that is set to the total number of command-
line arguments that were offered on the awk command
line
ARGV[]
a built-in array that is set to the command name (awk)
at index 0, the first command-line argument at index 1,
and so on up to the last command-line argument at index
n
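The following sketch prints several of these variables for two small files and then prints ARGC and ARGV from a BEGIN action. The temporary files and the operands first and second are hypothetical; the second command names no real files because a program that consists only of a BEGIN action reads no input.

```shell
# Two hypothetical input files.
dir=${TMPDIR:-/tmp}
a=$dir/awk_vars_a.$$
b=$dir/awk_vars_b.$$
printf 'x y\nz\n' > "$a"
printf 'p q r\n' > "$b"

# NR counts across all files, FNR restarts in each file, NF is per record.
counts=$(awk 'BEGIN { OFS = "|" } { print NR, FNR, NF, $NF }' "$a" "$b")

# ARGV[0] is the command name, so two operands give an ARGC of 3.
args=$(awk 'BEGIN { print ARGC, ARGV[1], ARGV[2] }' first second)

rm -f "$a" "$b"
printf '%s\n' "$counts" "$args"
```

The program text and any options are not counted in ARGC and do not appear in ARGV.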
Overview of awk Processing and Preprocessing
For each input record, awk performs the ``matched''
pattern-action operations. Thus, the actions that awk
performs usually vary with each input record. The effect is
similar to that of creating a number of different programs,
where each one is a particular accumulation of lines from a
master collection. Each of the accumulated subprograms is
run whenever its triggering records show up in the input
stream, possibly many times over. Through careful selection
of patterns, these subprograms can be closely tailored to
the kind of data that is present in the input record.
When the input data is not already partitioned nicely into
fields and records, the use of preprocessing can be useful
to transform the data into more regular units from which
meaning is more easily extracted. For text data that
already contains field separators, the field values that
indicate variant records are easily detected by field
references within patterns when they can be expected at a
fixed field location. (See ``Patterns,'' earlier in the ``Description''
section.) For data that is not already subdivided or
regularized, preprocessing with sed or awk is often
desirable so that units of data that affect the meaning of
other units of data can be incorporated into the same
record, or so that independently meaningful units of data
are separated into new records.
When you are combining spans of data into the same record,
it is often desirable to place context-establishing data at
the beginning so that certain patterns can be sought in
certain positions by using the corresponding features of
regular expressions, such as the caret (^).
In cases involving irregular data, the preprocessing concern
of greatest import is the generation of appropriate record
and field boundaries within the data. For instance, each
pass of preprocessing can be designed so that a particular
output field (or a particular record within a set of
records) will be set to an appropriate value for identifying
the context of a certain amount of data. For example, the
nesting of procedures inside braces is more easily unraveled
if the beginning and ending braces always occupy the first
field of an input record, or a dedicated input line.
EXAMPLES
The following command prints lines from the file data that
are longer than 72 characters:
awk 'length > 72' data
The following command prints the first two fields of each
line in reverse order:
awk '{ print $2, $1 }' filea
awk '{ s += $1 }
END {print "sum is", s,
"average is", s/NR }' filea
adds up the first column and prints the sum and average.
awk '{ for (i = NF; i > 0; --i)
print $i }' filea
prints all the fields of each line in reverse order. The
fields are printed one per line in this example.
awk '/start/, /stop/' filea
prints all records between start/stop-pattern pairs for
every such pair in the file.
awk '
$1 > max { max = $1 }
END { print "Max field 1 value=" max }'
prints the maximum value that appears in field 1 across all
input records read from the standard input.
FILES
/bin/awk
Executable file
SEE ALSO
grep(1), lex(1), sed(1)
``awk Reference,'' in A/UX Programming Languages and Tools,
Volume 2
The awk Programming Language by A.V. Aho, B.W. Kernighan,
and P.J. Weinberger (Reading, MA: Addison-Wesley, 1988)
January 1992