awk(1): AIX PS/2 1.2.1



awk, nawk, oawk(1,C)        AIX Commands Reference         awk, nawk, oawk(1,C)



-------------------------------------------------------------------------------
awk, nawk, oawk



PURPOSE

Finds lines in files matching specified patterns and performs specified actions
on them.

SYNTAX

                          +-----------+   +----------+
       +----------+   +-'-|          2|---|         3|-'-+
awk ---|          |---|  ^+- pattern -+   +- action -+|  |--->
       |         1|   |  +----------------------------+  |
       +- -Fchar -+   +----------- -fprogfile -----------+

                                          +------------------+   +--------+
                                      >---|                  |---|        |---|
                                          |                  |   |        |
                                          +- variable=value -+   +- file -+
                                            ^                |     ^      |
                                            +----------------+     +------+
-----------------
1 The default field separator is white space (blanks and tabs).
2 The default pattern is every line.
3 The default action is to print the line.

DESCRIPTION

The awk command is a more powerful pattern matching command than the grep
command.  It can perform limited processing on the input lines, instead of
simply displaying lines that match.  Some of the features of awk are:

  o It performs convenient numeric processing.
  o It allows variables within actions.
  o It allows general selection of patterns.
  o It allows control flow in the actions.
  o It does not require any compiling of programs.

AIX Version 1.2.1 provides an enhanced version of awk called nawk (for "new
awk").  This new version is similar to that provided by AIX Version 3.1 and
recent releases of AT&T UNIX System V.  The AIX 1.2 awk command is provided as
oawk (for "old awk").  The awk command is linked to oawk for backward
compatibility with AIX 1.2.

The nawk command provides enhanced handling of files and pipes and better error
messages for debugging.  The nawk command also supports the Japanese language,
which uses the multi-byte character set (MBCS) facilities of AIX 1.2.1.  The
oawk (awk) command has not been enhanced to support MBCS pattern matching.  If
this functionality is required, use nawk.



Processed November 8, 1990   awk, nawk, oawk(1,C)                             1

Differences between oawk and nawk are noted in the text where applicable.
Except where noted, awk refers to both oawk and nawk in this document.

For a detailed discussion of awk, see AIX Operating System Programming Tools
and Interfaces.

The awk command reads files in the order stated on the command line.  If you
specify a file name as - (minus) or do not specify a file name, awk reads
standard input.

The awk command searches its input line by line for patterns.  When it finds a
match, it performs the associated action and writes the result to standard
output.  In nawk, the pattern can contain Japanese characters.  Enclose
pattern-action statements on the command line in single quotation marks to
protect them from interpretation by the shell.

The awk command first reads all pattern-action statements, then it reads a line
of input and compares it to each pattern, performing the associated actions on
each match.  When it has compared all patterns to the input line, it reads the
next line.
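The order of evaluation can be seen with two pattern-action statements that both match the first input line.  This is a minimal sketch, assuming a POSIX-conforming awk (nawk syntax) run from a POSIX shell; the shell function name classify is hypothetical.

```shell
# Hypothetical helper: run two pattern-action statements on every input line.
# awk tests both patterns against line 1, then both against line 2, and so on,
# so the output for one line is complete before the next line is read.
classify() {
  awk '
    /a/ { print NR ": contains a" }
    /b/ { print NR ": contains b" }
  '
}

printf 'ab\nb\n' | classify
# 1: contains a
# 1: contains b
# 2: contains b
```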

The awk command treats input lines as fields separated by spaces, tabs, or a
field separator you set with the FS variable.  Fields are referenced as $1, $2,
and so on.  $0 refers to the entire line.
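As a sketch of field references (assuming a POSIX-conforming awk; the function name fields is hypothetical), the program below prints the first field, the last field ($NF, the field whose number is the value of NF), and the field count.

```shell
# Hypothetical helper: print the first field, the last field, and the number
# of fields. With the default FS, runs of blanks and tabs separate fields.
fields() {
  awk '{ print $1, $NF, NF }'
}

printf 'alpha  beta\tgamma\n' | fields
# alpha gamma 3
```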

On the awk command line, you can assign values to variables as follows:

variable=value
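A minimal sketch of a command-line assignment, assuming a POSIX-conforming awk: the assignment operand appears before the file operand - (standard input), so the variable is set before that input is read.  The function name tagged and the variable name tag are hypothetical.

```shell
# Hypothetical helper: prefix each input line with the value of "tag",
# which is assigned on the command line rather than inside the program.
# The assignment precedes "-", so it takes effect before standard input
# is read.
tagged() {
  awk '{ print tag ": " $0 }' tag="$1" -
}

printf 'one\ntwo\n' | tagged INFO
# INFO: one
# INFO: two
```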

Pattern-Matching Statements

Pattern-matching statements follow the form:

  pattern       { action }

If a pattern lacks a corresponding action, awk writes each line that matches
the pattern to standard output.  If an action lacks a corresponding pattern,
awk performs the action on every line.

ACTIONS:  An action is a sequence of statements that follow C Language syntax.
These statements can include:

statement       format
if              if ( conditional ) statement [ else statement ]
while           while ( conditional ) statement
for             for ( expression ; conditional ; expression ) statement
for             for ( variable in array ) statement (1)
break           break
continue        continue
close           close(filename), close(command) (2)
                breaks the connection between print and filename or
                command (nawk only)
(assignment)    variable = expression
print           print [expression-list] [> expression]
printf          printf format [, expression-list] [> expression] (oawk)
printf          printf format [, expression-list] [> expression |
                >> expression | | command] (nawk) (3)
next            next
exit            exit [expression] (4)
(compound statement)
                { statement ... }

__________
1 variable may contain Japanese characters in nawk only.
2 filename and command may contain Japanese characters in nawk only.
3 format, expression-list, and command may contain Japanese characters in nawk
  only.
4 expression may contain Japanese characters in nawk only.

Statements can end with a semicolon, a new-line character, or the right brace
enclosing the action.

If you do not supply an action, awk displays the whole line.  Expressions can
have string or numeric values and are built using the operators "+", "-", "*",
"/", "%", a blank for string concatenation, and the C operators "++", "--",
"+=", "-=", "*=", "/=", and "%=".
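The sketch below combines a for statement, an if-else statement, the += operator, and string concatenation by juxtaposition.  It assumes a POSIX-conforming awk (nawk syntax); the function name sumline and the threshold 10 are hypothetical.

```shell
# Hypothetical helper: add up the numeric fields of each line with a for loop
# and "+=", classify the total with if-else, then build the output line by
# string concatenation (operands separated by a blank are joined).
sumline() {
  awk '{
    total = 0
    for (i = 1; i <= NF; i++) total += $i
    if (total > 10) { label = "big" } else { label = "small" }
    print "line " NR ": " total " (" label ")"
  }'
}

printf '1 2 3\n4 5 6\n' | sumline
# line 1: 6 (small)
# line 2: 15 (big)
```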

In statements, variables may be scalars, array elements (denoted x[i]), or
fields.  Variable names can contain uppercase and lowercase alphabetic letters,
underscores, and digits (0-9); nawk variable names may also contain Japanese
characters.  Variable names cannot begin with a digit.  Variables are
initialized to the null string, which has the numeric value 0.  Array
subscripts may be any string; they do not have to be numeric.  This allows for
a form of associative memory.  String constants in expressions must be enclosed
in double quotation marks.
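Associative arrays make counting easy.  A minimal sketch, assuming a POSIX-conforming awk; the function name wordcount is hypothetical, and the output is piped through sort because the order of a for (variable in array) loop is not defined.

```shell
# Hypothetical helper: count how often each word occurs, using the word
# itself as the array subscript. count[w] starts out as the null string,
# which "++" treats as 0.
wordcount() {
  awk '
    { for (i = 1; i <= NF; i++) count[$i]++ }
    END { for (w in count) print w, count[w] }
  ' | sort
}

printf 'red blue\nred green red\n' | wordcount
# blue 1
# green 1
# red 3
```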

There are several variables with special meaning to awk.  They include:

ARGC           Number of command-line arguments.
ARGV           Array of command-line arguments.  Elements of ARGV may contain
               Japanese characters in nawk only.
FILENAME       The name of the current input file.  In nawk only, may contain
               Japanese characters.
FNR            Record number in current file.
FS             Input field separator (default is a blank).  This separator must
               be an ASCII character, in oawk, but may contain Japanese
               characters in nawk.
NF             The number of fields in the current input line (record).
NR             The number of the current input line (record).
OFMT           The output format for numbers (default "%.6g").
OFS            The output field separator (default is a blank).  This separator
               must be an ASCII character, in oawk, but may contain Japanese
               characters in nawk.
ORS            The output record separator (default is a new-line character).
               This separator must be an ASCII character, in oawk, but may
               contain Japanese characters in nawk.
RLENGTH        Length of string matched by match function.
RS             Controls the input record separator (default is \n).
RSTART         Start of string matched by match function.
SUBSEP         Subscript separator (default is \034).
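The difference between NR and FNR shows up only when awk reads more than one file.  This sketch assumes a POSIX-conforming awk and a system with mktemp; the function name show_counts and the file names f1 and f2 are hypothetical.

```shell
# Hypothetical helper: for each line, print the current file name, NR (lines
# read so far across all files), and FNR (lines read so far in this file).
show_counts() {
  awk '{ print FILENAME, NR, FNR }' "$@"
}

# Two small sample files in a temporary directory, removed afterward.
dir=$(mktemp -d)
printf 'a\nb\n' > "$dir/f1"
printf 'c\n' > "$dir/f2"
(cd "$dir" && show_counts f1 f2)
# f1 1 1
# f1 2 2
# f2 3 1
rm -r "$dir"
```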

Because actions operate on fields, the original spacing between input fields is
not preserved when fields are printed individually.
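For example, printing two fields separated by a comma joins them with OFS rather than with the original spacing.  A minimal sketch, assuming a POSIX-conforming awk; the function names first_two and first_two_csv are hypothetical.

```shell
# Hypothetical helper: print the first two fields. The comma in print inserts
# OFS (a single blank by default), so the original run of blanks is lost.
first_two() {
  awk '{ print $1, $2 }'
}

# Same, but with OFS set to a comma in a BEGIN action.
first_two_csv() {
  awk 'BEGIN { OFS = "," } { print $1, $2 }'
}

printf 'a    b\n' | first_two       # a b
printf 'a    b\n' | first_two_csv   # a,b
```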

The print statement writes its arguments to standard output, separated by the
output field separator and terminated by the output record separator.  An empty
expression list stands for the whole line.  The printf statement formats its
expression list according to the format of the printf subroutine (see AIX
Operating System Technical Reference) and writes the result to standard output.
You can redirect the output using the print > "filename" or printf > "filename"
statements.

Example:  To redirect the output of a print statement to a file named "myfile":

  awk '{ print > "myfile" }'

or, to redirect the output of a printf statement:

  awk '{ printf "%s\n", $0 > "myfile" }'

You have two ways to designate a character other than white space to separate
fields.  You can use the -Fc flag on the awk command line, or you can start
progfile with:

  BEGIN { FS = "c" }

Either method changes the field separator to c.
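The two methods can be compared side by side.  This sketch assumes a POSIX-conforming awk; the function names by_flag and by_begin and the sample record are hypothetical stand-ins for a real /etc/passwd entry.

```shell
# Hypothetical helpers: two equivalent ways to split lines on colons.
by_flag()  { awk -F: '{ print $1, $7 }'; }
by_begin() { awk 'BEGIN { FS = ":" } { print $1, $7 }'; }

# A made-up /etc/passwd-style record used as sample input.
sample='root:x:0:0:root:/root:/bin/sh'
printf '%s\n' "$sample" | by_flag    # root /bin/sh
printf '%s\n' "$sample" | by_begin   # root /bin/sh
```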

There are several built-in functions that can be used in awk actions.

atan2(y,x)               Takes the arctangent of y/x in the range -pi to pi
                         (nawk only).
cos(x)                   Takes cosine of x, with x in radians (nawk only).
exp(n)                   Takes the exponential of its argument.
getline                  Reads the next line of standard input (oawk).  The
                         nawk version can also read from a pipe or an input
                         file.  An optional var parameter will store the input
                         (nawk only).
gsub(r,s)                Substitute s for r globally in $0, return number of
                         substitutions made (nawk only).
index(s,t)               Return first position of string t in s, or 0 if t is
                         not present.
int(n)                   Takes the integer part of its argument.
length                   Returns the length of the whole line if there is no
                         argument or the length of its argument taken as a
                         string.
log(n)                   Takes the natural (base e) logarithm of its argument.
match(s,r)               Test whether s contains a substring matched by r,
                         return index or 0; sets RSTART and RLENGTH (nawk
                         only).
n=split(s,array,sep)     Splits string s into array[1] ... array[n] and
                         returns the number of elements.  If present, sep is
                         the field separator; otherwise, the variable FS is
                         used.
rand()                   Returns a random number r, where 0 <= r < 1 (nawk
                         only).
sin(x)                   Takes sine of x, with x in radians (nawk only).
sqrt(n)                  Takes the square root of its argument.
srand(x)                 Sets x as the new seed for rand() (nawk only).
substr(s,m,n)            Returns the substring of s which is n characters long,
                         beginning at position m.
sprintf(fmt,expr,expr,...)
                         Formats the expressions according to the printf format
                         string fmt and returns the resulting string.
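Several of the string functions above are applied to one sample line below.  This is a sketch assuming a POSIX-conforming awk (split with a single-blank separator splits on white space, as the default FS does); the function name string_demo is hypothetical.  Note that positions in awk strings start at 1.

```shell
# Hypothetical helper applying several built-in string functions to each
# input line.
string_demo() {
  awk '{
    n = split($0, parts, " ")         # parts[1] = "hello", parts[2] = "world"
    print n, length($0)               # number of pieces, length of the line
    print index($0, "world")          # position where "world" begins
    print substr($0, 1, 5)            # first five characters
    print sprintf("%05.1f", 3.14159)  # zero-padded, one decimal place
  }'
}

printf 'hello world\n' | string_demo
# 2 11
# 7
# hello
# 003.1
```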

PATTERNS:  Patterns are arbitrary Boolean combinations of regular expressions
and relational expressions (using the "!", "||", and "&&" operators and
parentheses for grouping).  You must start and end regular expressions with
slashes (/).  You can use regular expressions like those allowed by the egrep
command (see "grep, egrep, fgrep"), including the following special characters:

*      Zero or more occurrences of the preceding character or group.
+      One or more occurrences of the preceding character or group.
?      Zero or one occurrence of the preceding character or group.
|      Either of two regular expressions (alternation).
( )    Grouping of regular expressions.

Isolated regular expressions in a pattern apply to the entire line.  Regular
expressions can occur in relational expressions.  A pattern may consist of two
patterns separated by a comma, in which case the action is performed on all
lines between an occurrence of the first pattern and the next occurrence of the
second.  Regular expressions can contain extended characters with one
exception:  range constructs in character class specifications using square
brackets cannot contain two-byte extended characters.  Individual instances of
extended characters can appear within square brackets; however, two-byte
extended characters are treated as two separate one-byte characters.
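The special characters can be combined in a single pattern.  A minimal sketch, assuming a POSIX-conforming awk; the function name pairs is hypothetical, and the anchors ^ and $ (start and end of line) are standard egrep notation not listed above.

```shell
# Hypothetical helper: print lines consisting of one or more "ab" or "cd"
# pairs, optionally followed by a single "e". ^ and $ anchor the match to
# the start and end of the line.
pairs() {
  awk '/^(ab|cd)+e?$/'
}

printf 'abcd\ncde\nabx\ne\n' | pairs
# abcd
# cde
```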

There are two types of relational expressions that you can use.  One has the
form:

expression  matchop  regular-expression

where matchop is either:  ~ (for "contains") or !~ (for "does not contain").
The second has the form:

expression  relop  expression

where relop is any of the six C relational operators:  "<", ">", "<=", ">=",
"==", and "!=".  A conditional can be an arithmetic expression, a relational
expression, or a Boolean combination of these.
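Both kinds of relational expression can be joined with &&.  This sketch assumes a POSIX-conforming awk; the function name big_numeric and the threshold 100 are hypothetical.

```shell
# Hypothetical helper: print field 1 of each line whose second field
# contains only digits (matchop ~) and whose third field exceeds 100
# (relop >).
big_numeric() {
  awk '$2 ~ /^[0-9]+$/ && $3 > 100 { print $1 }'
}

printf 'a 12 150\nb x1 500\nc 7 99\n' | big_numeric
# a
```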

You can use the special patterns BEGIN and END to perform actions before the
first input line is read and after the last input line is read, respectively.
BEGIN may only be the first pattern in progfile, and END may only be the last
pattern.

There are no explicit conversions between numbers and strings.  To force an
expression to be treated as a number, add 0 to it.  To force it to be treated
as a string, concatenate the null string ("") to it.
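The effect is visible when comparing 10 and 9, which order differently as numbers and as strings.  A minimal sketch, assuming a POSIX-conforming awk; the function name compare_fields is hypothetical.

```shell
# Hypothetical helper: compare the first two fields numerically and as
# strings. Adding 0 forces a numeric comparison; concatenating "" forces
# a string comparison.
compare_fields() {
  awk '{
    num = (($1 + 0) < ($2 + 0))   # 10 < 9 as numbers: false (0)
    str = (($1 "") < ($2 ""))     # "10" < "9" as strings: true (1)
    print num, str
  }'
}

printf '10 9\n' | compare_fields
# 0 1
```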

nawk User-Defined Functions

A nawk program can contain user-defined functions.  Such a function is defined
by a statement of the form

  function name(parameter-list)  {
      statements
  }

A function definition can occur anywhere a pattern-action statement can.  Thus,
the general form of a nawk program is a sequence of pattern-action statements
and function definitions separated by newlines or semicolons.

In a function definition, newlines are optional after the left brace and before
the right brace of the function body.  The parameter list is a sequence of
variable names separated by commas; within the body of the function these
variables refer to the arguments with which the function was called.

The body of a function definition may contain a "return" statement that returns
control and perhaps a value to the caller.  It has the form

  return expression

The expression is optional, and so is the "return" statement itself, but the
returned value is undefined if none is provided or if the last statement
executed is not a "return".

For example, a function "max" might be called like this:

  { print max($1,max($2,$3)) }  # print maximum of $1, $2, $3

  function max(m, n) {
      return m > n ? m : n
  }

The variables "m" and "n" belong to the function "max"; they are unrelated to
any other variables elsewhere in the program.
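The max example can be run as a complete program.  This sketch assumes a nawk-compatible awk that accepts user-defined functions; the shell function name max3 is hypothetical.  The function definition may follow the pattern-action statement that calls it, because awk reads the whole program before processing input.

```shell
# Runnable version of the max example: print the largest of the first
# three fields of each input line.
max3() {
  awk '
    { print max($1, max($2, $3)) }   # print maximum of $1, $2, $3

    function max(m, n) {
      return m > n ? m : n
    }
  '
}

printf '3 9 4\n7 2 5\n' | max3
# 9
# 7
```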






Updating oawk scripts to work using nawk

Note this example if you plan to convert your AIX 1.2 oawk scripts to run with
nawk:

With AIX 1.2 awk or AIX 1.2.1 oawk you could write the following awk program:

  {
     LINE = $0      # set LINE equal to the input line
     print $LINE    # print out input
  }

Note:  $ is only used with indirectly referenced variables.

With AIX 1.2.1 nawk, $ is only used with a number or a variable whose value is
a number, so you would need to write:

  {
     LINE = $0      # set LINE equal to the input line
     print LINE     # print out input
  }

Note:  Do not use a $ in front of a variable unless the value of that variable
is a number.  For example, if var="1" then $var is really $1, which is the first
field of the input line.  However, if var="a" then $var is $(a), which is not a
valid field.
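As a sketch of the supported form, the field number below comes from a command-line assignment (assuming a POSIX-conforming awk; the function name nth_field and the variable name n are hypothetical).

```shell
# Hypothetical helper: print field number n of each line. Because n holds
# a number, $n selects that field (n = 2 makes $n the same as $2).
nth_field() {
  awk '{ print $n }' n="$1" -
}

printf 'x y z\n' | nth_field 2   # y
printf 'x y z\n' | nth_field 3   # z
```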

FLAGS

-f  progfile   Searches for the patterns and performs the actions found in the
               file progfile.

-Fchar         Uses char as the field separator character (by default a blank).

EXAMPLES

  1. To display the lines of a file that are longer than 72 characters:

      awk  "length  >72"  chapter1

    This selects each line of the file "chapter1" that is longer than 72
    characters.  awk then writes these lines to standard output because no
    action is specified.

  2. To display all lines between the words "start" and "stop":

      awk  "/start/,/stop/"  chapter1

  3. To run an awk program ("sum2.awk") that processes a file ("chapter1"):

      awk  -f  sum2.awk  chapter1

    The following awk program computes the sum and average of the numbers in
    the second column of the input file:

          {
             sum += $2
          }

      END {
             print "Sum:", sum;
             print "Average:", sum/NR;
          }

    The first action adds the value of the second field of each line to the
    variable "sum".  awk initializes "sum" (and all variables) to zero before
    starting.  The keyword END before the second action causes awk to perform
    that action after all of the input file has been read.  The variable NR,
    which is used to calculate the average, is a special variable containing
    the number of records (lines) that have been read.

  4. To print the names of the users who have the C shell as the initial shell:

      awk  -F:  '/csh/{print  $1}'  /etc/passwd

  5. To send output to the more command:

      awk  '{ print | "more" }'  chapter1

  6. To count the characters, words, and lines in "chapter1" by sending the
    output to the wc command:

      awk  '{ print | "wc" }'  chapter1

RELATED INFORMATION

See the following commands:  "lex," "grep, egrep, fgrep" and "sed."

See the printf subroutine in AIX Operating System Technical Reference.

See "Introduction to International Character Support" in Managing the AIX
Operating System.

See the discussion of awk and uawk in AIX Operating System Programming Tools
and Interfaces.