⇒ awk(1) — AIX/RT 2.2.1

awk

PURPOSE

     Finds lines in files matching specified patterns and per-
     forms specified actions on them.

SYNOPSIS
     awk [ -Fc ] [ prog|-f progfile ] [ parameters ] [ files ]


DESCRIPTION

     The  awk  command is  a  more  powerful pattern  matching
     command than  the grep  command.  It can  perform limited
     processing  on the  input lines,  instead of  simply dis-
     playing lines  that match.  Some  of the features  of awk
     are:

     o   It can perform convenient numeric processing.
     o   It allows variables within actions.
     o   It allows general selection of patterns.
     o   It allows control flow in the actions.
     o   It does not require any compiling of programs.

     For  a  detailed discussion  of  awk,  see AIX  Operating
     System Programming Tools and Interfaces.

     The awk command reads files in the order stated on the
     command line.  If you specify a file name as - (minus) or
     do not specify a file name, awk reads standard input.

     The  awk command  searches  its input  line  by line  for
     patterns. When it finds a  match, it performs the associ-
     ated  action and  writes the  result to  standard output.
     Enclose pattern-action statements on  the command line in
     single quotation  marks to protect them  from interpreta-
     tion by the shell.

     The  awk command  first reads  all pattern-action  state-
     ments, then it  reads a line of input and  compares it to
     each pattern,  performing the associated actions  on each
     match.  When  it has compared  all patterns to  the input
     line, it reads the next line.

     The awk command treats input lines as fields separated by
     spaces, tabs,  or a field  separator you set with  the FS
     variable.  Fields  are referenced as  $1, $2, and  so on.
     $0 refers to the entire line.
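
     As a minimal sketch of field references, the following
     pipeline supplies one hypothetical input line with printf
     instead of a file, then prints the second field followed
     by the whole line:

```shell
# Hypothetical sample line; fields are separated by runs of spaces.
result=$(printf 'alpha   beta gamma\n' | awk '{ print $2; print $0 }')
printf '%s\n' "$result"
```

     Because no field is modified, $0 is written exactly as it
     was read.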

     On the awk  command line, you can assign  values to vari-
     ables as follows:

     variable=value
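
     A short sketch of a command-line assignment follows.  The
     variable name "label" and the input line are hypothetical;
     the file name - tells awk to read standard input after the
     assignment has been made:

```shell
# "label" is an illustrative variable name, not one defined by awk.
result=$(printf 'widget 4\n' | awk '{ print label, $1 }' label=item -)
printf '%s\n' "$result"
```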

       Pattern-Matching Statements

     Pattern-matching statements follow the form:

       pattern       { action }

     If a pattern lacks a corresponding action, awk writes the
     entire line that contains the pattern to standard output.
     If an  action lacks  a corresponding pattern,  it matches
     every line.
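
     The two defaults described above can be sketched as
     follows, assuming a hypothetical three-line input held in
     the shell variable "input":

```shell
# Hypothetical sample input.
input='error: disk full
ok: backup done
error: no route'

# Pattern with no action: each matching line is written unchanged.
matched=$(printf '%s\n' "$input" | awk '/error/')

# Action with no pattern: the action runs on every line.
firsts=$(printf '%s\n' "$input" | awk '{ print $1 }')

printf '%s\n' "$matched" "$firsts"
```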

     ACTIONS:   An action  is  a sequence  of statements  that
     follow C Language syntax.  These statements can include:

     statement   format
     if          if ( conditional ) statement [ else statement
                 ]
     while       while ( conditional ) statement
     for         for ( expression ; conditional ; expression )
                 statement
     break
     continue
     { statement . . .  }
     (assignment) variable=expression
     print       print [expression-list] [>expression]
     printf      printf        format[,       expression-list]
                 [>expression]
     next
     exit

     Statements can end with a semicolon, a new-line character,
     or the right brace enclosing the action.
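
     As an illustration of the statement forms listed above,
     this sketch sums the fields of each line with a for loop
     and classifies the total with if and else.  The input and
     the threshold of 12 are hypothetical, and the sketch
     assumes that $i refers to the field whose number is held
     in the variable i:

```shell
# Sum every field on the line, then report whether the total exceeds 12.
result=$(printf '3 7 5\n9 2\n' | awk '{
    total = 0
    for (i = 1; i <= NF; i++)
        total += $i
    if (total > 12)
        print total, "large"
    else
        print total, "small"
}')
printf '%s\n' "$result"
```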

     If you  do not supply  an action, awk displays  the whole
     line.  Expressions can have  string or numeric values and
     are built using the operators  "+", "-", "*", "/", "%", a
     blank for string concatenation, and the C operators "++",
     "--", "+=", "-=", "*=", "/=", and "%=".

     In statements,  variables may be scalars,  array elements
     (denoted x[i]) or fields.   Variable names may consist of
     upper- and  lowercase alphabetic letters,  the underscore
     character,  the digits  (0-9),  and extended  characters.
     Variable names cannot begin  with a digit.  Variables are
     initialized to the null  string.  Array subscripts may be
     any string; they do not  have to be numeric.  This allows
     for a  form of  associative memory.  String  constants in
     expressions should be enclosed in double quotation marks.
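
     The associative use of arrays can be sketched by counting
     how often each value of the first field occurs.  The array
     name "count" and the input are hypothetical:

```shell
# count[] is indexed by the text of the first field, not by a number.
result=$(printf 'apple\npear\napple\n' | awk '
    { count[$1] += 1 }
    END { print "apple", count["apple"]; print "pear", count["pear"] }')
printf '%s\n' "$result"
```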

     There are several variables  with special meaning to awk.
     They include:

     FS        Input  field separator  (default  is a  blank).
               This separator  character cannot be  a two-byte
               extended character.
     NF        The number of fields  in the current input line
               (record).
     NR        The number of the current input line (record).
     FILENAME  The name of the current input file.
     OFS       The  output  field   separator  (default  is  a
               blank).  This  separator character cannot  be a
               two-byte extended character.
     ORS       The output record separator  (default is a new-
               line  character).    This  separator  character
               cannot be a two-byte extended character.
     OFMT      The output format for numbers (default "%.6g").

     Since the  actions process  fields, input white  space is
     not preserved on the output.
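
     A minimal sketch of the special variables, assuming a
     hypothetical input line whose fields are separated by
     several spaces and a tab.  The original white space is
     replaced by the output field separator when the fields are
     printed:

```shell
# OFS is set to "-" so the replaced separators are easy to see.
result=$(printf 'a    b\tc\n' |
    awk 'BEGIN { OFS = "-" } { print NR, NF, $1, $2, $3 }')
printf '%s\n' "$result"
```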

     The print statement writes its arguments to standard
     output, separated by the output field separator and
     terminated by the output record separator.  The printf
     statement formats its expression list according to the
     format of the printf subroutine (see AIX Operating System
     Technical Reference).  You can redirect the output using
     the print > file or printf > file statements.
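
     The difference between the two output statements can be
     sketched with a hypothetical input line.  print joins its
     arguments with the output field separator and adds the
     output record separator; printf writes only what the
     format string specifies, so the new-line character must be
     given explicitly:

```shell
# First line of output comes from print, second from printf.
result=$(printf 'pi 3.14159265\n' |
    awk '{ print $1, $2; printf "%s=%.2f\n", $1, $2 }')
printf '%s\n' "$result"
```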

     You have  two ways  to designate  a character  other than
     white space to separate fields.  You can use the -Fc flag
     on the awk command line, or you can start progfile with:

       BEGIN { FS = "c" }

     Either action changes the field separator to c.
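
     Both methods can be sketched side by side.  The sample
     line is a hypothetical password-file entry with a colon
     as the separator:

```shell
# Hypothetical colon-separated record with seven fields.
line='root:x:0:0:Super User:/:/bin/sh'

# Separator given with the -F flag.
with_flag=$(printf '%s\n' "$line" | awk -F: '{ print $1, $7 }')

# Separator set in a BEGIN action.
with_begin=$(printf '%s\n' "$line" | awk 'BEGIN { FS = ":" } { print $1, $7 }')

printf '%s\n' "$with_flag" "$with_begin"
```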

     There are several built-in functions  that can be used in
     awk actions.

     length                         Returns the  length of the
                                    whole line if  there is no
                                    argument or  the length of
                                    its  argument  taken as  a
                                    string.
     exp(n)                         Takes  the exponential  of
                                    its argument.
     log(n)                         Takes the base e logarithm
                                    of its argument.
     sqrt(n)                        Takes  the square  root of
                                    its argument.
     int(n)                         Takes the  integer part of
                                    its argument.
     substr(s,m,n)                  Returns  the  substring  n
                                    characters   long  of   s,
                                    beginning at position m.
     sprintf(fmt,expr,expr, . . . ) Formats   the  expressions
                                    according  to  the  printf
                                    format   string  fmt   and
                                    returns    the   resulting
                                    string.
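
     Several of these functions are exercised in the following
     sketch, which assumes a hypothetical one-word input line:

```shell
# Each print statement demonstrates one or two built-in functions.
result=$(printf 'retrotechnology\n' | awk '{
    print length, length($1)     # length of the line, then of field 1
    print substr($0, 6, 4)       # 4 characters starting at position 6
    print int(7.9), sqrt(16)     # integer part and square root
    print sprintf("%03d", 7)     # formatted string returned by sprintf
}')
printf '%s\n' "$result"
```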

     PATTERNS:  Patterns are arbitrary Boolean combinations of
     regular expressions and relational expressions (using the
     "!", "||", and "&&" operators and parentheses for
     grouping).  You must start and end regular expressions
     with slashes (/).  You can use regular expressions like
     those allowed by the egrep command (see "grep"),
     including the following special characters:

     +         One or more occurrences of the pattern.
     ?         Zero or one occurrence of the pattern.
     |         Either of two alternative expressions.
     ( )       Grouping of expressions.

     Isolated regular expressions in a pattern apply to the
     entire line.  Regular expressions can also occur in
     relational expressions.  If two patterns are separated by
     a comma, the action is performed on all lines between an
     occurrence of the first pattern and the next occurrence
     of the second.

     Regular expressions can contain extended characters with
     one exception:  range constructs in character class
     specifications using square brackets cannot contain
     two-byte extended characters.  Individual instances of
     extended characters can appear within square brackets;
     however, two-byte extended characters are treated as two
     separate one-byte characters.
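
     A sketch of a comma-separated pattern pair and of the |
     and ( ) special characters, assuming a hypothetical
     five-line input.  The lines that match the first and
     second patterns are themselves included in the range:

```shell
# Hypothetical sample input.
input='intro
start here
middle
stop here
outro'

# Range: from a line matching /start/ through the next line matching /stop/.
range=$(printf '%s\n' "$input" | awk '/start/,/stop/')

# Alternation with grouping: lines that are exactly "intro" or "outro".
either=$(printf '%s\n' "$input" | awk '/^(intro|outro)$/')

printf '%s\n' "$range" "$either"
```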

     There are  two types  of relational expressions  that you
     can use.  One has the form:

     expression  matchop  pattern

     where matchop  is either: ~  (for "contains") or  !~ (for
     "does not contain").  The second has the form:

     expression  relop  expression

     where relop  is any  of the  six C  relational operators:
     "<", ">", "<=", ">=", "==",  and "!=".  A conditional can
     be an arithmetic expression,  a relational expression, or
     a Boolean combination of these.
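
     Both forms of relational expression are sketched below
     against a hypothetical list of names and numbers:

```shell
# Hypothetical sample input: a name and a number on each line.
input='alice 42
bob 7
carol 19'

# matchop form: first field begins with "a" or "b".
by_match=$(printf '%s\n' "$input" | awk '$1 ~ /^[ab]/ { print $1 }')

# relop form combined with a negated match using &&.
by_value=$(printf '%s\n' "$input" | awk '$2 >= 19 && $1 !~ /^a/ { print $1 }')

printf '%s\n' "$by_match" "$by_value"
```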

     You can use the special patterns BEGIN and END to capture
     control before the first and after the last input line is
     read, respectively.  BEGIN must be the first pattern in
     progfile and END must be the last.
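
     A minimal sketch of BEGIN and END, assuming a hypothetical
     three-line input and a counter variable named "n":

```shell
# BEGIN runs before any input is read; END runs after the last line.
result=$(printf 'x\ny\nz\n' | awk '
    BEGIN { print "begin" }
          { n++ }
    END   { print "lines:", n }')
printf '%s\n' "$result"
```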

     There  are no  explicit conversions  between numbers  and
     strings.   To force  an  expression to  be  treated as  a
     number, add  "0" to it.  To  force it to be  treated as a
     string, append a null string ("").
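
     The effect of forcing a type can be sketched by comparing
     the same two hypothetical fields both ways.  As numbers,
     10 is greater than 9; as strings, "10" sorts before "9"
     because the comparison proceeds character by character:

```shell
# Adding 0 forces a numeric comparison; appending "" forces a string one.
result=$(printf '10 9\n' | awk '{
    if ($1 + 0 > $2 + 0)
        print "numeric: first is larger"
    if (($1 "") < ($2 ""))
        print "string: first sorts earlier"
}')
printf '%s\n' "$result"
```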

FLAGS

     -f  progfile  Searches for the patterns and performs the
                   actions found in the file progfile.
     -Fchar        Uses char as  the field separator character
                   (by default a blank).

EXAMPLES

     1.  To display the  lines of a file that  are longer than
         72 characters:

           awk  'length  >72'  chapter1

         This selects each line of the file "chapter1" that is
         longer than  "72" characters.  awk then  writes these
         lines to standard output  because no action is speci-
         fied.
     2.  To display  all lines  between the words  "start" and
         "stop":

           awk  '/start/,/stop/'  chapter1

     3.  To run an awk program ("sum2.awk") that processes a
         file ("chapter1"):

           awk  -f  sum2.awk  chapter1

         The  following  awk  program  computes  the  sum  and
         average of  the numbers in  the second column  of the
         input file:

               {
                  sum += $2
               }

           END {
                  print "Sum: ", sum;
                  print "Average:", sum/NR;
               }

         The first action adds the value of the second field
         of each line to the variable "sum".  awk initializes
         "sum" (and all variables) to the null string, which
         has the value zero when used as a number.
         The keyword  END before the second  action causes awk
         to perform  that action after  all of the  input file
         has been  read.  The  variable NR,  which is  used to
         calculate  the average,  is a  special variable  con-
         taining the number of  records (lines) that have been
         read.
     4.  To print the names of the  users who have the C shell
         as the initial shell:

           awk  -F:  '/csh/{print  $1}'  /etc/passwd

RELATED INFORMATION

     The following commands:  "lex,"  "grep," and "sed."

     The printf  subroutine in AIX Operating  System Technical
     Reference.

     The "Overview of International Character Support" in Man-
     aging the AIX Operating System.

     The discussion of awk in AIX Operating System Programming
     Tools and Interfaces.
