⇒ awk(1) — A/UX 3.0.1

awk(1) awk(1)
NAME
     awk - scans a file for lines that match a specific pattern

SYNOPSIS
     awk [-Ffield-separator] 'pattern-action...' [[-v] variable=value]... [file]...

     awk [-f awk-source-file] [-Ffield-separator] [[-v] variable=value]... [file]...

ARGUMENTS
     -f awk-source-file
          Specifies the file containing the instructions that awk should interpret.

     -Ffield-separator
          Specifies the character to be treated as the field separator when awk parses a record into fields.

     file Specifies the file or files containing text data to be processed by awk.

     pattern-action
          Specifies an awk instruction, which is provided in the form of a pattern followed by an action enclosed in braces:

               pattern {action}

     [-v] variable=value
          Specifies the value of an awk variable that is established for use in the main sections of an awk program, which consists of any number of pattern-action arguments. If the -v option is present, the variable is also available in the BEGIN (initialization) section of an awk program.

DESCRIPTION
     awk effectively handles most programs containing text-parsing, report generation, and record validation tasks. These programs typically contain a brief list of instructions that specify text-scanning and text-manipulation functions.

     The standard operation of awk is to scan each input file once, looking for matches between each input record and any of a set of patterns that you supply. These pattern instructions are accompanied by action instructions. Sometimes the action instructions merely establish settings that affect text processing that is undertaken by awk as part of its standard operation, such as the parsing of records into fields.

     So that text patterns can be sought in specific positions in an input record, awk splits the input record into fields at every occurrence of a field-separator character. After an input record is split into fields, each field is assigned to a field variable, such as $1, $2, $3, and so forth. These variables can be used to reference input fields either in the pattern or the action portion of a pattern-action argument.

     You can obtain a measure of control over the field-parsing function by specifying your own field separator. The default field separator is white space (tabs or spaces). You can change this separator by making a different assignment to the variable FS, or through the command line by specifying a field-separator character along with the -F option. To ensure that your own field separator takes effect before any input records are parsed into fields, use the -F construct or place the assignment in an action associated with the BEGIN pattern. (See the example at the end of ``Patterns,'' later in the ``Description'' section.) A regular expression can also be assigned to the FS variable, in which case the field delimiter can be any string that matches the regular expression.

     Although it looks like a field reference, $0 refers to the entire input record, with field delimiters intact.

     For the purposes of documenting syntax, a pattern and its associated actions are considered one pattern-action. As shown in the first syntax description in the ``Synopsis'' section, pattern-action arguments can be supplied directly on the command line. Alternatively, you can specify the -f option so that pattern-action arguments can be placed inside an awk program file, as shown in the second syntax description (see SYNOPSIS). In the latter case, replace awk-source-file with the name of the program file that contains the awk instructions you want to use.
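
     The two ways of setting the field separator described above can be compared with a minimal sketch. The colon-separated sample data and the shell variable names are assumptions made for illustration only; the sketch was not run on A/UX.

```shell
# Hypothetical colon-separated sample data (not from the manual).
data='alice:x:1001
bob:x:1002'

# Set the separator on the command line with -F.
with_flag=$(printf '%s\n' "$data" | awk -F: '{ print $1, $3 }')

# Set the separator in a BEGIN action, so it is in effect
# before the first record is split into fields.
with_begin=$(printf '%s\n' "$data" | awk 'BEGIN { FS = ":" } { print $1, $3 }')

printf '%s\n' "$with_flag" "$with_begin"
```

     Both commands should produce the same two lines, because in each case the separator is established before any record is parsed.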

     Any time an input record contains a substring that is sought as specified by pattern, awk performs the associated action. The text of an input record that is matched by a pattern can be accessed easily through references to the variables $0, $1, $2, and so forth.

     Input records can be acted upon immediately or handled less directly. An example of an immediate action is the printing of the contents of a matching input record as soon as it is encountered. An example of a less immediate action is storing a record in a variable when it is first encountered, then printing it later if later conditions warrant it, such as when the contents of subsequent records invalidate it and an error message is desired. A stored value persists until it is changed by another portion of the same pattern-action or by an entirely different pattern-action. Such assignments permit actions to be gated not only by the text of the input record being scanned but also through the stored text drawn from previous input records.

   Command-Line Options
     Either pattern-action arguments are specified inside the awk command line, as shown in the first syntax description line, or they are supplied in a file through specification of file arguments along with the -f option, as shown in the second syntax description line.

     When pattern-action arguments all appear in the command line, they should be formed into one string enclosed in single quotation marks ('). The quotation marks protect them from being interpreted by the shell. Refer to the awk chapter in A/UX Programming Languages and Tools, Volume 2 for more information about shell and awk cooperation.

     The level of escapement afforded by the single quotation marks causes any references to shell variables to remain unsubstituted by the shell. To enable their substitution requires the use of awk variables that are assigned values on the command line. Variables that are initialized on the command line provide a means of passing parameter values between the shell and awk. The most common use for passed parameters is to access the values of positional variables available from within shell scripts ($1, $2, and so forth).
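
     The quoting behavior described above can be sketched as follows. The shell variable limit (standing in for a script parameter such as $1) and the awk variable threshold are hypothetical names chosen for illustration; the assignment uses the variable=value argument form from the SYNOPSIS.

```shell
# "limit" is a hypothetical shell variable standing in for a script parameter.
limit=10

# Inside the single quotes the shell leaves $1 alone, so awk sees it as field 1.
# The shell expands "$limit" only in the variable=value argument that follows.
# The final "-" names the standard input as the input file.
over=$(printf '5\n12\n30\n' | awk '$1 > threshold { print $1 }' threshold="$limit" -)

printf '%s\n' "$over"
```

     Writing $limit directly inside the quoted program would not work, since the shell never sees it and awk would treat limit as an uninitialized awk variable.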

     The format of these assignments is similar to that of awk variable assignments, except that an unescaped space cannot be used on either side of the equal sign, as follows:

          awk -f awkfile variable1=x variable2=$1 datafile

     If the parameter assignment is preceded by a -v option, the value so assigned is made available even in the BEGIN (initialization) section of the awk program. Otherwise, the value is not assigned to the variable until after the BEGIN section has been evaluated.

     Like input files, the passed parameters are evaluated in the order in which they appear. Passed parameters that are specified after an input file are not available while awk is processing that input file. Passed parameters that are specified before any number of input files are available when those input files are processed.

     If no input file is specified, the standard input is read until exhausted. When several input files are specified, they are read in the order in which they are specified. If the shorthand notation for standard input (-) is specified as one of several file arguments, the standard input is also read in the order in which it is specified.

   Patterns
     The pattern portion of a pattern-action argument often involves the scanning of text for occurrences of a particular text pattern. These patterns are specified through a pattern-seeking template, better known as a regular expression. For a more detailed explanation of regular expressions, refer to ed(1). Regular expressions must be surrounded by slashes. The format for a regular expression is

          /character-col1... character-colN/

     where character-col1 through character-colN represent the first through last characters to seek before a substring is considered ``matched.''

     Besides supplying a normal character to replace character-col1 and other character positions, you can use a special or wildcard character, such as the period, which matches any character at that position. An asterisk matches zero or more occurrences of the character (or bracketed list) that immediately precedes it. Other special characters are the caret (^) and dollar sign ($), which ``match'' the beginning of a line and the end of a line, respectively. The only sensible place to insert the caret is at the beginning of pattern. Likewise, the only sensible place to insert the dollar sign is at the very end of pattern.

     Besides supplying a single character to replace character-col1 and other character positions, you can supply a character range or a character list enclosed in brackets. Thus,

          /^[A-Z][aeiou]/

     evaluates as true for all input records that start with an uppercase character followed by a vowel.
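
     As a rough illustration of the special characters described above, the following sketch runs three regular-expression patterns against a short list of names. The names and the shell variable names are hypothetical, and each pattern relies on the default action of printing the matching record.

```shell
# Hypothetical sample records, one name per line.
names='Ada
Leo
Tim
bryn
Mona'

# ^ anchors the match at the start of the record;
# [A-Z] is a character range and [aeiou] is a character list.
upper_then_vowel=$(printf '%s\n' "$names" | awk '/^[A-Z][aeiou]/')

# A period matches any single character; $ anchors the match at the end.
three_chars=$(printf '%s\n' "$names" | awk '/^...$/')

# An asterisk applies to what precedes it: .* is zero or more of any character.
m_to_a=$(printf '%s\n' "$names" | awk '/^M.*a$/')

printf '%s\n' "$upper_then_vowel" "$three_chars" "$m_to_a"
```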

     The pattern portion of the pattern-action argument can be any expression, including ones that do not involve pattern-seeking. For example,

          $1 > 0 { print }

     is a valid pattern-action argument that prints all input records with a first field that is greater than 0.

     Pattern expressions often test for the presence of certain text patterns, either within the entire input record or within one or more fields in an input record. Field-scoped searches require one of the ``pattern-seeking'' operators and a regular expression, as follows:

          $0 ~ /Employee/ { action... }
          $3 ~ /Employee/ { action... }

     If you search the entire input record for matching strings, you do not have to supply the $0 ~ portion of the line, since this portion will be assumed when a regular expression is supplied by itself as the pattern. This convention makes the following patterns equivalent:

          $0 ~ /Employee/
          /Employee/

     To seek a contiguous set of input records starting from a record that matches pattern1 and ending with a record that matches pattern2, specify two regular expressions separated by a comma, as follows:

          /pattern1/,/pattern2/ { action... }

     The action is performed for all input records from an occurrence of the first pattern through the next occurrence of the second pattern, inclusive.

     The special patterns BEGIN and END can be used to establish actions to be taken before the first input record is read and after the input stream is exhausted. For example, a tab can be made the field separator (exclusively) with

          BEGIN { FS = "\t" }

   Actions
     A pattern-action argument has the form

          pattern { action }

     A missing {action} argument triggers the printing of matching input records; a missing pattern argument causes the associated action to be performed for every input record (as if every input record matched the missing pattern).

     An action argument is a sequence of statements. A statement can be one of the following code fragments:

          if ( conditional ) statement [ else statement ]
          while ( conditional ) statement
          for ( expression ; conditional ; expression ) statement
          break
          continue
          { [ statement ]... }
          variable = expression
          next
          exit

     Statements are terminated by semicolons, newline characters, or right braces.

     Expressions take on string or numeric values as appropriate and are built with the operators +, -, *, /, %, and ``concatenation'' (indicated by a blank). The C operators ++, --, +=, -=, *=, /=, and %= are also available in expressions.

     Variables can be scalars, array elements (denoted x[i]), or fields. Variables are initialized to the null string. Array subscripts can be any string, including strings generated automatically when numeric expressions are used as subscripts. String constants are enclosed in double quotation marks (").

     The next and exit statements affect control flow. Use exit to terminate processing without any further actions. Use next to terminate any remaining actions that would have been gated for the current input record, skipping to the beginning of the awk program so that processing can continue with the next input record.

   Output Functions
     The output functions include the following statements:

          print [expression] [[,] expression]...
          printf(format-string, expr [, expr]...)

     Both of these statements can print to files as well as the standard output, as described by the more general syntax

          print-command [>file]

     Use the print statement to print the results of expression arguments followed by the output record separator character given by the variable ORS. If print is specified without any accompanying arguments, the entire input record is printed.
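
     The pattern and action forms described in the preceding sections can be combined, as in the following minimal sketch. The input text and the variable name kept are hypothetical. The sketch uses a range pattern, the next statement, an uninitialized variable used as a counter, print without arguments, and an END action.

```shell
# Hypothetical input; only the records from "start" through "stop" are of interest.
log='header
start
alpha
# comment
beta
stop
trailer'

out=$(printf '%s\n' "$log" | awk '
/start/,/stop/ {            # range pattern: records from "start" through "stop"
    if ($0 ~ /^#/) next     # skip comment records and go on to the next record
    kept = kept + 1         # kept starts as the null string, which counts as 0
    print                   # print with no arguments prints the whole record
}
END { print "kept", kept }  # runs once, after the input is exhausted
')

printf '%s\n' "$out"
```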

     If several expressions are supplied, separated by commas, the result of each expression is printed, separated by the output field separator given by the variable OFS. See ``Built-in Variables'' later in the ``Description'' section for more built-in variables.

     Use the printf statement to format and print the result of expr arguments in accordance with format-string (see printf(3S)).

     Another way to place data on the awk output stream is to use the system function

          system(expression)

     In this case, expression must compute to a valid shell command so that it can be executed outside the context of awk. Any output resulting from the execution of the command is inserted into the output of awk. This function returns the exit status for the command so that you can test for successful execution by testing for a 0 exit value. (This is the case for most, but not all, commands.)

   Input Functions
     Besides being supplied as command-line arguments, multiple input files are supported through the getline function. This record-reading function can be one of the actions associated with a BEGIN or an END pattern, as well as any other patterns. A typical use is to associate this action with the BEGIN pattern to initialize the contents of an array from static data stored in an external file. Since the return value is 1 as long as the input file is not exhausted, you can use the following code fragment to establish the file table:

          BEGIN {
               while ( (getline line <"table") > 0 )
                    array[++count] = line
          }
          . . .

     This command can be specified in any of four different forms:

          getline
          getline variable
          getline <file
          getline variable <file

     The first form reads the next input record. Unlike the next statement, with this form control remains at the place where getline occurs within the current pattern-action argument and proceeds to any pattern-action arguments that follow, until end of file is reached.
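
     The table-loading technique described above can be tried with a minimal sketch such as the following. The contents of the table file, the scratch directory, and the names color, n, and line are assumptions made for illustration. The comparison with 0 matters because, in POSIX awk at least, getline returns a negative value when the file cannot be read.

```shell
# Create a scratch directory holding a hypothetical lookup file named "table".
dir=$(mktemp -d)
printf 'red\ngreen\nblue\n' > "$dir/table"

loaded=$(cd "$dir" && printf '3\n1\n' | awk '
BEGIN {
    # Read the table one record at a time until getline stops returning 1.
    while ((getline line < "table") > 0)
        color[++n] = line
    close("table")
}
{ print $1, color[$1] }     # look up each input number in the loaded table
')

rm -rf "$dir"
printf '%s\n' "$loaded"
```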



     The second form behaves in the same way except that certain variables ($0, $1, and so forth) are not reset and the content of the input record is assigned to variable with its field separators intact.

     The third and fourth forms are the same as the first and second forms except that the input record is read from file. If file is an explicit reference to a file, enclose it in quotation marks to make it a string constant. (Otherwise it is likely to be interpreted as a variable that is dynamically initialized to an empty string.) To switch between many different input files, use the close(file) function before opening any new files for reading.

   Other String Functions
     Here are the built-in functions for strings:

     index(string1,string2)
          Returns the index at which string2 first occurs inside string1, or 0 if there is no match.

     length(string)
          Returns the length of its argument taken as a string, or of the whole input record if no argument is supplied.

     match(string,pattern)
          Returns the index at which the regular expression pattern first occurs inside string while setting the variables RSTART and RLENGTH. Returns 0 if there is no match.

     split(string,array,separator)
          Splits string into fields that are assigned to elements in array with subscripts 1, 2, and so on. A new field is created at each occurrence of separator within string. It returns the number of fields that were parsed.

     substr(string,position,length)
          Returns the length-character substring of string that begins at position position.

     sprintf(format-string,expr[,expr]...)
          Formats expressions in accordance with format-string (described in printf(3S)), returning the resulting string.

     sub(pattern,replacement[,variable])
     gsub(pattern,replacement[,variable])
          Performs text substitution (search-and-replace) either for the first matched substring (sub) or globally for every matched substring (gsub).

   Number Functions
     Here are the built-in functions for numbers:

     atan2(y,x)
          Returns the arctangent of y/x in radians in the range -π to π.

     cos(radians)
          Returns the cosine of the angle measure.

     exp(power)
          Returns e raised to the power power.

     int(real)
          Truncates real, returning an integer.

     log(x)
          Returns the natural logarithm of x.

     rand()
          Returns a pseudo-random number between 0 and 1.

     srand([seed])
          Sets the seed for the random number generator to seed or to the time of day if seed is missing.

     sin(radians)
          Returns the sine of the angle measure.

     sqrt(x)
          Returns the square root of x.

   User-Defined Functions
     User functions can be called just as built-in functions are, once they are declared with

          function name(arg...) { body }

     Within body, the statement return(expression) can be used to cause the user function to return the value of the supplied expression.

   Expressions
     This discussion of expressions applies within action statements and within patterns. Only certain action statements can include expressions; refer to ``Actions,'' earlier in the ``Description'' section for more information.

     Parentheses can be used to establish operation precedence for expressions containing several operators.
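
     Several of the string functions and the user-defined function syntax described earlier in this section can be exercised together, as in this minimal sketch. The input record and the helper name trim_zeros are hypothetical and are not part of awk.

```shell
out=$(printf 'widget-0042:blue\n' | awk '
# trim_zeros is a hypothetical helper defined for this sketch only.
function trim_zeros(s) {
    sub(/^0+/, "", s)       # replace the first match: a run of leading zeros
    return s
}
{
    n = split($0, part, ":")            # part[1] = "widget-0042", part[2] = "blue"
    dash = index(part[1], "-")          # position of the first "-" in part[1]
    id = substr(part[1], dash + 1, 4)   # the 4 characters that follow the dash
    print n, dash, length(part[2]), trim_zeros(id)
}')

printf '%s\n' "$out"
```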



     Expressions can be string or number constants, variables, or field references as well as combinations of these joined by equal (==), not equal (!=), greater-than (>), less-than (<), greater-than-equal (>=), and less-than-equal (<=). Because they produce Boolean results (true or false), two or more of the preceding comparison operations can be related by means of Boolean operators: logical AND (&&), logical OR (||), and NOT (!).

     To test for the existence of various substrings in a string, specify the string followed by one of the pattern-seeking operators (~ and !~) followed by a regular expression. Use ~ to test whether the string contains a substring that is sought by the regular expression supplied. Use !~ to test whether the string does not contain a substring that is sought by the regular expression supplied. The following example uses all of these types of operations:

          { if ( NR > 1 && $0 ~ /+/ ) print }

     In the next line of code, which is equivalent to the one just given, the operations have been moved into the pattern area:

          $0 ~ /+/ && NR > 1 { print }

     No operation exists specifically to request conversions between numbers and strings, or between strings and numbers. To force an expression to be treated as a number, add 0 to it; to force it to be treated as a string, concatenate the null string ("") to it.

   Built-in Variables
     Other variable names with special meanings include

     NF        the number of fields in the current record
     NR        the ordinal number of the current record
     FNR       the ordinal number of the current record relative to the beginning of the current input file
     FILENAME  the name of the current input file
     OFS       the output field separator (blank by default)
     ORS       the output record separator (the newline character by default)
     OFMT      the output format for numbers (%.6g by default)
     ARGC      a variable that is set to the total number of command-line arguments that were offered on the awk command line
     ARGV[]    a built-in array that is set to the command name (awk) at index 0, the first command-line argument at index 1, and so on up to the last command-line argument at index n

   Overview of awk Processing and Preprocessing
     For each input record, awk performs the ``matched'' pattern-action operations. Thus, the actions that awk performs usually vary with each input record. The effect is similar to that of creating a number of different programs, where each one is a particular accumulation of lines from a master collection. Each of the accumulated subprograms is run whenever its triggering records show up in the input stream, possibly many times over. Through careful selection of patterns, these subprograms can be closely tailored to the kind of data that is present in the input record.

     When the input data is not already partitioned nicely into fields and records, preprocessing can be useful to transform the data into more regular units from which meaning is more easily extracted. For text data that already contains field separators, the field values that indicate variant records are easily detected through field references within patterns, provided that they can be expected at a fixed field location. (See ``Patterns,'' earlier in the ``Description'' section.)

     For data that is not already subdivided or regularized, preprocessing with sed or awk is often desirable so that units of data that affect the meaning of other units of data can be incorporated into the same record, or so that independently meaningful units of data are separated into new records.
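
     One possible shape for such a preprocessing pass is sketched below. The input layout, in which a line of the form "section NAME" sets the context for the lines after it, and the variable names raw, regular, tools, and ctx are all assumptions made for illustration.

```shell
# Hypothetical input: a "section NAME" line sets the context for later lines.
raw='section fruit
apple
pear
section tools
saw'

# Pass 1: carry the section name forward and emit one regular record per item.
regular=$(printf '%s\n' "$raw" | awk '
$1 == "section" { ctx = $2; next }   # remember the context; emit nothing here
{ print ctx, $0 }                    # the context becomes field 1 of each record
')

# Pass 2: an ordinary field test can now select items by their context.
tools=$(printf '%s\n' "$regular" | awk '$1 == "tools" { print $2 }')

printf '%s\n' "$regular" "$tools"
```

     After the first pass, every record carries its own context in a fixed field, so the second pass needs no memory of earlier records.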

     When you are combining spans of data into the same record, it is often desirable to place context-establishing data at the beginning so that certain patterns can be sought in certain positions by using the corresponding features of regular expressions, such as the caret (^).

     In cases involving irregular data, the preprocessing concern of greatest import is the generation of appropriate record and field boundaries within the data. For instance, each pass of preprocessing can be designed so that a particular output field (or a particular record within a set of records) will be set to an appropriate value for identifying the context of a certain amount of data. For example, the nesting of procedures inside braces is more easily unraveled if the beginning and ending braces always occupy the first field of an input record, or a dedicated input line.

EXAMPLES
     The following command prints lines from the file data that are longer than 72 characters:

          awk "length > 72" data

     The following command prints the first two fields of each line in reverse order:

          awk '{ print $2, $1 }' filea

     The following command adds up the first column and prints the sum and average:

          awk '{ s += $1 } END { print "sum is", s, "average is", s/NR }' filea

     The following command prints all the fields of each line in reverse order, one field per line:

          awk '{ for (i = NF; i > 0; --i) print $i }' filea

     The following command prints all records between start/stop-pattern pairs for every such pair in the file:

          awk "/start/, /stop/" filea

     The following command prints the maximum value that appears in field 1 across all input records:

          awk ' $1 > max { max = $1 } END { print "Max field 1 value=" max }'

FILES
     /bin/awk  Executable file

SEE ALSO
     grep(1), lex(1), sed(1)
     ``awk Reference,'' in A/UX Programming Languages and Tools, Volume 2
     The awk Programming Language by A.V. Aho, B.W. Kernighan, and P.J. Weinberger (Reading, MA: Addison-Wesley, 1988)

January 1992

Typewritten Software • bear@typewritten.org • Edmonds, WA 98026