expressions(5) expressions(5)
NAME
expressions - regular expressions
DESCRIPTION
Regular expressions are used for scanning a text for strings which
match a defined pattern. A regular expression stands for a set of
characters or character strings. Each character string in this set is
said to be matched by the regular expression. A pattern is constructed
from one or more regular expressions.
A regular expression comprises a string of characters, which can be
further classified into:
- ordinary characters
All characters in the character set, except for the newline charac-
ter and metacharacters, are ordinary characters. Within a pattern,
ordinary characters match themselves, i.e. the pattern abc will
match only those strings that contain the character sequence abc
anywhere in them.
- metacharacters
Metacharacters do not match themselves, but have a special meaning,
which is explained below. Metacharacters preceded by a backslash \
lose their special meaning.
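The difference between ordinary characters and metacharacters can be
sketched in Python. This is an illustration only: the re module is an
assumption made for this example and is not the matching engine used by
the commands described here, although it treats ordinary characters and
backslash quoting the same way in these cases. The helper name contains
is hypothetical.

```python
import re

def contains(pattern: str, text: str) -> bool:
    """True if the pattern matches anywhere in text."""
    return re.search(pattern, text) is not None

# Ordinary characters match themselves, anywhere in the text.
print(contains(r"abc", "xxabcxx"))
print(contains(r"abc", "ab c"))

# Unquoted, the metacharacter . matches any character.
print(contains(r"a.c", "abc"))

# Preceded by a backslash, it stands for a literal period only.
print(contains(r"a\.c", "abc"))
print(contains(r"a\.c", "a.c"))
```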
There are two forms of regular expression:
- simple regular expressions
- extended regular expressions
The syntax of these forms of regular expression is described in the
following sections.
Page 1 Reliant UNIX 5.44 Printed 11/98
SIMPLE REGULAR EXPRESSIONS
The pattern to be searched for in a text can be made up of any
combination of single expressions. The following single expressions
can be used if a command supports simple regular expressions.
Single characters, collating elements
______________________________________________________________________
| Expression | Meaning |
|_______________|_____________________________________________________|
| c | The character c, where c must not be a special |
| | character (metacharacter). |
| | |
| | Example: a matches a |
|_______________|_____________________________________________________|
| \c | The character c, where c can be any character other|
| | than ( ) { } 1 2 3 4 5 6 7 8 9. |
| | |
| | A regular expression in the form \c is meaningful |
| | if c is a metacharacter. \c then stands for the |
| | character c itself. |
| | |
| | Example: \a matches a, \* matches * |
|_______________|_____________________________________________________|
| [.cc.] | (Collating symbol; only within [ ]) Multi-character|
| | collating elements must be represented in this form|
| | to distinguish them from ordinary characters. An |
| | expression of this type is collated as a single |
| | character. cc has to be defined as a valid collat- |
| | ing element in the internationalized environment. |
| | |
| | Example: |
| | |
| | In the Spanish locale LANG=LC_COLLATE=EsSP.88591 |
| | ch is a valid collating element: in Spanish, ch is |
| | treated as a single letter collating between c and |
| | d. This letter must be represented in the form |
| | [.ch.] to distinguish it from the two-letter string|
| | ch. |
|_______________|_____________________________________________________|
Groups of characters, classes
______________________________________________________________________
| Expression | Meaning |
|_______________|_____________________________________________________|
| . | Any character. |
| | |
| | Example: . matches a, x, *, ... |
|_______________|_____________________________________________________|
| [s] | Any character from the character string s. s can |
| | also be a character class. |
| | |
| | Example: [mz] matches m, z |
| | |
| | Warning: |
| | |
| | Metacharacters that have a special meaning in |
| | bracketed expressions (], -, ^) are treated as nor-|
| | mal characters if they are placed at a particular |
| | position in the bracketed expression, i.e. |
| | |
| | ] at the first position |
| | |
| | - at the first or last position |
| | |
| | ^ at any position except the first. |
| | |
| [c1-c2] | Any character in the range between c1 and c2 in |
| | accordance with the currently valid collating |
| | sequence (c1 and c2 inclusive). c1 and c2 can also |
| | be expressions for equivalence classes [=c=] or |
| | collating symbols [.cc.]. |
| | |
| | Example: |
| | |
| | In the German locale LANG=LC_COLLATE=DeDE.88591 |
| | [a-d] matches the characters a, ä, b, c, d, while |
| | in the Spanish locale LANG=LC_COLLATE=EsSP.88591, |
| | it matches the characters a, b, c, ch, d. |
| | |
| [s1c1-c2s2] | The two forms can be combined. |
|_______________|_____________________________________________________|
| [^s] | Any character not contained in the character string|
| | s. |
| | |
| | Example: |
| | |
| | [^xyz] matches every character excluding x, y, z. |
| | |
| [^c1-c2] | Any character not in the range between c1 and c2. |
| | |
| | Example: |
| | |
| | [^0-9] matches every character excluding 0, 9 and |
| | the characters in the collating sequence between 0 |
| | and 9. |
| | |
| [^s1c1-c2s2] | The two forms can be combined. |
|_______________|_____________________________________________________|
| [:class:] | (Character class expression; only within [ ]) Any |
| | character from the character class class. class can|
| | be: |
| | |
| | alpha any letter |
| | |
| | upper any uppercase letter |
| | |
| | lower any lowercase letter |
| | |
| | digit any decimal digit (0 through 9) |
| | |
| | xdigit any hexadecimal digit (0 through 9, |
| | a through f and A through F) |
| | |
| | alnum any alphanumeric character |
| | (letters and digits) |
| | |
| | space any character producing white space |
| | in displayed text |
| | (e.g. blanks or tabs) |
| | |
| | blank blanks or tabs |
| | |
| | punct any punctuation character |
| | |
| | print any printable character (including |
| | the characters in space) |
| | |
| | graph any printable character with a visible |
| | representation (excluding the characters |
| | in space) |
| | |
| | cntrl any control character |
| | |
| | Example: |
| | |
| | In the German locale LANG=LC_CTYPE=DeDE.88591 the |
| | characters ä, ö, ü, ß, a, ..., z match the regular |
| | expression [[:lower:]]. The character è does not |
| | belong to lower or alpha, but is a printable char- |
| | acter. |
|_______________|_____________________________________________________|
| [=c=] | (Equivalence class expression; only within [ ]) Any|
| | character or collating element defined as having |
| | the same relative order as c. c must not be an |
| | equals sign = or a right square bracket ]. |
| | |
| | Example: |
| | |
| | In the German locale LANG=LC_COLLATE=DeDE.88591 |
| | the characters u and ü form an equivalence class. |
| | Consequently the characters u and ü match the regu-|
| | lar expression [[=u=]]. In this locale, the regular|
| | expressions [[=u=]v], [[=ü=]v], and [uüv] are |
| | synonyms. |
|_______________|_____________________________________________________|
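The bracketed forms above can be tried out with a short Python sketch.
It rests on assumptions that differ from the behaviour described here:
Python's re module orders ranges by code point rather than by the
collating sequence of the locale, and it does not support [:class:],
[=c=], or [.cc.], so the locale-dependent examples cannot be reproduced
with it. The helper name matching_chars is hypothetical.

```python
import re

def matching_chars(pattern: str, candidates: str) -> str:
    """Return the characters of candidates that the pattern matches."""
    return "".join(c for c in candidates if re.fullmatch(pattern, c))

print(matching_chars(r"[mz]", "amz"))      # any character from the string mz
print(matching_chars(r"[a-d]", "abcdef"))  # range, c1 and c2 inclusive
print(matching_chars(r"[^0-9]", "a1b2"))   # any character outside the range
print(matching_chars(r"[]x]", "]xy"))      # ] in first position is ordinary
print(matching_chars(r"[a-]", "a-b"))      # - in last position is ordinary
print(matching_chars(r"[a^]", "a^b"))      # ^ after the first position is ordinary
```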
Concatenation
Single expressions can be concatenated. All concatenated expressions
together describe the pattern to be searched for in a text.
______________________________________________________________________
| Expression | Meaning |
|_______________|_____________________________________________________|
| rx | An occurrence of a character string matching the |
| | regular expression r, followed by a character |
| | string matching the regular expression x. |
| | |
| | Example: [ab]. matches ax, a3, a*, bz, ... |
|_______________|_____________________________________________________|
Repeats
Expressions that describe single characters or groups of characters
can be repeated, as can references back to subexpressions.
______________________________________________________________________
| Expression | Meaning |
|_______________|_____________________________________________________|
| r* | Zero, one, or more occurrences of the regular |
| | expression r. |
| | |
| | Example: a* matches nothing, a, aa, aaa, ... |
|_______________|_____________________________________________________|
| r\{m,n\} | At least m and at most n occurrences of the regular|
| | expression r. |
| | |
| | Example: a\{1,2\} matches a or aa |
| | |
| r\{m\} | Exactly m occurrences of the regular expression r. |
| | |
| | Example: a\{3\} matches aaa |
| | |
| r\{m,\} | At least m occurrences of the regular expression r.|
| | |
| | Example: a\{3,\} matches aaa, aaaa, aaaaa, ... |
|_______________|_____________________________________________________|
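The effect of the repeat counts can be checked with the following
Python sketch. It assumes Python's re module, which writes the braces
without backslashes (a{1,2} rather than a\{1,2\}); the counting rules
are otherwise the same. The helper name run_lengths is hypothetical.

```python
import re

def run_lengths(pattern: str, max_len: int = 6) -> list:
    """Lengths n in 0..max_len for which a string of n a's matches in full."""
    return [n for n in range(max_len + 1) if re.fullmatch(pattern, "a" * n)]

print(run_lengths(r"a*"))      # corresponds to a*
print(run_lengths(r"a{1,2}"))  # corresponds to a\{1,2\}
print(run_lengths(r"a{3}"))    # corresponds to a\{3\}
print(run_lengths(r"a{3,}"))   # corresponds to a\{3,\}
```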
Anchoring
Patterns can be "anchored" at the start or the end of a line.
______________________________________________________________________
| Expression | Meaning |
|_______________|_____________________________________________________|
| ^r | A character string that matches the regular |
| | expression r and appears at the start of a line, |
| | i.e. directly after a newline character or at the |
| | start of a file. |
| | |
| | Example: |
| | |
| | ^[aA]pple matches apple or Apple at the start of a |
| | line. |
|_______________|_____________________________________________________|
| r$ | A character string that matches the regular |
| | expression r and appears at the end of a line, |
| | i.e. directly before a newline character. |
| | |
| | Example: |
| | |
| | [bB]arge$ matches barge or Barge at the end of a |
| | line. |
|_______________|_____________________________________________________|
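Anchoring can be illustrated by testing each line of a text
separately, roughly as a line-oriented command would. The sketch
assumes Python's re module, where ^ and $ have the same meaning when a
single line without its newline character is examined. The sample text
and the helper name matching_lines are hypothetical.

```python
import re

def matching_lines(pattern: str, text: str) -> list:
    """Return the lines of text in which the pattern matches."""
    return [line for line in text.split("\n") if re.search(pattern, line)]

text = "apple pie\nan Apple\nApple tart\npineapple"

print(matching_lines(r"^[aA]pple", text))  # only at the start of a line
print(matching_lines(r"[aA]pple$", text))  # only at the end of a line
```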
Subexpressions and references
Parts of a pattern can be combined as a subexpression. This subexpres-
sion can then be repeated at a later position in the pattern by means
of a reference. The reference always stands for the same character
string as the subexpression.
______________________________________________________________________
| Expression | Meaning |
|_______________|_____________________________________________________|
| \(x\) | The regular expression x is identified as a subex- |
| | pression. It matches all character strings that |
| | match the regular expression x. |
| | |
| | Example: \(aa*\) matches a, aa, aaa, ... |
|_______________|_____________________________________________________|
| \n | n is an integer between 1 and 9. \n is a reference |
| | to the nth subexpression x in a pattern. x must be |
| | placed before the reference in the pattern. \n |
| | matches the same character string as x. |
| | |
| | Example: |
| | |
| | \(aa*\)\1 matches aa, aaaa, aaaaaa, ... |
| | |
| | \(a\(b\)\)\2 matches abb |
|_______________|_____________________________________________________|
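The simple-RE notation for subexpressions, references, and repeat
counts can be mapped onto Python's re syntax for experimentation. The
function bre_to_python below is a hypothetical helper, not part of any
command or library described here, and it covers only a subset of the
syntax: it gives bracket expressions no special treatment and knows
nothing about locales.

```python
import re

def bre_to_python(bre: str) -> str:
    """Translate a subset of simple-RE syntax into Python re syntax."""
    out = []
    i = 0
    while i < len(bre):
        ch = bre[i]
        if ch == "\\":
            if i + 1 == len(bre):
                raise ValueError("pattern ends with an unpaired backslash")
            nxt = bre[i + 1]
            if nxt in "(){}":
                out.append(nxt)             # \( \) \{ \} are operators
            elif nxt in "123456789":
                out.append("\\" + nxt)      # \n stays a reference
            else:
                out.append(re.escape(nxt))  # quoted character is literal
            i += 2
        elif ch in "(){}+?|":
            out.append("\\" + ch)           # ordinary characters in simple REs
            i += 1
        else:
            out.append(ch)                  # same meaning in both syntaxes
            i += 1
    return "".join(out)

for bre in (r"a\{1,2\}", r"\(aa*\)\1", r"\(a\(b\)\)\2", r"a+"):
    print(bre, "->", bre_to_python(bre))
```

With this translation, \(aa*\)\1 matches strings such as aa and aaaa
in full but not aaa, because the reference must repeat exactly the
string matched by the subexpression.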
Grouping, alternatives
These exist only for extended regular expressions.
EXTENDED REGULAR EXPRESSIONS
The pattern to be searched for in a text can be made up of any
combination of single expressions. The following single expressions
can be used if a command supports extended regular expressions.
Single characters, collating elements
As for simple regular expressions.
Groups of characters, classes
As for simple regular expressions.
Concatenation
As for simple regular expressions.
Repeats
Expressions that describe single characters or groups of characters
can be repeated, as can groupings and alternatives.
______________________________________________________________________
| Expression | Meaning |
|_______________|_____________________________________________________|
| r* | Zero, one, or more occurrences of the regular |
| | expression r. |
| | |
| | Example: a* matches nothing, a, aa, aaa, ... |
| | |
| r+ | One or more occurrences of the regular expression |
| | r. |
| | |
| | Example: u+ matches u, uu, uuu, ... |
| | |
| r? | Zero or one occurrences of the regular expression |
| | r. |
| | |
| | Example: u? matches nothing or u |
|_______________|_____________________________________________________|
| r{m,n} | At least m and at most n occurrences of the regular|
| | expression r. |
| | |
| | Example: a{1,2} matches a or aa |
| | |
| r{m} | Exactly m occurrences of the regular expression r. |
| | |
| | Example: a{3} matches aaa |
| | |
| r{m,} | At least m occurrences of the regular expression r.|
| | |
| | Example: a{3,} matches aaa, aaaa, aaaaa, ... |
|_______________|_____________________________________________________|
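The operators + and ? can be expressed in terms of the other repeat
forms: r+ accepts the same strings as rr*, and r? the same strings as
r{0,1}. The following Python sketch checks this on a few sample
strings. It assumes Python's re module, whose notation for these
repeats coincides with the extended form; the helper name accepted is
hypothetical.

```python
import re

def accepted(pattern: str, words: list) -> list:
    """Return the words that the pattern matches in full."""
    return [w for w in words if re.fullmatch(pattern, w)]

words = ["", "u", "uu", "uuu"]

print(accepted(r"u+", words))      # one or more occurrences
print(accepted(r"uu*", words))     # the same set, written with *
print(accepted(r"u?", words))      # zero or one occurrence
print(accepted(r"u{0,1}", words))  # the same set, written with {m,n}
```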
Anchoring
As for simple regular expressions.
Subexpressions and references
These exist only for simple regular expressions.
Grouping, alternatives
______________________________________________________________________
| Expression | Meaning |
|_______________|_____________________________________________________|
| (rx) | The regular expressions r and x are combined in a |
| | group that matches all character strings matching |
| | the regular expression rx. |
| | |
| | Example: |
| | |
| | (ok(abc)) matches okabc |
| | |
| | (au)* matches nothing or au, auau, ... |
|_______________|_____________________________________________________|
| (r1|r2) | Character strings that match the regular expression|
| | r1 or r2. |
| | |
| | Example: (ok|ko) matches ok or ko |
|_______________|_____________________________________________________|
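The examples for grouping and alternatives can be reproduced with a
short Python sketch. It assumes Python's re module, which uses the
same notation for groups and alternatives; the sample strings and the
helper name accepted are hypothetical.

```python
import re

def accepted(pattern: str, words: list) -> list:
    """Return the words that the pattern matches in full."""
    return [w for w in words if re.fullmatch(pattern, w)]

print(accepted(r"(ok(abc))", ["okabc", "ok", "abc"]))  # nested groups
print(accepted(r"(au)*", ["", "au", "auau", "aua"]))   # a repeated group
print(accepted(r"(ok|ko)", ["ok", "ko", "oo", "okko"]))  # alternatives
```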
PRECEDENCE
The following tables show the precedence of operators in regular
expressions. The operators are listed in descending order from
highest precedence to lowest precedence.
Precedence for simple regular expressions
______________________________________________________________________
| Symbols from the internationalized environment | [= =] [: :] [. .] |
|________________________________________________|____________________|
| Quoting characters | \character |
|________________________________________________|____________________|
| Bracketed expressions | [ ] |
|________________________________________________|____________________|
| Subexpressions, references | \( \) \n |
|________________________________________________|____________________|
| Repeat | * \{m,n\} |
|________________________________________________|____________________|
| Concatenation | rx |
|________________________________________________|____________________|
| Anchoring | ^ $ |
|________________________________________________|____________________|
Precedence for extended regular expressions
______________________________________________________________________
| Symbols from the internationalized environment | [= =] [: :] [. .] |
|________________________________________________|____________________|
| Quoting characters | \character |
|________________________________________________|____________________|
| Bracketed expressions | [ ] |
|________________________________________________|____________________|
| Grouping | ( ) |
|________________________________________________|____________________|
| Repeat | * ? + {m,n} |
|________________________________________________|____________________|
| Concatenation | rx |
|________________________________________________|____________________|
| Anchoring | ^ $ |
|________________________________________________|____________________|
| Alternatives | | |
|________________________________________________|____________________|
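The practical effect of the precedence order can be seen in a few
cases. The sketch below assumes Python's re module, whose precedence
agrees with the table above for the operators used here; the helper
names full and found are hypothetical.

```python
import re

def full(pattern: str, s: str) -> bool:
    """True if the pattern matches the whole string s."""
    return re.fullmatch(pattern, s) is not None

def found(pattern: str, s: str) -> bool:
    """True if the pattern matches somewhere in s."""
    return re.search(pattern, s) is not None

# Repeat binds more tightly than concatenation: ab* means a(b*).
print(full(r"ab*", "abbb"), full(r"ab*", "abab"), full(r"(ab)*", "abab"))

# Concatenation binds more tightly than alternatives: ab|cd means (ab)|(cd).
print(full(r"ab|cd", "ab"), full(r"ab|cd", "abd"), full(r"a(b|c)d", "abd"))

# Anchoring binds more tightly than alternatives: ^ab|cd$ means (^ab)|(cd$).
print(found(r"^ab|cd$", "abxx"), found(r"^ab|cd$", "xxcd"),
      found(r"^(ab|cd)$", "abxx"))
```

Grouping with ( ) is the way to override the default order, as the
third call in each line shows.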
Commands with regular expressions
The following table is an overview of the commands that process regu-
lar expressions.
_________________________________________
| Command | Type of regular expressions |
|_________|______________________________|
| apropos | extended |
|_________|______________________________|
| awk | extended internationalized |
|_________|______________________________|
| bfs | simple |
|_________|______________________________|
| csplit | simple |
|_________|______________________________|
| e | simple |
|_________|______________________________|
| ed | simple |
|_________|______________________________|
| egrep | extended |
|_________|______________________________|
| ex | simple |
|_________|______________________________|
| expr | simple |
|_________|______________________________|
| extract | simple |
|_________|______________________________|
| findman | extended |
|_________|______________________________|
| grep | simple |
|_________|______________________________|
| lex | extended |
|_________|______________________________|
| man | simple |
|_________|______________________________|
| nl | simple |
|_________|______________________________|
| pg | simple |
|_________|______________________________|
| sed | simple |
|_________|______________________________|
| vi | simple |
|_________|______________________________|
| whatis | extended |
|_________|______________________________|
LOCALE
In bracketed regular expressions the LC_COLLATE environment variable
determines the meaning of metacharacters, equivalence classes, and
collating elements, while the LC_CTYPE environment variable determines
the meaning of character classes.
If LC_COLLATE or LC_CTYPE is undefined or defined as a null string,
the value of LANG is taken as the default value for the unset or empty
variable. If LANG is also undefined or defined as a null string, the
system behaves as if it has not been internationalized.
If one of the variables has a value that is invalid for the
internationalized environment, the system behaves as if no variables
have been set.
The LC_ALL environment variable determines the entire
internationalized environment. LC_ALL takes precedence over all other
environment variables in the area of internationalization.
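The lookup order described above can be summarized in a short Python
sketch. It is a model of the rules in this section only, not of any
system interface. Two points are assumptions: an empty LC_ALL is
treated like an unset one, and invalid values are not modelled. The
function name effective_setting and the locale names used in the
example are hypothetical.

```python
def effective_setting(env: dict, category: str):
    """Return the value that governs category, or None if the system
    behaves as if it has not been internationalized."""
    # LC_ALL first, then the category variable, then LANG as the default.
    for name in ("LC_ALL", category, "LANG"):
        value = env.get(name, "")
        if value:  # unset and null-string values are treated alike
            return value
    return None

env = {"LANG": "locale_a", "LC_COLLATE": "locale_b", "LC_CTYPE": ""}

print(effective_setting(env, "LC_COLLATE"))  # the category variable is set
print(effective_setting(env, "LC_CTYPE"))    # empty, so LANG is the default
print(effective_setting({**env, "LC_ALL": "locale_c"}, "LC_COLLATE"))
print(effective_setting({}, "LC_CTYPE"))     # nothing set at all
```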
SEE ALSO
awk(1), bfs(1), csplit(1), e(1), ed(1), egrep(1), ex(1), expr(1),
extract(1), grep(1), lex(1), man(1), nl(1), pg(1), regcmp(1), sed(1),
vi(1), regex(3), regcomp(3C), regcmp(3G), regexpr(3G), regex(5),
regexp(5).