regcomp(3C) regcomp(3C)
NAME
regcomp, regexec, regerror, regfree - regular expression matching
SYNOPSIS
#include <sys/types.h>
#include <regex.h>
int regcomp(regex_t *preg, const char *pattern, int cflags);
int regexec(const regex_t *preg, const char *string, size_t nmatch,
            regmatch_t pmatch[], int eflags);
size_t regerror(int errcode, const regex_t *preg, char *errbuf,
                size_t errbufsize);
void regfree(regex_t *preg);
DESCRIPTION
These functions interpret basic and extended regular expressions as
described in expressions(5).
If the functions are not supported, errno is set to ENOSYS and the
return value REG_ENOSYS (from regcomp() and regexec()) or 0 (from
regerror()) is reported.
The structure type regex_t contains at least the following member:
 _____________________________________________________________________
| Type       | Name       | Description                               |
|____________|____________|___________________________________________|
| size_t     | re_nsub    | Number of parenthesised subexpressions    |
|____________|____________|___________________________________________|
The structure type regmatch_t contains at least the following members:
 _____________________________________________________________________
| Type       | Name       | Description                               |
|____________|____________|___________________________________________|
| regoff_t   | rm_so      | Byte offset from start of string to       |
|            |            | start of substring                        |
|____________|____________|___________________________________________|
| regoff_t   | rm_eo      | Byte offset from start of string to the   |
|            |            | first character after the end of the      |
|            |            | substring                                 |
|____________|____________|___________________________________________|
The regcomp() function will compile the regular expression contained
in the string pointed to by the pattern argument and place the results
in the structure pointed to by preg. The cflags argument is the
bitwise inclusive OR of zero or more of the following flags, which are
defined in the header regex.h:
REG_EXTENDED     Use extended regular expressions.
Page 1 Reliant UNIX 5.44 Printed 11/98
REG_ICASE        Ignore case in match.
REG_NOSUB        Report only success/fail in regexec().
REG_NEWLINE      Change the handling of newline characters, as
                 described in the text.
REG_LITNL        The \n string is treated as a newline character.
REG_VIMODE       << or >> are interpreted as the start or the finish
                 of a word.
REG_NSUBANCHOR   The characters ^ and $ do not function as anchors
                 within subexpressions.
REG_EMPTY        regcomp() returns an error value if pattern is a
                 null string or empty.
The default regular expression type for pattern is a basic regular
expression. The application can specify extended regular expressions
using the REG_EXTENDED cflags flag. The following must be remembered
when formulating the regular expressions:
- Equivalence classes can be used in the range specifications.
- Range specifications linked together with the form [b-d-f] are not
possible. In this case, - and f are treated as normal characters.
- { in simple, and {{ in extended regular expressions are treated as
normal characters and do not initiate a repeat in this case.
- A repeat statement at the start of a regular expression is treated
as a normal character.
- ^ at the start or $ at the end of a subexpression works as an
anchor, unless the REG_NSUBANCHOR flag is set.
On successful completion, regcomp() returns zero; otherwise it returns
non-zero, and the content of preg is undefined.
If the REG_NOSUB flag was not set in cflags, then regcomp() will set
re_nsub to the number of parenthesised subexpressions (delimited by
\( \) in basic regular expressions or ( ) in extended regular
expressions) found in pattern.
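The re_nsub behaviour described above can be sketched as follows. This
is a minimal illustration, not part of the original page: the helper
name count_subexpressions() is hypothetical, and the sketch assumes a
system where regcomp() is supported.

```c
#include <sys/types.h>
#include <regex.h>

/* Hypothetical helper (not part of the library): compile pattern as an
 * extended regular expression and return the number of parenthesised
 * subexpressions, or -1 if regcomp() fails. */
long count_subexpressions(const char *pattern)
{
    regex_t re;
    long n;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;              /* content of re is undefined here */

    n = (long) re.re_nsub;      /* set because REG_NOSUB was not given */
    regfree(&re);
    return n;
}
```

A caller can use the returned count plus one (for the whole match) to
size the pmatch array passed to regexec().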
The regexec() function compares the null-terminated string specified
by string with the compiled regular expression preg initialized by a
previous call to regcomp(). If it finds a match, regexec() returns
zero, otherwise it returns non-zero indicating either no match or an
error. The eflags argument is the bitwise inclusive OR of zero or more
of the following flags, which are defined in the header regex.h.
REG_NOTBOL   The first character of the string pointed to by string
             is not the beginning of the line. Therefore, the
             circumflex character (^), when taken as a special
             character, will not match the beginning of string.
REG_NOTEOL   The last character of the string pointed to by string is
             not the end of the line. Therefore, the dollar sign ($),
             when taken as a special character, will not match the
             end of string.
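The effect of the two eflags values can be tried with a small wrapper
such as the one below. It is only a sketch: the name
matches_with_eflags() is hypothetical, and the pattern is compiled as a
basic regular expression with REG_NOSUB because only success or failure
is of interest.

```c
#include <stddef.h>
#include <sys/types.h>
#include <regex.h>

/* Hypothetical helper: return 1 if the basic regular expression in
 * pattern matches string when regexec() is given eflags, else 0.
 * Compilation errors are treated as "no match". */
int matches_with_eflags(const char *pattern, const char *string, int eflags)
{
    regex_t re;
    int status;

    if (regcomp(&re, pattern, REG_NOSUB) != 0)
        return 0;

    status = regexec(&re, string, (size_t) 0, NULL, eflags);
    regfree(&re);
    return status == 0;
}
```

For example, the pattern "^abc" is expected to match the string "abc"
with eflags 0 but not with REG_NOTBOL, since the circumflex may then no
longer match at the start of string.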
If nmatch is zero or REG_NOSUB was set in the cflags argument to
regcomp(), then regexec() will ignore the pmatch argument. Otherwise,
the pmatch argument must point to an array with at least nmatch
elements, and regexec() will fill in the elements of that array with
offsets of the substrings of string that correspond to the
parenthesised subexpressions of pattern: pmatch[i].rm_so will be the
byte offset of the beginning and pmatch[i].rm_eo will be one greater
than the byte offset of the end of substring i. (Subexpression i
begins at the i-th matched open parenthesis, counting from 1.) Offsets
in pmatch[0] identify the substring that corresponds to the entire
regular expression. Unused elements of pmatch up to pmatch[nmatch-1]
will be filled with -1. If there are more than nmatch subexpressions
in pattern (pattern itself counts as a subexpression), then regexec()
will still do the match, but will record only the first nmatch
substrings.
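As an illustration of how the pmatch offsets are typically consumed,
the following sketch copies the text of the second subexpression out of
the input. The helper name extract_value() and the fixed pattern
"([a-z]+)=([0-9]+)" are assumptions made for this example only.

```c
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <regex.h>

/* Hypothetical helper: find the first "name=digits" in string and copy
 * the digits (subexpression 2) into out, truncating if necessary.
 * Returns 1 on a match, 0 on no match or error. */
int extract_value(const char *string, char *out, size_t outsize)
{
    regex_t re;
    regmatch_t pm[3];           /* whole match + two subexpressions */
    size_t len;
    int status;

    if (outsize == 0)
        return 0;
    if (regcomp(&re, "([a-z]+)=([0-9]+)", REG_EXTENDED) != 0)
        return 0;

    status = regexec(&re, string, (size_t) 3, pm, 0);
    regfree(&re);
    if (status != 0)
        return 0;

    /* rm_eo is one past the last byte, so the difference is the length */
    len = (size_t) (pm[2].rm_eo - pm[2].rm_so);
    if (len >= outsize)
        len = outsize - 1;
    memcpy(out, string + pm[2].rm_so, len);
    out[len] = '\0';
    return 1;
}
```

Because the offsets are byte offsets from the start of string, the
substring can be addressed as string + rm_so without further
bookkeeping.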
When matching a basic or extended regular expression, any given
parenthesised subexpression of pattern might participate in the match
of several different substrings of string, or it might not match any
substring even though the pattern as a whole did match. The following
rules are used to determine which substrings to report in pmatch when
matching regular expressions:
1. If subexpression i in a regular expression is not contained within
another subexpression, and it participated in the match several
times, then the byte offsets in pmatch[i] will delimit the last
such match.
2. If subexpression i is not contained within another subexpression,
and it did not participate in an otherwise successful match, the
byte offsets in pmatch[i] will be -1. A subexpression does not
participate in the match when:
- * or \{ \} appears immediately after the subexpression in a
basic regular expression
- *, ?, or { } appears immediately after the subexpression in an
extended regular expression, and the subexpression did not match
(matched zero times)
- | is used in an extended regular expression to select this
subexpression or another, and the other subexpression matched.
3. If subexpression i is contained within another subexpression j, and
i is not contained within any other subexpression that is contained
within j, and a match of subexpression j is reported in pmatch[j],
then the match or non-match of subexpression i reported in
pmatch[i] will be as described in 1. and 2. above, but within the
substring reported in pmatch[j] rather than the whole string.
4. If subexpression i is contained in subexpression j, and the byte
offsets in pmatch[j] are -1, then the byte offsets in pmatch[i] also
will be -1.
5. If subexpression i matched a zero-length string, then both byte
offsets in pmatch[i] will be the byte offset of the character or
null terminator immediately following the zero-length string.
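The reporting rules above can be explored with a helper along these
lines. It is a sketch only: subexpr_offsets() is a hypothetical name,
the array size of 4 is arbitrary, and patterns are compiled as extended
regular expressions.

```c
#include <stddef.h>
#include <sys/types.h>
#include <regex.h>

/* Hypothetical helper: match the extended regular expression pattern
 * against string and store the offsets recorded for subexpression idx
 * (0 to 3) in *so and *eo. Returns the regexec() status, or -1 if idx
 * is out of range or the pattern does not compile. */
int subexpr_offsets(const char *pattern, const char *string,
                    size_t idx, long *so, long *eo)
{
    regex_t re;
    regmatch_t pm[4];
    int status;

    if (idx >= 4 || regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;

    status = regexec(&re, string, (size_t) 4, pm, 0);
    regfree(&re);
    if (status == 0) {
        *so = (long) pm[idx].rm_so;
        *eo = (long) pm[idx].rm_eo;
    }
    return status;
}
```

With the pattern "(a)|(b)" and the string "b", rule 2 says that
subexpression 1 did not participate, so both of its offsets should be
-1. With "(a|b)*c" and "abc", rule 1 says that the offsets for
subexpression 1 delimit its last match, the "b".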
If, when regexec() is called, the locale is different from when the
regular expression was compiled, the result is undefined.
If REG_NEWLINE is not set in cflags, then a newline character in
pattern or string will be treated as an ordinary character. If
REG_NEWLINE is set, then newline will be treated as an ordinary
character except as follows:
1. A newline character in string will not be matched by a period
outside a bracket expression or by any form of a non-matching list.
2. A circumflex (^) in pattern, when used to specify expression
anchoring, will match the zero-length string immediately after a
newline in string, regardless of the setting of REG_NOTBOL.
3. A dollar-sign ($) in pattern, when used to specify expression
anchoring, will match the zero-length string immediately before a
newline in string, regardless of the setting of REG_NOTEOL.
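A short sketch of the difference REG_NEWLINE makes, assuming a
hypothetical helper first_match_offset() that reports where the first
match starts:

```c
#include <stddef.h>
#include <sys/types.h>
#include <regex.h>

/* Hypothetical helper: compile pattern as a basic regular expression
 * with the given cflags and return the byte offset at which the first
 * match in string starts, or -1 on no match or error. */
long first_match_offset(const char *pattern, const char *string, int cflags)
{
    regex_t re;
    regmatch_t pm;
    int status;

    if (regcomp(&re, pattern, cflags) != 0)
        return -1;

    status = regexec(&re, string, (size_t) 1, &pm, 0);
    regfree(&re);
    return status == 0 ? (long) pm.rm_so : -1;
}
```

For the string "first\nsecond", the pattern "^second" should only match
when REG_NEWLINE is passed in cflags (rule 2), while "first.second"
should only match when it is not, because the period may then match the
newline (rule 1).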
The regfree() function frees any memory allocated by regcomp()
associated with preg.
The following constants are defined as error return values:
REG_NOMATCH    regexec() failed to match.
REG_INVARG     An invalid parameter was specified.
REG_BADPAT     Invalid regular expression.
REG_ECOLLATE   Invalid collating element referenced.
REG_ECTYPE     Invalid character class type referenced.
REG_EESCAPE    Trailing \ in pattern.
REG_ESUBREG    Number in \digit invalid or in error.
REG_EBRACK     [ ] imbalance.
REG_ENOSYS     The function is not supported.
REG_EPAREN     \( \) or ( ) imbalance.
REG_EBRACE     \{ \} imbalance.
REG_BADBR      Content of \{ \} invalid: not a number, number too
               large, more than two numbers, first larger than
               second.
REG_ERANGE     Invalid endpoint in range expression.
REG_ESPACE     Out of memory.
REG_BADRPT     ?, * or + not preceded by valid regular expression.
REG_EPATTERN   A null or empty pattern was specified as a parameter
               and regcomp() was called with the REG_EMPTY flag.
The regerror() function provides a mapping from error codes returned
by regcomp() and regexec() to unspecified printable strings. It
generates a string corresponding to the value of the errcode argument,
which must be the last non-zero value returned by regcomp() or
regexec() with the given value of preg. If errcode is not such a
value, the content of the generated string is unspecified.
If preg is a null pointer, but errcode is a value returned by a
previous call to regexec() or regcomp(), regerror() still generates an
error string corresponding to the value of errcode, but the string
might not be as detailed under some implementations.
If the errbufsize argument is not zero, regerror() will place the
generated string into the buffer of size errbufsize pointed to by
errbuf. If the string (including the terminating null) cannot fit in
the buffer, regerror() will truncate the string and null-terminate the
result.
If errbufsize is zero, regerror() ignores the errbuf argument, and
returns the size of the buffer needed to hold the generated string.
If the preg argument to regexec() or regcomp() is not a compiled
regular expression returned by regcomp(), the result is undefined. A
preg is no longer treated as a compiled regular expression after it is
given to regfree().
An application could use:
    regerror(code, preg, (char *)NULL, (size_t) 0)
to find out how big a buffer is needed for the generated string,
malloc() a buffer to hold the string, and then call regerror() again
to get the string. Alternatively, it could allocate a fixed, static
buffer that is big enough to hold most strings, and then use malloc()
to allocate a larger buffer if it finds that this is too small. Use
the fnmatch() function for matching a pattern with the pattern
matching notation.
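The two-call approach described above might look like the sketch below.
The helper name regex_error_string() is hypothetical; the caller is
assumed to free() the returned buffer.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <regex.h>

/* Hypothetical helper: return a malloc()ed copy of the message for
 * code, or NULL if regerror() reports size 0 or memory runs out.
 * preg may be a null pointer, as described in the text. */
char *regex_error_string(int code, const regex_t *preg)
{
    /* First call: errbufsize of 0 only asks for the required size. */
    size_t needed = regerror(code, preg, (char *) NULL, (size_t) 0);
    char *buf;

    if (needed == 0)
        return NULL;
    buf = malloc(needed);
    if (buf == NULL)
        return NULL;

    /* Second call: fill the buffer, including the terminating null. */
    (void) regerror(code, preg, buf, needed);
    return buf;
}
```

The size returned by the first call already includes the terminating
null, so no extra byte has to be added before calling malloc().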
RESULT
On successful completion, the regcomp() function returns zero.
Otherwise it returns an integer value indicating an error as described
in <regex.h>, and the content of preg is undefined.
On successful completion, the regexec() function returns zero.
Otherwise it returns REG_NOMATCH to indicate no match, or REG_ENOSYS to
indicate that the function is not supported.
Upon successful completion, the regerror() function returns the number
of bytes needed to hold the entire generated string. Otherwise, it
returns zero to indicate that the function is not implemented.
The regfree() function returns no value.
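A caller that needs to tell "no match" apart from an error can test the
return values as sketched here. The name match_or_error() and its
three-way return convention are assumptions for this illustration.

```c
#include <stddef.h>
#include <sys/types.h>
#include <regex.h>

/* Hypothetical helper: return 1 if the extended regular expression in
 * pattern matches string, 0 if it does not match, and -1 if regcomp()
 * or regexec() reports an error. */
int match_or_error(const char *pattern, const char *string)
{
    regex_t re;
    int status;

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;              /* compile error; re is undefined */

    status = regexec(&re, string, (size_t) 0, NULL, 0);
    regfree(&re);

    if (status == 0)
        return 1;
    if (status == REG_NOMATCH)
        return 0;
    return -1;                  /* any other non-zero value */
}
```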
EXAMPLES
#include <sys/types.h>
#include <regex.h>
/* Match string against the extended regular expression in
 * pattern, treating errors as no match.
 * Return 1 for match, 0 for no match. */
int match (const char *string, char *pattern)
{
    int status;
    regex_t re;
    if (regcomp (&re, pattern, REG_EXTENDED|REG_NOSUB) != 0)
    {
        return (0); /* report error */
    }
    status = regexec (&re, string, (size_t) 0, NULL, 0);
    regfree (&re);
    if (status != 0)
    {
        return (0); /* report error */
    }
    return (1);
}
The following demonstrates how the REG_NOTBOL flag could be used with
regexec() to find all substrings in a line that match a pattern
supplied by a user. (For simplicity of the example, very little error
checking is done.)
(void) regcomp (&re, pattern, 0);
p = buffer; /* p (a char *) marks where the next search starts */
/* this call to regexec() finds the first match on the line */
error = regexec (&re, p, 1, &pm, 0);
while (error == 0)
{ /* while matches found */
    /* substring found between p + pm.rm_so and p + pm.rm_eo */
    p += pm.rm_eo; /* offsets are relative to the string searched */
    /* This call to regexec() finds the next match */
    error = regexec (&re, p, 1, &pm, REG_NOTBOL);
}
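A self-contained variant of this loop is sketched below. The function
name count_matches() is hypothetical, and the handling of zero-length
matches (stepping one byte forward so that the loop terminates) is an
addition that the fragment above does not contain.

```c
#include <stddef.h>
#include <sys/types.h>
#include <regex.h>

/* Hypothetical helper: count the non-overlapping matches of the basic
 * regular expression pattern in buffer. Returns -1 if the pattern
 * does not compile. */
int count_matches(const char *pattern, const char *buffer)
{
    regex_t re;
    regmatch_t pm;
    const char *p = buffer;     /* start of the next search */
    int eflags = 0;             /* first search starts at the line start */
    int count = 0;

    if (regcomp(&re, pattern, 0) != 0)
        return -1;

    while (regexec(&re, p, (size_t) 1, &pm, eflags) == 0) {
        count++;
        if (pm.rm_eo == pm.rm_so) {     /* zero-length match */
            if (p[pm.rm_eo] == '\0')
                break;                  /* nothing left to search */
            p += pm.rm_eo + 1;          /* step over one byte */
        } else {
            p += pm.rm_eo;              /* offsets are relative to p */
        }
        eflags = REG_NOTBOL;            /* p is no longer the line start */
    }
    regfree(&re);
    return count;
}
```

Passing REG_NOTBOL on every search after the first keeps a pattern such
as "^a" from matching again in the middle of the line.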
NOTES
If you use one of these functions, you must link the libgen library at
compilation (cc -lgen).
SEE ALSO
regcmp(1), regex(3), fnmatch(3C), glob(3C), regcmp(3G), regexpr(3G),
expressions(5), regex(5), regexp(5).