regcmp(3X) regcmp(3X)
NAME
regcmp, regex - compile and execute regular expression
SYNOPSIS
#include <libgen.h>
cc [flag ...] file ... -lgen [library ...]
char *regcmp (const char *string1 [, char *string2, ...], (char *)0);
char *regex (const char *re, const char *subject [, char *ret0, ...]);
extern char *loc1;
DESCRIPTION
regcmp compiles a regular expression (consisting of the
concatenated arguments) and returns a pointer to the
compiled form. Space for the compiled form is obtained with
malloc(3C); it is the user's responsibility to free this
space when it is no longer needed. A NULL return from
regcmp indicates an incorrect argument. The regcmp(1)
command generally precludes the need to call this routine
at execution time.
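The calling convention described above (a variable number of
strings ended by (char *)0, with the result allocated by
malloc and owned by the caller) can be sketched as follows.
The helper concat_args is hypothetical and is not part of
libgen; it only concatenates its arguments and compiles
nothing. It assumes a hosted C environment.

```c
#include <stdarg.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper, NOT part of libgen: it only concatenates its
 * arguments, to illustrate regcmp's calling convention.  The argument
 * list ends with (char *)0 and the result comes from malloc(), so the
 * caller must free() it.  Returns NULL if first is NULL or if malloc
 * fails.
 */
char *concat_args(const char *first, ...)
{
    va_list ap;
    size_t len;
    const char *s;
    char *out;

    if (first == NULL)
        return NULL;

    /* First pass: total length of all pieces. */
    len = strlen(first);
    va_start(ap, first);
    while ((s = va_arg(ap, char *)) != NULL)
        len += strlen(s);
    va_end(ap);

    out = malloc(len + 1);
    if (out == NULL)
        return NULL;

    /* Second pass: copy the pieces in order. */
    strcpy(out, first);
    va_start(ap, first);
    while ((s = va_arg(ap, char *)) != NULL)
        strcat(out, s);
    va_end(ap);
    return out;
}
```

A call such as concat_args("[0-9]", "+", (char *)0) yields
the string [0-9]+, which the caller releases with free. The
cast on the terminating 0 matters because an uncast 0 passed
through a variable argument list is an int, not a pointer.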
regex executes a compiled pattern against the subject
string. Additional arguments are passed to receive values
back. regex returns NULL on failure or a pointer to the
next unmatched character on success. A global character
pointer loc1 points to where the match began. regcmp and
regex were mostly borrowed from the editor, ed(1); however,
the syntax and semantics have been changed slightly. The
following are the valid symbols and their associated
meanings.
[]*.^ These symbols retain their meaning in ed(1).
$ Matches the end of the string; \n matches a
newline.
- Within brackets the minus means through. For
example, [a-z] is equivalent to [abcd...xyz]. The
- can appear as itself only if used as the first
or last character. For example, the character
class expression []-] matches the characters ] and
-.
+ A regular expression followed by + means one or
more times. For example, [0-9]+ is equivalent to
[0-9][0-9]*.
{m} {m,} {m,u}
Integer values enclosed in {} indicate the number
of times the preceding regular expression is to be
applied. The value m is the minimum number and u
is the maximum, which must be less than 256.
If only m is present (i.e., {m}), it indicates the
exact number of times the regular expression is to
be applied. The value {m,} is analogous to
{m,infinity}. The plus (+) and star (*)
operations are equivalent to {1,} and {0,}
respectively.
( ... )$n The value of the enclosed regular expression is to
be returned. The value will be stored in the
(n+1)th argument following the subject argument.
At most, ten enclosed regular expressions are
allowed. regex makes its assignments
unconditionally.
( ... ) Parentheses are used for grouping. An operator,
e.g., *, +, {}, can work on a single character or
a regular expression enclosed in parentheses. For
example, (a*(cb+)*)$0.
By necessity, all the above defined symbols are special.
They must, therefore, be escaped with a \ (backslash) to be
used as themselves.
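The repetition operators above can be tried out with the
following sketch. It is an illustration only: it uses the
POSIX regcomp(3) and regexec(3) interface with extended
syntax rather than regcmp and regex, on the assumption that
the +, {m}, {m,}, {m,u}, bracket, and backslash rules shown
here behave the same way in both. The $n suffix has no POSIX
equivalent and is not covered. The function name match_len
is hypothetical.

```c
#include <regex.h>
#include <stddef.h>

/*
 * Illustration only: uses POSIX regcomp(3)/regexec(3), not regcmp/regex.
 * Returns the length of the leftmost match of pattern in subject,
 * or -1 if the pattern fails to compile or does not match.
 */
long match_len(const char *pattern, const char *subject)
{
    regex_t re;
    regmatch_t m[1];
    long len = -1;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;
    if (regexec(&re, subject, 1, m, 0) == 0)
        len = (long)(m[0].rm_eo - m[0].rm_so);
    regfree(&re);
    return len;
}
```

Under these assumptions, [0-9]+, [0-9][0-9]* and [0-9]{1,}
all report the same length against the subject 2024abc,
a{2,3} matches three characters of aaaa, and the escaped
pattern \+ (written "\\+" in C source) matches the literal
plus sign in 1+1.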
EXAMPLES
The following example matches a leading newline in the
subject string pointed to by cursor.
char *cursor, *newcursor, *ptr;
...
newcursor = regex((ptr = regcmp("^\n", (char *)0)), cursor);
free(ptr);
The following example matches through the string Testing3
and returns the address of the character after the last
matched character (the ``4''). The string Testing3 is
copied to the character array ret0.
char ret0[9];
char *newcursor, *name;
...
name = regcmp("([A-Za-z][A-Za-z0-9]{0,7})$0", (char *)0);
newcursor = regex(name, "012Testing345", ret0);
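For comparison, the same match can be sketched with the
POSIX regcomp(3) and regexec(3) interface. This is an
analogy, not the regcmp implementation: the function
match_name and its start parameter are hypothetical, start
stands in for the role loc1 plays for regex, and the pattern
is written with the ranges A-Z, a-z and 0-9 and without the
$0 suffix, which POSIX does not support.

```c
#include <regex.h>
#include <string.h>

/*
 * Illustration only, using POSIX regcomp(3)/regexec(3) rather than
 * regcmp/regex.  Matches an identifier of 1 to 8 characters in subject,
 * copies it into ret0 (which must hold at least 9 bytes), stores the
 * start of the match in *start (the role loc1 plays for regex), and
 * returns a pointer to the first character after the match, or NULL
 * if nothing matches.
 */
const char *match_name(const char *subject, char *ret0, const char **start)
{
    regex_t re;
    regmatch_t m[2];            /* m[0]: whole match, m[1]: group 1 */
    const char *next = NULL;

    if (regcomp(&re, "([A-Za-z][A-Za-z0-9]{0,7})", REG_EXTENDED) != 0)
        return NULL;
    if (regexec(&re, subject, 2, m, 0) == 0) {
        size_t n = (size_t)(m[1].rm_eo - m[1].rm_so);

        memcpy(ret0, subject + m[1].rm_so, n);
        ret0[n] = '\0';
        *start = subject + m[0].rm_so;
        next = subject + m[0].rm_eo;
    }
    regfree(&re);
    return next;
}
```

Called with the subject 012Testing345, this sketch copies
Testing3 into ret0, sets *start to the T, and returns a
pointer to the 4. The nine-byte buffer holds the
eight-character maximum plus the terminating null.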
The following example applies a precompiled regular
expression in file.i [see regcmp(1)] against string.
#include "file.i"
char *string, *newcursor;
...
newcursor = regex(name, string);
SEE ALSO
regcmp(1), malloc(3C).
ed(1) in the CX/UX User's Reference Manual.
NOTES
The user program may run out of memory if regcmp is called
iteratively without freeing compiled forms that are no
longer required.
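The discipline this note calls for (release each compiled
pattern before compiling the next) is sketched below. The
sketch is an analogy under stated assumptions: it uses POSIX
regcomp(3) and regfree(3) where a regcmp caller would use
regcmp and free, and the function name count_matching is
hypothetical.

```c
#include <regex.h>
#include <stddef.h>

/*
 * Illustration only, using POSIX regcomp(3)/regfree(3) in place of
 * regcmp/free.  Counts how many of the n patterns match subject,
 * releasing each compiled pattern before compiling the next so that
 * memory use stays bounded however large n is.
 */
size_t count_matching(const char *const *patterns, size_t n,
                      const char *subject)
{
    size_t i, hits = 0;

    for (i = 0; i < n; i++) {
        regex_t re;

        if (regcomp(&re, patterns[i], REG_EXTENDED | REG_NOSUB) != 0)
            continue;           /* skip patterns that do not compile */
        if (regexec(&re, subject, 0, NULL, 0) == 0)
            hits++;
        regfree(&re);           /* analogous to free(ptr) after regcmp */
    }
    return hits;
}
```

With regcmp the loop body would keep the returned pointer in
a local variable, pass it to regex, and call free on it
before the next iteration.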
Page 3 CX/UX Programmer's Reference Manual