regcmp(3X) regcmp(3X)
NAME
regcmp, regex - compile and execute a regular expression
SYNOPSIS
char *regcmp(string1 [, string2, ...], (char *)0)
char *string1, *string2, ...;
char *regex(re, subject[, ret0, ...])
char *re, *subject, *ret0, ...;
extern char *loc1;
DESCRIPTION
regcmp compiles a regular expression and returns a pointer
to the compiled form. malloc(3C) is used to create space
for the vector. It is the user's responsibility to free
unneeded space that has been allocated by malloc. A NULL
return from regcmp indicates an incorrect argument.
regcmp(1) has been written to generally preclude the need
for this routine at execution time.
regex executes a compiled pattern against the subject
string. Additional arguments are passed to receive values
back. regex returns NULL on failure or a pointer to the
next unmatched character on success. A global character
pointer loc1 points to where the match began. regcmp and
regex were mostly borrowed from the editor, ed(1); however,
the syntax and semantics have been changed slightly. The
following are the valid symbols and their associated
meanings.
[]*.^ These symbols retain the meaning they have in ed(1).
$ This symbol matches the end of the string; \n
matches the newline.
- Within brackets the minus means ``through''. For
example, [a-z] is equivalent to [abcd...xyz]. The
- can appear as itself only if used as the last or
first character. For example, the character class
expression []-] matches the characters ] and -.
+ A regular expression followed by + means ``one or
more times''. For example, [0-9]+ is equivalent
to [0-9][0-9]*.
{m} {m,} {m,u} Integer values enclosed in {} indicate the
number of times the preceding regular expression
is to be applied. The minimum number is m and the
maximum number is u, which must be less than 256.
If only m is present (e.g., {m}), it indicates the
exact number of times the regular expression is to
be applied. {m,} is analogous to {m,infinity}.
The plus (+) and star (*) operations are
equivalent to {1,} and {0,}, respectively.
( ... )$n The value of the enclosed regular expression is to
be returned. The value will be stored in the
(n+1)th argument following the subject argument.
At present, at most 10 enclosed regular
expressions are allowed. regex makes its
assignments unconditionally.
( ... ) Parentheses are used for grouping. An operator
(e.g., *, +, {}) can work on a single character or
a regular expression enclosed in parentheses. For
example, (a*(cb+)*)$0.
By necessity, all the above defined symbols are special.
They must, therefore, be escaped to be used as themselves.
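The repetition rules above can be tried out with a short sketch. regcmp and regex may not be present on a given system, so this sketch assumes the POSIX routines regcomp(3) and regexec(3) with REG_EXTENDED as a stand-in. They are a different interface whose bracket, + and {m,u} syntax behaves similarly; they are not the routines described here. The helper name match_len is hypothetical.

```c
#include <regex.h>

/*
 * Hypothetical helper, not part of libPW.
 * Returns the length of the leftmost match of the POSIX extended
 * regular expression `pattern` in `subject`, or -1 if nothing
 * matches or the pattern does not compile.
 */
int match_len(const char *pattern, const char *subject)
{
    regex_t re;
    regmatch_t m;
    int len = -1;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;
    if (regexec(&re, subject, 1, &m, 0) == 0)
        len = (int)(m.rm_eo - m.rm_so);
    regfree(&re);
    return len;
}
```

Under these assumptions, match_len("[0-9]+", s), match_len("[0-9][0-9]*", s) and match_len("[0-9]{1,}", s) should agree for any subject s, and a pattern such as "[]-]+" should match runs of the characters ] and -.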
EXAMPLES
Example 1:
char *cursor, *newcursor, *ptr;
...
newcursor = regex((ptr = regcmp("^\n", (char *)0)), cursor);
free(ptr);
This example will match a leading newline in the subject
string pointed at by cursor.
Example 2:
char ret0[9];
char *newcursor, *name;
...
name = regcmp("([A-Za-z][A-Za-z0-9_]{0,7})$0", (char *)0);
newcursor = regex(name, "123Testing321", ret0);
This example will match through the string ``Testing3'' and
will return the address of the character after the last
matched character, 11 characters past the start of the
subject string. The string ``Testing3'' will be copied to
the character array ret0.
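The offsets in Example 2 can be checked with a sketch that assumes POSIX regcomp(3) and regexec(3) in place of regcmp and regex. The pattern is modeled on the one in Example 2 with the $0 suffix dropped, since POSIX reports parenthesized groups through a regmatch_t array instead. The function name example2_offset is hypothetical.

```c
#include <regex.h>
#include <string.h>

/*
 * Hypothetical stand-in for the regex call in Example 2.
 * Copies the text of the first parenthesized group into ret0, which
 * must hold at least 9 bytes (8 characters plus the terminating
 * null), and returns the offset just past the match, or -1 if
 * nothing matches or the pattern does not compile.
 */
int example2_offset(const char *subject, char *ret0)
{
    regex_t re;
    regmatch_t m[2];
    int end = -1;

    if (regcomp(&re, "([A-Za-z][A-Za-z0-9_]{0,7})", REG_EXTENDED) != 0)
        return -1;
    if (regexec(&re, subject, 2, m, 0) == 0) {
        /* The group matches one letter plus at most 7 more characters. */
        size_t n = (size_t)(m[1].rm_eo - m[1].rm_so);

        memcpy(ret0, subject + m[1].rm_so, n);
        ret0[n] = '\0';
        end = (int)m[0].rm_eo;
    }
    regfree(&re);
    return end;
}
```

Called with the subject "123Testing321", this sketch should return 11 and leave ``Testing3'' in ret0, which is why a nine-byte array is sufficient.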
Example 3:
#include "file.i"
char *string, *newcursor;
...
newcursor = regex(name, string);
This example applies a precompiled regular expression in
file.i (see regcmp(1)) against string.
These routines are kept in /lib/libPW.a.
SEE ALSO
ed(1), regcmp(1), malloc(3C).
BUGS
The user program may run out of memory if regcmp is called
iteratively without freeing the vectors no longer required.
The following user-supplied replacement for malloc(3C)
reuses the same vector, saving time and space:
/* user's program */
...
char *
malloc(n)
unsigned n;
{
    static char rebuf[512];    /* one buffer, reused on every call */

    /* fail if the request does not fit in the buffer */
    return (n <= sizeof rebuf) ? rebuf : NULL;
}
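The behavior of this replacement can be examined without overriding malloc. The sketch below applies the same policy under a hypothetical name, rebuf_alloc, and uses an ANSI prototype rather than the older declaration style shown above.

```c
#include <stddef.h>

/*
 * Hypothetical name: the same policy as the malloc replacement above,
 * kept separate from malloc so that it can be exercised on its own.
 */
char *rebuf_alloc(size_t n)
{
    static char rebuf[512];    /* the one buffer every caller shares */

    return (n <= sizeof rebuf) ? rebuf : NULL;
}
```

Every successful call returns the same address, so under this scheme only the most recently compiled pattern remains valid, a request larger than 512 bytes fails, and the returned pointer must not be handed to the system free.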
Page 3 (last mod. 1/14/87)