regcmp(3x)
_________________________________________________________________
regcmp, regex Subroutine
compile and execute regular expression
_________________________________________________________________
SYNTAX
char *regcmp (string1 [, string2, ...], (char *)0)
char *string1, *string2, ...;
char *regex (re, subject[, ret0, ...])
char *re, *subject, *ret0, ...;
extern char *loc1;
DESCRIPTION
Regcmp compiles a regular expression (formed by concatenating
its string arguments) and returns a pointer to the compiled form.
Malloc(3C) is used to create space for the vector. You must free
this space when it is no longer needed. A NULL return from regcmp
indicates an incorrect argument. Regcmp(1) has been written to
generally preclude the need for this routine at execution time.
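The compile, check, use, free cycle described above can be sketched
as follows. Regcmp and regex are not available on every system, so
this sketch substitutes the POSIX regcomp(3), regexec(3), and
regfree(3) interfaces, which follow the same life cycle but are not
the routines documented here. The helper name count_matches is
hypothetical.

```c
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper: count how many of the n strings in lines[]
 * match pattern.  Returns -1 if the pattern does not compile.
 * POSIX regcomp/regexec/regfree stand in for regcmp/regex/free.
 */
int count_matches(const char *pattern, const char *const lines[], size_t n)
{
    regex_t re;
    int count = 0;
    size_t i;

    /* Compile once, outside the loop.  A nonzero return plays the
       role of the NULL that regcmp returns for a bad argument. */
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;

    for (i = 0; i < n; i++)
        if (regexec(&re, lines[i], 0, NULL, 0) == 0)
            count++;

    /* Release the compiled form exactly once, as free(ptr) would
       for a vector returned by regcmp. */
    regfree(&re);
    return count;
}
```

Compiling before the loop and releasing afterward keeps memory use
constant no matter how many subject strings are tested.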
Regex executes a compiled pattern against the subject string.
Any additional arguments receive the substrings returned by
( ... )$n expressions. Regex returns NULL on failure or a pointer
to the next unmatched character on success. A global character
pointer __loc1 points to where the match began.
Regcmp and regex were mostly borrowed from the editor, ed(1);
however, the syntax and semantics have been changed slightly. The
following are the valid symbols and their associated meanings.
[]*.^ These symbols retain the meaning they have in ed(1).
$ Matches the end of the string; \n matches a new-line.
- Within brackets the minus means through. For example,
[a-z] is equivalent to [abcd...xyz]. The - can appear
as itself only if used as the first or last character.
For example, the character class expression []-]
matches the characters ] and -.
+ A regular expression followed by + means one or more
times. For example, [0-9]+ is equivalent to
[0-9][0-9]*.
{m} {m,} {m,u}
Integer values enclosed in {} indicate the number of
times the preceding regular expression is to be
applied. The value m is the minimum number and u is a
number, less than 256, which is the maximum. If only m
is present (e.g., {m}), it indicates the exact number
of times the regular expression is to be applied. The
value {m,} is analogous to {m,infinity}. The plus (+)
and star (*) operations are equivalent to {1,} and {0,}
respectively.
( ... )$n The value of the enclosed regular expression is
returned. The value will be stored in the (n+1)th
argument following the subject argument. At most ten
enclosed regular expressions are allowed. Regex makes
its assignments unconditionally.
( ... ) Parentheses are used for grouping. An operator, e.g.,
*, +, {}, can work on a single character or a regular
expression enclosed in parentheses. For example,
(a*(cb+)*)$0.
All of these symbols are special. They must, therefore, be
escaped to be used as themselves.
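The equivalences among +, *, and the {m,u} forms can be checked with
a short sketch. It assumes POSIX extended regular expressions via
regcomp(3) and regexec(3), which accept the same bracket, +, *, and
interval notation but are not the routines documented here. The
helper name match_len is hypothetical.

```c
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper: return the length of the match of pattern at
 * the start of subject, or -1 if there is no match there or the
 * pattern does not compile.  Uses POSIX extended syntax, an
 * assumption standing in for the regcmp syntax.
 */
int match_len(const char *pattern, const char *subject)
{
    regex_t re;
    regmatch_t m;
    int len = -1;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;

    /* Accept only a match that begins at the first character. */
    if (regexec(&re, subject, 1, &m, 0) == 0 && m.rm_so == 0)
        len = (int)(m.rm_eo - m.rm_so);

    regfree(&re);
    return len;
}
```

Under these assumptions, the patterns ^[0-9]+, ^[0-9][0-9]*, and
^[0-9]{1,} all report the same length for a given subject, and
^[]-]+ matches a run of ] and - characters.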
EXAMPLES
Example 1:
char *cursor, *newcursor, *ptr;
...
newcursor = regex((ptr = regcmp("^\n", (char *)0)), cursor);
free(ptr);
This example matches a leading new-line in the subject string
pointed to by cursor.
Example 2:
char ret0[9];
char *newcursor, *name;
...
name = regcmp("([A-Za-z][A-Za-z0-9_]{0,7})$0", (char *)0);
newcursor = regex(name, "123Testing321", ret0);
This example matches the string Testing3 and returns the address
of the character after the last matched character (the 2 at
offset 11 in the subject string). The string Testing3 is copied
to the character array ret0.
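The offset and the copied substring in Example 2 can be reproduced
with a sketch that again assumes the POSIX regcomp(3) and regexec(3)
interfaces in place of regcmp and regex. POSIX reports the group
through a regmatch_t array rather than a $0 suffix, so the copy into
ret0 is done by hand. The helper name first_name is hypothetical.

```c
#include <regex.h>
#include <string.h>

/*
 * Hypothetical helper: find the first name of at most eight
 * characters in subject, copy it into ret0 (which must hold at
 * least nine bytes), and return the offset of the character after
 * the match, or -1 on failure.
 */
int first_name(const char *subject, char ret0[9])
{
    regex_t re;
    regmatch_t m[2];   /* m[0]: whole match, m[1]: the ( ... ) group */
    int next = -1;

    if (regcomp(&re, "([A-Za-z][A-Za-z0-9_]{0,7})", REG_EXTENDED) != 0)
        return -1;

    if (regexec(&re, subject, 2, m, 0) == 0) {
        /* The pattern limits the group to eight characters. */
        size_t len = (size_t)(m[1].rm_eo - m[1].rm_so);
        memcpy(ret0, subject + m[1].rm_so, len);
        ret0[len] = '\0';
        next = (int)m[0].rm_eo;
    }

    regfree(&re);
    return next;
}
```

For the subject 123Testing321 the match covers offsets 3 through 10,
so the returned offset is 11 and ret0 holds Testing3, which is eight
characters plus the terminating null byte.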
Example 3:
#include "file.i"
char *string, *newcursor;
...
newcursor = regex(name, string);
This example applies a precompiled regular expression in file.i
(see regcmp(1)) against string.
These routines are kept in /lib/libPW.a.
SEE ALSO
malloc(3C).
ed(1), regcmp(1) in the User's Reference for the DG/UX System
WARNING
The user program may run out of memory if regcmp is called
iteratively without freeing the vectors no longer required.
DG/UX 4.00
Licensed material--property of copyright holder(s)