REGCMP(3x,L) AIX Technical Reference REGCMP(3x,L)
-------------------------------------------------------------------------------
regcmp, regex
PURPOSE
Compiles and matches regular-expression patterns.
LIBRARY
Programmers Workbench Library (libPW.a)
SYNTAX
char *regcmp (str [, str, ...])
char *str, *str, ...;

char *regex (pat, subject [, ret, ...])
char *pat, *subject, *ret, ...;
extern char *__loc1;
DESCRIPTION
The regcmp subroutine compiles a regular expression (or pattern) and returns a
pointer to the compiled form. The str parameters specify the pattern to be
compiled. If more than one str parameter is given, then regcmp treats them as
if they were concatenated together. It returns a NULL pointer if it encounters
an incorrect parameter.
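The way several str parameters are treated as one pattern can be pictured with
a small helper. This is only a sketch: join_patterns is a hypothetical name
that is not part of libPW, and it assumes that the argument list ends with a
null pointer, as the trailing 0 in the calls under EXAMPLES suggests.

```c
#include <stdarg.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of libPW): joins a list of strings that
 * ends with (char *)0 into one malloc'd string. Returns NULL on failure. */
char *join_patterns(const char *first, ...)
{
    va_list ap;
    const char *s;
    size_t len = 0;
    char *out;

    /* First pass: add up the lengths of all parts. */
    va_start(ap, first);
    for (s = first; s != NULL; s = va_arg(ap, const char *))
        len += strlen(s);
    va_end(ap);

    out = malloc(len + 1);
    if (out == NULL)
        return NULL;

    /* Second pass: append the parts one after another. */
    out[0] = '\0';
    va_start(ap, first);
    for (s = first; s != NULL; s = va_arg(ap, const char *))
        strcat(out, s);
    va_end(ap);
    return out;
}
```

Under that assumption, join_patterns("[A-Za-z]", "[A-Za-z0-9]", "{0,7}",
(char *)0) yields the single pattern string "[A-Za-z][A-Za-z0-9]{0,7}", which
the caller must free.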
You can use the regcmp command to compile regular expressions into your C
program, frequently eliminating the need to call the regcmp subroutine at run
time.
The regex subroutine compares a compiled pattern to the subject string. The
optional ret parameters receive the values matched by subexpressions that are
marked with a $n suffix in the pattern. Upon successful completion, the regex
subroutine returns a pointer to the next unmatched character. If the regex
subroutine fails, a NULL pointer is returned. A global character pointer,
__loc1, points to where the match began.
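The relationship between the start of a match and the pointer that is returned
can be sketched with the POSIX regcomp and regexec subroutines from <regex.h>.
This is a different interface from regcmp and regex and is used here only as
an analogy. The name match_end and its start parameter are assumptions made
for this illustration; start plays the role that __loc1 plays for regex.

```c
#include <regex.h>
#include <stddef.h>

/* Searches subject for the POSIX extended pattern pat.
 * On success, stores the start of the match in *start and returns a
 * pointer to the first character after the match.
 * Returns NULL if the pattern does not compile or nothing matches. */
const char *match_end(const char *pat, const char *subject, const char **start)
{
    regex_t re;
    regmatch_t m[1];
    const char *end = NULL;

    if (regcomp(&re, pat, REG_EXTENDED) != 0)
        return NULL;
    if (regexec(&re, subject, 1, m, 0) == 0) {
        *start = subject + m[0].rm_so;   /* where the match began */
        end = subject + m[0].rm_eo;      /* next unmatched character */
    }
    regfree(&re);
    return end;
}
```

For the subject "abc123def" and the pattern "[0-9]+", the sketch sets start to
the "1" and returns a pointer to the "d".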
The regcmp and regex subroutines are borrowed from the ed command; however, the
syntax and semantics have been changed slightly. You can use the following
symbols with the regcmp and regex subroutines:
"[ ] * . ^"
These symbols have the same meaning as they do in the ed command.
"-"
For regex, the minus within brackets means "through" according to the
current collating sequence. For example, depending on the default collating
sequence, "[a-z]" can be equivalent to "[abcd...xyz]" or
"[aBbCc...xYyZz]". The "-" matches itself only if it is the first or last
character within the brackets. For example, the character class expression "[]-]"
matches the "]" (right bracket) and "-" (minus) characters.
"$"
Matches the end of the string. Use "\n" to match a new-line character.
"+"
A regular expression followed by "+" means one or more times. For example,
"[0-9]+" is equivalent to "[0-9][0-9]*".
"{"m"}" "{"m,"}" "{"m,u"}"
Integer values enclosed in "{ }" indicate the number of times to apply the
preceding regular expression. m is the minimum number and u is the maximum
number. u must be less than 256. If you specify only m, it indicates the
exact number of times to apply the regular expression. "{m,}" is
equivalent to "{m,infinity}" and matches m or more occurrences of the
expression. The "+" (plus) and "*" (asterisk) operations are
equivalent to "{1,}" and "{0,}", respectively.
"("...")$"n
This stores the value matched by the enclosed regular expression in the
(n+1)th ret parameter. Ten enclosed regular expressions are allowed.
regex makes the assignments unconditionally.
"("...")"
Parentheses group subexpressions. An operator, such as "*", "+", or "{ }",
works on a single character or on a regular expression enclosed in
parentheses. For example, "(a*(cb+)*)$0".
All of the above defined symbols are special. You must precede them with a "\"
(backslash) if you want to match the special symbol itself. For example, "\$"
matches a dollar sign.
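The equivalences stated for the plus, asterisk, and brace operators can be
checked with a short sketch. It uses the POSIX regcomp and regexec subroutines
from <regex.h> rather than regcmp and regex, on the assumption that POSIX
extended regular expressions treat these operators the same way. The name
match_len is hypothetical.

```c
#include <regex.h>
#include <stddef.h>

/* Returns the length of the first match of the POSIX extended pattern pat
 * in s, or -1 if there is no match or the pattern does not compile. */
int match_len(const char *pat, const char *s)
{
    regex_t re;
    regmatch_t m[1];
    int len = -1;

    if (regcomp(&re, pat, REG_EXTENDED) != 0)
        return -1;
    if (regexec(&re, s, 1, m, 0) == 0)
        len = (int)(m[0].rm_eo - m[0].rm_so);
    regfree(&re);
    return len;
}
```

With the subject "ab2024cd", the patterns "[0-9]+", "[0-9][0-9]*", and
"[0-9]{1,}" all match the same four digits, "[0-9]{2}" matches exactly two,
"[0-9]{2,3}" matches three, and "[0-9]{5}" does not match.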
The following special symbols are defined for internationalized regular
expressions. Each is valid only within a range expression (that is, between
brackets).
"[:alnum:]"
Matches any alphanumeric character, as defined by the NLctype.h macro
"iswalnum".
"[:alpha:]"
Matches any alphabetic character, as defined by "iswalpha".
"[:digit:]"
Matches any digit, as defined by "iswdigit".
"[:lower:]"
Matches any lowercase letter, as defined by "iswlower".
"[:print:]"
Matches any printable character, as defined by "iswprint".
"[:punct:]"
Matches any punctuation character, as defined by "iswpunct".
"[:space:]"
Matches any white-space character, as defined by "iswspace".
"[:upper:]"
Matches any uppercase letter, as defined by "iswupper".
"[:xdigit:]"
Matches any hexadecimal digit, as defined by "iswxdigit".
"[=X=]"
Matches any character in the same equivalence class as "X", as defined by
"wceqvmap".
"[.XY.]"
Matches the multiple character collating sequence XY as a single character
(as defined by "_wcxcol"). For example, some Latin languages collate the
sequence "ch" as a single character which falls between the letters c and
d. The regular expression "[c[.ch.]d]amp" would match the words "camp",
"champ", and "damp".
The ctype sequences, such as "[:alpha:]", cannot be used as end points of a
range.
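The use of character class names between brackets can be sketched with the
POSIX regcomp and regexec subroutines from <regex.h>, which accept the same
"[:name:]" notation inside a bracket expression. This is an analogy only, not
the libPW interface. The name has_match is hypothetical, and the results
described below assume the default "C" locale.

```c
#include <regex.h>
#include <stddef.h>

/* Returns 1 if s matches the POSIX extended pattern pat, 0 if it does not,
 * and -1 if the pattern does not compile. */
int has_match(const char *pat, const char *s)
{
    regex_t re;
    int rc;

    /* REG_NOSUB: only a yes/no answer is needed, not match offsets. */
    if (regcomp(&re, pat, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, s, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```

For example, "^[[:alpha:]][[:alnum:]]*$" accepts "Testing3" but rejects
"3Testing", and "^[[:xdigit:]]+$" accepts "1aF9" but rejects "1aG9".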
Note: regcmp uses the malloc subroutine to allocate space for the vector that
holds the compiled pattern. Always free vectors that you no longer need;
otherwise, you may run out of memory if regcmp is called repeatedly. Use the
following as a replacement for malloc to reuse the same vector, thus saving
time and space:
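/* ...Your Program... */
char *
malloc(n)
int n;
{
        /* One static buffer, reused on every call. */
        static int rebuf[256];
        /* Return NULL if the request does not fit in the buffer. */
        return ((n <= sizeof(rebuf)) ? (char *)rebuf : NULL);
}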
EXAMPLES
1. To perform a simple match:
char *cursor, *newcursor, *ptr;
...
newcursor = regex((ptr = regcmp("^\n", 0)), cursor);
free(ptr);
This matches a leading new-line character in the subject string pointed to
by "cursor".
2. To extract a substring that matches a pattern:
char ret0[9];
char *newcursor, *name;
...
name = regcmp("([A-Za-z][A-Za-z0-9]{0,7})$0", 0);
newcursor = regex(name, "123Testing321", ret0);
This matches the eight-character identifier "Testing3" and returns the
address of the character after the last matched character (which is stored
in "newcursor"). The string "Testing3" is copied into the character array
"ret0".
RELATED INFORMATION
In this book: "malloc, free, realloc, calloc, valloc, alloca, mallopt,
mallinfo," "NCcollate, NCcoluniq, NCeqvmap, _NCxcol, _NLxcol," "wc_collate,
wc_coluniq, wc_eqvmap, _wcxcol, _mbxcol, _wcxcolu, _mbxcolu," "setlocale," and
"regexp: compile, step, advance."
The ed and regcmp commands in AIX Operating System Commands Reference.
"Introduction to International Character Support" in Managing the AIX Operating
System.
AIX Guide to Multibyte Character Set (MBCS) Support.
Processed November 7, 1990 REGCMP(3x,L) 4