regcmp(3X)                                                  regcmp(3X)

NAME
     regcmp, regex - compile and execute a regular expression

SYNOPSIS
     char *regcmp(string1 [, string2, ...], (char *)0)
     char *string1, *string2, ...;

     char *regex(re, subject [, ret0, ...])
     char *re, *subject, *ret0, ...;

     extern char *loc1;

DESCRIPTION
     regcmp compiles a regular expression and returns a pointer to
     the compiled form.  malloc(3C) is used to create space for the
     vector.  It is the user's responsibility to free unneeded space
     that has been allocated by malloc.  A NULL return from regcmp
     indicates an incorrect argument.  regcmp(1) has been written to
     generally preclude the need for this routine at execution time.

     regex executes a compiled pattern against the subject string.
     Additional arguments are passed to receive values back.  regex
     returns NULL on failure or a pointer to the next unmatched
     character on success.  A global character pointer loc1 points
     to where the match began.

     regcmp and regex were mostly borrowed from the editor, ed(1);
     however, the syntax and semantics have been changed slightly.
     The following are the valid symbols and their associated
     meanings.

     [ ] * . ^
          These symbols retain the meaning they have in ed(1).

     $    This symbol matches the end of the string; \n matches a
          newline.

     -    Within brackets the minus means ``through.''  For example,
          [a-z] is equivalent to [abcd...xyz].  The - can appear as
          itself only if used as the last or first character.  For
          example, the character class expression []-] matches the
          characters ] and -.

     +    A regular expression followed by + means ``one or more
          times.''  For example, [0-9]+ is equivalent to
          [0-9][0-9]*.

     {m} {m,} {m,u}
          Integer values enclosed in {} indicate the number of times
          the preceding regular expression is to be applied.  The
          minimum number is m and the maximum number is u, which
          must be less than 256.  If only m is present (e.g., {m}),
          it indicates the exact number of times the regular
          expression is to be applied.  {m,} is analogous to
          {m,infinity}.  The plus (+) and star (*) operations are
          equivalent to {1,} and {0,}, respectively.

     ( ... )$n
          The value of the enclosed regular expression is to be
          returned.  The value will be stored in the (n+1)th
          argument following the subject argument.  At present, at
          most 10 enclosed regular expressions are allowed.  regex
          makes its assignments unconditionally.

     ( ... )
          Parentheses are used for grouping.  An operator (e.g., *,
          +, {}) can work on a single character or a regular
          expression enclosed in parentheses.  For example,
          (a*(cb+)*)$0.

     By necessity, all the above defined symbols are special.  They
     must, therefore, be escaped to be used as themselves.

EXAMPLES
     Example 1:

          char *cursor, *newcursor, *ptr;
               ...
          newcursor = regex((ptr = regcmp("^\n", (char *)0)), cursor);
          free(ptr);

     This example will match a leading newline in the subject string
     pointed at by cursor.

     Example 2:

          char ret0[9];
          char *newcursor, *name;
               ...
          name = regcmp("([A-Za-z][A-Za-z0-9_]{0,7})$0", (char *)0);
          newcursor = regex(name, "123Testing321", ret0);

     This example will match through the string ``Testing3'' and
     will return the address of the character after the last matched
     character (the subject string's address plus 11).  The string
     ``Testing3'' will be copied to the character array ret0.

     Example 3:

          #include "file.i"
          char *string, *newcursor;
               ...
          newcursor = regex(name, string);

     This example applies a precompiled regular expression in file.i
     (see regcmp(1)) against string.
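     Example 2 and the equivalence of +, [0-9][0-9]* and {1,} can be
     tried on systems that do not ship libPW.  The sketch below is
     not regcmp or regex: it assumes the POSIX regcomp(3) and
     regexec(3) interface from <regex.h> as a stand-in, and the
     helper name match_group is hypothetical.  POSIX has no $n
     suffix, so the first parenthesized group is read from the match
     array instead of being stored through an extra argument.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not part of regcmp/regex.  Compiles `pattern` as a
 * POSIX extended regular expression, searches `subject`, copies the text of
 * the first parenthesized group into `out`, and returns the offset of the
 * character after the whole match.  Returns -1 on a compile error, when
 * nothing matches, or when `out` is too small for the group plus its NUL.
 */
long match_group(const char *pattern, const char *subject,
                 char *out, size_t outlen)
{
    regex_t re;
    regmatch_t m[2];    /* m[0]: whole match, m[1]: first group */
    long end = -1;

    assert(pattern != NULL && subject != NULL && out != NULL);

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;

    if (regexec(&re, subject, 2, m, 0) == 0 && m[1].rm_so != -1) {
        size_t len = (size_t)(m[1].rm_eo - m[1].rm_so);

        if (len < outlen) {
            memcpy(out, subject + m[1].rm_so, len);
            out[len] = '\0';
            end = (long)m[0].rm_eo;
        }
    }

    /* Release the compiled form, much as free() is used after regcmp. */
    regfree(&re);
    return end;
}
```

     Called with the pattern "([A-Za-z][A-Za-z0-9_]{0,7})", the
     subject "123Testing321" and a 9-byte buffer, match_group copies
     Testing3 into the buffer and returns 11, which corresponds to
     the pointer regex returns in Example 2.  The patterns
     "([0-9]+)", "([0-9][0-9]*)" and "([0-9]{1,})" all give the same
     result on a given subject.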
     This routine is kept in /lib/libPW.a.

SEE ALSO
     ed(1), regcmp(1), malloc(3C).

BUGS
     The user program may run out of memory if regcmp is called
     iteratively without freeing the vectors no longer required.
     The following user-supplied replacement for malloc(3C) reuses
     the same vector, saving time and space:

          /* user's program */
          char *
          malloc(n)
          unsigned n;
          {
               static char rebuf[512];
               return (n <= sizeof rebuf) ? rebuf : NULL;
          }

April, 1990