⇒ regcomp(3C) — Reliant UNIX 5.44c4


regcomp(3C)                                                     regcomp(3C)

NAME
     regcomp, regexec, regerror, regfree - regular expression matching

SYNOPSIS
     #include <sys/types.h>
     #include <regex.h>

     int regcomp(regex_t *preg, const char *pattern, int cflags);

     int regexec(const regex_t *preg, const char *string, size_t nmatch,
                 regmatch_t pmatch[ ], int eflags);

     size_t regerror(int errcode, const regex_t *preg, char *errbuf,
                     size_t errbufsize);

     void regfree(regex_t *preg);

DESCRIPTION
     These functions interpret basic and extended regular expressions as
     described in expressions(5).

     If the functions are not supported, errno is set to ENOSYS and the
     return value REG_ENOSYS (from regcomp() and regexec()) or 0 (from
     regerror()) is reported.

     The structure type regex_t contains at least the following member:
     ______________________________________________________________________
    |  Type     |   Name      |   Description                             |
    |___________|_____________|___________________________________________|
    |  size_t   |   re_nsub   |   Number of parenthesised subexpressions  |
    |___________|_____________|___________________________________________|

     The structure type regmatch_t contains at least the following members:
     ______________________________________________________________________
    |  Type      |  Name      |   Description                             |
    |____________|____________|___________________________________________|
    |  regoff_t  |  rm_so     |   Byte offset from start of string to     |
    |            |            |   start of substring                      |
    |____________|____________|___________________________________________|
    |  regoff_t  |  rm_eo     |   Byte offset from start of string to the |
    |            |            |   first character after the end of        |
    |            |            |   substring                               |
    |____________|____________|___________________________________________|

     The regcomp() function will compile the regular expressions contained
     in the string pointed to by the pattern argument and place the results
     in the structure pointed to by preg. The cflags argument is the bit-
     wise inclusive OR of zero or more of the following flags, which are
     defined in the header regex.h:

     REG_EXTENDED        Use extended regular expressions.




Page 1                       Reliant UNIX 5.44                Printed 11/98


     REG_ICASE           Ignore case in match.

     REG_NOSUB           Report only success/fail in regexec().

     REG_NEWLINE         Change the handling of newline characters, as
                         described in the text.

     REG_LIT_NL          The \n string is treated as a newline character.

     REG_VIMODE          << or >> are interpreted as the start or the
                         finish of a word.

     REG_NSUB_ANCHOR     The characters ^ and $ do not function as anchors
                         within subexpressions.

     REG_EMPTY           regcomp() returns an error value if pattern is a
                         null string or empty.

     The default regular expression type for pattern is a basic regular
     expression. The application can specify extended regular expressions
     using the REG_EXTENDED cflags flag. The following must be remembered
     when formulating the regular expressions:

     -  Equivalence classes can be used in the range specifications.

     -  Range specifications linked together with the form [b-d-f] are not
        possible. In this case, - and f are treated as normal characters.

     -  { in simple, and {{ in extended regular expressions are treated as
        normal characters and do not initiate a repeat in this case.

     -  A repeat statement at the start of a regular expression is treated
        as a normal character.

     -  ^ at the start or $ at the end of a subexpression works as an
        anchor, unless the REG_NSUB_ANCHOR flag is set.

     On successful completion, regcomp() returns zero; otherwise it returns
     non-zero, and the content of preg is undefined.
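
     The effect of the cflags argument on compilation can be tried out
     with a small helper. This is only a sketch: the name matches() is
     hypothetical and not part of this interface, and it uses only the
     flags common to POSIX implementations, not the extensions listed
     above.

```c
#include <sys/types.h>
#include <stddef.h>
#include <regex.h>

/* Hypothetical helper: compile pattern with cflags, test it against
 * string, and return 1 on a match, 0 on no match, or -1 if regcomp()
 * rejects the pattern. */
int matches(const char *pattern, const char *string, int cflags)
{
    regex_t re;
    int status;

    /* REG_NOSUB is added because only success or failure is needed. */
    if (regcomp(&re, pattern, cflags | REG_NOSUB) != 0)
        return -1;
    status = regexec(&re, string, (size_t) 0, NULL, 0);
    regfree(&re);
    return status == 0 ? 1 : 0;
}
```

     Under POSIX basic syntax the + in "ab+c" is an ordinary character,
     so matches("ab+c", "abbbc", 0) is expected to report no match,
     while the same call with REG_EXTENDED treats + as a repetition and
     matches.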

     If the REG_NOSUB flag was not set in cflags, then regcomp() will set
     re_nsub to the number of parenthesised subexpressions (delimited by
     \( \) in basic regular expressions or ( ) in extended regular
     expressions) found in pattern.
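
     A caller can read re_nsub after a successful regcomp() to size the
     pmatch array (re_nsub + 1 elements cover the whole match plus every
     subexpression). A minimal sketch, in which the helper name
     count_subexpressions() is hypothetical:

```c
#include <sys/types.h>
#include <regex.h>

/* Hypothetical helper: return the number of parenthesised
 * subexpressions regcomp() found in pattern, or -1 if the pattern
 * does not compile. REG_NOSUB must not be set in cflags. */
long count_subexpressions(const char *pattern, int cflags)
{
    regex_t re;
    long n;

    if (regcomp(&re, pattern, cflags) != 0)
        return -1;
    n = (long) re.re_nsub;
    regfree(&re);
    return n;
}
```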

     The regexec() function compares the null-terminated string specified
     by string with the compiled regular expression preg initialized by a
     previous call to regcomp(). If it finds a match, regexec() returns
     zero, otherwise it returns non-zero indicating either no match or an
     error. The eflags argument is the bitwise inclusive OR of zero or more
     of the following flags, which are defined in the header regex.h.




     REG_NOTBOL    The first character of the string pointed to by string
                   is not the beginning of the line. Therefore, the
                   circumflex character (^), when taken as a special
                   character, will not match the beginning of string.

     REG_NOTEOL    The last character of the string pointed to by string is
                   not the end of the line. Therefore, the dollar sign ($),
                   when taken as a special character, will not match the
                   end of string.
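
     These flags matter when string is only a fragment of a longer line.
     The following sketch shows their effect on anchors; the helper name
     match_fragment() is hypothetical, and the pattern is compiled as a
     basic regular expression.

```c
#include <sys/types.h>
#include <stddef.h>
#include <regex.h>

/* Hypothetical helper: match a basic regular expression against a
 * fragment of a line, passing eflags through to regexec().
 * Returns 1 on a match, 0 on no match, -1 if regcomp() fails. */
int match_fragment(const char *pattern, const char *fragment, int eflags)
{
    regex_t re;
    int status;

    if (regcomp(&re, pattern, REG_NOSUB) != 0)
        return -1;
    status = regexec(&re, fragment, (size_t) 0, NULL, eflags);
    regfree(&re);
    return status == 0 ? 1 : 0;
}
```

     For example, match_fragment("^abc", "abc", REG_NOTBOL) is expected
     to report no match, because ^ may no longer match at the start of
     the fragment.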

     If nmatch is zero or REG_NOSUB was set in the cflags argument to
     regcomp(), then regexec() will ignore the pmatch argument. Otherwise,
     the pmatch argument must point to an array with at least nmatch
     elements, and regexec() will fill in the elements of that array with
     offsets of the substrings of string that correspond to the
     parenthesised subexpressions of pattern: pmatch[i].rm_so will be the
     byte offset of the beginning and pmatch[i].rm_eo will be one greater
     than the byte offset of the end of the substring i. (Subexpression i
     begins at the i-th matched open parenthesis, counting from 1.) Offsets
     in pmatch[0] identify the substring that corresponds to the entire
     regular expression. Unused elements of pmatch up to pmatch[nmatch-1]
     will be filled with -1. If there are more than nmatch subexpressions
     in pattern (pattern itself counts as a subexpression), then regexec()
     will still do the match, but will record only the first nmatch
     substrings.
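
     The offsets in pmatch can be turned back into text by copying the
     bytes between rm_so and rm_eo. The following is a sketch only; the
     helper name copy_group() and the limit MAX_GROUPS are assumptions
     made for the illustration.

```c
#include <sys/types.h>
#include <string.h>
#include <regex.h>

#define MAX_GROUPS 10   /* arbitrary limit chosen for this sketch */

/* Hypothetical helper: copy the text matched by subexpression `group`
 * (0 means the whole match) of an extended regular expression into
 * out. Returns the substring length, or -1 if there is no match, the
 * subexpression did not participate, or out is too small. */
int copy_group(const char *pattern, const char *string, size_t group,
               char *out, size_t outsize)
{
    regex_t re;
    regmatch_t pm[MAX_GROUPS];
    int len = -1;

    if (group >= MAX_GROUPS || regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;
    if (regexec(&re, string, (size_t) MAX_GROUPS, pm, 0) == 0 &&
        pm[group].rm_so != -1) {
        /* rm_eo is one past the last byte, so the length is rm_eo - rm_so */
        size_t n = (size_t) (pm[group].rm_eo - pm[group].rm_so);

        if (n < outsize) {
            memcpy(out, string + pm[group].rm_so, n);
            out[n] = '\0';
            len = (int) n;
        }
    }
    regfree(&re);
    return len;
}
```

     Elements beyond the subexpressions actually present in pattern are
     filled with -1, so asking for such a group yields -1 here.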

     When matching a basic or extended regular expression, any given
     parenthesised subexpression of pattern might participate in the match
     of several different substrings of string, or it might not match any
     substring even though the pattern as a whole did match. The following
     rules are used to determine which substrings to report in pmatch when
     matching regular expressions:

     1. If subexpression i in a regular expression is not contained within
        another subexpression, and it participated in the match several
        times, then the byte offsets in pmatch[i] will delimit the last
        such match.

     2. If subexpression i is not contained within another subexpression,
        and it did not participate in an otherwise successful match, the
        byte offsets in pmatch[i] will be -1. A subexpression does not
        participate in the match when:

        -  * or \{ \} appears immediately after the subexpression in a
           basic regular expression

        -  *, ?, or { } appears immediately after the subexpression in an
           extended regular expression, and the subexpression did not match
           (matched zero times)






        -  | is used in an extended regular expression to select this
           subexpression or another, and the other subexpression matched.

     3. If subexpression i is contained within another subexpression j, and
        i is not contained within any other subexpression that is contained
        within j, and a match of subexpression j is reported in pmatch[j],
        then the match or non-match of subexpression i reported in
        pmatch[i] will be as described in 1. and 2. above, but within the
        substring reported in pmatch[j] rather than the whole string.

     4. If subexpression i is contained in subexpression j, and the byte
        offsets in pmatch[j] are -1, then the byte offsets in pmatch[i]
        also will be -1.

     5. If subexpression i matched a zero-length string, then both byte
        offsets in pmatch[i] will be the byte offset of the character or
        null terminator immediately following the zero-length string.
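
     Rules 1 and 2 above can be observed with a short sketch. The helper
     name subexpr_offsets() and its fixed limit of eight pmatch elements
     are assumptions made for the illustration.

```c
#include <sys/types.h>
#include <regex.h>

/* Hypothetical helper: match the extended regular expression pattern
 * against string and store the offsets of subexpression i in *out.
 * Returns 0 on a match and non-zero otherwise. Only i < 8 is
 * supported in this sketch. */
int subexpr_offsets(const char *pattern, const char *string, size_t i,
                    regmatch_t *out)
{
    regex_t re;
    regmatch_t pm[8];
    int status;

    if (i >= 8)
        return -1;
    status = regcomp(&re, pattern, REG_EXTENDED);
    if (status != 0)
        return status;
    status = regexec(&re, string, (size_t) 8, pm, 0);
    if (status == 0)
        *out = pm[i];
    regfree(&re);
    return status;
}
```

     With the pattern "(a|b)*" and the string "ab", subexpression 1
     participates twice and the reported offsets should delimit the last
     such match (rule 1). With "(a)|(b)" and the string "b",
     subexpression 1 does not participate, so both of its offsets should
     be -1 (rule 2).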

     If, when regexec() is called, the locale is different from when the
     regular expression was compiled, the result is undefined.

     If REG_NEWLINE is not set in cflags, then a newline character in
     pattern or string will be treated as an ordinary character. If
     REG_NEWLINE is set, then newline will be treated as an ordinary
     character except as follows:

     1. A newline character in string will not be matched by a period
        outside a bracket expression or by any form of a non-matching list.

     2. A circumflex (^) in pattern, when used to specify expression
        anchoring, will match the zero-length string immediately after a
        newline in string, regardless of the setting of REG_NOTBOL.

     3. A dollar-sign ($) in pattern, when used to specify expression
        anchoring, will match the zero-length string immediately before a
        newline in string, regardless of the setting of REG_NOTEOL.
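
     The newline rules above can be checked against a multi-line string.
     This is a sketch under the assumption that only the standard flag is
     involved; the helper name match_lines() is hypothetical.

```c
#include <sys/types.h>
#include <stddef.h>
#include <regex.h>

/* Hypothetical helper: match an extended regular expression against
 * text that may contain newlines. If newline_mode is non-zero the
 * pattern is compiled with REG_NEWLINE. Returns 1 on a match, 0 on no
 * match, -1 if regcomp() fails. */
int match_lines(const char *pattern, const char *text, int newline_mode)
{
    regex_t re;
    int cflags = REG_EXTENDED | REG_NOSUB;
    int status;

    if (newline_mode)
        cflags |= REG_NEWLINE;
    if (regcomp(&re, pattern, cflags) != 0)
        return -1;
    status = regexec(&re, text, (size_t) 0, NULL, 0);
    regfree(&re);
    return status == 0 ? 1 : 0;
}
```

     With newline_mode set, "^b$" should match the middle line of
     "a\nb\nc" (rules 2 and 3), while "a.c" should no longer match
     "a\nc" because the period does not match the newline (rule 1).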

     The regfree() function frees any memory allocated by regcomp() associ-
     ated with preg.

     The following constants are defined as error return values:

     REG_NOMATCH         regexec() failed to match.

     REG_INVARG          An invalid parameter was specified.

     REG_BADPAT          Invalid regular expression.

     REG_ECOLLATE        Invalid collating element referenced.

     REG_ECTYPE          Invalid character class type referenced.




     REG_EESCAPE         Trailing \ in pattern.

     REG_ESUBREG         Number in \digit invalid or in error.

     REG_EBRACK          [ ] imbalance.

     REG_ENOSYS          The function is not supported.

     REG_EPAREN          \( \) or ( ) imbalance.

     REG_EBRACE          \{ \} imbalance.

     REG_BADBR           Content of \{ \} invalid: not a number, number too
                         large, more than two numbers, first larger than
                         second.

     REG_ERANGE          Invalid endpoint in range expression.

     REG_ESPACE          Out of memory.

     REG_BADRPT          ?, * or + not preceded by valid regular
                         expression.

     REG_EPATTERN        A null or empty pattern was specified as a
                         parameter and regcomp() was called with the
                         REG_EMPTY flag.

     The regerror() function provides a mapping from error codes returned
     by regcomp() and regexec() to unspecified printable strings. It gen-
     erates a string corresponding to the value of the errcode argument,
     which must be the last non-zero value returned by regcomp() or
     regexec() with the given value of preg. If errcode is not such a
     value, the content of the generated string is unspecified.

     If preg is a null pointer, but errcode is a value returned by a
     previous call to regexec() or regcomp(), regerror() still generates
     an error string corresponding to the value of errcode, but the string
     might not be as detailed under some implementations.

     If the errbufsize argument is not zero, regerror() will place the
     generated string into the buffer of size errbufsize pointed to by
     errbuf. If the string (including the terminating null) cannot fit in
     the buffer, regerror() will truncate the string and null-terminate the
     result.

     If errbufsize is zero, regerror() ignores the errbuf argument, and
     returns the size of the buffer needed to hold the generated string.

     If the preg argument to regexec() or regfree() is not a compiled
     regular expression returned by regcomp(), the result is undefined. A
     preg is no longer treated as a compiled regular expression after it is
     given to regfree().



     An application could use:

     regerror(code, preg, (char *)NULL, (size_t) 0)

     to find out how big a buffer is needed for the generated string,
     malloc() a buffer to hold the string, and then call regerror() again
     to get the string. Alternatively, it could allocate a fixed, static
     buffer that is big enough to hold most strings, and then use malloc()
     to allocate a larger buffer if it finds that this is too small.

     Use the fnmatch() function for matching a pattern with the pattern
     matching notation.
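
     The two-call use of regerror() described above (first asking for the
     required size, then fetching the string) might look like the
     following sketch. The helper name regex_error_string() is
     hypothetical.

```c
#include <sys/types.h>
#include <stdlib.h>
#include <regex.h>

/* Hypothetical helper: return a malloc()ed copy of the message for
 * errcode, or NULL on failure. The caller must free() the result. */
char *regex_error_string(int errcode, const regex_t *preg)
{
    /* First call: a zero buffer size makes regerror() report the size
     * needed, including the terminating null. */
    size_t needed = regerror(errcode, preg, (char *) NULL, (size_t) 0);
    char *msg;

    if (needed == 0)
        return NULL;            /* regerror() is not implemented */
    msg = malloc(needed);
    if (msg != NULL)
        (void) regerror(errcode, preg, msg, needed);  /* second call */
    return msg;
}
```

     A caller would pass the non-zero value returned by regcomp() or
     regexec() together with the same preg, print the message, and then
     free() it.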

RESULT
     On successful completion, the regcomp() function returns zero. Other-
     wise it returns an integer value indicating an error as described in
     <regex.h>, and the content of preg is undefined.

     On successful completion, the regexec() function returns zero.
     Otherwise it returns REG_NOMATCH to indicate no match, or REG_ENOSYS
     to indicate that the function is not supported.

     Upon successful completion, the regerror() function returns the number
     of bytes needed to hold the entire generated string. Otherwise, it
     returns zero to indicate that the function is not implemented.

     The regfree() function returns no value.

EXAMPLES
     #include <sys/types.h>
     #include <stddef.h>
     #include <regex.h>
     /* Match string against the extended regular expression in
      * pattern, treating errors as no match.
      * return 1 for match, 0 for no match. */

     int match (const char *string, const char *pattern)
     {
       int status;
       regex_t re;
       if (regcomp (&re, pattern, REG_EXTENDED|REG_NOSUB) != 0)
       {
         return (0);               /* compile error: treat as no match */
       }
       status = regexec (&re, string, (size_t) 0, NULL, 0);
       regfree (&re);
       if (status != 0)
       {
         return (0);               /* no match, or an error */
       }
       return (1);
     }






     The following demonstrates how the REG_NOTBOL flag could be used with
     regexec() to find all substrings in a line that match a pattern
     supplied by a user. (For simplicity of the example, very little error
     checking is done.)

     (void) regcomp (&re, pattern, 0);
     p = buffer;      /* p: char pointer to the unsearched rest of the line */
     /* this call to regexec() finds the first match on the line */
     error = regexec (&re, p, 1, &pm, 0);
     while (error == 0)
     {                /* while matches found */
                      /* substring found between p + pm.rm_so and p + pm.rm_eo */
        p += pm.rm_eo;        /* offsets are relative to p, so advance p */
                      /* This call to regexec() finds the next match  */
        error = regexec (&re, p, 1, &pm, REG_NOTBOL);
     }
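
     A self-contained variant of this loop is sketched below. It is not
     part of the original page: the helper name count_matches() is
     hypothetical, and the handling of empty matches is one possible
     choice. Because regexec() reports offsets relative to the string it
     was given, the sketch keeps a pointer to the unsearched remainder and
     converts the offsets back to positions in the whole line.

```c
#include <sys/types.h>
#include <stdio.h>
#include <regex.h>

/* Hypothetical helper: print the offsets of every non-overlapping
 * match of the basic regular expression pattern in line and return
 * how many were found, or -1 if pattern does not compile. */
int count_matches(const char *pattern, const char *line)
{
    regex_t re;
    regmatch_t pm;
    const char *p = line;       /* start of the part not yet searched */
    int eflags = 0;
    int count = 0;

    if (regcomp(&re, pattern, 0) != 0)
        return -1;
    while (regexec(&re, p, (size_t) 1, &pm, eflags) == 0) {
        /* pm offsets are relative to p; convert to offsets in line */
        long start = (long) (p - line) + (long) pm.rm_so;
        long end = (long) (p - line) + (long) pm.rm_eo;

        printf("match at bytes %ld to %ld\n", start, end);
        count++;
        if (pm.rm_eo > pm.rm_so) {
            p += pm.rm_eo;      /* continue after this match */
        } else {
            /* empty match: step one byte past it to avoid looping */
            if (p[pm.rm_eo] == '\0')
                break;
            p += pm.rm_eo + 1;
        }
        eflags = REG_NOTBOL;    /* p no longer points at the line start */
    }
    regfree(&re);
    return count;
}
```

     Once the first match has been consumed, REG_NOTBOL keeps a pattern
     such as "^a" from matching again in the middle of the line.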

NOTES
     If you use one of these functions, you must link the libgen library at
     compilation (cc -lgen).

SEE ALSO
     regcmp(1), regex(3), fnmatch(3C), glob(3C), regcmp(3G), regexpr(3G),
     expressions(5), regex(5), regexp(5).































