regcmp(S) - OpenDesktop Software Development System 3.0.0


 regcmp(S)                      6 January 1993                      regcmp(S)


 Name

    regcmp, regex - compiles and executes regular expressions

 Syntax


    cc  . . .  -lintl


    char *regcmp (string1 [, string2,  . . . ], (char *)0)
    char *string1, *string2,  . . . ;

    char *regex (re, subject [, ret0,  . . . ])
    char *re, *subject, *ret0,  . . . ;

    extern char *__loc1;


 Description

    The regcmp routine compiles a regular expression (consisting of the con-
    catenated arguments) and returns a pointer to the compiled form.  Space
    for the compiled form is allocated with malloc(S); it is the user's
    responsibility to release it with free(S) when it is no longer needed.
    A NULL return from regcmp indicates an incorrect argument.  The
    regcmp(CP) command, which compiles regular expressions ahead of time
    into C source that can be included in a program, generally removes the
    need to call this routine at execution time.
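    The calling convention above can be illustrated with a minimal sketch.
    The function concat_pattern is hypothetical (it is not part of libintl
    and compiles nothing): it walks a (char *)0-terminated argument list,
    joins the pieces into one string obtained from malloc, and leaves
    freeing it to the caller, in the same way that regcmp treats its
    arguments as one concatenated expression.

```c
#include <stdarg.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of libintl: joins a (char *)0-terminated
 * list of strings into one malloc'd string, mirroring how regcmp treats
 * its arguments as a single concatenated pattern. The caller must free()
 * the result. Returns NULL if allocation fails. */
char *concat_pattern(const char *first, ...)
{
    va_list ap;
    size_t len = 0;
    const char *s;

    /* First pass: measure the total length of all pieces. */
    va_start(ap, first);
    for (s = first; s != NULL; s = va_arg(ap, char *))
        len += strlen(s);
    va_end(ap);

    char *out = malloc(len + 1);
    if (out == NULL)
        return NULL;

    /* Second pass: copy each piece in order. */
    char *p = out;
    va_start(ap, first);
    for (s = first; s != NULL; s = va_arg(ap, char *)) {
        size_t n = strlen(s);
        memcpy(p, s, n);
        p += n;
    }
    va_end(ap);
    *p = '\0';
    return out;
}
```

    As with regcmp, the list must end with (char *)0; a plain 0 is not
    guaranteed to be passed as a null pointer through a variable argument
    list.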

    The regex routine executes a compiled pattern against the subject string.
    Additional arguments, if supplied, receive the strings matched by
    subexpressions marked with ( ... )$n.  regex returns NULL on failure, or
    on success a pointer to the first character after the match.  The global
    character pointer __loc1 points to where the match began.
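    The two results of a successful match (a pointer past the match and a
    pointer to its start) can be modeled with the POSIX regcomp/regexec
    interface, swapped in here because many current systems do not provide
    regcmp/regex.  This is a sketch under that assumption: match_span is a
    hypothetical name, and POSIX extended syntax differs from regcmp syntax
    in details such as ( ... )$n.

```c
#include <regex.h>
#include <stddef.h>

/* Sketch using POSIX regcomp/regexec (a stand-in, not regex(S)).
 * Returns a pointer to the first character after the leftmost match,
 * or NULL on failure; *start receives where the match began, the role
 * played by the global start-of-match pointer for regex(S). */
const char *match_span(const char *pattern, const char *subject,
                       const char **start)
{
    regex_t re;
    regmatch_t m[1];

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return NULL;                      /* invalid pattern */
    int rc = regexec(&re, subject, 1, m, 0);
    regfree(&re);
    if (rc != 0)
        return NULL;                      /* no match */

    *start = subject + m[0].rm_so;        /* where the match began */
    return subject + m[0].rm_eo;          /* first unmatched character */
}
```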

    The regex and regcmp routines were borrowed from the editor, ed(C); how-
    ever, the syntax and semantics have been changed slightly.

    The following are the symbols understood by regex and regcmp, and their
    meanings.

    []*.^     These symbols have the same meanings as in ed(C).

    $         Matches the end of the string; \n matches a new-line.

    -         Within brackets, the minus sign denotes a range.  For example,
              [a-z] is equivalent to [abcd...xyz].  The ``-'' can appear as
              itself only if used as the first or last character.  For
              example, the character class expression []-] matches the
              characters ] and -.

    +         A regular expression followed by ``+'' means one or more times.
              For example, [0-9]+ is equivalent to [0-9][0-9]*.

    {m} {m,} {m,u}
              Integer values enclosed in ``{}'' indicate the number of times
              the preceding regular expression is to be applied.  The value m
              is the minimum number, and u, which must be less than 256, is
              the maximum.  If only m is present, as in {m}, it gives the
              exact number of times the regular expression is to be applied.
              The form {m,} is analogous to {m,infinity}; it sets no upper
              bound.  The plus (``+'') and star (``*'') operations are
              equivalent to {1,} and {0,} respectively.
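    POSIX extended regular expressions share the +, *, {m}, {m,} and {m,u}
    operators, so the equivalences above can be checked with regcomp/regexec
    as a stand-in for regcmp/regex.  In this sketch, match_len is a
    hypothetical helper that returns the length of the leftmost match.

```c
#include <regex.h>

/* Sketch with POSIX extended regular expressions (not regcmp(S)):
 * returns the length of the leftmost match of `pattern` in `subject`,
 * or -1 if the pattern is invalid or does not match. */
int match_len(const char *pattern, const char *subject)
{
    regex_t re;
    regmatch_t m[1];

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;
    int rc = regexec(&re, subject, 1, m, 0);
    regfree(&re);
    return rc == 0 ? (int)(m[0].rm_eo - m[0].rm_so) : -1;
}
```

    For the subject "ab1234cd", [0-9]+, [0-9][0-9]* and [0-9]{1,} all match
    the four digits, [0-9]{2} matches two of them, [0-9]{1,3} matches three,
    and [0-9]{5,} does not match at all.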

    ( ... )$n The string matched by the enclosed regular expression is to be
              returned.  It is copied into the (n+1)th argument following the
              subject argument, so $0 fills ret0.  At most ten enclosed
              regular expressions are allowed.  regex makes its assignments
              unconditionally.
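    POSIX capture groups are the nearest modern analogue of ( ... )$n.
    POSIX numbers groups from 1, so group n+1 corresponds to $n and is
    copied into rets[n], which plays the role of the (n+1)th argument after
    the subject.  The sketch below assumes POSIX regcomp/regexec; capture,
    MAX_RETS and RET_SIZE are hypothetical names.  It stores an empty
    string when a group does not take part in the match.

```c
#include <regex.h>
#include <string.h>

#define MAX_RETS 10   /* regcmp allows at most ten ( ... )$n groups */
#define RET_SIZE 32   /* hypothetical size of each return buffer */

/* Sketch with POSIX regcomp/regexec (not regex(S)): copies capture
 * group n+1 into rets[n] for n = 0 .. nrets-1, truncating to fit.
 * Returns nrets on success, or -1 on a bad argument or no match. */
int capture(const char *pattern, const char *subject,
            char rets[][RET_SIZE], int nrets)
{
    regex_t re;
    regmatch_t m[MAX_RETS + 1];           /* m[0] is the whole match */

    if (nrets < 0 || nrets > MAX_RETS)
        return -1;
    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;
    int rc = regexec(&re, subject, (size_t)nrets + 1, m, 0);
    regfree(&re);
    if (rc != 0)
        return -1;

    for (int n = 0; n < nrets; n++) {
        rets[n][0] = '\0';
        if (m[n + 1].rm_so == -1)
            continue;                     /* group did not take part */
        size_t len = (size_t)(m[n + 1].rm_eo - m[n + 1].rm_so);
        if (len >= RET_SIZE)
            len = RET_SIZE - 1;           /* leave room for the NUL */
        memcpy(rets[n], subject + m[n + 1].rm_so, len);
        rets[n][len] = '\0';
    }
    return nrets;
}
```

    In regcmp syntax the same pattern would be written
    ([a-z]+)$0=([0-9]+)$1, with ret0 and ret1 as the return arguments.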

    ( ... )   Parentheses are used for grouping.  An operator such as ``*'',
              ``+'', or ``{}'' can apply to a single character or to a regular
              expression enclosed in parentheses, as in (a*(cb+)*)$0.

    All of the symbols defined above are necessarily special.  To be used
    as themselves, they must be escaped with a \ (backslash).
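    When a pattern is built from arbitrary text, every special symbol in
    that text has to be escaped.  A minimal sketch of such a helper follows;
    escape_literal is hypothetical, and treating the backslash itself as
    special is an assumption.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of libintl): returns a malloc'd copy of
 * `text` in which each symbol special to regcmp/regex is preceded by a
 * backslash, so the result matches `text` literally. The backslash is
 * assumed to need escaping as well. The caller must free() the result;
 * NULL is returned if allocation fails. */
char *escape_literal(const char *text)
{
    static const char special[] = "[]*.^$-+{}()\\";
    size_t extra = 0;

    /* Count characters that need a leading backslash. */
    for (const char *p = text; *p != '\0'; p++)
        if (strchr(special, *p) != NULL)
            extra++;

    char *out = malloc(strlen(text) + extra + 1);
    if (out == NULL)
        return NULL;

    char *q = out;
    for (const char *p = text; *p != '\0'; p++) {
        if (strchr(special, *p) != NULL)
            *q++ = '\\';                  /* escape the special symbol */
        *q++ = *p;
    }
    *q = '\0';
    return out;
}
```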

 Notes

    The user program may run out of memory if regcmp is called repeatedly
    without freeing compiled forms that are no longer required.

    The regcmp and regex routines are in the international library libintl,
    and thus will not be accessible if the cc(CP) flag -nointl is used when
    compiling.

 See also

    ed(C), free(S), malloc(S)

 Standards conformance

    regcmp, regex and __loc1 are not part of any currently supported
    standard; they are an extension of AT&T System V provided by the Santa
    Cruz Operation.

 Examples

    Example 1:

       char *cursor, *newcursor, *ptr;
               ...
       newcursor = regex((ptr = regcmp("^\n", (char *)0)), cursor);
       free(ptr);

    This example matches a leading new-line in the subject string pointed at
    by cursor.

    Example 2:

       char ret0[9];
       char *newcursor, *name;
              ...
       name = regcmp("([A-Za-z][A-Za-z0-9]{0,7})$0", (char *)0);
       newcursor = regex(name, "012Testing345", ret0);

    This example matches through the string ``Testing3'' and returns the
    address of the character after the last matched character (the ``4'').
    The string ``Testing3'' is copied to the character array ret0.
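    For comparison, Example 2 can be redone with the POSIX regcomp/regexec
    interface, swapped in for regcmp/regex.  This is a sketch, assuming a
    POSIX system in the C locale; example2 is a hypothetical name, and
    capture group 1 stands in for ( ... )$0.

```c
#include <regex.h>
#include <string.h>

/* Sketch of Example 2 using POSIX regcomp/regexec instead of
 * regcmp/regex. ret0 must hold at least 9 bytes (8 characters plus the
 * terminating NUL). Returns the address of the character after the
 * match, or NULL on failure. */
const char *example2(const char *subject, char ret0[9])
{
    regex_t re;
    regmatch_t m[2];                      /* m[1] is the ( ... ) group */

    if (regcomp(&re, "([A-Za-z][A-Za-z0-9]{0,7})", REG_EXTENDED) != 0)
        return NULL;
    int rc = regexec(&re, subject, 2, m, 0);
    regfree(&re);
    if (rc != 0)
        return NULL;

    size_t len = (size_t)(m[1].rm_eo - m[1].rm_so);   /* at most 8 */
    memcpy(ret0, subject + m[1].rm_so, len);
    ret0[len] = '\0';
    return subject + m[0].rm_eo;
}
```

    Called with "012Testing345", it copies ``Testing3'' into ret0 and
    returns the address of the ``4'', matching the regex result described
    above.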

    Example 3:

       #include "file.i"     /* generated by regcmp(CP); defines name */
       char *string, *newcursor;
              ...
       newcursor = regex(name, string);

    This example applies a precompiled regular expression in file.i (see
    regcmp(CP)) against string.

    Example 4:


       char *ptr, *newcursor;

       ptr = regcmp("[a-[=i=][:digit:]]*", (char *)0);
       newcursor = regex(ptr, "123CHICO321");

    It is assumed in this example that the current locale's collation rules
    specify the following sequence:

       A,a,B,b,C,c,CH,Ch,ch,D,d,E,e,F,f,G,g,H,h,I,i.....

    The characters I and i are also in the same ``primary'' collation
    group.

    The following characters are all members of the digit ctype class:

       0, 1, 2, 3, 4, 5, 6, 7, 8, 9

    This example matches through the string ``123CHIC'' and returns the
    address of the character ``O'' in the string.


Typewritten Software • bear@typewritten.org • Edmonds, WA 98026