⇒ regcmp(3X) — A/UX 0.7



     regcmp(3X)                                             regcmp(3X)



     NAME
          regcmp, regex - compile and execute a regular expression

     SYNOPSIS
          char *regcmp(string1 [, string2, ...], (char *)0)
          char *string1, *string2, ...;

          char *regex(re, subject[, ret0, ...])
          char *re, *subject, *ret0, ...;

          extern char *loc1;

     DESCRIPTION
          regcmp compiles a regular expression (consisting of the
          concatenated arguments) and returns a pointer to the
          compiled form.  The argument list must end with (char *)0.
          malloc(3C) is used to create space for the compiled form.
          It is the user's responsibility to free unneeded space that
          has been allocated by malloc.  A NULL return from regcmp
          indicates an incorrect argument.  regcmp(1) has been
          written to generally preclude the need for this routine at
          execution time.
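
          The calling convention in the SYNOPSIS (a list of strings
          closed by (char *)0) can be sketched with a small variadic
          function.  concat_args below is a hypothetical helper, not
          part of libPW; it only illustrates how such a list is walked
          and joined, on the assumption that regcmp treats its string
          arguments as one concatenated pattern.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <string.h>

/*
 * concat_args: hypothetical helper, not a libPW routine.
 * Joins a (char *)0-terminated list of strings into buf.
 * Returns buf, or NULL if the result plus its NUL would not fit
 * in size bytes.
 */
char *concat_args(char *buf, size_t size, const char *first, ...)
{
    va_list ap;
    const char *piece;
    size_t used = 0;

    assert(buf != NULL && size > 0);

    va_start(ap, first);
    for (piece = first; piece != NULL; piece = va_arg(ap, char *)) {
        size_t len = strlen(piece);

        if (len >= size - used) {       /* no room for piece and NUL */
            va_end(ap);
            return NULL;
        }
        memcpy(buf + used, piece, len);
        used += len;
    }
    va_end(ap);

    buf[used] = '\0';
    return buf;
}
```

          The terminator is written (char *)0 rather than a bare 0
          because a variadic callee fetches it as a pointer; where int
          and char * differ in size, a bare 0 may be misread.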

          regex executes a compiled pattern against the subject
          string.  Additional arguments are passed to receive values
          back.  regex returns NULL on failure or a pointer to the
          next unmatched character on success.  A global character
          pointer loc1 points to where the match began.  regcmp and
          regex were mostly borrowed from the editor, ed(1); however,
          the syntax and semantics have been changed slightly.  The
          following are the valid symbols and their associated
          meanings.

          []*.^     These symbols retain the meaning they have in ed(1).

          $         This symbol matches the end of the string; \n
                    matches the newline.

          -         Within brackets the minus means ``through''.  For
                    example, [a-z] is equivalent to [abcd...xyz].  The
                    - can appear as itself only if used as the last or
                    first character.  For example, the character class
                    expression []-] matches the characters ] and -.

          +         A regular expression followed by + means ``one or
                    more times''.  For example, [0-9]+ is equivalent
                    to [0-9][0-9]*.

          {m} {m,} {m,u}
                    Integer values enclosed in {} indicate the
                    number of times the preceding regular expression
                    is to be applied.  The minimum number is m and the
                    maximum number is u, which must be less than 256.
                    If only m is present (e.g., {m}), it indicates the
                    exact number of times the regular expression is to
                    be applied.  {m,} is analogous to {m,infinity}.
                    The plus (+) and star (*) operations are
                    equivalent to {1,} and {0,}, respectively.

          ( ... )$n The value of the enclosed regular expression is to
                    be returned.  The value will be stored in the
                    (n+1)th argument following the subject argument.
                    At present, at most 10 enclosed regular
                    expressions are allowed.  regex makes its
                    assignments unconditionally.

          ( ... )   Parentheses are used for grouping.  An operator
                    (e.g., *, +, {}) can work on a single character or
                    a regular expression enclosed in parentheses.  For
                    example, (a*(cb+)*)$0.

          By necessity, all the above defined symbols are special.
          They must, therefore, be escaped to be used as themselves.
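
          The equivalences stated in the list above (+ with {1,}, *
          with {0,}, and the bracket rule for - and ]) can be checked
          mechanically.  The sketch below substitutes the POSIX
          routines regcomp and regexec in extended mode for regcmp and
          regex, since POSIX extended syntax shares these operators;
          it is a different library, so the results illustrate the
          notation rather than the behavior of libPW itself.  The
          helper name matches is hypothetical.

```c
#include <regex.h>
#include <stddef.h>

/*
 * matches: hypothetical helper built on POSIX regcomp/regexec.
 * Returns 1 if pattern (POSIX extended syntax) matches somewhere in
 * text, 0 if it does not, and -1 if the pattern fails to compile.
 * Callers anchor with ^ and $ to demand a whole-string match.
 */
int matches(const char *pattern, const char *text)
{
    regex_t re;
    int rc;

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}
```

          For instance, matches("^[0-9]+$", s), matches("^[0-9]{1,}$",
          s) and matches("^[0-9][0-9]*$", s) agree for every s, and
          matches("^[]-]+$", "]-]") returns 1.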

     EXAMPLES
          Example 1:

          char *cursor, *newcursor, *ptr;
               ...
          newcursor = regex((ptr = regcmp("^\n", (char *)0)), cursor);
          free(ptr);

          This example will match a leading newline in the subject
          string pointed at by cursor.

          Example 2:

          char ret0[9];
          char *newcursor, *name;
               ...
          name = regcmp("([A-Za-z][A-Za-z0-9_]{0,7})$0", (char *)0);
          newcursor = regex(name, "123Testing321", ret0);

          This example will match through the string ``Testing3'' and
          will return the address of the character after the last
          matched character (the start of the subject string plus 11,
          which is the ``2'').  The string ``Testing3'' will be copied
          to the character array ret0.
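
          Where libPW is not at hand, the arithmetic of Example 2 can
          be reproduced with the POSIX routines regcomp and regexec.
          This is a stand-in, not the regcmp interface: the helper
          first_identifier and its parameters are hypothetical, the
          pattern is written with the range A-Za-z in both classes,
          and the start of the match is returned through an argument
          where regex would set loc1.

```c
#include <regex.h>
#include <stddef.h>
#include <string.h>

/*
 * first_identifier: hypothetical stand-in for Example 2.
 * Finds the first identifier of at most 8 characters in subject,
 * copies it to out (outsz bytes), and returns a pointer to the
 * character after the match.  If start is not NULL, *start is set
 * to where the match began (the role loc1 plays for regex).
 * Returns NULL on no match, compile failure, or too small a buffer.
 */
const char *first_identifier(const char *subject, char *out,
                             size_t outsz, const char **start)
{
    regex_t re;
    regmatch_t m[2];            /* m[0] whole match, m[1] the group */
    size_t len;
    int rc;

    if (regcomp(&re, "([A-Za-z][A-Za-z0-9_]{0,7})", REG_EXTENDED) != 0)
        return NULL;
    rc = regexec(&re, subject, 2, m, 0);
    regfree(&re);
    if (rc != 0)
        return NULL;

    len = (size_t)(m[1].rm_eo - m[1].rm_so);
    if (len + 1 > outsz)
        return NULL;
    memcpy(out, subject + m[1].rm_so, len);
    out[len] = '\0';

    if (start != NULL)
        *start = subject + m[0].rm_so;
    return subject + m[0].rm_eo;
}
```

          With the subject "123Testing321" and a 9-byte buffer, the
          match begins at offset 3, the buffer receives "Testing3",
          and the returned pointer is the subject plus 11.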

          Example 3:

          #include "file.i"
          char *string, *newcursor;
               ...
          newcursor = regex(name, string);

          This example applies the precompiled regular expression
          name, defined in file.i (see regcmp(1)), against string.



          This routine is kept in /lib/libPW.a.

     SEE ALSO
          ed(1), regcmp(1), malloc(3C).

     BUGS
          The user program may run out of memory if regcmp is called
          iteratively without freeing the vectors no longer required.
          The following user-supplied replacement for malloc(3C)
          reuses the same vector, saving time and space:

               /* user's program */
                    ...
               char *
               malloc(n)
               unsigned n;
               {
                    static char rebuf[512];
                    return (n <= sizeof rebuf) ? rebuf : NULL;
               }
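
          The consequences of that replacement are easier to see in
          isolation.  The sketch below repeats the same logic in ANSI
          C under a hypothetical name, fixed_alloc, so that it does
          not displace the system malloc; REBUF_SIZE is likewise a
          name introduced here.

```c
#include <stddef.h>

#define REBUF_SIZE 512          /* same size as rebuf above */

/*
 * fixed_alloc: hypothetical name for the malloc replacement shown
 * above.  Every request that fits is handed the same static buffer;
 * anything larger fails with NULL.
 */
void *fixed_alloc(size_t n)
{
    static char rebuf[REBUF_SIZE];

    return (n <= sizeof rebuf) ? rebuf : NULL;
}
```

          Because every successful call returns the same address,
          compiling a second expression overwrites the first, so only
          one compiled form can be live at a time.  A pointer obtained
          this way must not be passed to the system free.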


     Page 3                                        (last mod. 1/14/87)


