⇒ regcmp(3) — AIX/RT 2.2.1

regcmp

     regcmp, regex

Purpose

     Compiles and matches regular-expression patterns.

Library

     Programmers Workbench Library (libPW.a)

Syntax

       char *regcmp (str [, str, . . . ], (char *) 0)
       char *str, *str, . . . ;

       char *regex (pat, subject [, ret, . . . ])
       char *pat, *subject, *ret, . . . ;

       extern char *__loc1;

Description

     The regcmp  subroutine compiles a regular  expression (or
     pattern) and returns a pointer to the compiled form.  The
     str parameters  specify the  pattern to be  compiled.  If
     more than one str parameter  is given, then regcmp treats
     them as if they were concatenated together.  It returns a
     NULL pointer if it encounters an incorrect parameter.

     You  can  use  the  regcmp  command  to  compile  regular
     expressions into  your C program,  frequently eliminating
     the need to call the regcmp subroutine at run time.

     The regex subroutine compares a compiled pattern to the
     subject string.  The additional ret parameters receive the
     substrings matched by parenthesized subexpressions.  Upon
     successful completion, the regex subroutine returns a
     pointer to the next unmatched character.  If the regex
     subroutine fails, it returns a NULL pointer.  A global
     character pointer, __loc1, points to where the match began.

     The regcmp and regex subroutines are borrowed from the ed
     command;  however, the  syntax  and  semantics have  been
     changed slightly.  You can use the following symbols with
     the regcmp and regex subroutines:

     "[ ] * . ^"
        These symbols have the same  meaning as they do in the
        ed command.

     "-"
        For regex,  the minus within brackets  means "through"
        according  to  the  current collating  sequence.   For
        example, "[a-z]"  can be equivalent  to "[abcd" .  . .
        "xyz]" or "[aBbCc" . . . "xYyZz]" or even "[aa>a<a^bc"
        . .  . "xyz]".  You can use the "-" by itself if the
        "-" is the last or first character.  For example, the
        character class expression "[]-]" matches the "]"
        (right bracket) and "-" (minus) characters.

        The regcmp subroutine does not use the current
        collating sequence; for it, the minus character in
        brackets denotes only a direct ASCII range.  For example,
        "[a-z]" always means "[abc . . . xyz]" and "[A-Z]"
        always means "[ABC . . . XYZ]".  If you need to control
        the specific characters in a range using regcmp, you must
        list them explicitly rather than using the minus in the
        character class expression.

     "$"
        Matches the  end of the  string.  Use "\n" to  match a
        new-line character.

     "+"
        A regular expression followed by "+" means one or more
        times.   For   example,  "[0-9]+"  is   equivalent  to
        "[0-9][0-9]*".

     "{"m"}" "{"m,"}" "{"m,u"}"
        Integer values enclosed in "{" "}" indicate the number
        of times to apply the preceding regular expression.  m
        is the minimum number and  u is the maximum number.  u
        must  be less  than 256.   If you  specify only  m, it
        indicates  the  exact number  of  times  to apply  the
        regular expression.  "{"m,"}" is equivalent to
        "{"m,infinity"}" and matches m or more occurrences
        of the expression.  The "+" (plus) and "*"
        (asterisk) operations are equivalent to "{1,}" and
        "{0,}", respectively.

     "(" . . . ")$"n
        This stores the value matched by the enclosed regular
        expression in the (n+1)th ret parameter.  Ten enclosed
        regular expressions are allowed.  regex makes the
        assignments unconditionally.

     "(" . . . ")"
        Parentheses group subexpressions.  An operator, such
        as "*", "+", or "{" "}" works on a single character or
        on a regular expression enclosed in parentheses.  For
        example, "(a*(cb+)*)$0".

     All of the  above defined symbols are  special.  You must
     precede them with a "\"  (backslash) if you want to match
     the special  symbol itself.  For example,  "\$" matches a
     dollar sign.

     Note:   regcmp uses  the  malloc subroutine  to make  the
     space for the  vector.  Always free the  vectors that are
     not required.  If you do not free the unrequired vectors,
     you may run out of memory if regcmp is called repeatedly.
     Use the  following as a  replacement for malloc  to reuse
     the same vector, thus saving time and space:

       /*  . . . Your Program . . .  */

       char *
       malloc(n)
          int n;
       {
          /* One static buffer, reused for every compiled pattern. */
          static int rebuf[256];

          return ((n <= sizeof(rebuf)) ? (char *) rebuf : NULL);
       }

Examples

     1.  To perform a simple match:

           char *cursor, *newcursor, *ptr;
            . . .
           newcursor = regex((ptr = regcmp("^\n", (char *) 0)), cursor);
           free(ptr);

         This  matches a  leading  new-line  character in  the
         subject string pointed to by "cursor".

     2.  To extract a substring that matches a pattern:

           char ret0[9];
           char *newcursor, *name;
            . . .
           name = regcmp("([A-Za-z][A-Za-z0-9]{0,7})$0", (char *) 0);
           newcursor = regex(name, "123Testing321", ret0);
           free(name);

         This    matches   the    eight-character   identifier
         "Testing3" and  returns the address of  the character
         after the last matched  character (which is stored in
         "newcursor").  The  string "Testing3" is  copied into
         the character array "ret0".

Related Information

     In   this  book:    "malloc,   free,  realloc,   calloc,"
     "NCcollate, NCcoluniq, NCeqvmap,  _NCxcol, _NLxcol," and
     "regexp: compile, step, advance."

     The ed and regcmp commands in AIX Operating System
     Commands Reference.

     "Overview of International Character Support" in Managing
     the AIX Operating System.

Typewritten Software • bear@typewritten.org • Edmonds, WA 98026