⇒ regexp(3) — AIX/RT 2.2.1

     regexp: compile, step, advance

Purpose

     Compiles and matches regular-expression patterns.

Library

     None

Syntax

     #define INIT             declarations
     #define GETC( )          getc_code
     #define PEEKC( )         peekc_code
     #define UNGETC(c)        ungetc_code
     #define RETURN(pointer)  return_code
     #define ERROR(val)       error_code

     #include <regexp.h>

       char *compile (instring, expbuf, endbuf, eof)
       char *instring, *expbuf, *endbuf;
       char eof;

       int step (string, expbuf)
       char *string, *expbuf;

       int advance (string, expbuf)
       char *string, *expbuf;

Description

     The regexp.h header file  defines several general purpose
     subroutines   that  perform   regular-expression  pattern
     matching.    Programs  that   perform  regular-expression
     pattern matching such as ed,  sed, grep, bs, and expr use
     this source file.   In this way, only this  file needs to
     be changed  in order to maintain  regular expression com-
     patibility between programs.

     The regexp.h header file  handles extended characters and
     may  require access  to the  current collating  sequence.
     You can disable the extended functionality of regexp.h by
     defining the preprocessor  variable RTPC_NO_NLS.  This is
     useful  for tasks  such as  building programs  to run  on
     prior releases  of AIX.   See "Overview  of International
     Character Support"  in Managing the AIX  Operating System
     for more information.

     The interface to this header file is complex.  Programs
     that include this file must define the following six
     macros before the #include <regexp.h> statement.  These
     macros are used by the compile subroutine.

     INIT
        This macro is used for dependent declarations and
        initializations.  It is placed right after the
        declaration and opening "{" (left brace) of the compile
        subroutine.  The definition of INIT must end with a
        ";" (semicolon).  INIT is frequently used to set a
        register variable to point to the beginning of the
        regular expression so that this register variable can
        be used in the definitions of GETC, PEEKC, and
        UNGETC.  Otherwise, you can use INIT to declare
        external variables that GETC, PEEKC, and UNGETC need.

     GETC( )
        This macro returns the value  of the next character in
        the regular  expression pattern.  Successive  calls to
        the GETC macro should  return successive characters of
        the pattern.

     PEEKC( )
        This macro  returns the next character  in the regular
        expression.   Successive  calls  to  the  PEEKC  macro
        should return the same character, which should also be
        the next character returned by the GETC macro.

     UNGETC(c)
        This macro causes the parameter c to be returned by
        the next call to the GETC and PEEKC macros.  No more
        than one character of pushback is ever needed, and this
        character is guaranteed to be the last character read
        by the GETC macro.  The return value of the UNGETC
        macro is always ignored.

     RETURN(pointer)
        This macro is used on  normal exit of the compile sub-
        routine.  The  pointer parameter  points to  the first
        character immediately  following the  compiled regular
        expression.   This is  useful  to  programs that  have
        memory allocation to manage.

     ERROR(val)
        This macro is  used on abnormal exit  from the compile
        subroutine.  It  should never contain a  return state-
        ment.   The val  parameter  is an  error number.   The
        error values and their meanings are:

        Error    Meaning

        11       Range endpoint too large.
        16       Bad number.
        25       "\digit" out of range.
        36       Illegal or missing delimiter.
        41       No remembered search string.
        42       "\( \)" imbalance.
        43       Too many "\(".
        44       More than two numbers given in \{ \}.
        45       "}" expected after "\".
        46       First number exceeds second in \{ \}.
        49       "[ ]" imbalance.
        50       Regular expression overflow.
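
     As an illustration only, the table above can be turned
     into a lookup routine that a caller-defined ERROR(val)
     macro might use to report failures.  The function name
     regexp_error_text is hypothetical and is not part of
     regexp.h; the codes and messages are the ones listed in
     the table.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not part of regexp.h): map the val passed to
 * ERROR(val) onto the message text from the error table.
 * Returns NULL for a code that the table does not list.
 */
const char *regexp_error_text(int val)
{
    switch (val) {
    case 11: return "Range endpoint too large.";
    case 16: return "Bad number.";
    case 25: return "\"\\digit\" out of range.";
    case 36: return "Illegal or missing delimiter.";
    case 41: return "No remembered search string.";
    case 42: return "\"\\( \\)\" imbalance.";
    case 43: return "Too many \"\\(\".";
    case 44: return "More than two numbers given in \\{ \\}.";
    case 45: return "\"}\" expected after \"\\\".";
    case 46: return "First number exceeds second in \\{ \\}.";
    case 49: return "\"[ ]\" imbalance.";
    case 50: return "Regular expression overflow.";
    default: return NULL;   /* not a code listed in the table */
    }
}
```

     Because ERROR should never contain a return statement, a
     macro built on such a helper would report the message and
     then leave compile by some other means chosen by the
     program.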

     The compile subroutine compiles the regular expression
     for later use.  The instring parameter is never used
     explicitly by the compile subroutine, but you can use it
     in your macros.  For instance, you may want to pass the
     string containing the pattern as the instring parameter
     to compile and use the INIT macro to set a pointer to the
     beginning of this string.  (The example at the end of
     this section uses this technique.)  If your macros do not
     use instring, then call compile with a value of
     ((char *) 0) for this parameter.
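
     The following sketch shows how caller-defined macros of
     this kind are consumed.  It does not use the real compile
     subroutine: pattern_length is a hypothetical stand-in
     that has an instring parameter, expands INIT at the top
     of its body, and then reads the pattern only through
     GETC, PEEKC, and UNGETC until it sees the eof delimiter.

```c
/* Caller-supplied macros, written in the style described above. */
#define INIT        register char *sp = instring;
#define GETC()      (*sp++)
#define PEEKC()     (*sp)
#define UNGETC(c)   (--sp)

/*
 * Hypothetical stand-in for the reading loop inside compile; it is
 * NOT the real compile subroutine.  It returns the number of pattern
 * characters that precede the eof delimiter, counting a backslash and
 * the character it escapes as two.  It returns -1 if the string ends
 * before the delimiter is seen.
 */
int pattern_length(char *instring, char eof)
{
    INIT
    int c, n = 0;

    while ((c = GETC()) != eof) {
        if (c == '\0') {
            UNGETC(c);              /* push back the terminator just read */
            return -1;
        }
        if (c == '\\' && PEEKC() != '\0') {
            (void) GETC();          /* escaped character, possibly the delimiter */
            n++;
        }
        n++;
    }
    return n;
}
```

     With these definitions, pattern_length("abc/", '/')
     yields 3, and a pattern terminated only by the null
     character can be read by passing '\0' as eof.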

     The expbuf  parameter points  to a character  array where
     the  compiled regular  expression is  to be  placed.  The
     endbuf parameter points to  the location that immediately
     follows the  character array  where the  compiled regular
     expression is  to be placed.  If  the compiled expression
     cannot fit  in (endbuf-expbuf) bytes, the  call ERROR(50)
     is made.
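
     The bounds convention can be sketched as follows.  The
     routine emit_bytes is hypothetical and copies plain
     bytes; the real compile subroutine writes its own
     internal encoding, which is not shown here.  The point
     is only that endbuf is one past the last usable byte, and
     that the pointer returned on success is the one that
     RETURN(pointer) would receive.

```c
#include <stddef.h>

/*
 * Hypothetical illustration of the expbuf/endbuf convention.
 * Copies n bytes from src into the half-open range [expbuf, endbuf).
 * Returns a pointer to the byte following the last one written, or
 * NULL when the data does not fit (the case in which compile would
 * make the call ERROR(50)).
 */
char *emit_bytes(const char *src, size_t n, char *expbuf, char *endbuf)
{
    char *ep = expbuf;
    size_t i;

    for (i = 0; i < n; i++) {
        if (ep >= endbuf)
            return NULL;            /* no room left: overflow */
        *ep++ = src[i];
    }
    return ep;
}
```

     A caller with a buffer declared as char buf[ESIZE] would
     pass buf and &buf[ESIZE], as the grep example does.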

     The eof parameter is the character that marks the end of
     the regular expression.  For example, in ed this
     character is usually "/" (slash).

     The regexp.h  header file defines other  subroutines that
     perform actual regular-expression  pattern matching.  One
     of these is the step subroutine.

     The  string parameter  of step  is a  pointer to  a null-
     terminated  string  of characters  to  be  checked for  a
     match.

     The  expbuf  parameter  points to  the  compiled  regular
     expression, which was  obtained by a call  to the compile
     subroutine.

     The  step subroutine  returns the  value 1  if the  given
     string matches the  pattern, and 0 if it  does not match.
     If it matches,  then step also sets  two global character
     pointers:  loc1, which points to the first character that
     matches the pattern, and loc2,  which points to the char-
     acter  immediately  following  the  last  character  that
     matches  the pattern.   Thus, if  the regular  expression
     matches the entire string, then  loc1 points to the first
     character of string and loc2 points to the null character
     at the end of string.

     The step subroutine uses the global variable circf, which
     is set by compile if the regular expression begins with a
     "^" (circumflex).  If this variable is set, then step
     tries to match the regular expression only at the
     beginning of the string.  If you compile more than one
     regular expression before executing the first one, then
     save the value of circf for each compiled expression and
     set circf to that saved value before each call to step.
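
     One way to follow this advice is to keep the saved value
     next to the compiled expression.  The sketch below is an
     assumption about how a program might organize this, not
     code from regexp.h: struct saved_re, save_anchor,
     restore_anchor, and ESIZE are hypothetical names, and
     circf_standin stands in for the global circf so that the
     fragment is self-contained.

```c
#define ESIZE 256                   /* hypothetical buffer size */

static int circf_standin;           /* stands in for the global circf */

struct saved_re {
    char expbuf[ESIZE];             /* the compiled expression would live here */
    int  circf;                     /* value of circf right after compiling */
};

/* Call immediately after compiling into re->expbuf. */
static void save_anchor(struct saved_re *re)
{
    re->circf = circf_standin;
}

/* Call immediately before each step(string, re->expbuf). */
static void restore_anchor(const struct saved_re *re)
{
    circf_standin = re->circf;
}
```

     In a real program the two helpers would read and write
     circf itself rather than the stand-in variable.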

     The step subroutine calls a subroutine named advance with
     the same parameters that it was passed.  The step
     subroutine increments through the string parameter and
     calls advance until advance returns a 1, indicating a
     match, or until the end of string is reached.  To
     constrain the match to the beginning of the string in all
     cases, call the advance subroutine directly instead of
     calling step.
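
     The relationship between step and advance can be
     sketched with a toy matcher.  This is not the code in
     regexp.h: the "compiled expression" here is only a
     literal string, and toy_step, toy_advance, toy_circf,
     toy_loc1, and toy_loc2 are hypothetical stand-ins for
     step, advance, circf, loc1, and loc2.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the globals that regexp.h defines. */
static int   toy_circf;             /* nonzero: pattern is anchored with ^ */
static char *toy_loc1, *toy_loc2;   /* start of match, one past end of match */

/*
 * Toy advance: succeeds only if lit matches at exactly this position,
 * and records where the match ends.
 */
static int toy_advance(char *string, const char *lit)
{
    while (*lit != '\0') {
        if (*string++ != *lit++)
            return 0;
    }
    toy_loc2 = string;
    return 1;
}

/*
 * Toy step: when anchored, try the beginning of the string only;
 * otherwise try each starting position in turn, including the
 * position of the terminating null character.
 */
static int toy_step(char *string, const char *lit)
{
    if (toy_circf) {
        if (!toy_advance(string, lit))
            return 0;
        toy_loc1 = string;
        return 1;
    }
    do {
        if (toy_advance(string, lit)) {
            toy_loc1 = string;
            return 1;
        }
    } while (*string++ != '\0');
    return 0;
}
```

     With toy_circf clear, toy_step finds "hello" inside
     "say hello" and leaves toy_loc2 pointing at the null
     character; with toy_circf set, the same call fails
     because the match may begin only at the first character.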

     When advance  encounters an  "*" (asterisk) or  a "\{ \}"
     sequence  in  the  regular expression,  it  advances  its
     pointer to  the string to  be matched as far  as possible
     and recursively calls itself trying  to match the rest of
     the string  to the  rest of  the regular  expression.  As
     long as  there is  no match, advance  backs up  along the
     string until it finds a match or reaches the point in the
     string that initially matched the "*" or "\{ \}".  It is
     sometimes desirable to stop this backing up before the
     initial point in the string is reached.  If, at any time
     during the backing-up process, the current position in
     the string equals the global character pointer locs,
     advance breaks out of the loop that backs up and returns 0.  This
     is used  by ed  and sed for  global substitutions  on the
     whole line so that expressions like "s/y*//g" do not loop
     forever.

Example

     The  following is  an example  of the  regular expression
     macros and calls from the grep command.

       #define INIT          register char *sp=instring;
       #define GETC()        (*sp++)
       #define PEEKC()       (*sp)
       #define UNGETC(c)     (--sp)
       #define RETURN(c)     return;
       #define ERROR(c)      regerr()

       #include <regexp.h>
        . . .
       compile (patstr, expbuf, &expbuf[ESIZE], '\0');
        . . .
       if (step (linebuf, expbuf))
          succeed ( );
        . . .

Related Information

     In this book:   "NCcollate, NCcoluniq, NCeqvmap, _NCxcol,
     _NLxcol" and "regcmp, regex."

     The ed,  grep, and sed  commands in AIX  Operating System
     Commands Reference.

     "Overview of International Character Support" in Managing
     the AIX Operating System.

Typewritten Software • bear@typewritten.org • Edmonds, WA 98026