⇒ regexp(5) — NEWS-os 5.0.1

regexp(5)              MISC. FILE FORMATS               regexp(5)



NAME
     regexp:  compile, step, advance - regular expression compile
     and match routines

SYNOPSIS
     #define INIT declarations
     #define GETC(void) getc code
     #define PEEKC(void) peekc code
     #define UNGETC(c) ungetc code
     #define RETURN(ptr) return code
     #define ERROR(val) error code
     #include <regexp.h>
     char *compile(char *instring, char *expbuf, char *endbuf,
          int eof);
     int step(char *string, char *expbuf);
     int advance(char *string, char *expbuf);
     extern char *loc1, *loc2, *locs;

DESCRIPTION
     These  functions  are  general  purpose  regular  expression
     matching  routines to be used in programs that perform regu-
     lar expression matching.  These functions are defined by the
     <regexp.h>  header  file.  The functions step and advance do
     pattern matching given a character  string  and  a  compiled
     regular  expression as input.  The function compile takes as
     input a regular expression as defined below and  produces  a
     compiled  expression  that can be used with step or advance.
     A regular expression specifies a set of  character  strings.
     A member of this set of strings is said to be matched by the
     regular expression.  Some characters  have  special  meaning
     when  used  in  a regular expression; other characters stand
     for themselves.  The regular expressions available  for  use
     with the regexp functions are constructed as follows:

     Expression  Meaning

     c           the character c where c is not a special charac-
                 ter.

     \c          the character c where c is any character, except
                 a digit in the range 1-9.

     ^           the beginning of the line being compared.

     $           the end of the line being compared.

     .           any character in the input.

     [s]         any character  in  the  set  s,  where  s  is  a
                 sequence of characters and/or a range of charac-
                 ters, e.g., [c-c].

     [^s]        any character not in  the  set  s,  where  s  is
                 defined as above.

     r*          zero or more successive occurrences of the regu-
                 lar expression r.  The longest leftmost match is
                 chosen.

     rx          the occurrence of regular expression r  followed
                 by  the  occurrence  of  regular  expression  x.
                 (Concatenation)

     r\{m,n\}    any number of m through n successive occurrences
                 of   the  regular  expression  r.   The  regular
                 expression r\{m\} matches exactly m occurrences;
                 r\{m,\} matches at least m occurrences.

     \(r\)       the regular expression r.  When \n (where n is a
                 number greater than zero) appears in a
                 constructed regular expression, it stands for
                 the regular expression x where x is the nth
                 regular expression enclosed in \( and \) that
                 appeared earlier in the constructed regular
                 expression.  For example, \(r\)x\(y\)z\2 is the
                 concatenation of regular expressions rxyzy.

     Characters that have special meaning except when they appear
     within square brackets ([]) or are preceded by \ are:  ., *,
     [, and \.  Other special characters, such as $, have special
     meaning in more restricted contexts.

     The character ^ at the beginning of an expression permits a
     successful match only immediately after a newline, and the
     character $ at the end of an expression requires a trailing
     newline.

     Two characters have special meaning only when used within
     square brackets.  The character - denotes a range, [c-c],
     unless it is just after the open bracket or before the
     closing bracket, [-c] or [c-], in which case it has no
     special meaning.  When used within brackets, the character ^
     has the meaning "complement of" if it immediately follows
     the open bracket (example: [^c]); elsewhere between brackets
     (example: [c^]) it stands for the ordinary character ^.

     The special meaning of the \ operator can be escaped only by
     preceding it with another \, e.g., \\.

     Programs must have the following five macros declared before
     the #include <regexp.h> statement.  These macros are used by
     the compile routine.  The macros GETC, PEEKC, and UNGETC
     operate on the regular expression given as input to compile.

     GETC           This macro returns  the  value  of  the  next
                    character  (byte)  in  the regular expression
                    pattern.  Successive  calls  to  GETC  should
                    return  successive  characters of the regular
                    expression.

     PEEKC          This macro returns the next character  (byte)
                    in  the regular expression.  Immediately suc-
                    cessive calls to PEEKC should return the same
                    character,  which  should  also  be  the next
                    character returned by GETC.

     UNGETC         This  macro  causes  the  argument  c  to  be
                    returned  by the next call to GETC and PEEKC.
                    No more than one  character  of  pushback  is
                    ever  needed and this character is guaranteed
                    to be the last character read by  GETC.   The
                    return value of the macro UNGETC(c) is always
                    ignored.

     RETURN(ptr)    This macro is used on normal exit of the com-
                    pile  routine.  The value of the argument ptr
                    is a pointer to the character after the  last
                    character of the compiled regular expression.
                    This is useful to programs which have  memory
                    allocation to manage.

     ERROR(val)     This macro is the abnormal return from the
                    compile routine.  The argument val is an
                    error number [see ERRORS below for meanings].
                    This call should never return.

     The syntax of the compile routine is as follows:

          compile(instring, expbuf, endbuf, eof)

     The first parameter, instring, is never used explicitly by
     the compile routine but is useful for programs that pass
     down different pointers to input characters.  It is
     sometimes used in the INIT declaration (see below).
     Programs which call functions to input characters or have
     characters in an external array can pass down a value of
     (char *)0 for this parameter.

     The next parameter, expbuf, is a character pointer.  It
     points to the place where the compiled regular expression
     will be placed.

     The parameter endbuf is one more than the highest address
     where the compiled regular expression may be placed.  If the
     compiled expression cannot fit in (endbuf-expbuf) bytes, a
     call to ERROR(50) is made.

     The parameter eof is the character which marks the end of
     the regular expression.  This character is usually a /.

     Each program that includes the <regexp.h> header file must
     have a #define statement for INIT.  It is used for dependent
     declarations and initializations.  Most often it is used to
     set a register variable to point to the beginning of the
     regular expression so that this register variable can be
     used in the declarations for GETC, PEEKC, and UNGETC.
     Otherwise it can be used to declare external variables that
     might be used by GETC, PEEKC, and UNGETC.  [See EXAMPLE
     below.]

     The first parameter to the step and advance functions is a
     pointer to a string of characters to be checked for a match.
     This string should be null terminated.  The second
     parameter, expbuf, is the compiled regular expression which
     was obtained by a call to the function compile.

     The function step returns non-zero if some substring of
     string matches the regular expression in expbuf and zero if
     there is no match.  If there is a match, two external
     character pointers are set as a side effect to the call to
     step.  The variable loc1 points to the first character that
     matched the regular expression; the variable loc2 points to
     the character after the last character that matches the
     regular expression.  Thus if the regular expression matches
     the entire input string, loc1 will point to the first
     character of string and loc2 will point to the null at the
     end of string.

     The function advance returns non-zero if the initial
     substring of string matches the regular expression in
     expbuf.  If there is a match, an external character pointer,
     loc2, is set as a side effect.  The variable loc2 points to
     the next character in string after the last character that
     matched.

     When advance encounters a * or \{ \} sequence in the regular
     expression, it will advance its pointer to the string to be
     matched as far as possible and will recursively call itself
     trying to match the rest of the string to the rest of the
     regular expression.  As long as there is no match, advance
     will back up along the string until it finds a match or
     reaches the point in the string that initially matched the *
     or \{ \}.  It is sometimes desirable to stop this backing up
     before the initial point in the string is reached.  If the
     external character pointer locs is equal to the point in the
     string at some time during the backing up process, advance
     will break out of the loop that backs up and will return
     zero.

     The external variables circf, sed, and nbra are reserved.
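
     The search and backing-up behavior of step and advance
     described above can be sketched without <regexp.h>.  What
     follows is a toy illustration under stated assumptions, not
     the library's implementation: it interprets the pattern text
     directly instead of a compiled expbuf, supports only
     ordinary characters, ., *, a leading ^, and a trailing $,
     and ignores locs.  The names toy_step, toy_advance,
     toy_loc1, and toy_loc2 are hypothetical stand-ins for step,
     advance, loc1, and loc2.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the external pointers loc1 and loc2. */
const char *toy_loc1, *toy_loc2;

static int match_here(const char *re, const char *s);

/* Handle c*: advance as far as possible, then back up one
 * character at a time until the rest of the pattern matches. */
static int match_star(int c, const char *re, const char *s)
{
    const char *t = s;

    while (*t != '\0' && (c == '.' || *t == c))
        t++;                    /* longest run first */
    for (;;) {
        if (match_here(re, t))
            return 1;
        if (t == s)
            return 0;           /* backed up to the starting point */
        t--;
    }
}

/* Match the pattern re against the text beginning exactly at s. */
static int match_here(const char *re, const char *s)
{
    if (re[0] == '\0') {
        toy_loc2 = s;           /* one past the last matched character */
        return 1;
    }
    if (re[1] == '*')
        return match_star(re[0], re + 2, s);
    if (re[0] == '$' && re[1] == '\0') {
        if (*s != '\0')
            return 0;
        toy_loc2 = s;
        return 1;
    }
    if (*s != '\0' && (re[0] == '.' || re[0] == *s))
        return match_here(re + 1, s + 1);
    return 0;
}

/* Like advance: succeeds only if an initial substring matches. */
int toy_advance(const char *string, const char *re)
{
    assert(string != NULL && re != NULL);
    if (re[0] == '^')
        re++;                   /* already anchored at the start */
    return match_here(re, string);
}

/* Like step: tries each starting position, leftmost first. */
int toy_step(const char *string, const char *re)
{
    const char *s = string;

    assert(string != NULL && re != NULL);
    if (re[0] == '^') {         /* anchored: only the first position */
        if (!match_here(re + 1, s))
            return 0;
        toy_loc1 = s;
        return 1;
    }
    do {
        if (match_here(re, s)) {
            toy_loc1 = s;       /* first character of the match */
            return 1;
        }
    } while (*s++ != '\0');
    return 0;
}
```

     With the string "xxabbbc" and the pattern ab*c, toy_step
     succeeds with toy_loc1 at offset 2 and toy_loc2 at offset 7,
     while toy_advance fails because the match does not begin at
     the first character.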

RETURN VALUE
     The function compile uses the macro RETURN  on  success  and
     the  macro ERROR on failure (see above).  The functions step
     and advance return non-zero on a successful match  and  zero
     if there is no match.

ERRORS
     11   range endpoint too large.

     16   bad number.

     25   \ digit out of range.

     36   illegal or missing delimiter.

     41   no remembered search string.

     42   \( \) imbalance.

     43   too many \(.

     44   more than 2 numbers given in \{ \}.

     45   } expected after \.

     46   first number exceeds second in \{ \}.

     49   [ ] imbalance.

     50   regular expression overflow.
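
     Because ERROR(val) receives only a number, an application
     usually wants to turn that number into text.  The sketch
     below is one possible way to do so and is not part of
     <regexp.h>: regerr_message and this particular regerr are
     hypothetical helpers, the exit status 2 is an arbitrary
     choice, and the message strings are copied from the table
     above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

struct regerr_entry {
    int         num;    /* value passed to ERROR(val) */
    const char *msg;    /* meaning, as listed under ERRORS */
};

static const struct regerr_entry regerr_table[] = {
    { 11, "range endpoint too large" },
    { 16, "bad number" },
    { 25, "\\ digit out of range" },
    { 36, "illegal or missing delimiter" },
    { 41, "no remembered search string" },
    { 42, "\\( \\) imbalance" },
    { 43, "too many \\(" },
    { 44, "more than 2 numbers given in \\{ \\}" },
    { 45, "} expected after \\" },
    { 46, "first number exceeds second in \\{ \\}" },
    { 49, "[ ] imbalance" },
    { 50, "regular expression overflow" },
};

/* Hypothetical helper: map an error number to its message. */
const char *regerr_message(int val)
{
    size_t i;

    for (i = 0; i < sizeof regerr_table / sizeof regerr_table[0]; i++)
        if (regerr_table[i].num == val)
            return regerr_table[i].msg;
    return "unknown error number";
}

/* Hypothetical handler for use as ERROR(c); it exits so that,
 * as the text requires, the call never returns. */
void regerr(int val)
{
    fprintf(stderr, "regexp error %d: %s\n", val, regerr_message(val));
    exit(2);
}
```

     A program could then define ERROR(c) as regerr(c) before
     including <regexp.h>.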

EXAMPLE
     The following is an example of how  the  regular  expression
     macros and calls might be defined by an application program:
     #define INIT         register char *sp = instring;
     #define GETC()       (*sp++)
     #define PEEKC()      (*sp)
     #define UNGETC(c)    (--sp)
     #define RETURN(c)    return (c);
     #define ERROR(c)     regerr(c)   /* application-supplied */
     #include <regexp.h>
      . . .
           (void) compile(*argv, expbuf, &expbuf[ESIZE], '\0');
      . . .
           if (step(linebuf, expbuf))
                succeed;
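
     The contract that the text places on GETC, PEEKC, and UNGETC
     can be checked in isolation, without <regexp.h>.  The sketch
     below assumes the function-like macro forms shown in
     SYNOPSIS and a const cursor in place of the register
     variable; contract_holds and pattern_length are hypothetical
     helpers that stand in for the way compile would consume the
     pattern, and they do not compile anything.

```c
#include <assert.h>
#include <stddef.h>

/* Application-supplied macros; INIT declares the cursor that
 * the other three macros share. */
#define INIT      const char *sp = instring;
#define GETC()    (*sp++)
#define PEEKC()   (*sp)
#define UNGETC(c) (--sp)

/* Hypothetical check of the documented contract on a non-empty
 * pattern: repeated PEEKC calls agree, GETC returns what PEEKC
 * showed, and UNGETC makes GETC return the same character again. */
int contract_holds(const char *instring)
{
    INIT
    int p1, p2, g1, g2;

    assert(instring[0] != '\0');
    p1 = PEEKC();
    p2 = PEEKC();
    g1 = GETC();
    UNGETC(g1);         /* push back the last character read */
    g2 = GETC();
    return p1 == p2 && p2 == g1 && g1 == g2 &&
           PEEKC() == instring[1];
}

/* Hypothetical helper: count the pattern bytes that precede the
 * eof delimiter, reading only through the macros. */
size_t pattern_length(const char *instring, int eof)
{
    INIT
    size_t n = 0;

    while (PEEKC() != eof && PEEKC() != '\0') {
        (void) GETC();
        n++;
    }
    return n;
}
```

     For the input "ab*c/rest" with / as eof, pattern_length
     reports 4, the number of bytes a compile call would read
     before reaching the delimiter.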

Typewritten Software • bear@typewritten.org • Edmonds, WA 98026