                                                                regexp(5)



        _________________________________________________________________
        regexp                                                 Miscellany
        regular expression compile and match routines
        _________________________________________________________________


        SYNTAX

        #define INIT <declarations>
        #define GETC() <getc code>
        #define PEEKC() <peekc code>
        #define UNGETC(c) <ungetc code>
        #define RETURN(pointer) <return code>
        #define ERROR(val) <error code>

        #include <regexp.h>

        char *compile (instring, expbuf, endbuf, eof)
        char *instring, *expbuf, *endbuf;
        int eof;

        int step (string, expbuf)
        char *string, *expbuf;

        extern char *loc1, *loc2, *locs;

        extern int circf, sed, nbra;


        DESCRIPTION

        This entry describes general-purpose regular expression matching
        routines in the form of ed(1), defined in /usr/include/regexp.h.
        Programs that perform regular expression matching, including
        ed(1), sed(1), grep(1), bs(1), and expr(1), use this source file.
        Only this file needs to be changed to maintain regular expression
        compatibility.

        The interface to this file is unpleasantly complex.  Programs
        that include this file must have the following five macros
        declared before the #include <regexp.h> statement.  These macros
        are used by the compile routine.

        GETC()    Return the value of the next character in the regular
                  expression pattern.  Successive calls to GETC() should
                  return successive characters of the regular expression.

        PEEKC()   Return the next character in the regular expression.
                  Successive calls to PEEKC() should return the same
                  character (which should also be the next character
                  returned by GETC()).



        DG/UX 4.00                                                 Page 1
               Licensed material--property of copyright holder(s)

        UNGETC(c) Cause the next call to GETC() or PEEKC() to return the
                  argument c.  No more than one character of pushback is
                  ever needed, and this character is guaranteed to be the
                  last character read by GETC().  The value of the macro
                  UNGETC(c) is always ignored.

        RETURN(pointer)
                  This macro is used on normal exit of the compile
                  routine.  The value of the argument pointer is a
                  pointer to the character after the last character of
                  the compiled regular expression.  This is useful to
                  programs that have to manage memory allocation.

        ERROR(val)
                  This is the abnormal return from the compile routine.
                  The argument val is an error number (see table below
                  for meanings).  This call should never return.

                  ERROR  MEANING
                  11     Range endpoint too large
                  16     Bad number
                  25     \digit out of range
                  36     Illegal or missing delimiter
                  41     No remembered search string
                  42     \( \) imbalance
                  43     Too many \(
                  44     More than 2 numbers given in \{ \}
                  45     } expected after \
                  46     First number exceeds second in \{ \}
                  49     [ ] imbalance
                  50     Regular expression overflow

        The syntax of the compile routine is as follows:

             compile(instring, expbuf, endbuf, eof)

        The first parameter instring is never used explicitly by the
        compile routine but is useful for programs that pass down
        different pointers to input characters.  It is sometimes used in
        the INIT declaration (see below).  Programs that call functions
        to input characters or have characters in an external array can
        pass down a value of ((char *) 0) for this parameter.

        The next parameter expbuf is a character pointer.  It points to
        the place where the compiled regular expression will be placed.

        The parameter endbuf is one more than the highest address where
        the compiled regular expression may be placed.  If the compiled
        expression cannot fit in (endbuf-expbuf) bytes, a call to
        ERROR(50) is made.

        The parameter eof is the character which marks the end of the
        regular expression.  For example, in ed(1), this character is
        usually /.

        Each program that includes this file must have a #define
        statement for INIT.  This definition will be placed right after
        the declaration for the function compile and the opening curly
        brace ({).  It is used for dependent declarations and
        initializations.  Most often, it is used to set a register
        variable to point to the beginning of the regular expression.  This
        register variable can then be used in the declarations for
        GETC(), PEEKC() and UNGETC().  Otherwise, you can use it to
        declare external variables that might be used by GETC(), PEEKC()
        and UNGETC().  See the example below of the declarations taken
        from grep(1).

        Other functions in this file perform actual regular expression
        matching, one of which is the function step.  The call to step is
        as follows:

             step(string, expbuf)

        The first parameter to step is a pointer to a string of
        characters to be checked for a match.  This string should be
        null-terminated.

        The second parameter expbuf is the compiled regular expression
        obtained by a call to the function compile.

        The function step returns non-zero if the given string matches
        the regular expression, and zero if it does not.  If there is a
        match, two external character pointers are set as a side effect
        of the call to step.  The variable loc1, set in step, points to
        the first character that matched the regular expression.  The
        variable loc2, set by the function advance, points to the
        character after the last character that matched the regular
        expression.  Thus if the regular expression matches the entire
        line, loc1 will point to the first character of string and loc2
        will point to the null at the end of string.

        Step uses the external variable circf, which is set by compile if
        the regular expression begins with ^.  If this is set, step will
        try to match the regular expression to the beginning of the
        string only.  If more than one regular expression is to be
        compiled before the first is executed, save the value of circf
        for each compiled expression and set circf to that saved value
        before each call to step.

        The function advance is called from step with the same arguments
        as step.  Step steps through the string argument and calls
        advance until advance returns non-zero, indicating a match, or
        until the end of string is reached.  To constrain the match to
        the beginning of string in all cases, do not call step; simply
        call advance.

        When advance encounters a * or \{ \} sequence in the regular
        expression, it advances its pointer to the string to be matched
        as far as possible. It recursively calls itself, trying to match
        the rest of the string to the rest of the regular expression.  As
        long as there is no match, advance backs up along the string
        until it finds a match, or until it reaches the point in the
        string that initially matched the * or \{ \}.  You may want to
        stop this backing up before the initial point in the string is
        reached.  If the external character pointer locs is equal to the
        point in the string at some time during the backing up process,
        advance breaks out of the loop that backs up and returns zero.
        This is used by ed(1) and sed(1) for global substitutions (every
        occurrence on the line, not just the first), so that expressions
        like s/y*//g do not loop forever.

        The additional external variables sed and nbra are used for
        special purposes.


        EXAMPLES

        The following shows how the regular expression macros and calls
        look from grep(1):

        #define INIT        register char *sp = instring;
        #define GETC()      (*sp++)
        #define PEEKC()     (*sp)
        #define UNGETC(c)   (--sp)
        #define RETURN(c)   return;
        #define ERROR(c)    regerr()

        #include <regexp.h>
        ...
                (void) compile(*argv, expbuf, &expbuf[ESIZE], '\0');
        ...
                if (step(linebuf, expbuf))
                        succeed();


        FILES

        /usr/include/regexp.h


        SEE ALSO

        bs(1), ed(1), expr(1), grep(1), sed(1) in the User's Reference
        for the DG/UX System