extract(1): Reliant UNIX 5.44c4

extract(1)                                                       extract(1)

NAME
     extract, strextract - interactive string extract and replace

SYNOPSIS
     extract [option ...] file ...

DESCRIPTION
     extract is used to interactively extract strings from source files and
     replace them with calls to a function that retrieves matching strings
     from a message catalog at program runtime. This process is controlled
     by two files:

     -  a pattern file

     -  an ignore file

     The pattern file contains

     1. text patterns with which strings from the specified source file are
        to be compared, and

     2. rules by which strings that match the patterns are extracted and
        rewritten.

     The ignore file contains strings that are not to be rewritten in the
     source.

OPTIONS
     -i ignorefile
          Uses the file named ignorefile as the ignore file.

          -i ignorefile not specified:

          The default ignore file is called ignore. It is searched for first
          in the current directory, then in the home directory, and finally
          in /usr/lib/nls/extract.

     -m prefix
          For prefix you can specify characters or strings to be placed
          before all message numbers in the nl file and in the .msf file.

          -m prefix not specified:

          No prefix is added to the message numbers.

     -n   Creates a separate message catalog source file for each file
          given in the file list. Message numbers in each message file
          start with 1.

Page 1                       Reliant UNIX 5.44                Printed 11/98

          -n not specified:

          Only one message catalog source file is created, and the messages
          are numbered sequentially.

     -p patternfile
          Uses the file named patternfile as the pattern file.

          -p patternfile not specified:

          The default pattern file is called pattern. It is searched for
          first in the current directory, then in the home directory, and
          finally in /usr/lib/nls/extract.

     -s string
          For string you can specify characters to be output at the start
          of the .msf file.

          -s string not specified:

          The default output string is $quote " or, if defined, the text of
          the $CATHEAD section from the pattern file.

     -u   Uses a message file produced by a previous run of strextract. For
          each given file, the current directory is searched for a
          corresponding file with the suffix .msg. This file contains
          detailed information on all strings that match patterns in the
          pattern file, including their positions and line numbers in the
          file. extract calls strextract and interprets its output.

     file Name of the source file to be processed by extract. More than one
          file may be specified.
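     The search order for the default ignore and pattern files can be
     sketched in C as below. The function find_default_file is
     hypothetical; taking the home directory from $HOME and testing
     readability with access() are assumptions, not documented behavior of
     extract.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical sketch of the default file lookup: current directory,
   then home directory ($HOME, an assumption), then /usr/lib/nls/extract.
   Returns 1 and leaves the path in buf if a readable file is found. */
int find_default_file(const char *name, char *buf, size_t size)
{
    const char *home = getenv("HOME");
    const char *dirs[3];
    int i, n = 0;

    dirs[n++] = ".";
    if (home != NULL && home[0] != '\0')
        dirs[n++] = home;
    dirs[n++] = "/usr/lib/nls/extract";

    for (i = 0; i < n; i++) {
        int len = snprintf(buf, size, "%s/%s", dirs[i], name);
        if (len < 0 || (size_t)len >= size)
            continue;                   /* path would not fit in buf */
        if (access(buf, R_OK) == 0)
            return 1;                   /* first readable candidate wins */
    }
    return 0;
}
```

     For example, find_default_file("pattern", buf, sizeof buf) would
     locate the default pattern file in the same order as described above.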

   The pattern file

     The pattern file contains text patterns to be matched with strings in
     the specified source file and rules by which matching strings are to
     be extracted and rewritten. You can also define patterns for strings
     that you do not want to have replaced, so that strings are not
     modified in places where the change would produce invalid code.

     The syntax of text patterns is identical to the regular expression
     syntax used in ed. Example 4 under EXAMPLES shows a pattern file for
     use with C source programs.
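     Because the pattern syntax follows ed, POSIX basic regular
     expressions (regcomp without REG_EXTENDED) are close enough to
     illustrate how a $MATCH pattern picks out a string. The sketch uses
     the pattern "[^"]*" from Example 4; the helper find_match is
     hypothetical and is not how extract itself is implemented.

```c
#define _POSIX_C_SOURCE 200809L
#include <regex.h>
#include <stddef.h>

/* Hypothetical sketch: find the first match of pattern in line using
   POSIX basic regular expressions as a stand-in for ed syntax.
   Returns 1 with [*start, *end) set, 0 if no match, -1 on a bad pattern. */
int find_match(const char *pattern, const char *line,
               size_t *start, size_t *end)
{
    regex_t re;
    regmatch_t m;
    int found;

    if (regcomp(&re, pattern, 0) != 0)  /* 0: basic, not extended, syntax */
        return -1;
    found = regexec(&re, line, 1, &m, 0) == 0;
    if (found) {
        *start = (size_t)m.rm_so;
        *end = (size_t)m.rm_eo;         /* one past the closing quote */
    }
    regfree(&re);
    return found;
}
```

     Applied to the line printf("Hello\n"); with the pattern "[^"]*", the
     match covers the string literal including both quotes.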

     The pattern file is divided into several sections, each introduced by
     a line that begins with a keyword of the form $KEYWORD.

     $KEYWORD is one of the following:

     $SRCHEAD1
          Header file to be inserted at the beginning of the first new
          source file.

     $SRCHEAD2
          Header file to be inserted at the beginning of the second new
          source file.

     $CATHEAD
          Header file to be inserted at the beginning of the message catalog
          to be produced. Here you might want to include the SCCS ID, the
          definition of the current message set, and so on. You can
          override this by using the -s option.

     $REWRITE
          Rule for rewriting matching strings.

     $MATCH
          List of patterns for matching strings. The regular expression
          syntax is based on ed. Note: [^xyz]* means that any character
          other than x, y or z will match the pattern.

     $REJECT
          Patterns for strings and special C constructs that are to be
          disregarded rather than extracted.

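     $ERROR warning
          Patterns for which the text warning is printed when a match is
          found. In Example 4 this is used to warn about initialized
          strings, which cannot be replaced.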
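     A minimal sketch of how one line of a pattern file might be
     classified. The rules are inferred only from Example 4 (comments
     start with #, sections start with $, and a leading backslash escapes
     # or $ in section text, as in \#include and \$set); the names are
     hypothetical.

```c
/* Hypothetical line classifier for a pattern file, inferred from
   Example 4 rather than from a specification. */
enum line_kind { LINE_COMMENT, LINE_KEYWORD, LINE_TEXT };

enum line_kind classify_line(const char *line, const char **text)
{
    if (line[0] == '#') {
        *text = line;
        return LINE_COMMENT;
    }
    if (line[0] == '$') {
        *text = line + 1;               /* keyword, e.g. "MATCH" */
        return LINE_KEYWORD;
    }
    if (line[0] == '\\' && (line[1] == '#' || line[1] == '$'))
        *text = line + 1;               /* drop the escaping backslash */
    else
        *text = line;                   /* pattern or header text as-is */
    return LINE_TEXT;
}
```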

   The ignore file

     The ignore file contains literal strings (rather than patterns in the
     form of regular expressions) which are to be left unchanged in the new
     source file (the nl file). This file, if present, is read at startup
     and causes specific strings in the input file to be ignored. During a
     run, further strings to be ignored can be added to this file by using
     the ADD command (see below). The ignore file can be created and edited
     using an ordinary editor, with each string to be ignored in the input
     file being entered on a separate line.
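     The ignore file behaves like a list of literal lines. The C sketch
     below loads it, checks a string against it, and shows the extra step
     that ADD performs (appending to the file). The names, the fixed size
     limits, and the exact form in which strings are stored (here with
     their quotes) are assumptions for illustration.

```c
#include <stdio.h>
#include <string.h>

#define MAX_IGNORED 128                 /* assumed limits of this sketch */
#define MAX_LEN 256

struct ignore_list {
    char entry[MAX_IGNORED][MAX_LEN];
    int count;
};

/* Read one literal string per line; a missing file means an empty list. */
int load_ignore_file(struct ignore_list *list, const char *path)
{
    FILE *f = fopen(path, "r");

    list->count = 0;
    if (f == NULL)
        return 0;
    while (list->count < MAX_IGNORED &&
           fgets(list->entry[list->count], MAX_LEN, f) != NULL) {
        char *e = list->entry[list->count];
        e[strcspn(e, "\n")] = '\0';     /* strip the newline */
        list->count++;
    }
    fclose(f);
    return list->count;
}

/* Exact string comparison: entries are literals, not patterns. */
int is_ignored(const struct ignore_list *list, const char *s)
{
    int i;

    for (i = 0; i < list->count; i++)
        if (strcmp(list->entry[i], s) == 0)
            return 1;
    return 0;
}

/* ADD: ignore the string for this run and append it to the file. */
int add_ignored(struct ignore_list *list, const char *path, const char *s)
{
    FILE *f;

    if (list->count < MAX_IGNORED && strlen(s) < MAX_LEN)
        strcpy(list->entry[list->count++], s);
    f = fopen(path, "a");
    if (f == NULL)
        return 0;
    fprintf(f, "%s\n", s);
    return fclose(f) == 0;
}
```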

   Mode of operation

     When extract is called, a new version of the source file is produced
     with the characters nl prepended to the original file name. A message
     catalog source file is also produced; its name is derived from the
     first file given by dropping the .c ending and adding the suffix .msf.
     This file can be input directly to gencat, or its contents can be
     translated into another native language before being passed to gencat.
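     The naming rules above can be written down as two small C helpers.
     The helper names are hypothetical, and the sketch handles only plain
     file names without a directory part; what extract does with
     directory paths is not stated here.

```c
#include <stdio.h>
#include <string.h>

/* New source file: "nl" prepended to the original name. */
int nl_source_name(const char *src, char *buf, size_t size)
{
    int len = snprintf(buf, size, "nl%s", src);
    return len >= 0 && (size_t)len < size;
}

/* Catalog source file: drop a trailing ".c", then add ".msf". */
int msf_name(const char *src, char *buf, size_t size)
{
    size_t base = strlen(src);
    int len;

    if (base >= 2 && strcmp(src + base - 2, ".c") == 0)
        base -= 2;                      /* drop the .c ending */
    len = snprintf(buf, size, "%.*s.msf", (int)base, src);
    return len >= 0 && (size_t)len < size;
}
```

     For file1.c these give nlfile1.c and file1.msf, the names listed in
     Example 1.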

     When run, extract displays three windows on the terminal. The first
     contains the source code with the string to be extracted and
     converted. This string is displayed in inverse video.

     The second window contains a list of the strings extracted so far.
     Each string is displayed with a message number.

     The third window contains a list of the available commands. The
     recognized commands are:

     EXTRACT
          Extract the matching string from the source file and put it into
          the catalog source file. The $REWRITE rule in the pattern file
          determines the text that replaces the extracted string in the nl
          file.

     DUPLICATE
          If a given string has already been extracted, extract indicates
          this fact by displaying it in inverse video.

          Entering the DUPLICATE command in such a case instructs extract
          to rewrite the source program using the same message number as
          for the previously extracted string. This saves space in the
          message catalog.

     IGNORE
          Ignore this and any subsequent occurrence of the string.

     PASS Skip (ignore) this occurrence of the string. Every subsequent
          occurrence of the string will still be offered to the user for a
          new selection.

     ADD  Same as IGNORE, except that the string is also added to the
          ignore file.

     COMMENT
          You can insert a comment, which is then written to the message
          source file (suffix .msf) that is later passed to gencat.

     QUIT Exit the program after asking the user to confirm. The output
          file contains the results of string extraction up to the point of
          quitting. It is not possible to resume extract on a source file
          that has already been partially processed with extract.

     HELP Display a summary of available commands. The help texts are
          located in /usr/lib/nls/extract/help.
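     Message numbering in a single catalog (the behavior without -n) can be
     sketched as follows: EXTRACT assigns the next number starting at 1,
     while DUPLICATE reuses the number of an identical, earlier extracted
     string. The structure, its fixed size, and the names are hypothetical.

```c
#include <string.h>

#define MAX_MESSAGES 256                /* simplification of this sketch */

struct catalog {
    const char *text[MAX_MESSAGES];     /* text[i] is message number i+1 */
    int count;
};

/* Returns the message number for text, or 0 if the table is full. */
int assign_number(struct catalog *cat, const char *text, int duplicate)
{
    int i;

    if (duplicate) {
        for (i = 0; i < cat->count; i++)
            if (strcmp(cat->text[i], text) == 0)
                return i + 1;           /* reuse the earlier number */
    }
    if (cat->count >= MAX_MESSAGES)
        return 0;
    cat->text[cat->count++] = text;     /* EXTRACT: next sequential number */
    return cat->count;
}
```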

     The current command is displayed in inverse video and is executed when
     you press the return key <RETURN>. To select a different command,
     enter the first character of its name, for example i or I for IGNORE.
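     The key handling just described fits in one small function. This is a
     sketch only: the enum and function names are hypothetical, and it
     returns the selected command without modelling whether selecting a
     command also executes it.

```c
#include <ctype.h>

/* Order matches the initials string below. */
enum command { CMD_EXTRACT, CMD_DUPLICATE, CMD_IGNORE, CMD_PASS,
               CMD_ADD, CMD_COMMENT, CMD_QUIT, CMD_HELP };

/* RETURN keeps the current command; otherwise the first letter of a
   command name selects it, case-insensitively. -1 means no command. */
int select_command(int key, enum command current)
{
    static const char initials[] = "EDIPACQH";
    int i;

    if (key == '\n' || key == '\r')
        return current;
    for (i = 0; initials[i] != '\0'; i++)
        if (toupper((unsigned char)key) == initials[i])
            return i;
    return -1;
}
```

     The first letters of the eight commands are all different, so a
     single character is enough to identify each one.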

   Restrictions

     -  The current syntax for the pattern file does not allow strings in
        multi-line comments to be ignored.

     -  Only one rewrite string can be specified for all classes of pattern
        matches.

     -  The program does not recursively descend through all header files.
        These must be processed individually.

     -  Multiline strings are not recognized.

LOCALE
     The LC_MESSAGES environment variable governs the language in which
     message texts are displayed. If LC_MESSAGES is undefined or is defined
     as the null string, it defaults to the value of LANG. If LANG is
     likewise undefined or null, the system acts as if it were not
     internationalized.

     Answers to yes/no queries must be given in the language appropriate to
     the current locale.

     The LC_ALL environment variable governs the entire locale. LC_ALL
     takes precedence over all the other environment variables which affect
     internationalization.
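     The precedence rules above can be expressed as a lookup over three
     environment variables. In a real program, setlocale(LC_ALL, "")
     applies these rules inside the C library; the function message_locale
     below is a hypothetical sketch, and using "C" to stand for "not
     internationalized" is an assumption.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>

/* LC_ALL, then LC_MESSAGES, then LANG; unset or empty values are skipped. */
const char *message_locale(void)
{
    static const char *const vars[] = { "LC_ALL", "LC_MESSAGES", "LANG" };
    size_t i;

    for (i = 0; i < sizeof vars / sizeof vars[0]; i++) {
        const char *v = getenv(vars[i]);
        if (v != NULL && v[0] != '\0')
            return v;
    }
    return "C";                         /* behave as not internationalized */
}
```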

EXAMPLES
     Example 1

     Execution of extract on the source program file1.c using newignore as
     the ignore file and cpatterns as the pattern file:

     $ extract -i newignore -p cpatterns file1.c

     The following files are created:

     -  nlfile1.c, the new version of the source file

     -  file1.msf, the message text file

     $ ls *file1*
     file1.c
     file1.msf
     nlfile1.c

     Example 2

     Use of gencat to generate the message catalog file1.cat for file1.c
     from the message text source file file1.msf:

     $ gencat file1.cat file1.msf
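     The input that gencat reads here is plain text: $ directives such as
     $quote and $set, followed by one line per message consisting of the
     message number and its text. The sketch below writes such a file
     using the $CATHEAD text from Example 4. The function write_msf is
     hypothetical, and that extract writes exactly this layout is an
     assumption.

```c
#include <stdio.h>

/* Write a catalog source file: header, then "number text" per message.
   texts[i] already carries its quotes and becomes message i+1 in set 1. */
int write_msf(const char *path, const char *const texts[], int count)
{
    FILE *f = fopen(path, "w");
    int i;

    if (f == NULL)
        return 0;
    fputs("$quote \"\n$set 1\n", f);    /* $CATHEAD text from Example 4 */
    for (i = 0; i < count; i++)
        fprintf(f, "%d %s\n", i + 1, texts[i]);
    return fclose(f) == 0;
}
```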

     Example 3

     Compilation of the new source file nlfile1.c generated by extract:

     $ cc nlfile1.c
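     With the pattern file from Example 4, the nl file defines catd, and
     each rewritten string becomes a catgets call. catgets needs a
     descriptor returned by catopen, so a sketch of the surrounding code
     might look like this. The names open_messages, greeting and
     close_messages, the catalog path, and the sample text are
     hypothetical. If the catalog cannot be opened, catopen returns
     (nl_catd)-1 and catgets returns its last argument, so the original
     text is still printed.

```c
#define _XOPEN_SOURCE 700
#include <nl_types.h>

/* As defined by the $SRCHEAD1 text in Example 4. */
nl_catd catd;

/* Hypothetical: open the catalog built by gencat (e.g. "./file1.cat"). */
void open_messages(const char *path)
{
    catd = catopen(path, 0);
}

/* A function after EXTRACT with REWRITE rule catgets(catd, 1, %n, %t),
   where the extracted string "Hello, world" became message 1 in set 1. */
const char *greeting(void)
{
    return catgets(catd, 1, 1, "Hello, world");
}

void close_messages(void)
{
    if (catd != (nl_catd)-1)
        catclose(catd);
}
```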

     Example 4

     Example of a pattern file for a C source program:

     # Comment lines begin with the hash character #, which can be
     # escaped with a backslash \#
     #
     # First we have a few sections to be included in each new source
     # file to be created
     #
     # Header file to be inserted at the beginning of the first
     # new source file
     $SRCHEAD1
     \#include <nl_types.h>
     nl_catd catd;           /* definition of the catalog descriptor */
     # Header file to be inserted at the beginning of the
     # second new source file
     $SRCHEAD2
     \#include <nl_types.h>
     extern nl_catd catd;    /* the catalog descriptor               */
     #
     # The next header is added to the beginning of the message catalog
     # to be produced. You could, for example, put the SCCS ID,
     # the definition of the current message set, and so on here.
     # However, you can override this by calling extract with
     # the -s option.
     #
     $CATHEAD
     \$quote "
     \$set 1
     #
     # Rule by which selected matching strings are to be
     # rewritten
     $REWRITE
     catgets(catd, 1, %n, %t)
     # Each % descriptor is rewritten as follows:
     # %%  - a literal %
     # %n  - message number (from the cat file)
     # %t  - actual text
     # %l  - length of the text string %t
     # %r  - raw text with quotes removed
     # %N  - insert a newline character
     # %T  - insert a tab character
     #
     # List of patterns to be matched by strings.
     # The regular expression syntax is based on ed.
     # Note: [^xyz]* matches any run of characters other than x, y and z.
     #
     $MATCH
     #
     "[^"]*"
     #
     # Some special C constructs are to be ignored.
     $REJECT
     # The null string
     ""
     # Output control characters
     "\\."
     # Ignore preprocessor directives
     \#.*
     # Ignore SCCS ID lines
     [sS][cC][cC][sS][iI][dD]\[\][       ]*=[    ]*".*"
     # Ignore some library functions that use file names
     fopen[  ]*([^,]*,[^)]*)
     creat[  ]*([^,]*,[^)]*)
     #
     # Issue a warning for initialized strings.
     #
     $ERROR Initialized strings cannot be replaced.
     # Print the message following ERROR when the strings below are
     # found:
     ^char[^=]*=[     ]*"[^"]*"
     char[    ]*[A-Za-z][A-Za-z0-9]*\[[^\]*\][  ]*=[   ]*"[^"]*"
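     The % descriptors in the $REWRITE section amount to a small template
     expansion. The C sketch below expands a rule such as
     catgets(catd, 1, %n, %t) for one extracted string. The function name
     is hypothetical; taking %r as the text without its surrounding quotes
     and %l as the length of %r, and copying unknown descriptors
     unchanged, are assumptions.

```c
#include <stdio.h>
#include <string.h>

/* Expand rule for message msgno; text includes its double quotes.
   Returns 1 on success, 0 if the result does not fit in out. */
int expand_rewrite(const char *rule, int msgno, const char *text,
                   char *out, size_t size)
{
    size_t len = strlen(text), rawlen = len, used = 0;
    const char *raw = text;
    char num[32];

    if (size == 0)
        return 0;
    if (len >= 2 && text[0] == '"' && text[len - 1] == '"') {
        raw = text + 1;                 /* %r: strip the quotes */
        rawlen = len - 2;
    }
    for (; *rule != '\0'; rule++) {
        const char *piece = rule;       /* default: copy one character */
        size_t plen = 1;

        if (rule[0] == '%' && rule[1] != '\0') {
            rule++;
            switch (*rule) {
            case '%': piece = "%"; break;
            case 'N': piece = "\n"; break;
            case 'T': piece = "\t"; break;
            case 't': piece = text; plen = len; break;
            case 'r': piece = raw; plen = rawlen; break;
            case 'n':
                plen = (size_t)snprintf(num, sizeof num, "%d", msgno);
                piece = num;
                break;
            case 'l':
                plen = (size_t)snprintf(num, sizeof num, "%lu",
                                        (unsigned long)rawlen);
                piece = num;
                break;
            default:                    /* unknown: keep "%x" as written */
                piece = rule - 1;
                plen = 2;
                break;
            }
        }
        if (used + plen >= size)
            return 0;                   /* keep room for the final '\0' */
        memcpy(out + used, piece, plen);
        used += plen;
    }
    out[used] = '\0';
    return 1;
}
```

     With message number 3 and the string "Hello", the rule from Example 4
     expands to catgets(catd, 1, 3, "Hello").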

SEE ALSO
     ed(1), gencat(1).

     Programmer's Guide: Internationalization - Localization.

Typewritten Software • bear@typewritten.org • Edmonds, WA 98026