extract(1) extract(1)
NAME
extract, strextract - interactive string extract and replace
SYNOPSIS
extract [option ...] file ...
DESCRIPTION
extract is used to interactively extract strings from source files and
replace them with calls to a function (such as catgets) that retrieves
the corresponding strings from a message catalog at program runtime.
This process is controlled by two files:
- a pattern file
- an ignore file
The pattern file contains
1. text patterns with which strings from the specified source file are
to be compared, and
2. rules by which strings that match the patterns are extracted and
rewritten.
The ignore file contains strings that are not to be rewritten in the
source.
OPTIONS
-i ignorefile
Uses the file named ignorefile as the ignore file.
-i ignorefile not specified:
The default ignore file is called ignore. This file is looked for
in the current directory, then in the home directory, and finally
in /usr/lib/nls/extract.
-m prefix
For prefix you can specify characters or strings to be placed
before all message numbers in the nl file and in the .msf file.
-m prefix not specified:
No prefix is added to the message numbers.
-n Creates a separate message catalog source file for each file
given in the file list. Message numbers in each message file
start with 1.
Page 1 Reliant UNIX 5.44 Printed 11/98
-n not specified:
Only one message catalog source file is created, with the messages
being numbered sequentially.
-p patternfile
Uses the file named patternfile as the pattern file.
-p patternfile not specified:
The default pattern file is called pattern. This file is looked
for in the current directory, then in the home directory, and
finally in /usr/lib/nls/extract.
-s string
For string you can specify characters to be output at the start
of the .msf file.
-s string not specified:
The default output string is $quote ", or, if defined, the contents
of the $CATHEAD section of the pattern file.
-u Uses a message file produced by a previous run of strextract. For
each given file, the current directory is scanned for a corresponding
file with the suffix .msg. This file contains detailed information on
all strings that match patterns in the pattern file, including, among
other things, their respective positions and line numbers in the file.
extract calls strextract and interprets its output.
file Name of the source file to be processed by extract. More than one
file may be specified.
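When -i or -p is not given, the default ignore or pattern file is searched for in three places in a fixed order. The following C sketch illustrates that search; the function names candidate_paths and find_default_file are hypothetical, and taking the home directory from the HOME environment variable is an assumption, not documented behavior of extract.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define MAX_PATH_LEN 1024
#define N_CANDIDATES 3

/* Fill out[] with the places searched for a default file such as
   "ignore" or "pattern": the current directory, the home directory
   (taken here from $HOME, an assumption) and /usr/lib/nls/extract.
   Returns the number of candidates written. */
int candidate_paths(const char *name, const char *home,
                    char out[N_CANDIDATES][MAX_PATH_LEN])
{
    int n = 0;
    snprintf(out[n++], MAX_PATH_LEN, "./%s", name);
    if (home != NULL && home[0] != '\0')
        snprintf(out[n++], MAX_PATH_LEN, "%s/%s", home, name);
    snprintf(out[n++], MAX_PATH_LEN, "/usr/lib/nls/extract/%s", name);
    return n;
}

/* Copy the first readable candidate into buf and return buf, or
   return NULL if the file is found in none of the places (or if
   buf is too small to hold the path). */
const char *find_default_file(const char *name, char *buf, size_t len)
{
    char cand[N_CANDIDATES][MAX_PATH_LEN];
    int n = candidate_paths(name, getenv("HOME"), cand);

    for (int i = 0; i < n; i++) {
        if (access(cand[i], R_OK) == 0) {
            size_t need = strlen(cand[i]) + 1;
            if (need > len)
                return NULL;
            memcpy(buf, cand[i], need);
            return buf;
        }
    }
    return NULL;
}
```

The search stops at the first readable file, so a private ignore file in the current directory takes precedence over the system-wide one.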
The pattern file
The pattern file contains text patterns to be matched with strings in
the specified source file and rules by which matching strings are to
be extracted and rewritten. You can also define patterns for strings
that are not to be replaced; this prevents strings from being rewritten
where the result would be invalid code.
The syntax of text patterns is identical to the regular expression
syntax used in ed. Example 4 in the EXAMPLES section shows a pattern
file for use with C source programs.
The pattern file is split into several sections, each section uniquely
identified by $KEYWORD at the start of a line.
$KEYWORD is one of the following:
$SRCHEAD1
Header file to be inserted at the beginning of the first new
source file.
$SRCHEAD2
Header file to be inserted at the beginning of the second new
source file.
$CATHEAD
Header file to be inserted at the beginning of the message catalog
to be produced. Here you might want to include the SCCS ID, the
definition of the current message set, and so on. You can override
this by using the -s option.
$REWRITE
Rule for rewriting matching strings.
$MATCH
List of patterns for matching strings. The regular expression
syntax is based on ed. Note: [^xyz]* matches any sequence of zero
or more characters other than x, y and z.
$REJECT
Patterns for special C constructs (for example preprocessor
directives, or file names passed to fopen) whose strings are to be
disregarded.
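$ERROR warning
Patterns for constructs that cannot be rewritten, such as
initialized strings. When one of these patterns matches, the text
warning that follows $ERROR is printed.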
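The following C sketch illustrates how $MATCH and $REJECT patterns might be combined when deciding whether a source line offers a string for extraction. It is not extract's implementation: it assumes the patterns are tested against the whole source line, uses POSIX regcomp()/regexec() basic regular expressions as a stand-in for ed syntax, and the names classify_line and any_match are hypothetical. The patterns are taken from Example 4.

```c
#include <regex.h>
#include <stddef.h>

/* Patterns from Example 4; the pattern-file escape \# is already
   replaced by #.  Compiled as POSIX basic regular expressions. */
static const char *const match_patterns[] = {
    "\"[^\"]*\"",                 /* any double-quoted string */
};
static const char *const reject_patterns[] = {
    "\"\"",                       /* the null string */
    "\"\\\\.\"",                  /* control characters such as "\n" */
    "#.*",                        /* preprocessor directives */
    "fopen[ ]*([^,]*,[^)]*)",     /* file names passed to fopen */
};

enum verdict { LINE_PLAIN, LINE_REJECTED, LINE_CANDIDATE };

/* Nonzero if any of the n patterns matches somewhere in text.
   Patterns that fail to compile are skipped. */
static int any_match(const char *const *pats, size_t n, const char *text)
{
    for (size_t i = 0; i < n; i++) {
        regex_t re;
        if (regcomp(&re, pats[i], REG_NOSUB) != 0)
            continue;
        int hit = (regexec(&re, text, 0, NULL, 0) == 0);
        regfree(&re);
        if (hit)
            return 1;
    }
    return 0;
}

/* $REJECT wins over $MATCH: a rejected line is never offered. */
enum verdict classify_line(const char *line)
{
    size_t nrej = sizeof reject_patterns / sizeof reject_patterns[0];
    size_t nmat = sizeof match_patterns / sizeof match_patterns[0];

    if (any_match(reject_patterns, nrej, line))
        return LINE_REJECTED;
    if (any_match(match_patterns, nmat, line))
        return LINE_CANDIDATE;
    return LINE_PLAIN;
}
```

Checking $REJECT first is a design choice of this sketch: it keeps strings such as "" or "\n" from ever being offered, even though they also match the $MATCH pattern.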
The ignore file
The ignore file contains literal strings (rather than patterns in the
form of regular expressions) which are to be left unchanged in the new
source file (the nl file). This file, if present, is read at startup
and causes specific strings in the input file to be ignored. During a
run, further strings to be ignored can be added to this file by using
the ADD command (see below). The ignore file can be created and edited
using an ordinary editor, with each string to be ignored in the input
file being entered on a separate line.
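Because the ignore file holds literal strings, one per line, checking a string against it needs only exact comparison. A minimal C sketch, assuming each line holds a string exactly as it is compared and using fixed size limits for brevity; struct ignore_list, load_ignore, is_ignored and add_ignored are hypothetical names, the last one modelling the ADD command.

```c
#include <stdio.h>
#include <string.h>

#define MAX_IGNORE 256
#define MAX_LINE   512

/* In-memory copy of the ignore file: one literal string per line. */
struct ignore_list {
    size_t count;
    char items[MAX_IGNORE][MAX_LINE];
};

/* Read the ignore file, one string per line, dropping the newline and
   skipping empty lines.  Returns the number of strings loaded. */
size_t load_ignore(FILE *fp, struct ignore_list *ig)
{
    char line[MAX_LINE];

    ig->count = 0;
    while (ig->count < MAX_IGNORE && fgets(line, sizeof line, fp) != NULL) {
        line[strcspn(line, "\n")] = '\0';
        if (line[0] != '\0')
            memcpy(ig->items[ig->count++], line, strlen(line) + 1);
    }
    return ig->count;
}

/* Literal, whole-string comparison; no regular expressions. */
int is_ignored(const struct ignore_list *ig, const char *s)
{
    for (size_t i = 0; i < ig->count; i++)
        if (strcmp(ig->items[i], s) == 0)
            return 1;
    return 0;
}

/* ADD: remember s for this run and append it to the ignore file. */
int add_ignored(struct ignore_list *ig, FILE *fp, const char *s)
{
    size_t len = strlen(s);

    if (ig->count >= MAX_IGNORE || len >= MAX_LINE)
        return -1;
    memcpy(ig->items[ig->count++], s, len + 1);
    if (fseek(fp, 0, SEEK_END) != 0 || fprintf(fp, "%s\n", s) < 0)
        return -1;
    return 0;
}
```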
Mode of operation
When extract is called, a new version of the source file is produced
with the characters nl prepended to the original file name. A message
catalog source file is also produced; its name is derived from the
first file given by dropping the .c ending and adding the suffix .msf.
This file can be input directly to gencat, or its contents can be
translated into another native language before being passed to gencat.
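The naming rule above can be sketched in C as follows. output_names is a hypothetical helper; it handles plain file names only (no directory part), and keeping the full name before .msf when the file does not end in .c is an assumption.

```c
#include <stdio.h>
#include <string.h>

/* Derive the new source name ("nl" + original) and the catalog source
   name (".c" dropped, ".msf" added).  A name without ".c" keeps its
   full name before ".msf" (an assumption).  Directory parts are not
   handled.  Returns 0, or -1 if a buffer is too small. */
int output_names(const char *src, char *nl, size_t nl_len,
                 char *msf, size_t msf_len)
{
    size_t len = strlen(src);
    size_t stem = len;

    if (len >= 2 && strcmp(src + len - 2, ".c") == 0)
        stem = len - 2;                       /* drop the .c ending */

    int a = snprintf(nl, nl_len, "nl%s", src);
    int b = snprintf(msf, msf_len, "%.*s.msf", (int)stem, src);

    if (a < 0 || (size_t)a >= nl_len || b < 0 || (size_t)b >= msf_len)
        return -1;
    return 0;
}
```

For file1.c this yields nlfile1.c and file1.msf, the names listed in Example 1.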
When run, extract displays three windows on the terminal. The first
contains the source code with the string to be extracted and converted.
This string is displayed in inverse video.
The second window contains a list of the strings extracted so far.
Each string is displayed with a message number.
The third window contains a list of the available commands. The
recognized commands are:
EXTRACT
Extract the matching string from the source file and put it into
the catalog source file. The $REWRITE rule in the pattern file
determines what text replaces the extracted string in the nl file.
DUPLICATE
If a given string has already been extracted, extract indicates
this fact by displaying it in inverse video.
Entering the DUPLICATE command in such a case instructs extract
to rewrite the source program using the same message number as
for the previously extracted string. This is a way of saving
storage space in message catalogs.
IGNORE
Ignore this and any subsequent occurrence of the string.
PASS Skip (ignore) this occurrence of the string. Every subsequent
occurrence of the string will still be offered to the user for a
new selection.
ADD Same as IGNORE, except that the string is also added to the
ignore file.
COMMENT
You can insert a comment, which is then entered in the message
source file (suffix .msf), the input file for gencat.
QUIT Exit the program after asking the user to confirm. The output
files contain the results of string extraction up to the point of
quitting. extract cannot currently be restarted on a source file
that it has already partially processed.
HELP Display a summary of available commands. The help texts are
located in /usr/lib/nls/extract/help.
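The bookkeeping behind EXTRACT and DUPLICATE can be pictured as a table of extracted strings numbered sequentially, where a duplicate is simply a string already in the table. The C sketch below is an illustration under that assumption; struct catalog and all function names are hypothetical, and the "number text" line layout written by write_msf is an assumption about what the .msf file looks like (with $quote " in effect, gencat accepts quoted message texts).

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_MSGS 1024

/* Strings extracted so far; message number = index + 1. */
struct catalog {
    size_t count;
    char *text[MAX_MSGS];
};

/* DUPLICATE support: number of an earlier extraction of s, or 0. */
int find_duplicate(const struct catalog *cat, const char *s)
{
    for (size_t i = 0; i < cat->count; i++)
        if (strcmp(cat->text[i], s) == 0)
            return (int)(i + 1);
    return 0;
}

/* EXTRACT: store a copy of s under the next number; -1 on failure. */
int extract_string(struct catalog *cat, const char *s)
{
    size_t len = strlen(s);
    char *copy;

    if (cat->count >= MAX_MSGS || (copy = malloc(len + 1)) == NULL)
        return -1;
    memcpy(copy, s, len + 1);
    cat->text[cat->count++] = copy;
    return (int)cat->count;
}

/* Write the header (e.g. the $CATHEAD text) and one "number text"
   line per message; the exact layout is an assumption. */
void write_msf(const struct catalog *cat, const char *head, FILE *out)
{
    fputs(head, out);
    for (size_t i = 0; i < cat->count; i++)
        fprintf(out, "%zu %s\n", i + 1, cat->text[i]);
}

void free_catalog(struct catalog *cat)
{
    for (size_t i = 0; i < cat->count; i++)
        free(cat->text[i]);
    cat->count = 0;
}
```

Reusing the number returned by find_duplicate is what lets DUPLICATE save catalog space: the rewritten source refers to one message twice instead of storing the same text twice.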
The current command is displayed in inverse video and is executed if
you simply press the return key <RETURN>. To select a new command you
enter the first character of its name: e.g. i or I for IGNORE.
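Since the eight command names all begin with different letters, one keystroke is enough to select a command. A minimal C sketch of the key handling described above; enum command and handle_key are hypothetical names, and how extract actually reads the terminal is not shown.

```c
#include <ctype.h>

enum command {
    CMD_EXTRACT, CMD_DUPLICATE, CMD_IGNORE, CMD_PASS,
    CMD_ADD, CMD_COMMENT, CMD_QUIT, CMD_HELP, CMD_COUNT
};

static const char *const command_names[CMD_COUNT] = {
    "EXTRACT", "DUPLICATE", "IGNORE", "PASS",
    "ADD", "COMMENT", "QUIT", "HELP"
};

/* Handle one keystroke.  A letter (upper or lower case) makes the
   command whose name starts with it current; unknown keys leave the
   current command unchanged.  Returns 1 if the key was RETURN, meaning
   the caller should now execute *current, else 0. */
int handle_key(int key, enum command *current)
{
    if (key == '\n' || key == '\r')
        return 1;
    for (int c = 0; c < CMD_COUNT; c++) {
        if (toupper((unsigned char)key) == command_names[c][0]) {
            *current = (enum command)c;
            break;
        }
    }
    return 0;
}
```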
Restrictions
- The current syntax for the pattern file does not allow strings in
multi-line comments to be ignored.
- Only one rewrite string can be specified for all classes of pattern
matches.
- The program does not recursively descend through all header files.
These must be processed individually.
- Multiline strings are not recognized.
LOCALE
The LC_MESSAGES environment variable governs the language in which
message texts are displayed. If LC_MESSAGES is undefined or is defined
as the null string, it defaults to the value of LANG. If LANG is
likewise undefined or null, the system acts as if it were not
internationalized.
Answers to yes/no queries must be given in the language appropriate to
the current locale.
The LC_ALL environment variable governs the entire locale. LC_ALL
takes precedence over all the other environment variables which affect
internationalization.
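The precedence rules above reduce to a short lookup. The following C sketch resolves which value governs message texts; messages_locale and messages_locale_from_env are hypothetical names, and treating a null LC_ALL as unset follows POSIX convention rather than anything stated on this page.

```c
#include <stdlib.h>

/* An undefined variable and one defined as the null string are
   treated alike. */
static int is_set(const char *v) { return v != NULL && v[0] != '\0'; }

/* Resolve the value that governs message texts: LC_ALL, then
   LC_MESSAGES, then LANG.  NULL means none is set, i.e. the program
   behaves as if it were not internationalized. */
const char *messages_locale(const char *lc_all, const char *lc_messages,
                            const char *lang)
{
    if (is_set(lc_all))      return lc_all;
    if (is_set(lc_messages)) return lc_messages;
    if (is_set(lang))        return lang;
    return NULL;
}

/* Same lookup applied to the current environment. */
const char *messages_locale_from_env(void)
{
    return messages_locale(getenv("LC_ALL"), getenv("LC_MESSAGES"),
                           getenv("LANG"));
}
```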
EXAMPLES
Example 1
Execution of extract on the source program file1.c using newignore as
the ignore file and cpatterns as the pattern file:
$ extract -i newignore -p cpatterns file1.c
The following files are created:
a new version of the source file: nlfile1.c
the message text file: file1.msf
$ ls *file1*
file1.c
file1.msf
nlfile1.c
Example 2
Use of gencat to generate the message catalog file1.cat for file1.c
from the message text source file file1.msf:
$ gencat file1.cat file1.msf
Example 3
Compilation of the new source file nlfile1.c generated by extract:
$ cc nlfile1.c
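The pattern file in Example 4 only defines catd in $SRCHEAD1; nothing in it opens the catalog, so presumably the program itself must call catopen before the first rewritten catgets call. The following C sketch shows what that might look like in nlfile1.c. open_messages, usage_text and close_messages are hypothetical names, message 1 with the text "usage: file1 name" is an invented example, and initializing catd to (nl_catd)-1 is an addition not present in Example 4. If catopen fails, catgets returns its default-string argument, so the program still runs with the original texts.

```c
#include <nl_types.h>
#include <stdio.h>

/* Definition contributed by $SRCHEAD1 in Example 4 (initializer added
   here so that an unopened catalog is recognizable). */
nl_catd catd = (nl_catd)-1;

/* Hypothetical start-up hook: open the catalog built by gencat.
   catopen() returns (nl_catd)-1 on failure. */
void open_messages(const char *catname)
{
    catd = catopen(catname, NL_CAT_LOCALE);
}

/* A call as it might appear in nlfile1.c after EXTRACT with the
   $REWRITE rule catgets(catd, 1, %n, %t), here for message 1. */
const char *usage_text(void)
{
    return catgets(catd, 1, 1, "usage: file1 name\n");
}

void close_messages(void)
{
    if (catd != (nl_catd)-1)
        catclose(catd);
    catd = (nl_catd)-1;
}
```

A name passed to catopen that contains a slash is used as a path name; otherwise the catalog is looked up through NLSPATH.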
Example 4
Example of a pattern file for a C source program:
# Comment lines begin with the hash character #, which can be
# escaped with a backslash \#
#
# First we have a few sections to be included in each new source
# file to be created
#
# Header file to be inserted at the beginning of the first
# new source file
$SRCHEAD1
\#include <nl_types.h>
nl_catd catd; /* definition of the catalog descriptor */
# Header file to be inserted at the beginning of the
# second new source file
$SRCHEAD2
\#include <nl_types.h>
extern nl_catd catd; /* the catalog descriptor */
#
# The next header is added to the beginning of the message catalog
# to be produced. You could, for example, put the SCCS ID,
# the definition of the current message set, and so on here.
# However, you can override this by calling extract with
# the -s option.
#
$CATHEAD
\$quote "
\$set 1
#
# Rule by which selected matching strings are to be
# rewritten
$REWRITE
catgets(catd, 1, %n, %t)
# The % descriptors are rewritten as follows:
# % - a %
# n - message number (from the cat file)
# t - actual text
# l - length of the text string %t
# r - raw text with quotes removed
# N - insert a newline character
# T - insert a tab character
#
# List of patterns to be matched by strings.
# The regular expression syntax is based on ed.
# Note: [^xyz]* matches any sequence of characters other than x, y and z.
#
$MATCH
#
"[^"]*"
#
# Some special C constructs are to be ignored.
$REJECT
# The null string
""
# Output control characters
"\\."
# Ignore preprocessor directives
\#.*
# Ignore SCCS ID lines
[sS][cC][cC][sS][iI][dD]\[\][ ]*=[ ]*".*"
# Ignore some library functions that use file names
fopen[ ]*([^,]*,[^)]*)
creat[ ]*([^,]*,[^)]*)
#
# Issue a warning for initialized strings.
#
$ERROR Initialized strings cannot be replaced.
# Print the message following ERROR when the strings below are
# found:
^char[^=]*=[ ]*"[^"]*"
char[ ]*[A-Za-z][A-Za-z0-9]*\[[^\]*\][ ]*=[ ]*"[^"]*"
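The % descriptors of the $REWRITE rule in Example 4 amount to a small template expansion. The following C sketch illustrates it under stated assumptions: expand_rewrite is a hypothetical name, %l is taken literally as the length of %t as it appears in the source (quotes included), %r strips one pair of surrounding double quotes, and unknown descriptors are copied unchanged.

```c
#include <stdio.h>
#include <string.h>

/* Expand a $REWRITE rule for one extracted string.  Descriptors:
     %% literal %   %n message number   %t text as found in the source
     %r text without its surrounding quotes   %l length of %t
     %N newline     %T tab
   Unknown descriptors are copied unchanged (an assumption).
   Returns 0, or -1 if out (outlen bytes) is too small. */
int expand_rewrite(const char *rule, int msgno, const char *text,
                   char *out, size_t outlen)
{
    size_t tlen = strlen(text);
    const char *raw = text;
    size_t rawlen = tlen;
    size_t pos = 0;

    if (outlen == 0)
        return -1;
    if (tlen >= 2 && text[0] == '"' && text[tlen - 1] == '"') {
        raw = text + 1;                      /* strip one pair of quotes */
        rawlen = tlen - 2;
    }

    for (const char *p = rule; *p != '\0'; p++) {
        char buf[32];
        const char *piece = buf;
        size_t plen = 1;

        if (*p != '%' || p[1] == '\0') {
            buf[0] = *p;                     /* ordinary character */
        } else {
            switch (*++p) {
            case '%': buf[0] = '%';  break;
            case 'N': buf[0] = '\n'; break;
            case 'T': buf[0] = '\t'; break;
            case 'n': plen = (size_t)snprintf(buf, sizeof buf, "%d", msgno); break;
            case 'l': plen = (size_t)snprintf(buf, sizeof buf, "%zu", tlen); break;
            case 't': piece = text; plen = tlen; break;
            case 'r': piece = raw;  plen = rawlen; break;
            default:  buf[0] = '%'; buf[1] = *p; plen = 2; break;
            }
        }
        if (pos + plen >= outlen)            /* keep room for the NUL */
            return -1;
        memcpy(out + pos, piece, plen);
        pos += plen;
    }
    out[pos] = '\0';
    return 0;
}
```

With the rule catgets(catd, 1, %n, %t), message number 7 and the string "hello", this produces catgets(catd, 1, 7, "hello").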
SEE ALSO
ed(1), gencat(1).
Programmer's Guide: Internationalization - Localization.