catexstr(1) DG/UX R4.11MU05 catexstr(1)
NAME
catexstr - extract strings from source files and replace them with
catgets calls
SYNOPSIS
catexstr [-llang] [-ffmt] [-ccat] [-bbeg] [-eend] file ... > strings
catexstr -r [-llang] [-ffmt] [-ccat] [-bbeg] [-eend] file < strings > file.new
DESCRIPTION
The catexstr utility is used to extract strings from source files and
replace them with calls to the X/Open-style message retrieval
function or command (see catgets(1,3C)), and generate a message
catalog (.msg file) that contains the messages. The .msg file can
then be translated into other natural languages. The source files
may contain C language source, or source code in other languages,
such as shell scripts.
Catexstr has the following options:
-r Runs pass two of catexstr (replace mode), generating a new
version of the source file on the standard output, and
simultaneously generating a message catalog (.msg file).
-llang Specifies the source code language of the file(s) being
manipulated. The choices that are recognized are c, sh
(shell script), and gen (generic). The -l option establishes
values to be used as the format of the string and the name of
the catalog to be inserted into the new source file, and the
strings that will be recognized as the beginning and end of
comments. These may be overridden with the other options
listed here.
-ffmt Specifies the format string to be used when creating the
modified version of the source code file. The default
formats for various languages are shown below.
-ccat Specifies the catalog name used when creating the modified
version of the source code file. This name is inserted into
the source code file; it is not used as the name of the .msg
file to be created.
-bbeg Specifies the string to be treated as the beginning of a
comment.
-eend Specifies the string to be treated as the end of a comment.
This may be one or two bytes long. Nesting of comments is
not recognized.
If none of -l, -f, -c, -b, or -e are specified, then -lc is assumed
(for compatibility with earlier versions of catexstr). If a source
code language is specified with -l, then the default values
associated with that language (shown below) are assumed. These
defaults may be overridden with the other options described above.
If -l is not used, but one of -f, -c, -b, or -e is, then -lgen is
assumed. The default values for each of the supported languages are:
Lang   Format string                 Catalog   Comment   Comment
                                     name      begin     end
---------------------------------------------------------------
c      catgets(%s, %d, %d, "%s")     catd      /*        */
sh     `catgets %s %d %d "%s"`       *         #         \\n
gen    catgets %s %d %d "%s"         *         none      \\n
The parameters passed to sprintf in conjunction with the format
strings are, respectively:
the catalog name, as specified here or with the -c option;
the message set number;
the message number; and
the message text.
* For languages sh and gen, the default catalog name is the name of
the source file, with any existing extension stripped off and .cat
appended.
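As an illustration of how the default sh format string and the default
catalog name combine, the following sketch builds the replacement text
with the shell's own printf. The function name build_call and the file
name myscript.sh are hypothetical, and the sketch assumes a plain file
name with no directory part; catexstr performs the equivalent step
internally.

```shell
# Hypothetical helper (not part of catexstr): build the text that would
# replace a string in an sh source file, using the default sh format
# string and the default catalog name from the table above.
build_call() {
    src=$1
    setnum=$2
    msgnum=$3
    text=$4
    # Default catalog name: strip any existing extension, append .cat.
    cat_name="${src%.*}.cat"
    # Default sh format: `catgets %s %d %d "%s"`
    printf '`catgets %s %d %d "%s"`\n' "$cat_name" "$setnum" "$msgnum" "$text"
}

build_call myscript.sh 1 16 "Hello, world."
# prints: `catgets myscript.cat 1 16 "Hello, world."`
```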
In pass one (without the -r option), catexstr extracts a list of
strings from the named source files, with positional information.
This list is produced on standard output in the following format:
file:line:position:length:setnum:msgnum:"string"
file      the name of the source file
line      the line number in the file
position  the character position in the line
length    the length of the original string
setnum    the message set number (initially empty)
msgnum    the message number (initially empty)
string    the extracted, modified text string, surrounded by
          double quotes
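If you want to inspect a message list from a script, one entry can be
split into its seven fields with the shell's read built-in. This is only
a sketch; the function name parse_entry is hypothetical. Because read
assigns the remainder of the line to the last variable, any colons inside
the quoted string are kept intact.

```shell
# Hypothetical helper: split one message list entry into its fields.
# Empty setnum/msgnum fields are reported as "unset".
parse_entry() {
    printf '%s\n' "$1" | {
        IFS=: read -r file line position length setnum msgnum string
        printf 'file=%s line=%s position=%s length=%s set=%s msg=%s string=%s\n' \
            "$file" "$line" "$position" "$length" \
            "${setnum:-unset}" "${msgnum:-unset}" "$string"
    }
}

parse_entry 'hw.c:3:8:20:::"This is an example\n"'
# prints: file=hw.c line=3 position=8 length=20 set=unset msg=unset string="This is an example\n"
```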
Normally you would redirect this output into a file (the "message
list file", shown as strings in the synopsis above), edit that file
as described below, and then run catexstr -r to generate a new
version of the source file and a message (.msg) file.
Any '%' characters in the source file that are not part of a "%%"
pair will be translated into "%nn$" sequences in the message list
file, where the "nn" numbers enumerate the uses of '%' in the
message. For example, the message
"File %s has %d blocks."
would become
"File %1$s has %2$d blocks."
This allows the human translator to modify the order of the '%'
tokens in the message to accommodate the syntax requirements of the
target natural language, while still accommodating the order of the
parameters to the printf call. If the message has only one
occurrence of '%', then this modification is not really necessary,
but it is done anyway.
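The renumbering described above can be approximated with a short awk
program. This is a sketch written for illustration, not the utility's
own code, and the function name number_percents is hypothetical. It
copies each "%%" pair unchanged and inserts "n$" after every other '%'.

```shell
# Hypothetical helper: number each '%' that is not part of a "%%" pair.
number_percents() {
    printf '%s\n' "$1" | awk '{
        s = $0; out = ""; n = 0
        while ((i = index(s, "%")) > 0) {
            out = out substr(s, 1, i - 1)
            if (substr(s, i + 1, 1) == "%") {
                out = out "%%"          # a "%%" pair is copied unchanged
                s = substr(s, i + 2)
            } else {
                n++                     # a lone "%" gets the next number
                out = out "%" n "$"
                s = substr(s, i + 1)
            }
        }
        print out s
    }'
}

number_percents 'File %s has %d blocks.'
# prints: File %1$s has %2$d blocks.
```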
Next, examine this list and determine which messages can be
translated and subsequently retrieved by catgets. Modify this
message list file by deleting lines that can't be translated. In
particular, text associated with '#include "filename"' lines must be
deleted, and '#define foo "bar"' lines must be scrutinized.
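Part of this review can be scripted. The sketch below drops every
message list entry whose source line begins with #include, by looking
up the line number from the entry in the named source file. The
function name drop_includes and the sample files are hypothetical, and
the test for #include is deliberately simple (it does not allow for
leading white space). The '#define' lines still need to be examined by
hand.

```shell
# Hypothetical filter: read a message list on standard input and write
# it to standard output without the entries that come from #include lines.
drop_includes() {
    while IFS=: read -r file line rest; do
        src_line=$(sed -n "${line}p" "$file")
        case $src_line in
            '#include'*) ;;                               # not translatable
            *) printf '%s:%s:%s\n' "$file" "$line" "$rest" ;;
        esac
    done
}

# Demonstration with a small made-up source file and message list.
dir=$(mktemp -d)
(
    cd "$dir" || exit 1
    printf '%s\n' '#include "defs.h"' 'main()' '{' \
        'printf("Hello world!\n");' '}' > hw.c
    printf '%s\n' 'hw.c:1:10:6:::"defs.h"' \
        'hw.c:4:8:14:::"Hello world!\n"' > hw.strings
    drop_includes < hw.strings
)
rm -rf "$dir"
# prints: hw.c:4:8:14:::"Hello world!\n"
```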
If you wish to specify the set number(s) and message number(s) to use
(see gencat(1)), you may do so by inserting these numbers into the
fifth (setnum) and sixth (msgnum) fields in the message list file.
If you do not specify the set number to use for a particular message,
set number one is used, unless some other set has been specified for
an earlier message, in which case that set number is used. If you do
not specify any message numbers, the messages are numbered
sequentially, starting with number one. If any message is explicitly
numbered, that number is used for that message, and automatic
numbering resumes from that number.
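The numbering rules can be sketched in awk as follows. The function
name assign_numbers is hypothetical. The sketch also assumes that a
single running message counter is kept even when the set number
changes; the text above does not say whether numbering restarts for
each set, so treat that detail as an assumption.

```shell
# Hypothetical helper: show the set and message number that each entry
# of a message list (read from standard input) would receive.
assign_numbers() {
    awk -F: '
        BEGIN { set = 1; next_msg = 1 }
        {
            if ($5 != "") set = $5          # an explicit set number sticks
            if ($6 != "") next_msg = $6     # an explicit message number wins
            printf "%s:%s set=%d msg=%d\n", $1, $2, set, next_msg
            next_msg++                      # automatic numbering resumes here
        }'
}

printf '%s\n' \
    'hw.c:3:8:20:::"This is an example\n"' \
    'hw.c:4:8:14:2:10:"Hello world!\n"' \
    'hw.c:5:8:35:::"This is the %1$s string (number %2$d)\n"' |
assign_numbers
# prints:
#   hw.c:3 set=1 msg=1
#   hw.c:4 set=2 msg=10
#   hw.c:5 set=2 msg=11
```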
You are free to modify the text of the message in the message list
file in any other way that you consider appropriate. For example,
you might use this occasion to clarify an ambiguous English sentence.
Make sure that the text is enclosed in double quotes ("). Do not
modify any of the first four fields on these lines, even if you
change the length of the message.
The message list file should not be translated into any other natural
language. The file to translate into other languages is the message
file (.msg file) that will be produced by the second pass of
catexstr.
Note, however, that you must not modify the source file between the
first and second passes of catexstr, because the message list file
refers to each string by its line number and character position.
After editing the message list file, use this modified message list
file as input to catexstr -r file. You should provide the same set
of options (except -r) to this second pass of catexstr that you gave
to the first pass. The second pass of catexstr will produce a new
version of the original source file, in which the messages have been
replaced by calls to the message retrieval function or command
catgets. At the same time, a message file that is of the correct
format to be used as input to gencat is generated, with the name
file.msg.
If you are manipulating C source code, then once the new version of
the .c file has been created, you must edit it to include a
declaration for the catalog descriptor variable (normally catd) as
type nl_catd. This variable is used in the calls to catgets (see
catgets(3C)). Usually, you would declare one catd variable and use
it throughout the program. Also, you must add a call to catopen.
Generally this is at the top of the main routine (see catopen(3C)).
You may also wish to add a call to catclose. The program must also
call setlocale (see setlocale(3C)) if it does not do so already.
This will probably entail inclusion of locale.h.
The catexstr program cannot correctly replace strings in all
instances. For example, a static character string initialization
cannot be replaced by a call to catgets, because a static initializer
must be a constant expression. A second example is an escape sequence
that should not be translated. In some cases the C code may require
modification so that strings can be extracted and replaced by calls
to the message retrieval function.
Shell Scripts
Shell scripts present a variety of challenges. Here are a few
pointers in dealing with them.
Before running the first pass of catexstr, examine the shell script
for back-quote (`) characters within double-quoted strings (strings
enclosed in double-quote marks (")). Such occurrences will not be
handled correctly by catexstr, and must be modified either before or
after running catexstr.
Also look for strings that should be translated, that are not
enclosed in double quotes. This includes strings enclosed in single
quotes (').
Similarly, look for strings that must be passed as a single argument
to a command, rather than being broken into separate arguments
(words) by the shell. Such cases can be handled by assigning the
value of the string to a temporary shell variable, and then using the
shell variable in the call to the command. For example,
log_error "This must be one argument, not seven."
becomes
msg="This must be one argument, not seven."
log_error "$msg"
which ends up looking something like:
msg=`catgets mycat.cat 1 15 \
"This must be one argument, not seven."`
log_error "$msg"
After running the first pass of catexstr, search the message list
file for any occurrence of a back-quote character. Any such
occurrence, as mentioned above, must be changed. This may be done by
either modifying the original source and re-running the first pass of
catexstr, or by modifying the new source file after running the
second pass of catexstr.
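The search for back-quote characters can be done with grep. The sketch
below uses a hypothetical function name (find_backquotes) and a made-up
message list; it prints each offending entry preceded by its line
number in the message list file.

```shell
# Hypothetical helper: list message list entries that contain a back-quote.
# Returns grep's exit status (0 if any entry was found).
find_backquotes() {
    grep -n '`' "$1"
}

# Demonstration with a made-up message list.
list=$(mktemp)
cat > "$list" <<'EOF'
myscript.sh:4:6:20:::"Usage: myscript file"
myscript.sh:9:6:24:::"Today is `date`, welcome"
EOF
if find_backquotes "$list"; then
    echo "these entries need attention before or after the second pass" >&2
fi
rm -f "$list"
```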
After running both passes of catexstr, edit the new source file and
examine each call to catgets, to make sure that it makes sense. One
particular optimization that can frequently be made is, for example,
to change
echo `catgets mycat.cat 1 16 "Hello, world."`
to
catgets mycat.cat 1 16 "Hello, world."
EXAMPLES
The following examples show uses of catexstr to convert a C program.
Assume that the file hw.c contains:
main()
{
printf("This is an example\n");
printf("Hello world!\n");
printf("This is the %s string (number %d)\n", "third", 3);
}
The command
catexstr hw.c > hw.strings
produces the following output in the file hw.strings:
hw.c:3:8:20:::"This is an example\n"
hw.c:4:8:14:::"Hello world!\n"
hw.c:5:8:35:::"This is the %1$s string (number %2$d)\n"
hw.c:5:47:5:::"third"
The file hw.strings can be edited as described above.
The catexstr utility can now be invoked with the -r option to replace
the strings in the source file by calls to the message retrieval
function catgets().
The command
catexstr -r hw.c <hw.strings >hw.new.c
produces the following output (the indentation has been modified to
fit on this manual page):
#include <nl_types.h>
main()
{
printf(catgets(catd, 1, 1, "This is an example\n"));
printf(catgets(catd, 1, 2, "Hello world!\n"));
printf(catgets(catd, 1, 3, "This is the %1$s string (number %2$d)\n"), \
catgets(catd, 1, 4, "third"), 3);
}
This new source file must be edited to include a declaration of catd
(as type nl_catd), a call to catopen, and possibly calls to setlocale
and catclose. You may also wish to break the long line:
#include <nl_types.h>
#include <locale.h>
static nl_catd catd;
main()
{
(void) setlocale (LC_ALL, "");
catd = catopen ("hw.cat", 0);
printf(catgets(catd, 1, 1, "This is an example\n"));
printf(catgets(catd, 1, 2, "Hello world!\n"));
printf(catgets(catd, 1, 3, "This is the %1$s string (number %2$d)\n"),
catgets(catd, 1, 4, "third"), 3);
catclose (catd);
}
The catexstr -r command above also produces a message file, hw.msg:
$quote "
$set 1
1 "This is an example\n"
2 "Hello world!\n"
3 "This is the %1$s string (number %2$d)\n"
4 "third"
This message file may be replicated and translated into other natural
languages.
The following command is used to compile the message catalog:
rm -f hw.cat; gencat hw.cat hw.msg
The resulting message catalog (hw.cat) must be installed in the
appropriate directory. Normally, this would be a subdirectory of
/usr/lib/nls/msg.
Multiple Source Files
Programs that consist of more than one source file should be handled
as follows. First, catexstr is called with all the source files as
arguments:
catexstr foo1.c foo2.c > foo.strings
Second, the message list file (foo.strings) is edited as described
above.
Third, catexstr -r is called once for each source file, to create new
source files and message (.msg) files:
catexstr -r foo1.c < foo.strings > foo1.new.c
catexstr -r foo2.c < foo.strings > foo2.new.c
Fourth, gencat is called to compile the message catalog:
rm -f foo.cat
gencat foo.cat foo1.msg foo2.msg
FILES
/usr/lib/nls/msg/locale/catalog.cat
files created by gencat(1)
ENVIRONMENT VARIABLES
NLSPATH specification of directory containing the locale-specific
message catalog directories.
LANG locale name.
DIAGNOSTICS
The error messages produced by catexstr are intended to be
self-explanatory. They indicate errors in the command line or format
errors encountered within the input file.
SEE ALSO
catgets(1), gencat(1), catopen(3C), catclose(3C), catgets(3C),
printf(3S), setlocale(3C), environ(5).
exstr(1) -- AT&T-style message facility.
Licensed material--property of copyright holder(s)