genal(1M) genal(1M)
NAME
genal - program for generating the German language exception list
SYNOPSIS
/usr/bin/genal
DESCRIPTION
The genal command generates a new exception list for German
hyphenation. A user with the appropriate permissions can edit the
exception list input file with any editor and then run genal to
generate the executable list. A number of points must be observed
when doing so; they are described below.
German and English hyphenation
When formatting texts with the Documenter's Workbench (DWB) it is pos-
sible to enable language-specific hyphenation depending on whether the
text is in German or in English. To set the language-specific hyphena-
tion you use the macro
.la [language]
This has the following meaning:
.la G German hyphenation
.la E English hyphenation.
The default language is German, i.e. .la G or simply .la without
parameters. The language setting can be made at any position in the
text file (generally at the beginning) or in an include file (.so).
In the case of "mixed language" documents with both German and English
text, hyphenation can be switched as needed.
You should note that the default is for no hyphenation (except in
words that contain the "-" or "\(em" [minus sign, dash; see
roff-charset(5)] characters, where the word is split even if hyphena-
tion is switched off). This means that hyphenation must always be
switched on explicitly! To do this the macro
.hy [N]
is entered directly in the source file or in the macro package being
used. In other words:
⊕ The .la macro does not switch on the .hy macro implicitly.
⊕ Hyphenation is only enabled in the text from the point where the
.hy macro is first called, i.e. if it is first called at the end of
the file, no hyphenation takes place.
Page 1 Reliant UNIX 5.44 Printed 11/98
The N parameter of the .hy macro has the following meaning:
1 Hyphenation with no restrictions
2 The last line on a page is not hyphenated
4 The last two letters of a word are not split off
8 The first two letters of a word are not split off
0 Hyphenation is switched off (= .nh)
The values 2, 4 and 8 can be combined freely, which means, for exam-
ple, that .hy 14 causes all three restrictions. .hy without parameters
is identical with .hy 1 (no restriction).
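The way the values of N combine can be illustrated with a small Python
sketch. It is not part of genal or n/troff; the function name
describe_hy and the wording of the descriptions are hypothetical and
chosen for illustration only. It treats 2, 4 and 8 as independent bits
that are added together, as described above.

```python
# Illustrative only: decodes the N argument of the .hy macro.
# The descriptions paraphrase the table in this manual page.
HY_RESTRICTIONS = {
    2: "no hyphenation at a page break",
    4: "last two letters of a word stay together",
    8: "first two letters of a word stay together",
}


def describe_hy(n: int) -> list[str]:
    """Return the restrictions selected by '.hy n'."""
    if n == 0:
        return ["hyphenation switched off (same as .nh)"]
    # Each restriction is a separate bit, so 2 + 4 + 8 = 14 selects all three.
    active = [text for bit, text in HY_RESTRICTIONS.items() if n & bit]
    return active or ["hyphenation with no restrictions"]


if __name__ == "__main__":
    for value in (0, 1, 6, 14):
        print(value, describe_hy(value))
```

For example, .hy 6 (2 + 4) combines the page restriction with the
restriction on the last two letters.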
Checking hyphenation
The hyphen(1M) command can be used to check that nroff(1M) has
hyphenated a text correctly. It lists all words in a formatted text
that are hyphenated at the end of a line. If you use hyphen as a
filter, hyphenation in a file textfile can be checked with the
following command:
tbl textfile | neqn | nroff -man -Tlp | col -x | hyphen
Manipulating hyphenation
If the hyphenation is incorrect, there are two ways to manipulate it:
1) directly in the text, by marking a mandatory hyphenation point
using the \% character string
2) by maintaining a (German) exception list (system administrators
only)
To mark a mandatory hyphenation point directly in the text, you need
to enter the following character string
\%
(if another hyphenation point has not already been suggested with the
.hc c macro).
Manually specifying hyphenation points in a word is particularly
useful in texts with long words. The word "non-internationalized",
for example, would be entered as follows in the source file:
non-in\%ter\%na\%tion\%al\%ized.
A mandatory hyphenation point can also be specified with the
.hw word
macro. The hyphenation points are marked here with dashes, for example
.hw dan-de-li-on.
The word argument may be up to 256 bytes long, and the maximum number
of hyphenation points per word is 20.
Warning:
If a possible hyphenation point is given in a word, n/troff will only
hyphenate at this point (and with the "-" sign). It is therefore safer
to specify all mandatory hyphenation points (as in the above example).
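Since all hyphenation points of a word should be specified together,
it can help to derive both notations from one hyphenated spelling. The
following Python sketch does this; the function names inline_marks and
hw_request are hypothetical and are not part of DWB. It only checks
the limit of 20 hyphenation points mentioned above.

```python
# Illustrative helpers: turn a spelling such as "dan-de-li-on" into the
# two notations described in this manual page.
MAX_POINTS = 20  # maximum number of hyphenation points per word


def _check(hyphenated: str) -> None:
    """Reject words with more hyphenation points than the text allows."""
    if hyphenated.count("-") > MAX_POINTS:
        raise ValueError("more than 20 hyphenation points")


def inline_marks(hyphenated: str) -> str:
    """Replace every dash by the \\% string for use directly in the text."""
    _check(hyphenated)
    return hyphenated.replace("-", "\\%")


def hw_request(hyphenated: str) -> str:
    """Build the corresponding .hw line, which uses dashes as they are."""
    _check(hyphenated)
    return ".hw " + hyphenated


if __name__ == "__main__":
    print(inline_marks("dan-de-li-on"))
    print(hw_request("dan-de-li-on"))
```

Note that inline_marks replaces every dash, so a word that contains a
real hyphen (such as "non-internationalized") still has to be marked
by hand.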
Hyphenation can be suppressed in two ways: globally or for an
individual word only. Global suppression is usually done with the
.nh or .hy 0
macro.
To suppress hyphenation for a single word, either
- a \% is set at the start of the word, or
- the special characters "\(hy" (= -), "\(mi" (= -) or "\-" (= -) are
inserted in a compound [see roff-charset(5)].
Here are two examples for suppressing hyphenation in a single word:
\%automount or \%PPP\(hyPackage.
A more elegant (but also more time-consuming) solution than hyphenat-
ing directly in the text is the possibility of specifying hyphenation
points using an exception list. This procedure is worthwhile in the
case of frequently used words which are systematically hyphenated
incorrectly. It is worth noting, however, that while such an exception
list only exists for German hyphenation, English words can of course
also be entered in the list.
EDITING THE EXCEPTION LIST
The exception list generally comprises three individual files:
ausnahme.txt, ausnahme.dat and ausnahme.ind. These files can be found
in the /usr/lib directory. Entries are only made by the user in the
ausnahme.txt input file, while the other two index files are generated
and are not readable. You need superuser rights to make entries in the
file. Editing the exception list is somewhat complicated and is
therefore described in detail below.
The words or parts of words in the exception list are marked twice:
a) by a code letter (type ID) at the start of the word, which identi-
fies the entry as a word stem, beginning of a word, word form,
foreign word etc.
b) by a digit between 1 and 4 within the entry, which specifies the
actual hyphenation point
Code letters in the exception list
The code letters and their meanings are listed below.
a Normal exception word (word form only), i. e. the word only ever
appears as an individual word, on its own.
Examples: wor-un-ter; wor-an (Entry: awor1un1ter; awor1an)
b Normal exception word (word form and beginning of word), i. e.
the word appears on its own as well as at the beginning of com-
pounds.
Examples: Zeug-nis, Zeug-nis-ses, Zeug-nis-ar-ten (Entry:
bzeug1nis)
c Normal exception word (beginning of word only), i.e. the word
only ever appears at the beginning of a compound; in texts, a
word marked in this way must always be followed by another word
part that contains at least two letters.
Examples: zi-vi-li-siert, zi-vi-li-sier-te (Entry: czi1vi1li)
e Proper noun (the "s" ending can also be appended to the word).
Examples: Ar-me-ni-en (Entry: ear1me1ni1en)
f Foreign word; the word can appear on its own, at the beginning of
a compound and with the "s" ending.
Examples: Bo-nus, bo-nus-ar-tig (Entry: fbo1nus)
g Foreign word (beginning of word only), i. e. the word is always
joined with another word part (at least two letters) to form a
compound.
Examples: Log-arith-mus, log-arith-misch (Entry: glog1arith)
k Customer entry; normal exception word (word form only), i.e. the
word only ever appears as an individual word, on its own (as with
code letter a).
v Verbs with "-en" stem.
Examples: gra-sen, gra-sen-de, Gras-an-satz (Entry: vgras)
w Verbs with "-eln" and "-ern" stem, including the two suffixes
"el" and "er".
Examples: bet-teln, Bet-tel, Bett-ler, bet-teln-de (Entry:
wbet1tel); zau-bern, Zau-be-rer, Zau-ber-kunst (Entry: wzau1ber)
z Word form from stem and ending, which is used in compounds.
Examples: Wis-sens, Wis-sens-art (Entry: zwis1sens)
All entries in the exception file must be written in lowercase.
Hyphenation point digits
The actual hyphenation points are indicated by the following digits:
1 Normal hyphenation point.
Example: ain1itia1ti1ve (-> In-itia-ti-ve)
2 Unaesthetic hyphenation point (should not be used!).
Unaesthetic hyphenation points include those where the first or
last two letters of a word are split off. This type of hyphenation
must be permitted explicitly using .hy 1 (= unrestricted
hyphenation). Tests have shown that words in the exception list
that were marked with 2 for an unaesthetic hyphenation point were
subsequently not recognized by the system. If, however, they were
marked with the digit 1, the hyphenation points were also
recognized as unaesthetic under unrestricted hyphenation. The
error can therefore be avoided easily, although its cause is
unknown.
3 Hyphenation point where a consonant is added.
Example: bschif3fahrt (-> Schiff-fahrt)
4 ck hyphenation.
Example: vfac4kel (-> Fak-kel)
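The effect of the four digits can be illustrated with a short Python
sketch that turns an entry back into its hyphenated form. This is a
reading aid only, not the algorithm used by genal or n/troff; the
function name decode_entry is hypothetical, and the handling of the
digits 3 and 4 is inferred from the two examples above. Umlaut
characters are left untouched.

```python
# Illustrative decoder for exception list entries (digits 1 to 4 only).
def decode_entry(entry: str) -> tuple[str, str]:
    """Split an entry into its code letter and its hyphenated spelling."""
    code, body = entry[0], entry[1:]
    out: list[str] = []
    for ch in body:
        if ch in "12":
            out.append("-")            # normal or unaesthetic point
        elif ch == "3":
            out.append(out[-1] + "-")  # repeat the preceding consonant
        elif ch == "4":
            out[-1] = "k"              # "ck" becomes "k-k"
            out.append("-")
        else:
            out.append(ch)
    return code, "".join(out)


if __name__ == "__main__":
    for entry in ("ain1itia1ti1ve", "bschif3fahrt", "vfac4kel"):
        print(entry, "->", decode_entry(entry))
```

The sketch assumes that a digit never appears directly after the code
letter, which holds for all entries shown in this manual page.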
Umlaut handling
Because both code letters and digits are used to mark hyphenation
points, entries in the exception list can look so strange that the
user must first become familiar with reading them. In addition, German
umlauts and ß must be entered in the exception file using special
non-alphabetical characters:
{ for ä Example: d{1cher (= Dä-cher)
| for ö Example: d|r1fer (= Dör-fer)
} for ü Example: f}n1fer (= Fün-fer)
~ for ß Example: au1~en (= au-ßen)
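The conventions described so far (code letter, digits, lowercase,
umlaut characters) can be combined in a small Python sketch that
builds an entry from a hyphenated spelling. The function name
encode_entry is hypothetical and not part of genal. The sketch only
writes normal hyphenation points (digit 1); entries that need the
digits 3 or 4 still have to be edited by hand.

```python
# Illustrative encoder: "Ge-flü-gel" with code letter b -> "bge1fl}1gel".
UMLAUT_CODES = {"ä": "{", "ö": "|", "ü": "}", "ß": "~"}


def encode_entry(code_letter: str, hyphenated: str) -> str:
    """Build an exception list entry with normal hyphenation points."""
    # Entries are lowercase; every dash becomes the digit 1.
    body = hyphenated.lower().replace("-", "1")
    # Replace umlauts and ß by their non-alphabetical characters.
    for umlaut, code in UMLAUT_CODES.items():
        body = body.replace(umlaut, code)
    return code_letter + body


if __name__ == "__main__":
    print(encode_entry("a", "wor-un-ter"))
    print(encode_entry("b", "Ge-flü-gel"))
```

A word that must not be hyphenated is simply passed without dashes, so
that the resulting entry contains no digit.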
At first glance, all of these conventions (code letter, digits,
umlauts, lowercase) make it difficult to maintain the exception file.
Moreover, there are also other rules that need to be observed, which
affect the sort sequence of the entries.
Sort rules
It is critical that the entries in the exception list be ordered
correctly; otherwise the hyphenation program cannot find an entry in
the list. The entries must be in alphabetical order. For sorting
purposes, the basic word is taken without the code letter and without
the hyphenation point digits. This is the basic rule.
Entries may comprise not only complete words but also word parts
(word stem, beginning of word, word form). Contrary to the usual
alphabetical convention, the long form of a word or word part must
always come before the short form in the exception list. For example,
there are four entries for the word part "gewinn" in the exception
file. These four entries must be ordered as follows:
bge1win1ner
zge1win3num1mern
age1win3num1mer
vge1winn
This example clarifies the first two sort rules:
⊕ Long form before short form
"gewinner" before "gewinn" and "gewinnummern" before "gewinnummer"
⊕ Alphabetical order
"gewinner" before "gewinnummern"
Other sort rules affect the ordering of entries with the alternative
representations (non-alphabetical characters) of umlauts. You should
note here that the non-alphabetical characters for umlauts must always
be placed after the alphabetical characters, and that the actual
umlauts are ordered alphabetically as normal without regard for their
alternative representation, so for example ä before ö before ü. Here
is another example from the exception file:
bge1fl}1gel (= "geflügel")
bge1f{ng1nis (= "gefängnis")
bge1f{~ (= "gefäß")
zge1f}hls (= "gefühls")
The other sort rules are clarified with this example:
⊕ Non-alphabetical characters (umlauts) after alphabetical
bge1f{ng1nis ["gefängnis"] after bge1fl}1gel ["geflügel"], i.e.
"f{" after "fl}"
⊕ Umlauts in alphabetical order
"gefängnis" alphabetically before "gefäß"; "gefäß" again alphabeti-
cally before "gefühls"
The next example from the exception file shows this again in more
detail:
bbu~ (= "buß")
bb{c4ker (= "bäcker")
zb{1ren (= "bären")
zb|r1sen (= "börsen")
cb|r1sia (= "börsia")
cb|s1ar (= "bösar")
zb}1cher (= "bücher")
bb}1fett (= "büfett")
bb}f1fe1lei (= "büffelei")
wb}f1fel (= "büffel")
wb}1gel (= "bügel")
zb}h1nen (= "bühnen")
bb}n1de1lei (= "bündelei")
bb}n1del (= "bündel")
vb}rg (= "bürg")
bb}1ro (= "büro")
zb}r1sten (= "bürsten")
vb}rst (= "bürst")
bb}t1tel (= "büttel")
zb}t1ten (= "bütten")
vb}~ (= "büß")
This example clarifies the previously mentioned sort rules:
⊕ Long form before short form
"büffelei" before "büffel" etc.
⊕ Alphabetical ordering of word parts
"börsen" before "börsia" etc.
⊕ Umlauts in alphabetical order
"bäcker" etc. before "börsen" etc. before "bücher" etc.
⊕ Non-alphabetical characters (umlauts) after alphabetical
bb{c4ker ("bäcker") after bbu~ ("buß") and not vice versa; vb}~
("büß") after zb}t1ten ("bütten") and not vice versa.
A final sort rule can be derived from the preceding examples, with
regard to non-alphabetical characters for umlauts. Since the non-
alphabetical characters must always come after the alphabetical char-
acters, entries of words or word parts that begin with an umlaut are
also always ordered after the alphabetical entries (i.e. after the
letter "z"). Here is a final example:
bzy1lin1der (= "zylinder")
c{gyp (= "ägyp")
c{ngst (= "ängst")
b|l1ofen (= "ölofen")
b}ber1ein (= "überein")
b}ber1haupt (= "überhaupt")
This example clarifies the following sort rules:
⊕ Umlauts in alphabetical order
"ägyp" etc. before "ölofen" etc. before "überein" etc.
⊕ Non-alphabetical characters at the start of the entry after the
letter "z"
c{gyp ("ägyp") etc. after bzy1lin1der ("zylinder")
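The sort rules above can be summarized in a short Python sketch that
checks whether a list of entries is in the required order. It is an
illustration derived from the examples in this manual page, not the
comparison actually used by genal; the names sort_key and
is_correctly_ordered are hypothetical. The sketch relies on two
observations: the characters {, |, } and ~ follow the letter "z" in
ASCII in exactly the order ä, ö, ü, ß, and "long form before short
form" can be modelled by appending an end marker that sorts after
every other character.

```python
import re

END_MARK = "\x7f"  # sorts after all letters and after { | } ~


def sort_key(entry: str) -> str:
    """Key for one entry: drop the code letter and all digits."""
    word = re.sub(r"[0-9]", "", entry[1:])
    # The end marker makes "gewinner" sort before its prefix "gewinn".
    return word + END_MARK


def is_correctly_ordered(entries: list[str]) -> bool:
    """True if every entry sorts at or before its successor."""
    keys = [sort_key(e) for e in entries]
    return all(a <= b for a, b in zip(keys, keys[1:]))


if __name__ == "__main__":
    gewinn = ["bge1win1ner", "zge1win3num1mern", "age1win3num1mer", "vge1winn"]
    print(is_correctly_ordered(gewinn))
    print(sorted(["vb}rst", "zb}r1sten", "bbu~", "bb{c4ker"], key=sort_key))
```

Used on the lines of ausnahme.txt before genal is called, such a check
can point to entries that would otherwise not be found.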
Aside from the rules for the correct ordering of entries, there are
other points to look out for, which do not, however, interfere with
the functioning of the exception list directly.
Notes
The following notes apply in general:
- If certain words or word parts are entered in the exception list,
this does not necessarily mean that the program would hyphenate
them incorrectly without this entry. An exception may also be
necessary in the list in order to hyphenate compounds with these
entries correctly.
- The hyphenation program works on the basis of grammatical rules and
probabilities, with the result that 100% accuracy cannot be
achieved. Entries in the exception list have priority over the
rules of the hyphenation algorithm.
Another note relates to possible sources of error for the hyphenation
program:
- Apart from foreign words, compounds are a frequent source of errors
in German, although the hyphenation program is designed to deal
with them. German words with so-called linking letters
("Fugenzeichen") are particularly problematic. Some examples of
words that are hyphenated incorrectly if the exception list is not
used are given below.
Linking-s Example: Abschnittsanzahl
Hyphenation error without exception list:
Ab-schnitt-san-zahl
Linking-e Example: Schweinefleisch
Hyphenation error without exception list:
Schwei-nef-leisch
Linking-en Example: Gruppenarbeiten
Hyphenation error without exception list:
Grup-pe-nar-bei-ten
Compounds Example: dialogorientiert
Hyphenation error without exception list:
dia-lo-go-rien-tiert
A further point concerns special features of German hyphenation.
- Words that are not hyphenated in accordance with the other rules of
German hyphenation must also be entered in the exception list. In
particular, this affects the rule in German where "st" is not
split. Exceptions to this rule, such as the words "Dienstag"
(hyphenated as: Diens-tag), "Bistum" (hyphenated as: Bis-tum) or
"Ostern" (not hyphenated) must be entered in the exception list.
Warning:
If a word is not to be hyphenated, it should be entered without any
digits, e.g. bostern (for "Ostern"), bangst (for "Angst", "Angsthase"
etc.) or fguide (for the English foreign word "guide"). No digit is
assigned for a hyphenation point, not even at the end of the word
(where a digit is generally not specified anyway).
While it is possible to extend the number of entries in the exception
list freely, it is better if it is restricted to a certain size.
- Rather than enter every word that has been hyphenated incorrectly
in the exception list, it is worth considering if a word part would
be sufficient. Thus, for example, the incorrectly hyphenated word
"Abschnittsanzahl" could be entered as aab1schnitts1an1zahl. Even
better, however, would be to enter it simply as the word part
zschnitts since this covers various compounds at the same time, for
example "Durchschnittsanalyse", "Querschnittsermittlung" etc. The
code letters are provided in the exception file in order to support
this effort at limiting the size.
The next note relates to the representation of umlauts in troff.
- Alternative umlaut representations in the source file, as known to
troff(1M) (\(^a = ä, \(^o = ö etc.), are not taken into account in
this form in the exception list. Words written with the troff
alternative representation are only hyphenated correctly once the
umlaut is entered directly at the keyboard instead. If the
hyphenation is still incorrect, the user has to specify the
hyphenation point with \% or .hw, for example: \(^uber\%haupt or
.hw Ka-pi-tel-über-schrift. Problems with hyphenation can always be
expected if substitutes such as ae, oe, ue etc. are used in the
text instead of the real umlauts.
GENERATING THE NEW EXCEPTION LIST
After the user has made the new entries in the input file
ausnahme.txt, a new, executable exception list must be generated. To
do this use the command:
genal (without parameters)
This command reads the entries in the input file in succession and
creates a new executable exception list. The two files that make up
the executable exception list (ausnahme.dat and ausnahme.ind) are
overwritten in the process. The command must be called in the
directory where the input file and exception list are located.
During the formatting run, n/troff first searches for the exception
list in the current directory and, if it is not found there, under
/usr/lib.
PROBLEMS DESPITE THE NEW EXCEPTION LIST
After generating the new exception list, you can use a simple
procedure to check that words that were previously hyphenated
incorrectly are now hyphenated correctly. For the examples of
incorrectly hyphenated words given above, write a test file with the
following contents:
.hy 1 \" unrestricted hyphenation
.ll 2n \" shortened line length
Abschnittsanzahl
\" real blank lines, no nofill mode!
Schweinefleisch
Gruppenarbeiten
dialogorientiert
The command is simply as follows: nroff testfile | hyphen. If the
entries have been recorded correctly in the exception list, the
correct hyphenations can now be viewed on screen.
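For longer word lists, the test file can also be generated. The
following Python sketch builds the file contents from a list of words;
the function name build_testfile is hypothetical. It assumes that each
word is followed by a real blank line, as the comment in the test file
above suggests, so that every word starts on a line of its own.

```python
# Illustrative generator for the hyphenation test file described above.
def build_testfile(words: list[str]) -> str:
    """Return roff input that forces each word to be hyphenated."""
    lines = [
        '.hy 1 \\" unrestricted hyphenation',
        '.ll 2n \\" shortened line length',
    ]
    for word in words:
        lines.append(word)
        lines.append("")  # real blank line, no nofill mode
    return "\n".join(lines)


if __name__ == "__main__":
    words = ["Abschnittsanzahl", "Schweinefleisch",
             "Gruppenarbeiten", "dialogorientiert"]
    with open("testfile", "w") as handle:
        handle.write(build_testfile(words))
```

The resulting file is then checked as described above, with
nroff testfile | hyphen.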
If the required result is not given however, and a word is still
hyphenated incorrectly, the entry must be checked in the ausnahme.txt
input file. From time to time, an error occurs where a non-applicable
code letter is selected; more frequently, however, the sort sequence
of the entries has not been observed strictly enough, with the result
that the otherwise correct entry could not actually be found. The pro-
cedure must, in any case, be repeated until the result is correct.
As far as correct hyphenation is concerned, it is irrelevant whether
nroff or troff is used for formatting. Some German words are
semantically ambiguous, and such conflicts cannot be resolved with
the exception list either. Whether the word "Baumast", for example,
is hyphenated as Baum-ast or as Bau-mast depends on the context. In
such a case, the user needs to insert a \% directly to specify what
is meant, i.e. Baum\%ast or Bau\%mast.
SPELLING REFORM
The German hyphenation algorithm works in line with the hyphenation
rules that are currently valid: the character string "ck" in a word is
replaced with "k-k" when hyphenated (e.g. Druk-ker), and a third
consonant is inserted where required (e.g. Schiffahrt ->
Schiff-fahrt).
FILES
/usr/lib/ausnahme.txt
Editable exception list
/usr/lib/ausnahme.dat
Non-readable index file
/usr/lib/ausnahme.ind
Non-readable index file
SEE ALSO
hyphen(1M), roff-charset(5).