genal(1M)                                                         genal(1M)

NAME
     genal - program for generating the German language exception list

SYNOPSIS
     /usr/bin/genal

DESCRIPTION
     The genal command generates a new exception list for German
     hyphenation. A user with the appropriate permissions can edit the
     exception list input file using any editor and then run genal to
     make the list executable. There are, however, some points to watch.

   German and English hyphenation

     When formatting texts with the Documenter's Workbench (DWB) it is pos-
     sible to enable language-specific hyphenation depending on whether the
     text is in German or in English. To set the language-specific hyphena-
     tion you use the macro

          .la [language]

     This has the following meaning:

          .la G   German hyphenation

          .la E   English hyphenation.

     The default language is German, i.e. .la G or simply .la without
     parameters. The language can be set at any position in the text
     file (generally at the beginning) or in an include file (.so). In
     "mixed language" documents with both German and English text,
     hyphenation can be switched as needed.

     Note that hyphenation itself is switched off by default. The only
     exception is words that contain the "-" or "\(em" characters [minus
     sign, dash; see roff-charset(5)], which are split at these characters
     even if hyphenation is switched off. This means that hyphenation must
     always be switched on explicitly! To do this the macro

          .hy [N]

     is entered directly in the source file or in the macro package being
     used. In other words:

     ⊕  The .la macro does not switch on the .hy macro implicitly.

     ⊕  Hyphenation is only enabled in the text from the point where the
        .hy macro is first called, i. e. if it is called for the first time
        at the end of the file, no hyphenation will take place.






Page 1                       Reliant UNIX 5.44                Printed 11/98


     The N parameter of the .hy macro has the following meaning:

          1    Hyphenation with no restrictions

          2    No hyphenation at a page break (the last line of a page
               is not hyphenated)

          4    The last two letters of a word are not split off

          8    The first two letters of a word are not split off

          0    Hyphenation is switched off (= .nh)

     The values 2, 4 and 8 can be combined freely by adding them, which
     means, for example, that .hy 14 (2 + 4 + 8) applies all three
     restrictions. .hy without parameters is identical to .hy 1 (no
     restrictions).
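
     How the values combine can be illustrated with a short Python
     sketch. It is not part of DWB: the name describe_hy and the wording
     of the descriptions are assumptions made for illustration, and the
     sketch assumes that N is simply the sum of the values listed above.

```python
# Restrictions selected by the individual values of the .hy parameter.
# The descriptions paraphrase the table above; the names are illustrative.
HY_RESTRICTIONS = {
    2: "no hyphenation at a page break",
    4: "last two letters of a word are not split off",
    8: "first two letters of a word are not split off",
}


def describe_hy(n):
    """Return the list of effects of '.hy n' as plain-text descriptions."""
    if n == 0:
        return ["hyphenation switched off (same as .nh)"]
    # Collect every restriction whose value is contained in the sum n.
    active = [text for value, text in HY_RESTRICTIONS.items() if n & value]
    # No restriction selected means unrestricted hyphenation (.hy 1).
    return active or ["hyphenation with no restrictions"]


if __name__ == "__main__":
    for n in (0, 1, 6, 14):
        print(".hy", n, "->", "; ".join(describe_hy(n)))
```

     Under this assumption .hy 6 selects the restrictions for 2 and 4,
     and .hy 14 selects all three.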

   Checking hyphenation

     The hyphen(1M) command can be used to check that nroff(1M) hyphenation
     has been carried out properly. This command lists all words in a
     formatted text that are hyphenated at the end of a line. If you use
     hyphen as a filter, hyphenation in a file textfile can be checked with
     the following command:

     tbl textfile | neqn | nroff -man -Tlp | col -x | hyphen

   Manipulating hyphenation

     If the hyphenation is incorrect, there are two ways to correct it:

     1) directly in the text by marking a mandatory hyphenation point using
        the \% character string or the .hw macro

     2) by maintaining a (German) exception list (only for system
        administrators)

     To mark a mandatory hyphenation point directly in the text, you need
     to enter the following character string

          \%

     (unless a different hyphenation character has already been defined
     with the .hc c macro).

     The option of manually specifying a hyphenation point in a word is
     used particularly in texts with long words. The word
     "non-internationalized", for example, would then be edited as follows
     in the source file:

          non-in\%ter\%na\%tion\%al\%ized







     A mandatory hyphenation point can also be specified with the

          .hw word

     macro. The hyphenation points are marked here with hyphens, for
     example:

          .hw dan-de-li-on

     The word argument can be up to 256 bytes long, and the maximum number
     of hyphenation points per word is 20.

     Warning:

     If a possible hyphenation point is given in a word, n/troff will only
     hyphenate at this point (and with the "-" sign). It is safer,
     therefore, to specify all mandatory hyphenation points (as in the
     above example).
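
     Since all hyphenation points of a word should be given together,
     it can help to build both notations from one list of syllables. The
     following Python sketch is only an illustration; the function names
     mark_in_text and hw_request are hypothetical and not part of DWB.

```python
# Maximum number of hyphenation points per word, as stated in the text.
MAX_POINTS = 20


def mark_in_text(syllables):
    # Join the syllables with the \% character string for use directly
    # in the running text of the source file.
    return "\\%".join(syllables)


def hw_request(syllables):
    # Build a .hw request line that marks the same hyphenation points.
    if len(syllables) - 1 > MAX_POINTS:
        raise ValueError("more than 20 hyphenation points in one word")
    return ".hw " + "-".join(syllables)


if __name__ == "__main__":
    print(mark_in_text(["in", "ter", "na", "tion", "al", "ized"]))
    print(hw_request(["dan", "de", "li", "on"]))
```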

     Hyphenation can be suppressed in two ways: globally or just for an
     individual word. Global suppression is usually done with the

          .nh  or  .hy 0

     macro.

     To suppress hyphenation for a single word, either

     -  a \% is set at the start of the word, or

     -  the special characters "\(hy" (= -), "\(mi" (= -) or "\-" (= -) are
        inserted in a compound [see roff-charset(5)].

     Here are two examples for suppressing hyphenation in a single word:

     \%automount or \%PPP\(hyPackage.

     A more elegant (but also more time-consuming) alternative to marking
     hyphenation points directly in the text is to specify them in an
     exception list. This is worthwhile for frequently used words that
     are systematically hyphenated incorrectly. Note that although an
     exception list exists only for German hyphenation, English words
     can also be entered in it.

EDITING THE EXCEPTION LIST
     The exception list comprises three individual files: ausnahme.txt,
     ausnahme.dat and ausnahme.ind. These files can be found in the
     /usr/lib directory. The user makes entries only in the ausnahme.txt
     input file; the other two files are generated index files and are
     not human-readable. You need superuser rights to make entries in the
     file. Editing the exception list is somewhat complicated and is
     therefore described in detail below.



     The words or parts of words in the exception list are marked twice:

     a) by a code letter (type ID) at the start of the word, which identi-
        fies the entry as a word stem, beginning of a word, word form,
        foreign word etc.

     b) by a digit between 1 and 4 within the entry, which specifies the
        actual hyphenation point

   Code letters in the exception list

     The code letters and their meanings are listed below.

     a    Normal exception word (word form only), i. e. the word only ever
          appears as an individual word, on its own.

          Examples: wor-un-ter; wor-an (Entry: awor1un1ter; awor1an)

     b    Normal exception word (word form and beginning of word), i. e.
          the word appears on its own as well as at the beginning of com-
          pounds.

          Examples: Zeug-nis, Zeug-nis-ses, Zeug-nis-ar-ten (Entry:
          bzeug1nis)

     c    Normal exception word (beginning of word only), i.e. the word
          only ever appears at the beginning of a compound; in texts, a
          word marked in this way must always be followed by another word
          part that contains at least two letters.

          Examples: zi-vi-li-siert, zi-vi-li-sier-te (Entry: czi1vi1li)

     e    Proper noun (the "s" ending can also be appended to the word).

          Examples: Ar-me-ni-en (Entry: ear1me1ni1en)

     f    Foreign word; the word can appear on its own, at the beginning of
          a compound and with the "s" ending.

          Examples: Bo-nus, bo-nus-ar-tig (Entry: fbo1nus)

     g    Foreign word (beginning of word only), i. e. the word is always
          joined with another word part (at least two letters) to form a
          compound.

          Examples: Log-arith-mus, log-arith-misch (Entry: glog1arith)

     k    Customer entry; normal exception word (word form only), i.e. the
          word only ever appears as an individual word, on its own (as
          with code letter a).





     v    Verbs with "-en" stem.

          Examples: gra-sen, gra-sen-de, Gras-an-satz (Entry: vgras)

     w    Verbs with "-eln" and "-ern" stem, including the two suffixes
          "el" and "er".

          Examples: bet-teln, Bet-tel, Bett-ler, bet-teln-de (Entry:
          wbet1tel); zau-bern, Zau-be-rer, Zau-ber-kunst (Entry: wzau1ber)

     z    Word form from stem and ending, which is used in compounds.

          Examples: Wis-sens, Wis-sens-art (Entry: zwis1sens)

     All entries in the exception file must be written in lowercase.

   Hyphenation point digits

     The actual hyphenation points are indicated by the following digits:

     1    Normal hyphenation point.

          Example: ain1itia1ti1ve (-> In-itia-ti-ve)

     2    Unaesthetic hyphenation point (should not be used!).

          Unaesthetic hyphenation points include those where the first or
          last two letters of a word are split off. This type of
          hyphenation must be permitted explicitly using .hy 1
          (= unrestricted hyphenation). Tests have shown that words
          marked in the exception list with 2 for an unaesthetic
          hyphenation point were subsequently not recognized by the
          system. If they were marked with the digit 1 instead, the
          hyphenation points were recognized as unaesthetic when
          unrestricted hyphenation was in effect. The error is therefore
          easy to avoid, although its cause is unknown.

     3    Hyphenation point where a consonant is added.

          Example: bschif3fahrt (-> Schiff-fahrt)

     4    ck hyphenation.

          Example: vfac4kel (-> Fak-kel)

   Umlaut handling

     Because entries combine code letters with hyphenation point digits,
     they can look so strange that the user must first become familiar
     with reading them. In addition, German umlauts (and ß) in the
     exception file must be entered using special non-alphabetical
     characters:



     { for ä     Example:   d{1cher (= Dä-cher)

     | for ö     Example:   d|r1fer (= Dör-fer)

     } for ü     Example:   f}n1fer (= Fün-fer)

     ~ for ß     Example:   au1~en (= au-ßen)

     At first glance, all of these conventions (code letter, digits,
     umlauts, lowercase) make it difficult to maintain the exception file.
     Moreover, there are also other rules that need to be observed, which
     affect the sort sequence of the entries.
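
     To become familiar with reading entries, it can help to expand them
     mechanically. The following Python sketch decodes one entry into its
     code letter and hyphenated word, following the conventions described
     above. It is an illustration only and not part of genal; the name
     decode_entry is hypothetical, and the sketch assumes a well-formed
     entry that starts with a code letter.

```python
# Special characters that stand for umlauts and ß in the exception file.
UMLAUTS = {"{": "ä", "|": "ö", "}": "ü", "~": "ß"}


def decode_entry(entry):
    """Split an entry into (code letter, hyphenated lowercase word)."""
    code, body = entry[0], entry[1:]
    out = []
    for ch in body:
        if ch in "12":
            # Normal (1) or unaesthetic (2) hyphenation point.
            out.append("-")
        elif ch == "3":
            # A consonant is added: repeat the preceding letter,
            # e.g. schif3fahrt -> schiff-fahrt.
            out.append(out[-1] + "-")
        elif ch == "4":
            # ck hyphenation: the preceding "c" becomes "k",
            # e.g. fac4kel -> fak-kel.
            out[-1] = "k"
            out.append("-")
        else:
            out.append(UMLAUTS.get(ch, ch))
    return code, "".join(out)


if __name__ == "__main__":
    for entry in ("awor1un1ter", "bschif3fahrt", "vfac4kel", "bge1fl}1gel"):
        print(entry, "->", decode_entry(entry))
```

     An entry without digits, such as bostern, is returned without any
     hyphenation point.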

   Sort rules

     The exception list can only function correctly if its entries are
     ordered correctly. The entries must be in alphabetical order so that
     the hyphenation program can actually find an entry in the exception
     list. For sorting purposes, the basic word is used, i.e. the entry
     without its code letter and without hyphenation point digits. This
     is the basic rule.

     Entries, in turn, may comprise not only complete words but also word
     parts (word stem, beginning of word, word form). As an exception to
     plain alphabetical order, the long form must always come before the
     short form of a word or word part in the exception list. For example,
     there are four entries for the word part "gewinn" in the exception
     file. These four entries must be ordered as follows:

     bge1win1ner
     zge1win3num1mern
     age1win3num1mer
     vge1winn

     This example clarifies the first two sort rules:

     ⊕  Long form before short form

        "gewinner" before "gewinn" and "gewinnummern" before "gewinnummer"

     ⊕  Alphabetical order

        "gewinner" before "gewinnummern"

     Other sort rules affect the ordering of entries with the alternative
     representations (non-alphabetical characters) of umlauts. You should
     note here that the non-alphabetical characters for umlauts must always
     be placed after the alphabetical characters, and that the actual
     umlauts are ordered alphabetically as normal without regard for their
     alternative representation, so for example ä before ö before ü. Here
     is another example from the exception file:




     bge1fl}1gel    (= "geflügel")
     bge1f{ng1nis   (= "gefängnis")
     bge1f{~        (= "gefäß")
     zge1f}hls      (= "gefühls")

     The other sort rules are clarified with this example:

     ⊕  Non-alphabetical characters (umlauts) after alphabetical

        bge1f{ng1nis ["gefängnis"] after bge1fl}1gel ["geflügel"], i.e.
        "f{" after "fl}"

     ⊕  Umlauts in alphabetical order

        "gefängnis" alphabetically before "gefäß"; "gefäß" again alphabeti-
        cally before "gefühls"

     The next example from the exception file shows this again in more
     detail:

     bbu~          (= "buß")
     bb{c4ker      (= "bäcker")
     zb{1ren       (= "bären")
     zb|r1sen      (= "börsen")
     cb|r1sia      (= "börsia")
     cb|s1ar       (= "bösar")
     zb}1cher      (= "bücher")
     bb}1fett      (= "büfett")
     bb}f1fe1lei   (= "büffelei")
     wb}f1fel      (= "büffel")
     wb}1gel       (= "bügel")
     zb}h1nen      (= "bühnen")
     bb}n1de1lei   (= "bündelei")
     bb}n1del      (= "bündel")
     vb}rg         (= "bürg")
     bb}1ro        (= "büro")
     zb}r1sten     (= "bürsten")
     vb}rst        (= "bürst")
     bb}t1tel      (= "büttel")
     zb}t1ten      (= "bütten")
     vb}~          (= "büß")

     This example clarifies the previously mentioned sort rules:

     ⊕  Long form before short form

        "büffelei" before "büffel" etc.

     ⊕  Alphabetical ordering of word parts

        "börsen" before "börsia" etc.




     ⊕  Umlauts in alphabetical order

        "bäcker" etc. before "börsen" etc. before "bücher" etc.

     ⊕  Non-alphabetical characters (umlauts) after alphabetical

        bb{c4ker ("bäcker") after bbu~ ("buß") and not vice versa; vb}~
        ("büß") after zb}t1ten ("bütten") and not vice versa.

     A final sort rule can be derived from the preceding examples, with
     regard to non-alphabetical characters for umlauts. Since the non-
     alphabetical characters must always come after the alphabetical char-
     acters, entries of words or word parts that begin with an umlaut are
     also always ordered after the alphabetical entries (i.e. after the
     letter "z"). Here is a final example:

     bzy1lin1der   (= "zylinder")
     c{gyp         (= "ägyp")
     c{ngst        (= "ängst")
     b|l1ofen      (= "ölofen")
     b}ber1ein     (= "überein")
     b}ber1haupt   (= "überhaupt")

     This example clarifies the following sort rules:

     ⊕  Umlauts in alphabetical order

        "ägyp" etc. before "ölofen" etc. before "überein" etc.

     ⊕  Non-alphabetical characters at the start of the entry after the
        letter "z"

        c{gyp ("ägyp") etc. after bzy1lin1der ("zylinder")
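
     The sort rules can be summarized in a single sort key. The following
     Python sketch is a reconstruction from the examples in this section,
     not the algorithm actually used by genal; the names sort_key and
     first_misplaced are hypothetical. It ranks the letters a to z before
     the special characters { | } ~ and places an end marker after all of
     them, so that a long form sorts before its short form.

```python
# Letters first, then the umlaut characters in the order ä, ö, ü, ß.
ALPHABET = "abcdefghijklmnopqrstuvwxyz{|}~"
RANK = {ch: i for i, ch in enumerate(ALPHABET)}
END = len(ALPHABET)  # end marker, ranked after every character


def base_word(entry):
    # Basic word: the entry without code letter and without digits.
    return "".join(ch for ch in entry[1:] if not ch.isdigit())


def sort_key(entry):
    # The trailing END marker makes "gewinner" sort before "gewinn".
    return tuple(RANK[ch] for ch in base_word(entry)) + (END,)


def first_misplaced(entries):
    """Return the first entry that violates the order, or None."""
    for prev, cur in zip(entries, entries[1:]):
        if sort_key(prev) > sort_key(cur):
            return cur
    return None


if __name__ == "__main__":
    entries = ["bge1win1ner", "zge1win3num1mern", "age1win3num1mer", "vge1winn"]
    print(first_misplaced(entries))
    print(sorted(reversed(entries), key=sort_key))
```

     A check of this kind could be run over the entries before genal is
     called, since a wrongly ordered entry is not found by the
     hyphenation program.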

     Aside from the rules for the correct ordering of entries, there are
     other points to look out for, which do not, however, interfere with
     the functioning of the exception list directly.

   Notes

     The following notes apply in general:

     -  If certain words or word parts are entered in the exception list,
        this does not necessarily mean that the program would hyphenate
        them incorrectly without this entry. An exception may also be
        necessary in the list in order to hyphenate compounds with these
        entries correctly.

     -  The hyphenation program works on the basis of grammatical rules and
        probabilities, with the result that 100% accuracy cannot be
        achieved. Entries in the exception list have priority over the
        rules of the hyphenation algorithm.



     Another note relates to possible error sources for the hyphenation
     program:

     -  Apart from foreign words, compounds are a frequent source of errors
        in German, although the hyphenation program is designed to deal
        with them. German words with so-called linking letters
        ("Fugen-Zeichen") are particularly problematic. Some examples of
        words that are hyphenated incorrectly if the exception list is not
        used are given below.

        Linking-s     Example: Abschnittsanzahl

                      Hyphenation error without exception list:
                      Ab-schnitt-san-zahl

        Linking-e     Example: Schweinefleisch

                      Hyphenation error without exception list:
                      Schwei-nef-leisch

        Linking-en    Example: Gruppenarbeiten

                      Hyphenation error without exception list:
                      Grup-pe-nar-bei-ten

        Compounds     Example: dialogorientiert

                      Hyphenation error without exception list:
                      dia-lo-go-rien-tiert

     A further point concerns special features of German hyphenation.

     -  Words that are not hyphenated in accordance with the other rules of
        German hyphenation must also be entered in the exception list. In
        particular, this affects the rule in German where "st" is not
        split. Exceptions to this rule, such as the words "Dienstag"
        (hyphenated as: Diens-tag), "Bistum" (hyphenated as: Bis-tum) or
        "Ostern" (not hyphenated) must be entered in the exception list.

        Warning:

        If a word is not to be hyphenated, it should be entered as follows:
        e.g. bostern (for "Ostern") or bangst (for "Angst", "Angsthase"
        etc.) or fguide (for the English foreign word "guide"). No digit
        is assigned for a hyphenation point anywhere in such an entry, not
        even at the end of the word (where a digit is not normally
        specified anyway).








     While it is possible to extend the number of entries in the exception
     list freely, it is better if it is restricted to a certain size.

     -  Rather than entering every incorrectly hyphenated word in the
        exception list, it is worth considering whether a word part would
        be sufficient. Thus, for example, the incorrectly hyphenated word
        "Abschnittsanzahl" could be entered as aab1schnitts1an1zahl. Even
        better, however, would be to enter it simply as the word part
        zschnitts, since this covers various compounds at the same time,
        for example "Durchschnittsanalyse", "Querschnittsermittlung" etc.
        The code letters are provided in the exception file to support
        this effort to limit the size.

     The next note relates to the representation of umlauts in troff.

     -  Alternative umlaut representations in the source file of the kind
        known to troff(1M) (\(^a = ä, \(^o = ö etc.) are not taken into
        account in this form by the exception list. Words written with
        the troff alternative representation can only be split correctly
        if the umlaut is instead entered directly at the keyboard. If the
        split is still incorrect, the user has to specify the hyphenation
        point with \% or .hw, for example: \(^uber\%haupt or
        .hw Ka-pi-tel-über-schrift.

        Problems with hyphenation can always be anticipated if symbols such
        as ae, oe, ue etc. are used in the text instead of the real
        umlauts.

GENERATING THE NEW EXCEPTION LIST
     After the user has made the new entries in the input file
     ausnahme.txt, a new, executable exception list must be generated. To
     do this use the command:

          genal (without parameters)

     This command reads the entries in the input file in succession, and
     creates a new executable exception list. The two files that make up
     the exception list are overwritten here (ausnahme.dat and
     ausnahme.ind). The command must be called in the directory where the
     input file and exception list are located.

     During the formatting run, n/troff first searches for the exception
     list in the current directory and then, if it is not found there,
     under /usr/lib.

PROBLEMS DESPITE THE NEW EXCEPTION LIST
     After the new exception list has been generated, a small routine can
     be used to check that words that were hyphenated incorrectly are now
     hyphenated correctly. For the incorrectly hyphenated words given as
     examples above, a test file with the following contents is created:





     .hy 1             \" unrestricted hyphenation
     .ll 2n            \" shortened line length
     Abschnittsanzahl
                       \" real blank lines, no nofill mode!
     Schweinefleisch

     Gruppenarbeiten

     dialogorientiert

     The command is simply as follows: nroff testfile | hyphen. If the
     entries have been recorded correctly in the exception list, the
     correct hyphenations can now be viewed on screen.
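
     If the check is repeated often, the test file can be produced from
     a word list. The following Python sketch is only an illustration;
     the function name build_test_file is hypothetical. It writes the two
     requests shown above and separates the words with real blank lines.

```python
def build_test_file(words):
    """Return the text of a hyphenation test file for the given words."""
    header = [
        '.hy 1             \\" unrestricted hyphenation',
        '.ll 2n            \\" shortened line length',
    ]
    # Real blank lines between the words, as in the example above.
    body = "\n\n".join(words)
    return "\n".join(header) + "\n" + body + "\n"


if __name__ == "__main__":
    words = ["Abschnittsanzahl", "Schweinefleisch",
             "Gruppenarbeiten", "dialogorientiert"]
    # Write the result to a file named "testfile" in the current directory.
    with open("testfile", "w", encoding="latin-1") as f:
        f.write(build_test_file(words))
```

     The resulting file is then checked with nroff testfile | hyphen as
     described above. The latin-1 encoding is an assumption; use the
     character set that the formatter on the target system expects.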

     If the required result is not given however, and a word is still
     hyphenated incorrectly, the entry must be checked in the ausnahme.txt
     input file. From time to time, an error occurs where a non-applicable
     code letter is selected; more frequently, however, the sort sequence
     of the entries has not been observed strictly enough, with the result
     that the otherwise correct entry could not actually be found. The pro-
     cedure must, in any case, be repeated until the result is correct.

     As far as correct hyphenation is concerned, it is irrelevant whether
     nroff or troff formatting is used. Semantic conflicts may arise when
     hyphenating some German words, and these cannot be resolved with the
     exception list either. It then depends on the context whether, for
     example, the word "Baumast" is hyphenated as Baum-ast or as Bau-mast.
     In such a case, the user needs to insert a \% directly to specify
     what is meant, i.e. Baum\%ast or Bau\%mast.

SPELLING REFORM
     The German hyphenation algorithm works in line with the hyphenation
     rules that are currently valid. The character string "ck" in a word is
     replaced with "k-k" when hyphenated (e.g. Druk-ker), and a third
     consonant is inserted where required (e.g. Schiffahrt ->
     Schiff-fahrt).

FILES
     /usr/lib/ausnahme.txt
          Editable exception list

     /usr/lib/ausnahme.dat
          Non-readable index file

     /usr/lib/ausnahme.ind
          Non-readable index file

SEE ALSO
     hyphen(1M), roff-charset(5).





