pword(1), NEXTSTEP 1.0

pword(1)  —  UNIX Programmer’s Manual

NAME

pword − print the peculiar words in a document

SYNOPSIS

pword [-cafpwS][-n #][-m #][-t #][-P #][-l wftable][-s #][-i #][files ...]
 

DESCRIPTION

 
pword scans the text in files, or text presented on the standard input, and attempts to measure the peculiarity of the words it encounters.  It can be used as a filter for identifying keywords or for identifying the topically intense areas of a document. 
 
Each word is assigned an index of peculiarity: the larger the number, the more unusual the word.  This index is the normalized ratio of the frequency of the word within the source text to the frequency of the word in a base domain, usually common English.  Word/weight pairs are printed on the standard output, one pair per line, in descending order of peculiarity. 
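The ratio described above can be illustrated with a minimal Python sketch. It is not pword's actual implementation: the manual does not document the normalization, so the sketch assumes that "normalized" means both frequencies are relative frequencies. The names peculiarity, base_freq, and floor are hypothetical, and the base frequencies are made up for illustration.

```python
from collections import Counter

def peculiarity(words, base_freq, floor=1e-6):
    """Return (word, index) pairs, most peculiar first.

    base_freq maps a word to its relative frequency in the base domain.
    floor is a hypothetical fallback for words missing from that table.
    """
    counts = Counter(words)
    total = sum(counts.values())
    scores = {
        w: (n / total) / base_freq.get(w, floor)
        for w, n in counts.items()
    }
    # Sort by descending index, breaking ties alphabetically.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))

# Illustrative input only; the base frequencies are invented.
words = "the cat saw the hatter and the hatter saw the cat".split()
base = {"the": 0.05, "and": 0.03, "saw": 0.001, "cat": 0.0005}

for word, index in peculiarity(words, base):
    print(word, round(index, 2))
```

Here "hatter" is absent from the base table, so it falls back to the floor frequency and ranks first, while "the" and "and" rank last even though "the" is the most frequent word in the input.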
 
If the environment variable PWORD exists, it is parsed before the command line arguments.  Note that ixBuild(1) has a command line flag to set this value. 
 

OPTIONS

−c Breaks words at contractions and hyphenations. 

−a Prints the absolute count of each word, instead of its index of peculiarity. 

−p Disables folding of plurals to singular form. 

−f Prints source-relative frequency of each word, rather than its index of peculiarity. 

−w Prints each pair as weight word rather than word weight. 

−S Disables use of the standard stop word list. 

−m #
Prevents the extraction of words occurring fewer than # times. 

−P #
Allows extraction of only the most significant # percent of the words encountered. 

−n #
Sets the number of words extracted to #. 

−t # Prevents the extraction of words with an index of peculiarity less than #. 

−l wftable
Uses wftable instead of the default table, /usr/lib/indexing/DefaultEnglish.wf. 

−s # Slides a window of # bytes in length over the file, computing the peculiarities of the words within the window from the window-relative frequencies. 

−i # Sets the window increment to #.  The default behavior is to slide the window in increments equal to half the window size.  This option takes effect only if -s is also specified. 
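How the selection options (-m, -t, -P, -n) might combine can be sketched in Python as follows. The manual does not say in which order pword applies them, or whether -P counts all words or only those that survive -m and -t. The order used here, and applying the percentage to the surviving words, are assumptions. The function and parameter names are hypothetical.

```python
import math

def select(scored, counts, min_count=1, threshold=0.0, percent=None, limit=None):
    """Filter (word, index) pairs that are already sorted by descending index.

    min_count ~ -m, threshold ~ -t, percent ~ -P, limit ~ -n.
    """
    # Drop words that are too rare or not peculiar enough.
    kept = [(w, s) for w, s in scored
            if counts[w] >= min_count and s >= threshold]
    # Keep only the top percentage of the surviving words (assumed semantics).
    if percent is not None:
        kept = kept[:math.ceil(len(kept) * percent / 100)]
    # Cap the number of words extracted.
    if limit is not None:
        kept = kept[:limit]
    return kept

scored = [("hatter", 10.0), ("dormouse", 5.0), ("tea", 2.0), ("said", 1.0)]
counts = {"hatter": 3, "dormouse": 1, "tea": 2, "said": 5}
print(select(scored, counts, min_count=2, limit=2))
```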

EXAMPLE

The command

cat Books/AliceInWonderland/* | pword -s 30000 -n 20

slides a 30,000-byte window over the book Alice In Wonderland and prints the 20 most interesting words from each window.  This makes it possible to trace the prominence of a principal character, or of a topic of interest, throughout a long text. 
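The windowing behavior of -s and -i can be sketched in Python as below. The default step of half the window size comes from the option description. How pword treats the final partial window, and words cut by a window edge, is not documented, so that handling is an assumption. The function name windows is hypothetical.

```python
def windows(data, size, step=None):
    """Yield (offset, chunk) pairs for a window slid over data (bytes)."""
    if step is None:
        step = max(1, size // 2)  # documented default: half the window size
    offset = 0
    while True:
        yield offset, data[offset:offset + size]
        # Stop once the window has reached the end of the data (assumed).
        if offset + size >= len(data):
            break
        offset += step

text = b"abcdefghij"
for offset, chunk in windows(text, 4):
    print(offset, chunk)
```

Each chunk would then be scored on its own, using window-relative frequencies, which is what produces one list of words per window in the example above.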

SEE ALSO

wc(1), spell(1), hist(1), diction(1), ixBuild(1)

BUGS

Documents that contain formatting information (e.g., troff) must be passed through a filter of some kind (e.g., deroff) to produce usable results. 

Pword considers a word to be any string of letters, possibly including a single apostrophe or hyphen.  All words are folded to lower case and singular form as required.  It ignores particles, doesn’t recognize synonyms or multiword "key phrases", and doesn’t do any kind of stemming or suffix analysis.  It currently understands only English. 
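The word definition above can be approximated with a regular expression. The following Python sketch is an illustration, not pword's actual tokenizer. It handles only ASCII letters and the ASCII apostrophe, it splits a word at a second hyphen or apostrophe (an assumption), and it omits plural folding because the manual does not describe how that is done.

```python
import re

# A run of letters, optionally followed by one apostrophe or hyphen
# and another run of letters.
WORD = re.compile(r"[A-Za-z]+(?:['-][A-Za-z]+)?")

def tokenize(text):
    """Return the words in text, folded to lower case."""
    return [m.group(0).lower() for m in WORD.finditer(text)]

print(tokenize("The Hatter's tea-party began."))
```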

NeXT, Inc.  —  July 7, 1989
