uniq(1) uniq(1)
NAME
uniq - report repeated lines
SYNOPSIS
uniq [option ...] [inputfile [outputfile]]
DESCRIPTION
The uniq command searches a file for sequences of identical adjacent
lines and writes the file to standard output, keeping only one line
from each such sequence. Repeated lines are found only if they are
adjacent, so the input file should normally be sorted first (see
sort(1)).
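The adjacency requirement can be seen in a minimal sketch. The four-line
sample input is hypothetical: its repeated lines are never adjacent, so
uniq alone leaves them in place, while sorting first brings them together.

```shell
# Hypothetical sample input: "a" and "b" each occur twice, never adjacently.
unsorted=$(printf 'a\nb\na\nb\n' | uniq)
# Sorting first makes identical lines adjacent, so uniq can collapse them.
sorted=$(printf 'a\nb\na\nb\n' | sort | uniq)
printf 'without sort:\n%s\n' "$unsorted"
printf 'with sort:\n%s\n' "$sorted"
```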
OPTIONS
The options -c, -d, and -u must not be combined.
No option specified:
The named inputfile is output without repeated lines.
-c Outputs all lines without repetitions, preceding each line with a
decimal number that indicates how many times it occurred in
succession in inputfile. Counts are printed right-justified, ending
in column 4; the lines themselves begin in column 6. uniq ignores
the -u and -d options if they are set together with -c.
-d Outputs one copy each of only those lines that are repeated in
inputfile.
-s n Causes the first n characters from the beginning of the line to
be ignored when comparing for duplicates.
If the -s option is combined with the -f option, the first n
characters after the mth field are ignored. Blanks following the
mth field are not ignored: they must be allowed for in the value
of n.
This corresponds to the old option +n, which is still supported,
but must not be combined with the new synopsis (-f or -s).
-s not specified:
Lines are compared from the beginning of the line or beginning
with field m+1 (option -f).
-f m Ignores the first m fields from the beginning of the line, plus
any tabs or blanks located in front of a field, when comparing
for duplicates. A field is a string of non-blank characters
separated from its neighbors by tabs or blanks.
This corresponds to the old option -m, which is still supported,
but must not be combined with the new synopsis (-f or -s).
-f not specified:
Lines are compared from the beginning of the line or beginning
with character n+1 (option -s).
-u Outputs only the lines that are not repeated in inputfile.
-- If the name of inputfile begins with a dash (-), the end of the
command-line options must be marked with --.
inputfile
Name of the file that is to be examined. If you specify a dash -
for inputfile, uniq reads from the standard input.
inputfile not specified: uniq reads from standard input.
outputfile
Name of the file to which the output is to be written. If you
specify a dash - for outputfile, uniq writes to the standard
output.
outputfile not specified: uniq writes to standard output.
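The selection and skipping options can be illustrated with a short
sketch. All sample lines are hypothetical, and the leading numbers and
prefixes exist only to give -f and -s something to skip.

```shell
# Hypothetical input, already sorted so that repeated lines are adjacent.
input=$(printf 'apple\napple\nbanana\ncherry\ncherry\n')

# -d: one copy of each line that is repeated.
dups=$(printf '%s\n' "$input" | uniq -d)
# -u: only the lines that are not repeated.
singles=$(printf '%s\n' "$input" | uniq -u)

# -f 1: skip the first field (here a hypothetical sequence number),
# so lines that differ only in that field are treated as identical.
by_field=$(printf '1 red\n2 red\n3 blue\n' | uniq -f 1)
# -s 2: skip the first two characters of each line before comparing.
by_char=$(printf 'x-red\ny-red\nz-blue\n' | uniq -s 2)

printf '%s\n' "$dups" "$singles" "$by_field" "$by_char"
```

In both skipping cases the two "red" lines collapse into one, because the
part of the line that differs lies entirely within the skipped prefix.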
LOCALE
The LC_MESSAGES environment variable governs the language in which
message texts are displayed.
LC_CTYPE governs character classes and character conversion
(shifting).
If LC_MESSAGES or LC_CTYPE is undefined or is defined as the null
string, it defaults to the value of LANG. If LANG is likewise
undefined or null, the system acts as if it were not internationalized.
The LC_ALL environment variable governs the entire locale. LC_ALL
takes precedence over all the other environment variables which affect
internationalization.
If any of the locale variables has an invalid value, the system acts
as if none of the variables were set.
EXAMPLES
Example 1
You want to find identical lines in a file, regardless of where they
are located in the file, and to output each distinct line together
with a count showing how often it occurs.
$ sort file | uniq -c
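As a sketch of what this produces, the following feeds a hypothetical
five-line input through the same pipeline. The trailing awk call is an
addition for this sketch only: it normalizes the blanks in front of the
counts so that the result is easy to compare.

```shell
# Hypothetical sample data standing in for the contents of "file".
counts=$(printf 'pear\napple\npear\napple\npear\n' |
    sort |                  # make identical lines adjacent
    uniq -c |               # prefix each distinct line with its count
    awk '{print $1, $2}')   # normalize the spacing of the count column
printf '%s\n' "$counts"
```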
Example 2
You want to output the 10 most frequently occurring words in the file
text.
$ cat text | sed 's/[ ][ ]*/\
> /g' | sed '/^$/d' | sort | uniq -c | sort -rn | head
Explanation:
- The first sed call generates a list of all words from text, by
replacing consecutive tabs or blanks by a newline character. The
square brackets each contain a tab and a blank.
- The second sed call deletes all blank lines.
- sort sorts the generated list in ASCII collating sequence.
- uniq -c removes duplicate lines from the sorted list and precedes
each remaining line with a count indicating its frequency of
occurrence.
- sort -rn sorts this frequency list in descending numeric order, i.e.
the most frequent word appears first; the word with the fewest
occurrences appears last.
- head prints the first 10 lines of the list.
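The same steps can be sketched with tr in place of the first sed call.
This is a substitution made for illustration, not the form used in the
example: tr -s '[:space:]' '\n' turns every run of white space into a
single newline. The sample text is hypothetical, and the final awk call
only normalizes the spacing of the counts.

```shell
# Hypothetical sample text; words are separated by blanks.
text='the cat and the dog and the bird'
top=$(printf '%s\n' "$text" |
    tr -s '[:space:]' '\n' |   # one word per line
    sed '/^$/d' |              # drop any empty line
    sort | uniq -c |           # count each distinct word
    sort -rn |                 # most frequent first
    head -n 2 |                # keep the top two for this small sample
    awk '{print $1, $2}')      # normalize the count column spacing
printf '%s\n' "$top"
```

Only the top two words are kept here because the remaining words all
occur once, and their relative order after sort -rn is not significant.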
SEE ALSO
comm(1), sort(1).
Page 3 Reliant UNIX 5.44 Printed 11/98