sort(1) sort(1)
NAME
sort - sort and/or merge files
SYNOPSIS
sort [option ...] [file ...]
DESCRIPTION
sort sorts lines in an input file and writes the result on the stan-
dard output.
If you specify more than one file, sort sorts and merges the files in
the same operation, i.e. the contents of all input files are sorted
and printed together.
Sorting can be performed either by whole lines or by specific parts of
lines, known as sort keys. If you wish to sort by whole lines, you do
not specify any sort keys; one or more keys can be used to sort by
particular portions of lines. A sort key is defined by specifying the
positions of fields in a line in the form +pos1 -pos2 (see Defining
specific sort keys).
sort divides the lines of a file into fields. A field is a string of
characters that is delimited by a field separator or a newline. Blanks
and tabs are the default field separators. In a sequence of one or
more default separators, all separators are part of the next field.
Leading blanks at the beginning of a line thus by default form part of
the first field.
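The effect of leading separators on a sort key can be sketched as follows. This is a minimal illustration, assuming a sort that accepts the -k syntax and the b modifier described under OPTIONS; the sample data and variable names are hypothetical, and LC_ALL=C is set only to make the ordering predictable.

```shell
# Field 2 of the first line is "  b" (two leading blanks); of the second, " a".
input=$(printf 'x  b\nx a\n')

# Key = whole second field, leading blanks included. "  b" sorts before " a"
# because its second character (a blank) collates before "a" in the C locale.
with_blanks=$(printf '%s\n' "$input" | LC_ALL=C sort -k 2,2)

# Same key with the b modifier: leading blanks are skipped, so "a" < "b".
without_blanks=$(printf '%s\n' "$input" | LC_ALL=C sort -k 2b,2)

printf 'blanks counted:\n%s\nblanks ignored:\n%s\n' "$with_blanks" "$without_blanks"
```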
OPTIONS
No option specified
sort sorts the input lines lexicographically, treating each byte as
a single character. The characters are ordered according to the
collating sequence defined by LC_COLLATE.
Options that alter the behavior of sort
-c sort checks whether the input file is already sorted according to
the current ordering rules. If it is, nothing is output; other-
wise, the first line that does not match the ordering rules is
displayed.
Only one file may be specified with option -c. The options -m and
-o must not be combined with -c.
Together with -u: sort also checks that no two lines have
identical sort keys.
-m sort merges input files which are already sorted.
-m must not be combined with -c.
Page 1 Reliant UNIX 5.44 Printed 11/98
-o outputfile
outputfile is the name of a file to which the sorted contents of
the input file are to be written. The file named as outputfile
can also be one of the input files, but in this case the original
unsorted contents of the named file are overwritten.
Only one -o option may be specified. -o must not be combined
with -c.
-o outputfile not specified: sort writes on the standard output.
-T directory
Specifies a directory for temporary files.
-T not specified: Temporary files are created in /var/tmp.
-u (unique) Causes identical lines to be output once only. Lines
with identical sort keys are considered identical lines.
-y [kmem]
Option -y defines the memory size that sort uses to start with.
This initial size has a large impact on the speed with which the
file is sorted. It is a waste of memory or of CPU time to sort a
small file in a large amount of memory or a large file in a small
amount of memory respectively.
kmem Amount of memory (in Kbytes) initially assigned to sort.
If you assign a value above the maximum of 1 Mbyte or
below the minimum of 16 Kbyte, the corresponding extremum
will be used. Thus if you define a value of 0 (-y0), for
example, sort will start with minimum memory.
kmem not specified: sort starts with maximum memory.
-y [kmem] not specified:
sort starts with a system default memory size (32 Kbytes), and
continues to use more space if required.
-z recsz
With this option you allocate correctly sized buffers for the
merge phase. You only need to do this if you are using option -c
or -m, i.e. if you are not actually sorting the files.
If you are sorting the files, sort records the size of the long-
est line read in the sort phase so that buffers of the correct
size can be allocated during the merge phase.
If you are not sorting the files, sort normally uses a default
value for the buffer size. Lines longer than this will cause sort
to terminate abnormally. Supplying the actual number of bytes in
the longest line to be merged (or some larger value) will prevent
abnormal termination.
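The -c, -m, -u and -o options can be tried out on small scratch files. The following is only a sketch: the file names are hypothetical, mktemp -d is assumed to be available for creating a scratch directory, and any message that -c prints for the unsorted file is discarded.

```shell
workdir=$(mktemp -d)
printf 'apple\ncherry\n'  > "$workdir/a.txt"   # already sorted
printf 'banana\ncherry\n' > "$workdir/b.txt"   # already sorted
printf 'pear\nfig\n'      > "$workdir/c.txt"   # not sorted

# -c only checks: exit status 0 for sorted input, 1 for unsorted input.
sorted_status=0
LC_ALL=C sort -c "$workdir/a.txt" 2>/dev/null || sorted_status=$?
unsorted_status=0
LC_ALL=C sort -c "$workdir/c.txt" 2>/dev/null || unsorted_status=$?

# -m merges the presorted files, -u drops the duplicate "cherry",
# -o writes the result to a file instead of standard output.
LC_ALL=C sort -m -u -o "$workdir/merged.txt" "$workdir/a.txt" "$workdir/b.txt"
merged=$(cat "$workdir/merged.txt")

# The output file may be one of the input files; its contents are replaced.
LC_ALL=C sort -o "$workdir/c.txt" "$workdir/c.txt"
in_place=$(cat "$workdir/c.txt")

rm -rf "$workdir"
printf 'check: %s %s\nmerged:\n%s\nin place:\n%s\n' \
  "$sorted_status" "$unsorted_status" "$merged" "$in_place"
```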
Options that alter ordering rules
The following options can be specified in either of two ways:
- either as options before the first positioning specification:
They are then valid globally for all subsequent sort keys. When
using -k, the options must be placed before the first specification
of -k; in the case of +pos or -pos, the options can also be placed
between the additional positioning specifications, and are then
only valid for the subsequent sort keys.
- or as modifiers for individual sort keys:
You then cancel the global settings for the relevant sorting field,
i.e. a change to the ordering rule is only valid when made in
accordance with the specified modification.
Option letters appended to the field specification without a dash
or a blank act as modifiers (see Defining specific sort keys).
-b Ignores leading field separators when determining the start and
end of a sort key. Note that the b option is only effective when
sorting is based on sort keys (i.e. not on the whole line).
-d Performs a lexicographical sort, taking into account only the
characters for which the C functions isalnum(3C) and isspace(3C)
return a value of "true". These are the characters defined in the
current locale as alphanumeric letters, digits, or characters
producing white space, such as blanks or tabs.
-f Folds lowercase into uppercase before sorting, thus making no
distinction between them.
-i In non-numeric comparisons, ignores all characters for which the
C function isprint(3C) returns a value of "false", i.e. all char-
acters defined as non-printing in the current locale. If the col-
lating sequence is based on the ASCII table, for example, charac-
ters 001 through 037 (octal) and character 0177 (octal) are
ignored [see ascii(5)].
-M The first three characters of the sort key are converted to
uppercase, treated as names of months, and collated in calendar
order. The -M option implies the -b option.
-n Sorts numerically. A numeric value must come first in the sort
key and may consist of: blanks, minus signs, digits 0-9, and a
decimal point. The -n option implies the -b option, i.e. leading
blanks are ignored.
-r Reverses the collating sequence (sorting order).
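A few of the ordering options can be compared on the same input. This is a minimal sketch with made-up sample data; LC_ALL=C is assumed so that the default order is plain byte order and the month names are the English abbreviations.

```shell
numbers=$(printf '10\n9\n100\n')

# Default: lexicographic comparison, so "100" sorts before "9".
lexical=$(printf '%s\n' "$numbers" | LC_ALL=C sort)

# -n: compare by numeric value.
numeric=$(printf '%s\n' "$numbers" | LC_ALL=C sort -n)

# -n -r: numeric order, reversed.
reversed=$(printf '%s\n' "$numbers" | LC_ALL=C sort -n -r)

# -f: lowercase is folded into uppercase, so "b" falls between "A" and "C".
folded=$(printf 'b\nA\nC\n' | LC_ALL=C sort -f)

# -M: the keys are treated as month names and put in calendar order.
months=$(printf 'Mar\nJan\nFeb\n' | LC_ALL=C sort -M)

printf '%s\n---\n%s\n---\n%s\n---\n%s\n---\n%s\n' \
  "$lexical" "$numeric" "$reversed" "$folded" "$months"
```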
Option that alters field separators
-t x Uses the character you specify for x as the field separator.
Unlike default field separators, x is itself not part of a field.
It may, however, be part of a sort key, for example if the sort
key extends from the first to the third x-separated field. Every
field separator x is significant, i.e. xx delimits an empty
field.
-t not specified:
The default field separators apply (blanks and tabs). A sequence
of one or more default field separators forms part of the follow-
ing field.
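The following sketch shows that every occurrence of a -t separator counts, so two adjacent separators enclose an empty field. The colon-separated records are hypothetical sample data.

```shell
# In "a::3" the second field is empty; the third field is "3".
records=$(printf 'a::3\nb:x:1\n')

# Numeric sort on the third field: 1 comes before 3.
by_third=$(printf '%s\n' "$records" | LC_ALL=C sort -t : -k 3n,3)

# Sort on the second field: the empty field sorts before "x".
by_second=$(printf '%s\n' "$records" | LC_ALL=C sort -t : -k 2,2)

printf 'by field 3:\n%s\nby field 2:\n%s\n' "$by_third" "$by_second"
```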
Defining specific sort keys
When defining sort keys please note that sequences of letters defined
as one collating element in the current locale count as a single
letter. In a Spanish locale, for example, ch is a single collating
element.
Specifying sort keys with the new synopsis -k fieldseparator has the
same effect as using the old synopsis +pos1 [-pos2], but the two must
not be combined. Conversion to the new synopsis is recommended.
You can specify several sort keys. Where there are several sort keys,
sort first sorts by the first sort key, moves on to the next if the
first sort key is equal, and so on.
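How a second sort key breaks ties left by the first can be sketched as follows. The input lines are hypothetical; the first key is the first field, the second key is the second field compared numerically.

```shell
input=$(printf 'b 2\na 10\na 9\n')

# Key 1: field 1 (lexicographic). Key 2: field 2 (numeric), consulted only
# when two lines have the same first field.
sorted=$(printf '%s\n' "$input" | LC_ALL=C sort -k 1,1 -k 2n,2)

printf '%s\n' "$sorted"
```

Both lines starting with a compare equal on the first key, so the numeric second key decides and places 9 before 10.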
-k fieldseparator
With -k you define start and end of a sort key. In
fieldseparator you define the first and last character of the
sort key.
fieldseparator has the following format:
startfield[type][,endfield[type]]
whereby the number of the field and a character in the field can
be specified for startfield and endfield:
m[.n]
m and n are integers with the following significance:
m m specifies the number of the first or last field.
.n n specifies the number of the first character used in the
first field or the number of the last character used in the
last field.
n not specified:
The whole field is used, from its first character through to its
last character.
type modifies the sort key (see Options that alter ordering
rules).
+pos1 [-pos2]
+pos1 and -pos2 specify the start and end of a sort key on the
basis of the fields in the input lines.
+pos1 is the position of the first character in the sort key;
-pos2 refers to the first character after it. +pos1 must come
before -pos2.
-pos2 not specified:
The sort key extends from +pos1 to the end of the line.
The pos1 and pos2 arguments have the form:
m[.n][type]
where m and n are integers with the following significance:
m Skips m fields of the line, addressing field m+1.
.n Skips n characters plus the field separator as of the last
character of field m, thus addressing character n+1 within
field m+1. If the -b option is in effect, field separators
at the start of a field are not counted; thus, +m.nb refers
to the n+1th non-whitespace character after field m.
.n not specified:
Is equivalent to .0 and refers to the first character after
field m. If the -b option is in effect, field separators at
the start of a field are not counted; thus, +m.0b refers to
the first non-whitespace character in the m+1th field.
type Modifies the sort key (see Options that alter ordering
rules).
Example:
To specify a sort key that begins with the fourth character in
the second field and ends with this field, you enter:
sort -k 2.4,2 (new synopsis) or
sort +1.3 -2 (old synopsis)
Explanation:
       End         End    End
    Field1      Field2 Field3
         |           |      |
030-456537 A.Mackenzie Dublin
             |       |
Sort key:
2.4 Start at the 4th character of the 2nd field
+1.3 Skip field 1 and 3 characters:
the 4th character after field 1 is the 1st character in the
sort key: M
-2 Skip field 2 and 0 characters:
the 1st character after field 2 is the 1st character after
the sort field: blank. Thus the character before is the last
character in the sort key: e
Note that default field separators, unlike those defined
with option -t, are part of the following field. Hence the
first character of field 2 is the blank, the second charac-
ter is the A, and so on.
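The key from this example can be tried on two sample lines. This is only a sketch: the second input line is invented for the comparison, and only the -k form is used, because some current sort implementations no longer accept the old +pos1 -pos2 form by default.

```shell
input=$(printf '030-456537 A.Mackenzie Dublin\n089-123456 B.Anderson Munich\n')

# Whole second field as the key: " A.Mackenzie" sorts before " B.Anderson".
by_field=$(printf '%s\n' "$input" | LC_ALL=C sort -k 2,2)

# Key starting at the 4th character of field 2 (blank, initial and dot are
# skipped): "Anderson" sorts before "Mackenzie", so the order is swapped.
by_key=$(printf '%s\n' "$input" | LC_ALL=C sort -k 2.4,2)

printf 'by field 2:\n%s\nby key 2.4,2:\n%s\n' "$by_field" "$by_key"
```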
-- End of the list of options. Must be specified if file begins with
-.
file Name of the file you wish to sort.
You may name more than one file. All named files are sorted and
merged, and the input lines from all of them together are sorted
and written to standard output. In the input files, any letter
sequence defined as a collating element in the current locale
counts as a single letter. Thus in a Spanish locale ch is a sin-
gle collating element. When the last line in an input file is
missing a newline character, sort appends one, issues a warning,
and continues.
Only one file may be specified together with the -c option.
If you use a dash (-) as the name for file, sort reads from stan-
dard input.
file not specified: sort reads from standard input.
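The use of - and -- can be sketched as follows. The file names are hypothetical, and mktemp -d is assumed to be available for creating a scratch directory.

```shell
workdir=$(mktemp -d)
printf 'b\nd\n' > "$workdir/letters.txt"

# "-" stands for standard input, which is sorted together with the named file.
combined=$(printf 'c\na\n' | LC_ALL=C sort - "$workdir/letters.txt")

# A file whose name begins with a dash must follow "--"; otherwise the name
# would be taken for an option. The cd only affects the command substitution.
printf '2\n1\n' > "$workdir/-data"
dashed=$(cd "$workdir" && LC_ALL=C sort -- -data)

rm -rf "$workdir"
printf 'combined:\n%s\ndashed file:\n%s\n' "$combined" "$dashed"
```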
EXIT STATUS
0 All input files were processed correctly. The input file was
sorted correctly when -c was specified.
1 -c specified: The input file was not sorted correctly. -c -u
specified: Lines of input with identical sort keys were found.
>1 Error
LOCALE
The LC_MESSAGES environment variable governs the language in which
message texts are displayed.
LC_COLLATE governs the preset collating sequence used by the sort
command.
LC_CTYPE governs how character classes are handled by the -b, -d, -f
and -i options.
LC_NUMERIC governs the form of the radix character (decimal point) in
conjunction with the -n option.
LC_TIME governs the currently valid month names, their abbreviations
and their collating sequence in conjunction with option -M.
Answers to yes/no queries must be given in the language appropriate to
the current locale.
If LC_MESSAGES, LC_COLLATE, LC_CTYPE, LC_NUMERIC or LC_TIME is
undefined or is defined as the null string, it defaults to the value of
LANG. If LANG is likewise undefined or null, the system acts as if it
were not internationalized.
The LC_ALL environment variable governs the entire locale. LC_ALL
takes precedence over all the other environment variables which affect
internationalization.
If any of the locale variables has an invalid value, the system acts
as if none of the variables were set.
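The influence of the locale variables on the collating sequence can be sketched as follows. Only the C locale is relied on here; the locale name en_US.UTF-8 is an assumption and need not be installed, since the example only shows that LC_ALL overrides LC_COLLATE.

```shell
input=$(printf 'b\nA\na\nB\n')

# In the C locale, characters collate by byte value: uppercase letters
# come before lowercase letters.
c_order=$(printf '%s\n' "$input" | LC_ALL=C sort)

# LC_ALL takes precedence over LC_COLLATE, so the result is the same
# whatever value LC_COLLATE has.
override=$(printf '%s\n' "$input" | LC_COLLATE=en_US.UTF-8 LC_ALL=C sort)

printf 'C locale:\n%s\nLC_ALL over LC_COLLATE:\n%s\n' "$c_order" "$override"
```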
EXAMPLES
Example 1
Sorting the contents of inputfile with the second field as the sort
key.
$ sort -k 2,2 inputfile
Example 2
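Sorting the contents of inputfile1 and inputfile2 in reverse order,
placing the output in outputfile, and using the first two characters
of the second field as the sort key. With the default field
separators, the first of these characters is the separator (blank or
tab) that precedes the text of the field.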
$ sort -r -o outputfile -k 2.1,2.2 inputfile1 inputfile2
Example 3
Sorting the contents of inputfile1 and inputfile2 in reverse order,
placing the output in outputfile, and using the first non-blank
character of the second field as the sort key.
$ sort -r -o outputfile -k 2.0b,2.1b inputfile1 inputfile2
Example 4
Displaying the /etc/passwd file, sorted by the numeric user ID (field
3).
$ sort -t : -k 3n,3 /etc/passwd
Example 5
Displaying the presorted file inputfile, suppressing all but the
first occurrence of lines having the same third field.
$ sort -u -k 3,3 inputfile
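Examples 4 and 5 can be tried without touching real files. The following sketch feeds hypothetical sample data to the same commands; the account entries are invented, and cut is used only to shorten the output to the login names.

```shell
# Sample lines in /etc/passwd format; field 3 is the numeric user ID.
passwd_sample=$(printf '%s\n' \
  'root:x:0:0:root:/root:/bin/sh' \
  'alice:x:100:100::/home/alice:/bin/sh' \
  'daemon:x:2:2:daemon:/usr/sbin:/bin/sh' \
  'bob:x:10:10::/home/bob:/bin/sh')

# Example 4: numeric sort on field 3 gives 0, 2, 10, 100
# (a lexicographic sort would give 0, 10, 100, 2).
by_uid=$(printf '%s\n' "$passwd_sample" | LC_ALL=C sort -t : -k 3n,3 | cut -d : -f 1)

# Example 5: with -u, only one line is kept per distinct third field.
unique_count=$(printf 'a b 1\nc d 1\ne f 2\n' | LC_ALL=C sort -u -k 3,3 | wc -l | tr -d ' ')

printf 'login names by user ID:\n%s\nlines kept by -u: %s\n' "$by_uid" "$unique_count"
```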
FILES
/var/tmp/stm???
Temporary files
SEE ALSO
comm(1), join(1), uniq(1), ctype(3C).