sort
PURPOSE
Sorts or merges files.
SYNOPSIS
sort [ -Acmubdfinrtx ] [ +pos1[ -pos2]]... [ -ooutput ] [ names ]
DESCRIPTION
The sort command sorts lines in its input files and
writes the result to standard output. It treats all of
its input files as one file when it performs the sort. A
- (minus) in place of a file name specifies standard
input. If you do not specify any file names, it sorts
standard input.
The default sort key (the part of the line used for
sorting) is an entire line. Default ordering is
lexicographic by characters in the collating sequence.
The file /usr/pub/ascii shows the default collating
sequence. To change the default collating sequence, see
"ctab."
A sort key is defined by a starting position and an optional
ending position. Each position has two parts, fskip and
cskip, as follows:
+fskip.cskip
-fskip.cskip
The fskip specifies the number of fields to skip from the
beginning of the input line, and cskip specifies the
number of additional characters to skip to the right
beyond that point. For both the starting point
(+fskip.cskip) and the ending point (-fskip.cskip) of a
sort key, fskip is measured from the beginning of the
input line, and cskip is measured from the last field
skipped. If you omit .cskip, .0 is assumed. If you omit
fskip, 0 is assumed. If you omit the ending field
specifier (-fskip.cskip), the end of the line is the end
of the sort key.
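As a hedged illustration of how fskip and cskip select a key: many current sort implementations no longer accept the +fskip.cskip notation and take the POSIX -k form instead, which counts fields and characters from 1 rather than counting skips. Under that assumption, +1.1 corresponds to -k 2.2. The sample records below are hypothetical and are not taken from this manual.

```shell
# Assumption: a POSIX sort that accepts -k; the data is made up for illustration.
export LC_ALL=C   # force the ASCII collating sequence

# "+1.1" skips one field and then one more character, so the key starts
# at the second character of the second field: -k 2.2 in POSIX terms.
# With no ending position, the key runs to the end of the line.
result=$(printf '%s\n' 'x:bbc' 'y:aab' 'z:cca' | sort -t: -k 2.2)
printf '%s\n' "$result"
# Keys compared are "bc", "ab", and "ca".
```

Omitting the ".2" part (-k 2, the counterpart of +1) would start the key at the first character of the second field.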
You can supply more than one sort key by repeating
+fskip.cskip and -fskip.cskip. When you specify more than
one sort key, a key further to the right on the command
line is compared only when all earlier keys compare
equal. For example, if the first key is sorted in
numerical order and the second in dictionary order, all
lines whose first key is 1 are sorted alphabetically among
themselves, and all of them precede the lines whose first
key is 2. Lines that are identical in all keys are
sorted with all characters significant. You can also
specify different flags for different sort keys. See the
examples for illustration.
A field is one or more characters bounded by the beginning
of a line and the current field separator, or one or more
characters bounded by the field separator on either side.
The space character is the default field separator.
Notes:
1. Lines longer than 1024 characters are truncated.
2. The maximum number of fields on a line is 10.
FLAGS
-A Sorts on a byte-by-byte basis. This sort is
functionally compatible with the Version 1.1
sort command, prior to the addition of
international character support.
-b Ignores leading blanks (spaces and tabs) in
sort key comparisons.
-c Checks that the input is sorted according to
the ordering rules specified in the flags.
Displays nothing unless the file is not
sorted.
-d Sorts in dictionary order. Only letters,
digits, and blanks are considered in comparisons.
-f Folds uppercase and lowercase letters together.
Case is not considered in the sorting, so
initial-capital words and all-capital words
are not grouped together at the beginning of
the output.
-i Sorts only by characters in the ASCII range
octal 040-0176 (all printable characters and
the space character) in nonnumeric comparisons.
-m Merges only; the input files are assumed to be already sorted.
-n Sorts any initial numeric strings (consisting
of optional blanks, an optional minus sign,
and zero or more digits with an optional
decimal point) by arithmetic value. The -n
flag automatically gives you the -b flag.
-o outfile Directs output to outfile instead of
standard output. outfile can be the same as
one of the input files.
-r Reverses the order of the specified sort.
-tchar Sets the field separator character to char. To
specify the tab character as the field separator,
you must enclose it in single quotation marks
("' '").
-T Uses current directory instead of default
directory for temporary files.
-u Suppresses all but one in each set of equal
lines. Ignored characters (such as leading
tabs and spaces) and characters outside of
sort keys are not considered in this type of
comparison.
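The -c and -m flags described above can be tried with a short sketch like the following. The file names and contents are hypothetical, and the use of the exit status to report the result of -c is an assumption taken from POSIX sort behavior; this manual only states that -c displays nothing when the input is sorted.

```shell
# Assumption: a POSIX sort; the temporary files and their contents are
# invented for this illustration.
export LC_ALL=C
tmp=$(mktemp -d)
printf 'apple\nbanana\n'   > "$tmp/a"          # already sorted
printf 'apricot\ncherry\n' > "$tmp/b"          # already sorted
printf 'pear\napple\n'     > "$tmp/unsorted"   # out of order

# -c checks the order without sorting; here the result is read from the
# exit status (0 when sorted, nonzero otherwise).
if sort -c "$tmp/a"; then a_sorted=yes; else a_sorted=no; fi
if sort -c "$tmp/unsorted" 2>/dev/null; then u_sorted=yes; else u_sorted=no; fi
printf 'a: %s, unsorted: %s\n' "$a_sorted" "$u_sorted"

# -m merges inputs that are already sorted, without re-sorting them.
merged=$(sort -m "$tmp/a" "$tmp/b")
printf '%s\n' "$merged"

rm -rf "$tmp"
```

Running -c first is a cheap way to confirm that files are suitable input for -m.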
EXAMPLES
1. To perform a simple sort:
sort fruits
This displays the contents of "fruits" sorted in
ascending lexicographic order. This means that the
characters in each column are compared one by one,
including spaces, digits, and special characters.
For instance, if "fruits" contains the text:
banana
orange
Persimmon
apple
%%banana
apple
ORANGE
then sort displays:
%%banana
ORANGE
Persimmon
apple
apple
banana
orange
This order follows from the fact that in the ASCII
collating sequence, "%" (percent sign) precedes the
uppercase letters, which precede the lowercase
letters. If the system uses a character set other
than ASCII, your results may be different.
2. To sort in dictionary order:
sort -d fruits
This sorts and displays the contents of "fruits",
comparing only letters, digits, and blanks. If
"fruits" is the same as in Example 1, then sort
displays:
ORANGE
Persimmon
apple
apple
%%banana
banana
orange
The "-d" flag tells sort to ignore the "%" character
because it is not a letter, digit, or blank. This
puts "%%banana" next to "banana".
3. To group lines that contain uppercase and special
characters with similar lowercase lines:
sort -d -f fruits
This ignores special characters ("-d") and differences
in case ("-f"). Given the "fruits" of Example
1, this displays:
apple
apple
%%banana
banana
ORANGE
orange
Persimmon
4. To sort as in Example 3 and remove duplicate lines:
sort -d -f -u fruits
The "-u" flag tells sort to remove duplicate lines,
making each line of the file unique. This displays:
apple
%%banana
orange
Persimmon
Note that not only was the duplicate "apple" removed,
but "banana" and "ORANGE" as well. These were
removed because the "-d" told sort to treat
"%%banana" as if it were "banana", and the "-f" told
it to treat "ORANGE" as "orange". Thus, sort considered
"%%banana" to be a duplicate of "banana" and
"ORANGE" a duplicate of "orange".
Note: There is no way to predict which duplicate
lines "sort -u" will keep and which it will remove.
5. To sort as in Example 3 and remove duplicates, unless
capitalized or punctuated differently:
sort -u +0 -d -f +0 fruits
The "+0 -d -f" does the same type of sort done with
"-d -f" in Example 3. Then the "+0" performs another
comparison to distinguish lines that are not actually
identical. This prevents "-u" from removing them.
Given the "fruits" file shown in Example 1, the added
"+0" distinguishes "%%banana" from "banana" and
"ORANGE" from "orange". However, the two instances
of "apple" are identical, so one of them is deleted.
apple
%%banana
banana
ORANGE
orange
Persimmon
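For readers whose sort does not accept the +0 notation, the same idea can be sketched with POSIX -k keys. This is an assumption about a POSIX-style sort, not something this manual documents: "+0 -d -f" is written as "-k 1df" and the second "+0" as "-k 1". The data is the "fruits" file from Example 1, fed on standard input.

```shell
# Assumption: a POSIX sort that accepts -k with per-key flags.
export LC_ALL=C
fruits='banana
orange
Persimmon
apple
%%banana
apple
ORANGE'

# First key: whole line in dictionary order, ignoring case.
# Second key: whole line compared exactly, so that -u treats only
# truly identical lines as duplicates.
result=$(printf '%s\n' "$fruits" | sort -u -k 1df -k 1)
printf '%s\n' "$result"
```

Only the second "apple" is dropped, because it is the only line that matches another line on both keys.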
6. To specify the character that separates fields:
sort -t: +1 vegetables
This sorts "vegetables", comparing the text that
follows the first colon on each line. The "+1" tells
sort to ignore the first field and to compare from
the start of the second field to the end of the line.
The "-t:" tells sort that colons separate fields. If
"vegetables" contains:
yams:104
turnips:8
potatoes:15
carrots:104
green beans:32
radishes:5
lettuce:15
then sort displays:
carrots:104
yams:104
lettuce:15
potatoes:15
green beans:32
radishes:5
turnips:8
Note that the numbers are not in numeric order. This
happens because a lexicographic sort compares the keys
character by character from left to right. In other words,
"3" comes before "5", so "32" comes before "5", and "104"
comes before "15" because "0" comes before "5".
7. To sort numbers:
sort -t: +1 -n vegetables
This sorts "vegetables" numerically on the second
field. If "vegetables" is the same as in Example 6,
then sort displays:
radishes:5
turnips:8
lettuce:15
potatoes:15
green beans:32
carrots:104
yams:104
8. To sort on more than one field:
sort -t: +1 -2 -n +0 -1 -r vegetables
This performs a numeric sort on the second field
("+1 -2 -n"). Within that ordering, it sorts the
first field in reverse alphabetic order ("+0 -1 -r").
The output looks like this:
radishes:5
turnips:8
potatoes:15
lettuce:15
green beans:32
yams:104
carrots:104
Now the lines are sorted in numeric order. When two
lines have the same number, they appear in reverse
alphabetic order.
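The same two-key sort can be sketched in the POSIX -k notation, on the assumption that the local sort accepts it: "+1 -2 -n" becomes "-k 2,2n" and "+0 -1 -r" becomes "-k 1,1r". The input is the "vegetables" data from Example 6, supplied on standard input.

```shell
# Assumption: a POSIX sort that accepts -k with per-key flags.
export LC_ALL=C
result=$(printf '%s\n' 'yams:104' 'turnips:8' 'potatoes:15' 'carrots:104' \
        'green beans:32' 'radishes:5' 'lettuce:15' |
    sort -t: -k 2,2n -k 1,1r)
printf '%s\n' "$result"
```

Each flag letter attached to a key applies to that key only, which mirrors placing "-n" and "-r" after their respective positions in the older notation.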
9. To replace the original file with the sorted text:
sort -o vegetables vegetables
This stores the sorted output into the file
"vegetables" ("-o vegetables").
FILES
sort.c Contains sort definitions.
RELATED INFORMATION
The following commands: "comm," "join," and "uniq."
The "Overview of International Character Support" in
Managing the AIX Operating System.