sort(1)

NAME

sort − sort and/or merge files

SYNOPSIS

sort [ −cmu ] [ −ooutput ] [ −T directory ] [ −ykmem ] [ −zrecsz ] [ −dfiMnr ] [ −btx ]

[ +pos1 [ −pos2 ]] [ filename...]

AVAILABILITY

SUNWdoc

DESCRIPTION

The sort command sorts lines of all the named files together and writes the result on the standard output.  The standard input is read if ’−’ is used as a file name or no input-files are named. 

Comparisons are based on one or more sort keys extracted from each line of input.  By default, there is one sort key, the entire input line, and ordering is lexicographic by bytes in machine collating sequence. 
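The default ordering can be modelled in a few lines of Python. This is only a sketch of the rule stated above (whole line as the key, bytes compared in machine collating sequence); the name byte_order_sort is hypothetical and is not part of sort.

```python
def byte_order_sort(lines):
    """Sort whole lines by unsigned byte value, as in the default ordering."""
    # Python bytes objects compare byte by byte, which matches a
    # lexicographic comparison in machine collating sequence.
    return sorted(lines)


lines = [b"banana", b"Apple", b"apple", b"10", b"9"]
print(byte_order_sort(lines))
```

In ASCII, digits precede upper-case letters, which precede lower-case letters, so "10" sorts before "9" and "Apple" before "apple".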

OPTIONS

The following options alter the default behavior:

−c Check that the input-file is sorted according to the ordering rules; give no output unless the file is out of sort. 

−m Merge only; the input-files are assumed to be already sorted. 

−u Unique: suppress all but one in each set of lines having equal keys. 

−ooutput The argument given is the name of an output-file to use instead of the standard output.  This file may be the same as one of the inputs.  There may be optional blanks between −o and output. 

−T directory The directory argument is the name of a directory in which to place temporary files. 

−ykmem The amount of main memory used by sort has a large impact on its performance.  Sorting a small file in a large amount of memory is a waste.  If this option is omitted, sort begins using a system default memory size, and continues to use more space as needed.  If this option is presented with a value kmem, sort will start using that number of kilobytes of memory, unless the administrative minimum or maximum is violated, in which case the corresponding extremum will be used.  Thus, −y0 is guaranteed to start with minimum memory.  By convention, −y (with no argument) starts with maximum memory. 

−zrecsz The size of the longest line read is recorded in the sort phase so buffers can be allocated during the merge phase.  If the sort phase is omitted via the −c or −m options, a popular system default size will be used.  Lines longer than the buffer size will cause sort to terminate abnormally.  Supplying the actual number of bytes in the longest line to be merged (or some larger value) will prevent abnormal termination. 
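The effect of the −c, −m, and −u options can be illustrated with a small Python model. The function names are hypothetical, and the choice of which line −u retains (here, the first of each run) is an assumption of this sketch; the text above promises only that one line per set of equal keys survives.

```python
import heapq


def is_sorted(lines, key=lambda line: line):
    """Model of -c: True when no line sorts before its predecessor."""
    return all(key(a) <= key(b) for a, b in zip(lines, lines[1:]))


def merge_sorted(*inputs, key=lambda line: line):
    """Model of -m: merge inputs that are each already sorted."""
    return list(heapq.merge(*inputs, key=key))


def unique_by_key(sorted_lines, key=lambda line: line):
    """Model of -u: keep one line from each run of equal keys."""
    kept = []
    for line in sorted_lines:
        # Assumption: the first line of each run is the one retained.
        if not kept or key(kept[-1]) != key(line):
            kept.append(line)
    return kept
```

For example, unique_by_key(["a 1", "a 2", "b 1"], key=lambda l: l.split()[0]) keeps "a 1" and "b 1".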

The following options override the default ordering rules. 

−d “Dictionary” order: only letters, digits, and blanks (spaces and tabs) are significant in comparisons. 

−f Fold lower-case letters into upper case. 

−i Ignore non-printable characters. 

−M Compare as months.  The first three non-blank characters of the field are folded to upper case and compared.  For example, in English the sorting order is "JAN" < "FEB" < ... < "DEC".  Invalid fields compare low to "JAN".  The −M option implies the −b option (see below). 

−n An initial numeric string, consisting of optional blanks, optional minus sign, and zero or more digits with optional decimal point, is sorted by arithmetic value.  The −n option implies the −b option (see below).  Note:  The −b option is only effective when restricted sort key specifications are in effect. 

−r Reverse the sense of comparisons. 
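The −n and −M rules can be approximated as key functions in Python. This is a sketch under stated assumptions, not the sort implementation: numeric_key and month_key are hypothetical names, a field with no digits is assumed to count as zero, and the month table uses the English abbreviations mentioned above.

```python
import re
from decimal import Decimal

# Optional blanks, optional minus sign, digits, optional decimal point.
_NUMERIC_PREFIX = re.compile(r"[ \t]*(-?)(\d*)(?:\.(\d*))?")

_MONTHS = ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
           "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"]


def numeric_key(field):
    """Model of -n: arithmetic value of the initial numeric string."""
    match = _NUMERIC_PREFIX.match(field)  # always matches; every part is optional
    sign, whole, frac = match.group(1), match.group(2), match.group(3) or ""
    if not whole and not frac:
        return Decimal(0)  # assumption: no digits at all compares as zero
    return Decimal(f"{sign}{whole or '0'}.{frac or '0'}")


def month_key(field):
    """Model of -M: rank of the first three non-blank characters."""
    abbrev = field.lstrip(" \t")[:3].upper()
    # Invalid fields get rank 0, so they compare low to "JAN" (rank 1).
    return _MONTHS.index(abbrev) + 1 if abbrev in _MONTHS else 0
```

With these keys, sorted(["10", "9", "-2", " 3.5"], key=numeric_key) places "-2" first and "10" last, where a byte comparison would place "10" before "9".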

When ordering options appear before restricted sort key specifications, the requested ordering rules are applied globally to all sort keys.  When attached to a specific sort key (described below), the specified ordering options override all global ordering options for that key. 
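The character-level ordering options (−d, −f, −i, −r) can be viewed as transformations applied to a key before comparison. The following Python sketch assumes ASCII input and uses hypothetical function names; Python's own notions of "alphanumeric" and "printable" stand in for the locale-dependent definitions sort would use.

```python
def dictionary_key(text):
    """Model of -d: only letters, digits, and blanks are significant."""
    return "".join(ch for ch in text if ch.isalnum() or ch in " \t")


def fold_key(text):
    """Model of -f: fold lower-case letters into upper case."""
    return text.upper()


def printable_key(text):
    """Model of -i: ignore non-printable characters."""
    return "".join(ch for ch in text if ch.isprintable())


words = ["co-op", "Cat", "cob"]
# Roughly "sort -df": fold case after discarding punctuation.
print(sorted(words, key=lambda w: fold_key(dictionary_key(w))))
# Roughly "sort -dfr": the same comparison with its sense reversed.
print(sorted(words, key=lambda w: fold_key(dictionary_key(w)), reverse=True))
```

With −d and −f in effect, "cob" sorts before "co-op" because the hyphen is ignored; a plain byte comparison would put "co-op" first.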

The notation +pos1 −pos2 restricts a sort key to one beginning at pos1 and ending just before pos2. The characters at position pos1 and just before pos2 are included in the sort key (provided that pos2 does not precede pos1). A missing −pos2 means the end of the line. 

Specifying pos1 and pos2 involves the notion of a field, a minimal sequence of characters followed by a field separator or a new-line.  By default, the first blank (space or tab) of a sequence of blanks acts as the field separator.  All blanks in a sequence of blanks are considered to be part of the next field; for example, all blanks at the beginning of a line are considered to be part of the first field.  The treatment of field separators can be altered using the options:
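The default field rule is easy to misread, so a small Python model may help. It is a sketch only: split_blank_fields is a hypothetical name, and treating trailing blanks as a final field of their own is an assumption the text above does not settle.

```python
import re

# A field is any run of blanks followed by a run of non-blanks, so the
# blanks that separate two fields are attached to the later field.
# Assumption: trailing blanks at the end of the line form a last field.
_FIELD = re.compile(r"[ \t]*[^ \t]+|[ \t]+$")


def split_blank_fields(line):
    """Split a line into fields using the default blank-separator rule."""
    return _FIELD.findall(line)


print(split_blank_fields("  alpha  beta gamma"))
```

The leading blanks stay with "alpha", and both blanks before "beta" belong to the second field, which is why a key such as +1 begins with blanks unless the b flag is used.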

−b Ignore leading blanks when determining the starting and ending positions of a restricted sort key.  If the −b option is specified before the first +pos1 argument, it will be applied to all +pos1 arguments.  Otherwise, the b flag may be attached independently to each +pos1 or −pos2 argument (see below). 

−tx Use x as the field separator character; x is not considered to be part of a field (although it may be included in a sort key).  Each occurrence of x is significant (for example, xx delimits an empty field). 
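With −tx the separator is excluded from every field and each occurrence counts. Python's str.split with an explicit separator behaves the same way, which gives a quick model; the function name and the sample line are hypothetical.

```python
def split_on_separator(line, sep):
    """Model of -tx: every occurrence of sep ends a field, even an empty one."""
    return line.split(sep)


# Hypothetical passwd-style line with an empty fifth field.
print(split_on_separator("root:x:0:0::/root:/bin/sh", ":"))
```

The adjacent "::" yields an empty string between "0" and "/root", matching the statement that xx delimits an empty field.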

pos1 and pos2 each have the form m.n optionally followed by one or more of the flags bdfinr.  A starting position specified by +m.n is interpreted to mean the n+1st character in the m+1st field.  A missing .n means .0, indicating the first character of the m+1st field.  If the b flag is in effect, n is counted from the first non-blank in the m+1st field; +m.0b refers to the first non-blank character in the m+1st field.

A last position specified by −m.n is interpreted to mean the nth character (including separators) after the last character of the mth field.  A missing .n means .0, indicating the last character of the mth field.  If the b flag is in effect, n is counted from the last leading blank in the m+1st field; −m.1b refers to the first non-blank in the m+1st field.
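The position arithmetic above can be checked with a short Python model that turns +m.n and −m.n into string offsets. Everything here is a sketch: field_spans, leading_blanks, and extract_key are hypothetical names, the handling of trailing blanks and of positions past the end of the line is an assumption, and the ordering flags (dfinr) are left out.

```python
import re


def field_spans(line, sep=None):
    """Return (start, end) offsets of each field; end is exclusive."""
    if sep is None:
        # Default rule: blanks belong to the field that follows them.
        return [m.span() for m in re.finditer(r"[ \t]*[^ \t]+|[ \t]+$", line)]
    spans, start = [], 0
    for field in line.split(sep):
        spans.append((start, start + len(field)))
        start += len(field) + len(sep)
    return spans


def leading_blanks(line, start, end):
    """Count the blanks at the start of line[start:end]."""
    count = 0
    while start + count < end and line[start + count] in " \t":
        count += 1
    return count


def extract_key(line, m1, n1=0, m2=None, n2=0, sep=None, b1=False, b2=False):
    """Return the key selected by +m1.n1 [-m2.n2]; b1/b2 model the b flag."""
    spans = field_spans(line, sep)

    def span(i):
        # Assumption: a missing field is an empty field at the end of the line.
        return spans[i] if i < len(spans) else (len(line), len(line))

    start, stop = span(m1)
    begin = start + (leading_blanks(line, start, stop) if b1 else 0) + n1
    if m2 is None:
        return line[begin:]  # a missing -pos2 means the end of the line
    if b2:
        # Counted from the last leading blank of field m2+1.
        start2, stop2 = span(m2)
        end = start2 + leading_blanks(line, start2, stop2) + n2
    elif m2 == 0:
        end = n2
    else:
        # n2 characters after the last character of field m2.
        end = span(m2 - 1)[1] + n2
    return line[begin:end]
```

Under these assumptions, extract_key("alpha  beta gamma", 1, m2=2) models +1 −2 and returns "  beta" with its leading blanks, while extract_key("alpha  beta gamma", 1, 0, 1, 1, b1=True, b2=True) models +1.0b −1.1b and returns "b".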

When there are multiple sort keys, later keys are compared only after all earlier keys compare equal.  Lines that otherwise compare equal are ordered with all bytes significant. 
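The comparison order for multiple keys maps naturally onto tuple comparison. In this Python sketch (hypothetical names, ASCII text assumed so that string comparison matches byte comparison), each key function extracts one sort key, and the whole line is appended as the final tie-breaker.

```python
def multi_key_sort(lines, key_funcs):
    """Sort by each key in turn, then by the whole line as a last resort."""
    def composite(line):
        # Tuples compare element by element, so a later key is consulted
        # only when every earlier key compares equal.
        return tuple(func(line) for func in key_funcs) + (line,)
    return sorted(lines, key=composite)


lines = ["b 2 z", "a 2", "b 1", "b 2 a"]
first_field = lambda line: line.split()[0]
second_as_number = lambda line: int(line.split()[1])
print(multi_key_sort(lines, [first_field, second_as_number]))
```

The two lines with keys ("b", 2) are separated only by the final whole-line comparison, which places "b 2 a" before "b 2 z".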

EXAMPLES

Sort the contents of input-file with the second field as the sort key:

example% sort +1 -2 input-file

Sort, in reverse order, the contents of input-file1 and input-file2, placing the output in output-file and using the first character of the second field as the sort key:

example% sort -r -o output-file +1.0 -1.2 input-file1 input-file2

Sort, in reverse order, the contents of input-file1 and input-file2 using the first non-blank character of the second field as the sort key:

example% sort -r +1.0b -1.1b input-file1 input-file2

Print the password file, passwd(4), sorted by the numeric user ID (the third colon-separated field):

example% sort -t: +2n -3 /etc/passwd
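For comparison, the same ordering can be expressed in Python. This is a sketch with hypothetical sample entries rather than a real password file; sort_by_uid is a made-up name, and well-formed numeric IDs are assumed.

```python
def sort_by_uid(passwd_lines):
    """Order passwd-style lines by the numeric third colon-separated field."""
    def uid_then_line(line):
        # The whole line breaks ties between equal user IDs.
        return (int(line.split(":")[2]), line)
    return sorted(passwd_lines, key=uid_then_line)


# Hypothetical entries for illustration only.
entries = [
    "lp:x:71:8::/usr/spool/lp:",
    "root:x:0:1::/:/sbin/sh",
    "uucp:x:5:5::/usr/lib/uucp:",
    "demo:x:10:1::/home/demo:/bin/sh",
]
for entry in sort_by_uid(entries):
    print(entry)
```

The numeric comparison places user ID 5 before 10; comparing the field as bytes would reverse those two lines.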

Print the lines of the already sorted file input-file, suppressing all but the first occurrence of lines having the same third field (the options −um with just one input-file make the choice of a unique representative from a set of equal lines predictable):

example% sort -um +2 -3 input-file

ENVIRONMENT

If any of the LC_* variables (LC_CTYPE, LC_MESSAGES, LC_TIME, LC_COLLATE, LC_NUMERIC, and LC_MONETARY) (see environ(5)) are not set in the environment, the operational behavior of sort for each corresponding locale category is determined by the value of the LANG environment variable.  If LC_ALL is set, its contents are used to override both the LANG and the other LC_* variables.  If none of the above variables is set in the environment, the "C" (U.S. style) locale determines how sort behaves. 
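The precedence among these variables can be summarised as a lookup. The following Python sketch uses a hypothetical helper, effective_locale, and assumes that an empty value counts as unset.

```python
def effective_locale(category, env):
    """Return the locale name that governs one LC_* category."""
    # LC_ALL overrides everything, then the category's own variable,
    # then LANG, and finally the "C" locale.
    for name in ("LC_ALL", category, "LANG"):
        if env.get(name):  # assumption: an empty string counts as unset
            return env[name]
    return "C"


print(effective_locale("LC_COLLATE", {"LANG": "fr", "LC_COLLATE": "de"}))
```

Here LC_COLLATE takes precedence over LANG, so the result is "de"; adding LC_ALL to the mapping would override both.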

LC_CTYPE
Determines how sort handles characters. When LC_CTYPE is set to a valid value, sort can display and handle text and filenames containing valid characters for that locale.  sort can display and handle Extended Unix Code (EUC) characters where any individual character can be 1, 2, or 3 bytes wide.  sort can also handle EUC characters of 1, 2, or more column widths. In the "C" locale, only characters from ISO 8859-1 are valid. 

LC_MESSAGES
Determines how diagnostic and informative messages are presented. This includes the language and style of the messages, and the correct form of affirmative and negative responses.  In the "C" locale, the messages are presented in the default form found in the program itself (in most cases, U.S. English).

LC_TIME
Determines how sort handles date and time formats.  In the "C" locale, date and time handling follows the U.S.  rules. 

FILES

/var/tmp/stm??? 

SEE ALSO

comm(1), join(1), uniq(1), environ(5)

DIAGNOSTICS

sort prints a diagnostic message and exits with non-zero status for various trouble conditions (for example, when input lines are too long), and for disorders discovered under the −c option. 

NOTES

When the last line of an input-file is missing a new-line character, sort appends one, prints a warning message, and continues. 

sort does not guarantee preservation of relative line ordering on equal keys. 

SunOS 5.1  —  Last change: 14 Sep 1992
