sort(1)

NAME

sort − sort and/or merge files

SYNOPSIS

sort [−cmu] [−ooutput] [−ykmem] [−zrecsz] [−Tdir] [−tx] [−bdfilnrM] [−kkeydef] [file ...]

sort [−cmu] [−ooutput] [−ykmem] [−zrecsz] [−Tdir] [−tx] [−bdfilnrM] [+pos1 [−pos2]] [file ...]

DESCRIPTION

sort sorts lines of all the named files together and writes the result on the standard output.  The standard input is read if − is used as a file name or no input files are specified. 

Comparisons are based on one or more sort keys extracted from each line of input.  By default, there is one sort key, the entire input line, and ordering is lexicographic by bytes in machine collating sequence. 

Behavior Modification Options

The following options alter the default behavior:

−c Check that the input file is sorted according to the ordering rules; give no output unless the file is out of sort. 

−m Merge only; the input files are assumed to be already sorted. 

−u Unique: suppress all but one in each set of lines having equal keys. 

−ooutput The argument given is the name of an output file to use instead of the standard output.  This file can be the same as one of the inputs.  There can be optional blanks between −o and output. 

−ykmem The amount of main memory used by the sort has a large impact on its performance.  Sorting a small file in a large amount of memory is a waste.  If this option is omitted, sort begins using a system default memory size, and continues to use more space as needed.  If this option is presented with a value, kmem, sort will start using that number of kilobytes of memory, unless the administrative minimum or maximum is violated, in which case the corresponding extremum will be used.  Thus, −y0 is guaranteed to start with minimum memory.  By convention, −y (with no argument) starts with maximum memory. 

−zrecsz The size of the longest line read is recorded in the sort phase so buffers can be allocated during the merge phase.  If the sort phase is omitted via the −c or −m options, a popular system default size will be used.  Lines longer than the buffer size will cause sort to terminate abnormally.  Supplying the actual number of bytes in the longest line to be merged (or some larger value) will prevent abnormal termination. 

−Tdir Use dir as the directory for temporary sort records rather than the default directory, which is /usr/tmp. 
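The following is a minimal sketch of the −c, −m, −u and −o options, not a transcript from an HP-UX 8.05 system. It assumes a POSIX-style shell and sort, writes ASCII hyphens for the option dashes, and uses a scratch directory from mktemp(1). The file names are hypothetical, and LC_ALL=C (a variable this page does not describe) is set only to force byte ordering so the results are reproducible.

```shell
# Work in a scratch directory so no existing files are touched.
tmp=$(mktemp -d)
printf 'pear\napple\napple\nfig\n' > "$tmp/a.txt"
printf 'banana\ncherry\n' > "$tmp/b.txt"      # already in sorted order

# -u: keep one line from each set of equal lines.
uniq_out=$(LC_ALL=C sort -u "$tmp/a.txt")

# -o: the output file may be the same as an input file.
LC_ALL=C sort -u -o "$tmp/a.txt" "$tmp/a.txt"

# -c: no output; the exit status reports whether the input is in order.
sorted_status=0
LC_ALL=C sort -c "$tmp/a.txt" || sorted_status=$?
unsorted_status=0
printf 'b\na\n' | LC_ALL=C sort -c 2>/dev/null || unsorted_status=$?

# -m: merge inputs that are each already sorted.
merged=$(LC_ALL=C sort -m "$tmp/a.txt" "$tmp/b.txt")

rm -r "$tmp"
```

Because −c reports only through its exit status and a diagnostic, the sketch captures the status instead of reading output.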

Ordering Rule Options

The following options override the default ordering rules:

−d Quasi-dictionary order: only letters (isalpha()), digits (isdigit()) and blanks (spaces and tabs) are significant in comparisons (see ctype(3C)). The −d option is ignored for languages with multi-byte characters; all characters are significant. 

−f Fold letters.  Prior to being compared, all letters are effectively folded as if by toupper(). The −f option is ignored for languages with multi-byte characters; all characters are collated unfolded. 

−i In non-numeric comparisons, ignore all characters for which isprint() returns false (see ctype(3C)). For the ASCII character set, octal character codes 001 through 037 and 0177 are ignored.  For languages with multi-byte characters, the −i option is ignored, which means that all characters are significant. 

−M Compare as months.  The first several non-blank characters of the field are folded to uppercase and compared with the langinfo(3C) items ABMON_1 < ABMON_2 < ... < ABMON_12.  For example, American month names are compared such that JAN < FEB < ... < DEC.  An invalid field is treated as being less than all months.  The −M option implies the −b option (see below). 

−n An initial numeric string, consisting of optional blanks, optional minus sign, and zero or more digits with optional radix character, is sorted by arithmetic value.  The langinfo(3C) item RADIXCHAR is used as the radix character.  The −n option implies the −b option (see below). 

−r Reverse the sense of comparisons. 

−kkeydef The keydef argument is a restricted sort key definition.  The format of this definition is

field_start[type][,field_end[type]]

which defines a key field beginning at field_start and ending at field_end.  The characters at positions field_start and field_end are included in the key field, provided that field_end does not precede field_start.  A missing field_end means the end of the line. 

Specifying field_start and field_end involves the notion of a field, a minimal sequence of characters followed by a field separator or a <newline>.  By default, the first <blank> of a sequence of <blank>s acts as the field separator.  All <blank>s in a sequence of <blank>s are considered to be part of the next field; for example, all <blank>s at the beginning of a line are considered to be part of the first field. 

The arguments field_start and field_end each have the form m.n, optionally followed by one or more of the options b, d, f, i, n, r.  These modifiers have, for this key only, the same effect that their command-line counterparts have for the entire record.  A field_start position specified by m.n is interpreted to mean the n th character in the m+1 th field.  A missing n means .0, indicating the first character of the m+1 th field.  If the −b option is in effect, n is counted from the first non-<blank> character in the m+1 th field. 

A field_end position specified by m.n is interpreted to mean the n th character (including separators) after the last character of the m th field.  A missing n means .0, indicating the last character of the m th field.  If the −b option is in effect, n is counted from the last leading <blank> of the m+1 th field; m.1b refers to the first non-<blank> in the m+1 th field. 

Multiple −k options are permitted and are significant in command line order.  If no −k option is specified, a default sort key of the entire line is used. 

This option is intended to replace the old [+pos1 [−pos2]] notation, using field_start and field_end respectively. 

−l This option is ignored.  Previously it was used to activate sorting using the collation rules associated with the user’s LANG variable (see environ(5)). Language-sensitive collation is now the standard behavior.

If the language is not specified or is set to the "C" locale, the ordering is lexicographic by bytes in machine-collating sequence.  If the user’s language includes multi-byte characters, single-byte characters are machine-collated before multi-byte characters. 
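As an illustration of how the ordering rule options change the default byte ordering, the sketch below applies −n, −r, −f and −d to whole lines. It assumes a POSIX-style sort and is not output from HP-UX 8.05. The sample data and variable names are hypothetical, and LC_ALL=C is an assumption used to select the "C" locale so the byte order is predictable.

```shell
nums=$(printf '10\n9\n-3\n')

# Default: lexicographic by bytes, so "-" sorts before digits and "10" before "9".
byte_order=$(printf '%s\n' "$nums" | LC_ALL=C sort)

# -n: order by arithmetic value; adding -r reverses the comparison.
numeric=$(printf '%s\n' "$nums" | LC_ALL=C sort -n)
reversed=$(printf '%s\n' "$nums" | LC_ALL=C sort -nr)

words=$(printf 'apple\nBanana\ncherry\n')

# Without -f, uppercase letters collate before lowercase ones in byte order.
unfolded=$(printf '%s\n' "$words" | LC_ALL=C sort)
# -f: letters are folded before comparison.
folded=$(printf '%s\n' "$words" | LC_ALL=C sort -f)

# -d: only letters, digits and blanks are significant, so "-" is skipped.
plain=$(printf 'a-c\nab\n' | LC_ALL=C sort)
dictionary=$(printf 'a-c\nab\n' | LC_ALL=C sort -d)
```

The inputs are chosen so that no two lines compare equal under any of the options, which keeps the result independent of how ties are broken.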

Restricted Sort Keys

The notation +pos1 −pos2 restricts a sort key to one beginning at pos1 and ending at pos2. The characters at positions pos1 and pos2 are included in the sort key (provided that pos2 does not precede pos1). A missing −pos2 means the end of the line. 

When ordering options appear before restricted sort key specifications, the requested ordering rules are applied globally to all sort keys.  When attached to a specific sort key (described below), the specified ordering options override all global ordering options for that key. 

Specifying pos1 and pos2 involves the notion of a field, a minimal sequence of characters followed by a field separator or a new-line.  By default, the first blank (space or tab) of a sequence of blanks acts as the field separator.  All blanks in a sequence of blanks are considered to be part of the next field; for example, all blanks at the beginning of a line are considered to be part of the first field.  The treatment of field separators can be altered using the options:

−tx Use x as the field separator character; x is not considered to be part of a field (although it can be included in a sort key).  Each occurrence of x is significant (for example, xx delimits an empty field). 

−b Ignore leading blanks when determining the starting and ending positions of a restricted sort key.  If the −b option is specified before the first +pos1 argument, it will be applied to all +pos1 arguments.  Otherwise, the b flag can be attached independently to each +pos1 or −pos2 argument (see below).  Note that the −b option is only effective when restricted sort key specifications are in effect. 

pos1 and pos2 each have the form m.n optionally followed by one or more of the flags bdfinrM.  A starting position specified by +m.n is interpreted to mean character n+1 in field m+1. A missing .n means .0, indicating the first character of field m+1. If the b flag is in effect, n is counted from the first non-blank in field m+1; +m.0b refers to the first non-blank character in field m+1.

A last position specified by −m.n is interpreted to mean the nth character (including separators) after the last character of the m th field.  A missing .n means .0, indicating the last character of the mth field. If the b flag is in effect, n is counted from the last leading blank in field m+1; −m.1b refers to the first non-blank in field m+1.

When there are multiple sort keys, later keys are compared only after all earlier keys compare equal.  Lines that otherwise compare equal are ordered with all bytes significant.  If all the specified keys compare equal, the entire record is used as the final key. 
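The interaction of multiple keys can be sketched as follows. The +pos1 −pos2 notation is not accepted by many current sort implementations, so this example assumes a POSIX-style sort and uses the POSIX form of −k, in which fields are counted from 1 (+1 −2 becomes −k 2,2 and +2n −3 becomes −k 3,3n). That numbering is POSIX's convention, not a statement about this release. The data is hypothetical and LC_ALL=C is assumed in order to fix byte ordering.

```shell
# Hypothetical sample records: name, department, numeric score.
data='eve:ops:7
bob:dev:12
amy:dev:3
dan:ops:7'

# Key 1: department (second field). Key 2: score (third field), numeric.
# In this page's notation: sort -t: +1 -2 +2n -3
by_dept_then_score=$(printf '%s\n' "$data" | LC_ALL=C sort -t: -k 2,2 -k 3,3n)
```

The two "ops" records have equal departments and equal scores, so their relative order is decided by comparing the entire record, which places dan before eve.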

EXTERNAL INFLUENCES

Environment Variables

LC_COLLATE determines the default ordering rules applied to the sort. 

LC_CTYPE determines the behavior of character classification for the −d, −f, and −i options. 

LC_NUMERIC determines the definition of the radix character for the −n option. 

LC_TIME determines the month names for the −M option. 

LANG determines the language in which messages are displayed. 

If any of LC_COLLATE, LC_CTYPE, LC_NUMERIC, or LC_TIME is not specified in the environment or is set to the empty string, the value of LANG is used as a default for each unspecified or empty variable.  If LANG is not specified or is set to the empty string, a default of "C" (see lang(5)) is used. If any of the internationalization variables contains an invalid setting, sort behaves as if all internationalization variables were set to "C".  See environ(5).
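A small sketch of selecting the collation rules through the environment, assuming a POSIX-style shell and sort. The variable name c_order is hypothetical. LC_ALL, which this page does not describe, overrides the individual LC_ variables on current systems, so the sketch clears it first; that step is an assumption about the host environment.

```shell
# Assumption: LC_ALL, if set, would override LC_COLLATE, so remove it.
unset LC_ALL

# With LC_COLLATE set to "C", ordering is by bytes: "B" (0x42) precedes "a" (0x61).
c_order=$(printf 'a\nB\n' | LC_COLLATE=C LANG=C sort)
```

Setting LC_COLLATE to another installed language would apply that language's collation rules instead; the result then depends on which languages are installed.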

International Code Set Support

Single- and multi-byte character code sets are supported. 

EXAMPLES

Sort the contents of infile with the second field as the sort key:

sort +1 −2 infile

Sort, in reverse order, the contents of infile1 and infile2, placing the output in outfile and using the first character of the second field as the sort key:

sort −r −o outfile +1.0 −1.2 infile1 infile2

Sort, in reverse order, the contents of infile1 and infile2, using the first non-blank character of the second field as the sort key:

sort −r +1.0b −1.1b infile1 infile2

Print the password file (passwd(4)) sorted by the numeric user ID (the third colon-separated field):

sort −t: +2n −3 /etc/passwd

Print the lines of the already sorted file infile, suppressing all but the first occurrence of lines having the same third field (the options −um with just one input file make the choice of a unique representative from a set of equal lines predictable):

sort −um +2 −3 infile
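For readers trying these examples on a current system, two of them are sketched below in the POSIX −k form, where fields are counted from 1. The conversion (+2n −3 to −k 3,3n, +2 −3 to −k 3,3) follows the POSIX rules and is an assumption about the target system, not HP-UX 8.05 output. The sample data stands in for /etc/passwd and infile and is hypothetical; LC_ALL=C is assumed in order to fix byte ordering.

```shell
# Hypothetical three-entry sample in passwd(4) layout, used instead of /etc/passwd.
passwd_sample='root:x:0:0::/root:/bin/sh
user:x:100:20::/home/user:/bin/sh
daemon:x:1:1::/:/bin/false'

# Page example: sort -t: +2n -3 /etc/passwd
by_uid=$(printf '%s\n' "$passwd_sample" | LC_ALL=C sort -t: -k 3,3n)

# Hypothetical input that is already sorted on its third field.
presorted='a b x
c d x
e f y'

# Page example: sort -um +2 -3 infile
first_per_key=$(printf '%s\n' "$presorted" | LC_ALL=C sort -um -k 3,3)
```

The second command reads the already sorted data on standard input; because −m skips the sort phase, the line kept for each key is the first one encountered.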

DIAGNOSTICS

sort comments and exits with non-zero status for various trouble conditions such as when input lines are too long, and for disorder discovered under the −c option. 

When the last line of an input file is missing a new-line character, sort appends one, prints a warning message, and continues. 

If an error occurs when accessing the tables that contain the collation rules for the specified language, sort prints a warning message and defaults to the C locale. 

If a −d, −f or −i option is specified for a language with multi-byte characters, sort prints a warning message and ignores the option. 

WARNINGS

A field separator specified by the −t option is recognized only if it is a single-byte character. 

The ctype(3C) functions isalpha(), isdigit(), isspace(), and isprint() are not defined for multi-byte characters. For languages with multi-byte characters, all characters are significant in comparisons.

FILES

/usr/tmp/stm??? 

SEE ALSO

comm(1), join(1), uniq(1), toupper(3C), collate8(4), environ(5), hpnls(5), lang(5). 

STANDARDS CONFORMANCE

sort: SVID2, XPG2, XPG3, proposed POSIX.2 FIPS (June 1990)

Hewlett-Packard Company  —  HP-UX Release 8.05: June 1991
