join(C) — OpenDesktop 3.0.0


 join(C)                       06 January 1993                        join(C)


 Name

    join - join two relations

 Syntax

    join [ options ] file1 file2

 Description

    The join command prints to the standard output a join of the two rela-
    tions specified by the lines of file1 and file2.  If file1 is a dash (-),
    the standard input is used.

    file1 and file2 must be sorted in increasing collating sequence (defined
    by the current locale; see locale(M)) on the fields on which they are to
    be joined, normally the first in each line.

    There is one line in the output for each pair of lines in file1 and file2
    that have identical join fields.  The output line normally consists of
    the common field, then the rest of the line from file1, then the rest of
    the line from file2.

    Fields are normally separated by blank, tab or newline.  In this case,
    multiple separators count as one, and leading separators are discarded.

    These options are recognized:

    -a n      In addition to the normal output, produces a line for each
              unpairable line in file n, where n is 1 or 2.

    -e s      Replaces empty output fields by string s.

    -j n m    Joins on the mth field of file n.  If n is missing, uses the
              mth field in each file.

    -o list   Each output line comprises the fields specified in list, each
              element of which has the form n.m where n is a file number and
              m is a field number.

    -t c      Uses character c as a field separator.  Every appearance of c
              in a line is significant.

 Examples

    An employer has two files of data about his employees.  The first file,
    E_pers, contains the name, address, and payroll number of each employee.
    The second file, E_dept, contains the payroll number, position, salary,
    and department of each employee. The fields in each employee's record are
    separated by colons ``:'', and each record occupies one line of a file.

    The format of a record in the file E_pers would be:

       Name:Address:Payroll#

    Here are some example records that might be in the file E_pers:

       Stross C:33 Wingfield Court, Anytown, Pa 65000:133
       Leckie N:17 Awberry Court, Ubique, Ca 18480:91
       Dixon J:14 Tulip Gardens, Appledorn, NY 10010:231

    Each record in the file E_dept would have the format:

       Payroll#:Position:Salary:Department

    The following are some example records that might appear in E_dept:

       91:Manager:85000:Marketing
       133:Clerk:22000:Accounts
       231:Clerk:19500:Accounts

    The employer wants to know which employees earn more than $20,000.

    Firstly, the fields in the file E_pers are put in the order:

       Payroll#:Name:Address

    This ensures that the field that the files are to be joined on (payroll
    number) is the first field in both files. The reordering is performed
    using awk(C), and its output is piped through sort(C).  The sample
    records from E_pers given above would become:

       133:Stross C:33 Wingfield Court, Anytown, Pa 65000
       231:Dixon J:14 Tulip Gardens, Appledorn, NY 10010
       91:Leckie N:17 Awberry Court, Ubique, Ca 18480

    Note that the records are sorted into order of payroll number according
    to the collating sequence (not increasing numeric order). The reordered
    and sorted file is named P_tmp:

       awk  -F:  '{print $3":"$1":"$2}'  Epers | sort  -t:  +0  -1  > Ptmp

    Next, the records in the file E_dept are sorted on payroll number and the
    output is placed in D_tmp:

       sort  -t:  +0  -1  Edept > Dtmp

    The sorted files are now joined on their first fields (payroll number),
    and the output is filtered using awk. A final pass through sort puts the
    output in alphabetical order:

       join -t: -o 1.2 2.3 Ptmp Dtmp | awk -F: '$2>20000 {print $1" $"$2}' | sort

    This command prints the employee's name (the second field of P_tmp) and
    their salary (the third field of D_tmp) if it is more than $20000.  For
    the sample records given above, the output would be:

       Leckie N $85000
       Stross C $22000

    The output from join can be directed to a file (E_joined):

       join  -t:  -o 1.1 1.2 1.3 2.2 2.3 2.4  Ptmp  Dtmp  > Ejoined

    Omitting the -o option causes all fields to be written to the output
    file, with the fields from file1 being followed by those from file2. How-
    ever, the field on which the files were joined is only included once.
    Thus the join command above could be entered as:

       join  -t:  Ptmp  Dtmp  > Ejoined

    The file E_joined may be processed using awk to extract any other infor-
    mation required. Here a list of all employees sorted by name within each
    department is output:

       cat Ejoined | awk -F: '{print $6" "$2}' | sort

    For the sample records used in the example above, the output would be:

       Accounts Dixon J
       Accounts Stross C
       Marketing Leckie N

 See also

    awk(C), comm(C) and sort(C).

 Notes

    With default field separation, the collating sequence is that of sort -b.
    With -t, the sequence is that of a plain sort.

 Standards conformance

    join is conformant with:

    AT&T SVID Issue 2;
    and X/Open Portability Guide, Issue 3, 1989.
Museum

Related Articles