join(C) 06 January 1993 join(C) Name join - join two relations Syntax join [ options ] file1 file2 Description The join command prints to the standard output a join of the two rela- tions specified by the lines of file1 and file2. If file1 is a dash (-), the standard input is used. file1 and file2 must be sorted in increasing collating sequence (defined by the current locale; see locale(M)) on the fields on which they are to be joined, normally the first in each line. There is one line in the output for each pair of lines in file1 and file2 that have identical join fields. The output line normally consists of the common field, then the rest of the line from file1, then the rest of the line from file2. Fields are normally separated by blank, tab or newline. In this case, multiple separators count as one, and leading separators are discarded. These options are recognized: -a n In addition to the normal output, produces a line for each unpairable line in file n, where n is 1 or 2. -e s Replaces empty output fields by string s. -j n m Joins on the mth field of file n. If n is missing, uses the mth field in each file. -o list Each output line comprises the fields specified in list, each element of which has the form n.m where n is a file number and m is a field number. -t c Uses character c as a field separator. Every appearance of c in a line is significant. Examples An employer has two files of data about his employees. The first file, E_pers, contains the name, address, and payroll number of each employee. The second file, E_dept, contains the payroll number, position, salary, and department of each employee. The fields in each employee's record are separated by colons ``:'', and each record occupies one line of a file. The format of a record in the file E_pers would be: Name:Address:Payroll# Here are some example records that might be in the file E_pers: Stross C:33 Wingfield Court, Anytown, Pa 65000:133 Leckie N:17 Awberry Court, Ubique, Ca 18480:91 Dixon J:14 Tulip Gardens, Appledorn, NY 10010:231 Each record in the file E_dept would have the format: Payroll#:Position:Salary:Department The following are some example records that might appear in E_dept: 91:Manager:85000:Marketing 133:Clerk:22000:Accounts 231:Clerk:19500:Accounts The employer wants to know which employees earn more than $20,000. Firstly, the fields in the file E_pers are put in the order: Payroll#:Name:Address This ensures that the field that the files are to be joined on (payroll number) is the first field in both files. The reordering is performed using awk(C), and its output is piped through sort(C). The sample records from E_pers given above would become: 133:Stross C:33 Wingfield Court, Anytown, Pa 65000 231:Dixon J:14 Tulip Gardens, Appledorn, NY 10010 91:Leckie N:17 Awberry Court, Ubique, Ca 18480 Note that the records are sorted into order of payroll number according to the collating sequence (not increasing numeric order). The reordered and sorted file is named P_tmp: awk -F: '{print $3":"$1":"$2}' Epers | sort -t: +0 -1 > Ptmp Next, the records in the file E_dept are sorted on payroll number and the output is placed in D_tmp: sort -t: +0 -1 Edept > Dtmp The sorted files are now joined on their first fields (payroll number), and the output is filtered using awk. A final pass through sort puts the output in alphabetical order: join -t: -o 1.2 2.3 Ptmp Dtmp | awk -F: '$2>20000 {print $1" $"$2}' | sort This command prints the employee's name (the second field of P_tmp) and their salary (the third field of D_tmp) if it is more than $20000. For the sample records given above, the output would be: Leckie N $85000 Stross C $22000 The output from join can be directed to a file (E_joined): join -t: -o 1.1 1.2 1.3 2.2 2.3 2.4 Ptmp Dtmp > Ejoined Omitting the -o option causes all fields to be written to the output file, with the fields from file1 being followed by those from file2. How- ever, the field on which the files were joined is only included once. Thus the join command above could be entered as: join -t: Ptmp Dtmp > Ejoined The file E_joined may be processed using awk to extract any other infor- mation required. Here a list of all employees sorted by name within each department is output: cat Ejoined | awk -F: '{print $6" "$2}' | sort For the sample records used in the example above, the output would be: Accounts Dixon J Accounts Stross C Marketing Leckie N See also awk(C), comm(C) and sort(C). Notes With default field separation, the collating sequence is that of sort -b. With -t, the sequence is that of a plain sort. Standards conformance join is conformant with: AT&T SVID Issue 2; and X/Open Portability Guide, Issue 3, 1989.