tar(4)
NAME
tar - tape archive file format
DESCRIPTION
tar (the tape archive command) dumps several files into one, in a medium suitable for transportation.
A “tar tape” or file is a series of blocks. Each block is of size TBLOCK. A file on the tape is represented by a header block which describes the file, followed by zero or more blocks which give the contents of the file. At the end of the tape are two blocks filled with binary zeros, as an end-of-file indicator.
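The block layout described above can be sketched in C as follows. This is an illustrative sketch only, not code from tar(1): the helper names data_blocks and is_zero_block are hypothetical, and the sketch assumes that the final data block of a file is padded out to a full TBLOCK bytes.

```c
#include <stddef.h>

#define TBLOCK 512

/* Number of TBLOCK-sized data blocks that follow the header block of a
 * file of the given size. Assumes the last block is padded to full size.
 * (Hypothetical helper, not part of tar itself.) */
unsigned long data_blocks(unsigned long size)
{
    return size / TBLOCK + (size % TBLOCK != 0 ? 1 : 0);
}

/* Return 1 if the block consists entirely of binary zeros, else 0.
 * Two such blocks in a row mark the end of the archive. */
int is_zero_block(const unsigned char *block)
{
    size_t i;

    for (i = 0; i < TBLOCK; i++)
        if (block[i] != 0)
            return 0;
    return 1;
}
```

A reader would skip data_blocks(size) blocks after each header to reach the next header, and stop once two consecutive blocks satisfy is_zero_block.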
The blocks are grouped for physical I/O operations. Each group of n blocks (where n is set by the b keyletter on the tar(1) command line — default is 20 blocks) is written with a single system call; on nine-track tapes, the result of this write is a single tape record. The last group is always written at the full size, so blocks after the two zero blocks contain random data. On reading, the specified or default group size is used for the first read, but if that read returns less than a full tape block, the reduced block size is used for further reads.
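The grouping arithmetic can be illustrated with a short C sketch. The names record_bytes and trailing_filler_blocks are hypothetical, and the sketch only restates the rule above that the last group is written at full size; it is not taken from the tar(1) source.

```c
#define TBLOCK 512
#define DEFAULT_BLOCKING 20   /* default group size, per the text above */

/* Size in bytes of one group (one tape record) of nblocks blocks. */
unsigned long record_bytes(unsigned int nblocks)
{
    return (unsigned long)nblocks * TBLOCK;
}

/* Number of filler blocks needed to complete the last group, given the
 * count of blocks already used (headers, data, and the two zero blocks).
 * nblocks must be nonzero. (Hypothetical helper.) */
unsigned long trailing_filler_blocks(unsigned long used_blocks,
                                     unsigned int nblocks)
{
    unsigned long rem = used_blocks % nblocks;

    return rem == 0 ? 0 : nblocks - rem;
}
```

With the default group size, record_bytes(DEFAULT_BLOCKING) is 10240 bytes, and an archive that uses 5 blocks in total is followed by 15 filler blocks whose contents are unspecified.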
CX/UX tar(1) has been modified to conform to the POSIX 1003.1 standard, which extends its archiving capabilities to include new file types. Extended tar supports the archiving of regular, "hard" linked, symbolic linked, directory, FIFO, character special, and block special file types. In order to provide this support, additional fields have been added to the header block structure.
The header block looks like:
#define TBLOCK 512
#define NAMSIZ 100
#define TMAGLEN 6
#define TVERSLEN 2
union hblock {
        char dummy[TBLOCK];
        struct header {
                char name[NAMSIZ];
                char mode[8];
                char uid[8];
                char gid[8];
                char size[12];
                char mtime[12];
                char chksum[8];
                char typeflag;
                char linkname[NAMSIZ];
                /* POSIX extensions */
                char magic[TMAGLEN];
                char version[TVERSLEN];
                char uname[32];
                char gname[32];
                char devmajor[8];
                char devminor[8];
                char prefix[155];
                /* tar multi-volume mode extensions */
                char extno;
                char extotal;
                char efsize[10];
        } dbuf;
};
In the POSIX extended tar format, the magic, uname, and gname fields contain null-terminated strings. The name, linkname, and prefix fields also contain null-terminated strings, except when the string occupies every character of the field, including the last; in that case there is no terminating null. The other fields are zero-filled octal numbers in ASCII. Each field (of width w) contains w-1 digits and a terminating null.
name is the name of the file, as specified on the tar command line. If the prefix field contains non-null characters, then prefix, a slash character ("/"), and name are concatenated without modification or addition of new characters to produce the complete pathname, which can total up to 256 characters in length. If the complete pathname is less than or equal to 100 characters in length, extended tar places the complete pathname in the name field. Otherwise, the complete pathname is split, with the components placed into the prefix and name fields.
mode specifies the file mode, with the top bit masked off. Nine bits specify the file permissions and three bits specify the set-UID, set-GID, and TSVTX modes. These modes are described by symbolic constants in the include file <tar.h>.
uid and gid are the numbers of the user and group that own the file.
size is the size of the file in bytes. Extended tar dumps links, symbolic links, directories, FIFOs, character special files, and block special files with this field specified as zero. No data blocks are stored on the medium when FIFO, character special, block special, link, symbolic link, and directory file types are archived.
mtime is the modification time of the file at the time it was dumped.
chksum is an ASCII representation of the octal value of the sum of all the bytes in the header block. When calculating the checksum, the chksum field is treated as if it were all blanks.
typeflag specifies the type of file archived. Extended tar uses ASCII ‘0’ or binary zero to represent a regular file, ASCII ‘1’ to represent a "hard" linked file, ASCII ‘2’ to represent a symbolic linked file, ASCII ‘3’ to represent a character special device, ASCII ‘4’ to represent a block special device, ASCII ‘5’ to represent a directory, and ASCII ‘6’ to represent a FIFO special file.
If the file type is a link or symbolic link, the name linked to, if any, is in linkname. Extended tar does not use the prefix field in conjunction with linkname to produce a pathname.
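The octal fields, the checksum rule, and the prefix/name concatenation described above can be sketched in C as follows. This is a minimal illustration under stated assumptions, not the tar(1) implementation: the function names are hypothetical, the offset 148 of chksum is derived from the field widths in the header structure (100 + 8 + 8 + 8 + 12 + 12), and header bytes are summed as unsigned values, which is an assumption the text above does not settle.

```c
#include <stddef.h>
#include <string.h>

#define TBLOCK     512
#define NAMSIZ     100
#define PREFIXSIZ  155  /* width of the prefix field */
#define CHKSUM_OFF 148  /* offset of chksum, derived from field widths */
#define CHKSUM_LEN 8

/* Parse a zero-filled octal ASCII field of the given width. Leading
 * blanks are skipped (an assumption, for tolerance); parsing stops at
 * the first byte that is not an octal digit, such as the null. */
unsigned long parse_octal(const char *field, size_t width)
{
    unsigned long value = 0;
    size_t i = 0;

    while (i < width && field[i] == ' ')
        i++;
    for (; i < width && field[i] >= '0' && field[i] <= '7'; i++)
        value = value * 8 + (unsigned long)(field[i] - '0');
    return value;
}

/* Sum every byte of the header block, counting the chksum field as if
 * it were all blanks. Bytes are treated as unsigned (an assumption). */
unsigned long header_checksum(const unsigned char *block)
{
    unsigned long sum = 0;
    size_t i;

    for (i = 0; i < TBLOCK; i++) {
        if (i >= CHKSUM_OFF && i < CHKSUM_OFF + CHKSUM_LEN)
            sum += ' ';
        else
            sum += block[i];
    }
    return sum;
}

/* Length of a string field that may fill its field with no null. */
static size_t field_len(const char *field, size_t width)
{
    size_t n = 0;

    while (n < width && field[n] != '\0')
        n++;
    return n;
}

/* Build the complete pathname from the prefix and name fields.
 * out must hold at least PREFIXSIZ + 1 + NAMSIZ + 1 (257) bytes. */
void full_path(const char *prefix, const char *name, char *out)
{
    size_t plen = field_len(prefix, PREFIXSIZ);
    size_t nlen = field_len(name, NAMSIZ);
    size_t pos = 0;

    if (plen > 0) {
        memcpy(out, prefix, plen);
        pos = plen;
        out[pos++] = '/';
    }
    memcpy(out + pos, name, nlen);
    out[pos + nlen] = '\0';
}
```

A header would be accepted when parse_octal applied to its chksum field equals header_checksum of the block. full_path uses bounded field lengths because name and prefix are not guaranteed to be null-terminated.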
The following fields are extended tar additions to the header block. The magic field specifies that the archive was output in the extended tar format. It contains the null-terminated string "ustar". The version field contains the string "00" and is not null-terminated. If the magic field contains "ustar", the uname and gname fields contain the ASCII representation of the owner and group of the file, respectively. When the file is restored by a privileged user, the password and group files shall be scanned for these names. If found, the user and group IDs contained within these files shall be used rather than the values contained within the uid and gid fields.
Whenever the typeflag field specifies a character special or block special file, the devmajor and devminor fields contain the ASCII representation of the octal values of the major and minor numbers of the device represented. These numbers are obtained from the st_rdev field of the stat structure [see stat(2)].
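The magic test and the owner lookup described above might look like the following C sketch. The function names are hypothetical, and the standard getpwnam and getgrnam library routines stand in here for "scanning the password and group files"; whether CX/UX tar uses these routines is an assumption.

```c
#include <string.h>
#include <sys/types.h>
#include <pwd.h>
#include <grp.h>

#define TMAGLEN 6

/* Return 1 if the magic field holds "ustar" including its terminating
 * null (TMAGLEN bytes in all), else 0. */
int is_ustar(const char *magic)
{
    return memcmp(magic, "ustar", TMAGLEN) == 0;
}

/* Prefer the user ID found for uname; fall back to the numeric uid
 * field of the header if the name is empty or unknown. */
uid_t resolve_uid(const char *uname, uid_t header_uid)
{
    struct passwd *pw;

    if (uname[0] == '\0')
        return header_uid;
    pw = getpwnam(uname);
    return pw != NULL ? pw->pw_uid : header_uid;
}

/* Same rule for the group: gname wins over the numeric gid field. */
gid_t resolve_gid(const char *gname, gid_t header_gid)
{
    struct group *gr;

    if (gname[0] == '\0')
        return header_gid;
    gr = getgrnam(gname);
    return gr != NULL ? gr->gr_gid : header_gid;
}
```

A restore routine would call resolve_uid and resolve_gid only when is_ustar succeeds and the invoking user is privileged; otherwise the numeric fields are the only ownership information available.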
As specified by POSIX 1003.1, unused fields of the header are binary zeros (and are included in the checksum). However, tar archives that use the multi-volume capability, which does not conform to POSIX, contain values in the extno, extotal, and efsize fields. The extno field contains a value describing which extent of the file is contained on this tape when the file is split for storage across multiple volumes. The extotal field contains a value describing the total number of extents the file requires. The efsize field contains the total size of the file in bytes. The maximum number of extents permitted for a split file is nine.
The first time a given i-node number is dumped, it is dumped as a regular file. The second and subsequent times, it is dumped as a link instead. Upon retrieval, if a link entry is retrieved but the file it was linked to is not, an error message is printed and the tape must be manually re-scanned to retrieve the linked-to file.
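One way to picture the link handling above is a table of i-nodes already dumped. The sketch below is hypothetical and is not the tar(1) source: the name first_dump_name and the linked-list table are illustrative, and pairing the i-node number with the device number is an added assumption, since i-node numbers are unique only within one file system.

```c
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

#define NAMSIZ 100

/* One entry per i-node that has already been dumped. */
struct seen_inode {
    dev_t dev;
    ino_t ino;
    char name[NAMSIZ + 1];      /* name under which it was first dumped */
    struct seen_inode *next;
};

static struct seen_inode *seen_list = NULL;

/* If (dev, ino) was dumped before, return the name it was dumped under,
 * so the caller can write a link entry. Otherwise record it under name
 * and return NULL, so the caller dumps it as a regular file. */
const char *first_dump_name(dev_t dev, ino_t ino, const char *name)
{
    struct seen_inode *p;

    for (p = seen_list; p != NULL; p = p->next)
        if (p->dev == dev && p->ino == ino)
            return p->name;

    p = malloc(sizeof *p);
    if (p == NULL)
        return NULL;            /* out of memory: dump as a regular file */
    p->dev = dev;
    p->ino = ino;
    strncpy(p->name, name, NAMSIZ);
    p->name[NAMSIZ] = '\0';
    p->next = seen_list;
    seen_list = p;
    return NULL;
}
```

An archiver following this scheme would presumably consult the table only for files whose link count is greater than one, and would store the returned name in the linkname field of the link entry.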
The encoding of the header is designed to be portable across machines.
SEE ALSO
NOTES
File types other than those described above (POSIX 1003.1 high performance files of type ASCII ‘7’ and custom implementation files of types ASCII ‘A’-‘Z’) will be archived and extracted as regular files (type ‘0’).
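The typeflag values listed in DESCRIPTION, together with the fallback in this note, can be summarized in a short C sketch. The enum and function names are hypothetical, and the TAR_UNKNOWN result for any value not mentioned on this page is an assumption; the page does not say how such values are handled.

```c
/* File types distinguished by extended tar (hypothetical names). */
enum tar_type {
    TAR_REGULAR, TAR_HARDLINK, TAR_SYMLINK, TAR_CHAR,
    TAR_BLOCK, TAR_DIR, TAR_FIFO, TAR_UNKNOWN
};

/* Map a typeflag byte to a file type. Type '7' and types 'A' to 'Z'
 * are treated as regular files, as stated in the note above. */
enum tar_type classify_typeflag(char typeflag)
{
    switch (typeflag) {
    case '\0':
    case '0': return TAR_REGULAR;
    case '1': return TAR_HARDLINK;
    case '2': return TAR_SYMLINK;
    case '3': return TAR_CHAR;
    case '4': return TAR_BLOCK;
    case '5': return TAR_DIR;
    case '6': return TAR_FIFO;
    case '7': return TAR_REGULAR;
    default:
        if (typeflag >= 'A' && typeflag <= 'Z')
            return TAR_REGULAR;
        return TAR_UNKNOWN;     /* not covered by this page */
    }
}

/* Only regular files are followed by data blocks; every other type
 * is dumped with a size of zero. */
int has_data_blocks(char typeflag)
{
    return classify_typeflag(typeflag) == TAR_REGULAR;
}
```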
CX/UX Programmer’s Reference Manual