lfs(5) lfs(5)
NAME
lfs - Large File Summit
DESCRIPTION
The problem with large files (> 2 GB)
The 32-bit-oriented UNIX systems support file sizes of at most 2^31-1
bytes. This restriction results from the definition of the "off_t"
data type as "signed 32-bit long". However, this is no longer
sufficient for present-day applications that process video, audio and
image files or large databases. While current 32-bit systems can
easily handle the computational side of such applications, they must
also support maximum file sizes that are much larger than traditional
file sizes ("Large File Support", LFS).
The problem can be solved with a full port from 32 to 64 bits in
which, for example, the "off_t" data type is redefined as a
"signed 64-bit long".
Such a port would affect both the kernel and all applications, and
would involve considerable effort. A 32-bit UNIX system that
specifically supports file sizes > 2 GB, or a 64-bit UNIX system,
which "by definition" can handle very large file sizes, has to take
the following factors into account:
- existing 32-bit binary programs cannot be changed,
- existing application software will, where necessary, be ported
gradually from 32 to 64 bits (ported as 64-bit programs),
- because of the effort involved, some existing application software
will be ported to 64 bits only as far as file processing is concerned
(the application remains a 32-bit program and simply uses the LFS
interfaces).
Furthermore, interoperability between 32-bit and 64-bit applications
must be guaranteed.
Leading system suppliers and vendors came together in the framework
of the "Large File Summit" to draw up a series of amendments to the
existing Single UNIX Specification (SUS) that allow both new and old
programs to access files of "any" size. These amendments will be
incorporated in the next SUS from X/Open. Further details of the
"Large File Summit" and the LFS specification can be found at
http://www.sas.com/standards/large.file.
Requirements
The following requirements were considered in the LFS specification:
- Protection of existing programs (in relation to overflow of data
defined as off_t, e.g. st_size in struct stat)
Page 1 Reliant UNIX 5.44 Printed 11/98
- Protection of large files against programs that can only process
files of up to 2 GB (and which would endanger the data of a large
file through incorrect inputs)
- Access to files far in excess of 2 GB on 32-bit operating systems,
or by 32-bit applications on 64-bit operating systems
- Full compliance with SUS
- Extension of SUS
Concepts
The consistent implementation of a small number of key technical
concepts served as a "guide" for putting together the LFS
specification:
Various off_t sizes
During a transition period moving from pure 32-bit to pure 64-bit
systems and applications, two sizes of off_t (and of the data types
derived from it) must be supported:
- in systems with a mixed off_t environment (off_t can be 32 or 64
bits long in any binary program),
- in systems with a selectable off_t environment (off_t is either 32
or 64 bits long in any binary program),
- in networks, where clients and servers can have different off_t
definitions.
Offset maximum
Most, but unfortunately not all, numeric data in the SUS is defined
using opaque ("non-transparent") derived data types, i.e. the SUS
generally does not use the basic data types of the C language. One
exception is ulimit(2), which is why the ulimit system call is not
covered by the LFS specification. Derived data types can be used
instead of the basic C data types to provide a portable,
source-compatible interface to the operating system and thus avoid
overflow problems such as those mentioned above. In a linked binary
program, however, these opaque data types must be mapped to basic
types (e.g. to a 32-bit long). For data such as the file size or the
current file position, the value can therefore overflow its range if
the file being processed is larger than 2 GB.
To protect these pure 32-bit binary programs, and the data in a
large file, the concept of the "offset maximum" has been introduced.
When a file is opened [e.g. with open(2) or creat(2)], the
application program indicates, implicitly for 64-bit programs or
explicitly using a flag (O_LARGEFILE), whether it can process files
larger than 2 GB. The kernel records the resulting "offset maximum"
in the open file description associated with the file descriptor:
the "offset maximum" is the largest file position permitted when
positioning within the file.
In the case of 32-bit programs that do not specify with open/creat
that they can process a large file, an "offset maximum" of 2 GB-1 is
used.
All operations of an application
- whose "offset maximum" is smaller than the current file size
[open(2), creat(2) etc.],
- or which cannot represent the current file size correctly
[stat(2), fstat(2), fcntl(2) etc.],
- or which attempt to position beyond the "offset maximum"
[explicitly with lseek(2), implicitly with write(2)/read(2)]
fail with an error.
EOVERFLOW
In the current SUS, no error is defined for the situation where the
"offset maximum" is exceeded or where a data item cannot be
represented correctly. EOVERFLOW is an existing error code which has
to be added to the descriptions of the relevant system interfaces so
that the new error condition can be reported to applications.
Development models
In addition to the interface definitions for using different sizes
of off_t, the LFS specification also considers development models
that can be used to compile the source code of existing applications
which are to process files larger than 2 GB.
In each model, the size of off_t and of all associated data types
must be specified:
Selectable off_t
In this model, the size of off_t to be used for compilation is
defined, and the corresponding libraries, include files and data
types are selected during the compiling and linking process. All
binary programs involved use the same size of off_t, provided by the
development environment as 32 or 64 bits (e.g. as data type
"long long" if the application is compiled as a 32-bit program with
LFS).
Explicit off_t
In this model, the size of off_t is specified during application
development (and thereby fixed in the source code). Each explicitly
named system interface uses an off_t of a specified length.
In a 32-bit application, for example, the use of open() implies an
off_t of 32 bits, and the use of open64() or of open() with the
O_LARGEFILE flag implies an off64_t of 64 bits. (The same applies to
other explicit 64-bit versions such as lseek64(); see the list under
SEE ALSO.) This model is also extremely useful for supporting gradual
conversions, but it is not supported directly in the SUS; it is
merely a transitional extension of the SUS (see the LFS
specification, chapter 3).
NOTES
Files larger than 2 GB and file systems larger than 4 GB are only
supported for the file system type vxfs (Veritas).
SEE ALSO
creat64(2), fstat64(2), fstatvfs64(2), getrlimit64(2), lstat64(2),
mmap64(2), open64(2), setrlimit64(2), stat64(2), statvfs64(2),
fgetpos64(3C), fsetpos64(3C), ftruncate64(3C), ftw64(3C), lockf64(3C),
nftw64(3C), readdir64(3C), truncate64(3C), fopen64(3S), freopen64(3S),
fseeko64(3S), ftello64(3S), lseek64(3S), tmpfile64(3S).