A.OUT(5) — FILE FORMATS
NAME
a.out − assembler and link editor output
SYNOPSIS
#include <a.out.h>
#include <stab.h>
#include <nlist.h>
DESCRIPTION
A.out is the output file of the assembler as(1) and the link editor ld(1). The link editor makes a.out executable if there were no errors and no unresolved external references. Layout information as given in the include file for the Sun system is:
/*
 * Header prepended to each a.out file.
 */
struct exec {
        long     a_magic;    /* magic number */
        unsigned a_text;     /* size of text segment */
        unsigned a_data;     /* size of initialized data */
        unsigned a_bss;      /* size of uninitialized data */
        unsigned a_syms;     /* size of symbol table */
        unsigned a_entry;    /* entry point */
        unsigned a_trsize;   /* size of text relocation */
        unsigned a_drsize;   /* size of data relocation */
};
#define OMAGIC   0407        /* old impure format */
#define NMAGIC   0410        /* read-only text */
#define ZMAGIC   0413        /* demand load format */
#define PAGSIZ   2048
#define SEGSIZ   0x8000
#define TXTRELOC SEGSIZ
/*
 * Macros which take exec structures as arguments and tell whether
 * the file has a reasonable magic number or offsets to text|symbols|strings.
 */
#define N_BADMAG(x) \
        (((x).a_magic)!=OMAGIC && ((x).a_magic)!=NMAGIC && ((x).a_magic)!=ZMAGIC)
#define N_TXTOFF(x) \
        ((x).a_magic==ZMAGIC ? PAGSIZ : sizeof (struct exec))
#define N_SYMOFF(x) \
        (N_TXTOFF(x) + (x).a_text+(x).a_data + (x).a_trsize+(x).a_drsize)
#define N_STROFF(x) \
        (N_SYMOFF(x) + (x).a_syms)
/*
 * Macros which take exec structures as arguments and tell where the
 * various pieces will be loaded.
 */
#define N_TXTADDR(x) TXTRELOC
#define N_DATADDR(x) \
        (((x).a_magic==OMAGIC)? (N_TXTADDR(x)+(x).a_text) \
        : (SEGSIZ+((N_TXTADDR(x)+(x).a_text-1) & ~SEGRND)))
#define N_BSSADDR(x) (N_DATADDR(x)+(x).a_data)
The a.out file has five sections: a header, the program text and data, relocation information, a symbol table and a string table (in that order). The last three may be omitted if the program was loaded with the ‘-s’ option of ld or if the symbols and relocation have been removed by strip(1).
In the header the sizes of each section are given in bytes. The size of the header is not included in any of the other sizes.
When an a.out file is executed, three logical segments are set up: the text segment, the data segment (with uninitialized data, which starts off as all 0, following initialized data), and a stack. The header is not loaded with the text segment. If the magic number in the header is OMAGIC (0407), it means that this is a non-sharable text which is not to be write-protected, so the data segment is immediately contiguous with the text segment. This is rarely used. If the magic number is NMAGIC (0410) or ZMAGIC (0413), the data segment begins at the first segment boundary following the text segment, and the text segment is not writable by the program; other processes executing the same file will share the text segment. For ZMAGIC format, the text segment begins on a page boundary in the a.out file; the remaining bytes after the header in the first block are reserved and should be zero. In this case the text and data sizes must both be multiples of the page size, and the pages of the file will be brought into the running image as needed, and not pre-loaded as with the other formats. This is especially suitable for very large programs and is the default format produced by ld(1). The macros N_TXTADDR, N_DATADDR, and N_BSSADDR give the memory addresses at which the text, data, and bss segments, respectively, will be loaded.
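The load-address rules above can be tried out with a small sketch. It copies the constants and the N_TXTADDR, N_DATADDR, and N_BSSADDR macros from the listing. SEGRND is used by N_DATADDR but is not defined in the listing; the sketch assumes it is SEGSIZ-1, which may differ from the real include file. The helper make_header is hypothetical and exists only for illustration.

```c
/* Constants and macros copied from the header listing in this page. */
#define OMAGIC   0407
#define NMAGIC   0410
#define ZMAGIC   0413
#define SEGSIZ   0x8000
#define TXTRELOC SEGSIZ
#define SEGRND   (SEGSIZ - 1)   /* ASSUMPTION: not shown in the listing */

struct exec {
    long     a_magic;
    unsigned a_text, a_data, a_bss, a_syms, a_entry, a_trsize, a_drsize;
};

#define N_TXTADDR(x) TXTRELOC
#define N_DATADDR(x) \
    (((x).a_magic == OMAGIC) ? (N_TXTADDR(x) + (x).a_text) \
     : (SEGSIZ + ((N_TXTADDR(x) + (x).a_text - 1) & ~SEGRND)))
#define N_BSSADDR(x) (N_DATADDR(x) + (x).a_data)

/* Hypothetical helper: fills in only the fields the macros look at. */
static struct exec make_header(long magic, unsigned text, unsigned data)
{
    struct exec h = {0};

    h.a_magic = magic;
    h.a_text = text;
    h.a_data = data;
    return h;
}
```

Under that assumption, a header with a_text of 0x3000 and a_data of 0x800 places the data at 0xB000 for OMAGIC (directly after the text) and at 0x10000 for ZMAGIC (the next segment boundary), with bss following at 0x10800.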
The stack starts at the highest possible location in the memory image, and grows downwards. The stack is automatically extended as required. The data segment is extended as requested by brk(2) or sbrk(2).
After the header, the file contains the text, data, text relocation, data relocation, symbol table, and string table, in that order. The text begins at byte PAGSIZ in the file for ZMAGIC format, or just after the header for the other formats. The N_TXTOFF macro returns this absolute file position when given the name of an exec structure as argument. The data segment is contiguous with the text and is immediately followed by the text relocation and then the data relocation information. The symbol table follows all this; its position is computed by the N_SYMOFF macro. Finally, the string table immediately follows the symbol table, at a position given by N_STROFF. The first 4 bytes of the string table are not used for string storage, but rather contain the size of the string table. This size includes the 4 bytes themselves, so the minimum string table size is 4.
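The file layout just described can be sketched as follows. The struct exec32 type is a hypothetical fixed-width stand-in for struct exec, used so that the header is 32 bytes on any host; the function strtab_size is also hypothetical and assumes the size word is stored most significant byte first, as on the 68000-based Sun.

```c
#include <stddef.h>
#include <stdint.h>

#define OMAGIC 0407
#define NMAGIC 0410
#define ZMAGIC 0413
#define PAGSIZ 2048

/* ASSUMPTION: fixed-width mirror of struct exec (eight 4-byte fields). */
struct exec32 {
    uint32_t a_magic, a_text, a_data, a_bss;
    uint32_t a_syms, a_entry, a_trsize, a_drsize;
};

/* Same expressions as the macros in this page, applied to exec32. */
#define N_TXTOFF(x) \
    ((x).a_magic == ZMAGIC ? PAGSIZ : sizeof (struct exec32))
#define N_SYMOFF(x) \
    (N_TXTOFF(x) + (x).a_text + (x).a_data + (x).a_trsize + (x).a_drsize)
#define N_STROFF(x) \
    (N_SYMOFF(x) + (x).a_syms)

/* Reads the 4-byte size word at the start of the string table.
   ASSUMPTION: big-endian byte order. */
static uint32_t strtab_size(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}
```

A reader would typically seek to N_STROFF, read the size word, and then read that many bytes in total (size word included) to hold the whole string table in memory.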
RELOCATION
The value of a byte in the text or data which is not a portion of a reference to an undefined external symbol is exactly that value which will appear in memory when the file is executed. If a byte in the text or data involves a reference to an undefined external symbol, as indicated by the relocation information, then the value stored in the file is an offset from the associated external symbol. When the file is processed by the link editor and the external symbol becomes defined, the value of the symbol is added to the bytes in the file.
If relocation information is present, it amounts to eight bytes per relocatable datum as in the following structure:
/*
 * Format of a relocation datum.
 */
struct relocation_info {
        int      r_address;       /* address which is relocated */
        unsigned r_symbolnum:24,  /* local symbol ordinal */
                 r_pcrel:1,       /* was relocated pc relative already */
                 r_length:2,      /* 0=byte, 1=word, 2=long */
                 r_extern:1,      /* does not include value of sym referenced */
                 :4;              /* nothing, yet */
};
There is no relocation information if a_trsize+a_drsize==0. If r_extern is 0, then r_symbolnum is actually an n_type for the relocation (that is, N_TEXT meaning relative to segment text origin).
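As a rough illustration of the rule that the symbol's value is added to the bytes in the file, the sketch below patches one datum in a text or data image. It covers only the simple case of an external reference that is not pc-relative, and it assumes big-endian byte order. The function name add_symbol_value is hypothetical; this is not how ld itself is written.

```c
#include <stddef.h>

/*
 * Adds symval to the datum at seg[r_address].  The datum is
 * 1 << r_length bytes wide (0=byte, 1=word, 2=long).
 * ASSUMPTION: the datum is stored most significant byte first.
 */
static void add_symbol_value(unsigned char *seg, size_t r_address,
                             unsigned r_length, unsigned long symval)
{
    size_t n = (size_t)1 << r_length;
    unsigned long v = 0;
    size_t i;

    for (i = 0; i < n; i++)              /* read the stored offset */
        v = (v << 8) | seg[r_address + i];

    v += symval;                         /* offset + symbol value */

    for (i = n; i-- > 0; ) {             /* write the result back */
        seg[r_address + i] = (unsigned char)(v & 0xff);
        v >>= 8;
    }
}
```

For example, if a long at offset 2 holds the offset 0x10 and the symbol is finally defined at 0x8124, the four bytes become 00 00 81 34.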
SYMBOL TABLE
The layout of a symbol table entry and the principal flag values that distinguish symbol types are given in the include file as follows:
/*
 * Format of a symbol table entry.
 */
struct nlist {
        union {
                char *n_name;     /* for use when in-memory */
                long  n_strx;     /* index into file string table */
        } n_un;
        unsigned char n_type;     /* type flag, that is, N_TEXT etc; see below */
        char          n_other;
        short         n_desc;     /* see <stab.h> */
        unsigned      n_value;    /* value of this symbol (or adb offset) */
};
#define n_hash  n_desc            /* used internally by ld */
/*
 * Simple values for n_type.
 */
#define N_UNDF  0x0               /* undefined */
#define N_ABS   0x2               /* absolute */
#define N_TEXT  0x4               /* text */
#define N_DATA  0x6               /* data */
#define N_BSS   0x8               /* bss */
#define N_COMM  0x12              /* common (internal to ld) */
#define N_FN    0x1f              /* file name symbol */
#define N_EXT   01                /* external bit, or'ed in */
#define N_TYPE  0x1e              /* mask for all the type bits */
/*
 * Other permanent symbol table entries have some of the N_STAB bits set.
 * These are given in <stab.h>
 */
#define N_STAB  0xe0              /* if any of these bits set, don't discard */
In the a.out file a symbol’s n_un.n_strx field gives an index into the string table. An n_strx value of 0 indicates that no name is associated with a particular symbol table entry. The field n_un.n_name can be used to refer to the symbol name only if the program sets this up using n_strx and appropriate data from the string table. Because of the union in the nlist declaration, it is impossible in C to statically initialize such a structure. If this must be done (as when using nlist(3)) the file <nlist.h> should be included, rather than <a.out.h>; this contains the declaration without the union.
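Setting up a name pointer from n_strx might look like the sketch below. It assumes that the whole string table, including its 4-byte size word, has been read into memory and that n_strx counts bytes from the start of that table. The function symbol_name is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/*
 * Returns the name stored at offset n_strx, or NULL when the entry has
 * no name (n_strx == 0) or the offset is not usable.
 * ASSUMPTION: strtab points at the size word and strtab_len is the
 * total size of the table, size word included.
 */
static const char *symbol_name(const char *strtab, size_t strtab_len,
                               long n_strx)
{
    size_t off;

    if (n_strx < 4)                 /* 0 = no name; 1..3 lie in the size word */
        return NULL;
    off = (size_t)n_strx;
    if (off >= strtab_len)          /* past the end of the table */
        return NULL;
    if (memchr(strtab + off, '\0', strtab_len - off) == NULL)
        return NULL;                /* name is not NUL-terminated in bounds */
    return strtab + off;
}
```

A program that wants to use n_un.n_name would store the returned pointer there once for each symbol after loading the table.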
If a symbol’s type is undefined external, and the value field is non-zero, the symbol is interpreted by the loader ld as the name of a common region whose size is indicated by the value of the symbol.
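The n_type tests implied above can be written as small predicates. The constants are copied from the listing; the function names are hypothetical.

```c
#define N_UNDF 0x0
#define N_ABS  0x2
#define N_TEXT 0x4
#define N_DATA 0x6
#define N_BSS  0x8
#define N_EXT  01
#define N_TYPE 0x1e

/* Segment type of a symbol with the external bit masked off. */
static int base_type(unsigned char n_type)
{
    return n_type & N_TYPE;
}

/* Non-zero when the external bit is or'ed in. */
static int is_external(unsigned char n_type)
{
    return (n_type & N_EXT) != 0;
}

/* Undefined external with a non-zero value: a common region whose
   size is n_value. */
static int is_common(unsigned char n_type, unsigned n_value)
{
    return n_type == (N_UNDF | N_EXT) && n_value != 0;
}
```

An undefined external with a value of zero is an ordinary unresolved reference, so is_common returns 0 for it.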
STAB SYMBOLS
Stab.h defines some values of the n_type field of the symbol table of a.out files. These are the types for permanent symbols (that is, not local labels, etc.) used by the debuggers adb(1) and dbx(1) and the Pascal compiler pc(1). Symbol table entries can be produced by the .stabs assembler directive. This allows one to specify a double-quote delimited name, a symbol type, one char and one short of information about the symbol, and an unsigned long (usually an address). To avoid having to produce an explicit label for the address field, the .stabd directive can be used to implicitly address the current location. If no name is needed, symbol table entries can be generated using the .stabn directive. The loader promises to preserve the order of symbol table entries produced by .stab directives.
The n_value field of a symbol is relocated by the link editor as an address within the appropriate segment. N_value fields of symbols not in any segment are unchanged by the linker. In addition, the linker will discard certain symbols, according to rules of its own, unless the n_type field has one of the bits masked by N_STAB set.
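The N_STAB test is a single mask operation. The sketch below shows it, together with a count of the n_type codes that have at least one N_STAB bit set and the N_EXT bit clear. Both function names are hypothetical.

```c
#define N_EXT  01
#define N_STAB 0xe0

/* Non-zero when the linker must not discard the symbol. */
static int is_stab(unsigned char n_type)
{
    return (n_type & N_STAB) != 0;
}

/* Counts n_type codes with some N_STAB bit set and N_EXT clear. */
static int count_stab_types(void)
{
    int v, n = 0;

    for (v = 0; v < 256; v++)
        if ((v & N_STAB) != 0 && (v & N_EXT) == 0)
            n++;
    return n;
}
```

The three N_STAB bits take seven non-zero values and the four remaining type bits take sixteen, so the count comes to 7 * 16.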
This allows up to 112 (7 * 16) symbol types, split between the various segments. Some of these have already been claimed. The debugger, adb(1), uses the following n_type values:
#define N_GSYM   0x20   /* global symbol: name,,0,type,0 */
#define N_FNAME  0x22   /* procedure name (f77 kludge): name,,0 */
#define N_FUN    0x24   /* procedure: name,,0,linenumber,address */
#define N_STSYM  0x26   /* static symbol: name,,0,type,address */
#define N_LCSYM  0x28   /* .lcomm symbol: name,,0,type,address */
#define N_RSYM   0x40   /* register sym: name,,0,type,register */
#define N_SLINE  0x44   /* src line: 0,,0,linenumber,address */
#define N_SSYM   0x60   /* structure elt: name,,0,type,struct_offset */
#define N_SO     0x64   /* source file name: name,,0,0,address */
#define N_LSYM   0x80   /* local sym: name,,0,type,offset */
#define N_SOL    0x84   /* #included file name: name,,0,0,address */
#define N_PSYM   0xa0   /* parameter: name,,0,type,offset */
#define N_ENTRY  0xa4   /* alternate entry: name,linenumber,address */
#define N_LBRAC  0xc0   /* left bracket: 0,,0,nesting level,address */
#define N_RBRAC  0xe0   /* right bracket: 0,,0,nesting level,address */
#define N_BCOMM  0xe2   /* begin common: name,, */
#define N_ECOMM  0xe4   /* end common: name,, */
#define N_ECOML  0xe8   /* end common (local name): ,,address */
#define N_LENG   0xfe   /* second stab entry with length information */
where the comments give the adb conventional use for .stabs and the n_name, n_other, n_desc, and n_value fields of the given n_type. Adb uses the n_desc field to hold a type specifier in the form used by the Portable C Compiler, cc(1), in which a base type is qualified in the following structure:
struct desc {
        short q6:2,
              q5:2,
              q4:2,
              q3:2,
              q2:2,
              q1:2,
              basic:4;
};
There are four qualifications, with q1 the most significant and q6 the least significant:
0    none
1    pointer
2    function
3    array
The sixteen basic types are assigned as follows:
0    undefined
1    function argument
2    character
3    short
4    int
5    long
6    float
7    double
8    structure
9    union
10   enumeration
11   member of enumeration
12   unsigned character
13   unsigned short
14   unsigned int
15   unsigned long
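A type specifier in n_desc can be unpacked with shifts and masks instead of bit-fields, whose layout depends on the compiler. The sketch below assumes that the basic type occupies the low four bits and that q1 occupies the two bits just above it, with q2 through q6 following upward. That matches the struct desc layout only if bit-fields are allocated from the most significant bit down, so treat it as an assumption. The function names are hypothetical.

```c
/* Basic type: one of the sixteen codes in the table above. */
static unsigned desc_basic(unsigned short n_desc)
{
    return n_desc & 0xf;
}

/*
 * Qualifier number q (1 through 6): 0 none, 1 pointer, 2 function,
 * 3 array.
 * ASSUMPTION: q1 sits in bits 4-5, q2 in bits 6-7, and so on.
 */
static unsigned desc_qual(unsigned short n_desc, int q)
{
    return (n_desc >> (4 + 2 * (q - 1))) & 0x3;
}
```

Under that assumption the value 0x12 decodes as q1 = pointer with basic type character, and 0x94 decodes as q1 = pointer, q2 = function, with basic type int.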
The Pascal compiler, pc(1), uses the following n_type value:
#define N_PC     0x30   /* global pascal symbol: name,,0,subtype,line */
and uses the following subtypes to do type checking across separately compiled files:
1    source file name
2    included file name
3    global label
4    global constant
5    global type
6    global variable
7    global function
8    global procedure
9    external function
10   external procedure
11   library variable
12   library routine
The debugger, dbx(1), uses the following n_type values. The comments give the dbx conventional use for .stabs and the n_name, n_other, n_desc, and n_value fields for the given n_type symbol entry.
#define N_GSYM   0x20   /* global symbol: name,,0,size,0 */
#define N_FUN    0x24   /* procedure name: name,,0,size,address */
#define N_STSYM  0x26   /* static symbol: name,,0,size,address */
#define N_LCSYM  0x28   /* .lcomm symbol: name,,0,size,address */
#define N_RSYM   0x40   /* register sym: name,,0,size,register */
#define N_SLINE  0x44   /* src line: 0,,0,linenumber,address */
#define N_SO     0x64   /* source file name: name,,0,0,address */
#define N_LSYM   0x80   /* local sym: name,,0,size,offset */
#define N_SOL    0x84   /* #included file name: name,,0,0,address */
#define N_PSYM   0xa0   /* parameter: name,,0,size,offset */
#define N_BCOMM  0xe2   /* begin common: name,, */
#define N_ECOMM  0xe4   /* end common: name,, */
Dbx does not use the n_type value to differentiate symbols. The information as to whether a symbol is local, global, a parameter, lives in a register, etc. is indicated within the n_name field. Dbx processes N_GSYM, N_FUN, N_STSYM, N_LCSYM, N_RSYM, N_PSYM, N_LSYM, N_SSYM, and N_LENG symbol entries identically.
Each of the basic types in a language is given a type number. The type of a symbol is defined in terms of the type numbers. Declarations which create new types, such as structure declarations, define additional type numbers. The name of a type, its type number and other pertinent information are put in the n_name field and parsed by dbx. For example, the line
.stabs "int:t1=r1;-2147483648;2147483647;",0x80,0,0,0
defines the type int and assigns it type number one. The lower and upper bounds of an int variable are given as -2147483648 and 2147483647, respectively.
The local variable
int i;
is described by
.stabs "i:1",0x80,0,4,-24
The type number is one, corresponding to an integer. Its size is four bytes, and its address is -24 bytes from the stack pointer.
Structures and unions use the n_name field to describe the entire data structure. Each member is described including its type, offset, and size. The structure
struct xyz {
int mem1;
char mem2;
int mem3;
};
is described by
.stabs "xyz:T15=s10mem1:1,0,32;mem2:2,32,8;mem3:1,48,32;;",0x80,0,10,-1275
Reading the n_name field from left to right, the tag name comes first, followed by the type number. Thus, the xyz structure is assigned type number 15. The “=s10” indicates that a structure is being defined (substitute u for s to define a union) and that it is ten bytes long. The description of the members follows. The name of a member and its type are given as above, in the form name:typeno. Next come the offset (in bits) to the start of the member and the size (in bits) of the member. The member information is repeated for each member.
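The member list of a structure entry can be pulled apart with sscanf, as in the sketch below. It handles only the “=s” form shown in the xyz example and is not a description of how dbx parses the field. The struct member type and the function parse_struct_stab are hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct member {                 /* hypothetical record for one member */
    char name[32];
    int  type;                  /* type number */
    int  bit_offset;            /* offset from start of struct, in bits */
    int  bit_size;              /* size of the member, in bits */
};

/*
 * Parses an n_name such as
 *   "xyz:T15=s10mem1:1,0,32;mem2:2,32,8;mem3:1,48,32;;"
 * Stores the byte size in *size and up to max members in out.
 * Returns the number of members, or -1 on a malformed string.
 */
static int parse_struct_stab(const char *n_name, long *size,
                             struct member *out, int max)
{
    const char *p = strstr(n_name, "=s");
    char *end;
    int n = 0, used;

    if (p == NULL)
        return -1;
    *size = strtol(p + 2, &end, 10);        /* the "10" in "=s10" */
    if (end == p + 2)
        return -1;
    p = end;
    while (n < max && *p != ';' && *p != '\0') {
        used = 0;
        if (sscanf(p, "%31[^:;]:%d,%d,%d;%n", out[n].name, &out[n].type,
                   &out[n].bit_offset, &out[n].bit_size, &used) != 4
            || used == 0)
            return -1;
        p += used;                          /* step past "name:t,o,s;" */
        n++;
    }
    return n;
}
```

For the xyz entry this yields a size of 10 and three members; mem2 has type 2, bit offset 32, and bit size 8.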
Enumerated types are described in a manner similar to structures. The enumerated type
enum color { RED, BLUE, YELLOW };
is described by
.stabs "color:T16=eRED:0,BLUE:1,YELLOW:2,;",0x80,0,4,-1275
The color enumeration is assigned type number 16 and the “=e” indicates that an enumerated type is being defined. The member information consists of the member’s name followed by the member’s ordinal value.
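An enumeration entry can be read in much the same way. The sketch below handles only the “=e” form shown for color; the struct enum_member type and the function parse_enum_stab are hypothetical.

```c
#include <stdio.h>
#include <string.h>

struct enum_member {            /* hypothetical record for one member */
    char name[32];
    int  value;                 /* ordinal value */
};

/*
 * Parses an n_name such as "color:T16=eRED:0,BLUE:1,YELLOW:2,;".
 * Returns the number of members stored in out, or -1 on a
 * malformed string.
 */
static int parse_enum_stab(const char *n_name, struct enum_member *out,
                           int max)
{
    const char *p = strstr(n_name, "=e");
    int n = 0, used;

    if (p == NULL)
        return -1;
    p += 2;                                 /* skip "=e" */
    while (n < max && *p != ';' && *p != '\0') {
        used = 0;
        if (sscanf(p, "%31[^:,;]:%d,%n", out[n].name, &out[n].value,
                   &used) != 2 || used == 0)
            return -1;
        p += used;                          /* step past "NAME:value," */
        n++;
    }
    return n;
}
```

For the color entry this returns three members, RED, BLUE, and YELLOW, with ordinal values 0, 1, and 2.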
A type number used to indicate the type of a symbol may be preceded by a one character descriptor. The descriptors are:
With no descriptor, the symbol is taken to be local to the current routine.
r    Register variable.
G    Global variable.
S    Static global variable. In C, this is a static global variable whose scope is the file it is defined in.
p    Parameter passed by value.
v    Parameter passed by reference. This includes var parameters in Pascal.
t    Type. Defines a new type.
*    Defines a pointer to a type.
T    Tag. Used for a structure, union or enum tag.
a    Array.
f    Private function. Corresponds to static functions in C and nested routines in Pascal.
F    Public functions.
V    Common or local static variable. Used for FORTRAN COMMON variables or local static variables in C.
x    Pascal conformant array value parameter.
X    Pascal or FORTRAN function variable.
C    Pascal conformant array dimension.
For example,
char *charstar;
is described by
.stabs "charstar:G18=*2",0x20,0,1,0
The ’G’ indicates that charstar is a global variable. Its type is number eighteen, which is defined here to be a pointer to type number two, which is a character. Therefore, charstar is a “char *”.
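Splitting an n_name into the symbol name, the optional one-character descriptor, and the type number can be sketched as below. It reads only the leading name:descriptor-typenumber part and ignores any type definition that follows. The struct stab_sym type and the function parse_stab_name are hypothetical.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

struct stab_sym {               /* hypothetical result record */
    char name[64];
    char descriptor;            /* 0 when no descriptor is present */
    long type_number;
};

/*
 * Parses the start of an n_name such as "charstar:G18=*2" or "i:1".
 * Returns 0 on success, -1 if the string does not have the expected
 * shape.
 */
static int parse_stab_name(const char *s, struct stab_sym *out)
{
    const char *colon = strchr(s, ':');
    size_t len;
    char *end;

    if (colon == NULL)
        return -1;
    len = (size_t)(colon - s);
    if (len >= sizeof out->name)
        return -1;
    memcpy(out->name, s, len);
    out->name[len] = '\0';

    s = colon + 1;
    out->descriptor = 0;
    if (*s != '\0' && !isdigit((unsigned char)*s))
        out->descriptor = *s++;             /* e.g. 'G', 'p', 't', 'T' */

    out->type_number = strtol(s, &end, 10);
    return end == s ? -1 : 0;
}
```

For charstar this gives the descriptor G and type number 18; for the local variable i it gives no descriptor and type number 1.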
A function pointer parameter such as
frammis(funcp)
int (*funcp)();
{ ... }
is described by
.stabs "funcp:p19=*20=f1",0xa0,0,4,8
The ’p’ indicates that funcp is a parameter. The type information defines two new types, nineteen and twenty. Type twenty is a function returning type one (integer) and type nineteen is a pointer to type twenty so it is a pointer to a function returning an integer.
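A chain of nested definitions like the one for funcp can be walked one link at a time. The sketch below starts after the descriptor character and covers only links of the form number=Knumber, where K is a single character such as * or f. The range, structure, and enumeration forms carry different payloads and are not handled. The struct type_def type and the function parse_type_chain are hypothetical.

```c
#include <stdlib.h>

struct type_def {               /* hypothetical record for one link */
    long number;                /* type number being defined */
    char kind;                  /* '*' pointer, 'f' function */
    long target;                /* type number it refers to */
};

/*
 * Parses a chain such as "19=*20=f1" into
 *   { 19, '*', 20 } and { 20, 'f', 1 }.
 * Returns the number of definitions stored, or -1 on error.
 */
static int parse_type_chain(const char *s, struct type_def *out, int max)
{
    char *end;
    int n = 0;
    long num = strtol(s, &end, 10);

    if (end == s)
        return -1;
    s = end;
    while (*s == '=' && n < max) {
        if (s[1] == '\0')
            return -1;
        out[n].number = num;
        out[n].kind = s[1];
        s += 2;
        num = strtol(s, &end, 10);          /* target of this link */
        if (end == s)
            return -1;
        out[n].target = num;
        s = end;
        n++;
    }
    return n;
}
```

Applied to the funcp example, the chain yields two definitions: type 19 as a pointer to type 20, and type 20 as a function returning type 1.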
SEE ALSO
adb(1), as(1), cc(1), pc(1), ld(1), nm(1), dbx(1), strip(1)
BUGS
There are currently two interpretations of the stabs symbol-table information. This creates great confusion when trying to build a program for debugging.
Due to the amount of symbolic information necessary for high-level debugging, the whole a.out structure has been stretched well beyond its original design, and should be replaced by something with a more sophisticated symbol-table mechanism. The demands of future languages will only compound the problems.
Sun Release 2.0 — Last change: 9 November 1984