MACH-O(5) — UNIX Programmer’s Manual
NAME
Mach-O − Mach-O assembler and link editor output
SYNOPSIS
#include <sys/loader.h>
#include <nlist.h>
#include <stab.h>
#include <reloc.h>
#include <symseg.h>
DESCRIPTION
The object files produced by the assembler and link editor are in Mach-O (Mach object) file format. The file name a.out is the default output file name of the assembler as(1) and the link editor ld(1), as is conventional with most UNIX-like compiler systems. The format of the object file, however, is not 4.3BSD a.out format as the name suggests, but rather Mach-O format. The link editor will make a.out executable if the resulting format is an executable type and there were no errors and no unresolved external references.
The complete description of a Mach-O file is given in a number of include files. The file <sys/loader.h> describes the headers, <nlist.h> describes the symbol table entries with <stab.h> supplementing it, <reloc.h> describes the relocation entries, and <symseg.h> describes the symbol segments as created by the compiler, cc(1), with the −gg option and modified by the link editor.
The actual instructions and data used by the program represented by a Mach-O file are the contents of its sections. Sections are grouped together in segments. Each section carries with it, in its header, the information as to which segment it belongs in. When a file of an executable type is created, the sections are placed in their proper segments, all the segment headers are created, and the segments themselves are padded out to the segment alignment (the target page size). If the file isn’t an executable type, all sections are placed in one segment for compactness.
When the kernel executes a Mach-O file it maps in the object file’s segments and any segments from the fixed virtual shared libraries that the object uses, and creates the thread(s) for execution. Any part of the object file that is not part of a segment is not mapped in for execution and, except for the headers, is not needed to execute the file. These parts include the relocation entries, the symbol table and the symbol segments, and they are stripped with the −s option to ld(1) or by strip(1).
The headers of a Mach-O file consist of a single fixed-length structure followed by a variable number of variable-length load commands. All the headers appear first in the file, with the mach_header as the first one. The order of the headers after the mach_header and the order of the rest of the file is arbitrary. The structure of these headers as given in <sys/loader.h> is:
/∗
∗ The mach header appears at the very beginning of the object file.
∗/
struct mach_header {
    unsigned long   magic;      /∗ mach magic number identifier ∗/
    cpu_type_t      cputype;    /∗ cpu specifier ∗/
    cpu_subtype_t   cpusubtype; /∗ machine specifier ∗/
    unsigned long   filetype;   /∗ type of file ∗/
    unsigned long   ncmds;      /∗ number of load commands ∗/
    unsigned long   sizeofcmds; /∗ the size of all the load commands ∗/
    unsigned long   flags;      /∗ flags ∗/
};
/∗ Constant for the magic field of the mach_header ∗/
#define MH_MAGIC    0xfeedface  /∗ the mach magic number ∗/
/∗
∗ The layout of the file depends on the filetype. For MH_EXECUTE and
∗ MH_FVMLIB file types the segments are padded out and aligned on a
∗ segment alignment boundary for efficient demand paging. Both of these
∗ file types also have the headers included as part of their first segment.
∗
∗ The file type MH_OBJECT is a compact non-executable format intended only
∗ as output of the assembler and input (and possibly output) of the link editor
∗ (the .o format). All sections are in one unnamed segment with no padding.
∗
∗ The file type MH_PRELOAD is an executable format intended for things that
∗ are not executed under the kernel (proms, stand alones, kernels, etc). The
∗ format can be executed under the kernel but may be demand paged and not
∗ preloaded before execution.
∗
∗ A core file is in MH_CORE format and can be any arbitrary legal
∗ Mach-O file.
∗
∗ Constants for the filetype field of the mach_header
∗/
#define MH_OBJECT   0x1 /∗ relocatable object file ∗/
#define MH_EXECUTE  0x2 /∗ demand paged executable file ∗/
#define MH_FVMLIB   0x3 /∗ fixed vm shared library file ∗/
#define MH_CORE     0x4 /∗ core file ∗/
#define MH_PRELOAD  0x5 /∗ preloaded executable file ∗/
/∗ Constants for the flags field of the mach_header ∗/
#define MH_NOUNDEFS 0x1 /∗ the object file has no undefined references,
                           can be executed ∗/
/∗
∗ The load commands directly follow the mach_header. The total size of all
∗ of the commands is given by the sizeofcmds field in the mach_header. All
∗ load commands must have as their first two fields cmd and cmdsize. The cmd
∗ field is filled in with a constant for that command type. Each command type
∗ has a structure specifically for it. The cmdsize field is the size in bytes
∗ of the particular load command structure plus anything that follows it that
∗ is a part of the load command (i.e. section structures, strings, etc.). To
∗ advance to the next load command the cmdsize can be added to the offset or
∗ pointer of the current load command. The cmdsize MUST be a multiple of
∗ sizeof(long) (this is forever the maximum alignment of any load commands).
∗ The padded bytes must be zero. All tables in the object file must also
∗ follow these rules so the file can be memory mapped. Otherwise the pointers
∗ to these tables will not work well or at all on some machines. With all
∗ padding zeroed, like objects will compare byte for byte.
∗/
struct load_command {
    unsigned long   cmd;        /∗ type of load command ∗/
    unsigned long   cmdsize;    /∗ total size of command in bytes ∗/
};
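The rule that cmdsize advances to the next load command can be sketched as follows. This is a minimal illustration, not the kernel's or the link editor's code: the mo_ names are hypothetical local mirrors of the structures above, the fields are assumed to be 32 bits wide (so sizeof(long) is taken as 4), and the image is assumed to be in host byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MO_MAGIC 0xfeedfaceu    /* same value as MH_MAGIC */

/* Hypothetical 32-bit mirrors of mach_header and load_command. */
struct mo_header {
    uint32_t magic, cputype, cpusubtype, filetype, ncmds, sizeofcmds, flags;
};
struct mo_load_command {
    uint32_t cmd, cmdsize;
};

/*
 * Count the load commands of type `want` in an in-memory image.
 * Returns -1 if the headers are malformed.
 */
static int mo_count_commands(const unsigned char *image, size_t len,
                             uint32_t want)
{
    struct mo_header hdr;
    struct mo_load_command lc;
    size_t off, end;
    uint32_t i;
    int found = 0;

    assert(image != NULL);
    if (len < sizeof hdr)
        return -1;
    memcpy(&hdr, image, sizeof hdr);
    if (hdr.magic != MO_MAGIC || hdr.sizeofcmds > len - sizeof hdr)
        return -1;
    off = sizeof hdr;               /* load commands follow the header */
    end = off + hdr.sizeofcmds;

    for (i = 0; i < hdr.ncmds; i++) {
        if (end - off < sizeof lc)
            return -1;
        memcpy(&lc, image + off, sizeof lc);
        /* cmdsize covers cmd and cmdsize, is a multiple of 4, stays in bounds */
        if (lc.cmdsize < sizeof lc || lc.cmdsize % 4 != 0 ||
            lc.cmdsize > end - off)
            return -1;
        if (lc.cmd == want)
            found++;
        off += lc.cmdsize;          /* advance to the next load command */
    }
    return found;
}
```

Copying each header out with memcpy avoids depending on the alignment of the buffer; a memory-mapped file could instead be read through pointers, which is what the alignment rule above is meant to permit.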
/∗ Constants for the cmd field of all load commands, the type ∗/
#define LC_SEGMENT      0x1 /∗ segment of this file to be mapped ∗/
#define LC_SYMTAB       0x2 /∗ link-edit stab symbol table info ∗/
#define LC_SYMSEG       0x3 /∗ link-edit gdb symbol table info ∗/
#define LC_THREAD       0x4 /∗ thread ∗/
#define LC_UNIXTHREAD   0x5 /∗ unix thread (includes a stack) ∗/
#define LC_LOADFVMLIB   0x6 /∗ load a specified fixed vm shared library ∗/
#define LC_IDFVMLIB     0x7 /∗ fixed vm shared library identification ∗/
#define LC_IDENT        0x8 /∗ object identification information ∗/
/∗
∗ A variable length string in a load command is represented by an lc_str
∗ union. The strings are stored just after the load command structure and
∗ the offset is from the start of the load command structure. The size
∗ of the string is reflected in the cmdsize field of the load command.
∗ Once again any padded bytes to bring the cmdsize field to a multiple
∗ of sizeof(long) must be zero.
∗/
union lc_str {
    unsigned long   offset; /∗ offset to the string ∗/
    char            ∗ptr;   /∗ pointer to the string ∗/
};
/∗
∗ The segment load command indicates that a part of this file is to be
∗ mapped into the task’s address space. The size of this segment in memory,
∗ vmsize, may be equal to or larger than the amount to map from this file,
∗ filesize. The file is mapped starting at fileoff to the beginning of
∗ the segment in memory, vmaddr. The rest of the memory of the segment,
∗ if any, is allocated zero fill on demand. The segment’s maximum virtual
∗ memory protection and initial virtual memory protection are specified
∗ by the maxprot and initprot fields. If the segment has sections then the
∗ section structures directly follow the segment command and their size is
∗ reflected in cmdsize.
∗/
struct segment_command {
    unsigned long   cmd;         /∗ LC_SEGMENT ∗/
    unsigned long   cmdsize;     /∗ includes sizeof section structures ∗/
    char            segname[16]; /∗ segment name ∗/
    unsigned long   vmaddr;      /∗ memory address of this segment ∗/
    unsigned long   vmsize;      /∗ memory size of this segment ∗/
    unsigned long   fileoff;     /∗ file offset of this segment ∗/
    unsigned long   filesize;    /∗ amount to map from the file ∗/
    vm_prot_t       maxprot;     /∗ maximum VM protection ∗/
    vm_prot_t       initprot;    /∗ initial VM protection ∗/
    unsigned long   nsects;      /∗ number of sections in segment ∗/
    unsigned long   flags;       /∗ flags ∗/
};
/∗ Constants for the flags field of the segment_command ∗/
#define SG_HIGHVM   0x1 /∗ the file contents for this segment is for
                           the high part of the vm space, the low part
                           is zero filled (for stacks in core files) ∗/
#define SG_FVMLIB   0x2 /∗ this segment is the vm that is allocated by
                           a fixed vm library, for overlap checking in
                           the link editor ∗/
#define SG_NORELOC  0x4 /∗ this segment has nothing that was relocated
                           in it and nothing relocated to it, that is
                           it may be safely replaced without relocation ∗/
/∗
∗ A segment is made up of zero or more sections. Only executable files have
∗ all of their segments with the proper sections in each, and demand paged
∗ executables have their segments padded to the specified segment alignment
∗ when produced by the link editor. The first segment of a demand paged
∗ executable file always contains the mach_header and load commands of the
∗ object file before its first section. The zero fill sections are always
∗ last in their segment. This allows the zeroed segment padding to be
∗ mapped into memory where zero fill sections might be.
∗
∗ Relocatable files have all of their sections in one segment for compactness.
∗ There is no padding to a specified segment boundary and the mach_header and
∗ load commands are not part of the segment.
∗
∗ Sections with the same section name, sectname, going into the same segment,
∗ segname, are combined by the link editor. The resulting section is aligned
∗ to the maximum alignment of the combined sections and is the new section’s
∗ alignment. The combined sections are aligned to their original alignment in
∗ the combined section. Any padded bytes to get the specified alignment are
∗ zeroed.
∗
∗ The format of the relocation entries referenced by the reloff and nreloc
∗ fields of the section structure for mach object files is described in the
∗ header file <reloc.h>.
∗/
struct section {
    char            sectname[16]; /∗ name of this section ∗/
    char            segname[16];  /∗ segment this section goes in ∗/
    unsigned long   addr;         /∗ memory address of this section ∗/
    unsigned long   size;         /∗ size in bytes of this section ∗/
    unsigned long   offset;       /∗ file offset of this section ∗/
    unsigned long   align;        /∗ alignment of this section ∗/
    unsigned long   reloff;       /∗ file offset of relocation entries ∗/
    unsigned long   nreloc;       /∗ number of relocation entries ∗/
    unsigned long   flags;        /∗ flags (i.e. zero fill section) ∗/
    unsigned long   reserved1;    /∗ reserved ∗/
    unsigned long   reserved2;    /∗ reserved ∗/
};
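Because the section structures directly follow their segment command and are counted by nsects, finding a section by name is a bounded linear scan. The sketch below assumes 32-bit fields, host byte order, and names that are NUL padded when shorter than 16 bytes; the mo_ identifiers are hypothetical and not part of <sys/loader.h>.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 32-bit mirrors of segment_command and section. */
struct mo_segment_command {
    uint32_t cmd, cmdsize;
    char     segname[16];
    uint32_t vmaddr, vmsize, fileoff, filesize;
    uint32_t maxprot, initprot, nsects, flags;
};
struct mo_section {
    char     sectname[16];
    char     segname[16];
    uint32_t addr, size, offset, align, reloff, nreloc, flags;
    uint32_t reserved1, reserved2;
};

/*
 * Look up a section by name among the section structures that follow a
 * segment command.  Returns 1 and fills *out when found, 0 when no
 * section has that name, and -1 when cmdsize cannot hold nsects sections.
 */
static int mo_find_section(const unsigned char *cmd, size_t len,
                           const char *sectname, struct mo_section *out)
{
    struct mo_segment_command seg;
    struct mo_section sect;
    uint32_t i;

    assert(cmd != NULL && sectname != NULL && out != NULL);
    if (len < sizeof seg)
        return -1;
    memcpy(&seg, cmd, sizeof seg);
    if (seg.cmdsize > len || seg.cmdsize < sizeof seg)
        return -1;
    if (seg.nsects > (seg.cmdsize - sizeof seg) / sizeof sect)
        return -1;

    for (i = 0; i < seg.nsects; i++) {
        memcpy(&sect, cmd + sizeof seg + (size_t)i * sizeof sect,
               sizeof sect);
        /* A name may fill all 16 bytes, so compare at most 16. */
        if (strncmp(sect.sectname, sectname, sizeof sect.sectname) == 0) {
            *out = sect;
            return 1;
        }
    }
    return 0;
}
```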
/∗ Constants for the flags field of a section structure ∗/
#define S_ZEROFILL          0x1 /∗ zero fill on demand section ∗/
#define S_CSTRING_LITERALS  0x2 /∗ section with only literal C strings ∗/
/∗
∗ The names of segments and sections in them are mostly meaningless to the
∗ link-editor. But there are a few things to support traditional UNIX
∗ executables that require the link-editor and assembler to use some names
∗ agreed upon by convention.
∗
∗ The initial protection of the "__TEXT" segment has write protection turned
∗ off (not writeable).
∗
∗ The link-editor defined symbols __etext, __edata and __end will be defined as
∗ follows: __etext is that first address after the last non-zero fill section
∗ in the "__TEXT" segment. __edata is the first address after the last non-zero
∗ fill section in the "__DATA" segment. And __end is the first address after
∗ the last section in the "__DATA" segment.
∗
∗ The link-editor will allocate common symbols at the end of the "__bss"
∗ section in the "__DATA" segment. It will create the section and segment
∗ if needed.
∗/
/∗ Currently known segment names and the section names in those segments ∗/
#define SEG_PAGEZERO        "__PAGEZERO"      /∗ the pagezero segment which has no
                                                 protections and catches NULL
                                                 references for MH_EXECUTE files ∗/
#define SEG_TEXT            "__TEXT"          /∗ the traditional UNIX text segment ∗/
#define SECT_TEXT           "__text"          /∗ the real text part of the text
                                                 section no headers, and no padding ∗/
#define SEG_DATA            "__DATA"          /∗ the traditional UNIX data segment ∗/
#define SECT_DATA           "__data"          /∗ the real initialized data section
                                                 no padding, no bss overlap ∗/
#define SECT_BSS            "__bss"           /∗ the real uninitialized data section no
                                                 padding ∗/
#define SEG_OBJC            "__OBJC"          /∗ objective-C runtime segment ∗/
#define SECT_OBJC_SYMBOLS   "__symbol_table"  /∗ symbol table ∗/
#define SECT_OBJC_MODULES   "__module_info"   /∗ module information ∗/
#define SECT_OBJC_STRINGS   "__selector_strs" /∗ string table ∗/
#define SEG_ICON            "__ICON"          /∗ the NeXT icon segment ∗/
#define SECT_ICON_HEADER    "__header"        /∗ the icon headers ∗/
#define SECT_ICON_TIFF      "__tiff"          /∗ the icons in tiff format ∗/
#define SEG_ARCHIVE         "__ARCHIVE"       /∗ the IB objective-C archive segment ∗/
#define SECT_ARCHIVE        "__archive"       /∗ the real IB objective-C archive
                                                 section no padding ∗/
#define SEG_LINKEDIT        "__LINKEDIT"      /∗ the segment containing all structures
                                                 created and maintained by the link
                                                 editor. Created with -seglinkedit
                                                 option to ld(1) for MH_EXECUTE files
                                                 only ∗/
/∗
∗ Fixed virtual memory shared libraries are identified by two things. The
∗ target pathname (the name of the library as found for execution), and the
∗ minor version number. The address of where the headers are loaded is in
∗ header_addr.
∗/
struct fvmlib {
    union lc_str    name;           /∗ library’s target pathname ∗/
    unsigned long   minor_version;  /∗ library’s minor version number ∗/
    unsigned long   header_addr;    /∗ library’s header address ∗/
};
/∗
∗ A fixed virtual shared library (filetype == MH_FVMLIB in the mach header)
∗ contains a fvmlib_command (cmd == LC_IDFVMLIB) to identify the library.
∗ An object that uses a fixed virtual shared library also contains a
∗ fvmlib_command (cmd == LC_LOADFVMLIB) for each library it uses.
∗/
struct fvmlib_command {
    unsigned long   cmd;        /∗ LC_IDFVMLIB or LC_LOADFVMLIB ∗/
    unsigned long   cmdsize;    /∗ includes pathname string ∗/
    struct fvmlib   fvmlib;     /∗ the library identification ∗/
};
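The lc_str convention (an offset measured from the start of the load command, with the string stored inside cmdsize) can be illustrated by fetching the target pathname from a fvmlib_command. This is a sketch under stated assumptions: 32-bit fields, host byte order, and the file form of lc_str (the offset member). The mo_ names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 32-bit mirror of fvmlib_command with struct fvmlib inlined. */
struct mo_fvmlib_command {
    uint32_t cmd, cmdsize;
    uint32_t name_offset;       /* lc_str offset from start of the command */
    uint32_t minor_version;
    uint32_t header_addr;
};

/*
 * Return a pointer to the library pathname stored after the command
 * structure, or NULL if the offset or termination is invalid.
 */
static const char *mo_fvmlib_name(const unsigned char *cmd, size_t len)
{
    struct mo_fvmlib_command fc;

    assert(cmd != NULL);
    if (len < sizeof fc)
        return NULL;
    memcpy(&fc, cmd, sizeof fc);
    if (fc.cmdsize > len || fc.cmdsize < sizeof fc)
        return NULL;
    /* The string lives after the fixed structure and inside cmdsize. */
    if (fc.name_offset < sizeof fc || fc.name_offset >= fc.cmdsize)
        return NULL;
    /* It must be NUL terminated before the end of the command. */
    if (memchr(cmd + fc.name_offset, '\0',
               fc.cmdsize - fc.name_offset) == NULL)
        return NULL;
    return (const char *)(cmd + fc.name_offset);
}
```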
/∗
∗ Thread commands contain machine-specific data structures suitable for
∗ use in the thread state primitives. The machine specific data structures
∗ follow the struct thread_command as follows.
∗ Each flavor of machine specific data structure is preceded by an unsigned
∗ long constant for the flavor of that data structure, an unsigned long
∗ that is the count of longs of the size of the state data structure and then
∗ the state data structure follows. This triple may be repeated for many
∗ flavors. The constants for the flavors, counts and state data structure
∗ definitions are expected to be in the header file <machine/thread_status.h>.
∗ These machine specific data structures sizes must be multiples of
∗ sizeof(long). The cmdsize reflects the total size of the thread_command
∗ and all of the sizes of the constants for the flavors, counts and state
∗ data structures.
∗
∗ For executable objects that are unix processes there will be one
∗ thread_command (cmd == LC_UNIXTHREAD) created for it by the link-editor.
∗ This is the same as a LC_THREAD, except that a stack is automatically
∗ created (based on the shell’s limit for the stack size). Command arguments
∗ and environment variables are copied onto that stack.
∗/
struct thread_command {
    unsigned long   cmd;        /∗ LC_THREAD or LC_UNIXTHREAD ∗/
    unsigned long   cmdsize;    /∗ total size of this command ∗/
    /∗ unsigned long flavor              flavor of thread state ∗/
    /∗ unsigned long count               count of longs in thread state ∗/
    /∗ struct XXX_thread_state state     thread state for this flavor ∗/
    /∗ ... ∗/
};
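The flavor, count, state triples that follow a thread_command can be walked as sketched below. The sketch assumes 32-bit longs and host byte order, and it treats the state data as opaque words; real flavor constants come from <machine/thread_status.h> and are not reproduced here. The function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Find the thread state of the given flavor inside a thread command.
 * On success returns the byte offset of the state data from the start
 * of the command and stores its length in longs in *count_out.
 * Returns -1 if the flavor is absent or the command is malformed.
 */
static long mo_find_thread_state(const unsigned char *cmd, size_t len,
                                 uint32_t flavor, uint32_t *count_out)
{
    uint32_t hdr[2];            /* cmd, cmdsize */
    uint32_t pair[2];           /* flavor, count */
    size_t off, end;

    assert(cmd != NULL && count_out != NULL);
    if (len < sizeof hdr)
        return -1;
    memcpy(hdr, cmd, sizeof hdr);
    if (hdr[1] > len || hdr[1] < sizeof hdr)
        return -1;
    end = hdr[1];
    off = sizeof hdr;

    while (end - off >= sizeof pair) {
        memcpy(pair, cmd + off, sizeof pair);
        off += sizeof pair;
        if (pair[1] > (end - off) / 4)
            return -1;          /* state would run past cmdsize */
        if (pair[0] == flavor) {
            *count_out = pair[1];
            return (long)off;
        }
        off += (size_t)pair[1] * 4;     /* skip this flavor's state */
    }
    return -1;
}
```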
/∗
∗ The symtab_command contains the offsets and sizes of the link-edit 4.3BSD
∗ "stab" style symbol table information as described in the header files
∗ <nlist.h> and <stab.h>.
∗/
struct symtab_command {
    unsigned long   cmd;        /∗ LC_SYMTAB ∗/
    unsigned long   cmdsize;    /∗ sizeof(struct symtab_command) ∗/
    unsigned long   symoff;     /∗ symbol table offset ∗/
    unsigned long   nsyms;      /∗ number of symbol table entries ∗/
    unsigned long   stroff;     /∗ string table offset ∗/
    unsigned long   strsize;    /∗ string table size in bytes ∗/
};
/∗
∗ The symseg_command contains the offset and size of the GNU style
∗ symbol table information as described in the header file <symseg.h>.
∗ The symbol roots of the symbol segments must also be aligned properly
∗ in the file. So the requirement of keeping the offsets aligned to a
∗ multiple of a sizeof(long) translates to the length field of the symbol
∗ roots also being a multiple of a long. Also the padding must again be
∗ zeroed.
∗/
struct symseg_command {
    unsigned long   cmd;        /∗ LC_SYMSEG ∗/
    unsigned long   cmdsize;    /∗ sizeof(struct symseg_command) ∗/
    unsigned long   offset;     /∗ symbol segment offset ∗/
    unsigned long   size;       /∗ symbol segment size in bytes ∗/
};
/∗
∗ The ident_command contains a free format string table following the
∗ ident_command structure. The strings are null terminated and the size of
∗ the command is padded out with zero bytes to a multiple of sizeof(long).
∗/
struct ident_command {
long cmd;/∗ LC_IDENT ∗/
long cmdsize;/∗ includes strings that follow this command ∗/
};
The symtab_command contains the offsets for the symbol table entries and string table used by those entries. The layout of a symbol table entry and the flag values that distinguish symbol types as given in the include file <nlist.h> are as follows:
/∗
∗ Format of a symbol table entry.
∗/
struct nlist {
    union {
        char    ∗n_name;        /∗ for use when in-core ∗/
        long    n_strx;         /∗ index into file string table ∗/
    } n_un;
    unsigned char   n_type;     /∗ type flag, see below ∗/
    char            n_sect;     /∗ section number or NO_SECT ∗/
    short           n_desc;     /∗ see <stab.h> ∗/
    unsigned        n_value;    /∗ value of this symbol (or stab offset) ∗/
};
/∗
∗ The n_type field really contains three fields:
∗     unsigned char N_STAB:3,
∗                   N_TYPE:4,
∗                   N_EXT:1;
∗ which are used via the following masks.
∗/
#define N_STAB  0xe0 /∗ if any of these bits set, a symbolic debugging entry ∗/
#define N_TYPE  0x1e /∗ mask for the type bits ∗/
#define N_EXT   0x01 /∗ external symbol bit, set for external symbols ∗/
/∗
∗ Only symbolic debugging entries have some of the N_STAB bits set and if any
∗ of these bits are set then it is a symbolic debugging entry (a stab). In
∗ which case then the values of the n_type field (the entire field) are given
∗ in <stab.h>
∗/
/∗
∗ Values for N_TYPE bits of the n_type field.
∗/
#define N_UNDF  0x0 /∗ undefined, n_sect == NO_SECT ∗/
#define N_ABS   0x2 /∗ absolute, n_sect == NO_SECT ∗/
#define N_SECT  0xe /∗ defined in section number n_sect ∗/
#define N_INDR  0xa /∗ indirect ∗/
/∗
∗ If the type is N_INDR then the symbol is defined to be the same as another
∗ symbol. In this case the n_value field is an index into the string table
∗ of the other symbol’s name. When the other symbol is defined then they both
∗ take on the defined type and value.
∗/
/∗
∗ If the type is N_SECT then the n_sect field contains an ordinal of the
∗ section the symbol is defined in. The sections are numbered from 1 and
∗ refer to sections in order they appear in the load commands for the file
∗ they are in. This means the same ordinal may very well refer to different
∗ sections in different files.
∗
∗ The n_value field for all symbol table entries (including N_STAB’s) gets
∗ updated by the link editor based on the value of its n_sect field and where
∗ the section n_sect references gets relocated. If the value of the n_sect
∗ field is NO_SECT then its n_value field is not changed by the link editor.
∗/
#define NO_SECT     0   /∗ symbol is not in any section ∗/
#define MAX_SECT    255 /∗ 1 thru 255 inclusive ∗/
/∗
∗ Common symbols are represented by undefined (N_UNDF) external (N_EXT) types
∗ whose values (n_value) are non-zero. In which case the value of the n_value
∗ field is the size (in bytes) of the common symbol. The n_sect field is set
∗ to NO_SECT.
∗/
In the file a symbol’s n_un.n_strx field gives an index into the string table. An n_strx value of 0 indicates that no name is associated with a particular symbol table entry. The field n_un.n_name can be used to refer to the symbol name only if the program sets this up using n_strx and appropriate data from the string table.
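Resolving a name from n_strx might look like the following sketch. It assumes the string table has already been read into memory along with its size (the strsize field of the symtab_command), and it uses a hypothetical struct mo_nlist that mirrors the file form of struct nlist with 32-bit fields.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical mirror of the file form of struct nlist. */
struct mo_nlist {
    int32_t  n_strx;    /* index into the file string table */
    uint8_t  n_type;
    int8_t   n_sect;
    int16_t  n_desc;
    uint32_t n_value;
};

/*
 * Return the name of a symbol, "" when n_strx is 0 (no name), or NULL
 * when the index is outside the string table or not NUL terminated.
 */
static const char *mo_symbol_name(const struct mo_nlist *sym,
                                  const char *strtab, uint32_t strsize)
{
    size_t idx;

    assert(sym != NULL && strtab != NULL);
    if (sym->n_strx == 0)
        return "";
    if (sym->n_strx < 0 || (uint32_t)sym->n_strx >= strsize)
        return NULL;
    idx = (size_t)sym->n_strx;
    if (memchr(strtab + idx, '\0', strsize - idx) == NULL)
        return NULL;
    return strtab + idx;
}
```

A program that wants to use n_un.n_name would store the returned pointer there after loading the string table, which is the setup step the paragraph above refers to.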
If a symbol’s type is undefined external, and the value field is non-zero, the symbol is interpreted by the loader ld as the name of a common region whose size is indicated by the value of the symbol.
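The masks from <nlist.h> and the common-symbol rule combine into a small classifier, sketched below. The MO_ constants repeat the values of N_STAB, N_TYPE, N_EXT, N_UNDF, N_ABS, N_INDR and N_SECT given above; the enum and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define MO_N_STAB 0xe0
#define MO_N_TYPE 0x1e
#define MO_N_EXT  0x01
#define MO_N_UNDF 0x0
#define MO_N_ABS  0x2
#define MO_N_INDR 0xa
#define MO_N_SECT 0xe

enum mo_sym_kind {
    MO_SYM_STAB, MO_SYM_UNDEFINED, MO_SYM_COMMON, MO_SYM_ABSOLUTE,
    MO_SYM_INDIRECT, MO_SYM_SECTION, MO_SYM_UNKNOWN
};

/* Classify a symbol table entry from its n_type and n_value fields. */
static enum mo_sym_kind mo_classify(uint8_t n_type, uint32_t n_value)
{
    if (n_type & MO_N_STAB)
        return MO_SYM_STAB;     /* whole n_type is defined by <stab.h> */
    switch (n_type & MO_N_TYPE) {
    case MO_N_UNDF:
        /* undefined + external + non-zero value means a common symbol */
        if ((n_type & MO_N_EXT) && n_value != 0)
            return MO_SYM_COMMON;
        return MO_SYM_UNDEFINED;
    case MO_N_ABS:
        return MO_SYM_ABSOLUTE;
    case MO_N_INDR:
        return MO_SYM_INDIRECT;
    case MO_N_SECT:
        return MO_SYM_SECTION;
    default:
        return MO_SYM_UNKNOWN;
    }
}

/* Size in bytes of a common symbol, which is carried in n_value. */
static uint32_t mo_common_size(uint8_t n_type, uint32_t n_value)
{
    assert(mo_classify(n_type, n_value) == MO_SYM_COMMON);
    return n_value;
}
```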
The value of a byte in the text or data which is not a portion of a reference to an undefined external symbol is exactly that value which will appear in memory when the file is executed. If a byte in the text or data involves a reference to an undefined external symbol, as indicated by the relocation information, then the value stored in the file is an offset from the associated external symbol. When the file is processed by the link editor and the external symbol becomes defined, the value of the symbol will be added to the bytes in the file.
If relocation information is present, it amounts to eight bytes per relocatable datum. The structure of a relocation entry as given in the include file <reloc.h> is as follows:
/∗
∗ Format of a relocation datum.
∗/
struct relocation_info {
    int      r_address;       /∗ offset in the section to what is being relocated ∗/
    unsigned r_symbolnum:24,  /∗ symbol index if r_extern == 1 or section
                                 ordinal if r_extern == 0 ∗/
             r_pcrel:1,       /∗ was relocated pc relative already ∗/
             r_length:2,      /∗ 0=byte, 1=word, 2=long ∗/
             r_extern:1,      /∗ does not include value of sym referenced ∗/
             r_reserved:4;    /∗ reserved ∗/
};
#define R_ABS   0   /∗ absolute relocation type for Mach-O files ∗/
The r_address is not really an address, as its name suggests, but an offset. For Mach-O object files this offset is from the start of the section to which the relocation entry applies.
If r_extern is zero then r_symbolnum is an ordinal for the section the symbol being relocated is in. These ordinals refer to the sections in the object file in the order their section structures appear in the headers of the object file they are in. The first section has the ordinal 1, the second 2, and so on. This means that the same ordinal in two different object files could refer to two different sections, and those sections could have still different ordinals when combined by the link editor. The value R_ABS is used for relocation entries for absolute symbols which need no further relocation.
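The external-symbol case described earlier, where the value of a newly defined symbol is added to the bytes stored in the file, can be sketched as follows. This is an illustration only, not the link editor's algorithm: it ignores r_pcrel, assumes the section contents are in host byte order, and takes r_address and r_length as plain arguments rather than reading the bit-fields, whose layout is compiler dependent. The function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Add `value` to the datum of 1 << r_length bytes located r_address
 * bytes into the section contents (r_length: 0=byte, 1=word, 2=long).
 * Returns 0 on success, -1 if the datum does not fit in the section.
 */
static int mo_add_to_datum(unsigned char *sect, size_t sect_size,
                           uint32_t r_address, unsigned r_length,
                           uint32_t value)
{
    uint16_t w;
    uint32_t l;
    size_t width;

    assert(sect != NULL);
    if (r_length > 2)
        return -1;
    width = (size_t)1 << r_length;
    if (r_address > sect_size || sect_size - r_address < width)
        return -1;

    switch (r_length) {
    case 0:
        sect[r_address] = (uint8_t)(sect[r_address] + value);
        break;
    case 1:
        memcpy(&w, sect + r_address, sizeof w);
        w = (uint16_t)(w + value);
        memcpy(sect + r_address, &w, sizeof w);
        break;
    default:
        memcpy(&l, sect + r_address, sizeof l);
        l += value;             /* stored offset + symbol value */
        memcpy(sect + r_address, &l, sizeof l);
        break;
    }
    return 0;
}
```

For example, a long at offset 4 that holds the offset 8 from an external symbol later defined at 0x2000 ends up holding 0x2008.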
SEE ALSO
as(1), ld(1), nm(1), gdb(1), stab(5), strip(1), end(3)
BUGS
There is no equivalent of an entry point in a relocatable object because it has no thread command.
NeXT, Inc. — June 20, 1989