Format of a.out files

a.out is the output file of the assembler as and the link editor ld. The linker makes a.out executable if there were no errors and no unresolved external references.

The file has four sections: a header, the program code and data, relocation information and a symbol table (in that order). The last two may be omitted if the program was loaded with the `-s' option of ld or if the symbols and relocation have been removed by strip(1).

Layout information as given in the include file <a.out.h> for PIC32 is:

  /*
   * Header prepended to each a.out file.
   */
  struct  exec {
      unsigned a_midmag;      /* magic number */
      unsigned a_text;        /* size of code segment */
      unsigned a_data;        /* size of initialized data */
      unsigned a_bss;         /* size of uninitialized data */
      unsigned a_reltext;     /* size of text relocation info */
      unsigned a_reldata;     /* size of data relocation info */
      unsigned a_syms;        /* size of symbol table */
      unsigned a_entry;       /* entry address */
  };
  #define RMAGIC      0406    /* relocatable object file */
  #define OMAGIC      0407    /* old impure format */

The sizes in the header are given in bytes, and each size is word aligned. The size of the header is not included in any of the other sizes.
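As a rough illustration of how the header sizes translate into file positions, the sketch below computes the byte offset of each section. It assumes that unsigned is 32 bits wide, that the sections are stored back to back with no padding, and that text relocation precedes data relocation (following the order of the header fields). The names exec_hdr, aout_offsets and aout_layout are hypothetical and do not come from <a.out.h>.

```c
#include <stdint.h>

/* Mirrors struct exec from <a.out.h>; assumes unsigned is 32 bits wide. */
struct exec_hdr {
    uint32_t a_midmag, a_text, a_data, a_bss;
    uint32_t a_reltext, a_reldata, a_syms, a_entry;
};

/* Hypothetical helper type: byte offsets of each section within the file. */
struct aout_offsets {
    uint32_t text, data, reltext, reldata, syms, end;
};

/* Assumes sections are packed in header-field order with no padding. */
struct aout_offsets aout_layout(const struct exec_hdr *h)
{
    struct aout_offsets o;
    o.text    = (uint32_t)sizeof(struct exec_hdr); /* header comes first */
    o.data    = o.text + h->a_text;
    o.reltext = o.data + h->a_data;      /* bss takes no space in the file */
    o.reldata = o.reltext + h->a_reltext;
    o.syms    = o.reldata + h->a_reldata;
    o.end     = o.syms + h->a_syms;      /* expected total file size */
    return o;
}
```

A reader of the file could compare o.end against the actual file size as a basic sanity check before trusting the other offsets.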

When an a.out file is executed, three logical segments are set up: the code (text) segment, the data segment (with uninitialized data, which starts off as all 0, following initialized), and a stack. The text segment begins at address 0x7f008000 in memory; the header is not loaded.

If the magic number in the header is OMAGIC (0407), the text segment is neither write-protected nor shared, so the data segment immediately follows the text segment. This is the oldest kind of executable program and is the default.

The stack segment occupies the highest possible locations in the core image, growing downwards from 0x7f01fffc. The stack segment is automatically extended as required. The data segment is extended only as requested by brk(2).
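The address arithmetic for an OMAGIC image can be sketched as follows. The two addresses come from the text above; the type aout_image, the two functions, and the "fits" check are hypothetical illustrations, not part of any system header.

```c
#include <stdint.h>

#define USER_TEXT_BASE  0x7f008000u  /* start of the text segment */
#define USER_STACK_TOP  0x7f01fffcu  /* stack grows downwards from here */

/* Hypothetical helper type: runtime address ranges of an OMAGIC image. */
struct aout_image {
    uint32_t text_start, data_start, bss_start, bss_end;
};

struct aout_image aout_image_map(uint32_t a_text, uint32_t a_data,
                                 uint32_t a_bss)
{
    struct aout_image m;
    m.text_start = USER_TEXT_BASE;
    m.data_start = m.text_start + a_text; /* OMAGIC: data follows text */
    m.bss_start  = m.data_start + a_data; /* bss follows initialized data */
    m.bss_end    = m.bss_start + a_bss;   /* data segment grows from here */
    return m;
}

/* Nonzero if the image ends at or below the initial top of the stack. */
int aout_image_fits(const struct aout_image *m)
{
    return m->bss_end <= USER_STACK_TOP;
}
```

For example, with a_text = 0x1000, a_data = 0x200 and a_bss = 0x100, the data segment would start at 0x7f009000 and the bss would end at 0x7f009300.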

Relocation information

Relocation information is present only if the magic number in the header is RMAGIC (0406). For every word of program text or initialized data, the relocation section contains a variable-length record of 1 to 6 bytes. Bytes 2-4 are present only when the relocation refers to an external symbol (xxx=7). Bytes 5-6 are present only for the upper-address relocation types (zzz=2 or zzz=3).

Byte 1       Bytes 2-4       Bytes 5-6
Descriptor   Symbol index    Lower address

Byte 1 of a relocation record contains a descriptor of format:

Bit     7   6 5 4   3   2 1 0
Field   0   x x x   y   z z z

Bits 6:4 (xxx) of the relocation descriptor indicate the segment referred to by the text or data word associated with the relocation record:

xxx   Description
0     Absolute number
2     Reference to text segment
3     Reference to initialized data
4     Reference to uninitialized data (bss)
7     Reference to an external symbol with index specified by bytes 2, 3, 4

If bit 3 (y) of the relocation descriptor is 1, the reference is relative to the GP register.

Bits 2:0 (zzz) of the relocation descriptor define the relocation type, that is, the method of transforming the text or data word:

zzz   Description
0     Byte address, 16 bits
1     Byte address, 32 bits
2     Upper part of byte address [31:16]
3     Upper part of byte address with signed offset
4     Word address [17:2]
5     Word address [27:2]
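Because the records have variable length, a reader has to decode each descriptor byte to find where the next record starts. The following is a minimal sketch of that decoding, using the field positions and the codes from the tables above (external symbol is xxx=7). All identifiers are hypothetical and are not taken from <a.out.h>.

```c
#include <stddef.h>
#include <stdint.h>

/* Segment codes (xxx) and relocation types (zzz); names are hypothetical. */
enum { SEG_ABS = 0, SEG_TEXT = 2, SEG_DATA = 3, SEG_BSS = 4, SEG_EXT = 7 };
enum { T_BYTE16 = 0, T_BYTE32 = 1, T_HIGH16 = 2, T_HIGH16S = 3,
       T_WORD16 = 4, T_WORD26 = 5 };

struct reloc_desc {
    unsigned segment;   /* xxx, bits 6:4 */
    unsigned gp_rel;    /* y, bit 3 */
    unsigned type;      /* zzz, bits 2:0 */
};

struct reloc_desc reloc_decode(uint8_t b)
{
    struct reloc_desc d;
    d.segment = (b >> 4) & 7;
    d.gp_rel  = (b >> 3) & 1;
    d.type    = b & 7;
    return d;
}

/* Length in bytes of the record that starts with descriptor byte b. */
size_t reloc_record_len(uint8_t b)
{
    struct reloc_desc d = reloc_decode(b);
    size_t len = 1;                         /* the descriptor itself */
    if (d.segment == SEG_EXT)
        len += 3;                           /* symbol index */
    if (d.type == T_HIGH16 || d.type == T_HIGH16S)
        len += 2;                           /* lower address */
    return len;
}
```

Walking the relocation section then amounts to calling reloc_record_len on the current byte and advancing by the result, once per word of text or data.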

The value of a word in the text or data which is not a portion of a reference to an undefined external symbol is exactly that value which will appear in memory when the file is executed. If a word in the text or data involves a reference to an undefined external symbol, as indicated by the relocation information, then the value stored in the file is an offset from the associated external symbol. When the file is processed by the link editor and the external symbol becomes defined, the value of the symbol will be added into the word in the file.

Symbol table

The symbol table is a sequence of variable-length records, one for each symbol. The first symbol is numbered 0, the second 1, and so on.

Byte #   Description
0        Name length
1        Symbol type
2…5      Symbol value
6…N      Symbol name

Byte 0 specifies the length of the symbol name in bytes (2…255), including the terminating zero byte.

Byte 1 indicates the type of the symbol (see below).

Bytes 2…5 store the symbol value (little endian).

Bytes 6…N contain the symbol name, null terminated.

Symbol type

Bit     7   6   5   4 3 2 1 0
Field   0   W   G   t t t t t
  • W - weak reference
  • G - global, or external symbol
  • ttttt - symbol type:
ttttt   Description
0       Undefined symbol
1       Absolute
2       Text segment
3       Data segment
4       BSS segment
31      File name
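The record layout and the type byte described above can be read with a few lines of C. This is only a sketch under the stated layout: the struct aout_sym and the function aout_sym_parse are hypothetical names, and rejecting name lengths below 2 simply follows the range given above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory view of one symbol table record. */
struct aout_sym {
    unsigned    weak;      /* W, bit 6 of the type byte */
    unsigned    global;    /* G, bit 5 of the type byte */
    unsigned    type;      /* ttttt, bits 4:0 of the type byte */
    uint32_t    value;     /* bytes 2..5, little endian */
    const char *name;      /* points into the input buffer */
    size_t      name_len;  /* byte 0, includes the terminating zero */
};

/* Parses one record at buf. Returns the number of bytes consumed,
 * or 0 if the record is truncated or the name length is out of range. */
size_t aout_sym_parse(const uint8_t *buf, size_t avail, struct aout_sym *out)
{
    size_t name_len;

    if (avail < 6)
        return 0;
    name_len = buf[0];
    if (name_len < 2 || avail < 6 + name_len)
        return 0;
    out->weak     = (buf[1] >> 6) & 1;
    out->global   = (buf[1] >> 5) & 1;
    out->type     = buf[1] & 0x1f;
    out->value    = (uint32_t)buf[2]
                  | (uint32_t)buf[3] << 8
                  | (uint32_t)buf[4] << 16
                  | (uint32_t)buf[5] << 24;
    out->name     = (const char *)&buf[6];
    out->name_len = name_len;
    return 6 + name_len;
}
```

Since symbols are numbered from 0 in file order, calling the function repeatedly and counting the records gives the index that an external relocation record refers to.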

If a symbol is undefined and external, and its value field is nonzero, the link editor ld interprets the symbol as the name of a common region whose size is given by the value of the symbol.
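The common-region rule can be expressed as a small predicate on the type byte and the value. The sketch below is illustrative only; the macro and function names are hypothetical, and it assumes that "external" corresponds to the G bit of the type byte.

```c
#include <stdint.h>

#define SYM_GLOBAL     0x20   /* G bit of the type byte */
#define SYM_TYPE_MASK  0x1f   /* ttttt field */
#define SYM_UNDEF      0      /* ttttt value for an undefined symbol */

/* Returns the size of the common region named by the symbol,
 * or 0 if the symbol does not describe a common region. */
uint32_t sym_common_size(uint8_t type_byte, uint32_t value)
{
    int undefined = (type_byte & SYM_TYPE_MASK) == SYM_UNDEF;
    int external  = (type_byte & SYM_GLOBAL) != 0;

    return (undefined && external) ? value : 0;
}
```

An undefined external symbol with a zero value yields 0 here, which matches its reading as an ordinary unresolved reference.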