bfd: Canonical format
1.3.2 The BFD canonical object-file format
------------------------------------------
The greatest potential for loss of information occurs when there is the
least overlap between the information provided by the source format,
that stored by the canonical format, and that needed by the destination
format. A brief description of the canonical form may help you
understand which kinds of data you can count on preserving across
conversions.
_files_
Information stored on a per-file basis includes target machine
architecture, particular implementation format type, a demand
pageable bit, and a write protected bit. Information like Unix
magic numbers is not stored here--only the magic numbers' meaning,
so a 'ZMAGIC' file would have both the demand pageable bit and the
write protected text bit set. The byte order of the target is
stored on a per-file basis, so that big- and little-endian object
files may be used with one another.
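
As a minimal sketch of the idea above (this is not BFD's actual code), the following C fragment shows how a reader might record a magic number's meaning rather than the number itself, and how a per-file byte-order setting can drive decoding. The flag names, their bit values, and both function names are hypothetical. The 'OMAGIC' and 'NMAGIC' mappings follow the traditional 'a.out' convention and are an assumption that the text above does not state.

```c
#include <stdint.h>

/* Hypothetical canonical per-file flags; names and bit values are
   illustrative and are not taken from bfd.h.  */
enum {
    FILE_DEMAND_PAGED = 1 << 0,   /* image can be demand paged */
    FILE_WP_TEXT      = 1 << 1    /* text is write protected */
};

/* Traditional Unix a.out magic numbers (octal).  */
enum { AOUT_OMAGIC = 0407, AOUT_NMAGIC = 0410, AOUT_ZMAGIC = 0413 };

/* Translate a magic number into its meaning.  Only the ZMAGIC case is
   described in the text; the others are assumed from a.out tradition.
   Unknown values map to no flags.  */
unsigned file_flags_from_magic(unsigned magic)
{
    switch (magic) {
    case AOUT_ZMAGIC: return FILE_DEMAND_PAGED | FILE_WP_TEXT;
    case AOUT_NMAGIC: return FILE_WP_TEXT;
    case AOUT_OMAGIC: return 0;
    default:          return 0;
    }
}

/* Decode a 32-bit field using the byte order recorded for the file.  */
uint32_t read_u32(const unsigned char *p, int big_endian)
{
    if (big_endian)
        return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16)
             | ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
    return ((uint32_t)p[3] << 24) | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[1] << 8)  |  (uint32_t)p[0];
}
```

Because the byte order travels with each file rather than being fixed at build time, the same decoding routine can serve a big-endian input and a little-endian input in one run.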
_sections_
For each section in the input file, BFD records the section's name,
its original address in the object file, its size and alignment,
various flags, and pointers into other BFD data structures.
_symbols_
Each symbol contains a pointer to the information for the object
file which originally defined it, its name, its value, and various
flag bits. When a BFD back end reads in a symbol table, it
relocates all symbols to make them relative to the base of the
section where they were defined. Doing this ensures that each
symbol points to its containing section. Each symbol also has a
varying amount of hidden private data for the BFD back end. Since
the symbol points to the original file, the private data format for
that symbol is accessible. As a result, 'ld' can operate on a
collection of symbols of wildly different formats without problems.
Normal global and simple local symbols are maintained on output, so
an output file (no matter its format) will retain symbols pointing
to functions and to global, static, and common variables. Some
symbol information is not worth retaining. In 'a.out', for example,
type information is stored in the symbol table as long symbol names;
this information would be useless to most COFF debuggers, so the
linker has command-line switches that let users discard it.
There is one word of type information within the symbol, so if the
format supports symbol type information within symbols (for
example, COFF, Oasys) and the type is simple enough to fit within
one word (nearly everything but aggregates), the information will
be preserved.
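
The section-relative representation of symbol values described above can be sketched as follows. This is a simplified model, not the real BFD data structures: the struct layouts and function names are hypothetical, and the private back-end data and the pointer to the defining file are omitted.

```c
/* Hypothetical, simplified stand-ins for a canonical section and
   symbol; field names are illustrative only.  */
struct section {
    const char   *name;
    unsigned long base;      /* address of the start of the section */
};

struct symbol {
    const char     *name;
    struct section *section; /* containing section */
    unsigned long   value;   /* offset from section->base */
    unsigned        flags;
};

/* Convert an absolute address read from a symbol table into the
   section-relative form, and tie the symbol to its section.  */
void symbol_make_relative(struct symbol *sym, struct section *sec,
                          unsigned long absolute_address)
{
    sym->section = sec;
    sym->value   = absolute_address - sec->base;
}

/* Recover the symbol's address from wherever the section now lives.  */
unsigned long symbol_address(const struct symbol *sym)
{
    return sym->section->base + sym->value;
}
```

With this layout, moving a section only requires updating its base; every symbol defined in it follows without being touched individually.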
_relocation level_
Each canonical BFD relocation record contains a pointer to the
symbol to relocate to, the offset of the data to relocate, the
section the data is in, and a pointer to a relocation type
descriptor. Relocation is performed by passing messages through
the relocation type descriptor and the symbol pointer. Therefore,
relocations can be performed on output data using a relocation
method that is only available in one of the input formats. For
instance, Oasys provides a byte relocation format. A relocation
record requesting this relocation type would point indirectly to a
routine to perform this, so the relocation may be performed on a
byte being written to a 68k COFF file, even though 68k COFF has no
such relocation type.
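
The indirection through the relocation type descriptor can be pictured with a function pointer. The sketch below is an illustration under stated assumptions, not BFD's implementation: every struct, field, and function name is hypothetical, and the byte relocation is modeled simply as adding the low 8 bits of the symbol value to the byte in place.

```c
#include <stddef.h>

struct symbol { const char *name; unsigned long value; };

struct reloc;

/* Hypothetical relocation type descriptor: it carries the routine
   that knows how to perform this kind of relocation.  */
struct reloc_type {
    const char *name;
    void (*apply)(const struct reloc *r);
};

/* Hypothetical canonical relocation record.  */
struct reloc {
    const struct symbol     *sym;          /* symbol to relocate to */
    unsigned char           *section_data; /* contents of the section */
    size_t                   offset;       /* offset of the data */
    const struct reloc_type *type;         /* how to relocate */
};

/* A byte relocation routine, supplied by the input format's back end.  */
static void apply_byte(const struct reloc *r)
{
    unsigned char *p = r->section_data + r->offset;
    *p = (unsigned char)((*p + r->sym->value) & 0xff);
}

const struct reloc_type byte_reloc_type = { "byte", apply_byte };

/* The output side needs no knowledge of the relocation kind: it just
   dispatches through the descriptor attached to the record.  */
void perform_reloc(const struct reloc *r)
{
    r->type->apply(r);
}
```

Since the routine travels with the record, the code writing the output file never has to understand the relocation type itself.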
_line numbers_
Object formats can contain, for debugging purposes, some form of
mapping between symbols, source line numbers, and addresses in the
output file. These addresses have to be relocated along with the
symbol information. Each symbol with an associated list of line
number records points to the first record of the list. The head of
a line number list consists of a pointer to the symbol, which
allows finding out the address of the function whose line number is
being described. The rest of the list is made up of pairs: offsets
into the section and line numbers. Any format from which this
information can be readily derived can pass it successfully to
other formats.
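
The line number list described above (a head pointing at the symbol, followed by offset and line pairs) might be modeled as in the following sketch. The types and function names are hypothetical, and the lookup assumes the pairs are sorted by ascending offset, which the text does not state.

```c
#include <stddef.h>

struct symbol { const char *name; unsigned long value; /* section-relative */ };

/* One (offset into section, source line) pair.  */
struct line_entry { unsigned long offset; unsigned line; };

/* Hypothetical line number list: the head refers to the function's
   symbol, the rest is an array of pairs.  */
struct line_list {
    const struct symbol     *function;
    const struct line_entry *pairs;   /* assumed sorted by offset */
    size_t                   count;
};

/* Address of the function being described, found through the head.  */
unsigned long function_address(const struct line_list *l,
                               unsigned long section_base)
{
    return section_base + l->function->value;
}

/* Return the source line covering ADDR, or 0 if ADDR lies before the
   first recorded offset.  */
unsigned line_for_address(const struct line_list *l,
                          unsigned long section_base, unsigned long addr)
{
    unsigned line = 0;
    size_t i;

    if (addr < section_base)
        return 0;
    for (i = 0; i < l->count && l->pairs[i].offset <= addr - section_base; i++)
        line = l->pairs[i].line;
    return line;
}
```

Because the pairs hold offsets rather than absolute addresses, relocating the section changes only the base passed in, and the list itself stays valid.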