gprof: File Format
9.2 Profiling Data File Format
==============================
The old BSD-derived file format used for profile data does not contain a
magic cookie that would allow a reader to check whether a data file
really is a 'gprof' file. Furthermore, it does not provide a version
number, which makes changes to the file format almost impossible. GNU 'gprof'
uses a new file format that provides these features. For backward
compatibility, GNU 'gprof' continues to support the old BSD-derived
format, but not all features are supported with it. For example,
basic-block execution counts cannot be accommodated by the old file
format.
The new file format is defined in the header file 'gmon_out.h'. It
consists of a header containing the magic cookie and a version number,
as well as some spare bytes available for future extensions. All data
in a profile data file is in the native format of the target for which
the profile was collected. GNU 'gprof' adapts automatically to the
byte-order in use.
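As a rough illustration of how a reader might validate such a header,
consider the sketch below. The layout it assumes (a 4-byte cookie
"gmon", a 4-byte version word, and 12 spare bytes) and the names
'sketch_gmon_hdr' and 'sketch_check_header' are hypothetical; 'gmon_out.h'
remains the authoritative definition. Trying the version word in both
byte orders is only one plausible way to adapt to the byte order of the
target, not necessarily what GNU 'gprof' does.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical layout for illustration; see gmon_out.h for the real one. */
struct sketch_gmon_hdr {
    unsigned char cookie[4];   /* assumed magic cookie: "gmon" */
    unsigned char version[4];  /* assumed 32-bit version, target byte order */
    unsigned char spare[12];   /* spare bytes reserved for extensions */
};

enum sketch_order { SKETCH_BAD = 0, SKETCH_LITTLE = 1, SKETCH_BIG = 2 };

/* Decode a 32-bit value from four bytes in the requested byte order. */
static uint32_t sketch_get32(const unsigned char *p, int big_endian)
{
    if (big_endian)
        return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16)
             | ((uint32_t)p[2] << 8) | (uint32_t)p[3];
    return ((uint32_t)p[3] << 24) | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[1] << 8) | (uint32_t)p[0];
}

/* Return the byte order under which the version field equals
   expected_version, or SKETCH_BAD if the cookie or version mismatches. */
static enum sketch_order
sketch_check_header(const struct sketch_gmon_hdr *h, uint32_t expected_version)
{
    if (memcmp(h->cookie, "gmon", 4) != 0)
        return SKETCH_BAD;
    if (sketch_get32(h->version, 0) == expected_version)
        return SKETCH_LITTLE;
    if (sketch_get32(h->version, 1) == expected_version)
        return SKETCH_BIG;
    return SKETCH_BAD;
}
```

A file whose first bytes fail this check could then be treated as a
candidate for the old BSD-derived format rather than rejected outright.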
In the new file format, the header is followed by a sequence of
records. Currently, there are three different record types: histogram
records, call-graph arc records, and basic-block execution count
records. Each file can contain any number of each record type. When
reading a file, GNU 'gprof' will ensure records of the same type are
compatible with each other and compute the union of all records. For
example, for basic-block execution counts, the union is simply the sum
of all execution counts for each basic-block.
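The union described above for basic-block execution counts can be
sketched as a simple accumulation. The table type and the function
'sketch_bb_accumulate' are hypothetical, and keying the table on the
exact address is a simplifying assumption made for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* One accumulated entry: a basic-block address and its total count. */
struct sketch_bb_total {
    uint64_t addr;
    uint64_t count;
};

/* Add one address/count pair to a table of totals that currently holds
   *n entries and has room for cap entries.  Counts for an address that
   is already present are summed.  Returns 0 on success, or -1 if a new
   entry is needed but the table is full. */
static int
sketch_bb_accumulate(struct sketch_bb_total *tab, size_t *n, size_t cap,
                     uint64_t addr, uint64_t count)
{
    size_t i;

    for (i = 0; i < *n; i++) {
        if (tab[i].addr == addr) {
            tab[i].count += count;   /* union == sum of counts */
            return 0;
        }
    }
    if (*n == cap)
        return -1;
    tab[*n].addr = addr;
    tab[*n].count = count;
    (*n)++;
    return 0;
}
```

Feeding every pair from every basic-block record through such a
function yields one total per address, regardless of how many records
the counts were spread over.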
9.2.1 Histogram Records
-----------------------
Histogram records consist of a header that is followed by an array of
bins. The header contains the text-segment range that the histogram
spans, the size of the histogram in bytes (unlike in the old BSD format,
this does not include the size of the header), the rate of the profiling
clock, and the physical dimension that the bin counts represent after
being scaled by the profiling clock rate. The physical dimension is
specified in two parts: a long name of up to 15 characters and a
single-character abbreviation. For example, a histogram representing
real time would specify the long name as "seconds" and the abbreviation
as "s".
This feature is useful for architectures that support performance
monitor hardware (which, fortunately, is becoming increasingly common).
For example, under DEC OSF/1, the "uprofile" command can be used to
produce a histogram of, say, instruction cache misses. In this case,
the dimension in the histogram header could be set to "i-cache misses"
and the abbreviation could be set to "1" (because it is simply a count,
not a physical dimension). Also, the profiling rate would have to be
set to 1 in this case.
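To make the role of the profiling rate concrete, the following sketch
holds the header fields named above in a hypothetical in-memory
structure and scales a raw bin count into the stated dimension. The
structure, its field widths, and 'sketch_scale_count' are assumptions
for illustration and do not reproduce the on-disk layout.

```c
#include <stdint.h>

/* Hypothetical decoded form of a histogram header. */
struct sketch_hist_info {
    uint64_t low_pc;      /* start of the text-segment range */
    uint64_t high_pc;     /* end of the text-segment range */
    uint32_t hist_size;   /* size of the histogram, header excluded */
    uint32_t prof_rate;   /* rate of the profiling clock */
    char dimen[16];       /* long name, up to 15 characters plus NUL */
    char dimen_abbrev;    /* single-character abbreviation */
};

/* Scale a raw bin count by the profiling clock rate.  With a rate of
   100 ticks per second, 250 ticks correspond to 2.5 seconds; with a
   rate of 1, the count is reported unchanged. */
static double
sketch_scale_count(const struct sketch_hist_info *h, uint32_t count)
{
    return (double)count / (double)h->prof_rate;
}
```

With the dimension set to "i-cache misses" and the rate set to 1, the
scaled value is simply the number of misses recorded in the bin.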
Histogram bins are 16-bit numbers and each bin represents an equal
amount of text-space. For example, if the text-segment is one thousand
bytes long and if there are ten bins in the histogram, each bin
represents one hundred bytes.
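The arithmetic in this example can be sketched as follows. The function
names are hypothetical, the range is treated as half-open, and the code
assumes the text range is small enough that the multiplication does not
overflow 64 bits.

```c
#include <stdint.h>

/* Bytes of text space covered by each bin; 0 if there are no bins. */
static uint64_t
sketch_bin_width(uint64_t low_pc, uint64_t high_pc, uint32_t num_bins)
{
    if (num_bins == 0)
        return 0;
    return (high_pc - low_pc) / num_bins;
}

/* Index of the bin that covers pc, or -1 if pc lies outside the
   half-open range [low_pc, high_pc) or there are no bins. */
static long
sketch_bin_index(uint64_t low_pc, uint64_t high_pc, uint32_t num_bins,
                 uint64_t pc)
{
    if (num_bins == 0 || pc < low_pc || pc >= high_pc)
        return -1;
    return (long)(((pc - low_pc) * num_bins) / (high_pc - low_pc));
}
```

For a text-segment of one thousand bytes and ten bins, the width is one
hundred bytes and an address 250 bytes into the segment falls in bin 2
(counting from zero).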
9.2.2 Call-Graph Records
------------------------
Call-graph records have a format that is identical to the one used in
the BSD-derived file format. It consists of an arc in the call graph
and a count indicating the number of times the arc was traversed during
program execution. Arcs are specified by a pair of addresses: the first
must be within the caller's function and the second must be within the
callee's function. When performing profiling at the function level,
these addresses can point anywhere within the respective function.
However, when profiling at the line-level, it is better if the addresses
are as close to the call-site/entry-point as possible. This will ensure
that the line-level call-graph is able to identify exactly which line of
source code performed calls to a function.
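Because an arc address may point anywhere within a function, attributing
an arc at the function level amounts to a range lookup. The sketch below
assumes a hypothetical table of functions sorted by start address with
non-overlapping ranges; the structures and 'sketch_find_func' are
illustrative names, not part of 'gprof'.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical function table entry covering [start, end). */
struct sketch_func {
    uint64_t start;
    uint64_t end;
    const char *name;
};

/* Hypothetical decoded call-graph arc. */
struct sketch_arc {
    uint64_t from_pc;   /* address within the caller */
    uint64_t self_pc;   /* address within the callee */
    uint32_t count;     /* number of times the arc was traversed */
};

/* Binary search for the function whose range contains pc.  The table
   must be sorted by start address.  Returns the index, or -1. */
static long
sketch_find_func(const struct sketch_func *funcs, size_t n, uint64_t pc)
{
    size_t lo = 0, hi = n;

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (pc < funcs[mid].start)
            hi = mid;
        else if (pc >= funcs[mid].end)
            lo = mid + 1;
        else
            return (long)mid;
    }
    return -1;
}
```

Looking up both addresses of an arc gives the caller and the callee; at
the line level the same idea applies, but the lookup table would be
keyed on line ranges, which is why addresses close to the call-site and
entry-point are preferable there.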
9.2.3 Basic-Block Execution Count Records
-----------------------------------------
Basic-block execution count records consist of a header followed by a
sequence of address/count pairs. The header simply specifies the length
of the sequence. In an address/count pair, the address identifies a
basic-block and the count specifies the number of times that basic-block
was executed. Any address within the basic-block can be used.
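A reader for such a record might look like the sketch below. The field
widths are assumptions made only for this example: a 32-bit length
followed by pairs of a 32-bit address and a 32-bit count, all
little-endian. The real widths depend on the target and are given by
'gmon_out.h'; 'sketch_parse_bb' is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

/* One decoded address/count pair (assumed 32-bit fields). */
struct sketch_bb_pair {
    uint32_t addr;
    uint32_t count;
};

/* Decode a little-endian 32-bit value. */
static uint32_t sketch_le32(const unsigned char *p)
{
    return ((uint32_t)p[3] << 24) | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[1] << 8) | (uint32_t)p[0];
}

/* Parse one record from buf (len bytes) into out (room for cap pairs).
   Returns the number of pairs stored, or -1 if the buffer is shorter
   than the header claims or out is too small. */
static long
sketch_parse_bb(const unsigned char *buf, size_t len,
                struct sketch_bb_pair *out, size_t cap)
{
    uint32_t n, i;

    if (len < 4)
        return -1;
    n = sketch_le32(buf);              /* header: length of the sequence */
    if (n > cap || (len - 4) / 8 < n)
        return -1;
    for (i = 0; i < n; i++) {
        const unsigned char *p = buf + 4 + (size_t)i * 8;

        out[i].addr = sketch_le32(p);
        out[i].count = sketch_le32(p + 4);
    }
    return (long)n;
}
```

Checking the declared length against the bytes actually available
before reading any pair keeps a truncated file from being read past its
end.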