gprof: File Format

 
 9.2 Profiling Data File Format
 ==============================
 
 The old BSD-derived file format used for profile data does not contain a
 magic cookie that allows one to check whether a data file really is a
 'gprof' file.  Furthermore, it does not provide a version number, thus
 rendering changes to the file format almost impossible.  GNU 'gprof'
 uses a new file format that provides these features.  For backward
 compatibility, GNU 'gprof' continues to support the old BSD-derived
 format, but not all features are supported with it.  For example,
 basic-block execution counts cannot be accommodated by the old file
 format.
 
    The new file format is defined in header file 'gmon_out.h'.  It
 consists of a header containing the magic cookie and a version number,
 as well as some spare bytes available for future extensions.  All data
 in a profile data file is in the native format of the target for which
 the profile was collected.  GNU 'gprof' adapts automatically to the
 byte-order in use.
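
    As a rough illustration of the header check described above, the
 following C sketch validates the magic cookie and decodes the version
 number.  The field sizes (a 4-byte cookie reading "gmon", a 4-byte
 version, and 12 spare bytes) and the names 'read_u32' and
 'check_header' are assumptions made for this example; 'gmon_out.h'
 remains the authoritative definition.  The caller supplies the byte
 order here, which is a simplification and not necessarily how GNU
 'gprof' determines it.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed layout, for illustration only; see gmon_out.h for the real one. */
enum {
    COOKIE_LEN  = 4,
    VERSION_LEN = 4,
    SPARE_LEN   = 12,
    HDR_LEN     = COOKIE_LEN + VERSION_LEN + SPARE_LEN
};

/* Decode a 32-bit value stored in the target's byte order. */
static uint32_t read_u32(const unsigned char *p, int big_endian)
{
    if (big_endian)
        return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
               ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
    return ((uint32_t)p[3] << 24) | ((uint32_t)p[2] << 16) |
           ((uint32_t)p[1] << 8)  |  (uint32_t)p[0];
}

/* Return the version number, or -1 if the buffer is too short or the
   magic cookie does not match. */
static int64_t check_header(const unsigned char *buf, size_t len,
                            int big_endian)
{
    assert(buf != NULL);
    if (len < HDR_LEN || memcmp(buf, "gmon", COOKIE_LEN) != 0)
        return -1;
    return (int64_t)read_u32(buf + COOKIE_LEN, big_endian);
}
```

    A reader built along these lines would reject any file for which
 'check_header' returns -1 before looking at the records.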
 
    In the new file format, the header is followed by a sequence of
 records.  Currently, there are three different record types: histogram
 records, call-graph arc records, and basic-block execution count
 records.  Each file can contain any number of each record type.  When
 reading a file, GNU 'gprof' will ensure records of the same type are
 compatible with each other and compute the union of all records.  For
 example, for basic-block execution counts, the union is simply the sum
 of all execution counts for each basic-block.
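
    The union of basic-block execution counts can be sketched as
 follows.  The 'struct bb_count' type and the 'merge_bb' function are
 hypothetical names chosen for this example and do not come from the
 'gprof' sources; the sketch only shows that counts for the same
 address are added together.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory form of one address/count pair. */
struct bb_count {
    uint64_t addr;
    uint64_t count;
};

/* Merge one pair into the table.  If the address is already present,
   its count is increased; otherwise a new entry is appended when there
   is room.  Returns the new number of entries.  A pair with a new
   address is dropped if the table is full. */
static size_t merge_bb(struct bb_count *table, size_t n, size_t cap,
                       uint64_t addr, uint64_t count)
{
    assert(table != NULL && n <= cap);
    for (size_t i = 0; i < n; i++) {
        if (table[i].addr == addr) {
            table[i].count += count;
            return n;
        }
    }
    if (n < cap) {
        table[n].addr = addr;
        table[n].count = count;
        n++;
    }
    return n;
}
```

    The linear search keeps the example short; a real tool handling
 many basic-blocks would more likely use a hash table or a sorted
 array.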
 
 9.2.1 Histogram Records
 -----------------------
 
 Histogram records consist of a header that is followed by an array of
 bins.  The header contains the text-segment range that the histogram
 spans, the size of the histogram in bytes (unlike in the old BSD format,
 this does not include the size of the header), the rate of the profiling
 clock, and the physical dimension that the bin counts represent after
 being scaled by the profiling clock rate.  The physical dimension is
 specified in two parts: a long name of up to 15 characters and a single
 character abbreviation.  For example, a histogram representing real-time
 would specify the long name as "seconds" and the abbreviation as "s".
 This feature is useful for architectures that support performance
 monitor hardware (which, fortunately, is becoming increasingly common).
 For example, under DEC OSF/1, the "uprofile" command can be used to
 produce a histogram of, say, instruction cache misses.  In this case,
 the dimension in the histogram header could be set to "i-cache misses"
 and the abbreviation could be set to "1" (because it is simply a count,
 not a physical dimension).  Also, the profiling rate would have to be
 set to 1 in this case.
 
    Histogram bins are 16-bit numbers and each bin represents an equal
 amount of text-space.  For example, if the text-segment is one thousand
 bytes long and if there are ten bins in the histogram, each bin
 represents one hundred bytes.
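
    The arithmetic in this example can be written out as a small C
 sketch.  The function names 'bin_span' and 'bin_index' and the
 half-open range [low_pc, high_pc) are assumptions made for
 illustration, not definitions taken from 'gmon_out.h'.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of text-space bytes covered by each bin. */
static uint64_t bin_span(uint64_t low_pc, uint64_t high_pc, size_t nbins)
{
    assert(nbins > 0 && high_pc > low_pc);
    return (high_pc - low_pc) / nbins;
}

/* Index of the bin that covers address pc.  The multiplication may
   overflow for very large ranges; this sketch does not guard against
   that. */
static size_t bin_index(uint64_t low_pc, uint64_t high_pc, size_t nbins,
                        uint64_t pc)
{
    assert(nbins > 0 && high_pc > low_pc);
    assert(pc >= low_pc && pc < high_pc);
    return (size_t)((pc - low_pc) * nbins / (high_pc - low_pc));
}
```

    With a text-segment of one thousand bytes and ten bins, 'bin_span'
 yields one hundred, and an address 250 bytes past the start of the
 segment falls into bin 2.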
 
 9.2.2 Call-Graph Records
 ------------------------
 
 Call-graph records have a format that is identical to the one used in
 the BSD-derived file format.  Each record consists of an arc in the
 call graph and a count indicating the number of times the arc was
 traversed during program execution.  Arcs are specified by a pair of
 addresses: the first must be within the caller's function and the second
 must be within the callee's function.  When performing profiling at the
 function level,
 these addresses can point anywhere within the respective function.
 However, when profiling at the line-level, it is better if the addresses
 are as close to the call-site/entry-point as possible.  This will ensure
 that the line-level call-graph is able to identify exactly which line of
 source code performed calls to a function.
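
    The address requirement for arcs can be illustrated with a short C
 sketch.  The types 'struct func_range' and 'struct cg_arc', their
 field names, and the function 'arc_links' are hypothetical and chosen
 only for this example; the on-disk field widths depend on the target
 and are not modelled here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical address range of a function: [start, end). */
struct func_range {
    uint64_t start;
    uint64_t end;
};

/* Hypothetical in-memory form of a call-graph arc record. */
struct cg_arc {
    uint64_t from_pc;   /* address within the caller */
    uint64_t self_pc;   /* address within the callee */
    uint64_t count;     /* number of times the arc was traversed */
};

/* Nonzero if pc lies inside the function's address range. */
static int in_func(const struct func_range *f, uint64_t pc)
{
    return pc >= f->start && pc < f->end;
}

/* Nonzero if the arc connects the given caller to the given callee.
   Any address inside each function is accepted, which corresponds to
   function-level profiling. */
static int arc_links(const struct cg_arc *arc,
                     const struct func_range *caller,
                     const struct func_range *callee)
{
    assert(arc != NULL && caller != NULL && callee != NULL);
    return in_func(caller, arc->from_pc) && in_func(callee, arc->self_pc);
}
```

    For line-level profiling, a range check like this is too coarse:
 the caller address then needs to map to the exact source line of the
 call-site.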
 
 9.2.3 Basic-Block Execution Count Records
 -----------------------------------------
 
 Basic-block execution count records consist of a header followed by a
 sequence of address/count pairs.  The header simply specifies the length
 of the sequence.  In an address/count pair, the address identifies a
 basic-block and the count specifies the number of times that basic-block
 was executed.  Any address within the basic-block can be used.
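
    A decoder for such a record might look like the following C sketch.
 It assumes a 4-byte little-endian length followed by pairs of 4-byte
 addresses and 4-byte counts; these widths, the byte order, and the
 names 'le32' and 'parse_bb_record' are assumptions for illustration
 only, since the actual layout depends on the target and is defined in
 'gmon_out.h'.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode a 32-bit little-endian value. */
static uint32_t le32(const unsigned char *p)
{
    return ((uint32_t)p[3] << 24) | ((uint32_t)p[2] << 16) |
           ((uint32_t)p[1] << 8)  |  (uint32_t)p[0];
}

/* Decode one record starting at its header.  Fills addrs[] and
   counts[] and returns the number of pairs, or -1 if the buffer is
   truncated or the output arrays (of size cap) are too small. */
static long parse_bb_record(const unsigned char *buf, size_t len,
                            uint32_t *addrs, uint32_t *counts, size_t cap)
{
    assert(buf != NULL && addrs != NULL && counts != NULL);
    if (len < 4)
        return -1;

    uint32_t n = le32(buf);               /* header: length of sequence */
    if (n > cap || (len - 4) / 8 < n)
        return -1;

    for (uint32_t i = 0; i < n; i++) {
        const unsigned char *p = buf + 4 + (size_t)i * 8;
        addrs[i]  = le32(p);              /* address inside the block */
        counts[i] = le32(p + 4);          /* times the block executed */
    }
    return (long)n;
}
```

    Checking the declared length against the remaining bytes before
 decoding keeps a truncated file from causing reads past the end of the
 buffer.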