gprof: Line-by-line
5.3 Line-by-line Profiling
==========================
'gprof''s '-l' option causes the program to perform "line-by-line"
profiling. In this mode, histogram samples are assigned not to
functions, but to individual lines of source code. This only works with
programs compiled with older versions of the 'gcc' compiler. Newer
versions of 'gcc' use a different program - 'gcov' - to display
line-by-line profiling information.
With the older versions of 'gcc' the program usually has to be
compiled with a '-g' option, in addition to '-pg', in order to generate
debugging symbols for tracking source code lines. Note, in much older
versions of 'gcc' the program had to be compiled with the '-a'
command-line option as well.
The flat profile is the most useful output table in line-by-line
mode. The call graph isn't as useful as normal, since the current
version of 'gprof' does not propagate call graph arcs from source code
lines to the enclosing function. The call graph does, however, show
each line of code that called each function, along with a count.
Here is a section of 'gprof''s output, without line-by-line
profiling. Note that 'ct_init' accounted for four histogram hits, and
13327 calls to 'init_block'.
Flat profile:
Each sample counts as 0.01 seconds.
% cumulative self self total
time seconds seconds calls us/call us/call name
30.77 0.13 0.04 6335 6.31 6.31 ct_init
Call graph (explanation follows)
granularity: each sample hit covers 4 byte(s) for 7.69% of 0.13 seconds
index % time self children called name
0.00 0.00 1/13496 name_too_long
0.00 0.00 40/13496 deflate
0.00 0.00 128/13496 deflate_fast
0.00 0.00 13327/13496 ct_init
[7] 0.0 0.00 0.00 13496 init_block
Now let's look at some of 'gprof''s output from the same program run,
this time with line-by-line profiling enabled. Note that 'ct_init''s
four histogram hits are broken down into four lines of source code--one
hit occurred on each of lines 349, 351, 382 and 385. In the call graph,
note how 'ct_init''s 13327 calls to 'init_block' are broken down into
one call from line 396, 3071 calls from line 384, 3730 calls from line
385, and 6525 calls from 387.
Flat profile:
Each sample counts as 0.01 seconds.
% cumulative self
time seconds seconds calls name
7.69 0.10 0.01 ct_init (trees.c:349)
7.69 0.11 0.01 ct_init (trees.c:351)
7.69 0.12 0.01 ct_init (trees.c:382)
7.69 0.13 0.01 ct_init (trees.c:385)
Call graph (explanation follows)
granularity: each sample hit covers 4 byte(s) for 7.69% of 0.13 seconds
% time self children called name
0.00 0.00 1/13496 name_too_long (gzip.c:1440)
0.00 0.00 1/13496 deflate (deflate.c:763)
0.00 0.00 1/13496 ct_init (trees.c:396)
0.00 0.00 2/13496 deflate (deflate.c:727)
0.00 0.00 4/13496 deflate (deflate.c:686)
0.00 0.00 5/13496 deflate (deflate.c:675)
0.00 0.00 12/13496 deflate (deflate.c:679)
0.00 0.00 16/13496 deflate (deflate.c:730)
0.00 0.00 128/13496 deflate_fast (deflate.c:654)
0.00 0.00 3071/13496 ct_init (trees.c:384)
0.00 0.00 3730/13496 ct_init (trees.c:385)
0.00 0.00 6525/13496 ct_init (trees.c:387)
[6] 0.0 0.00 0.00 13496 init_block (trees.c:408)