octave: Profiler Example

 
 13.7 Profiler Example
 =====================
 
 Below, we will give a short example of a profiler session.  See
 Profiling, for the documentation of the profiler functions in detail.
 Consider the code:
 
      global N A;
 
      N = 300;
      A = rand (N, N);
 
      function xt = timesteps (steps, x0, expM)
        global N;
 
        if (steps == 0)
          xt = NA (N, 0);
        else
          xt = NA (N, steps);
          x1 = expM * x0;
          xt(:, 1) = x1;
          xt(:, 2 : end) = timesteps (steps - 1, x1, expM);
        endif
      endfunction
 
      function foo ()
        global N A;
 
        initial = @(x) sin (x);
        x0 = (initial (linspace (0, 2 * pi, N)))';
 
        expA = expm (A);
        xt = timesteps (100, x0, expA);
      endfunction
 
      function fib = bar (N)
        if (N <= 2)
          fib = 1;
        else
          fib = bar (N - 1) + bar (N - 2);
        endif
      endfunction
 
    If we execute the two main functions, we get:
 
      tic; foo; toc;
      ⇒ Elapsed time is 2.37338 seconds.
 
      tic; bar (20); toc;
      ⇒ Elapsed time is 2.04952 seconds.
 
    But this does not give much information about where this time is
 spent; for instance, whether the single call to ‘expm’ is more expensive
 or the recursive time-stepping itself.  To get a more detailed picture,
 we can use the profiler.
 
      profile on;
      foo;
      profile off;
 
      data = profile ("info");
      profshow (data, 10);
 
    This prints a table like:
 
         #  Function Attr     Time (s)        Calls
      ---------------------------------------------
         7      expm             1.034            1
         3  binary *             0.823          117
        41  binary \             0.188            1
        38  binary ^             0.126            2
        43 timesteps    R        0.111          101
        44        NA             0.029          101
        39  binary +             0.024            8
        34      norm             0.011            1
        40  binary -             0.004          101
        33   balance             0.003            1
 
    The entries are the individual functions which have been executed
 (only the 10 most important ones), together with some information for
 each of them.  The entries like ‘binary *’ denote operators, while other
 entries are ordinary functions.  They include both built-ins like ‘expm’
 and our own routines (for instance ‘timesteps’).  From this profile, we
 can immediately deduce that ‘expm’ uses up the largest proportion of the
 processing time, even though it is only called once.  The second
 expensive operation is the matrix-vector product in the routine
 ‘timesteps’.  (1)
 
    Timing, however, is not the only information available from the
 profile.  The attribute column shows us that ‘timesteps’ calls itself
 recursively.  This may not be that remarkable in this example (since
 it’s clear anyway), but could be helpful in a more complex setting.  As
 to the question of why is there a ‘binary \’ in the output, we can
 easily shed some light on that too.  Note that ‘data’ is a structure
 array (SeeStructure Arrays) which contains the field
 ‘FunctionTable’.  This stores the raw data for the profile shown.  The
 number in the first column of the table gives the index under which the
 shown function can be found there.  Looking up ‘data.FunctionTable(41)’
 gives:
 
        scalar structure containing the fields:
 
          FunctionName = binary \
          TotalTime =  0.18765
          NumCalls =  1
          IsRecursive = 0
          Parents =  7
          Children = [](1x0)
 
    Here we see the information from the table again, but have additional
 fields ‘Parents’ and ‘Children’.  Those are both arrays, which contain
 the indices of functions which have directly called the function in
 question (which is entry 7, ‘expm’, in this case) or been called by it
 (no functions).  Hence, the backslash operator has been used internally
 by ‘expm’.
 
    Now let’s take a look at ‘bar’.  For this, we start a fresh profiling
 session (‘profile on’ does this; the old data is removed before the
 profiler is restarted):
 
      profile on;
      bar (20);
      profile off;
 
      profshow (profile ("info"));
 
    This gives:
 
         #            Function Attr     Time (s)        Calls
      -------------------------------------------------------
         1                 bar    R        2.091        13529
         2           binary <=             0.062        13529
         3            binary -             0.042        13528
         4            binary +             0.023         6764
         5             profile             0.000            1
         8               false             0.000            1
         6              nargin             0.000            1
         7           binary !=             0.000            1
         9 __profiler_enable__             0.000            1
 
    Unsurprisingly, ‘bar’ is also recursive.  It has been called 13,529
 times in the course of recursively calculating the Fibonacci number in a
 suboptimal way, and most of the time was spent in ‘bar’ itself.
 
    Finally, let’s say we want to profile the execution of both ‘foo’ and
 ‘bar’ together.  Since we already have the run-time data collected for
 ‘bar’, we can restart the profiler without clearing the existing data
 and collect the missing statistics about ‘foo’.  This is done by:
 
      profile resume;
      foo;
      profile off;
 
      profshow (profile ("info"), 10);
 
    As you can see in the table below, now we have both profiles mixed
 together.
 
         #  Function Attr     Time (s)        Calls
      ---------------------------------------------
         1       bar    R        2.091        13529
        16      expm             1.122            1
        12  binary *             0.798          117
        46  binary \             0.185            1
        45  binary ^             0.124            2
        48 timesteps    R        0.115          101
         2 binary <=             0.062        13529
         3  binary -             0.045        13629
         4  binary +             0.041         6772
        49        NA             0.036          101
 
    ---------- Footnotes ----------
 
    (1) We only know it is the binary multiplication operator, but
 fortunately this operator appears only at one place in the code and thus
 we know which occurrence takes so much time.  If there were multiple
 places, we would have to use the hierarchical profile to find out the
 exact place which uses up the time which is not covered in this example.