eintr: Counting function definitions

 
 14.9.3 Counting function definitions
 ------------------------------------
 
 Our immediate goal is to generate a list that tells us how many function
 definitions contain fewer than 10 words and symbols, how many contain
 between 10 and 19 words and symbols, how many contain between 20 and 29
 words and symbols, and so on.
 
    With a sorted list of numbers, this is easy: count how many elements
 of the list are smaller than 10, then, after moving past the numbers
 just counted, count how many are smaller than 20, then, after moving
 past the numbers just counted, count how many are smaller than 30, and
 so on.  Each of the numbers, 10, 20, 30, 40, and the like, is one larger
 than the top of that range.  We can call the list of such numbers the
 ‘top-of-ranges’ list.
 
    If we wished, we could generate this list automatically, but it is
 simpler to write a list manually.  Here it is:
 
      (defvar top-of-ranges
       '(10  20  30  40  50
         60  70  80  90 100
        110 120 130 140 150
        160 170 180 190 200
        210 220 230 240 250
        260 270 280 290 300)
       "List specifying ranges for `defuns-per-range'.")
 
    To change the ranges, we edit this list.
 
    Next, we need to write the function that creates the list of the
 number of definitions within each range.  Clearly, this function must
 take the ‘sorted-lengths’ and the ‘top-of-ranges’ lists as arguments.
 
    The ‘defuns-per-range’ function must do two things again and again:
 it must count the number of definitions within a range specified by the
 current top-of-range value; and it must shift to the next higher value
 in the ‘top-of-ranges’ list after counting the number of definitions in
 the current range.  Since each of these actions is repetitive, we can
 use ‘while’ loops for the job.  One loop counts the number of
 definitions in the range defined by the current top-of-range value, and
 the other loop selects each of the top-of-range values in turn.
 
    Several entries of the ‘sorted-lengths’ list are counted for each
 range; this means that the loop for the ‘sorted-lengths’ list will be
 inside the loop for the ‘top-of-ranges’ list, like a small gear inside a
 big gear.
 
    The inner loop counts the number of definitions within the range.  It
 is a simple counting loop of the type we have seen before.  (SeeA
 loop with an incrementing counter Incrementing Loop.)  The
 true-or-false test of the loop tests whether the value from the
 ‘sorted-lengths’ list is smaller than the current value of the top of
 the range.  If it is, the function increments the counter and tests the
 next value from the ‘sorted-lengths’ list.
 
    The inner loop looks like this:
 
      (while LENGTH-ELEMENT-SMALLER-THAN-TOP-OF-RANGE
        (setq number-within-range (1+ number-within-range))
        (setq sorted-lengths (cdr sorted-lengths)))
 
    The outer loop must start with the lowest value of the
 ‘top-of-ranges’ list, and then be set to each of the succeeding higher
 values in turn.  This can be done with a loop like this:
 
      (while top-of-ranges
        BODY-OF-LOOP...
        (setq top-of-ranges (cdr top-of-ranges)))
 
    Put together, the two loops look like this:
 
      (while top-of-ranges
 
        ;; Count the number of elements within the current range.
        (while LENGTH-ELEMENT-SMALLER-THAN-TOP-OF-RANGE
          (setq number-within-range (1+ number-within-range))
          (setq sorted-lengths (cdr sorted-lengths)))
 
        ;; Move to next range.
        (setq top-of-ranges (cdr top-of-ranges)))
 
    In addition, in each circuit of the outer loop, Emacs should record
 the number of definitions within that range (the value of
 ‘number-within-range’) in a list.  We can use ‘cons’ for this purpose.
 (See‘cons’ cons.)
 
    The ‘cons’ function works fine, except that the list it constructs
 will contain the number of definitions for the highest range at its
 beginning and the number of definitions for the lowest range at its end.
 This is because ‘cons’ attaches new elements of the list to the
 beginning of the list, and since the two loops are working their way
 through the lengths’ list from the lower end first, the
 ‘defuns-per-range-list’ will end up largest number first.  But we will
 want to print our graph with smallest values first and the larger later.
 The solution is to reverse the order of the ‘defuns-per-range-list’.  We
 can do this using the ‘nreverse’ function, which reverses the order of a
 list.
 
    For example,
 
      (nreverse '(1 2 3 4))
 
 produces:
 
      (4 3 2 1)
 
    Note that the ‘nreverse’ function is destructive—that is, it changes
 the list to which it is applied; this contrasts with the ‘car’ and ‘cdr’
 functions, which are non-destructive.  In this case, we do not want the
 original ‘defuns-per-range-list’, so it does not matter that it is
 destroyed.  (The ‘reverse’ function provides a reversed copy of a list,
 leaving the original list as is.)
 
    Put all together, the ‘defuns-per-range’ looks like this:
 
      (defun defuns-per-range (sorted-lengths top-of-ranges)
        "SORTED-LENGTHS defuns in each TOP-OF-RANGES range."
        (let ((top-of-range (car top-of-ranges))
              (number-within-range 0)
              defuns-per-range-list)
 
          ;; Outer loop.
          (while top-of-ranges
 
            ;; Inner loop.
            (while (and
                    ;; Need number for numeric test.
                    (car sorted-lengths)
                    (< (car sorted-lengths) top-of-range))
 
              ;; Count number of definitions within current range.
              (setq number-within-range (1+ number-within-range))
              (setq sorted-lengths (cdr sorted-lengths)))
 
            ;; Exit inner loop but remain within outer loop.
 
            (setq defuns-per-range-list
                  (cons number-within-range defuns-per-range-list))
            (setq number-within-range 0)      ; Reset count to zero.
 
            ;; Move to next range.
            (setq top-of-ranges (cdr top-of-ranges))
            ;; Specify next top of range value.
            (setq top-of-range (car top-of-ranges)))
 
          ;; Exit outer loop and count the number of defuns larger than
          ;;   the largest top-of-range value.
          (setq defuns-per-range-list
                (cons
                 (length sorted-lengths)
                 defuns-per-range-list))
 
          ;; Return a list of the number of definitions within each range,
          ;;   smallest to largest.
          (nreverse defuns-per-range-list)))
 
 The function is straightforward except for one subtle feature.  The
 true-or-false test of the inner loop looks like this:
 
      (and (car sorted-lengths)
           (< (car sorted-lengths) top-of-range))
 
 instead of like this:
 
      (< (car sorted-lengths) top-of-range)
 
    The purpose of the test is to determine whether the first item in the
 ‘sorted-lengths’ list is less than the value of the top of the range.
 
    The simple version of the test works fine unless the ‘sorted-lengths’
 list has a ‘nil’ value.  In that case, the ‘(car sorted-lengths)’
 expression function returns ‘nil’.  The ‘<’ function cannot compare a
 number to ‘nil’, which is an empty list, so Emacs signals an error and
 stops the function from attempting to continue to execute.
 
    The ‘sorted-lengths’ list always becomes ‘nil’ when the counter
 reaches the end of the list.  This means that any attempt to use the
 ‘defuns-per-range’ function with the simple version of the test will
 fail.
 
    We solve the problem by using the ‘(car sorted-lengths)’ expression
 in conjunction with the ‘and’ expression.  The ‘(car sorted-lengths)’
 expression returns a non-‘nil’ value so long as the list has at least
 one number within it, but returns ‘nil’ if the list is empty.  The ‘and’
 expression first evaluates the ‘(car sorted-lengths)’ expression, and if
 it is ‘nil’, returns false _without_ evaluating the ‘<’ expression.  But
 if the ‘(car sorted-lengths)’ expression returns a non-‘nil’ value, the
 ‘and’ expression evaluates the ‘<’ expression, and returns that value as
 the value of the ‘and’ expression.
 
    This way, we avoid an error.  (SeeThe ‘kill-new’ function
 kill-new function, for information about ‘and’.)
 
    Here is a short test of the ‘defuns-per-range’ function.  First,
 evaluate the expression that binds (a shortened) ‘top-of-ranges’ list to
 the list of values, then evaluate the expression for binding the
 ‘sorted-lengths’ list, and then evaluate the ‘defuns-per-range’
 function.
 
      ;; (Shorter list than we will use later.)
      (setq top-of-ranges
       '(110 120 130 140 150
         160 170 180 190 200))
 
      (setq sorted-lengths
            '(85 86 110 116 122 129 154 176 179 200 265 300 300))
 
      (defuns-per-range sorted-lengths top-of-ranges)
 
 The list returned looks like this:
 
      (2 2 2 0 0 1 0 2 0 0 4)
 
 Indeed, there are two elements of the ‘sorted-lengths’ list smaller than
 110, two elements between 110 and 119, two elements between 120 and 129,
 and so on.  There are four elements with a value of 200 or larger.