calc: Single-Variable Statistics

 
 10.7.1 Single-Variable Statistics
 ---------------------------------
 
 These functions do various statistical computations on single vectors.
 Given a numeric prefix argument, they actually pop N objects from the
 stack and combine them into a data vector.  Each object may be either a
 number or a vector; if a vector, any sub-vectors inside it are
 “flattened” as if by ‘v a 0’; see Manipulating Vectors.  By default
 one object is popped, which (in order to be useful) is usually a vector.
 
    If an argument is a variable name, and the value stored in that
 variable is a vector, then the stored vector is used.  This method has
 the advantage that if your data vector is large, you can avoid the slow
 process of manipulating it directly on the stack.
 
    These functions are left in symbolic form if any of their arguments
 are not numbers or vectors, e.g., if an argument is a formula, or a
 non-vector variable.  However, formulas embedded within vector arguments
 are accepted; the result is a symbolic representation of the
 computation, based on the assumption that the formula does not itself
 represent a vector.  All varieties of numbers such as error forms and
 interval forms are acceptable.
 
    Some of the functions in this section also accept a single error form
 or interval as an argument.  They then describe a property of the normal
 or uniform (respectively) statistical distribution described by the
 argument.  The arguments are interpreted in the same way as the M
 argument of the random number function ‘k r’.  In particular, an
 interval with integer limits is considered an integer distribution, so
 that ‘[2 .. 6)’ is the same as ‘[2 .. 5]’.  An interval with at least
 one floating-point limit is a continuous distribution: ‘[2.0 .. 6.0)’ is
 _not_ the same as ‘[2.0 .. 5.0]’!
 
    The ‘u #’ (‘calc-vector-count’) [‘vcount’] command computes the
 number of data values represented by the inputs.  For example,
 ‘vcount(1, [2, 3], [[4, 5], [], x, y])’ returns 7.  If the argument is a
 single vector with no sub-vectors, this simply computes the length of
 the vector.
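
    As an illustration only, the flattening and counting rule can be
 sketched in Python.  The function name ‘flatten’ and the use of strings
 to stand in for the variables ‘x’ and ‘y’ are assumptions of this
 sketch, not part of Calc.

```python
def flatten(*args):
    """Flatten numbers and arbitrarily nested lists into one flat list."""
    flat = []
    for arg in args:
        if isinstance(arg, list):
            # A vector: flatten each element in turn (empty vectors add nothing).
            for item in arg:
                flat.extend(flatten(item))
        else:
            # A number, or a string standing in for a symbolic variable.
            flat.append(arg)
    return flat


data = flatten(1, [2, 3], [[4, 5], [], "x", "y"])
print(data)       # [1, 2, 3, 4, 5, 'x', 'y']
print(len(data))  # 7
```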
 
    The ‘u +’ (‘calc-vector-sum’) [‘vsum’] command computes the sum of
 the data values.  The ‘u *’ (‘calc-vector-prod’) [‘vprod’] command
 computes the product of the data values.  If the input is a single flat
 vector, these are the same as ‘V R +’ and ‘V R *’ (see Reducing and
 Mapping).
 
    The ‘u X’ (‘calc-vector-max’) [‘vmax’] command computes the maximum
 of the data values, and the ‘u N’ (‘calc-vector-min’) [‘vmin’] command
 computes the minimum.  If the argument is an interval, this finds the
 minimum or maximum value in the interval.  (Note that ‘vmax([2..6)) = 5’
 as described above.)  If the argument is an error form, this returns
 plus or minus infinity.
 
    The ‘u M’ (‘calc-vector-mean’) [‘vmean’] command computes the average
 (arithmetic mean) of the data values.  If the inputs are error forms ‘x
 +/- s’, this is the weighted mean of the ‘x’ values with weights ‘1 /
 s^2’.  If the inputs are not error forms, this is simply the sum of the
 values divided by the count of the values.
 
    Note that a plain number can be considered an error form with error
 ‘s = 0’.  If the input to ‘u M’ is a mixture of plain numbers and error
 forms, the result is the mean of the plain numbers, ignoring all values
 with non-zero errors.  (By the above definitions it’s clear that a plain
 number effectively has an infinite weight, next to which an error form
 with a finite weight is completely negligible.)
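
    A minimal Python sketch of the weighted mean, assuming each error
 form ‘x +/- s’ is modeled as a tuple ‘(x, s)’ with non-zero ‘s’; the
 name ‘weighted_mean’ is hypothetical.  The second call suggests why a
 value whose error approaches zero dominates the result.

```python
def weighted_mean(points):
    """Mean of (x, s) pairs using weights 1 / s**2 (s must be non-zero)."""
    weights = [1.0 / s**2 for _, s in points]
    total = sum(w * x for w, (x, _) in zip(weights, points))
    return total / sum(weights)


# 10 +/- 1 and 20 +/- 2: the weights are 1 and 0.25.
print(weighted_mean([(10.0, 1.0), (20.0, 2.0)]))  # 12.0

# A tiny error gives a huge weight, so the result is very close to 10.
print(weighted_mean([(10.0, 1e-9), (20.0, 2.0)]))
```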
 
    This function also works for distributions (error forms or
 intervals).  The mean of an error form ‘A +/- B’ is simply ‘A’.  The
 mean of an interval is the mean of the minimum and maximum values of the
 interval.
 
    The ‘I u M’ (‘calc-vector-mean-error’) [‘vmeane’] command computes
 the mean of the data points expressed as an error form.  This includes
 the estimated error associated with the mean.  If the inputs are error
 forms, the error is the square root of the reciprocal of the sum of the
 reciprocals of the squares of the input errors.  (I.e., the variance is
 the reciprocal of the sum of the reciprocals of the variances.)  If the
 inputs are plain numbers, the error is equal to the standard deviation
 of the values divided by the square root of the number of values.  (This
 works out to be equivalent to calculating the standard deviation and
 then assuming each value’s error is equal to this standard deviation.)
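
    The two error estimates can be sketched in Python as follows.  The
 function names are hypothetical, and the sketch assumes that “standard
 deviation” here means the sample standard deviation (the ‘N-1’ form),
 which is what Python’s ‘statistics.stdev’ computes.

```python
import math
import statistics


def mean_error_from_errors(errors):
    """Error of the mean when every input carries its own error s."""
    return math.sqrt(1.0 / sum(1.0 / s**2 for s in errors))


def mean_error_from_values(values):
    """Error of the mean for plain numbers: sdev / sqrt(count)."""
    return statistics.stdev(values) / math.sqrt(len(values))


values = [1.0, 2.0, 3.0, 4.0]
sd = statistics.stdev(values)

print(mean_error_from_errors([1.0, 2.0]))
print(mean_error_from_values(values))
# Giving every value the same error sd reproduces the plain-number result.
print(mean_error_from_errors([sd] * len(values)))
```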
 
    The ‘H u M’ (‘calc-vector-median’) [‘vmedian’] command computes the
 median of the data values.  The values are first sorted into numerical
 order; the median is the middle value after sorting.  (If the number of
 data values is even, the median is taken to be the average of the two
 middle values.)  The median function is different from the other
 functions in this section in that the arguments must all be real
 numbers; variables are not accepted even when nested inside vectors.
 (Otherwise it is not possible to sort the data values.)  If any of the
 input values are error forms, their error parts are ignored.
 
    The median function also accepts distributions.  For both normal
 (error form) and uniform (interval) distributions, the median is the
 same as the mean.
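
    The sorting rule for vectors of real numbers can be sketched in
 Python; the function name ‘median’ is an assumption of this sketch,
 which does not handle error forms or distributions.

```python
def median(values):
    """Middle value after sorting; average of the two middle values if even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([3, 1, 2]))     # 2
print(median([4, 1, 3, 2]))  # 2.5
```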
 
    The ‘H I u M’ (‘calc-vector-harmonic-mean’) [‘vhmean’] command
 computes the harmonic mean of the data values.  This is defined as the
 reciprocal of the arithmetic mean of the reciprocals of the values.
 
    The ‘u G’ (‘calc-vector-geometric-mean’) [‘vgmean’] command computes
 the geometric mean of the data values.  This is the Nth root of the
 product of the values.  This is also equal to the ‘exp’ of the
 arithmetic mean of the logarithms of the data values.
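
    As a rough Python sketch of the harmonic and geometric means,
 assuming positive real data values (the function names are
 hypothetical): the two geometric-mean functions compute the same
 quantity, one via logarithms and one via the Nth root of the product.

```python
import math


def harmonic_mean(values):
    """Reciprocal of the arithmetic mean of the reciprocals."""
    return len(values) / sum(1.0 / v for v in values)


def geometric_mean_logs(values):
    """exp of the arithmetic mean of the logarithms."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def geometric_mean_root(values):
    """Nth root of the product of the values."""
    return math.prod(values) ** (1.0 / len(values))


data = [1.0, 2.0, 4.0]
print(harmonic_mean(data))        # 3 / 1.75
print(geometric_mean_logs(data))  # approximately 2
print(geometric_mean_root(data))  # approximately 2
```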
 
    The ‘H u G’ [‘agmean’] command computes the “arithmetic-geometric
 mean” of two numbers taken from the stack.  This is computed by
 replacing the two numbers with their arithmetic mean and geometric mean,
 then repeating until the two values converge.
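
    The iteration can be sketched in Python, assuming two positive real
 inputs.  The name ‘agmean’ is borrowed from the algebraic function
 above, but the tolerance and iteration limit are choices made for this
 sketch and say nothing about how Calc decides convergence.

```python
import math


def agmean(a, b, rel_tol=1e-15, max_iter=100):
    """Arithmetic-geometric mean of two positive reals."""
    for _ in range(max_iter):
        if abs(a - b) <= rel_tol * max(abs(a), abs(b)):
            break
        # Replace the pair with its arithmetic and geometric means.
        a, b = (a + b) / 2, math.sqrt(a * b)
    return (a + b) / 2


print(agmean(1.0, math.sqrt(2.0)))  # approximately 1.19814
```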
 
    The ‘u R’ (‘calc-vector-rms’) [‘rms’] command computes the RMS
 (root-mean-square) of the data values.  As its name suggests, this is
 the square root of the mean of the squares of the data values.
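
    A short Python sketch of the same definition, assuming a flat list
 of real numbers; the Python function below only shares its name with
 the Calc function for readability.

```python
import math


def rms(values):
    """Square root of the mean of the squares."""
    return math.sqrt(sum(v * v for v in values) / len(values))


print(rms([1.0, 7.0]))  # 5.0, since (1 + 49) / 2 = 25
```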
 
    The ‘u S’ (‘calc-vector-sdev’) [‘vsdev’] command computes the
 standard deviation of the data values.  If the values are error forms,
 the errors are used as weights just as for ‘u M’.  This is the _sample_
 standard deviation: the sum of the squares of the differences between
 the values and the mean of the ‘N’ values is divided by ‘N-1’, and the
 result is the square root of that quotient.
 
    This function also applies to distributions.  The standard deviation
 of a single error form is simply the error part.  The standard deviation
 of a continuous interval happens to equal the difference between the
 limits, divided by ‘sqrt(12)’.  The standard deviation of an integer
 interval is the same as the standard deviation of a vector of those
 integers.
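
    For the two interval cases, a Python sketch under stated
 assumptions: the function names are hypothetical, both limits are
 passed as closed endpoints (so ‘[2 .. 6)’ with integer limits would be
 passed as 2 and 5), and the integer case uses the sample standard
 deviation.

```python
import math
import statistics


def continuous_interval_sdev(low, high):
    """Standard deviation of a uniform distribution on [low, high]."""
    return (high - low) / math.sqrt(12)


def integer_interval_sdev(low, high):
    """Sample standard deviation of the integers low, low+1, ..., high."""
    return statistics.stdev(list(range(low, high + 1)))


print(continuous_interval_sdev(2.0, 6.0))  # 4 / sqrt(12)
print(integer_interval_sdev(2, 5))         # sdev of [2, 3, 4, 5]
```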
 
    The ‘I u S’ (‘calc-vector-pop-sdev’) [‘vpsdev’] command computes the
 _population_ standard deviation.  It is defined by the same formula as
 above but dividing by ‘N’ instead of by ‘N-1’.  The population standard
 deviation is used when the input represents the entire set of data
 values in the distribution; the sample standard deviation is used when
 the input represents a sample of the set of all data values, so that the
 mean computed from the input is itself only an estimate of the true
 mean.
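
    The difference between the two divisors can be sketched in Python
 for a flat list of plain numbers.  The function name ‘sdev’ and its
 ‘population’ flag are assumptions of this sketch; weighting by error
 forms is not modeled.

```python
import math


def sdev(values, population=False):
    """Standard deviation dividing by N (population) or N-1 (sample)."""
    n = len(values)
    mean = sum(values) / n
    squares = sum((v - mean) ** 2 for v in values)
    divisor = n if population else n - 1
    return math.sqrt(squares / divisor)


data = [2, 4, 4, 4, 5, 5, 7, 9]
print(sdev(data, population=True))  # 2.0
print(sdev(data))                   # sqrt(32 / 7), slightly larger
```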
 
    For error forms and continuous intervals, ‘vpsdev’ works exactly like
 ‘vsdev’.  For integer intervals, it computes the population standard
 deviation of the equivalent vector of integers.
 
    The ‘H u S’ (‘calc-vector-variance’) [‘vvar’] and ‘H I u S’
 (‘calc-vector-pop-variance’) [‘vpvar’] commands compute the variance of
 the data values.  The variance is the square of the standard deviation,
 i.e., the sum of the squares of the deviations of the data values from
 the mean, divided by ‘N-1’ for ‘vvar’ or by ‘N’ for ‘vpvar’.  (This
 definition also applies when the argument is a distribution.)
 
    The ‘vflat’ algebraic function returns a vector of its arguments,
 interpreted in the same way as the other functions in this section.  For
 example, ‘vflat(1, [2, [3, 4]], 5)’ returns ‘[1, 2, 3, 4, 5]’.