
 
 26.2 Basic Statistical Functions
 ================================
 
 Octave supports various helpful statistical functions.  Many are useful
 as initial steps to prepare a data set for further analysis.  Others
 provide different measures from those of the basic descriptive
 statistics.
 
  -- : center (X)
  -- : center (X, DIM)
      Center data by subtracting its mean.
 
      If X is a vector, subtract its mean.
 
      If X is a matrix, do the above for each column.
 
      If the optional argument DIM is given, operate along this
      dimension.
 
      Programming Note: ‘center’ has obvious application for normalizing
      statistical data.  It is also useful for improving the precision of
      general numerical calculations.  Whenever there is a large value
      that is common to a batch of data, the mean can be subtracted off,
      the calculation performed, and then the mean added back to obtain
      the final answer.
 
      See also: zscore.
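
      As a minimal sketch of the behavior described above (the variable
      names are illustrative only), centering a vector, the columns of a
      matrix, and the rows of a matrix might look like this:

```octave
x = [1, 2, 3];
c = center (x);        # mean is 2, so c is [-1, 0, 1]

A = [1, 2; 3, 4];
cc = center (A);       # column means are [2, 3]; cc is [-1, -1; 1, 1]
cr = center (A, 2);    # row means are [1.5; 3.5]; cr is [-0.5, 0.5; -0.5, 0.5]
```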
 
  -- : Z = zscore (X)
  -- : Z = zscore (X, OPT)
  -- : Z = zscore (X, OPT, DIM)
  -- : [Z, MU, SIGMA] = zscore (...)
      Compute the Z score of X.
 
      If X is a vector, subtract its mean and divide by its standard
      deviation.  If the standard deviation is zero, divide by 1 instead.
 
      The optional parameter OPT determines the normalization to use when
      computing the standard deviation and has the same definition as the
      corresponding parameter for ‘std’.
 
      If X is a matrix, calculate along the first non-singleton
      dimension.  If the third optional argument DIM is given, operate
      along this dimension.
 
      The optional outputs MU and SIGMA contain the mean and standard
      deviation.
 
      See also: mean, std, center.
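
      A short sketch of the three-output form, using made-up data, may
      help.  The second call illustrates the zero-deviation rule stated
      above:

```octave
x = [2, 4, 6];
[z, mu, sigma] = zscore (x);   # mu = 4, sigma = 2, z = [-1, 0, 1]

z0 = zscore ([5, 5, 5]);       # standard deviation is zero, so the
                               # centered data is divided by 1: [0, 0, 0]
```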
 
  -- : N = histc (X, EDGES)
  -- : N = histc (X, EDGES, DIM)
  -- : [N, IDX] = histc (...)
      Compute histogram counts.
 
      When X is a vector, the function counts the number of elements of X
      that fall in the histogram bins defined by EDGES.  This must be a
      vector of monotonically increasing values that define the edges of
      the histogram bins.  ‘N(k)’ contains the number of elements in X
      for which ‘EDGES(k) <= X < EDGES(k+1)’.  The final element of N
      contains the number of elements of X exactly equal to the last
      element of EDGES.
 
      When X is an N-dimensional array, the computation is carried out
      along dimension DIM.  If not specified, DIM defaults to the first
      non-singleton dimension.
 
      When a second output argument is requested an index matrix is also
      returned.  The IDX matrix has the same size as X.  Each element of
      IDX contains the index of the histogram bin in which the
      corresponding element of X was counted.
 
      See also: hist.
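
      The bin rule above can be checked with a small, made-up data set.
      In this sketch the value 4 equals the last edge and is therefore
      counted in the final element of N:

```octave
x = [0.5, 1.2, 2.7, 2.9, 4];
edges = [0, 1, 2, 3, 4];
[n, idx] = histc (x, edges);
# n   holds the counts  1 1 2 0 1
# idx holds the bins    1 2 3 3 5
```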
 
 The ‘unique’ function (see unique) is often useful for statistics.
 
  -- : C = nchoosek (N, K)
  -- : C = nchoosek (SET, K)
 
      Compute the binomial coefficient of N or list all possible
      combinations of a SET of items.
 
      If N is a scalar then calculate the binomial coefficient of N and K
      which is defined as
 
            /   \
            | n |    n (n-1) (n-2) ... (n-k+1)       n!
            |   |  = ------------------------- =  ---------
            | k |               k!                k! (n-k)!
            \   /
 
      This is the number of combinations of N items taken in groups of
      size K.
 
      If the first argument is a vector, SET, then generate all
      combinations of the elements of SET, taken K at a time, with one
      row per combination.  The result C has K columns and
      ‘nchoosek (length (SET), K)’ rows.
 
      For example:
 
      How many ways can three items be grouped into pairs?
 
           nchoosek (3, 2)
              ⇒ 3
 
      What are the possible pairs?
 
           nchoosek (1:3, 2)
              ⇒  1   2
                  1   3
                  2   3
 
      Programming Note: When calculating the binomial coefficient
      ‘nchoosek’ works only for non-negative, integer arguments.  Use
      ‘bincoeff’ for non-integer and negative scalar arguments, or for
      computing many binomial coefficients at once with vector inputs for
      N or K.
 
      See also: bincoeff, perms.
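
      As a hedged illustration of the programming note, the following
      sketch computes one row of Pascal's triangle with a vector K and
      compares one entry against the scalar form of ‘nchoosek’:

```octave
k = 0:4;
b = bincoeff (4, k);   # all coefficients for n = 4: 1 4 6 4 1
c = nchoosek (4, 2);   # scalar form; matches b(3), which is 6
```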
 
  -- : perms (V)
      Generate all permutations of V with one row per permutation.
 
      The result has ‘factorial (N)’ rows and N columns, where N is the
      length of V.
 
      Example
 
           perms ([1, 2, 3])
           ⇒
             1   2   3
             2   1   3
             1   3   2
             2   3   1
             3   1   2
             3   2   1
 
      Programming Note: The maximum length of V should be less than or
      equal to 10 to limit memory consumption.
 
      See also: permute, randperm, nchoosek.
 
  -- : ranks (X, DIM)
      Return the ranks of X along the first non-singleton dimension
      adjusted for ties.
 
      If the optional argument DIM is given, operate along this
      dimension.
 
      See also: spearman, kendall.
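
      A small sketch with made-up data shows the tie adjustment: tied
      values share the average of the ranks they would otherwise occupy.

```octave
x = [10, 20, 20, 30];
r = ranks (x);   # the two 20s share the average of ranks 2 and 3
                 # r is [1, 2.5, 2.5, 4]
```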
 
  -- : run_count (X, N)
  -- : run_count (X, N, DIM)
      Count the upward runs along the first non-singleton dimension of X
      of length 1, 2, ..., N-1 and greater than or equal to N.
 
      If the optional argument DIM is given then operate along this
      dimension.
 
      See also: runlength.
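
      For illustration, the made-up vector below contains two upward
      runs, ‘1 2 3’ and ‘1 2’.  With N = 3, the result is expected to
      report no runs of length 1, one run of length 2, and one run of
      length 3 or more:

```octave
x = [1, 2, 3, 1, 2];
c = run_count (x, 3);
# c(1): runs of length 1        -> 0
# c(2): runs of length 2        -> 1
# c(3): runs of length >= 3     -> 1
```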
 
  -- : count = runlength (X)
  -- : [count, value] = runlength (X)
      Find the lengths of all sequences of common values.
 
      COUNT is a vector with the lengths of each repeated value.
 
      The optional output VALUE contains the value that was repeated in
      the sequence.
 
           runlength ([2, 2, 0, 4, 4, 4, 0, 1, 1, 1, 1])
           ⇒  [2, 1, 3, 1, 4]
 
      See also: run_count.
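
      The two-output form can be sketched with the same data.  The last
      line is only an illustration of how COUNT and VALUE together
      describe the input; it uses ‘repelems’ to rebuild the original
      vector.

```octave
x = [2, 2, 0, 4, 4, 4, 0, 1, 1, 1, 1];
[count, value] = runlength (x);
# count is [2, 1, 3, 1, 4]
# value is [2, 0, 4, 0, 1]

# Repeat value(i) count(i) times to recover x.
y = repelems (value, [1:numel(value); count]);
```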
 
  -- : probit (P)
      Return the probit (the quantile of the standard normal
      distribution) for each element of P.
 
      See also: logit.
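
      The standard normal quantile can also be written in terms of the
      inverse error function.  The sketch below evaluates that identity
      directly rather than calling ‘probit’; it is not necessarily how
      ‘probit’ itself is implemented.

```octave
p = [0.025, 0.5, 0.975];
# Standard normal quantile via the inverse error function.
q = sqrt (2) * erfinv (2*p - 1);   # approximately [-1.96, 0, 1.96]
```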
 
  -- : logit (P)
      Compute the logit for each value of P.
 
      The logit is defined as
 
           logit (P) = log (P / (1-P))
 
      See also: probit, logistic_cdf.
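
      As a sketch, the definition above can be evaluated directly with
      element-wise operators.  The inverse shown on the last line is the
      standard logistic function; the variable names are illustrative.

```octave
p = [0.1, 0.5, 0.9];
y = log (p ./ (1 - p));          # approximately [-2.1972, 0, 2.1972]
p_back = 1 ./ (1 + exp (-y));    # logistic function recovers p
```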
 
  -- : cloglog (X)
      Return the complementary log-log function of X.
 
      The complementary log-log function is defined as
 
           cloglog (X) = - log (- log (X))
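
      A minimal sketch that evaluates this definition directly for
      values in the open interval (0, 1), together with its inverse
      (the variable names are illustrative):

```octave
x = [0.1, 0.5, 0.9];
y = -log (-log (x));          # approximately [-0.8340, 0.3665, 2.2504]
x_back = exp (-exp (-y));     # inverse transform recovers x
```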
 
  -- : [T, L_X] = table (X)
  -- : [T, L_X, L_Y] = table (X, Y)
      Create a contingency table T from data vectors.
 
      The L_X and L_Y vectors are the corresponding levels.
 
      Currently, only 1- and 2-dimensional tables are supported.
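
      To illustrate what a two-way contingency table contains, the
      following sketch builds one from ‘unique’ and ‘accumarray’ instead
      of calling ‘table’.  The layout (rows for the levels of X, columns
      for the levels of Y) is an assumption of this sketch, and the data
      is made up.

```octave
x = [1, 1, 2, 2, 2];
y = [1, 2, 1, 2, 2];

# Levels of each variable and, per observation, the index of its level.
[l_x, ~, ix] = unique (x);
[l_y, ~, iy] = unique (y);

# Count how often each (level of x, level of y) pair occurs.
t = accumarray ([ix(:), iy(:)], 1, [numel(l_x), numel(l_y)]);
# t is [1, 1; 1, 2]
```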