octave: Correlation and Regression Analysis

 
 26.4 Correlation and Regression Analysis
 ========================================
 
  -- : cov (X)
  -- : cov (X, OPT)
  -- : cov (X, Y)
  -- : cov (X, Y, OPT)
      Compute the covariance matrix.
 
      If each row of X and Y is an observation, and each column is a
      variable, then the (I, J)-th entry of ‘cov (X, Y)’ is the
      covariance between the I-th variable in X and the J-th variable in
      Y.
 
           cov (X, Y) = 1/(N-1) * SUM_i (X(i) - mean(X)) * (Y(i) - mean(Y))
 
      where N is the length of the X and Y vectors.
 
      If called with one argument, compute ‘cov (X, X)’, the covariance
      between the columns of X.
 
      The argument OPT determines the type of normalization to use.
      Valid values are
 
      0:
           normalize with N-1; this provides the best unbiased estimator
           of the covariance [default]
 
      1:
           normalize with N; this provides the second moment around the
           mean
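
      As an illustration of the two normalizations, the following
      minimal sketch compares ‘cov’ against the sum formula computed by
      hand.  The data matrix is hypothetical, and only the one-matrix
      forms ‘cov (X)’ and ‘cov (X, OPT)’ are used.

```octave
% Hypothetical data: 4 observations (rows) of 2 variables (columns).
X = [1 2; 2 4; 3 7; 4 7];
N = size (X, 1);

% Center each column, then form the matrix of summed cross-products.
Xc = X - mean (X);        % mean (X) is a row of column means (broadcast)
S  = Xc' * Xc;

C0 = cov (X);             % OPT = 0 (default): normalize with N-1
C1 = cov (X, 1);          % OPT = 1: normalize with N

% Largest deviation from the hand-computed matrices; both should be
% zero up to rounding error.
err0 = max (abs (C0(:) - S(:) / (N - 1)));
err1 = max (abs (C1(:) - S(:) / N));
printf ("err0 = %g, err1 = %g\n", err0, err1);
```

      The two results differ only by the constant factor ‘(N-1) / N’.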
 
      Compatibility Note: Octave always treats rows of X and Y as
      multivariate random variables.  For two inputs, however, MATLAB
      treats X and Y as two univariate distributions regardless of their
      shapes, and calculates ‘cov ([X(:), Y(:)])’ whenever X and Y have
      the same number of elements.  This results in a 2x2 matrix.  Code
      relying on MATLAB’s definition will need to be changed when
      running in Octave.
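
      One way to obtain the 2x2 result described in the note without
      depending on how the two-argument form is interpreted is to build
      the two-column matrix explicitly.  This is only a sketch; the
      vectors are hypothetical.

```octave
% Hypothetical data: two samples with the same number of elements but
% different shapes (a row vector and a column vector).
x = [1 2 3 4];
y = [2 4 7 7]';

% Stack both samples as the columns of one matrix.  The one-argument
% form then returns the covariance between those two columns.
C = cov ([x(:), y(:)]);

% C(1,1) and C(2,2) are the variances of x and y; C(1,2) equals C(2,1)
% and is their covariance.
disp (C);
```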
 
      See also: corr.
 
  -- : corr (X)
  -- : corr (X, Y)
      Compute matrix of correlation coefficients.
 
      If each row of X and Y is an observation and each column is a
      variable, then the (I, J)-th entry of ‘corr (X, Y)’ is the
      correlation between the I-th variable in X and the J-th variable in
      Y.
 
           corr (X,Y) = cov (X,Y) / (std (X) * std (Y))
 
      If called with one argument, compute ‘corr (X, X)’, the correlation
      between the columns of X.
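
      The relation between ‘corr’, the covariance, and ‘std’ can be
      checked with a small sketch.  The data are hypothetical, and the
      covariance is computed by hand with the N-1 normalization so that
      the example does not depend on the two-argument form of ‘cov’.

```octave
% Hypothetical data: 4 observations of two variables.
x = [1 2 3 4]';
y = [2 4 7 7]';

% Sample covariance of x and y (normalized with N-1).
c_xy = sum ((x - mean (x)) .* (y - mean (y))) / (numel (x) - 1);

% Correlation from the formula, and from corr itself.
r_manual = c_xy / (std (x) * std (y));
r        = corr (x, y);

% With one matrix argument, corr returns the correlations between the
% columns; the diagonal entries are 1.
R = corr ([x, y]);

printf ("r = %.6f, r_manual = %.6f\n", r, r_manual);
```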
 
      See also: cov.
 
  -- : spearman (X)
  -- : spearman (X, Y)
      Compute Spearman’s rank correlation coefficient RHO.
 
      For two data vectors X and Y, Spearman’s RHO is the correlation
      coefficient of the ranks of X and Y.
 
      If X and Y are drawn from independent distributions, RHO has zero
      mean and variance ‘1 / (N - 1)’, where N is the length of the X and
      Y vectors, and is asymptotically normally distributed.
 
      ‘spearman (X)’ is equivalent to ‘spearman (X, X)’.
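
      The definition in terms of ranks can be illustrated with a short
      sketch that compares ‘spearman’ with the ordinary correlation of
      the ranks.  The data vectors are hypothetical and contain no ties.

```octave
% Hypothetical data: two column vectors with distinct entries.
x = [0.3 1.2 2.5 7.0 9.1]';
y = [1.1 0.4 8.0 3.3 20.0]';

% Spearman's RHO as returned by the function.
rho = spearman (x, y);

% The same quantity as the correlation coefficient of the ranks.
rho_ranks = corr (ranks (x), ranks (y));

printf ("rho = %.4f, rho_ranks = %.4f\n", rho, rho_ranks);
```

      Because only the ranks enter the computation, replacing X by a
      strictly increasing function of X leaves RHO unchanged.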
 
      See also: ranks, kendall.
 
  -- : kendall (X)
  -- : kendall (X, Y)
      Compute Kendall’s TAU.
 
      For two data vectors X, Y of common length N, Kendall’s TAU is the
      correlation of the signs of all rank differences of X and Y; i.e.,
      if both X and Y have distinct entries, then
 
                    1
           TAU = -------   SUM sign (Q(i) - Q(j)) * sign (R(i) - R(j))
                 N (N-1)   i,j
 
      in which the Q(i) and R(i) are the ranks of X and Y, respectively.
 
      If X and Y are drawn from independent distributions, Kendall’s TAU
      is asymptotically normal with mean 0 and variance
      ‘(2 * (2*N + 5)) / (9 * N * (N - 1))’.
 
      ‘kendall (X)’ is equivalent to ‘kendall (X, X)’.
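
      The sum formula for TAU can be evaluated directly and compared
      with ‘kendall’.  This is a minimal sketch with hypothetical data
      containing no ties; it builds the full N-by-N matrix of sign
      products, which is only practical for small N.

```octave
% Hypothetical data: two column vectors with distinct entries.
x = [0.3 1.2 2.5 7.0 9.1]';
y = [1.1 0.4 8.0 3.3 20.0]';
n = numel (x);

% Ranks of x and y (the Q(i) and R(i) of the formula).
q = ranks (x);
r = ranks (y);

% Entry (i,j) is sign (Q(i) - Q(j)) * sign (R(i) - R(j)).
s = sign (q - q.') .* sign (r - r.');

% Sum over all i,j and normalize by N (N-1).
tau_manual = sum (s(:)) / (n * (n - 1));

tau = kendall (x, y);
printf ("tau = %.4f, tau_manual = %.4f\n", tau, tau_manual);
```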
 
      See also: ranks, spearman.
 
  -- : [THETA, BETA, DEV, DL, D2L, P] = logistic_regression (Y, X, PRINT,
           THETA, BETA)
      Perform ordinal logistic regression.
 
      Suppose Y takes values in K ordered categories, and let ‘gamma_i
      (X)’ be the cumulative probability that Y falls in one of the first
      I categories given the covariate X.  Then
 
           [theta, beta] = logistic_regression (y, x)
 
      fits the model
 
           logit (gamma_i (x)) = theta_i - beta' * x,   i = 1 ... k-1
 
      The number of ordinal categories, K, is taken to be the number of
      distinct values of ‘round (Y)’.  If K equals 2, Y is binary and the
      model is ordinary logistic regression.  The matrix X is assumed to
      have full column rank.
 
      Given Y only, ‘theta = logistic_regression (y)’ fits the model with
      baseline logit odds only.
 
      The full form is
 
           [theta, beta, dev, dl, d2l, p]
              = logistic_regression (y, x, print, theta, beta)
 
      in which all output arguments and all input arguments except Y are
      optional.
 
      Setting PRINT to 1 displays summary information about the fitted
      model.  Setting PRINT to 2 displays convergence information at
      each iteration.  Any other value suppresses output.  The input
      arguments THETA and BETA supply initial estimates of the
      corresponding parameters.
 
      The returned value DEV holds minus twice the log-likelihood.
 
      The returned values DL and D2L are the vector of first and the
      matrix of second derivatives of the log-likelihood with respect to
      THETA and BETA.
 
      P holds estimates for the conditional distribution of Y given X.
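
      To make the model concrete, the following sketch evaluates the
      cumulative probabilities ‘gamma_i (x)’ for given parameters and
      turns them into per-category probabilities and a deviance
      contribution.  It does not call ‘logistic_regression’; the
      parameter values, the covariate vector, the observed category,
      and the names ‘p_cat’, ‘y_obs’, and ‘dev_one’ are all hypothetical.
      The layout of the returned P is not specified above, so ‘p_cat’
      is not claimed to match it.

```octave
% Hypothetical parameters for K = 3 ordered categories, 2 covariates.
theta = [-1; 1];          % K-1 cut points, in increasing order
beta  = [0.5; -0.25];     % one coefficient per covariate
x     = [2; 4];           % covariate vector of a single observation

% Model: logit (gamma_i (x)) = theta_i - beta' * x, i = 1 ... K-1.
eta   = theta - beta' * x;
gamma = 1 ./ (1 + exp (-eta));     % inverse logit

% Probability of each category: successive differences of the
% cumulative probabilities, padded with 0 below and 1 above.
p_cat = diff ([0; gamma; 1]);

% Contribution of one observation in category y_obs to minus twice
% the log-likelihood.
y_obs   = 2;
dev_one = -2 * log (p_cat(y_obs));

printf ("p_cat = [%s], dev_one = %.4f\n", num2str (p_cat', " %.4f"), dev_one);
```

      Increasing cut points keep ‘gamma’ increasing in I, so every
      entry of ‘p_cat’ is positive and the entries sum to 1.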