Covariance Matrix and Correlation Coefficient Matrix

Source: Internet
Author: User
 

Variable description:

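Let $X = (X_1, X_2, \ldots, X_n)^T$ be a group of random variables that together constitute a random vector $X$. If each random variable has $m$ samples, we obtain the sample matrix

$$
M = \begin{pmatrix}
x_{11} & x_{12} & \cdots & x_{1m} \\
x_{21} & x_{22} & \cdots & x_{2m} \\
\vdots & \vdots & \ddots & \vdots \\
x_{n1} & x_{n2} & \cdots & x_{nm}
\end{pmatrix}
\tag{1}
$$

where the $k$-th column $x_{\cdot k}$ ($k = 1, \ldots, m$) is the $k$-th sample vector of the random vector $X$, and the $i$-th row $x_{i \cdot}$ ($i = 1, \ldots, n$) is the vector consisting of all the sample values of the single random variable $X_i$.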

Covariance between single random variables:

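The covariance between two random variables $X_i$ and $X_j$ can be expressed as

$$
\operatorname{cov}(X_i, X_j) = E\big[(X_i - E(X_i))(X_j - E(X_j))\big]
\tag{2}
$$

This expectation can be estimated from the known sample values, for example:

$$
\operatorname{cov}(X_i, X_j) \approx \frac{1}{m-1} \sum_{k=1}^{m} (x_{ik} - \bar{x}_i)(x_{jk} - \bar{x}_j)
\tag{3}
$$

where $x_{ik}$ is the $k$-th sample of $X_i$ and $\bar{x}_i = \frac{1}{m} \sum_{k=1}^{m} x_{ik}$ is its sample mean. It can be further simplified:

$$
\operatorname{cov}(X_i, X_j) \approx \frac{1}{m-1} \left( \sum_{k=1}^{m} x_{ik} x_{jk} - m \, \bar{x}_i \bar{x}_j \right)
\tag{4}
$$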
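As a minimal sketch of the sample estimate and its simplified form, the following computes the covariance of two variables both ways and lets the results be compared. It assumes the unbiased $m - 1$ normalization; the function names and the data are hypothetical and chosen only for illustration.

```python
def sample_covariance(xs, ys):
    """Sample covariance as the sum of centered products divided by (m - 1)."""
    m = len(xs)
    mean_x = sum(xs) / m
    mean_y = sum(ys) / m
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (m - 1)


def sample_covariance_expanded(xs, ys):
    """Same quantity in expanded form: (sum(x*y) - m*mean_x*mean_y) / (m - 1)."""
    m = len(xs)
    mean_x = sum(xs) / m
    mean_y = sum(ys) / m
    return (sum(x * y for x, y in zip(xs, ys)) - m * mean_x * mean_y) / (m - 1)


# Hypothetical data: four samples of two variables.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 9.0]

cov_centered = sample_covariance(xs, ys)
cov_expanded = sample_covariance_expanded(xs, ys)
print(cov_centered, cov_expanded)
```

The two forms agree up to floating-point rounding. The centered form is usually the safer choice numerically when the means are large compared with the spread of the data.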

Covariance Matrix:

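$$
C = E\big[(X - E(X))(X - E(X))^T\big] \approx \frac{1}{m-1} \sum_{k=1}^{m} (x_{\cdot k} - \bar{x})(x_{\cdot k} - \bar{x})^T
\tag{5}
$$

where $x_{\cdot k}$ is the $k$-th sample vector of $X$ and $\bar{x} = \frac{1}{m} \sum_{k=1}^{m} x_{\cdot k}$ is the sample mean vector. This gives the covariance matrix expression.

Assuming that the mean of all samples is the zero vector, formula (5) becomes:

$$
C \approx \frac{1}{m-1} \sum_{k=1}^{m} x_{\cdot k} x_{\cdot k}^T = \frac{1}{m-1} M M^T
\tag{6}
$$

where $M$ is the $n \times m$ sample matrix whose columns are the sample vectors.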
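The covariance matrix estimate and its zero-mean shortcut can be sketched as follows. This is an illustration under assumptions, not a reference implementation: it uses the $m - 1$ normalization, stores the data as a list of sample vectors (one inner list per sample), and uses hypothetical function names and data.

```python
def covariance_matrix(samples):
    """Covariance matrix estimated from a list of sample vectors."""
    m = len(samples)      # number of samples
    n = len(samples[0])   # number of random variables
    mean = [sum(s[i] for s in samples) / m for i in range(n)]
    centered = [[s[i] - mean[i] for i in range(n)] for s in samples]
    return [
        [sum(c[i] * c[j] for c in centered) / (m - 1) for j in range(n)]
        for i in range(n)
    ]


def covariance_matrix_zero_mean(samples):
    """Shortcut that is valid only when the sample mean is the zero vector."""
    m = len(samples)
    n = len(samples[0])
    return [
        [sum(s[i] * s[j] for s in samples) / (m - 1) for j in range(n)]
        for i in range(n)
    ]


# Hypothetical data: three samples of a 2-dimensional vector with zero mean.
samples = [[-1.0, -2.0], [0.0, 1.0], [1.0, 1.0]]

cov_general = covariance_matrix(samples)
cov_shortcut = covariance_matrix_zero_mean(samples)
print(cov_general)
print(cov_shortcut)
```

Because the sample mean here is the zero vector, both functions return the same symmetric matrix, with the variances of the two components on the diagonal.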

Note:

1. Each element of the covariance matrix is the covariance between two components of the random vector X, not the covariance between different samples. For example, the element C_ij is the covariance of the random variables X_i and X_j.

2. Covariance is a second-order statistical feature between variables. If the correlation between the different components of a random vector is very small, the resulting covariance matrix is approximately diagonal. In some applications, to reduce the dimension of the random vector, principal component analysis can be used to transform the variables so that their covariance matrix becomes diagonal. Components with small energy can then be discarded (the diagonal elements are the variances, that is, the AC energy). This matters especially in pattern recognition, where a pattern-vector dimension that is too high harms the generalization performance of the recognition system.

3. Note that formulas (5) and (6) only give an estimate of the true covariance matrix of the random vector. The estimate is computed from the sampled values and changes with them, so it depends on the number of samples. The more samples there are and the more widely they cover the population, the more reliable the estimated covariance matrix is.

4. Just as the correlation coefficient is derived from the covariance, a correlation coefficient matrix is sometimes introduced to show more intuitively how strongly the different components of a random vector are correlated.
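The step from a covariance matrix to a correlation coefficient matrix mentioned in note 4 can be sketched as below. Each entry C_ij is divided by the product of the standard deviations of components i and j, which are the square roots of the diagonal entries. The function name and the example matrix are hypothetical.

```python
import math


def correlation_matrix(cov):
    """Convert a covariance matrix into a correlation coefficient matrix."""
    n = len(cov)
    # Standard deviations are the square roots of the diagonal variances.
    std = [math.sqrt(cov[i][i]) for i in range(n)]
    return [
        [cov[i][j] / (std[i] * std[j]) for j in range(n)]
        for i in range(n)
    ]


# Hypothetical covariance matrix: variances 4 and 9, covariance 3.
cov = [[4.0, 3.0], [3.0, 9.0]]
corr = correlation_matrix(cov)
print(corr)
```

The diagonal of the result is 1 and the off-diagonal entries lie between -1 and 1, so components measured on very different scales become directly comparable. The sketch assumes every variance is strictly positive.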


In probability theory and statistics, correlation (also called the correlation coefficient) indicates the strength and direction of the linear relationship between two random variables. In statistics, correlation measures how far two variables are from being independent of each other. Under this broad definition, many correlation coefficients have been defined for different characteristics of the data.

Different coefficients suit different kinds of data. The Pearson product-moment correlation coefficient is the most commonly used. It is defined as the covariance of the two variables divided by the product of their standard deviations.

Pearson product-moment correlation coefficient
Mathematical properties

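$$
\rho_{X,Y} = \frac{\operatorname{cov}(X, Y)}{\sigma_X \sigma_Y} = \frac{E\big[(X - \mu_X)(Y - \mu_Y)\big]}{\sigma_X \sigma_Y}
$$

where $E$ is the mathematical expectation and $\operatorname{cov}$ denotes the covariance.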

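Because $\mu_X = E(X)$ and $\sigma_X^2 = E(X^2) - E^2(X)$, and likewise for $Y$, the correlation coefficient can be written as

$$
\rho_{X,Y} = \frac{E(XY) - E(X)E(Y)}{\sqrt{E(X^2) - E^2(X)} \, \sqrt{E(Y^2) - E^2(Y)}}
$$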
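A small sketch can confirm that the definition (covariance over the product of standard deviations) and the moment form (built from E(XY), E(X), E(Y), E(X^2), E(Y^2)) give the same value. Expectations are replaced here by plain sample averages, which is an assumption of this illustration; the function names and data are hypothetical.

```python
import math


def pearson_from_definition(xs, ys):
    """cov(X, Y) / (sigma_X * sigma_Y), with expectations taken as sample averages."""
    m = len(xs)
    mu_x = sum(xs) / m
    mu_y = sum(ys) / m
    cov = sum((x - mu_x) * (y - mu_y) for x, y in zip(xs, ys)) / m
    sigma_x = math.sqrt(sum((x - mu_x) ** 2 for x in xs) / m)
    sigma_y = math.sqrt(sum((y - mu_y) ** 2 for y in ys) / m)
    return cov / (sigma_x * sigma_y)


def pearson_from_moments(xs, ys):
    """(E(XY) - E(X)E(Y)) / (sqrt(E(X^2) - E^2(X)) * sqrt(E(Y^2) - E^2(Y)))."""
    m = len(xs)
    e_x = sum(xs) / m
    e_y = sum(ys) / m
    e_xy = sum(x * y for x, y in zip(xs, ys)) / m
    e_x2 = sum(x * x for x in xs) / m
    e_y2 = sum(y * y for y in ys) / m
    return (e_xy - e_x * e_y) / (
        math.sqrt(e_x2 - e_x ** 2) * math.sqrt(e_y2 - e_y ** 2)
    )


# Hypothetical paired observations.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 1.0, 4.0, 3.0, 5.0]

r_definition = pearson_from_definition(xs, ys)
r_moments = pearson_from_moments(xs, ys)
print(r_definition, r_moments)
```

Both functions fail with a division by zero when either variable is constant, which matches the requirement that neither standard deviation be zero.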

The correlation coefficient is defined only when neither variable has a standard deviation of zero. By the Cauchy-Schwarz inequality, its absolute value does not exceed 1. As the linear relationship between the two variables strengthens, the coefficient tends to 1 or -1: it is greater than 0 when one variable increases as the other increases, and smaller than 0 when one variable increases as the other decreases. When two variables are independent, the correlation coefficient is 0, but the converse does not hold, because the coefficient only reflects the linear dependence between two variables. For example, let X be uniformly distributed on an interval symmetric about zero, such as [-1, 1], and let Y = X^2. Then Y is completely determined by X, so Y and X are not independent, yet their correlation coefficient is 0; that is, they are uncorrelated. When Y and X follow a joint normal distribution, independence and uncorrelatedness are equivalent.
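The behavior described above can be checked with a short sketch. It uses a deterministic grid of points symmetric about zero as a stand-in for a uniform variable on a symmetric interval, which is an assumption made so that the result is reproducible; the helper name pearson is hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    m = len(xs)
    mean_x = sum(xs) / m
    mean_y = sum(ys) / m
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Points symmetric about zero, standing in for a uniform X on [-1, 1].
xs = [-1.0, -0.5, 0.0, 0.5, 1.0]

r_increasing = pearson(xs, [2.0 * x + 1.0 for x in xs])  # exact increasing line
r_decreasing = pearson(xs, [-x for x in xs])             # exact decreasing line
r_square = pearson(xs, [x * x for x in xs])              # Y = X^2, fully dependent

print(r_increasing, r_decreasing, r_square)
```

The two exact linear relationships give coefficients at the extremes 1 and -1, while Y = X^2 gives a coefficient of zero even though Y is completely determined by X.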

When one or both variables contain measurement error, their observed correlation is weakened. In this case, the disattenuated correlation is a more accurate coefficient.
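One common form of this correction, usually attributed to Spearman, divides the observed correlation by the square root of the product of the two measurement reliabilities. The sketch below assumes that form. The reliabilities are inputs that the text above does not discuss, and the function name and numbers are hypothetical.

```python
import math


def disattenuated_correlation(r_observed, reliability_x, reliability_y):
    """Correct an observed correlation for attenuation due to measurement error.

    Assumes Spearman's correction: r_observed / sqrt(rel_x * rel_y),
    where each reliability lies in (0, 1].
    """
    if not (0.0 < reliability_x <= 1.0 and 0.0 < reliability_y <= 1.0):
        raise ValueError("reliabilities must lie in (0, 1]")
    return r_observed / math.sqrt(reliability_x * reliability_y)


# Hypothetical numbers: observed correlation 0.36, reliabilities 0.81 and 0.64.
r_corrected = disattenuated_correlation(0.36, 0.81, 0.64)
print(r_corrected)
```

With perfectly reliable measurements (both reliabilities equal to 1) the correction leaves the observed correlation unchanged. With noisy measurements the corrected value is larger in magnitude than the observed one.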

