On covariance matrix

Last Update:2018-08-20 Source: Internet

Author: User

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

first, the basic concept of statistics

The most basic concept in statistics is the mean, variance and standard deviation of the sample. First, we give a set of n samples, and the formula for these concepts is described below:

Mean value:

Standard deviation:

Variance:

The mean value describes the middle point of the sample set, which tells us that the information is finite, and the standard deviation describes the average of the distance from each sample point of the sample set to the mean value.

Take these two sets for example, [0, 8, 12, 20] and [8, 9, 11, 12], two sets of the mean is 10, but obviously two sets of the difference is very large, calculate the standard deviation, the former is 8.3, the latter is 1.8, it is obvious that the latter is more concentrated, so its standard deviation is described in this "Dispersion degree". This is divided by n-1 rather than n because it allows us to better approximate the overall standard deviation with a smaller set of samples, known as the "unbiased estimate" statistically. And the variance is just the square of the standard deviation.

Second, why the need for covariance

Standard deviation and variance are generally used to describe one-dimensional data, but in real life we often encounter data sets containing multidimensional data, the simplest is that everyone in school will inevitably have to count a number of subjects in the exam results. In the face of such a dataset, we can of course compute its variance independently of each dimension, but usually we want to know more about the extent to which a boy's degree of indecency is related to the popularity of a girl. Covariance is such a statistic used to measure the relationship of two random variables that we can emulate the definition of variance:

To measure the degree to which each dimension deviates from its mean, the covariance can be defined as:

What is the point of covariance results? If the result is positive, the two are positively correlated (the definition of "correlation coefficient" can be drawn from covariance), which means that the more wretched a person is, the more popular The girl is. If the result is negative, it means that the two are negatively correlated, the more wretched the girl the more annoying. If 0, then there is no relationship between the two, the wretched not wretched and girls like there is no association between, is the statistical said "mutual independence."

We can also see some obvious properties from the definition of covariance, such as:

third, covariance matrix

The wretched and popular questions mentioned above are typical two-dimensional problems, and covariance can only deal with two-dimensional problems, the number of dimensions of the natural need to calculate multiple covariance, such as n-dimensional data sets need to compute covariance, that naturally we think of using matrices to organize the data. The definition of covariance matrix is given:

This definition is very easy to understand, we can give a three-dimensional example, assuming that the dataset has three dimensions, the covariance matrix is:

Thus, the covariance matrix is a symmetric matrix, and the diagonal is the variance of each dimension.

four, Matlab covariance actual combat

It is important to be clear that the covariance matrix calculates the covariance between different dimensions, not between the different samples. The following demo will use MATLAB, in order to illustrate the calculation principle, not directly invoke the MATLAB cov function:

First, a 10*3-dimensional integer matrix is randomly generated as a sample set, 10 is the number of samples, and 3 is the dimension of the sample.

Figure 1 using MATLAB to generate a sample set

According to the formula, the calculation of covariance needs to calculate the mean value, in particular, the covariance matrix is to calculate the covariance between different dimensions, always keep this in mind. Each row of the sample matrix is a sample, each column is a dimension, so we have to calculate the mean value by column. To describe the convenience, we first assign the data of three dimensions to each:

Figure 2 assigns the data of three dimensions to each value

Calculates the covariance of dim1 and dim2,dim1 with DIM3,DIM2 and dim3:

Figure 3 calculates three covariance

The elements on the diagonal of the covariance matrix are the variance of each dimension, and we calculate these variances in turn:

Figure 4 calculates the variance on the diagonal

In this way, we get all the data needed to compute the covariance matrix, and can call the cov function of Matlab to get the covariance matrix directly:

Fig. 5 The covariance matrix of the sample is directly computed using the COV function of MATLAB

The result of the calculation is exactly the same as when the previous data has been filled in with the matrix.

v. Summary

The key to understanding the covariance matrix is to keep in mind that its calculations are covariance between different dimensions rather than between different samples. To get a sample matrix, the first thing to be clear is whether a row is a sample or a dimension, the mind clear the entire calculation process will be downstream, so it will not be confused.

Original address:

http://pinkyjie.com/2010/08/31/covariance/

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More

On covariance matrix

Contact Us

What's Trending

Top 10 Tags

Top 10 Keywords

A Free Trial That Lets You Build Big!

Sales Support

After-Sales Support

On covariance matrix

Contact Us

What's Trending

Top 10 Tags

Top 10 Keywords

Trending Topic

A Free Trial That Lets You Build Big!

Sales Support

After-Sales Support