Covariance: meaning and formulas
Anyone who has studied probability and statistics knows that the most basic concepts in statistics are the sample mean, the variance, and the standard deviation. Take a sample set containing n samples $X_1, \dots, X_n$. High school math should already cover the definitions of these concepts:
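Mean: $\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$
Standard deviation: $s = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1}}$
Variance: $s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1}$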
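The three formulas translate almost line for line into code. The following is a minimal Python sketch using only the standard library. The helper names `sample_mean`, `sample_variance` and `sample_std` are hypothetical, chosen for illustration. The variance divides by n-1, as in the formulas above.

```python
import math


def sample_mean(xs):
    """Mean: the sum of the samples divided by n."""
    return sum(xs) / len(xs)


def sample_variance(xs):
    """Variance with the n-1 denominator used in the formulas above."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two samples")
    m = sample_mean(xs)
    return sum((x - m) ** 2 for x in xs) / (n - 1)


def sample_std(xs):
    """Standard deviation: the square root of the variance."""
    return math.sqrt(sample_variance(xs))
```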
The mean describes the middle point of the sample set, but on its own it tells us very little.
The standard deviation describes, roughly, the average distance from each point of the sample set to the mean. Take the two sets [0, 8, 12, 20] and [8, 9, 11, 12]. Both have a mean of 10, yet they are obviously very different. Their standard deviations are 8.3 and 1.8 respectively. The second set is more concentrated, so its standard deviation is smaller. This "degree of dispersion" is exactly what the standard deviation describes.
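The two example sets can be checked with Python's standard `statistics` module, whose `stdev` divides by n-1. This short sketch confirms that the means agree while the standard deviations round to 8.3 and 1.8.

```python
import statistics

wide = [0, 8, 12, 20]
narrow = [8, 9, 11, 12]

# Both sets share the same mean ...
assert statistics.mean(wide) == statistics.mean(narrow) == 10

# ... but the standard deviation (n-1 denominator) tells them apart.
wide_sd = statistics.stdev(wide)      # sqrt(208 / 3)
narrow_sd = statistics.stdev(narrow)  # sqrt(10 / 3)
print(f"wide: {wide_sd:.1f}, narrow: {narrow_sd:.1f}")  # wide: 8.3, narrow: 1.8
```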
We divide by n-1 rather than n because this lets a small sample better approximate the population. With n-1, the sample variance is what statistics calls an "unbiased estimate" of the population variance.
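The unbiasedness claim can be checked exactly, without random simulation. The sketch below uses a tiny hypothetical population {0, 1, 2} and enumerates every equally likely sample of size 2 drawn with replacement. It averages the sample variance over all of them using exact fractions. Dividing by n-1 recovers the population variance on average, while dividing by n falls short. Note that this property holds for the variance; the square root (the standard deviation) is not exactly unbiased.

```python
from fractions import Fraction
from itertools import product

# Hypothetical tiny population; its variance uses the population size as divisor.
population = [0, 1, 2]
mu = Fraction(sum(population), len(population))
pop_var = sum((Fraction(x) - mu) ** 2 for x in population) / len(population)


def var_with_divisor(sample, ddof):
    """Sum of squared deviations divided by (n - ddof)."""
    n = len(sample)
    m = Fraction(sum(sample), n)
    return sum((Fraction(x) - m) ** 2 for x in sample) / (n - ddof)


n = 2
samples = list(product(population, repeat=n))  # all 9 equally likely ordered samples
avg_unbiased = sum(var_with_divisor(s, 1) for s in samples) / len(samples)  # divide by n-1
avg_biased = sum(var_with_divisor(s, 0) for s in samples) / len(samples)    # divide by n

print(pop_var, avg_unbiased, avg_biased)  # 2/3 2/3 1/3
```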
And the variance is the square of the standard deviation.
Why do we need covariance?
The statistics above seem to describe data fairly completely. Note, however, that the standard deviation and variance describe one-dimensional data. In real life we often meet data sets with several dimensions. The simplest example is the exam scores in several subjects that everyone has to tally at school. For such a data set we can, of course, compute the variance of each dimension independently. Usually, though, we also want to know more, for example whether how sleazy a boy is has anything to do with how popular he is with girls, hehe~ Covariance is a statistic that measures the relationship between two random variables. We can imitate the definition of variance:
$$\operatorname{var}(X) = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(X_i - \bar{X})}{n-1}$$
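Variance measures how far a single dimension deviates from its own mean. Following the same pattern, the covariance of two dimensions X and Y is defined as:
$$\operatorname{cov}(X, Y) = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{n-1}$$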
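As a sketch, the covariance formula in plain Python pairs each X deviation with the matching Y deviation and divides by n-1. The function name `covariance` and the paired data are hypothetical, for illustration only.

```python
def covariance(xs, ys):
    """Sample covariance of two equally long sequences (n-1 denominator)."""
    n = len(xs)
    if n != len(ys):
        raise ValueError("xs and ys must have the same length")
    if n < 2:
        raise ValueError("need at least two paired samples")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


# Hypothetical paired data in which y tends to rise with x.
x = [1, 2, 3, 4]
y = [52, 60, 68, 76]
print(covariance(x, y))  # positive: the two move together
```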
What does the result of the covariance mean? A positive result means the two are positively correlated (the "correlation coefficient" is derived from covariance). In other words, the sleazier a boy is, the more popular he is with girls, hehe. Of course~ A negative result means negative correlation: the sleazier he is, the more girls dislike him. Could that be? A result of 0 means the two are uncorrelated, that is, there is no linear relationship between them. Statistical independence implies zero covariance, but zero covariance alone does not imply independence.
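The distinction between "uncorrelated" and "independent" shows up in a small sketch. Below, y is computed directly from x (y = x²), so the two are clearly not independent, yet their covariance is exactly 0 because the relationship is not linear. The hypothetical `correlation` helper divides the covariance by both standard deviations, one common way of deriving the correlation coefficient from covariance.

```python
import math


def covariance(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)


def correlation(xs, ys):
    """Pearson correlation: covariance divided by both standard deviations."""
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))


# ys is completely determined by xs, so they are NOT independent ...
xs = [-1, 0, 1]
ys = [x * x for x in xs]
# ... yet the covariance is zero, because the dependence is not linear.
print(covariance(xs, ys))  # 0.0
```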
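The definition of covariance also reveals some obvious properties, such as $\operatorname{cov}(X, X) = \operatorname{var}(X)$ and $\operatorname{cov}(X, Y) = \operatorname{cov}(Y, X)$.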
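The two properties cov(X, X) = var(X) and cov(X, Y) = cov(Y, X) can be spot-checked numerically. This is only a sanity check on hypothetical data, not a proof.

```python
def covariance(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)


def variance(xs):
    n = len(xs)
    m = sum(xs) / n
    return sum((x - m) ** 2 for x in xs) / (n - 1)


# Hypothetical data for the spot check.
x = [3, 7, 1, 9, 5]
y = [2, 8, 3, 7, 4]

# Property 1: the covariance of a variable with itself is its variance.
assert abs(covariance(x, x) - variance(x)) < 1e-12
# Property 2: covariance is symmetric in its two arguments.
assert abs(covariance(x, y) - covariance(y, x)) < 1e-12
```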
From covariance to the covariance matrix
The sleaziness-versus-popularity problem from the previous section is a typical two-dimensional problem, and a single covariance handles only two dimensions at a time. With more dimensions we naturally need several covariances. An n-dimensional data set requires $\frac{n!}{(n-2)! \cdot 2} = \frac{n(n-1)}{2}$ covariances. It is natural to organize them in a matrix. The covariance matrix is defined as:
$$C_{n \times n} = (c_{i,j}), \quad c_{i,j} = \operatorname{cov}(\mathrm{Dim}_i, \mathrm{Dim}_j)$$
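The count of distinct covariances, n!/((n-2)!·2) = n(n-1)/2, is just the number of unordered pairs of dimensions. A quick Python check compares the closed form with an explicit enumeration. The helper name `distinct_covariances` is hypothetical.

```python
from itertools import combinations


def distinct_covariances(n):
    """Number of distinct pairs of dimensions: n! / ((n-2)! * 2) = n(n-1)/2."""
    return n * (n - 1) // 2


# Cross-check the closed form against an explicit enumeration of pairs.
for n in range(2, 8):
    assert distinct_covariances(n) == len(list(combinations(range(n), 2)))

print(distinct_covariances(3))  # 3: (1,2), (1,3), (2,3)
```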
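This definition is easy to understand. As a simple example, if the data set has three dimensions x, y and z, the covariance matrix is:
$$C = \begin{pmatrix} \operatorname{cov}(x,x) & \operatorname{cov}(x,y) & \operatorname{cov}(x,z) \\ \operatorname{cov}(y,x) & \operatorname{cov}(y,y) & \operatorname{cov}(y,z) \\ \operatorname{cov}(z,x) & \operatorname{cov}(z,y) & \operatorname{cov}(z,z) \end{pmatrix}$$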
As can be seen, the covariance matrix is symmetric, and its diagonal holds the variance of each dimension.
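The following sketch builds the matrix entry by entry from the definition c(i,j) = cov(Dim_i, Dim_j) and checks both observations: the matrix is symmetric, and its diagonal holds the per-dimension variances. It assumes rows are samples and columns are dimensions. The 5×3 data set is hypothetical.

```python
def covariance(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)


def covariance_matrix(samples):
    """C[i][j] = cov(dim_i, dim_j); rows of `samples` are samples, columns are dimensions."""
    dims = [list(col) for col in zip(*samples)]  # regroup rows into columns
    return [[covariance(a, b) for b in dims] for a in dims]


# Hypothetical 5 x 3 sample set (5 samples, dimensions x, y, z).
sample = [
    [10, 20, 5],
    [12, 18, 9],
    [8, 25, 4],
    [15, 15, 10],
    [5, 30, 2],
]
C = covariance_matrix(sample)

# Symmetric, and the diagonal is the variance of each dimension.
assert all(C[i][j] == C[j][i] for i in range(3) for j in range(3))
```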
Covariance in practice with MATLAB
One point must always be kept in mind: the covariance matrix computes the covariance between different dimensions, not between different samples. I will illustrate this with the example below. The demonstration uses MATLAB. To show how the calculation works, it does not call MATLAB's cov function directly.
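First, generate a random 10×3 integer matrix as the sample set: 10 samples (rows), each with 3 dimensions (columns).

    MySample = fix(rand(10,3)*50)   % integers from 0 to 49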
According to the formula, we first need the means. Should they be computed by row or by column? This question bothered me for a long time. As stressed above, the covariance matrix computes the covariance between different dimensions; keep this in mind at all times. Each row of the sample matrix is a sample and each column is a dimension, so the means must be computed by column. For convenience, we first assign the three dimensions to separate variables:

    dim1 = MySample(:,1);
    dim2 = MySample(:,2);
    dim3 = MySample(:,3);
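A Python counterpart of this step follows. Because the MATLAB sample is random, a fixed, hypothetical 5×3 list of rows stands in for `MySample`. `zip(*rows)` regroups the rows into columns, and the means are taken per column, as MATLAB's `mean(MySample)` does for a matrix.

```python
# Hypothetical fixed 5 x 3 sample set standing in for the random MySample:
# each row is one sample, each column is one dimension.
my_sample = [
    [10, 20, 5],
    [12, 18, 9],
    [8, 25, 4],
    [15, 15, 10],
    [5, 30, 2],
]

# Counterpart of dim1 = MySample(:,1); dim2 = MySample(:,2); dim3 = MySample(:,3);
dim1, dim2, dim3 = (list(col) for col in zip(*my_sample))

# Means are taken per column (per dimension), not per row.
column_means = [sum(col) / len(col) for col in (dim1, dim2, dim3)]
print(column_means)  # [10.0, 21.6, 6.0]
```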
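Next, compute the covariances of dim1 with dim2, dim1 with dim3, and dim2 with dim3. The values in the comments come from the author's random sample, so a rerun will give different numbers: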
    sum((dim1-mean(dim1)).*(dim2-mean(dim2)))/(size(MySample,1)-1)   % got 74.5333
    sum((dim1-mean(dim1)).*(dim3-mean(dim3)))/(size(MySample,1)-1)   % got -10.0889
    sum((dim2-mean(dim2)).*(dim3-mean(dim3)))/(size(MySample,1)-1)   % got -106.4000
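The same three off-diagonal covariances in Python follow the MATLAB expression `sum((a-mean(a)).*(b-mean(b)))/(size(MySample,1)-1)` element by element. The fixed 5×3 sample is hypothetical, so the results differ from the author's random run.

```python
# Hypothetical fixed sample (rows are samples, columns are dimensions).
my_sample = [
    [10, 20, 5],
    [12, 18, 9],
    [8, 25, 4],
    [15, 15, 10],
    [5, 30, 2],
]
dim1, dim2, dim3 = (list(col) for col in zip(*my_sample))
n_samples = len(my_sample)  # counterpart of size(MySample, 1)


def mean(values):
    return sum(values) / len(values)


def cov(a, b):
    """sum((a - mean(a)) .* (b - mean(b))) / (n - 1), written element by element."""
    mean_a, mean_b = mean(a), mean(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (n_samples - 1)


c12 = cov(dim1, dim2)
c13 = cov(dim1, dim3)
c23 = cov(dim2, dim3)
print(c12, c13, c23)
```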
Once this is clear, the rest is much easier. The diagonal of the covariance matrix holds the variance of each dimension, which we compute in turn:
    std(dim1)^2   % got 108.3222
    std(dim2)^2   % got 260.6222
    std(dim3)^2   % got 94.1778
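In Python, `statistics.variance` gives the diagonal entries directly. Like MATLAB's default `std(x)^2`, it uses the n-1 denominator. The sketch also confirms that squaring `statistics.stdev` gives the same values. The sample is the same hypothetical fixed matrix as before.

```python
import statistics

# Hypothetical fixed sample (rows are samples, columns are dimensions).
my_sample = [
    [10, 20, 5],
    [12, 18, 9],
    [8, 25, 4],
    [15, 15, 10],
    [5, 30, 2],
]
dim1, dim2, dim3 = (list(col) for col in zip(*my_sample))

# Variance of each dimension (n-1 denominator): the diagonal of the covariance matrix.
variances = [statistics.variance(d) for d in (dim1, dim2, dim3)]

# Squaring the sample standard deviation gives the same numbers, up to rounding.
for d, v in zip((dim1, dim2, dim3), variances):
    assert abs(statistics.stdev(d) ** 2 - v) < 1e-9
print(variances)
```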
We now have every value needed for the covariance matrix. To verify them, call MATLAB's built-in cov function:

    cov(MySample)

Put each value we computed into its position in the result: isn't it exactly the same?
Update: Today I suddenly discovered that the covariance matrix can also be computed another way. First center the sample matrix, that is, subtract from each dimension its own mean so that every dimension has mean 0. Then multiply the transpose of the centered matrix by the matrix itself ($X^\top X$) and divide by (n-1). This method actually follows directly from the formula above. It is not very intuitive, but it is used often in abstract derivations!
The MATLAB implementation is given as well:
    X = MySample - repmat(mean(MySample), 10, 1);   % center the sample matrix so each dimension has mean 0
    C = (X'*X)./(size(X,1)-1);
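A plain-Python sketch of the centering method uses nested lists instead of a matrix library. The helpers `transpose` and `matmul` are hypothetical and written out for self-containment. Because rows are samples, $X^\top X$ is a 3×3 matrix with one row and one column per dimension. The entries match the pairwise results computed earlier for the same hypothetical sample.

```python
# Hypothetical fixed sample (rows are samples, columns are dimensions).
my_sample = [
    [10, 20, 5],
    [12, 18, 9],
    [8, 25, 4],
    [15, 15, 10],
    [5, 30, 2],
]
n = len(my_sample)


def transpose(m):
    return [list(row) for row in zip(*m)]


def matmul(a, b):
    """Plain nested-list matrix product a @ b."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


# Center: subtract each column's mean (counterpart of MySample - repmat(mean(MySample), 10, 1)).
means = [sum(col) / n for col in zip(*my_sample)]
X = [[v - m for v, m in zip(row, means)] for row in my_sample]

# C = X' * X / (n - 1)
C = [[v / (n - 1) for v in row] for row in matmul(transpose(X), X)]
```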
Summary
The key to understanding the covariance matrix is to remember that it computes the covariance between different dimensions, not between different samples. Given a sample matrix, the first thing to determine is whether a row is a sample or a dimension. Once that is clear in your mind, the whole calculation flows smoothly, and you will not get confused~
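The row-versus-column pitfall is easy to see in a sketch. Feeding the columns of a 5×3 sample matrix as dimensions gives the intended 3×3 matrix. Treating each row as a dimension instead produces a 5×5 matrix that compares samples with each other. In MATLAB, the analogous slip would be passing `MySample'` instead of `MySample` to cov, since cov treats rows as observations. The data and the helper name `covariance_matrix` are hypothetical.

```python
# Hypothetical fixed sample (rows are samples, columns are dimensions).
my_sample = [
    [10, 20, 5],
    [12, 18, 9],
    [8, 25, 4],
    [15, 15, 10],
    [5, 30, 2],
]


def covariance_matrix(dimensions):
    """Covariance matrix for a list of dimension vectors (one list per dimension)."""
    def cov(a, b):
        n = len(a)
        ma, mb = sum(a) / n, sum(b) / n
        return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (n - 1)

    return [[cov(a, b) for b in dimensions] for a in dimensions]


# Correct: rows are samples, so the dimensions are the columns -> 3 x 3.
right = covariance_matrix([list(col) for col in zip(*my_sample)])
# Mistake: treating each row as a dimension compares samples, not dimensions -> 5 x 5.
wrong = covariance_matrix(my_sample)

print(len(right), len(wrong))  # 3 5
```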