Covariance and correlation coefficients
Covariance
For a two-dimensional random variable (X, Y), the covariance between X and Y is defined as:
Cov(X, Y) = E{[X - E(X)][Y - E(Y)]}
where E(X) is the expectation of component X and E(Y) is the expectation of component Y.
The covariance Cov(X, Y) is a numerical characteristic that describes the degree of correlation between two random variables. By definition, it is the mathematical expectation of the product of the deviation of X, X - E(X), and the deviation of Y, Y - E(Y). Because each deviation can be positive or negative, the covariance can also be positive or negative:
- When Cov(X, Y) > 0, X and Y are said to be positively correlated.
- When Cov(X, Y) < 0, X and Y are said to be negatively correlated.
- When Cov(X, Y) = 0, X and Y are said to be uncorrelated.
As an example, consider the two-dimensional random variable (height X, weight Y). The data is made up by the author.
| No. | Height X (cm) | Weight Y (units of 500 g) | X - E(X) | Y - E(Y) | [X - E(X)][Y - E(Y)] |
|---|---|---|---|---|---|
| 1 | 152 | 92 | -19.4 | -39.7 | 770.18 |
| 2 | 185 | 162 | 13.6 | 30.3 | 412.08 |
| 3 | 169 | 125 | -2.4 | -6.7 | 16.08 |
| 4 | 172 | 118 | 0.6 | -13.7 | -8.22 |
| 5 | 174 | 122 | 2.6 | -9.7 | -25.22 |
| 6 | 168 | 135 | -3.4 | 3.3 | -11.22 |
| 7 | 180 | 168 | 8.6 | 36.3 | 312.18 |
| | E(X) = 171.4 | E(Y) = 131.7 | | | E{[X - E(X)][Y - E(Y)]} = 209.4 |
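The table can be reproduced with a few lines of Python. This is only a minimal sketch: the helper names `mean` and `covariance` are chosen here for illustration, and the expectation is taken as the plain average over the seven rows (dividing by n), which is what the table does.

```python
# Height (cm) and weight (units of 500 g), copied from the table.
heights = [152, 185, 169, 172, 174, 168, 180]
weights = [92, 162, 125, 118, 122, 135, 168]


def mean(values):
    """Arithmetic mean, used as the expectation E(.) of equally weighted rows."""
    return sum(values) / len(values)


def covariance(xs, ys):
    """Cov(X, Y) = E{[X - E(X)][Y - E(Y)]}, dividing by n as in the table."""
    mean_x, mean_y = mean(xs), mean(ys)
    products = [(x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)]
    return mean(products)


cov_hw = covariance(heights, weights)
print(f"E(X) = {mean(heights):.1f}, E(Y) = {mean(weights):.1f}")
print(f"Cov(X, Y) = {cov_hw:.1f}")
```

The printed values should agree with the last row of the table. The individual products can differ in the last digit, because the table rounds the means to one decimal place before subtracting.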
This matches intuition: we expect height and weight to be positively correlated, since taller people generally weigh more and heavier people are generally taller. The calculated result agrees with this intuition.
Now an example in the opposite direction.
Two-dimensional random variable (game time X, academic score Y). The data is again made up by the author.
| No. | Game time X (h/day) | Academic score Y | X - E(X) | Y - E(Y) | [X - E(X)][Y - E(Y)] |
|---|---|---|---|---|---|
| 1 | 0 | 95 | -1.36 | 20.7 | -28.152 |
| 2 | 1 | 65 | -0.36 | -9.3 | 3.348 |
| 3 | 3 | 70 | 1.64 | -4.3 | -7.052 |
| 4 | 2 | 55 | 0.64 | -19.3 | -12.352 |
| 5 | 2.5 | 65 | 1.14 | -9.3 | -10.602 |
| 6 | 0.5 | 80 | -0.86 | 5.7 | -4.902 |
| 7 | 0.5 | 90 | -0.86 | 15.7 | -13.502 |
| | E(X) = 1.36 | E(Y) = 74.3 | | | E{[X - E(X)][Y - E(Y)]} = -10.5 |
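As a cross-check on this table, the covariance can also be computed with the standard identity Cov(X, Y) = E(XY) - E(X)E(Y), which is not used in the text above but follows from expanding the definition. The sketch below computes both forms on the game-time data; the variable names are illustrative.

```python
# Game time (h/day) and academic score, copied from the table.
game_hours = [0, 1, 3, 2, 2.5, 0.5, 0.5]
scores = [95, 65, 70, 55, 65, 80, 90]


def mean(values):
    """Arithmetic mean, used as the expectation E(.) of equally weighted rows."""
    return sum(values) / len(values)


mean_x, mean_y = mean(game_hours), mean(scores)

# Definition: expectation of the product of the two deviations.
cov_definition = mean(
    [(x - mean_x) * (y - mean_y) for x, y in zip(game_hours, scores)]
)

# Shortcut: E(XY) - E(X)E(Y), obtained by expanding the product above.
cov_shortcut = mean([x * y for x, y in zip(game_hours, scores)]) - mean_x * mean_y

print(f"definition: {cov_definition:.2f}")
print(f"shortcut:   {cov_shortcut:.2f}")
```

Both lines should print the same negative value, matching the last row of the table after rounding.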
Intuition again suggests that the longer a child plays games, the worse the academic performance is likely to be, and the calculated result agrees with this as well.
The scatter plots of the two data sets show how weight changes with height and how academic score changes with game time. Covariance can therefore be described as a measure of whether two random variables tend to change in the same direction.
However, covariance only supports qualitative analysis, not quantitative analysis. For example, the covariance between height and weight is 209.4, but how strong is that correlation? Covariance alone gives no quantitative criterion for judging. This motivates the concept of the correlation coefficient.
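One reason the raw number is hard to interpret is that covariance depends on the units of measurement. The sketch below reuses the height and weight data from the first example and, as an illustration that is not part of the original example, expresses height in metres instead of centimetres.

```python
# Height (cm) and weight (units of 500 g) from the first example.
heights_cm = [152, 185, 169, 172, 174, 168, 180]
weights = [92, 162, 125, 118, 122, 135, 168]


def mean(values):
    return sum(values) / len(values)


def covariance(xs, ys):
    """Cov(X, Y) = E{[X - E(X)][Y - E(Y)]}, dividing by n."""
    mean_x, mean_y = mean(xs), mean(ys)
    return mean([(x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)])


# Same people, same relationship, only the unit of height changes.
heights_m = [h / 100 for h in heights_cm]

cov_cm = covariance(heights_cm, weights)
cov_m = covariance(heights_m, weights)
print(f"height in cm: Cov = {cov_cm:.3f}")
print(f"height in m:  Cov = {cov_m:.3f}")
```

The covariance shrinks by a factor of 100 although nothing about the relationship has changed, so its magnitude alone cannot say how strong the correlation is.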
Correlation coefficient
The correlation coefficient is defined as:
Corr(X, Y) = Cov(X, Y) / sqrt(Var(X) Var(Y))
where Var(X) is the variance of X and Var(Y) is the variance of Y.
By the Cauchy-Schwarz inequality, -1 ≤ Corr(X, Y) ≤ 1, so the correlation between two random variables can be analyzed quantitatively:
- Corr(X, Y) = 1: the two random variables are perfectly positively correlated, i.e. they satisfy Y = aX + b with a > 0. For instance, in Corr(X, X) the two variables are identical and certainly satisfy a linear relationship. Since Cov(X, X) = Var(X), it follows that Corr(X, X) = 1.
- Corr(X, Y) = -1: the two random variables are perfectly negatively correlated, i.e. they satisfy Y = -aX + b with a > 0.
- 0 < |Corr(X, Y)| < 1: the two random variables have some degree of linear relationship, and the closer |Corr(X, Y)| is to 1, the stronger it is.
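The two extreme cases can be checked numerically. The following is a minimal sketch: the sample values and the slope a = 2 are arbitrary choices for illustration, and `correlation` is a helper written here directly from the definition, with Var(X) computed as Cov(X, X).

```python
import math


def mean(values):
    return sum(values) / len(values)


def covariance(xs, ys):
    """Cov(X, Y) = E{[X - E(X)][Y - E(Y)]}, dividing by n."""
    mean_x, mean_y = mean(xs), mean(ys)
    return mean([(x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)])


def correlation(xs, ys):
    """Corr(X, Y) = Cov(X, Y) / sqrt(Var(X) Var(Y)), with Var(X) = Cov(X, X)."""
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))


xs = [1.0, 2.0, 3.0, 4.0, 5.0]        # illustrative values, not from the article
ys_up = [2 * x + 1 for x in xs]       # Y = aX + b with a = 2 > 0
ys_down = [-2 * x + 1 for x in xs]    # Y = -aX + b with a = 2 > 0

print("Corr(X, aX + b)  =", round(correlation(xs, ys_up), 6))
print("Corr(X, -aX + b) =", round(correlation(xs, ys_down), 6))
print("Corr(X, X)       =", round(correlation(xs, xs), 6))
```

Up to floating-point rounding, the three results should be 1, -1 and 1, regardless of the particular a > 0 and b chosen.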
For the two examples above, dividing each covariance by the product of the two standard deviations, sqrt(Var(X)) and sqrt(Var(Y)), gives:
Height and weight: Corr(X, Y) = 209.4 / (9.7 * 24.3) ≈ 0.89
Game time and academic score: Corr(X, Y) = -10.5 / (1.06 * 13.5) ≈ -0.73
Comparing absolute values, the correlation coefficient lets us say that the linear correlation between height and weight is stronger than the linear correlation between game time and academic score.
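Both coefficients can be recomputed from the raw data. The sketch below assumes the population convention (dividing by n) for variance and covariance, consistent with the tables. Because it works without intermediate rounding, its results may differ in the second decimal place from a hand calculation with rounded standard deviations.

```python
import math

# Data copied from the two example tables.
heights = [152, 185, 169, 172, 174, 168, 180]
weights = [92, 162, 125, 118, 122, 135, 168]
game_hours = [0, 1, 3, 2, 2.5, 0.5, 0.5]
scores = [95, 65, 70, 55, 65, 80, 90]


def mean(values):
    return sum(values) / len(values)


def covariance(xs, ys):
    """Cov(X, Y) = E{[X - E(X)][Y - E(Y)]}, dividing by n."""
    mean_x, mean_y = mean(xs), mean(ys)
    return mean([(x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)])


def correlation(xs, ys):
    """Corr(X, Y) = Cov(X, Y) / sqrt(Var(X) Var(Y)), with Var(X) = Cov(X, X)."""
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))


corr_hw = correlation(heights, weights)
corr_gs = correlation(game_hours, scores)
print(f"height vs weight:   {corr_hw:.2f}")
print(f"game time vs score: {corr_gs:.2f}")
print("height/weight is the stronger linear relation:", abs(corr_hw) > abs(corr_gs))
```

The comparison uses absolute values, since the sign only indicates the direction of the relationship.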
Additional notes:
Corr(X, Y) = 0 means that X and Y are uncorrelated. "Uncorrelated" here refers only to the absence of a linear relationship between X and Y; it does not mean that there is no relationship between them at all. It is therefore more accurate to read "correlation" as "linear correlation".
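A small numerical illustration of this point, using data invented here for the purpose: if X takes values symmetric around zero and Y = X², then Y is completely determined by X, yet the correlation coefficient is zero. The helper functions are written directly from the definitions.

```python
import math


def mean(values):
    return sum(values) / len(values)


def covariance(xs, ys):
    """Cov(X, Y) = E{[X - E(X)][Y - E(Y)]}, dividing by n."""
    mean_x, mean_y = mean(xs), mean(ys)
    return mean([(x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)])


def correlation(xs, ys):
    """Corr(X, Y) = Cov(X, Y) / sqrt(Var(X) Var(Y)), with Var(X) = Cov(X, X)."""
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))


xs = [-2, -1, 0, 1, 2]            # symmetric around zero (illustrative data)
ys = [x ** 2 for x in xs]         # Y is fully determined by X, but not linearly

corr_xy = correlation(xs, ys)
print("Corr(X, X^2) =", round(corr_xy, 6))
```

The positive and negative deviations of X cancel exactly, so the coefficient comes out as zero even though Y depends entirely on X.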