Covariance and correlation coefficients

Source: Internet
Author: User

Covariance

For a two-dimensional random variable (X, Y), the covariance between X and Y is defined as:

Cov(X, Y) = E{[X - E(X)][Y - E(Y)]}

Where: E(X) is the expectation of component X, and E(Y) is the expectation of component Y.

The covariance Cov(X, Y) is a numerical characteristic that describes how two random variables vary together. By definition, it is the mathematical expectation of the product of the deviation of X, X - E(X), and the deviation of Y, Y - E(Y). Because each deviation can be positive or negative, the covariance can also be positive or negative.

• When Cov(X, Y) > 0, X and Y are said to be positively correlated.

• When Cov(X, Y) < 0, X and Y are said to be negatively correlated.

• When Cov(X, Y) = 0, X and Y are said to be uncorrelated.
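
As a minimal sketch of the definition, the Python below averages the products of deviations over a finite sample, treating every observation as equally likely (so it divides by n, not n - 1). The function names `mean`, `covariance` and `describe`, the tolerance, and the toy data are illustrative assumptions, not part of the original article.

```python
def mean(values):
    """Arithmetic mean, used here as the expectation E(.) of a finite sample."""
    return sum(values) / len(values)


def covariance(xs, ys):
    """Cov(X, Y) = E{[X - E(X)][Y - E(Y)]}, dividing by n (population form)."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and of equal length")
    ex, ey = mean(xs), mean(ys)
    return mean([(x - ex) * (y - ey) for x, y in zip(xs, ys)])


def describe(cov, tol=1e-12):
    """Map the sign of a covariance to the wording used in the text."""
    if cov > tol:
        return "positively correlated"
    if cov < -tol:
        return "negatively correlated"
    return "uncorrelated"


# Toy data (hypothetical): y rises with x in the first pair, falls in the second.
print(describe(covariance([1, 2, 3, 4], [2, 4, 5, 9])))   # positively correlated
print(describe(covariance([1, 2, 3, 4], [9, 5, 4, 2])))   # negatively correlated
```

The small tolerance in `describe` is a design choice to keep floating-point round-off from turning an exact zero into a spurious sign.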

As an example, take the two-dimensional random variable (height X, weight Y) (self-compiled data):

No.   Height X (cm)   Weight Y (500 g)   X-E(X)   Y-E(Y)   [X-E(X)][Y-E(Y)]
1     152             92                 -19.4    -39.7     770.18
2     185             162                 13.6     30.3     412.08
3     169             125                 -2.4     -6.7      16.08
4     172             118                  0.6    -13.7      -8.22
5     174             122                  2.6     -9.7     -25.22
6     168             135                 -3.4      3.3     -11.22
7     180             168                  8.6     36.3     312.18

E(X) = 171.4

E(Y) = 131.7

Cov(X, Y) = E{[X-E(X)][Y-E(Y)]} = 209.4

Intuition also tells us that height and weight are positively correlated: taller people generally weigh more, and heavier people are generally taller. The calculated result agrees with this intuition.
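
The height and weight calculation can be reproduced with a short script. This is only a sketch: the variable names are illustrative, and it divides by n as the worked example does. Since it keeps the means unrounded, the per-row products may differ slightly from hand-computed values that round the means to one decimal first.

```python
heights = [152, 185, 169, 172, 174, 168, 180]   # X, in cm
weights = [92, 162, 125, 118, 122, 135, 168]    # Y, in units of 500 g

n = len(heights)
mean_x = sum(heights) / n
mean_y = sum(weights) / n

# One row per person: deviation of X, deviation of Y, and their product.
rows = [(x - mean_x, y - mean_y, (x - mean_x) * (y - mean_y))
        for x, y in zip(heights, weights)]

for i, (dx, dy, prod) in enumerate(rows, start=1):
    print(f"{i}: {dx:7.1f} {dy:7.1f} {prod:9.2f}")

# The covariance is the mean of the last column (dividing by n).
cov_xy = sum(prod for _, _, prod in rows) / n
print(f"E(X) = {mean_x:.1f}, E(Y) = {mean_y:.1f}, Cov(X, Y) = {cov_xy:.1f}")
```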

As a contrasting example, take the two-dimensional random variable (daily gaming time X, academic score Y) (self-compiled data):

No.   Gaming time X (h/day)   Academic score Y   X-E(X)   Y-E(Y)   [X-E(X)][Y-E(Y)]
1     0                       95                 -1.36     20.7    -28.152
2     1                       65                 -0.36     -9.3      3.348
3     3                       70                  1.64     -4.3     -7.052
4     2                       55                  0.64    -19.3    -12.352
5     2.5                     65                  1.14     -9.3    -10.602
6     0.5                     80                 -0.86      5.7     -4.902
7     0.5                     90                 -0.86     15.7    -13.502

E(X) = 1.36

E(Y) = 74.3

Cov(X, Y) = E{[X-E(X)][Y-E(Y)]} = -10.5

Intuition likewise suggests that the longer children play games, the worse their academic performance is likely to be, and the calculated result agrees with this as well.

From the two scatter plots above, we can see how weight changes with height and how academic performance changes with gaming time. Covariance can therefore be described as a measure of whether two random variables tend to change in the same direction.

However, covariance supports only qualitative analysis, not quantitative analysis. For example, the covariance between height and weight is 209.4, but covariance gives no standard for judging how strong that correlation actually is, because its value depends on the units of X and Y. This leads to the concept of the correlation coefficient.

Correlation coefficient

The correlation coefficient is defined as:

Corr(X, Y) = Cov(X, Y) / (sqrt(Var(X)) * sqrt(Var(Y)))

Where: Var(X) is the variance of X, and Var(Y) is the variance of Y.

By the Cauchy-Schwarz inequality, -1 ≤ Corr(X, Y) ≤ 1, so the correlation between two random variables can be analyzed quantitatively.

• Corr(X, Y) = 1 indicates that the two random variables are perfectly positively correlated, i.e. they satisfy Y = aX + b with a > 0.

Consider Corr(X, X): the two random variables are identical, so they certainly satisfy a linear relationship. In this case Cov(X, X) = Var(X), and it is easy to see that Corr(X, X) = 1.

• Corr(X, Y) = -1 indicates that the two random variables are perfectly negatively correlated, i.e. they satisfy Y = -aX + b with a > 0.

• 0 < |Corr(X, Y)| < 1 indicates that the two random variables have some degree of linear relationship.
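
The cases above can be checked numerically. The following is a sketch under stated assumptions: the helper names `mean`, `covariance` and `correlation` and the sample values are hypothetical, and all moments divide by n. It builds the variances from Cov(X, X) = Var(X), then evaluates the coefficient for an exactly increasing line, an exactly decreasing line, X against itself, and a roughly linear sample.

```python
import math


def mean(values):
    return sum(values) / len(values)


def covariance(xs, ys):
    ex, ey = mean(xs), mean(ys)
    return mean([(x - ex) * (y - ey) for x, y in zip(xs, ys)])


def correlation(xs, ys):
    """Corr(X, Y) = Cov(X, Y) / (sqrt(Var(X)) * sqrt(Var(Y)))."""
    var_x = covariance(xs, xs)   # Cov(X, X) = Var(X)
    var_y = covariance(ys, ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return covariance(xs, ys) / math.sqrt(var_x * var_y)


xs = [1.0, 2.0, 4.0, 7.0]                      # hypothetical sample
rising = [3 * x + 2 for x in xs]               # y = ax + b with a > 0
falling = [-3 * x + 2 for x in xs]             # y = -ax + b with a > 0
noisy = [5.0, 7.0, 16.0, 20.0]                 # roughly, not exactly, linear

print(correlation(xs, rising))    # close to 1
print(correlation(xs, falling))   # close to -1
print(correlation(xs, xs))        # close to 1, since Cov(X, X) = Var(X)
print(correlation(xs, noisy))     # strictly between 0 and 1
```

Floating-point round-off means the first three results may differ from exactly 1 or -1 in the last digits.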

For example, in the previous two examples,

Height and weight: Corr(X, Y) = 209.4/(9.7*24.3) = 0.89

Gaming time and academic score: Corr(X, Y) = -10.5/(1.1*13.4) = -0.71

With the correlation coefficient, we can say that the linear correlation between height and weight is stronger (larger in absolute value) than the linear correlation between gaming time and academic performance.
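
The comparison can be recomputed directly from the two data sets. This is a sketch rather than a definitive implementation: the function name `correlation` and the variable names are illustrative, and population (divide-by-n) moments are assumed. Because it works from unrounded means and standard deviations, its results can differ somewhat from figures obtained by hand with rounded intermediate values.

```python
import math


def correlation(xs, ys):
    """Pearson correlation, using population (divide-by-n) moments."""
    n = len(xs)
    ex, ey = sum(xs) / n, sum(ys) / n
    cov = sum((x - ex) * (y - ey) for x, y in zip(xs, ys)) / n
    sd_x = math.sqrt(sum((x - ex) ** 2 for x in xs) / n)
    sd_y = math.sqrt(sum((y - ey) ** 2 for y in ys) / n)
    return cov / (sd_x * sd_y)


heights = [152, 185, 169, 172, 174, 168, 180]      # cm
weights = [92, 162, 125, 118, 122, 135, 168]       # units of 500 g
game_hours = [0, 1, 3, 2, 2.5, 0.5, 0.5]           # hours per day
scores = [95, 65, 70, 55, 65, 80, 90]

r_hw = correlation(heights, weights)
r_gs = correlation(game_hours, scores)
print(f"height vs weight:    {r_hw:.2f}")
print(f"game time vs score:  {r_gs:.2f}")

# Compare the strength of linear association by absolute value.
print(abs(r_hw) > abs(r_gs))
```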

Additional notes:

Corr(X, Y) = 0 is described as X and Y being uncorrelated. "Uncorrelated" here means only that there is no linear relationship between X and Y; it does not mean that there is no relationship at all, since a nonlinear relationship may still exist. So it may be more appropriate to read "correlation" as "linear correlation".
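
A small hypothetical data set illustrates this point. In the sketch below (values chosen for illustration, not taken from the article), Y = X^2 is completely determined by X, yet the covariance, and therefore the correlation coefficient, is zero, because the relationship is symmetric rather than linear.

```python
xs = [-2, -1, 0, 1, 2]
ys = [x ** 2 for x in xs]   # Y is fully determined by X

n = len(xs)
ex, ey = sum(xs) / n, sum(ys) / n
cov = sum((x - ex) * (y - ey) for x, y in zip(xs, ys)) / n
print(cov)   # 0.0
```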
