Spearman rank correlation coefficient and Pearson correlation coefficient

Source: Internet
Author: User

1. Pearson correlation coefficient

Pearson's correlation coefficient, also known as the Pearson product-moment correlation coefficient, measures the strength of the linear relationship between two variables. It can also be read as a measure of the similarity of two vectors.

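Pearson's correlation coefficient is calculated as follows:

$$\rho_{X,Y} = \frac{\operatorname{cov}(X, Y)}{\sigma_X \sigma_Y} = \frac{E[(X - \mu_X)(Y - \mu_Y)]}{\sigma_X \sigma_Y}$$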

The numerator is the covariance of the two variables, and the denominator is the product of their standard deviations. The coefficient is therefore defined only when neither standard deviation is zero.
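The definition above can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: the function name `pearson` and the sample data are assumptions made for the example, and the population (divide by n) form of the covariance and standard deviation is used.

```python
import math

def pearson(x, y):
    """Pearson correlation: covariance divided by the product of standard deviations."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Population covariance and standard deviations (divide by n).
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / n)
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / n)
    if std_x == 0 or std_y == 0:
        raise ValueError("undefined when a standard deviation is zero")
    return cov / (std_x * std_y)

# Hypothetical data: y rises (or falls) exactly linearly with x.
r_up = pearson([1, 2, 3, 4], [2, 4, 6, 8])
r_down = pearson([1, 2, 3, 4], [8, 6, 4, 2])
print(r_up, r_down)
```

The two printed values should be close to 1 and -1 respectively, up to floating-point rounding. A constant vector raises `ValueError`, which mirrors the requirement that neither standard deviation be zero.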

As the linear relationship between two variables becomes stronger, the correlation coefficient approaches 1 (positive correlation) or -1 (negative correlation). When two variables are independent, the correlation coefficient is 0. The converse does not hold. For example, let X be uniformly distributed on [-1, 1] and take Y = X^2. Then E(XY) = E(X^3) = 0 and E(X) = 0, so the covariance E(XY) - E(X)E(Y) is 0 and the correlation coefficient is 0, yet X and Y are clearly not independent. So "uncorrelated" and "independent" are two different things. When X and Y follow a joint normal distribution, being uncorrelated and being independent are equivalent.
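The gap between "uncorrelated" and "independent" can be checked numerically. The sketch below assumes the standard textbook pairing Y = X^2 and replaces the continuous uniform distribution with an evenly spaced grid on [-1, 1]; both choices, and all variable names, are assumptions made for illustration.

```python
# Evenly spaced values on [-1, 1], a discrete stand-in for the uniform distribution.
xs = [i / 100 for i in range(-100, 101)]
ys = [x * x for x in xs]  # assumed relationship: Y is completely determined by X

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Covariance is (numerically) zero because the grid is symmetric about 0.
cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n

# Dependence check: the average of Y changes with the size of |X|.
outer = [y for x, y in zip(xs, ys) if abs(x) > 0.5]
inner = [y for x, y in zip(xs, ys) if abs(x) <= 0.5]
mean_outer = sum(outer) / len(outer)
mean_inner = sum(inner) / len(inner)

print(cov_xy, mean_outer, mean_inner)
```

The covariance comes out at zero up to rounding, while the mean of Y is visibly larger when |X| is large, so knowing X clearly tells us something about Y.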

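For centered data (the sample mean is subtracted from each value, so the mean after centering is 0), E(X) = E(Y) = 0, and the coefficient reduces to:

$$\rho_{X,Y} = \frac{E[XY]}{\sqrt{E[X^2]}\,\sqrt{E[Y^2]}} = \frac{\sum_i x_i y_i}{\sqrt{\sum_i x_i^2}\,\sqrt{\sum_i y_i^2}} = \frac{x \cdot y}{\|x\|\,\|y\|} = \cos\theta$$

That is, the correlation coefficient can be regarded as the cosine of the angle θ between the two centered data vectors.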

After further normalizing the x and y vectors to unit length, ||x|| = ||y|| = 1, the correlation coefficient is simply the inner product of the two vectors.
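The geometric reading can be verified with a short sketch: center both vectors, take the cosine of the angle between them, then normalize to unit length and take the plain inner product. The helper names (`center`, `cosine`, `unit`) and the sample data are hypothetical, chosen only for this illustration.

```python
import math

def center(v):
    """Subtract the mean so the vector averages to zero."""
    m = sum(v) / len(v)
    return [a - m for a in v]

def cosine(u, v):
    """Cosine of the angle between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def unit(v):
    """Scale a vector to length 1."""
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v]

x = [1.0, 2.0, 4.0, 7.0]
y = [2.0, 1.0, 5.0, 6.0]

xc, yc = center(x), center(y)
r_cos = cosine(xc, yc)                      # cosine of the centered vectors

xu, yu = unit(xc), unit(yc)
r_dot = sum(a * b for a, b in zip(xu, yu))  # inner product of unit vectors

print(r_cos, r_dot)
```

Both numbers agree up to rounding. For this data the centered inner product is 17 and the squared norms are 21 and 17, so the value is 17 / sqrt(357).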

2. Spearman rank correlation coefficient

The Pearson linear correlation coefficient has two limitations:

(1) Both variables are assumed to follow a normal distribution.

(2) The values must be equally spaced, that is, measured on an interval scale.

For more general cases there are other solutions, and the Spearman rank correlation coefficient is one of them. It is a non-parametric (distribution-free) measure of the strength of the association between two variables. In the absence of duplicate values, if one variable is a strictly monotonic function of the other, the Spearman rank correlation coefficient is +1 or -1, and the variables are said to be perfectly Spearman-correlated. Note that this differs from perfect Pearson correlation: the Pearson coefficient is +1 or -1 only when the two variables are linearly related.
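The contrast between the two coefficients shows up on a strictly increasing but nonlinear relationship. The sketch below assumes y = exp(x) as the monotone function and computes the Spearman coefficient as the Pearson coefficient of the ranks, which is a standard equivalent formulation when there are no ties. The function names `pearson` and `ranks` are hypothetical helpers for this example.

```python
import math

def pearson(x, y):
    """Pearson correlation computed from centered sums."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x)) * math.sqrt(sum((b - my) ** 2 for b in y))
    return num / den

def ranks(values):
    """Rank 1 goes to the largest value (no ties assumed)."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    result = [0] * len(values)
    for position, index in enumerate(order, start=1):
        result[index] = position
    return result

x = [1, 2, 3, 4, 5]
y = [math.exp(v) for v in x]  # strictly increasing, but far from linear

r_pearson = pearson(x, y)
r_spearman = pearson(ranks(x), ranks(y))
print(r_pearson, r_spearman)
```

The Spearman value is 1 up to rounding because the ranks of x and y coincide, while the Pearson value stays noticeably below 1 because the relationship is not a straight line.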

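Sort the original data x_i and y_i from largest to smallest, and let x'_i and y'_i be the positions of x_i and y_i in their sorted lists. x'_i and y'_i are called the ranks of x_i and y_i, and the rank difference is d_i = x'_i - y'_i. With n the number of data pairs and no tied values, the Spearman rank correlation coefficient is:

$$\rho_s = 1 - \frac{6 \sum_{i=1}^{n} d_i^2}{n(n^2 - 1)}$$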
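The ranking procedure can be sketched as follows, using the usual no-ties formula 1 - 6 Σ d_i² / (n(n² - 1)), where d_i is the rank difference and n the number of pairs. The function names `ranks_desc` and `spearman` and the sample data are assumptions for illustration; tied values are rejected rather than handled, since the simple formula does not cover them.

```python
def ranks_desc(values):
    """Position of each value when sorted from largest to smallest (rank 1 = largest)."""
    if len(set(values)) != len(values):
        raise ValueError("tied values are not handled by this simple formula")
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    result = [0] * len(values)
    for position, index in enumerate(order, start=1):
        result[index] = position
    return result

def spearman(x, y):
    """Spearman rank correlation from rank differences (no ties)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = ranks_desc(x), ranks_desc(y)
    d_squared = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d_squared / (n * (n * n - 1))

# Hypothetical data: y mostly follows x, with one swapped pair.
x = [10, 20, 30, 40, 50]
y = [12, 25, 21, 48, 60]
rho = spearman(x, y)
print(ranks_desc(x), ranks_desc(y), rho)
```

Here the ranks are [5, 4, 3, 2, 1] and [5, 3, 4, 2, 1], the squared rank differences sum to 2, and the coefficient is 1 - 12/120 = 0.9.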
