The Pearson correlation coefficient, also known as the Pearson product-moment correlation coefficient, is a statistic that measures the linear correlation between two variables X and Y. Its value lies between -1 and 1. The sample correlation coefficient is denoted r, where n is the sample size and x_i, y_i and x̄, ȳ are the observed values and the means of the two variables, respectively. r describes the degree of linear correlation between the two variables: the greater the absolute value of r, the stronger the correlation. The strength of the correlation can be graded by the following ranges of r:
Correlation | Negative | Positive |
None | -0.09 to 0.0 | 0.0 to 0.09 |
Weak | -0.3 to -0.1 | 0.1 to 0.3 |
Medium | -0.5 to -0.3 | 0.3 to 0.5 |
Strong | -1.0 to -0.5 | 0.5 to 1.0 |
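As a rough illustration, the grading ranges above can be turned into a small lookup in R. This is only a sketch: `correlation_strength` is a hypothetical helper name, and because the listed ranges share endpoints (0.3 and 0.5) and leave a gap between 0.09 and 0.1, the cut points 0.1, 0.3 and 0.5 with inclusive lower bounds are an assumption.

```r
# Hypothetical helper: grade |r| using the ranges listed above.
# Assumption: cut points 0.1, 0.3, 0.5; each lower bound is inclusive.
correlation_strength <- function(r) {
  stopifnot(all(abs(r) <= 1))
  grade <- cut(abs(r),
               breaks = c(0, 0.1, 0.3, 0.5, 1),
               labels = c("none", "weak", "medium", "strong"),
               right = FALSE,          # intervals are [low, high)
               include.lowest = TRUE)  # so that |r| = 1 counts as "strong"
  as.character(grade)
}

correlation_strength(c(0.05, -0.2, 0.45, -0.9))
```

The sign of r is ignored here on purpose, since the grading depends only on the absolute value.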
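The Pearson correlation coefficient is calculated as follows:

$$\rho_{X,Y} = \frac{\operatorname{cov}(X,Y)}{\sigma_X \sigma_Y} = \frac{E[(X-\mu_X)(Y-\mu_Y)]}{\sigma_X \sigma_Y}$$

The numerator is the covariance, and the denominator is the product of the standard deviations of the two variables. The standard deviations of X and Y must both be nonzero.
Because $\mu_X = E(X)$ and $\sigma_X^2 = E[(X - E(X))^2] = E(X^2) - E^2(X)$, similarly for Y, and

$$E[(X - E(X))(Y - E(Y))] = E(XY) - E(X)E(Y),$$

the correlation coefficient can also be expressed as

$$\rho_{X,Y} = \frac{E(XY) - E(X)E(Y)}{\sqrt{E(X^2) - E^2(X)}\,\sqrt{E(Y^2) - E^2(Y)}}$$

For the sample Pearson correlation coefficient:

$$r = \frac{n\sum x_i y_i - \sum x_i \sum y_i}{\sqrt{n\sum x_i^2 - \left(\sum x_i\right)^2}\,\sqrt{n\sum y_i^2 - \left(\sum y_i\right)^2}}$$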
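To make the definition concrete, the following sketch computes the coefficient in three ways: as the covariance divided by the product of the standard deviations, through an algebraically equivalent form based on raw sums, and with R's built-in `cor`. The two vectors are the raw material A and product sales columns from the example data in this post; the variable names are chosen only for illustration.

```r
# Raw material A and product sales from the example data in this post.
x <- c(0.85, 0.33, 0.64, 0.38, 0.10, 0.28, 0.15, 0.18)
y <- c(4500, 1800, 3900, 1000, 740, 990, 910, 930)
n <- length(x)

# Definition: covariance divided by the product of the standard deviations.
r_def <- cov(x, y) / (sd(x) * sd(y))

# Equivalent computational form based on raw sums.
r_sums <- (n * sum(x * y) - sum(x) * sum(y)) /
  (sqrt(n * sum(x^2) - sum(x)^2) * sqrt(n * sum(y^2) - sum(y)^2))

# Built-in function for comparison.
r_builtin <- cor(x, y)

c(r_def, r_sums, r_builtin)
```

All three values should agree up to floating-point rounding, and swapping `x` and `y` leaves them unchanged.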
The sample correlation coefficient is used to infer whether two variables are correlated in the population. A t statistic can be used to test the null hypothesis that the population correlation coefficient is 0. If the t test is significant, the null hypothesis is rejected and the two variables are judged to be linearly correlated. If the t test is not significant, the null hypothesis cannot be rejected, that is, the sample provides no evidence of a linear correlation between the two variables.
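As a sketch of how this test works, the t statistic for the null hypothesis of zero correlation can be computed by hand as t = r * sqrt(n - 2) / sqrt(1 - r^2) with n - 2 degrees of freedom. The values of r and n below are taken from the raw material A versus raw material B test shown later in this post; the variable names are illustrative.

```r
# Sample correlation and sample size from the A-versus-B test in this post.
r <- -0.5079048
n <- 8

df <- n - 2
t_stat <- r * sqrt(df) / sqrt(1 - r^2)   # t statistic under H0: rho = 0
p_value <- 2 * pt(-abs(t_stat), df)      # two-sided p-value

c(t = t_stat, df = df, p = p_value)
```

The result can be compared with the t, df and p-value lines reported by `cor.test` for the same pair of variables.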
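The Pearson correlation coefficient between two variables is defined as their covariance divided by the product of their standard deviations:

$$\rho_{X,Y} = \frac{\operatorname{cov}(X,Y)}{\sigma_X \sigma_Y} = \frac{E[(X-\mu_X)(Y-\mu_Y)]}{\sigma_X \sigma_Y}$$

The above equation defines the population correlation coefficient, generally denoted by the Greek letter ρ (rho). Estimating the covariance and the standard deviations from a sample gives the sample correlation coefficient, generally denoted r:

$$r = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i-\bar{y})^2}}$$

The Pearson coefficient is symmetric: corr(X, Y) = corr(Y, X).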
The following example analyzes the influence of raw materials on the sales volume of a certain food product.
All content of this blog is original; if you reproduce it, please credit the source: http://blog.csdn.net/myhaspl/
> read.csv("H:/docs/machine learning version 2nd/src/abcgoods.csv") -> mygoods
> mygoods
  material_A material_B material_C sales
1       0.85       0.12       0.30  4500
2       0.33       0.23       0.44  1800
3       0.64       0.24       0.12  3900
4       0.38       0.12       0.50  1000
5       0.10       0.20       0.88   740
6       0.28       0.17       0.55   990
7       0.15       0.80       0.77   910
8       0.18       0.70       0.75   930
> cov(mygoods) -> myanalysis.cov
> myanalysis.cov
             material_A    material_B    material_C     sales
material_A   0.06716964   -0.03539643   -0.05832321  368.2161
material_B  -0.03539643    0.07230714    0.03521786 -151.1464
material_C  -0.05832321    0.03521786    0.06546964 -321.9196
sales      368.21607143 -151.14642857 -321.91964286
> cor(mygoods) -> myanalysis.cor
> myanalysis.cor
           material_A material_B material_C      sales
material_A  1.0000000 -0.5079048 -0.8794982  0.9501366
material_B -0.5079048  1.0000000  0.5118614 -0.3759041
material_C -0.8794982  0.5118614  1.0000000 -0.8413899
sales       0.9501366 -0.3759041 -0.8413899  1.0000000
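The covariance and correlation matrices above are related by a simple rescaling: each covariance is divided by the two corresponding standard deviations. The sketch below checks this with `cov2cor` on the 3 x 3 block for the raw materials, using the covariance values printed in this post; the short labels A, B and C are chosen only for this example.

```r
# Covariances of the three raw materials, copied from the cov() output above.
# The labels A, B, C are illustrative names for this sketch.
covm <- matrix(c( 0.06716964, -0.03539643, -0.05832321,
                 -0.03539643,  0.07230714,  0.03521786,
                 -0.05832321,  0.03521786,  0.06546964),
               nrow = 3, byrow = TRUE,
               dimnames = list(c("A", "B", "C"), c("A", "B", "C")))

# cov2cor() divides each entry by the two corresponding standard deviations.
corm <- cov2cor(covm)

# The same scaling written out by hand.
s <- sqrt(diag(covm))
corm_manual <- covm / outer(s, s)

round(corm, 7)
```

Because the inputs are rounded to eight decimal places, the result should match the printed correlation matrix only up to small rounding differences.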
> cor.test(~ material_A + material_B, data = mygoods)

	Pearson's product-moment correlation

data:  material_A and material_B
t = -1.4443, df = 6, p-value = 0.1988
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 -0.8929757  0.3064479
sample estimates:
       cor
-0.5079048
> cor.test(~ material_A + sales, data = mygoods)

	Pearson's product-moment correlation

data:  material_A and sales
t = 7.4634, df = 6, p-value = 0.0002985
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.7427838 0.9911796
sample estimates:
      cor
0.9501366
> cor.test(~ material_C + sales, data = mygoods)

	Pearson's product-moment correlation

data:  material_C and sales
t = -3.8136, df = 6, p-value = 0.008826
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 -0.9705934 -0.3358354
sample estimates:
       cor
-0.8413899
> cor.test(~ material_B + sales, data = mygoods)

	Pearson's product-moment correlation

data:  material_B and sales
t = -0.9936, df = 6, p-value = 0.3588
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 -0.8542858  0.4472372
sample estimates:
       cor
-0.3759041
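The 95 percent confidence intervals reported by `cor.test` are based on Fisher's z-transform. As a sketch, the interval for raw materials A and B can be reproduced by hand from r and n alone; the values below are taken from the first test in this post, and the variable names are illustrative.

```r
# r and n from the A-versus-B test in this post.
r <- -0.5079048
n <- 8

z <- atanh(r)            # Fisher z-transform of r
se <- 1 / sqrt(n - 3)    # approximate standard error of z
crit <- qnorm(0.975)     # two-sided 95% critical value

ci <- tanh(z + c(-1, 1) * crit * se)   # back-transform to the r scale
ci
```

With only 8 observations the interval is wide and includes 0, which is consistent with the non-significant p-value for this pair.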
Raw materials A and C are each significantly linearly correlated with product sales: A positively (r = 0.95, p = 0.0003) and C negatively (r = -0.84, p = 0.0088).
Raw material A shows no significant linear relationship with raw material B (p = 0.1988), so the two do not need to be mixed in a fixed ratio.
The Road of Mathematics - Advanced Data Analysis - Multivariate Data Analysis (2)