Mathematical Path - Advanced Data Analysis - Multivariate Data Analysis (2)


The Pearson correlation coefficient, also known as the Pearson product-moment correlation coefficient, is a statistic that measures the strength of the linear relationship between two variables X and Y. Its value lies between -1 and 1. The sample coefficient is denoted r; in its formula, n is the sample size, x_i and y_i are the observed values, and x̄ and ȳ are the sample means of the two variables. The larger the absolute value of r, the stronger the linear correlation. A common rule of thumb for reading r:

Correlation   Negative        Positive
None          -0.09 to 0.0    0.0 to 0.09
Weak          -0.3 to -0.1    0.1 to 0.3
Medium        -0.5 to -0.3    0.3 to 0.5
Strong        -1.0 to -0.5    0.5 to 1.0
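The rule-of-thumb table can be turned into a small helper. This is only a sketch: the function name `correlation_strength` is hypothetical, and because the table leaves the boundaries ambiguous, the code assumes cut points at 0.1, 0.3 and 0.5 on |r|, with each boundary value assigned to the stronger category.

```r
# Classify the strength of a Pearson correlation using the rule-of-thumb table.
# Assumption: cut points at 0.1, 0.3 and 0.5 on |r|; a value that sits exactly
# on a cut point is assigned to the stronger category.
correlation_strength <- function(r) {
  stopifnot(is.numeric(r), all(abs(r) <= 1))
  labels <- c("none", "weak", "medium", "strong")
  # findInterval() returns 0..3, so shift by one to index the labels.
  labels[findInterval(abs(r), c(0.1, 0.3, 0.5)) + 1]
}

print(correlation_strength(c(0.05, -0.2, 0.45, -0.88)))
```

These labels describe only the size of r. They say nothing about statistical significance, which depends on the sample size as well.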

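The Pearson correlation coefficient is calculated as follows:

$$\rho_{X,Y} = \frac{\operatorname{cov}(X,Y)}{\sigma_X \sigma_Y} = \frac{E[(X-\mu_X)(Y-\mu_Y)]}{\sigma_X \sigma_Y}$$

The numerator is the covariance of the two variables, and the denominator is the product of their standard deviations. Neither standard deviation may be 0.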

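Because $\mu_X = E(X)$ and $\sigma_X^2 = E[(X - E(X))^2] = E(X^2) - E^2(X)$, and similarly for $Y$, and because

$$E[(X - E(X))(Y - E(Y))] = E(XY) - E(X)E(Y),$$

the correlation coefficient can also be expressed as

$$\rho_{X,Y} = \frac{E(XY) - E(X)E(Y)}{\sqrt{E(X^2) - E^2(X)}\,\sqrt{E(Y^2) - E^2(Y)}}$$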
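The rewriting in terms of moments rests on a covariance identity that follows from linearity of expectation. A short derivation, writing $\mu_X = E(X)$ and $\mu_Y = E(Y)$:

```latex
\begin{aligned}
E[(X-\mu_X)(Y-\mu_Y)] &= E[XY - \mu_Y X - \mu_X Y + \mu_X \mu_Y] \\
&= E(XY) - \mu_Y E(X) - \mu_X E(Y) + \mu_X \mu_Y \\
&= E(XY) - \mu_X \mu_Y - \mu_X \mu_Y + \mu_X \mu_Y \\
&= E(XY) - E(X)E(Y)
\end{aligned}
```

Setting $Y = X$ in the same expansion gives the variance form $\sigma_X^2 = E(X^2) - E^2(X)$.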

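For a sample, the Pearson correlation coefficient is:

$$r = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i-\bar{y})^2}}$$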
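As a check, the sample coefficient can be computed term by term and compared with R's built-in `cor()`. The sketch below uses the raw material A and product sales columns from the example data later in this post; the variable names `x`, `y`, `r_manual` and `r_cov` are illustrative only.

```r
# Two columns copied from the example data set used later in this post.
x <- c(0.85, 0.33, 0.64, 0.38, 0.10, 0.28, 0.15, 0.18)   # raw material A
y <- c(4500, 1800, 3900, 1000, 740, 990, 910, 930)       # product sales

# Sample Pearson coefficient written out from deviations about the means.
dx <- x - mean(x)
dy <- y - mean(y)
r_manual <- sum(dx * dy) / (sqrt(sum(dx^2)) * sqrt(sum(dy^2)))

# Equivalent form: sample covariance over the product of standard deviations.
r_cov <- cov(x, y) / (sd(x) * sd(y))

# Both should agree with the built-in cor() up to floating-point error.
print(c(manual = r_manual, from_cov = r_cov, builtin = cor(x, y)))
```

The two forms agree because the 1/(n - 1) factors in the sample covariance and in the two standard deviations cancel.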

     

The sample correlation coefficient is used to judge whether two variables are correlated in the population. A t-statistic tests the null hypothesis that the population correlation coefficient is 0. If the t-test is significant, the null hypothesis is rejected and the two variables are taken to be linearly correlated. If it is not significant, the null hypothesis cannot be rejected: the sample does not provide sufficient evidence of a linear correlation.
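The usual test statistic for this hypothesis is t = r * sqrt(n - 2) / sqrt(1 - r^2) with n - 2 degrees of freedom, which assumes the two variables are roughly bivariate normal. A minimal sketch, plugging in the r and n reported for raw material A and product sales in the example later in this post:

```r
# Values reported by cor.test() for raw material A and product sales.
r <- 0.9501366   # sample correlation coefficient
n <- 8           # number of observations

# t statistic for H0: rho = 0, with n - 2 degrees of freedom.
df <- n - 2
t_stat <- r * sqrt(df) / sqrt(1 - r^2)

# Two-sided p-value from the t distribution.
p_value <- 2 * pt(-abs(t_stat), df = df)

print(c(t = t_stat, df = df, p = p_value))
```

The result should match the `t`, `df` and `p-value` lines that `cor.test()` prints for the same pair of variables.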

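The Pearson correlation coefficient between two variables is defined as their covariance divided by the product of their standard deviations:

$$\rho_{X,Y} = \frac{\operatorname{cov}(X,Y)}{\sigma_X \sigma_Y}$$

The equation above defines the population correlation coefficient, usually denoted by the Greek letter ρ (rho). Estimating the covariance and standard deviations from a sample gives the sample correlation coefficient, usually denoted r:

$$r = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i-\bar{y})^2}}$$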

The Pearson coefficient is symmetric: corr(X, Y) = corr(Y, X).

     
The following example analyzes how the raw materials of a food product affect its sales volume.

All content of this blog is original. If you reproduce it, please cite the source: http://blog.csdn.net/myhaspl/

> read.csv("H:/docs/machine learning version 2nd/src/abcgoods.csv") -> mygoods
> mygoods
  raw_material_a raw_material_b raw_material_c product_sales
1           0.85           0.12           0.30          4500
2           0.33           0.23           0.44          1800
3           0.64           0.24           0.12          3900
4           0.38           0.12           0.50          1000
5           0.10           0.20           0.88           740
6           0.28           0.17           0.55           990
7           0.15           0.80           0.77           910
8           0.18           0.70           0.75           930
> cov(mygoods) -> myanalysis.cov
> myanalysis.cov
               raw_material_a raw_material_b raw_material_c product_sales
raw_material_a     0.06716964    -0.03539643    -0.05832321      368.2161
raw_material_b    -0.03539643     0.07230714     0.03521786     -151.1464
raw_material_c    -0.05832321     0.03521786     0.06546964     -321.9196
product_sales    368.21607143  -151.14642857  -321.91964286  2235941.0714
> cor(mygoods) -> myanalysis.cor
> myanalysis.cor
               raw_material_a raw_material_b raw_material_c product_sales
raw_material_a      1.0000000     -0.5079048     -0.8794982     0.9501366
raw_material_b     -0.5079048      1.0000000      0.5118614    -0.3759041
raw_material_c     -0.8794982      0.5118614      1.0000000    -0.8413899
product_sales       0.9501366     -0.3759041     -0.8413899     1.0000000
> cor.test(~ raw_material_a + raw_material_b, data = mygoods)

        Pearson's product-moment correlation

data:  raw_material_a and raw_material_b
t = -1.4443, df = 6, p-value = 0.1988
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 -0.8929757  0.3064479
sample estimates:
       cor
-0.5079048

> cor.test(~ raw_material_a + product_sales, data = mygoods)

        Pearson's product-moment correlation

data:  raw_material_a and product_sales
t = 7.4634, df = 6, p-value = 0.0002985
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.7427838 0.9911796
sample estimates:
      cor
0.9501366

> cor.test(~ raw_material_c + product_sales, data = mygoods)

        Pearson's product-moment correlation

data:  raw_material_c and product_sales
t = -3.8136, df = 6, p-value = 0.008826
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 -0.9705934 -0.3358354
sample estimates:
       cor
-0.8413899

> cor.test(~ raw_material_b + product_sales, data = mygoods)

        Pearson's product-moment correlation

data:  raw_material_b and product_sales
t = -0.9936, df = 6, p-value = 0.3588
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 -0.8542858  0.4472372
sample estimates:
       cor
-0.3759041

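Raw materials A and C are each linearly correlated with product sales: A positively (r = 0.95, p = 0.0002985) and C negatively (r = -0.84, p = 0.008826). The correlation between raw material B and product sales is not significant (p = 0.3588).

The test finds no significant linear relationship between raw material A and raw material B (p = 0.1988), so they do not need to be mixed in a fixed ratio.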
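The whole analysis can be reproduced without the CSV file by entering the table by hand. This is a sketch under stated assumptions: the column names are hypothetical English stand-ins for the original headers, `data.frame()` replaces the `read.csv()` call, and the 0.05 significance level is a conventional choice that the post does not state.

```r
# The CSV file from the post is not available, so the table is re-entered here.
# Column names are hypothetical English stand-ins for the original headers.
mygoods <- data.frame(
  raw_material_a = c(0.85, 0.33, 0.64, 0.38, 0.10, 0.28, 0.15, 0.18),
  raw_material_b = c(0.12, 0.23, 0.24, 0.12, 0.20, 0.17, 0.80, 0.70),
  raw_material_c = c(0.30, 0.44, 0.12, 0.50, 0.88, 0.55, 0.77, 0.75),
  product_sales  = c(4500, 1800, 3900, 1000, 740, 990, 910, 930)
)

# Test each raw material against sales and collect the estimate and p-value.
materials <- c("raw_material_a", "raw_material_b", "raw_material_c")
results <- do.call(rbind, lapply(materials, function(m) {
  ct <- cor.test(mygoods[[m]], mygoods$product_sales)
  data.frame(material    = m,
             r           = unname(ct$estimate),
             p_value     = ct$p.value,
             significant = ct$p.value < 0.05)  # 0.05 level is an assumption
}))
print(results)
```

With only eight observations the confidence intervals are wide, so the significant results are better read as evidence of a linear association than as precise estimates of its size.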
