Regression analysis example

Last Update: 2018-12-06 Source: Internet

Author: User


Step 1: Take steel consumption as the dependent variable y and national income as the independent variable x, and draw a scatter chart based on the data in the table (as shown in the figure).
The purpose of the scatter chart is to choose a suitable regression model by visual inspection.

Step 2: Select an appropriate regression model. The scatter chart in this example suggests a linear correlation between steel consumption and national income, so we use a linear model as the population regression model: y = α + β * x + ε. Here α and β are the population regression parameters. Their true values are unknown; they can only be estimated from the sample data, and the estimates are denoted A and B respectively. ε is a random term that represents all other influencing factors.

Step 3: Build the simple (one-variable) linear regression equation and use the least squares method to estimate α and β, that is, to obtain A and B. (A and B are called regression coefficients: A is the intercept of the fitted straight line and B is its slope. The calculation can be done with Excel tools.)

A = -460.528180
B = 0.98395935
That is, the regression equation is: ŷ = -460.5282 + 0.9840 * x (Note: ŷ denotes the estimated value and y the actual value.)
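The least squares estimates of A and B can be sketched in a few lines of Python. This is only a minimal illustration, not the Excel procedure used in this example. The function name fit_line and the sample data are hypothetical, because the data table is not reproduced here.

```python
def fit_line(xs, ys):
    """Return (a, b) that minimize sum((y - (a + b*x))**2)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Sum of cross-deviations and sum of squared deviations of x
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    b = sxy / sxx              # slope
    a = y_mean - b * x_mean    # intercept
    return a, b

# Hypothetical data lying exactly on y = 1 + 2x (not the article's table)
demo_x = [1.0, 2.0, 3.0, 4.0]
demo_y = [3.0, 5.0, 7.0, 9.0]
a, b = fit_line(demo_x, demo_y)
print(a, b)
```

Applied to the steel consumption and national income data, the same calculation would be expected to reproduce the A and B values reported above.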

Step 4: Run the statistical tests on the model (Excel tools can be used; for the calculation results, see the results report).

1. t test: significance test on regression coefficient B
Since regression coefficient B is only an estimate of the population parameter β, testing B tells us whether β differs significantly from 0. If β = 0, the regression equation has no x term, so y does not change with x, there is no linear relationship between y and x, and the assumption behind our linear model does not hold. Conversely, if β ≠ 0, there is a linear relationship between y and x and the assumption holds. We usually apply a t test to B to verify whether y and x have a genuine linear relationship. Compute the t value of B, tB = B / Sb, where Sb is the standard error (estimated standard deviation) of B. Then, for the pre-set significance level u (usually u = 0.05) and degrees of freedom d = n - 2, look up the critical value t(u/2) in the t distribution table. If |tB| > t(u/2), a value of B this far from 0 would occur with probability less than 0.05 if β were 0, so we reject β = 0 and conclude that β ≠ 0, that is, y has a linear relationship with x. Otherwise, β = 0 cannot be rejected. Calculated:

|tB| = 19.78057827, t(u/2) = 2.131449536
Because |tB| > t(u/2), regression coefficient B passes the t test. This indicates that B is significant, that is, the variable national income can explain the change in the variable steel consumption.
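As a rough sketch of how tB = B / Sb is obtained from raw data, the following Python computes the slope, its standard error, and the t value. The function name slope_t_statistic and the five data points are hypothetical; they only illustrate the arithmetic and are not the steel consumption data.

```python
import math

def slope_t_statistic(xs, ys):
    """Return (b, s_b, t_b) for the least squares line y = a + b*x."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_mean - b * x_mean
    # Residual sum of squares around the fitted line
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(sse / (n - 2))   # standard error of the estimate
    s_b = s / math.sqrt(sxx)       # standard error of the slope
    return b, s_b, b / s_b

# Hypothetical data (not the article's table)
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
b, s_b, t_b = slope_t_statistic(xs, ys)
print(b, s_b, t_b)
```

The resulting |tB| is then compared with the table value t(u/2) for n - 2 degrees of freedom, exactly as described above.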

2. F test: test of the overall significance of the regression equation
This is also a test of whether y and x have a genuine linear relationship. Using analysis of variance, compute the F statistic of the regression equation. Then, for the given significance level u (usually u = 0.05) and the two degrees of freedom (d1 = 1, d2 = n - 2), look up the critical value Fu in the F distribution table. If F > Fu, the explanatory term in the regression model is essential, and the regression effect of the equation is significant. Calculated:
F statistic = 391.2712765, Fu = 4.543077123
Because F > Fu, the F test is passed, which indicates that the regression effect of the regression equation is significant.
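With a single independent variable, the F statistic equals the square of the slope's t statistic, and the same holds for the critical values. The short check below applies this to the figures reported in this example. The variable names are illustrative; only the four numbers come from the text.

```python
# Values reported in the t test and F test above
T_B = 19.78057827      # |t| value of the slope
T_CRIT = 2.131449536   # t critical value at u = 0.05
F_STAT = 391.2712765   # F statistic
F_CRIT = 4.543077123   # F critical value at u = 0.05

# In a one-variable regression, F = t**2 for both statistic and critical value
f_from_t = T_B ** 2
f_crit_from_t = T_CRIT ** 2

f_test_passed = F_STAT > F_CRIT
print(f_from_t, f_crit_from_t, f_test_passed)
```

Both squared values agree with the reported F figures to within rounding, so in this one-variable example the t test and the F test lead to the same decision.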
(Why the F test is needed)
Suppose the regression has two slope coefficients, their t statistics are independent, and each coefficient is tested separately at the 5% level. Then, when the null hypothesis is true, the probability of rejecting it for at least one coefficient is 1 - 0.95^2 = 9.75%. This exceeds the intended 5% significance level: testing one coefficient at a time enlarges the rejection region, so the null hypothesis is rejected too often. If the regressors are correlated, the situation is more complex. A separate method is therefore needed: the F test, which tests the joint hypothesis that all slope coefficients are zero.
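The 9.75% figure can be reproduced with a one-line calculation. The sketch below assumes independent tests, as the text does; the function name is hypothetical.

```python
def prob_any_false_rejection(alpha, k):
    """Probability that at least one of k independent tests at level alpha
    rejects when every null hypothesis is true."""
    return 1 - (1 - alpha) ** k

p_two = prob_any_false_rejection(0.05, 2)    # two coefficients tested separately
p_five = prob_any_false_rejection(0.05, 5)   # grows quickly with more coefficients
print(p_two, p_five)
```

The probability of a false rejection rises with every additional coefficient tested on its own, which is the motivation for a single joint test.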

3. D-W test: test for autocorrelation of the residuals
If the residuals are not independent of each other, that is, they are correlated, the regression model cannot reflect the true relationship between variable y and variable x. A basic assumption of linear regression is that the random terms are independent of each other. Otherwise tB tends to be overstated (because Sb is underestimated), which exaggerates the significance reported by the t test and the F test, so both tests are no longer valid. Compute the D-W statistic. Then, for the given significance level u (usually u = 0.05), the number of independent variables, and the sample size n, look up the lower limit dL and upper limit dU in the D-W table. Only when dU < D-W < 4 - dU can we conclude that the random terms are not autocorrelated, and the test is passed. (For the calculation process of the D-W statistic, see the section in the same color below.)
D-W statistic = Σ(e_i - e_(i-1))^2 / Σ e_i^2 = 2.032624524, dU = 1.38 (from the D-W statistical table)
Because dU < D-W statistic < 4 - dU, the D-W test is passed. The residual sequence shows no autocorrelation, which further supports the reliability of the t test and F test results.
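The D-W formula and the pass condition can be sketched as follows. The function names and the demo residuals are hypothetical; the statistic 2.032624524 and the upper limit 1.38 are the values reported above.

```python
def durbin_watson(residuals):
    """Sum of squared successive differences divided by sum of squared residuals."""
    num = sum((residuals[i] - residuals[i - 1]) ** 2
              for i in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den

def passes_dw(d, d_upper):
    """True when d lies in the 'no autocorrelation' band (dU, 4 - dU)."""
    return d_upper < d < 4 - d_upper

# Reported values from this example
reported_passed = passes_dw(2.032624524, 1.38)

# Hypothetical residuals that alternate in sign (negative autocorrelation)
demo_residuals = [1.0, -1.0, 1.0, -1.0]
demo_d = durbin_watson(demo_residuals)
demo_passed = passes_dw(demo_d, 1.38)
print(reported_passed, demo_d, demo_passed)
```

A statistic near 2 falls inside the band, while the alternating demo residuals give a value well above 4 - dU and would fail the test.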
4. R2 (coefficient of determination) test: goodness of fit
This test measures how closely the observed values of variable y cluster around the regression line, and therefore how much of the variation in y is explained by variable x. R2 lies between 0 and 1; a larger value indicates a better fit. As a rule of thumb, a value above 70% is considered a very good fit.
R2 = 0.963078857
R2 is close to 1, indicating that the regression line fits the sample data points closely.
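As an illustration, R2 can be computed as 1 - SSE / SST, and in a one-variable regression it can also be recovered from the F statistic as F / (F + n - 2). In the sketch below, the function name and demo data are hypothetical, and the residual degrees of freedom of 15 is an assumption inferred from the reported critical values, since n is not stated in the text.

```python
def r_squared(ys, fitted):
    """Share of the variation in ys explained by the fitted values."""
    y_mean = sum(ys) / len(ys)
    sse = sum((y - f) ** 2 for y, f in zip(ys, fitted))   # unexplained variation
    sst = sum((y - y_mean) ** 2 for y in ys)              # total variation
    return 1 - sse / sst

# Hypothetical observations and fitted values (not the article's data)
demo_y = [2.0, 4.0, 5.0, 4.0, 5.0]
demo_fit = [2.8, 3.4, 4.0, 4.6, 5.2]
demo_r2 = r_squared(demo_y, demo_fit)

# Cross-check of the reported figures; DF_RESID = 15 is an inferred assumption
F_STAT = 391.2712765
DF_RESID = 15
r2_from_f = F_STAT / (F_STAT + DF_RESID)
print(demo_r2, r2_from_f)
```

Under that assumption, the value recovered from the F statistic agrees with the reported R2 of 0.963078857.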
5. Standard error of the estimate of y
The standard error indicates how widely the data points are scattered around the regression line; the smaller it is, the better.
Standard error = 135.4581771
In this example, the standard error is much smaller than the mean of variable y, so the model passes this check.
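A minimal sketch of this check, assuming the usual definition sqrt(SSE / (n - 2)) for a one-variable regression, is shown below. The function name and data are hypothetical; the mean of y for the steel consumption data is not given in the text, so the comparison is shown only on demo values.

```python
import math

def standard_error_of_estimate(ys, fitted, n_params=2):
    """sqrt(SSE / (n - n_params)); n_params = 2 for intercept plus one slope."""
    sse = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    return math.sqrt(sse / (len(ys) - n_params))

# Hypothetical observations and fitted values (not the article's data)
demo_y = [2.0, 4.0, 5.0, 4.0, 5.0]
demo_fit = [2.8, 3.4, 4.0, 4.6, 5.2]

se = standard_error_of_estimate(demo_y, demo_fit)
ratio_to_mean = se / (sum(demo_y) / len(demo_y))   # scatter relative to the level of y
print(se, ratio_to_mean)
```

The smaller the ratio of the standard error to the mean of y, the tighter the points sit around the regression line.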
Having passed these tests, the regression equation in this example can be used to express the relationship between steel consumption and national income.

Step 5: Use the regression equation for prediction
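A prediction is obtained by substituting a national income value into the fitted equation. The sketch below uses the coefficients A and B estimated in Step 3. The input value 2000 is purely hypothetical, and the units are whatever the original data table used, which is not shown here.

```python
# Coefficients estimated in Step 3
A = -460.528180   # intercept
B = 0.98395935    # slope

def predict_steel_consumption(national_income):
    """Point prediction from the fitted line: A + B * x."""
    return A + B * national_income

# Hypothetical national income value, for illustration only
predicted = predict_steel_consumption(2000.0)
print(predicted)
```

Such a point prediction is most trustworthy for national income values within the range of the data used to fit the line.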

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
