Regression analysis example

Source: Internet
Author: User

Step 1: Take steel consumption as the dependent variable Y and national income as the independent variable X, and draw a scatter chart based on the data in the table (as shown in the figure).
The purpose of the scatter chart is to choose a mathematical regression model by visual inspection.

 

Step 2: Select an appropriate mathematical regression model. The scatter plot in this example suggests a linear correlation between steel consumption and national income, so we use the linear model as the population regression model: y = α + β * x + ε. Here α and β are the population regression parameters. They are theoretical population values that are unknown in practice; we can only estimate them from the sample data, and the estimates are denoted A and B respectively. ε is a random term representing all other influencing factors.

 

Step 3: Build the simple (one-variable) linear regression equation and use the least squares method to estimate α and β, that is, to obtain A and B. (A and B are called regression coefficients: A is the intercept of the fitted straight line and B is its slope. This can be done using Excel tools.)

A = -460.528180
B = 0.98395935
That is, the regression equation is: ŷ = -460.5282 + 0.9840 x (Note: ŷ denotes the estimated value and y the actual value)
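The least squares estimates described in Step 3 can be sketched in a few lines of Python. This is only an illustration: the article's data table is not reproduced here, so the points below are made up (they lie exactly on y = 1 + 2x), and the function name `fit_line` is a hypothetical helper, not part of Excel or any library.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = A + B*x; returns (A, B)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and cross-deviations of x and y
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                      # B
    intercept = mean_y - slope * mean_x    # A
    return intercept, slope

# Made-up points on y = 1 + 2x (NOT the steel/national income data)
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
A, B = fit_line(xs, ys)
print(A, B)
```

Applied to the article's table, the same formulas should give the A and B reported above.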

 

Step 4: Perform various tests on the model (these can be done with Excel tools; for the calculation results, see the results report)

1. t test: significance test on the regression coefficient B
Because the regression coefficient B is only an estimate of the population regression parameter β, testing B lets us check whether β differs significantly from 0. If β = 0, the regression equation has no x term, so y does not change with x, there is no linear relationship between Y and X, and the assumption behind our linear model does not hold. Conversely, if β ≠ 0, there is a linear relationship between Y and X and the assumption holds. We therefore usually apply a t test to B to verify whether Y and X have a real linear relationship.
The t value of B is calculated as TB = B / Sb, where Sb is the standard error of B. Then, for the chosen significance level u (usually u = 0.05) and degrees of freedom d = n - 2, look up the critical value t(u/2) in the t distribution table. If |TB| > t(u/2), the null hypothesis β = 0 is rejected at the 0.05 level, and we conclude that β ≠ 0, that is, Y has a linear relationship with X. Otherwise, the conclusion is the opposite. Calculated:

|TB| = 19.78057827
t(u/2) = 2.131449536
Because |TB| > t(u/2), regression coefficient B passes the t test. This indicates that B is significant, that is, the variable national income can explain the change in the variable steel consumption.
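As a rough sketch of how TB = B / Sb is obtained, the snippet below computes the slope, its standard error, and the t value from raw data. The five data points are made up for illustration and are not the article's data; `slope_t_statistic` and `is_significant` are hypothetical helper names. The final check uses the two values reported above.

```python
import math

def slope_t_statistic(xs, ys):
    """Return TB = B / Sb for the simple regression y = A + B*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    # Residual sum of squares and residual variance with n - 2 degrees of freedom
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    residual_variance = sse / (n - 2)
    sb = math.sqrt(residual_variance / sxx)  # standard error of B
    return b / sb

def is_significant(t_value, t_critical):
    """Two-sided decision rule: reject beta = 0 when |TB| > t(u/2)."""
    return abs(t_value) > t_critical

# Made-up data, only to exercise the formula
t_demo = slope_t_statistic([1.0, 2.0, 3.0, 4.0, 5.0], [2.0, 4.0, 5.0, 4.0, 5.0])
print(t_demo)

# Decision with the values reported in the article
print(is_significant(19.78057827, 2.131449536))
```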

2. F test: test of the overall significance of the regression equation
This also tests whether Y and X have a real linear relationship. Using analysis of variance, calculate the F statistic of the regression equation. Then, for the given significance level u (usually u = 0.05) and the two degrees of freedom (d1 = 1, d2 = n - 2), look up the critical value Fu in the F distribution table. If F > Fu, the x term in the regression model is needed, which shows that the regression effect of the equation is significant. Calculated:
F statistic = 391.2712765
Fu = 4.543077123
Because F > Fu, the F test is passed, indicating that the regression effect of the regression equation is significant.
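A minimal sketch of the analysis-of-variance calculation follows, assuming the usual decomposition into a regression sum of squares (SSR) and a residual sum of squares (SSE). The function name `f_statistic` and the demo sums of squares are illustrative, not taken from the article. With a single independent variable, F equals the square of the slope's t value, which gives a quick consistency check on the reported figures.

```python
def f_statistic(ssr, sse, n):
    """F for simple regression: (SSR / 1) / (SSE / (n - 2))."""
    return (ssr / 1) / (sse / (n - 2))

# Illustrative sums of squares (made up): SSR = 3.6, SSE = 2.4, n = 5
f_demo = f_statistic(3.6, 2.4, 5)
print(f_demo)

# Consistency check on the article's reported values: F should equal TB squared
t_reported = 19.78057827
f_reported = 391.2712765
print(abs(t_reported ** 2 - f_reported) < 1e-4)
```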
(Why the F test is needed)
If the t statistics are independent and each coefficient is tested separately at the 5% level, the probability of rejecting the null hypothesis when it is true rises to 9.75% (the figure for two coefficients). Compared with the intended 5% significance level, this approach rejects the null hypothesis too often: the combined rejection region is larger, so the actual probability of a false rejection no longer equals the intended significance level. If the regressors are correlated, the situation is more complex. A different method is therefore needed, namely the F test, which tests the joint null hypothesis on all slope coefficients at once.
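The 9.75% figure is consistent with running two independent tests at the 5% level each: the chance that at least one of them falsely rejects is 1 - 0.95^2. The short sketch below generalizes this to k independent tests; `familywise_error` is a hypothetical helper name, and the assumption of exactly two coefficients is inferred from the 9.75% figure rather than stated in the text.

```python
def familywise_error(alpha, k):
    """Probability of at least one false rejection in k independent tests at level alpha."""
    return 1 - (1 - alpha) ** k

print(round(familywise_error(0.05, 2), 4))  # 0.0975
```

The same function shows how quickly the error rate grows as more coefficients are tested one at a time.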

3. D-W (Durbin-Watson) test: test for autocorrelation of the residuals
If the residuals are not independent of each other, that is, they are correlated, the regression model cannot reflect the true relationship between variable Y and variable X. A basic assumption of linear regression is that the random terms are mutually independent. When they are not, the computed TB is inflated (because Sb is underestimated), which exaggerates the significance of the t test and the F test, so those tests are no longer valid. Calculate the D-W statistic; then, for the given significance level u (usually u = 0.05), the number of independent variables, and the sample size n, look up the lower limit dl and upper limit du in the D-W table. Only when du < D-W < 4 - du can we say that the random terms show no autocorrelation, and the test is passed. (For how the D-W statistic is calculated, see the formula below.)
D-W statistic = Σ (e_i - e_(i-1))^2 / Σ e_i^2 = 2.032624524
du = 1.38 (from the D-W statistical table)
Because du < D-W statistic < 4 - du, the D-W test is passed, indicating that the residual series has no autocorrelation, which further shows that the significance found by the t test and F test is reliable.
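The D-W formula and the decision rule can be sketched as follows. The residuals in the demo are made up (a strictly alternating series, which gives a large D-W value), and the helper names `durbin_watson` and `passes_dw` are hypothetical; only the last check uses the article's reported statistic and du.

```python
def durbin_watson(residuals):
    """D-W = sum of squared successive differences / sum of squared residuals."""
    numerator = sum(
        (residuals[i] - residuals[i - 1]) ** 2 for i in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator

def passes_dw(d, du):
    """No autocorrelation is indicated only when du < D-W < 4 - du."""
    return du < d < 4 - du

# Made-up alternating residuals: strongly negatively autocorrelated
d_demo = durbin_watson([1.0, -1.0, 1.0, -1.0])
print(d_demo, passes_dw(d_demo, 1.38))

# Values reported in the article
print(passes_dw(2.032624524, 1.38))
```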
4. R² (coefficient of determination) test: goodness of fit, that is, how closely the observed values of variable Y cluster around the regression line, which shows the extent to which variable X explains variable Y. The value of R² lies between 0 and 1, and a larger value indicates a better fit. As a rule of thumb, a fit above 70% is considered very good.
R² = 0.963078857
R² is close to 1, indicating that the regression line fits the sample data points closely.
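For illustration, R² can be computed as 1 - SSE/SST from the observed and fitted values. The data below are made up, and `r_squared` is a hypothetical helper name. In simple regression R² can also be recovered from the F statistic as F / (F + n - 2); the check assumes n - 2 = 15, which is inferred from the reported critical value 2.1314 (the two-sided 5% t value at 15 degrees of freedom) and is not stated in the article.

```python
def r_squared(ys, fitted):
    """Coefficient of determination: 1 - SSE / SST."""
    mean_y = sum(ys) / len(ys)
    sst = sum((y - mean_y) ** 2 for y in ys)               # total variation
    sse = sum((y - f) ** 2 for y, f in zip(ys, fitted))    # unexplained variation
    return 1 - sse / sst

# Made-up observations and their fitted values from the line y = 2.2 + 0.6x
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
fitted = [2.8, 3.4, 4.0, 4.6, 5.2]
r2_demo = r_squared(ys, fitted)
print(r2_demo)

# Cross-check of the article's figures (df = 15 is an inferred assumption)
f_reported = 391.2712765
df_residual = 15
r2_from_f = f_reported / (f_reported + df_residual)
print(r2_from_f)
```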
5. Standard error of the estimate of Y. It measures how widely the data points are scattered around the regression line; the smaller the standard error, the better.
Standard error = 135.4581771
In this example, the standard error is much smaller than the mean of variable Y, so the model passes this check.
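A minimal sketch of the standard error of the estimate, assuming the usual definition sqrt(SSE / (n - 2)) for a one-variable regression. The data are made up and `standard_error_of_estimate` is a hypothetical helper name; the ratio to the mean of Y mirrors the informal comparison made above.

```python
import math

def standard_error_of_estimate(ys, fitted):
    """sqrt(SSE / (n - 2)): typical distance of the points from the fitted line."""
    sse = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    return math.sqrt(sse / (len(ys) - 2))

# Made-up observations and fitted values (not the article's data)
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
fitted = [2.8, 3.4, 4.0, 4.6, 5.2]
se_demo = standard_error_of_estimate(ys, fitted)
mean_y = sum(ys) / len(ys)
print(se_demo, se_demo / mean_y)  # standard error and its size relative to the mean of Y
```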
In this example, therefore, the regression equation can be used to express the relationship between steel consumption and national income.

 

Step 5: Use the regression equation for prediction
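Prediction amounts to plugging a national income value into the fitted equation from Step 3. The sketch below uses the rounded coefficients reported there; the input value 2000 is purely hypothetical (the article does not give the units or a value to predict for), and `predict_steel_consumption` is an illustrative name.

```python
# Rounded coefficients from the fitted regression equation in Step 3
A = -460.5282
B = 0.9840

def predict_steel_consumption(national_income):
    """Point estimate from the fitted line: A + B * national income."""
    return A + B * national_income

# Hypothetical national income value, in the same units as the original table
predicted = predict_steel_consumption(2000.0)
print(round(predicted, 4))
```

A point estimate like this is most trustworthy for income values inside the range covered by the sample data.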
