R implements multiple linear regression with the `lm()` function. Before starting, it helps to become familiar with the companion functions that extract results from a fitted model `m`; they are useful throughout a regression analysis:
- `anova(m)`: ANOVA table
- `coefficients(m)`: model coefficients
- `coef(m)`: same as `coefficients(m)`
- `confint(m)`: confidence intervals for the regression coefficients
- `deviance(m)`: residual sum of squares
- `effects(m)`: vector of orthogonal effects
- `fitted(m)`: vector of fitted y values
- `residuals(m)`: model residuals
- `resid(m)`: same as `residuals(m)`
- `summary(m)`: key statistics, such as R², the F statistic, and the residual standard error (σ)
- `vcov(m)`: covariance matrix of the model coefficients
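To see these extractors in action, here is a minimal sketch that fits a small model on R's built-in `mtcars` data (an illustrative dataset, not the post's own) and calls each accessor:

```r
# Fit a small two-predictor model on built-in data,
# then exercise the extractor functions listed above.
m <- lm(mpg ~ wt + hp, data = mtcars)

coef(m)               # regression coefficients (same as coefficients(m))
confint(m)            # 95% confidence intervals for the coefficients
deviance(m)           # residual sum of squares
head(fitted(m))       # first few fitted values
head(resid(m))        # first few residuals (same as residuals(m))
vcov(m)               # covariance matrix of the coefficients
anova(m)              # ANOVA table
summary(m)$r.squared  # key statistics live inside summary(m)
```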
The following are the basic steps for multiple linear regression in R.

1. Read the data. RStudio has an import button for this; otherwise:

```r
zsj <- read.csv("D:/paper/data/zsj.csv")
```

Data are generally read from CSV, TXT, or Excel files, organized (see PS1) to conform to R's data frame structure. Many packages provide methods for reading data from different sources, so this takes some time to learn.

2. Choose the form of the regression equation:

```r
plot(Y ~ X1); abline(lm(Y ~ X1))
plot(Y ~ X2); abline(lm(Y ~ X2))
```
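Since the post's `zsj.csv` file is not available, the steps can be tried on simulated data of the same shape. Everything below (column names, sample size, coefficients) is an assumption chosen to resemble the post's results, not the original data:

```r
# Simulate a data frame shaped like the post's zsj data
# (assumed columns Y, X1, X2; the real zsj.csv is not available).
set.seed(1)
n  <- 330
X1 <- runif(n, 0, 10)
X2 <- runif(n, 0, 5)
Y  <- 0.09 + 0.011 * X1 + 0.010 * X2 + rnorm(n, sd = 0.08)
zsj <- data.frame(Y, X1, X2)

# In practice you would read the file instead:
# zsj <- read.csv("D:/paper/data/zsj.csv")

# Eyeball linearity of Y against each predictor
plot(Y ~ X1, data = zsj); abline(lm(Y ~ X1, data = zsj))
plot(Y ~ X2, data = zsj); abline(lm(Y ~ X2, data = zsj))
```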
Obviously, the relationship between X1 and Y is close to linear, and X2 behaves similarly.

3. Run the regression and inspect the result:

```r
lm.test <- lm(Y ~ X1 + X2, data = zsj)
summary(lm.test)
```

```
Call:
lm(formula = Y ~ X1 + X2, data = zsj)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.21286 -0.05896 -0.01450  0.05556  0.30795 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) 0.0931750  0.0109333   8.522 5.85e-16 ***
X1          0.0109935  0.0003711  29.625  < 2e-16 ***
X2          0.0099941  0.0010459   9.555  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.08109 on 327 degrees of freedom
Multiple R-squared: 0.7953,  Adjusted R-squared: 0.7941
F-statistic: 635.3 on 2 and 327 DF,  p-value: < 2.2e-16
```

All significance tests pass.

4. Remove outliers:

```r
plot(lm.test, which = 1:4)
```
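Once the diagnostic plots flag suspicious rows, the model can be refit on a subset. A minimal sketch on the built-in `mtcars` data; the 4/n cut-off for Cook's distance is a common rule of thumb, not a threshold from the original post:

```r
# Sketch: drop points with large Cook's distance and refit.
# Uses built-in mtcars; the 4/n cut-off is a rule of thumb.
fit  <- lm(mpg ~ wt + hp, data = mtcars)
keep <- cooks.distance(fit) <= 4 / nrow(mtcars)  # TRUE for rows to retain
fit2 <- lm(mpg ~ wt + hp, data = mtcars[keep, ])
summary(fit2)
```

Using a logical `keep` vector (rather than negative indices) avoids the edge case where no rows are flagged.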
The four diagnostic plots, in order:

4.1 Residuals vs. fitted values.
4.2 Normal Q-Q plot of the residuals (if the residuals are a sample from a normal population, the points should fall roughly on a straight line).
4.3 Standardized residuals vs. fitted values (for standardized residuals following a normal distribution, about 95% of the sample points should fall within a fixed band; this is also an intuitive way to spot outliers).
4.4 Cook's distance (the larger the Cook's distance, the more likely the point is an outlier, though a specific threshold is hard to pin down).

The plots show that samples 54, 65, and 295 are outliers and need to be removed.

5. Test for heteroscedasticity.

5.1 `gqtest`. H0: the squared errors are unrelated to the explanatory variables, their squares, and their cross products (homoscedasticity). When the p-value is small, reject H0 and conclude that heteroscedasticity is present:

```r
res.test <- residuals(lm.test)
library(lmtest)
gqtest(lm.test)
```

```
	Goldfeld-Quandt test

data:  lm.test
GQ = 0.9353, df1 = 162, df2 = 162, p-value = 0.6647
```

5.2 `bptest`. H0: homoscedasticity; a small p-value indicates heteroscedasticity:

```r
bptest(lm.test)
```

```
	studentized Breusch-Pagan test

data:  lm.test
BP = 3.0757, df = 2, p-value = 0.2148
```

Both tests show no heteroscedasticity. Nevertheless, for completeness, the correction procedure is demonstrated here.

6. Correct heteroscedasticity with FGLS (feasible generalized least squares):

6.1.1 Regress y on the Xi and obtain the residuals u.
6.1.2 Compute log(u²).
6.1.3 Run an auxiliary regression of log(u²) on the Xi to obtain the fitted function g = b0 + b1·x1 + b2·x2.
6.1.4 Compute the weights 1/h = 1/exp(g) and use them for a WLS estimate:

```r
lm.test2 <- lm(log(resid(lm.test)^2) ~ X1 + X2, data = zsj)
lm.test3 <- lm(Y ~ X1 + X2, weights = 1/exp(fitted(lm.test2)), data = zsj)
summary(lm.test3)
```

The regression result is not reproduced here.

7. Test for multicollinearity.

7.1 Compute the condition number k of the correlation matrix of the explanatory variables: k < 100 means little multicollinearity, 100 < k < 1000 strong, and k > 1000 severe:

```r
XX <- cor(zsj[])
kappa(XX)
[1] 2.223986
```

7.2 Look for combinations of strongly collinear explanatory variables:

```r
eigen(XX)  # used to find combinations of strongly collinear explanatory variables
```

```
$values
[1] 1.3129577 0.6870423

$vectors
          [,1]       [,2]
[1,] 0.7071068 -0.7071068
[2,] 0.7071068  0.7071068
```

8. Correct multicollinearity with stepwise regression:

```r
step(lm.test)
```

```
Start:  AIC=-1655.03
Y ~ X1 + X2

       Df Sum of Sq    RSS     AIC
<none>              2.1504 -1655.0
- X2    1    0.6005 2.7509 -1575.8
- X1    1    5.7714 7.9218 -1226.7

Call:
lm(formula = Y ~ X1 + X2, data = zsj)

Coefficients:
(Intercept)           X1           X2  
   0.093175     0.010994     0.009994  
```

It can be seen that AIC is minimal when neither X1 nor X2 is removed, so the full model is best.

PS2: `step()` accepts a `direction` argument, `direction = c("both", "forward", "backward")`, to choose the direction of stepwise regression. The default is "both"; "forward" only adds variables, and "backward" does the opposite.
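The FGLS steps above (6.1.1-6.1.4) can be sketched end to end on simulated data, since the post's zsj data are not available. All names and coefficients below are illustrative assumptions:

```r
# End-to-end FGLS sketch on simulated heteroscedastic data
# (the post's zsj data are not available; names are illustrative).
set.seed(2)
n  <- 300
x1 <- runif(n)
x2 <- runif(n)
# Error variance grows with x1: deliberate heteroscedasticity
y  <- 1 + 2 * x1 + 3 * x2 + rnorm(n, sd = exp(x1))

ols  <- lm(y ~ x1 + x2)                  # 6.1.1: OLS fit, residuals u
aux  <- lm(log(resid(ols)^2) ~ x1 + x2)  # 6.1.2-6.1.3: auxiliary regression g
fgls <- lm(y ~ x1 + x2,
           weights = 1 / exp(fitted(aux)))  # 6.1.4: WLS with weights 1/exp(g)
coef(fgls)
```

This uses only base R; the post's `gqtest`/`bptest` checks would additionally require the `lmtest` package.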
From: http://blog.sina.com.cn/s/blog_6ee39c3901017fpd.html
Other references:
- Using lm for nonlinear fitting in R
- Baidu Wenku: multiple regression
- R learning: multiple linear regression analysis
- 25 tips for getting started with R (tips 20-24)