R language: Multivariate linear regression and model checking

Source: Internet
Author: User

Multivariate linear regression research using Swiss data sets


# First Look at the scatter plot between the variables
Pairs (Swiss, panel = panel.smooth, main = "Swiss data",
Col = 3 + (Swiss$catholic > 50))




# build multivariate linear regression with all variables
A=LM (fertility ~., data = Swiss)
Summary (a)
##
# # Call:
# LM (formula = Fertility ~., data = Swiss)
##
# # Residuals:
# # Min 1Q Median 3Q Max
# # -15.274-5.262 0.503 4.120 15.321
##
# # coefficients:
# # Estimate STD. Error t value Pr (>|t|)
# # (Intercept) 66.9152 10.7060 6.25 1.9e-07 * * *
# # Agriculture-0.1721 0.0703-2.45 0.0187 *
# # Examination-0.2580 0.2539-1.02 0.3155
# # Education-0.8709 0.1830-4.76 2.4e-05 * * *
# # Catholic 0.1041 0.0353 2.95 0.0052 * *
# # infant.mortality 1.0770 0.3817 2.82 0.0073 * *
## ---
# # signif. codes:0 ' * * * ' 0.001 ' * * ' 0.01 ' * ' 0.05 '. ' 0.1 "1
##
# # Residual standard error:7.17 on degrees of freedom
# # Multiple r-squared:0.707, adjusted r-squared:0.671
# # f-statistic:19.8 on 5 and DF, p-value:5.59e-10
# from the results, the P-value of the education variable is not a single star, indicating that the model is very insignificant.
The Add1 DROP1 function is provided in R to increase or decrease the variables of the linear model.
DROP1 (a)
# deletions
##
# # Model:
# # Fertility ~ Agriculture + examination + Education + Catholic +
# # Infant.mortality
# # Df Sum of Sq RSS AIC
# # <none> 2105 191
# # agriculture 1 308 2413 195
# # examination 1 53 2158 190
# # education 1 1163 3268 209
# # catholic 1 448 2553 198
# # infant.mortality 1 409 2514 197
# from the result, the AIC is the smallest after removing the education variable, so the next step is to reject the variable for modeling.
B=update (a,.~.-education)
Summary (b)
##
# # Call:
# LM (formula = Fertility ~ Agriculture + examination + Catholic +
# # infant.mortality, data = Swiss)
##
# # Residuals:
# # Min 1Q Median 3Q Max
# # -23.919-3.553-0.649 6.596 14.177
##
# # coefficients:
# # Estimate STD. Error t value Pr (>|t|)
# # (Intercept) 59.6027 13.0425 4.57 4.2e-05 * * *
# # Agriculture-0.0476 0.0803-0.59 0.55669
# # Examination-0.9680 0.2528-3.83 0.00042 * * *
# # Catholic 0.0261 0.0384 0.68 0.50055
# # Infant.mortality 1.3960 0.4626 3.02 0.00431 * *
## ---
# # signif. codes:0 ' * * * ' 0.001 ' * * ' 0.01 ' * ' 0.05 '. ' 0.1 "1
##
# # Residual standard error:8.82 on degrees of freedom
# # Multiple r-squared:0.545, adjusted r-squared:0.501
# # f-statistic:12.6 on 4 and DF, p-value:8.27e-07
#从接下来的结果看, there are two variables are not significant, R-Squared is only 0.53, the model effect is very unsatisfactory. Further research is needed.
# Fortunately, R has a step function that can be used to automatically filter the model, based on the AIC minimum principle
B=step (a,direction= "backward")
# # start:aic=190.7
# # Fertility ~ Agriculture + examination + Education + Catholic +
# # Infant.mortality
##
# # Df Sum of Sq RSS AIC
# #-Examination 1 53 2158 190
# # <none> 2105 191
# #-Agriculture 1 308 2413 195
# #-Infant.mortality 1 409 2514 197
# #-Catholic 1 448 2553 198
# #-Education 1 1163 3268 209
##
# # step:aic=189.9
# # Fertility ~ Agriculture + education + Catholic + infant.mortality
##
# # Df Sum of Sq RSS AIC
# # <none> 2158 190
# #-Agriculture 1 264 2422 193
# #-Infant.mortality 1 410 2568 196
# #-Catholic 1 957 3115 205
# #-Education 1 2250 4408 221
Summary (b)
##
# # Call:
# LM (formula = Fertility ~ Agriculture + education + Catholic +
# # infant.mortality, data = Swiss)
##
# # Residuals:
# # Min 1Q Median 3Q Max
# # -14.676-6.052 0.751 3.166 16.142
##
# # coefficients:
# # Estimate STD. Error t value Pr (>|t|)
# # (Intercept) 62.1013 9.6049 6.47 8.5e-08 * * *
# # Agriculture-0.1546 0.0682-2.27 0.0286 *
# # Education-0.9803 0.1481-6.62 5.1e-08 * * *
# # Catholic 0.1247 0.0289 4.31 9.5e-05 * * *
# # infant.mortality 1.0784 0.3819 2.82 0.0072 * *
## ---
# # signif. codes:0 ' * * * ' 0.001 ' * * ' 0.01 ' * ' 0.05 '. ' 0.1 "1
##
# # Residual standard error:7.17 on degrees of freedom
# # Multiple r-squared:0.699, adjusted r-squared:0.671
# # f-statistic:24.4 on 4 and DF, p-value:1.72e-10
Next, we study the regression diagnosis of modeling variables and models.


First, the test of the self-variable is normal


Shapiro.test (Swiss$agriculture)
##
# # Shapiro-wilk Normality test
##
# # Data:swiss$agriculture
# # W = 0.9664, P-value = 0.193
Shapiro.test (swiss$examination)
##
# # Shapiro-wilk Normality test
##
# # Data:swiss$examination
# # W = 0.9696, P-value = 0.2563
Shapiro.test (swiss$education)
##
# # Shapiro-wilk Normality test
##
# # Data:swiss$education
# # W = 0.7482, P-value = 1.312e-07
Shapiro.test (Swiss$catholic)
##
# # Shapiro-wilk Normality test
##
# # Data:swiss$catholic
# # W = 0.7463, P-value = 1.205e-07
Shapiro.test (swiss$infant.mortality)
##
# # Shapiro-wilk Normality test
##
# # Data:swiss$infant.mortality
# # W = 0.9776, P-value = 0.4978
According to the test results of the normality of each variable, the P-value of variable education and Catholic is less than 0.05, so the data of these two variables do not conform to the normal distribution.


Now, the residuals of the model are also tested for normality (the residuals of the regression model also conform to the normal distribution)


B.res<-residuals (b)
Shapiro.test (B.res)
##
# # Shapiro-wilk Normality test
##
# # Data:b.res
# # W = 0.9766, P-value = 0.459
From the results, the P-value is 0.459 and the model residuals conform to the normal distribution


Next, draw the residual plot of the regression values and residuals (which should conform to the uniform distribution, that is, the residuals have the same distribution regardless of the regression value)


Par (MFROW=C)
# Draw a residual plot
Plot (B.res~predict (b))
# Draw a standard residual plot
Plot (Rstandard (b) ~predict (b))




Par (mfrow=c ())
Judging from the residual plot, the effect is less obvious.


In fact, you can draw the residual map directly


Par (Mfrow=c (2,2))
Plot (b)




Par (mfrow=c ())


R language: Multivariate linear regression and model checking

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.