R in Action reading notes (11), Chapter 8: Regression -- Selecting the "Best" Regression Model


8.6 Choosing the "Best" regression model

8.6.1 Comparing models

You can compare the goodness of fit of two nested models with the anova() function in the base installation. A nested model is one whose terms are completely contained in another model. Comparing the two models with the anova() function:

> states <- as.data.frame(state.x77[, c("Murder", "Population", "Illiteracy", "Income", "Frost")])
> fit1 <- lm(Murder ~ Population + Illiteracy + Income + Frost, data=states)
> fit2 <- lm(Murder ~ Population + Illiteracy, data=states)
> anova(fit2, fit1)

Analysis of Variance Table

Model 1: Murder ~ Population + Illiteracy
Model 2: Murder ~ Population + Illiteracy + Income + Frost
  Res.Df    RSS Df Sum of Sq      F Pr(>F)
1     47 289.25
2     45 289.17  2  0.078505 0.0061 0.9939

The test is nonsignificant (p = 0.994), so Income and Frost add no linear prediction beyond Population and Illiteracy and can be dropped from the model.

AIC (the Akaike Information Criterion) can also be used to compare models. It takes into account both a model's statistical fit and the number of parameters used to achieve that fit. Models with smaller AIC values are preferred, indicating an adequate fit with fewer parameters.

> AIC(fit1, fit2)
     df      AIC
fit1  6 241.6429
fit2  4 237.6565

Here fit2, the model without Income and Frost, has the smaller AIC and is therefore preferred.

8.6.2 Variable Selection

1. Stepwise regression

In stepwise regression, the model adds or deletes one variable at a time until a stopping criterion is reached. In forward stepwise regression, predictor variables are added to the model one at a time, stopping when adding further variables no longer improves the model. In backward stepwise regression, you start with all predictor variables in the model and delete them one at a time until removing a variable would degrade model quality. Stepwise stepwise regression (usually just called stepwise regression) combines the two approaches: variables enter the model one at a time, but at each step the variables already in the model are re-evaluated, and any that no longer contribute are removed; a predictor variable may therefore be added and deleted several times before the final model is reached. The stepAIC() function in the MASS package implements stepwise regression (forward, backward, or both directions) based on the exact AIC criterion.

> library(MASS)
> fit1 <- lm(Murder ~ Population + Illiteracy + Income + Frost, data=states)
> stepAIC(fit1, direction="backward")

Start:  AIC=97.75
Murder ~ Population + Illiteracy + Income + Frost

               Df Sum of Sq    RSS     AIC
- Frost         1     0.021 289.19  95.753
- Income        1     0.057 289.22  95.759
<none>                      289.17  97.749
- Population    1    39.238 328.41 102.111
- Illiteracy    1   144.264 433.43 115.986

Step:  AIC=95.75
Murder ~ Population + Illiteracy + Income

               Df Sum of Sq    RSS     AIC
- Income        1     0.057 289.25  93.763
<none>                      289.19  95.753
- Population    1    43.658 332.85 100.783
- Illiteracy    1   236.196 525.38 123.605

Step:  AIC=93.76
Murder ~ Population + Illiteracy

               Df Sum of Sq    RSS     AIC
<none>                      289.25  93.763
- Population    1    48.517 337.76  99.516
- Illiteracy    1   299.646 588.89 127.311

Call:
lm(formula = Murder ~ Population + Illiteracy, data = states)

Coefficients:
(Intercept)   Population   Illiteracy
  1.6515497    0.0002242    4.0807366
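The text above describes forward and combined searches as well, but the listing only runs the backward search. As a rough sketch (not part of the book's listing), the other directions can be requested through the direction and scope arguments of stepAIC():

> # Combined forward/backward search, starting from the full model
> stepAIC(fit1, direction="both")
> # A purely forward search needs an explicit scope: start from an
> # intercept-only model and let the four predictors be candidates for entry
> fit0 <- lm(Murder ~ 1, data=states)
> stepAIC(fit0, direction="forward",
+         scope=list(lower=~1, upper=~Population+Illiteracy+Income+Frost))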

2. All-subsets regression

All-subsets regression can be implemented with the regsubsets() function in the leaps package. You can choose the "best" model by R-squared, adjusted R-squared, the Mallows Cp statistic, or other criteria.

> Library ("Leaps", lib.loc= "D:/programfiles/r/r-3.1.3/library")

>leaps<-regsubsets (murder~population+illiteracy+income+frost,data=states,nbest=4)

> Plot (leaps,scal= "ADJR2")

> Library (CAR)

> Subsets (leaps,statistic= "CP", main= "Cpplot for all subsets regression")

> Abline (1,1,lty=2,col= "Red")
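The plots are one way to pick a model; as a minimal sketch (not in the original listing), the same statistics can also be inspected numerically through summary() on the regsubsets object, which returns components such as adjr2 and cp:

> best <- summary(leaps)
> best$adjr2                            # adjusted R-squared of every candidate model
> best$which[which.max(best$adjr2), ]   # predictors in the model with the largest adjusted R-squared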

8.7 Deeper analysis

8.7.1 Cross-validation

In cross-validation, a portion of the data is selected as the training sample and the remaining observations form the hold-out sample. The regression equation is fit on the training sample and then used to predict the hold-out sample. Because the hold-out observations were not involved in selecting the model parameters, they give a more accurate estimate of how the model will perform on new data. In k-fold cross-validation, the sample is divided into k subsamples; each subsample in turn serves as the hold-out set, with the other k-1 subsamples combined as the training set. This yields k prediction equations, and the predictive performance on the k hold-out sets is recorded and averaged. (When k equals n, the total number of observations, this approach is also known as the jackknife.) The crossval() function in the bootstrap package can be used to implement k-fold cross-validation.

fit <- lm(mpg ~ hp + wt + hp:wt, data=mtcars)

shrinkage <- function(fit, k=10) {
  require(bootstrap)

  # Fitting and prediction functions required by crossval()
  theta.fit <- function(x, y) {lsfit(x, y)}
  theta.predict <- function(fit, x) {cbind(1, x) %*% fit$coef}

  # Extract the predictor columns and the response from the fitted model
  x <- fit$model[, 2:ncol(fit$model)]
  y <- fit$model[, 1]

  results <- crossval(x, y, theta.fit, theta.predict, ngroup=k)
  r2 <- cor(y, fit$fitted.values)^2    # R-squared on the original sample
  r2cv <- cor(y, results$cv.fit)^2     # cross-validated R-squared
  cat("Original R-square =", r2, "\n")
  cat(k, "Fold Cross-Validated R-square =", r2cv, "\n")
  cat("Change =", r2 - r2cv, "\n")
}
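A short usage sketch for the function above (output omitted): calling shrinkage(fit) runs the default 10-fold cross-validation on the fit defined at the top of the listing, and setting k to the number of observations gives the leave-one-out case.

> shrinkage(fit)                   # 10-fold cross-validation (default k=10)
> shrinkage(fit, k=nrow(mtcars))   # k = n, i.e. leave-one-out cross-validation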

