8.6 Choosing the "Best" regression model
8.6.1 Comparing models
You can compare the goodness of fit of two nested models with the anova() function in the base installation. A nested model is one whose terms are completely contained in another model.

Using the anova() function to compare two nested models:
> states <- as.data.frame(state.x77[, c("Murder", "Population", "Illiteracy", "Income", "Frost")])
> fit1 <- lm(Murder ~ Population + Illiteracy + Income + Frost, data=states)
> fit2 <- lm(Murder ~ Population + Illiteracy, data=states)
> anova(fit2, fit1)
Analysis of Variance Table

Model 1: Murder ~ Population + Illiteracy
Model 2: Murder ~ Population + Illiteracy + Income + Frost
  Res.Df    RSS Df Sum of Sq      F Pr(>F)
1     47 289.25
2     45 289.17  2  0.078505 0.0061 0.9939
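The F statistic in this table can be reproduced by hand: the extra sum of squares from adding Income and Frost, divided by its degrees of freedom, is compared against the full model's residual mean square. A quick sketch (RSS values are rounded from the output above, so the match is approximate):

```r
# F = ((RSS_reduced - RSS_full) / extra df) / (RSS_full / residual df of full model)
ss_extra <- 0.078505   # "Sum of Sq" column from the anova() output
rss_full <- 289.167    # Model 2 RSS (shown rounded as 289.17)
f <- (ss_extra / 2) / (rss_full / 45)
p <- pf(f, 2, 45, lower.tail = FALSE)
round(c(F = f, p = p), 4)   # close to the reported F = 0.0061, Pr(>F) = 0.9939
```

Because the p-value is far above 0.05, Income and Frost add nothing over the two-predictor model.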
The AIC (Akaike Information Criterion) can also be used to compare models; it takes into account both a model's statistical fit and the number of parameters needed to achieve that fit. Models with smaller AIC values are preferred: they achieve an adequate fit with fewer parameters.
> AIC(fit1, fit2)
     df      AIC
fit1  6 241.6429
fit2  4 237.6565
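For a linear model, AIC is just minus twice the log-likelihood plus twice the number of estimated parameters (the df column counts the intercept, the slopes, and the residual variance). A small sketch verifying this against AIC() for the two-predictor model:

```r
states <- as.data.frame(state.x77[, c("Murder", "Population",
                                      "Illiteracy", "Income", "Frost")])
fit2 <- lm(Murder ~ Population + Illiteracy, data = states)
# df = 4: intercept, two slopes, and the residual variance
manual_aic <- -2 * as.numeric(logLik(fit2)) + 2 * 4
c(manual = manual_aic, builtin = AIC(fit2))   # both about 237.66
```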
8.6.2 Variable Selection
1. Stepwise regression
In stepwise regression, variables are added to or deleted from the model one at a time until some stopping criterion is reached. In forward stepwise regression, a predictor variable is added at each step until adding further variables no longer improves the model. In backward stepwise regression, you start with a model that contains all the predictor variables and delete one variable at a time until removing a variable would degrade model quality. Forward-backward stepwise regression (often simply called stepwise regression) combines the two approaches: variables are entered one at a time, but at each step the variables already in the model are re-evaluated, and any that no longer contribute are removed; a predictor variable may therefore be added and deleted several times before the final model is reached. The stepAIC() function in the MASS package performs stepwise model selection (forward, backward, or stepwise) using the exact AIC criterion.
> library(MASS)
> fit1 <- lm(Murder ~ Population + Illiteracy + Income + Frost, data=states)
> stepAIC(fit1, direction="backward")
Start:  AIC=97.75
Murder ~ Population + Illiteracy + Income + Frost

             Df Sum of Sq    RSS     AIC
- Frost       1     0.021 289.19  95.753
- Income      1     0.057 289.22  95.759
<none>                    289.17  97.749
- Population  1    39.238 328.41 102.111
- Illiteracy  1   144.264 433.43 115.986

Step:  AIC=95.75
Murder ~ Population + Illiteracy + Income

             Df Sum of Sq    RSS     AIC
- Income      1     0.057 289.25  93.763
<none>                    289.19  95.753
- Population  1    43.658 332.85 100.783
- Illiteracy  1   236.196 525.38 123.605

Step:  AIC=93.76
Murder ~ Population + Illiteracy

             Df Sum of Sq    RSS     AIC
<none>                    289.25  93.763
- Population  1    48.517 337.76  99.516
- Illiteracy  1   299.646 588.89 127.311

Call:
lm(formula = Murder ~ Population + Illiteracy, data = states)

Coefficients:
(Intercept)   Population   Illiteracy
  1.6515497    0.0002242    4.0807366
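The same function handles the combined forward-backward approach. A minimal sketch (direction="both" starts from the supplied model and considers both additions and deletions at every step; trace=FALSE suppresses the step-by-step log):

```r
library(MASS)
states <- as.data.frame(state.x77[, c("Murder", "Population",
                                      "Illiteracy", "Income", "Frost")])
fit1 <- lm(Murder ~ Population + Illiteracy + Income + Frost, data = states)
# stepwise (both directions) selection on the exact AIC criterion
best <- stepAIC(fit1, direction = "both", trace = FALSE)
formula(best)   # for these data, the same two-predictor model is selected
```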
2. All subsets regression
All subsets regression can be performed with the regsubsets() function in the leaps package. You can choose R-squared, adjusted R-squared, or the Mallows Cp statistic as the criterion for selecting the "best" model.
> library("leaps", lib.loc="D:/programfiles/r/r-3.1.3/library")
> leaps <- regsubsets(Murder ~ Population + Illiteracy + Income + Frost, data=states, nbest=4)
> plot(leaps, scale="adjr2")
> library(car)
> subsets(leaps, statistic="cp", main="Cp Plot for All Subsets Regression")
> abline(1, 1, lty=2, col="red")
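Instead of reading the criteria off the plots, you can extract them from the regsubsets object directly. A sketch, rebuilding the same fit and then picking the subset with the highest adjusted R-squared:

```r
library(leaps)
states <- as.data.frame(state.x77[, c("Murder", "Population",
                                      "Illiteracy", "Income", "Frost")])
leaps <- regsubsets(Murder ~ Population + Illiteracy + Income + Frost,
                    data = states, nbest = 4)
sub <- summary(leaps)
# sub$adjr2 and sub$cp hold one value per candidate subset
best <- which.max(sub$adjr2)
sub$which[best, ]   # logical row: which predictors the best subset includes
sub$adjr2[best]
```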
8.7 Taking the analysis further
8.7.1 Cross-validation
In cross-validation, a portion of the data is selected as the training sample and the remainder as the hold-out sample. The regression equation is fitted on the training sample and then used to predict the hold-out sample. Because the hold-out sample was not involved in selecting the model parameters, its performance gives a more accurate estimate of how the model will perform on new data. In k-fold cross-validation, the sample is divided into k subsamples; in turn, each combination of k-1 subsamples serves as the training set, with the remaining subsample as the hold-out set. This yields k prediction equations; the predictive performance on the k hold-out samples is recorded and averaged. (When k equals n, the total number of observations, this approach is known as jackknifing.) The crossval() function in the bootstrap package can perform k-fold cross-validation.
fit <- lm(mpg ~ hp + wt + hp:wt, data=mtcars)
shrinkage <- function(fit, k=10) {
  require(bootstrap)
  theta.fit <- function(x, y) { lsfit(x, y) }
  theta.predict <- function(fit, x) { cbind(1, x) %*% fit$coef }
  x <- fit$model[, 2:ncol(fit$model)]
  y <- fit$model[, 1]
  results <- crossval(x, y, theta.fit, theta.predict, ngroup=k)
  r2 <- cor(y, fit$fitted.values)^2
  r2cv <- cor(y, results$cv.fit)^2
  cat("Original R-square =", r2, "\n")
  cat(k, "Fold Cross-Validated R-square =", r2cv, "\n")
  cat("Change =", r2 - r2cv, "\n")
}
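Once defined, the function is called on a fitted model object. A usage sketch, assuming the shrinkage() function above is already in the workspace (the cross-validated R-square varies with the random fold assignment, so a seed is set for reproducibility):

```r
# install.packages("bootstrap")  # needed once, for crossval()
fit <- lm(mpg ~ hp + wt + hp:wt, data = mtcars)
set.seed(1234)         # make the random fold assignment reproducible
shrinkage(fit)         # prints original and 10-fold cross-validated R-square
shrinkage(fit, k = 5)  # fewer folds: larger hold-out sets, more shrinkage variance
```

A large drop from the original to the cross-validated R-square suggests the model will not generalize well to new data.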
R in Action reading notes (11) - Chapter 8: Regression - Selecting the "Best" regression model