ISLR 5.3 Lab: Cross-Validation and the Bootstrap


5.3.1 The Validation Set Approach

The sample() function splits the set of observations into two halves by selecting a random subset of 196 observations out of the original 392. We refer to these observations as the training set.

> library(ISLR)
> set.seed(1)
> train = sample(392, 196)

We then use the subset option in lm() to fit a linear regression using only the observations corresponding to the training set.

> lm.fit = lm(mpg ~ horsepower, data = Auto, subset = train)

We use the predict() function to estimate the response for all 392 observations, and the mean() function to calculate the MSE of the 196 observations in the validation set.

> attach(Auto)
> mean((mpg - predict(lm.fit, Auto))[-train]^2)
[1] 26.14142

We can use the poly() function to estimate the test error for the quadratic and cubic regressions.

> lm.fit2 = lm(mpg ~ poly(horsepower, 2), data = Auto, subset = train)
> mean((mpg - predict(lm.fit2, Auto))[-train]^2)
[1] 19.82259
> lm.fit3 = lm(mpg ~ poly(horsepower, 3), data = Auto, subset = train)
> mean((mpg - predict(lm.fit3, Auto))[-train]^2)
[1] 19.78252

5.3.2 Leave-One-Out Cross-Validation

In this section, we perform linear regression using the glm() function rather than the lm() function, because glm() can be used together with cv.glm(). The cv.glm() function is part of the boot library.

> library(boot)
> glm.fit = glm(mpg ~ horsepower, data = Auto)
> cv.err = cv.glm(Auto, glm.fit)
> cv.err$delta
[1] 24.23151 24.23114

The two numbers in delta are the raw cross-validation estimate and a bias-corrected version; for LOOCV they are essentially identical. Our cross-validation estimate for the test error is approximately 24.23.

To automate the process, we use a for loop that iteratively fits polynomial regressions of order i = 1 to i = 5, computes the associated cross-validation error, and stores it in the ith element of the vector cv.error. We begin by initializing the vector.

> cv.error = rep(0, 5)
> for (i in 1:5) {
+   glm.fit = glm(mpg ~ poly(horsepower, i), data = Auto)
+   cv.error[i] = cv.glm(Auto, glm.fit)$delta[1]
+ }
> cv.error
[1] 24.23151 19.24821 19.33498 19.42443 19.03321

The trend in cv.error shows a sharp drop in estimated test MSE between the linear and quadratic fits, with little further improvement from higher-order polynomials; this illustrates how cross-validation is used for model selection.
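Once the loop above has been run, the cv.error vector can drive the selection step directly; a minimal sketch (which.min() is standard R, though the chosen degree depends on the data and seed):

```r
# Select the polynomial degree with the smallest LOOCV error
best.degree <- which.min(cv.error)
best.degree            # index = degree of the chosen model
cv.error[best.degree]  # its estimated test MSE
```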

5.3.3 k-Fold Cross-Validation

The cv.glm() function can also be used to implement k-fold CV. Below we use k = 10, a common choice, on the Auto data set.

> cv.error.10 = rep(0, 10)
> for (i in 1:10) {
+   glm.fit = glm(mpg ~ poly(horsepower, i), data = Auto)
+   cv.error.10[i] = cv.glm(Auto, glm.fit, K = 10)$delta[1]
+ }

5.3.4 The Bootstrap

Estimating the Accuracy of a Statistic of Interest

We first create a function, alpha.fn(), which takes as input the (X, Y) data as well as a vector indicating which observations should be used to estimate α. The function then outputs the estimate for α based on the selected observations.

The following command tells R to estimate α using all 100 observations in the Portfolio data set.

> alpha.fn = function(data, index) {
+   x = data$x[index]
+   y = data$y[index]
+   return((var(y) - cov(x, y)) / (var(x) + var(y) - 2 * cov(x, y)))
+ }
> alpha.fn(Portfolio, 1:100)
[1] 0.5758321
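For reference, alpha.fn() implements the minimum-variance allocation from Section 5.2 of ISLR: the fraction α of funds invested in X that minimizes Var(αX + (1 − α)Y) is

```latex
\hat{\alpha} = \frac{\hat{\sigma}_Y^{2} - \hat{\sigma}_{XY}}
                    {\hat{\sigma}_X^{2} + \hat{\sigma}_Y^{2} - 2\hat{\sigma}_{XY}}
```

with the population variances and covariance replaced by the sample estimates var(x), var(y), and cov(x, y), exactly as in the function body.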

The next command uses the sample() function to randomly select 100 observations from the range 1 to 100, with replacement. This is equivalent to constructing a new bootstrap data set and recomputing α̂ based on the new data set.

> alpha.fn(Portfolio, sample(100, 100, replace = T))

We can implement a bootstrap analysis by performing this command many times, recording all of the corresponding estimates for α, and computing the resulting standard deviation. However, the boot() function automates this approach. Below we produce R = 1,000 bootstrap estimates for α.

> boot(Portfolio, alpha.fn, R = 1000)
Bootstrap Statistics :
     original        bias    std. error
t1* 0.5758321 -7.315422e-05  0.08861826

The final output shows that, using the original data, α̂ = 0.5758, and that the bootstrap estimate for SE(α̂) is 0.0886.

Estimating the Accuracy of a Linear Regression Model

We first create a simple function, boot.fn(), which takes in the Auto data set as well as a set of indices for the observations, and returns the intercept and slope estimates for the linear regression model. We then apply this function to the full set of 392 observations in order to compute the estimates of β0 and β1 on the entire data set, using the usual linear regression coefficient estimate formulas from Chapter 3.

> boot.fn = function(data, index)
+   return(coef(lm(mpg ~ horsepower, data = data, subset = index)))
> boot.fn(Auto, 1:392)
(Intercept)  horsepower
 39.9358610  -0.1578447

Next, we use the boot() function to compute the standard errors of 1,000 bootstrap estimates for the intercept and slope terms.

> boot(Auto, boot.fn, 1000)
Bootstrap Statistics :
       original        bias    std. error
t1* 39.9358610  0.0269563085 0.859851825
t2* -0.1578447 -0.0002906457 0.007402954
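These bootstrap standard errors can be compared with the formula-based standard errors that summary() reports for the same fit; a quick sketch (summary()$coef is standard R, and the two sets of estimates need not agree when the model's assumptions, such as linearity, are in doubt):

```r
# Formula-based standard errors (Chapter 3) for the same linear fit;
# compare the "Std. Error" column with the bootstrap estimates above
summary(lm(mpg ~ horsepower, data = Auto))$coef
```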

Below we compute the bootstrap standard error estimates and the standard linear regression estimates that result from fitting the quadratic model to the data.

> boot.fn = function(data, index)
+   coefficients(lm(mpg ~ horsepower + I(horsepower^2), data = data, subset = index))
> set.seed(1)
> boot(Auto, boot.fn, 1000)

ORDINARY NONPARAMETRIC BOOTSTRAP

Call:
boot(data = Auto, statistic = boot.fn, R = 1000)

Bootstrap Statistics :
        original        bias     std. error
t1* 56.900099702  6.098115e-03 2.0944855842
t2* -0.466189630 -1.777108e-04 0.0334123802
t3*  0.001230536  1.324315e-06 0.0001208339

