ISLR Chapter 3: Linear Regression Applied Exercises, Answers (Part 2)


Tags: ISLR; R; machine learning; linear regression

For some technical terms I only know the English, so my terminology may not be standard; please go easy on me.

12. Simple linear regression with no intercept
(a) Observing Equation (3.38), we can see that the two regressions estimate the same coefficient when the sum of the squared x values equals the sum of the squared y values (see the formula below).
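For reference, Equation (3.38) gives the least-squares estimate for regression through the origin:

\hat{\beta} = \frac{\sum_{i=1}^{n} x_i y_i}{\sum_{i'=1}^{n} x_{i'}^{2}}

Swapping the roles of x and y leaves the numerator unchanged but replaces the denominator with \sum_{i'=1}^{n} y_{i'}^{2}, so the two regressions yield the same estimate exactly when \sum x_i^2 = \sum y_i^2.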
(b)

set.seed(1)
x=rnorm(100)
y=2*x
lm.fit=lm(y~x+0)
lm.fit2=lm(x~y+0)
summary(lm.fit)

Output Result:

Call:
lm(formula = y ~ x + 0)

Residuals:
       Min         1Q     Median         3Q        Max 
-3.776e-16 -3.378e-17  2.680e-18  6.113e-17  5.105e-16 

Coefficients:
   Estimate Std. Error   t value Pr(>|t|)    
x 2.000e+00  1.296e-17 1.543e+17   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.167e-16 on 99 degrees of freedom
Multiple R-squared:      1,     Adjusted R-squared:      1 
F-statistic: 2.382e+34 on 1 and 99 DF,  p-value: < 2.2e-16

The second regression:

summary(lm.fit2)

Output Result:

Call:
lm(formula = x ~ y + 0)

Residuals:
       Min         1Q     Median         3Q        Max 
-1.888e-16 -1.689e-17  1.339e-18  3.057e-17  2.552e-16 

Coefficients:
  Estimate Std. Error   t value Pr(>|t|)    
y 5.00e-01   3.24e-18 1.543e+17   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 5.833e-17 on 99 degrees of freedom
Multiple R-squared:      1,     Adjusted R-squared:      1 
F-statistic: 2.382e+34 on 1 and 99 DF,  p-value: < 2.2e-16

The two regressions estimate different coefficients (2 versus 0.5), although the t-statistic is the same in both.
(c)
The sample() function draws a random sample from a specified collection of objects: sample(x, size) draws size elements from the vector x. For example, sample(1:10, 4) might return 3, 4, 5, 7; running it again might return 3, 9, 8, 5. Because the sampling is without replacement by default, no number appears twice within a single draw.
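A quick illustration of the call pattern (the seed is my own choice, not from the original):

> set.seed(2)                      # any seed; for reproducibility only
> sample(1:10, 4)                  # 4 draws from 1:10 without replacement (the default)
> sample(1:10, 4, replace=TRUE)    # with replacement, duplicates become possible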

> set.seed(1)
> x=rnorm(100)
> y=sample(x,100)
> sum(x^2)
[1] 81.05509
> sum(y^2)
[1] 81.05509
> lm.fit=lm(y~x+0)
> lm.fit2=lm(x~y+0)
> summary(lm.fit)

Output Result:

Call:
lm(formula = y ~ x + 0)

Residuals:
    Min      1Q  Median      3Q     Max 
-2.2315 -0.5124  0.1027  0.6877  2.3926 

Coefficients:
  Estimate Std. Error t value Pr(>|t|)
x  0.02148    0.10048   0.214    0.831

Residual standard error: 0.9046 on 99 degrees of freedom
Multiple R-squared:  0.0004614, Adjusted R-squared:  -0.009635 
F-statistic: 0.0457 on 1 and 99 DF,  p-value: 0.8312

The second regression:

Call:
lm(formula = x ~ y + 0)

Residuals:
    Min      1Q  Median      3Q     Max 
-2.2400 -0.5154  0.1213  0.6788  2.3959 

Coefficients:
  Estimate Std. Error t value Pr(>|t|)
y  0.02148    0.10048   0.214    0.831

Residual standard error: 0.9046 on 99 degrees of freedom
Multiple R-squared:  0.0004614, Adjusted R-squared:  -0.009635 
F-statistic: 0.0457 on 1 and 99 DF,  p-value: 0.8312

As expected, when the sum of x² equals the sum of y², the two no-intercept regressions estimate the same coefficient.
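As a quick check (my addition), the two fitted slopes can be compared directly:

> coef(lm.fit)    # slope from regressing y on x
> coef(lm.fit2)   # slope from regressing x on y

Both return 0.02148, matching the summaries above.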

13.
(a)

> set.seed(1)
> x=rnorm(100)

(b)

> eps=rnorm(100,0,sqrt(0.25))
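Note that rnorm() takes a standard deviation, not a variance, as its third argument, which is why sqrt(0.25) is used to obtain a variance of 0.25.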

(c)

> y=-1+0.5*x+eps

The vector y has length 100; β0 = -1 and β1 = 0.5.
(d)

> plot(x,y)


The scatterplot shows a roughly linear relationship between x and y with a positive slope.
(e)

> lm.fit=lm(y~x)
> summary(lm.fit)

Output result:

Call:
lm(formula = y ~ x)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.93842 -0.30688 -0.06975  0.26970  1.17309 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) -1.01885    0.04849 -21.010  < 2e-16 ***
x            0.49947    0.05386   9.273 4.58e-15 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.4814 on 98 degrees of freedom
Multiple R-squared:  0.4674,    Adjusted R-squared:  0.4619 
F-statistic: 85.99 on 1 and 98 DF,  p-value: 4.583e-15

The estimates β0 = -1.01885 and β1 = 0.49947 are close to the true values β0 = -1 and β1 = 0.5, and p-values near 0 indicate a statistically significant relationship.
(f)

> plot(x,y)
> abline(lm.fit,lwd=3,col="red")
> abline(-1,0.5,lwd=3,col="green")
> legend(-1,legend=c("model fit", "pop regression"),col=2:3,lwd=3)


(g)

> lm.fit2=lm(y~x+I(x^2))
> summary(lm.fit2)

Output Result:

Call:
lm(formula = y ~ x + I(x^2))

Residuals:
     Min       1Q   Median       3Q      Max 
-0.98252 -0.31270 -0.06441  0.29014  1.13500 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) -0.97164    0.05883 -16.517  < 2e-16 ***
x            0.50858    0.05399   9.420  2.4e-15 ***
I(x^2)      -0.05946    0.04238  -1.403    0.164    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.479 on 97 degrees of freedom
Multiple R-squared:  0.4779,    Adjusted R-squared:  0.4672 
F-statistic:  44.4 on 2 and 97 DF,  p-value: 2.038e-14

R² increases only slightly and the RSE barely changes; the p-value of 0.164 on the x² term shows no statistically significant quadratic relationship between y and x².
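An equivalent way to test the quadratic term (my addition, not part of the original answer) is a nested-model F-test; with a single added term its p-value matches the t-test above:

> anova(lm.fit, lm.fit2)    # compares y ~ x against y ~ x + I(x^2)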
(h)

> set.seed(1)
> esp1=rnorm(100,0,sqrt(0.125))
> y1=-1+0.5*x + esp1
> plot(x,y1)
> lm.fit1=lm(y1~x)
> summary(lm.fit1)

Output Result:

Call:
lm(formula = y1 ~ x)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.66356 -0.21700 -0.04932  0.19071  0.82950 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) -1.01333    0.03429  -29.55   <2e-16 ***
x            0.49963    0.03809   13.12   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.3404 on 98 degrees of freedom
Multiple R-squared:  0.6371,    Adjusted R-squared:  0.6334 
F-statistic: 172.1 on 1 and 98 DF,  p-value: < 2.2e-16

Plot:

> abline(lm.fit1,lwd=3,col=2)
> abline(-1,0.5,lwd=3,col=3)
> legend(-1,legend=c("model fit","pop. regression"),col=2:3,lwd=3)


The RSE decreases with the smaller noise variance.
(i)

> esp2=rnorm(100,0,sqrt(0.5))
> y2=-1+0.5*x + esp2
> plot(x,y2)
> lm.fit2=lm(y2~x)
> summary(lm.fit2)

Output Result:

Call:
lm(formula = y2 ~ x)

Residuals:
     Min       1Q   Median       3Q      Max 
-2.06059 -0.34104 -0.03205  0.45908  1.86787 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) -0.98065    0.07404 -13.245  < 2e-16 ***
x            0.51497    0.08224   6.262 1.01e-08 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.7349 on 98 degrees of freedom
Multiple R-squared:  0.2858,    Adjusted R-squared:  0.2785 
F-statistic: 39.21 on 1 and 98 DF,  p-value: 1.01e-08

Plot:

> abline(lm.fit2,lwd=3,col=2)
> abline(-1,0.5,lwd=3,col=3)
> legend(-1,legend=c("model fit","pop. regression"),col=2:3,lwd=3)


The RSE increases with the larger noise variance.
(j)

> confint(lm.fit)
                 2.5 %     97.5 %
(Intercept) -1.1150804 -0.9226122
x            0.3925794  0.6063602
> confint(lm.fit1)
                 2.5 %     97.5 %
(Intercept) -1.0813741 -0.9452786
x            0.4240422  0.5752080
> confint(lm.fit2)
                 2.5 %     97.5 %
(Intercept) -1.1275711 -0.8337236
x            0.3517741  0.6781604

The noisier the data, the wider the confidence intervals.
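The widening can be read off directly as interval widths (my addition):

> diff(confint(lm.fit)["x",])    # moderate noise (variance 0.25)
> diff(confint(lm.fit1)["x",])   # less noise (variance 0.125): narrowest
> diff(confint(lm.fit2)["x",])   # more noise (variance 0.5): widest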

14.
(a)


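Exercise 14(a) in ISLR specifies the data-generating code for this part:

> set.seed(1)
> x1=runif(100)
> x2=0.5*x1+rnorm(100)/10
> y=2+2*x1+0.3*x2+rnorm(100)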
The model is y = 2 + 2·x1 + 0.3·x2 + ε, so β0 = 2, β1 = 2, β2 = 0.3.
(b)

> cor(x1,x2)
[1] 0.8351212
> plot(x1,x2)


(c)

> lm.fit=lm(y~x1+x2)
> summary(lm.fit)

Call:
lm(formula = y ~ x1 + x2)

Residuals:
    Min      1Q  Median      3Q     Max 
-2.8311 -0.7273 -0.0537  0.6338  2.3359 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   2.1305     0.2319   9.188 7.61e-15 ***
x1            1.4396     0.7212   1.996   0.0487 *  
x2            1.0097     1.1337   0.891   0.3754    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.056 on 97 degrees of freedom
Multiple R-squared:  0.2088,    Adjusted R-squared:  0.1925 
F-statistic:  12.8 on 2 and 97 DF,  p-value: 1.164e-05

The estimates are β0 = 2.1305, β1 = 1.4396, β2 = 1.0097, against the true values β0 = 2, β1 = 2, β2 = 0.3. The p-value for x1 (0.0487) is just below 0.05, so H0: β1 = 0 can barely be rejected; the p-value for x2 (0.3754) is large, so we cannot reject the hypothesis H0: β2 = 0.
(d)

> lm.fit1=lm(y~x1)
> summary(lm.fit1)

Call:
lm(formula = y ~ x1)

Residuals:
     Min       1Q   Median       3Q      Max 
-2.89495 -0.66874 -0.07785  0.59221  2.45560 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   2.1124     0.2307   9.155 8.27e-15 ***
x1            1.9759     0.3963   4.986 2.66e-06 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.055 on 98 degrees of freedom
Multiple R-squared:  0.2024,    Adjusted R-squared:  0.1942 
F-statistic: 24.86 on 1 and 98 DF,  p-value: 2.661e-06

Because the p-value is close to 0, we can reject the hypothesis H0: β1 = 0.
(e)

> lm.fit2=lm(y~x2)
> summary(lm.fit2)

Call:
lm(formula = y ~ x2)

Residuals:
     Min       1Q   Median       3Q      Max 
-2.62687 -0.75156 -0.03598  0.72383  2.44890 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   2.3899     0.1949   12.26  < 2e-16 ***
x2            2.8996     0.6330    4.58 1.37e-05 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.072 on 98 degrees of freedom
Multiple R-squared:  0.1763,    Adjusted R-squared:  0.1679 
F-statistic: 20.98 on 1 and 98 DF,  p-value: 1.366e-05

Because the p-value is close to 0, we can again reject the hypothesis that the coefficient on x2 is zero.
(f)
No, the results do not contradict each other: because x1 and x2 are highly collinear, their individual effects are hard to separate when both appear in the same regression, but each effect is clear when the variables are used separately.
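One way to quantify the collinearity (my addition; assumes the car package is installed) is the variance inflation factor:

> library(car)
> vif(lm(y~x1+x2))    # large VIFs for x1 and x2 confirm strong collinearity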
(g)

> x1=c(x1,0.1)
> x2=c(x2,0.8)
> y=c(y,6)
> lm.fit1=lm(y~x1+x2)
> summary(lm.fit1)

Call:
lm(formula = y ~ x1 + x2)

Residuals:
     Min       1Q   Median       3Q      Max 
-2.73348 -0.69318 -0.05263  0.66385  2.30619 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   2.2267     0.2314   9.624 7.91e-16 ***
x1            0.5394     0.5922   0.911  0.36458    
x2            2.5146     0.8977   2.801  0.00614 ** 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.075 on 98 degrees of freedom
Multiple R-squared:  0.2188,    Adjusted R-squared:  0.2029 
F-statistic: 13.72 on 2 and 98 DF,  p-value: 5.564e-06

> lm.fit2=lm(y~x1)
> summary(lm.fit2)

Call:
lm(formula = y ~ x1)

Residuals:
    Min      1Q  Median      3Q     Max 
-2.8897 -0.6556 -0.0909  0.5682  3.5665 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   2.2569     0.2390   9.445 1.78e-15 ***
x1            1.7657     0.4124   4.282 4.29e-05 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.111 on 99 degrees of freedom
Multiple R-squared:  0.1562,    Adjusted R-squared:  0.1477 
F-statistic: 18.33 on 1 and 99 DF,  p-value: 4.295e-05

> lm.fit3=lm(y~x2)
> summary(lm.fit3)

Call:
lm(formula = y ~ x2)

Residuals:
     Min       1Q   Median       3Q      Max 
-2.64729 -0.71021 -0.06899  0.72699  2.38074 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   2.3451     0.1912  12.264  < 2e-16 ***
x2            3.1190     0.6040   5.164 1.25e-06 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.074 on 99 degrees of freedom
Multiple R-squared:  0.2122,    Adjusted R-squared:  0.2042 
F-statistic: 26.66 on 1 and 99 DF,  p-value: 1.253e-06

After adding this (mismeasured) observation, the hypothesis H0: β1 = 0 can no longer be rejected in the first model (y ~ x1 + x2), while the coefficient on x2 becomes significant.

> par(mfrow=c(2,2))
> plot(lm.fit1)

> par(mfrow=c(2,2))
> plot(lm.fit2)

> par(mfrow=c(2,2))
> plot(lm.fit3)


In the first and third regression models, the newly added point is a high-leverage point.
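Leverage can also be checked numerically (my addition):

> hatvalues(lm.fit1)[101]    # leverage of the added point; average leverage is (p+1)/n = 3/101
> hatvalues(lm.fit3)[101]    # the same check for the third model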

> plot(predict(lm.fit1), rstudent(lm.fit1))
> plot(predict(lm.fit2), rstudent(lm.fit2))
> plot(predict(lm.fit3), rstudent(lm.fit3))


Only in the second regression model does the added point have a studentized residual greater than 3, which marks it as an outlier.
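The outlier status can be confirmed numerically (my addition):

> sapply(list(lm.fit1, lm.fit2, lm.fit3), function(m) rstudent(m)[101])    # only the second exceeds 3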
