An Introduction to Statistical Learning with Applications in R: Chapter 4 Exercises


Chapter 4 exercises; answers are not given for some problems.

 

1.

This one is fairly simple; anyone with high-school algebra should have no trouble with the derivation.
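For the record, the algebra amounts to rearranging the logistic form (4.2) into the odds form (4.3):

p(X) = exp(b0 + b1*X) / (1 + exp(b0 + b1*X))
so 1 - p(X) = 1 / (1 + exp(b0 + b1*X)),
and dividing the two gives p(X) / (1 - p(X)) = exp(b0 + b1*X).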

 

Problems 2-3 are proofs; omitted.

 

4.

(a)

This question puzzled me a bit, since the answer can be written down directly: isn't it simply 10%?

(b)

The answer is (0.1*0.1)/(1*1), so 1% of the observations are available.

(c)

It is again just the fraction of the space that the neighborhood occupies, so here it is 0.1**100 of the observations, which as a percentage is (0.1**100)*100 = 0.1**98 %.

(d)

The answer is evident from (a)-(c): the fraction of nearby training observations decays exponentially as p grows.

(e)

The side length is 0.1**(1/p): 0.1**(1), 0.1**(1/2), 0.1**(1/3), ..., 0.1**(1/100).
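A quick numerical check of (a)-(c) and (e) (a minimal sketch; p is just a vector of dimension counts):

p = c(1, 2, 100)
0.1^p        # fraction of observations available: 0.1, 0.01, 1e-100
p = c(1, 2, 3, 100)
0.1^(1/p)    # hypercube side lengths: 0.100 0.316 0.464 0.977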

 

5.

The bias-variance trade-off relevant to this problem is explained very clearly on page 104 of the Chinese edition.

(a)

When the Bayes decision boundary is linear, QDA of course does better on the training set, since the more flexible model fits it more closely. On the test set LDA does better, because it is closer to the true boundary.

(b)

When the Bayes decision boundary is non-linear, we expect QDA to beat LDA on both the training set and the test set.

(c)

Relative to LDA, the test prediction accuracy of QDA improves. As the sample size n grows, a model with more degrees of freedom can perform better, because its higher variance is partly offset by the larger sample.

(d)

False. When the sample is small, QDA will overfit, so its test error can be higher even though the true boundary is linear.

 

6.

(a)

Plugging into the formula directly gives p(X) = 37.75%.

(b)

Plugging into the same formula and solving backwards for X1 gives 50 hours.
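Both parts in R (a minimal sketch; b0, b1, b2 are the coefficient estimates given in the exercise):

b0 = -6; b1 = 0.05; b2 = 1
# (a) 40 hours studied, GPA of 3.5
exp(b0 + b1*40 + b2*3.5) / (1 + exp(b0 + b1*40 + b2*3.5))  # 0.3775
# (b) p(X) = 0.5 exactly when the logit is 0, so solve b0 + b1*X1 + b2*3.5 = 0
(-b0 - b2*3.5) / b1  # 50 hours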

 

7.

This is just Bayes' theorem plus formula (4.12) on page 97 of the Chinese edition. It is a bit tedious; the final answer is 75.2%.
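The same calculation in R (a sketch; the prior 0.8, the means 10 and 0, and the variance 36 are taken from the exercise; the normalizing constants in dnorm() cancel in the ratio):

x = 4
f.yes = dnorm(x, mean = 10, sd = 6)
f.no = dnorm(x, mean = 0, sd = 6)
0.8 * f.yes / (0.8 * f.yes + 0.2 * f.no)  # 0.752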

 

8.

A word problem. With K = 1, KNN has a 0% error rate on the training set, so for its average error over the training and test sets to be 18%, the test error rate must be 2 × 18% − 0% = 36%. That is worse than logistic regression's 30% test error, so of course we choose logistic regression.

 

9.

See formula (4.3) on page 92 of the Chinese edition; it is just plugging into the formula. Part (a) gives 27%, part (b) gives odds of 0.19.
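Both conversions are one-liners in R (a minimal sketch of odds = p/(1 − p)):

0.37 / (1 + 0.37)  # (a) p = odds/(1 + odds) = 0.27
0.16 / (1 - 0.16)  # (b) odds = p/(1 - p) = 0.19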

 

10.

(a)

Whenever a problem asks for numerical and graphical descriptive summaries, three commands basically cover it: summary(), pairs(), and cor(). Note that pairs() runs genuinely slowly when there are many features, and the qualitative variables must be removed before calling cor().

library(ISLR)
summary(Weekly)
pairs(Weekly)
cor(Weekly[, -9])  # drop the qualitative Direction variable (column 9)

(b)

attach(Weekly)
glm.fit = glm(Direction ~ Lag1 + Lag2 + Lag3 + Lag4 + Lag5 + Volume,
    data = Weekly, family = binomial)
summary(glm.fit)

(c)

glm.probs = predict(glm.fit, type = "response")
glm.pred = rep("Down", length(glm.probs))
glm.pred[glm.probs > 0.5] = "Up"
table(glm.pred, Direction)
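The overall fraction of correct predictions can be read off the confusion matrix, or computed directly from the same predictions:

mean(glm.pred == Direction)  # overall fraction of weeks classified correctly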

(d)

train = (Year < 2009)
Weekly.0910 = Weekly[!train, ]
glm.fit = glm(Direction ~ Lag2, data = Weekly, family = binomial, subset = train)
glm.probs = predict(glm.fit, Weekly.0910, type = "response")
glm.pred = rep("Down", length(glm.probs))
glm.pred[glm.probs > 0.5] = "Up"
Direction.0910 = Direction[!train]
table(glm.pred, Direction.0910)
mean(glm.pred == Direction.0910)

(e)

library(MASS)
lda.fit = lda(Direction ~ Lag2, data = Weekly, subset = train)
lda.pred = predict(lda.fit, Weekly.0910)
table(lda.pred$class, Direction.0910)
mean(lda.pred$class == Direction.0910)

(f)

qda.fit = qda(Direction ~ Lag2, data = Weekly, subset = train)
qda.class = predict(qda.fit, Weekly.0910)$class
table(qda.class, Direction.0910)
mean(qda.class == Direction.0910)

(g)

library(class)
train.X = as.matrix(Lag2[train])
test.X = as.matrix(Lag2[!train])
train.Direction = Direction[train]
set.seed(1)
knn.pred = knn(train.X, test.X, train.Direction, k = 1)
table(knn.pred, Direction.0910)
mean(knn.pred == Direction.0910)

(h)

Logistic regression (d) and LDA (e) give the same test accuracy, the best of the four methods.

(i)

# Logistic regression with Lag2:Lag1
glm.fit = glm(Direction ~ Lag2:Lag1, data = Weekly, family = binomial, subset = train)
glm.probs = predict(glm.fit, Weekly.0910, type = "response")
glm.pred = rep("Down", length(glm.probs))
glm.pred[glm.probs > 0.5] = "Up"
Direction.0910 = Direction[!train]
table(glm.pred, Direction.0910)
mean(glm.pred == Direction.0910)
## [1] 0.5865

# LDA with Lag2 interaction with Lag1
lda.fit = lda(Direction ~ Lag2:Lag1, data = Weekly, subset = train)
lda.pred = predict(lda.fit, Weekly.0910)
mean(lda.pred$class == Direction.0910)
## [1] 0.5769

# QDA with sqrt(abs(Lag2))
qda.fit = qda(Direction ~ Lag2 + sqrt(abs(Lag2)), data = Weekly, subset = train)
qda.class = predict(qda.fit, Weekly.0910)$class
table(qda.class, Direction.0910)
mean(qda.class == Direction.0910)
## [1] 0.5769

# KNN k = 10
knn.pred = knn(train.X, test.X, train.Direction, k = 10)
table(knn.pred, Direction.0910)
mean(knn.pred == Direction.0910)
## [1] 0.5769

# KNN k = 100
knn.pred = knn(train.X, test.X, train.Direction, k = 100)
table(knn.pred, Direction.0910)
mean(knn.pred == Direction.0910)
## [1] 0.5577

The accuracies appear in the ## output lines above; logistic regression performs best.

 

11.

(a)

library(ISLR)
summary(Auto)
attach(Auto)
mpg01 = rep(0, length(mpg))
mpg01[mpg > median(mpg)] = 1  # 1 if mpg is above the median, 0 otherwise
Auto = data.frame(Auto, mpg01)

(b)

cor(Auto[, -9])
pairs(Auto)

(c)

train = (year %% 2 == 0)  # if the year is even
test = !train
Auto.train = Auto[train, ]
Auto.test = Auto[test, ]
mpg01.test = mpg01[test]

(d)

library(MASS)
lda.fit = lda(mpg01 ~ cylinders + weight + displacement + horsepower,
    data = Auto, subset = train)
lda.pred = predict(lda.fit, Auto.test)
mean(lda.pred$class != mpg01.test)

(e)

qda.fit = qda(mpg01 ~ cylinders + weight + displacement + horsepower,
    data = Auto, subset = train)
qda.pred = predict(qda.fit, Auto.test)
mean(qda.pred$class != mpg01.test)

(f)

glm.fit = glm(mpg01 ~ cylinders + weight + displacement + horsepower,
    data = Auto, family = binomial, subset = train)
glm.probs = predict(glm.fit, Auto.test, type = "response")
glm.pred = rep(0, length(glm.probs))
glm.pred[glm.probs > 0.5] = 1
mean(glm.pred != mpg01.test)

(g)

 

library(class)
train.X = cbind(cylinders, weight, displacement, horsepower)[train, ]
test.X = cbind(cylinders, weight, displacement, horsepower)[test, ]
train.mpg01 = mpg01[train]
set.seed(1)
# KNN (k = 1)
knn.pred = knn(train.X, test.X, train.mpg01, k = 1)
mean(knn.pred != mpg01.test)
# KNN (k = 10)
knn.pred = knn(train.X, test.X, train.mpg01, k = 10)
mean(knn.pred != mpg01.test)
# KNN (k = 100)
knn.pred = knn(train.X, test.X, train.mpg01, k = 100)
mean(knn.pred != mpg01.test)

Problem 13 is similar to Problem 11 and uses the same functions, so it is omitted.

12.

(a)~(b)

Power = function() {
    2^3
}
print(Power())
Power2 = function(x, a) {
    x^a
}
Power2(3, 8)

(c)

Power2(10, 3)
Power2(8, 17)
Power2(131, 3)

(d)~(f)

Power3 = function(x, a) {
    result = x^a
    return(result)
}
x = 1:10
plot(x, Power3(x, 2), log = "xy", ylab = "Log of y = x^2", xlab = "Log of x",
    main = "Log of x^2 versus Log of x")
PlotPower = function(x, a) {
    plot(x, Power3(x, a))
}
PlotPower(1:10, 3)

 
