The Road to Machine Learning: Generating Polynomial Features in Python with PolynomialFeatures, Underfitting and Overfitting

Last updated: 2018-05-01. Source: Internet



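Here is a quick look at what underfitting and overfitting mean in linear regression.
To fix underfitting, we often raise the degree of the polynomial used to fit the curve. Too high a degree leads to overfitting; too low a degree leads to underfitting.
To build a higher-degree model, a polynomial feature generator is used to produce the training data.
The whole process is shown below,
using a simulated cake-price prediction task that goes from underfitting to overfitting.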

git: https://github.com/linyi0604/MachineLearning

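In linear regression, a higher-degree polynomial is often used to build the model so that it fits the data better and generalizes better.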

f = k*x + b   (degree 1, linear)
f = a*x^2 + b*x + w  (degree 2, quadratic)
f = a*x^3 + b*x^2 + c*x + w  (degree 3, cubic)
...
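
A higher-degree function of x is still linear in its coefficients, which is why ordinary linear regression can fit it once the powers of x are supplied as extra features. A minimal sketch, assuming synthetic noise-free data generated from f = 2*x^2 + 3*x + 1 (these coefficients are made up for illustration and do not come from the cake example):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical noise-free data from f = 2*x^2 + 3*x + 1 (illustrative values)
x = np.linspace(-3, 3, 20)
y = 2 * x**2 + 3 * x + 1

# Supply x and x^2 as two separate feature columns
features = np.column_stack([x, x**2])

# The regression only has to find one weight per column plus an intercept
model = LinearRegression().fit(features, y)
coef_x, coef_x2 = model.coef_
print(coef_x, coef_x2, model.intercept_)  # close to 3, 2 and 1
```

Building the powers by hand like this only scales to a handful of terms, which is the job PolynomialFeatures takes over.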

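Generalization:
    Making predictions on data samples the model was not trained on.

Underfitting:
    The model does not fit the training samples well enough, so it generalizes poorly.

Overfitting:
    The model fits the training samples very closely but also learns features it should not have learned (such as noise), so it generalizes poorly.
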

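Before fitting a linear regression model of degree higher than one, polynomial features must be generated from the original features and then fed to the model: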

    from sklearn.preprocessing import PolynomialFeatures

    poly2 = PolynomialFeatures(degree=2)    # degree-2 polynomial feature generator
    x_train_poly2 = poly2.fit_transform(x_train)
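
To make the transformation concrete, here is a small sketch of what the degree-2 generator returns for a single-feature input. The two diameters are taken from the training data used in this article; the variable names x_sample and x_sample_poly2 are only illustrative.

```python
from sklearn.preprocessing import PolynomialFeatures

# Two sample diameters, one feature per row
x_sample = [[6], [8]]

poly2 = PolynomialFeatures(degree=2)
x_sample_poly2 = poly2.fit_transform(x_sample)

# Each row becomes [1, x, x^2]; the leading 1 is the bias column
print(x_sample_poly2)
```

With the default settings the output includes the constant column, so a single input feature turns into three columns at degree 2 and five columns at degree 4.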



The simulation below predicts the price of a cake from its diameter.

    from sklearn.linear_model import LinearRegression
    from sklearn.preprocessing import PolynomialFeatures  # needed for the higher-degree models
    import numpy as np
    import matplotlib.pyplot as plt

    # Simulated task: predict the price of a cake from its diameter.

    # Training samples: features (diameter) and target values (price)
    x_train = [[6], [8], [10], [14], [18]]
    y_train = [[7], [9], [13], [17.5], [18]]

    # Degree-1 linear regression: fit and predict
    regressor = LinearRegression()
    regressor.fit(x_train, y_train)
    # Plot the fitted degree-1 line
    xx = np.linspace(0, 25, 100)   # 100 evenly spaced points from 0 to 25 for the x axis
    xx = xx.reshape(xx.shape[0], 1)
    yy = regressor.predict(xx)  # predicted y for each point
    plt.scatter(x_train, y_train)   # plot the training points
    plt1, = plt.plot(xx, yy, label="degree=1")
    plt.axis([0, 25, 0, 25])
    plt.xlabel("Diameter")
    plt.ylabel("Price")
    plt.legend(handles=[plt1])
    plt.show()

The degree-1 fit is a straight line, and it underfits the data.

Next, build a degree-2 regression model and use it for prediction:

    # Degree-2 regression
    poly2 = PolynomialFeatures(degree=2)    # degree-2 polynomial feature generator
    x_train_poly2 = poly2.fit_transform(x_train)
    # Fit the model on the expanded features
    regressor_poly2 = LinearRegression()
    regressor_poly2.fit(x_train_poly2, y_train)
    # Plot the degree-2 fit alongside the degree-1 fit
    xx_poly2 = poly2.transform(xx)
    yy_poly2 = regressor_poly2.predict(xx_poly2)
    plt.scatter(x_train, y_train)
    plt1, = plt.plot(xx, yy, label="Degree1")
    plt2, = plt.plot(xx, yy_poly2, label="Degree2")
    plt.axis([0, 25, 0, 25])
    plt.xlabel("Diameter")
    plt.ylabel("Price")
    plt.legend(handles=[plt1, plt2])
    plt.show()
    # Print the degree-2 model's score on the training samples
    print("Degree-2 model score:", regressor_poly2.score(x_train_poly2, y_train))     # 0.9816421639597427
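
The transform-then-fit steps can also be chained. The sketch below is an alternative to the manual approach used in this article, not the author's method: it uses scikit-learn's make_pipeline so that the polynomial expansion is applied automatically at fit and predict time. The name pipeline_poly2 and the query diameter 12 are illustrative.

```python
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Same training data as the cake example
x_train = [[6], [8], [10], [14], [18]]
y_train = [[7], [9], [13], [17.5], [18]]

# The pipeline expands the features first, then fits the regression
pipeline_poly2 = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
pipeline_poly2.fit(x_train, y_train)

# score() and predict() take raw diameters; the expansion happens internally
train_score = pipeline_poly2.score(x_train, y_train)
print(train_score)
print(pipeline_poly2.predict([[12]]))
```

The main benefit is that new inputs can never be passed to the model without the matching transform, which is an easy mistake to make when the two steps are kept separate.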

The curve fitted by the degree-2 model follows the training points clearly better than the degree-1 line.

Next, a degree-4 regression model:

    # Degree-4 regression
    poly4 = PolynomialFeatures(degree=4)    # degree-4 polynomial feature generator
    x_train_poly4 = poly4.fit_transform(x_train)
    # Fit the model on the expanded features
    regressor_poly4 = LinearRegression()
    regressor_poly4.fit(x_train_poly4, y_train)
    # Plot the degree-4 fit alongside the degree-1 and degree-2 fits
    xx_poly4 = poly4.transform(xx)
    yy_poly4 = regressor_poly4.predict(xx_poly4)
    plt.scatter(x_train, y_train)
    plt1, = plt.plot(xx, yy, label="Degree1")
    plt2, = plt.plot(xx, yy_poly2, label="Degree2")
    plt4, = plt.plot(xx, yy_poly4, label="Degree4")
    plt.axis([0, 25, 0, 25])
    plt.xlabel("Diameter")
    plt.ylabel("Price")
    plt.legend(handles=[plt1, plt2, plt4])
    plt.show()
    # Print the degree-4 model's score on the training samples
    print("Degree-4 model score:", regressor_poly4.score(x_train_poly4, y_train))     # 1.0

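The degree-4 model reaches a perfect score of 1.0 (R²) on the training samples, but the fitted curve clearly contains stretches that make no sense.

Away from the training points, its predictions may be very inaccurate. This is overfitting.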
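
A training score alone cannot reveal overfitting; it has to be compared with a score on samples the model has not seen. The sketch below does that for degrees 1, 2 and 4. Note that x_test and y_test are hypothetical held-out values invented for illustration (the article provides no test set), and make_pipeline is used here only for brevity.

```python
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Training data from the cake example
x_train = [[6], [8], [10], [14], [18]]
y_train = [[7], [9], [13], [17.5], [18]]

# Hypothetical held-out samples, invented for illustration only
x_test = [[6], [8], [11], [16]]
y_test = [[8], [12], [15], [18]]

scores = {}
for degree in (1, 2, 4):
    model = make_pipeline(PolynomialFeatures(degree=degree), LinearRegression())
    model.fit(x_train, y_train)
    # Store (training R^2, held-out R^2) for this degree
    scores[degree] = (model.score(x_train, y_train), model.score(x_test, y_test))
    print(degree, scores[degree])
```

The training score can only go up as the degree increases, so it is the gap between the two numbers that matters: a model that is near-perfect on the training samples but noticeably worse on held-out ones is showing the overfitting pattern described above.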
