A look at under-fitting and over-fitting in linear regression.
To address under-fitting, the degree of the function used to fit the model often has to be raised; but too high a degree leads to over-fitting, while too low a degree under-fits.
When building a higher-degree model, the training data can be expanded with a polynomial feature generator.
The whole process is shown below, simulating cake-price prediction as it moves from under-fitting to over-fitting.
Git: https://github.com/linyi0604/machinelearning
When making linear regression predictions, polynomial functions of higher degree are often used to build the model and improve its generalization ability:
f = k*x + b                      (degree-1 function)
f = a*x^2 + b*x + w              (degree-2 function)
f = a*x^3 + b*x^2 + c*x + w      (degree-3 function)
...
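Despite the higher powers of x, each of the functions above is still linear in its coefficients, which is why ordinary linear regression can fit them once the features are expanded. A minimal sketch of the degree-2 case, solved by least squares on the expanded features (using the cake data introduced below):

import numpy as np

# expanded features [1, x, x^2]: the model f = a*x^2 + b*x + w is linear in (w, b, a)
x = np.array([6, 8, 10, 14, 18], dtype=float)
X = np.column_stack([np.ones_like(x), x, x ** 2])
y = np.array([7, 9, 13, 17.5, 18])

# least-squares solve for the coefficients [w, b, a]
coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
print(coeffs)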
Generalization:
Predicting data samples that were not seen during training.
Under-fitting:
The model does not fit the training samples well enough, so its generalization ability is insufficient.
Over-fitting:
The model fits the training samples very well, but has also learned characteristics it should not learn (such as noise), so its generalization ability is insufficient.
Before building a linear regression model of degree greater than one, generate polynomial features from the original features and then feed them to the model:

from sklearn.preprocessing import PolynomialFeatures

poly2 = PolynomialFeatures(degree=2)            # degree-2 polynomial feature generator
x_train_poly2 = poly2.fit_transform(x_train)
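For intuition, here is what the generator produces for a single input feature: scikit-learn prepends a bias column of ones, followed by the increasing powers of x.

from sklearn.preprocessing import PolynomialFeatures

poly2 = PolynomialFeatures(degree=2)
print(poly2.fit_transform([[6], [8]]))
# [[ 1.  6. 36.]
#  [ 1.  8. 64.]]   -> columns are [1, x, x^2]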
The following simulation predicts cake price from cake diameter.
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
import numpy as np
import matplotlib.pyplot as plt

"""
Simulates predicting cake price from cake diameter,
moving from an under-fitted degree-1 model to an over-fitted degree-4 model.
"""

# training data: features (diameter) and target values (price) of the samples
x_train = [[6], [8], [10], [14], [18]]
y_train = [[7], [9], [13], [17.5], [18]]

# learning and prediction with degree-1 linear regression
# the linear regression model learns from the training data
regressor = LinearRegression()
regressor.fit(x_train, y_train)

# draw the fitted line of the degree-1 regression
xx = np.linspace(0, 25, 100)        # 100 evenly spaced points from 0 to 25 as the x-axis
xx = xx.reshape(xx.shape[0], 1)
yy = regressor.predict(xx)          # predicted y for each point

plt.scatter(x_train, y_train)       # plot the training data points
plt1, = plt.plot(xx, yy, label="degree=1")
plt.axis([0, 25, 0, 25])
plt.xlabel("Diameter")
plt.ylabel("Price")
plt.legend(handles=[plt1])
plt.show()
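For comparison with the higher-degree models scored below, the degree-1 model's training score can be printed the same way; score() returns the R^2 coefficient of determination:

# training-set R^2 score of the degree-1 model, for comparison with degree 2 and 4 below
print("degree-1 model score on training data:", regressor.score(x_train, y_train))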
The fitted line of the degree-1 function shows under-fitting:
Next, a degree-2 polynomial regression model is built for prediction:
# prediction with degree-2 polynomial regression
poly2 = PolynomialFeatures(degree=2)            # degree-2 polynomial feature generator
x_train_poly2 = poly2.fit_transform(x_train)

# build the model and fit
regressor_poly2 = LinearRegression()
regressor_poly2.fit(x_train_poly2, y_train)

# draw the degree-2 fitted curve
xx_poly2 = poly2.transform(xx)      # transform (not fit_transform): reuse the expansion fitted on the training data
yy_poly2 = regressor_poly2.predict(xx_poly2)

plt.scatter(x_train, y_train)
plt1, = plt.plot(xx, yy, label="degree=1")
plt2, = plt.plot(xx, yy_poly2, label="degree=2")
plt.axis([0, 25, 0, 25])
plt.xlabel("Diameter")
plt.ylabel("Price")
plt.legend(handles=[plt1, plt2])
plt.show()

# print the training-set score of the degree-2 regression model
print("degree-2 model score on training data:", regressor_poly2.score(x_train_poly2, y_train))  # 0.9816421639597427
The fitted curve of the degree-2 polynomial regression model:
Its fit is better than the degree-1 linear fit.
Next, a degree-4 polynomial regression is performed:
# fitting with degree-4 polynomial regression
poly4 = PolynomialFeatures(degree=4)            # degree-4 polynomial feature generator
x_train_poly4 = poly4.fit_transform(x_train)

# build the model and fit
regressor_poly4 = LinearRegression()
regressor_poly4.fit(x_train_poly4, y_train)

# draw the degree-4 fitted curve
xx_poly4 = poly4.transform(xx)
yy_poly4 = regressor_poly4.predict(xx_poly4)

plt.scatter(x_train, y_train)
plt1, = plt.plot(xx, yy, label="degree=1")
plt2, = plt.plot(xx, yy_poly2, label="degree=2")
plt4, = plt.plot(xx, yy_poly4, label="degree=4")
plt.axis([0, 25, 0, 25])
plt.xlabel("Diameter")
plt.ylabel("Price")
plt.legend(handles=[plt1, plt2, plt4])
plt.show()

# print the training-set score of the degree-4 regression model
print("degree-4 model score on training data:", regressor_poly4.score(x_train_poly4, y_train))  # 1.0
The degree-4 model scores 100% on the training data, but its fitted curve is clearly implausible: away from the sample points, its predictions can be very inaccurate. This is over-fitting.
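One way to expose the over-fitting is to score all three models on held-out data. The test samples below are hypothetical values made up for illustration; an over-fitted model will typically score noticeably lower here than on its training data:

# hypothetical held-out test data (illustrative values only)
x_test = [[6], [8], [11], [16]]
y_test = [[8], [12], [15], [18]]

print("degree-1 test score:", regressor.score(x_test, y_test))
print("degree-2 test score:", regressor_poly2.score(poly2.transform(x_test), y_test))
print("degree-4 test score:", regressor_poly4.score(poly4.transform(x_test), y_test))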