Prediction problems in machine learning are usually divided into two categories: regression and classification.
Simply put, regression predicts a continuous value, while classification assigns a label to the data.
This article describes how to do basic data fitting in Python and how to analyze the error of the fitted results.
The example generates 500 points from a quadratic function with a random perturbation, and then tries to fit the data with polynomials of degree 1, 2, and 100.
The purpose of the fit is to find, from the training data, a polynomial function that both fits the existing data well and can predict unknown data.
The code is as follows:
import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import norm
from sklearn.pipeline import Pipeline
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn import linear_model

''' Data generation '''
x = np.arange(0, 1, 0.002)              # 500 evenly spaced points in [0, 1)
y = norm.rvs(0, size=500, scale=0.1)    # Gaussian noise
y = y + x**2                            # quadratic trend plus the noise

''' Root mean square error '''
def rmse(y_test, y):
    return np.sqrt(np.mean((y_test - y) ** 2))

''' R-squared: how much better the prediction is than simply using the mean,
    usually within [0, 1]. 0 means no better than the mean, 1 means a perfect
    prediction. This version follows the scikit-learn documentation. '''
def R2(y_test, y_true):
    return 1 - ((y_test - y_true) ** 2).sum() / ((y_true - y_true.mean()) ** 2).sum()

''' This is the version from Conway & White, "Machine Learning for Hackers" '''
def R22(y_test, y_true):
    y_mean = np.array(y_true)
    y_mean[:] = y_mean.mean()
    return 1 - rmse(y_test, y_true) / rmse(y_mean, y_true)

plt.scatter(x, y, s=5)
degree = [1, 2, 100]
y_test = []
y_test = np.array(y_test)

for d in degree:
    clf = Pipeline([('poly', PolynomialFeatures(degree=d)),
                    ('linear', LinearRegression(fit_intercept=False))])
    clf.fit(x[:, np.newaxis], y)
    y_test = clf.predict(x[:, np.newaxis])

    print(clf.named_steps['linear'].coef_)
    print('rmse=%.2f, r2=%.2f, r22=%.2f, clf.score=%.2f' %
          (rmse(y_test, y),
           R2(y_test, y),
           R22(y_test, y),
           clf.score(x[:, np.newaxis], y)))

    plt.plot(x, y_test, linewidth=2)

plt.grid()
plt.legend(['1', '2', '100'], loc='upper left')
plt.show()
Running the program produces the following output:
[-0.16140183 0.99268453]
rmse=0.13, r2=0.82, r22=0.58, clf.score=0.82
[ 0.00934527 -0.03591245  1.03065829]
rmse=0.11, r2=0.88, r22=0.66, clf.score=0.88
[ 6.07130354e-02 -1.02247150e+00  6.66972089e+01 -1.85696012e+04
......
 -9.43408707e+12 -9.78954604e+12 -9.99872105e+12 -1.00742526e+13
 -1.00303296e+13 -9.88198843e+12 -9.64452002e+12 -9.33298267e+12
 -1.00580760e+12]
rmse=0.10, r2=0.89, r22=0.67, clf.score=0.89
The coef_ values printed above are the fitted polynomial coefficients. For example, the degree-1 fit corresponds to
y = 0.99268453x - 0.16140183
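As a quick check, this line can be evaluated by hand (a minimal sketch using the degree-1 coefficients printed above; the point x = 0.5 is just an illustrative choice):

coef = [-0.16140183, 0.99268453]   # [constant term, slope]; the constant column comes from PolynomialFeatures, since fit_intercept=False
y_hat = coef[0] + coef[1] * 0.5    # evaluate the fitted line at x = 0.5
print(y_hat)                       # roughly 0.335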
Here are some points to note:
1. Error analysis
For regression analysis, the most common error metrics are the root mean square error (RMSE) and R-squared (R2).
RMSE is the square root of the mean of the squared differences between the predicted and true values. This metric is very popular (it was the evaluation measure of the Netflix machine learning competition) and gives a quantitative measure of the error.
The R2 method measures how much better the prediction is than simply using the mean. Its value usually lies between 0 and 1: 0 means the prediction is no better than taking the mean directly, while 1 means all predictions match the true values perfectly.
The exact way R2 is computed differs slightly between references. The R2 function in this article follows the scikit-learn documentation, so its result is consistent with the clf.score function.
The R22 function implementation comes from Conway & White's book Machine Learning for Hackers; the difference is that it uses the ratio of two RMSE values to compute R2.
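A quick sanity check makes the difference concrete. The sketch below reuses the rmse/R2/R22 helpers defined earlier together with scikit-learn's r2_score; the toy arrays are made-up numbers:

import numpy as np
from sklearn.metrics import r2_score

y_true = np.array([0.0, 0.5, 1.0, 1.5])
y_pred = np.array([0.1, 0.4, 1.1, 1.4])

print(rmse(y_pred, y_true))       # root mean square error: 0.1
print(R2(y_pred, y_true))         # manual R-squared, following the scikit-learn definition
print(r2_score(y_true, y_pred))   # scikit-learn's r2_score gives the same value
print(R22(y_pred, y_true))        # the RMSE-ratio variant gives a different (smaller) number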
We can see that with a degree-1 polynomial, even though the fit is not very good, R2 already reaches 0.82. The degree-2 polynomial raises it to 0.88, and increasing the degree all the way to 100 only brings R2 up to 0.89.
2. Overfitting
Fitting with the degree-100 polynomial does give a somewhat better score, but the model's ability to generalize is extremely poor.
Also note the polynomial coefficients: many of them are enormous, reaching as high as 10 to the 12th power.
Here we modify the code to remove the last 2 of the 500 samples from the training set, while still evaluating the predictions on all 500 samples:
clf.fit(x[:498, np.newaxis], y[:498])
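For clarity, the relevant part of the loop then looks roughly like this (a sketch; the pipeline definition itself is unchanged):

clf.fit(x[:498, np.newaxis], y[:498])    # train on the first 498 samples only
y_test = clf.predict(x[:, np.newaxis])   # still predict and score on all 500 samples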
The polynomial fitting results after this modification are as follows:
[-0.17933531 1.0052037]
rmse=0.12, r2=0.85, r22=0.61, clf.score=0.85
[-0.01631935 0.01922011 0.99193521]
rmse=0.10, r2=0.90, r22=0.69, clf.score=0.90
...
rmse=0.21, r2=0.57, r22=0.34, clf.score=0.57
With just the last 2 training samples missing, the prediction of the red line (the degree-100 polynomial fit) deviates drastically, and its R2 drops sharply to 0.57.
By contrast, the R2 of the degree-1 and degree-2 polynomials rises slightly.
This shows that the high-degree polynomial overfits the training data, including a large amount of noise, and as a result completely loses the ability to predict the data trend. As we saw earlier, the coefficients of the degree-100 polynomial fit are enormous. It is therefore natural to think that by limiting the size of these coefficients during fitting, this deformed fitting function can be avoided.
The basic principle is to add the sum of the absolute values of all coefficients of the fitted polynomial (L1 regularization), or the sum of their squares (L2 regularization), to the loss being minimized, together with a penalty strength factor w, so that such deformed coefficients are penalized.
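To make the idea concrete, here is a tiny illustration (the coefficient vector w and the penalty strength alpha are made-up values):

import numpy as np

w = np.array([0.2, -1.5, 30.0])          # hypothetical fitted polynomial coefficients
alpha = 0.1                              # hypothetical penalty strength

l1_penalty = alpha * np.sum(np.abs(w))   # what L1 regularization (Lasso) adds to the loss
l2_penalty = alpha * np.sum(w ** 2)      # what L2 regularization (ridge) adds to the loss
# Minimizing (squared fitting error + penalty) discourages very large coefficients.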
This idea is applied in ridge regression (which uses L2 regularization), the Lasso method (L1 regularization), the elastic net (L1+L2 regularization), and so on, all of which can effectively avoid overfitting. More on the underlying theory can be found in the relevant literature (a small Lasso sketch is also given at the end of this article).
The following uses ridge regression as an example to see whether regularization helps the degree-100 polynomial fit. Modify the code as follows:
clf = Pipeline([('poly', PolynomialFeatures(degree=d)),
                ('linear', linear_model.Ridge())])
clf.fit(x[:400, np.newaxis], y[:400])
The results are as follows:
[0.         0.75873781]
rmse=0.15, r2=0.78, r22=0.53, clf.score=0.78
[0.         0.35936882 0.52392172]
rmse=0.11, r2=0.87, r22=0.64, clf.score=0.87
[0.00000000e+00 2.63903249e-01 3.14973328e-01 2.43389461e-01
1.67075328e-01 1.10674280e-01 7.30672237e-02 4.88605804e-02
......
3.70018540e-11 2.93631291e-11 2.32992690e-11 1.84860002e-11
 1.46657377e-11]
rmse=0.10, r2=0.90, r22=0.68, clf.score=0.90
As you can see, the coefficients of the degree-100 polynomial become very small, most of them close to 0.
It is also worth noting that after using a penalized model such as ridge regression, the R2 values of the degree-1 and degree-2 polynomial regressions may be slightly lower than with plain linear regression.
However, such a model, even with a degree-100 polynomial, trained on 400 samples and predicting all 500 samples, not only has a small R2 error but also has excellent predictive power.
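As a side note, the Lasso and elastic net mentioned earlier can be dropped into the same pipeline in place of Ridge. The following is only a rough sketch, not part of the original experiment; the alpha and max_iter values are illustrative assumptions, not tuned:

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn import linear_model

clf = Pipeline([('poly', PolynomialFeatures(degree=100)),
                ('linear', linear_model.Lasso(alpha=0.01, max_iter=100000))])
clf.fit(x[:400, np.newaxis], y[:400])
print(clf.named_steps['linear'].coef_)   # the L1 penalty typically drives many coefficients to exactly zero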