1. Model evaluation
Of these two errors, the test error reflects the learning method's ability to predict unknown test data, which is an important concept in learning. This ability of a method to predict unknown data is called its generalization ability.
2. Generalization ability and the overfitting problem
Overfitting means that the model selected during learning contains so many parameters that it predicts the known (training) data well but predicts unknown data poorly. Take one-dimensional regression as an example: suppose we fit five data points with a fourth-degree polynomial. A fourth-degree polynomial has five coefficients, so there is a unique curve that passes exactly through all five points, as shown in Figure 2. The training error is then essentially zero, but the true error can be very large, which indicates that the learned model has bad generalization ability: it predicts unknown data poorly.
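This example can be sketched numerically. The data below are hypothetical (generated with NumPy, not the data of Figure 2): a fourth-degree polynomial interpolates five noisy points exactly, so the training error is near zero while the error on fresh points from the same trend is far larger.

```python
import numpy as np

# Hypothetical data: 5 noisy samples of an underlying linear trend.
rng = np.random.default_rng(0)
x_train = np.linspace(0, 1, 5)
y_train = 2 * x_train + rng.normal(scale=0.3, size=5)

# A degree-4 polynomial has 5 coefficients, so it passes exactly
# through all 5 points: the unique interpolating solution.
coeffs = np.polyfit(x_train, y_train, deg=4)
train_err = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)

# On fresh points from the same trend, the fitted curve oscillates
# between the training points and the error is much larger.
x_test = np.linspace(0.05, 0.95, 50)
y_test = 2 * x_test + rng.normal(scale=0.3, size=50)
test_err = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)

print(f"train MSE: {train_err:.2e}, test MSE: {test_err:.2e}")
```

The near-zero training error says nothing about predictive ability; only the error on unseen data does.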
As the figure above shows, the test error first falls and then rises. If we define the model complexity at the lowest point of the test-error curve as the best VC dimension d_VC, then moving right along the horizontal axis (increasing complexity) the training error keeps decreasing while the test error rises; when the test error becomes too large, overfitting occurs. Moving left from the best VC dimension instead produces underfitting.
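This falling-then-rising behavior can be reproduced numerically. The setup below is an illustrative assumption (a noisy sine fitted by polynomials of increasing degree), not the actual data behind the figure:

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical data: noisy samples of sin(2*pi*x).
x_train = np.sort(rng.uniform(0, 1, 15))
y_train = np.sin(2 * np.pi * x_train) + rng.normal(scale=0.2, size=15)
x_test = np.sort(rng.uniform(0, 1, 200))
y_test = np.sin(2 * np.pi * x_test) + rng.normal(scale=0.2, size=200)

# Sweep model complexity: polynomial degree plays the role of the
# horizontal axis in the figure.
train_errs, test_errs = [], []
for deg in range(10):
    c = np.polyfit(x_train, y_train, deg)
    train_errs.append(np.mean((np.polyval(c, x_train) - y_train) ** 2))
    test_errs.append(np.mean((np.polyval(c, x_test) - y_test) ** 2))

# Training error shrinks as complexity grows; test error falls,
# bottoms out at some intermediate degree, then deteriorates.
best_deg = int(np.argmin(test_errs))
print(f"degree with lowest test error: {best_deg}")
```

The degree at the minimum of `test_errs` corresponds to the lowest point of the test-error curve in the text.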
Overfitting is caused by: (1) the VC dimension of the model being too high, i.e. the model being too complex; (2) noise in the data, which, if fitted exactly, pulls the model further away from the true situation; (3) limited data, so that the model cannot capture the true distribution of the whole data set.
3. Model selection
Model selection aims to avoid overfitting and improve the predictive ability of the model; two typical methods are regularization and cross-validation.
1. Regularization
The general form is

min_{f in F} (1/N) * sum_{i=1}^{N} L(y_i, f(x_i)) + lambda * J(f)

The first term is the empirical risk; the second is the regularization term, which can be, for example, a norm of the parameter vector, with the coefficient lambda >= 0 adjusting the trade-off between the two. When the empirical risk of the first term is small, the model tends to be complex (many non-zero parameters), so the second term is large. Regularization therefore selects models for which both the empirical risk and the model complexity are small.
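As a sketch of this trade-off, the snippet below uses an L2 (ridge) penalty with squared loss, for which the minimizer has a closed form; the data, features, and lambda values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
# Hypothetical data: polynomial features of 10 noisy points.
x = np.linspace(0, 1, 10)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=10)
X = np.vander(x, 10, increasing=True)  # columns 1, x, ..., x^9

def ridge_fit(X, y, lam):
    """Minimize ||Xw - y||^2 + lam * ||w||^2 (closed-form solution)."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

w_tiny = ridge_fit(X, y, 1e-8)   # almost pure empirical risk
w_mid = ridge_fit(X, y, 1e-3)    # moderate penalty
w_big = ridge_fit(X, y, 1.0)     # heavy penalty

# Increasing lambda shrinks the parameter vector toward zero,
# trading a slightly larger empirical risk for a simpler model.
for lam, w in [(1e-8, w_tiny), (1e-3, w_mid), (1.0, w_big)]:
    print(f"lambda={lam:g}  ||w|| = {np.linalg.norm(w):.3f}")
```

With an L1 penalty instead, many parameters would be driven exactly to zero, matching the remark that a complex model is one with many non-zero parameters.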
2. Cross-validation
The basic idea of cross-validation is to reuse the data: split the given data into a training set and a test set, and on that basis repeatedly train, test, and select models.
(1) Simple cross-validation
Randomly divide the data into two parts, a training set and a test set. Train models on the training set under various conditions (for example, different numbers of parameters), evaluate each model on the test set, and select the model with the smallest test error.
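A minimal sketch of this hold-out procedure, assuming a hypothetical polynomial-degree selection task:

```python
import numpy as np

rng = np.random.default_rng(3)
# Hypothetical data: 40 noisy samples of sin(2*pi*x).
x = np.sort(rng.uniform(0, 1, 40))
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=40)

# Randomly split into ~70% training set and ~30% test set.
idx = rng.permutation(40)
tr, te = idx[:28], idx[28:]

# Train one model per candidate condition (here: polynomial degree),
# evaluate each on the held-out test set, keep the lowest test error.
errors = {}
for deg in [1, 3, 9]:
    c = np.polyfit(x[tr], y[tr], deg)
    errors[deg] = np.mean((np.polyval(c, x[te]) - y[te]) ** 2)
best = min(errors, key=errors.get)
print(f"selected degree: {best}")
```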
(2) S-fold cross-validation
Randomly partition the data into S disjoint subsets of equal size. Use the data of S−1 subsets to train the model and the remaining subset to test it. Repeat this process S times, once per choice of held-out subset, and finally select the model with the smallest average test error over the S evaluations.
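The same selection task can be sketched with S-fold cross-validation; `s_fold_cv` below is a hypothetical helper, not a library function:

```python
import numpy as np

def s_fold_cv(x, y, deg, S, rng):
    """Average held-out error of a degree-`deg` polynomial over S folds."""
    idx = rng.permutation(len(x))
    folds = np.array_split(idx, S)
    errs = []
    for i in range(S):
        te = folds[i]                                      # held-out fold
        tr = np.concatenate([folds[j] for j in range(S) if j != i])
        c = np.polyfit(x[tr], y[tr], deg)
        errs.append(np.mean((np.polyval(c, x[te]) - y[te]) ** 2))
    return float(np.mean(errs))

rng = np.random.default_rng(4)
# Hypothetical data: 30 noisy samples of sin(2*pi*x).
x = np.sort(rng.uniform(0, 1, 30))
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=30)

# Choose the degree with the smallest average error over S=5 folds.
scores = {deg: s_fold_cv(x, y, deg, 5, rng) for deg in [1, 3, 9]}
best_deg = min(scores, key=scores.get)
print(f"selected degree: {best_deg}")
```

Compared with a single hold-out split, averaging over S folds uses every sample for both training and testing, giving a less variable error estimate.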
(3) Leave-one-out cross-validation
Leave-one-out cross-validation is the special case of S-fold cross-validation with S = N, where N is the sample size; it is often used when data are scarce.
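Leave-one-out can be sketched directly, again on hypothetical data with only a handful of samples:

```python
import numpy as np

rng = np.random.default_rng(5)
# Hypothetical scarce data: only 8 samples.
x = np.linspace(0, 1, 8)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=8)

def loocv_error(x, y, deg):
    """N rounds, each training on N-1 samples and testing on the one left out."""
    errs = []
    for i in range(len(x)):
        mask = np.arange(len(x)) != i
        c = np.polyfit(x[mask], y[mask], deg)
        errs.append((np.polyval(c, x[i]) - y[i]) ** 2)
    return float(np.mean(errs))

err_linear = loocv_error(x, y, 1)
err_cubic = loocv_error(x, y, 3)
print(f"LOOCV MSE  linear: {err_linear:.3f}  cubic: {err_cubic:.3f}")
```

With so few samples, holding out a large test set would waste scarce data; leave-one-out trains on all but one point each round.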
References:
1. Jason Ding, http://www.open-open.com/lib/view/open1423572428467.html
2. Li Hang, "Statistical Learning Methods"