Model evaluation and model selection in machine learning

Last Update:2018-07-26 Source: Internet

Author: User


I. Model evaluation

Of the two errors, the training error and the test error, the test error reflects how well a learning method predicts on an unseen test data set, which makes it an important concept in learning. The ability of a learning method to predict unknown data is usually called its generalization ability.

II. Generalization ability and the overfitting problem

Overfitting means that the model chosen during learning contains so many parameters that it predicts the known (training) data well but predicts unknown data poorly. Take one-dimensional regression as an example: suppose five data points are fitted with a fourth-degree polynomial. If the polynomial curve is required to pass through all five data points, as shown in Figure 2, the solution is unique. Such a fit can make the training error very small while the true error is very large, which indicates that the learned model generalizes poorly, that is, its ability to predict unknown data is poor.
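The five-point example can be tried numerically. The following is a minimal sketch, not taken from the original article: the linear "true" function, the noise level, and the random seed are all assumptions chosen for illustration. It fits the same five noisy points with a straight line and with a fourth-degree polynomial and reports both errors.

```python
import numpy as np

rng = np.random.default_rng(0)

def true_fn(x):
    # Hypothetical ground truth, assumed only for this illustration.
    return 2.0 * x + 1.0

# Five noisy training points and a larger, independent test set.
x_train = np.linspace(0.0, 1.0, 5)
y_train = true_fn(x_train) + rng.normal(0.0, 0.2, size=5)
x_test = rng.uniform(0.0, 1.0, size=100)
y_test = true_fn(x_test) + rng.normal(0.0, 0.2, size=100)

def mse(coeffs, x, y):
    # Mean squared error of the polynomial with the given coefficients.
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))

# A degree-4 polynomial has five coefficients, so it can pass through
# all five training points exactly; the straight line cannot.
coeffs_deg1 = np.polyfit(x_train, y_train, 1)
coeffs_deg4 = np.polyfit(x_train, y_train, 4)

train_mse_deg1 = mse(coeffs_deg1, x_train, y_train)
train_mse_deg4 = mse(coeffs_deg4, x_train, y_train)
test_mse_deg1 = mse(coeffs_deg1, x_test, y_test)
test_mse_deg4 = mse(coeffs_deg4, x_test, y_test)

print("degree 1: train", train_mse_deg1, "test", test_mse_deg1)
print("degree 4: train", train_mse_deg4, "test", test_mse_deg4)
```

The degree-4 training error is zero up to rounding because the curve interpolates the data. Its test error is usually the larger of the two in this setup, although the exact numbers depend on the noise drawn.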

As can be seen from the above figure, the test error first falls and then rises. Let d_VC denote the best VC dimension, defined as the lowest point of the test error curve. Moving to the right of this point along the horizontal axis, the training error keeps decreasing while the test error rises; once the test error becomes too large, overfitting occurs. Moving to the left of the best VC dimension results in underfitting.
Overfitting is caused by the following:
(1) The VC dimension of the model is too high, that is, the model is too complex.
(2) There is noise in the data; fitting it exactly may pull the model further away from the real situation.
(3) The data is limited, so the model cannot learn the true distribution of the whole data.
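The shape of the two error curves can be reproduced with a small experiment. This is only a sketch under stated assumptions: polynomial degree is used as a stand-in for model complexity (it is not the VC dimension itself), and the sine target, noise level, and sample sizes are made up for illustration.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(1)

def true_fn(x):
    # Hypothetical target function, assumed for this illustration.
    return np.sin(2.0 * np.pi * x)

x_train = rng.uniform(0.0, 1.0, 15)
y_train = true_fn(x_train) + rng.normal(0.0, 0.3, 15)
x_test = rng.uniform(0.0, 1.0, 200)
y_test = true_fn(x_test) + rng.normal(0.0, 0.3, 200)

degrees = list(range(10))  # complexity axis: polynomial degree 0..9
train_err, test_err = [], []
for d in degrees:
    # Least-squares polynomial fit of degree d on the training set only.
    model = Polynomial.fit(x_train, y_train, d)
    train_err.append(float(np.mean((model(x_train) - y_train) ** 2)))
    test_err.append(float(np.mean((model(x_test) - y_test) ** 2)))

# The complexity with the lowest test error plays the role of the
# "best" point on the horizontal axis.
best_degree = degrees[int(np.argmin(test_err))]

for d, tr, te in zip(degrees, train_err, test_err):
    print(d, round(tr, 4), round(te, 4))
print("best degree by test error:", best_degree)
```

The training error shrinks as the degree grows, because each larger model contains the smaller ones. Degrees below `best_degree` correspond to the underfitting side and degrees above it to the overfitting side.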

III. Model selection
Model selection aims to avoid overfitting and to improve the predictive ability of the model. Two typical methods are regularization and cross-validation.
1 Regularization
The general form of regularization is:

$$\min_{f \in \mathcal{F}} \; \frac{1}{N} \sum_{i=1}^{N} L(y_i, f(x_i)) + \lambda J(f)$$

The first term is the empirical risk and the second is the regularization term J(f), which can be, for example, a norm of the parameter vector. The coefficient λ ≥ 0 adjusts the balance between the two. A model with small empirical risk may be complex (with many nonzero parameters), in which case the second term is large. Regularization therefore selects a model for which both the empirical risk and the model complexity are small.
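One concrete instance of regularization is ridge regression, where the empirical risk is the mean squared error and the regularization term is the squared L2 norm of the parameter vector. The sketch below is an illustration rather than the article's own method: the data, the polynomial features, and the names `ridge_fit` and `lam` are assumptions. For simplicity the intercept is penalized along with the other parameters.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Minimise (1/N) * ||X w - y||^2 + lam * ||w||^2 in closed form."""
    n_samples, n_features = X.shape
    # Setting the gradient to zero gives (X^T X / N + lam * I) w = X^T y / N.
    A = X.T @ X / n_samples + lam * np.eye(n_features)
    b = X.T @ y / n_samples
    return np.linalg.solve(A, b)

rng = np.random.default_rng(3)
x = np.linspace(-1.0, 1.0, 20)
y = np.sin(np.pi * x) + rng.normal(0.0, 0.2, size=20)

# Polynomial features 1, x, ..., x^5 (a hypothetical model family).
X = np.vander(x, 6, increasing=True)

lams = [0.0, 1e-3, 1e-1, 10.0]
norms, risks = [], []
for lam in lams:
    w = ridge_fit(X, y, lam)
    norms.append(float(np.linalg.norm(w)))        # size of the parameter vector
    risks.append(float(np.mean((X @ w - y) ** 2)))  # empirical risk only

for lam, n, r in zip(lams, norms, risks):
    print("lam", lam, "||w||", round(n, 4), "empirical risk", round(r, 4))
```

A larger `lam` shrinks the parameter vector at the cost of a higher empirical risk, which is the trade-off the coefficient controls.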
2 Cross-validation
The basic idea of cross-validation is to reuse the data: the given data is split into a training set and a test set, and training, testing, and model selection are then repeated on different splits.
(1) Simple cross-validation
Randomly divide the data into two parts, a training set and a test set. Train on the training set under various conditions (for example, with different parameters) to obtain different models. Evaluate each model on the test set and select the model with the smallest test error.
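Simple cross-validation can be sketched in a few lines. The 70/30 split ratio, the synthetic data, and the candidate polynomial degrees below are assumptions made for illustration; the original text does not prescribe them.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(2)
x = rng.uniform(-1.0, 1.0, 60)
y = np.sin(3.0 * x) + rng.normal(0.0, 0.2, 60)  # hypothetical data

# Random split into a training part (70%) and a test part (30%).
indices = rng.permutation(len(x))
n_train = (7 * len(x)) // 10
train_idx, test_idx = indices[:n_train], indices[n_train:]

# "Different conditions" are different polynomial degrees here.
candidates = [1, 3, 5, 7]
test_errors = {}
for degree in candidates:
    model = Polynomial.fit(x[train_idx], y[train_idx], degree)
    residual = model(x[test_idx]) - y[test_idx]
    test_errors[degree] = float(np.mean(residual ** 2))

# Select the model with the smallest test error.
best_degree = min(test_errors, key=test_errors.get)
print(test_errors)
print("selected degree:", best_degree)
```

Because the split is drawn only once, the selected model can change with the random seed, which is one motivation for the repeated splits of S-fold cross-validation.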
(2) S-fold cross-validation
Randomly divide the data into S disjoint subsets of roughly equal size, train the model on the data of S-1 subsets, and test it on the remaining subset. Repeat this process for each of the S possible choices of test subset, and finally select the model with the smallest average test error over the S evaluations.
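The S-fold procedure might be implemented as follows. This is a sketch under assumptions: the function name `s_fold_cv_error`, the synthetic data, and the polynomial model family are hypothetical and serve only to make the steps concrete.

```python
import numpy as np
from numpy.polynomial import Polynomial

def s_fold_cv_error(x, y, degree, s, seed=0):
    """Average test MSE of a degree-`degree` polynomial over S folds."""
    rng = np.random.default_rng(seed)
    # Shuffle the indices, then cut them into S disjoint subsets.
    folds = np.array_split(rng.permutation(len(x)), s)
    errors = []
    for k in range(s):
        test_idx = folds[k]
        # Train on the other S-1 subsets.
        train_idx = np.concatenate([folds[j] for j in range(s) if j != k])
        model = Polynomial.fit(x[train_idx], y[train_idx], degree)
        errors.append(np.mean((model(x[test_idx]) - y[test_idx]) ** 2))
    return float(np.mean(errors))

rng = np.random.default_rng(4)
x = rng.uniform(-1.0, 1.0, 50)
y = np.sin(3.0 * x) + rng.normal(0.0, 0.2, 50)  # hypothetical data

candidates = [1, 3, 5, 7]
cv_errors = {d: s_fold_cv_error(x, y, d, s=5) for d in candidates}
best_degree = min(cv_errors, key=cv_errors.get)
print(cv_errors)
print("selected degree:", best_degree)
```

Using the same `seed` for every candidate means all models are compared on identical folds, so differences in the average error come from the models rather than from the split.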
(3) Leave-one-out cross-validation
Leave-one-out cross-validation is the special case of S-fold cross-validation in which S = N, where N is the sample size. It is often used when data is scarce.
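With S = N, every sample serves as the test set exactly once. The sketch below uses this to choose a regularization coefficient for a small ridge regression problem. The twelve data points, the polynomial features, the candidate values, and the names `loo_cv_error` and `lam` are all assumptions for illustration.

```python
import numpy as np

def loo_cv_error(X, y, lam):
    """Leave-one-out MSE of ridge regression with coefficient `lam`."""
    n_samples, n_features = X.shape
    errors = []
    for i in range(n_samples):
        # Hold out sample i and train on the remaining N-1 samples.
        X_tr = np.delete(X, i, axis=0)
        y_tr = np.delete(y, i)
        m = n_samples - 1
        A = X_tr.T @ X_tr / m + lam * np.eye(n_features)
        w = np.linalg.solve(A, X_tr.T @ y_tr / m)
        errors.append((X[i] @ w - y[i]) ** 2)
    return float(np.mean(errors))

rng = np.random.default_rng(5)
x = np.linspace(-1.0, 1.0, 12)           # a deliberately small data set
y = np.sin(np.pi * x) + rng.normal(0.0, 0.2, size=12)
X = np.vander(x, 6, increasing=True)     # features 1, x, ..., x^5

lams = [1e-3, 1e-2, 1e-1, 1.0]           # hypothetical candidate values
loo_errors = {lam: loo_cv_error(X, y, lam) for lam in lams}
best_lam = min(loo_errors, key=loo_errors.get)
print(loo_errors)
print("selected lam:", best_lam)
```

The cost is N model fits per candidate, which is acceptable for scarce data but grows quickly with the sample size.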

References:
1 Jason Ding, http://www.open-open.com/lib/view/open1423572428467.html
2 Hang Li, "Statistical Learning Methods"

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
