Model Evaluation and Model Selection for Machine Learning (learning notes)


--------------------------------------------------------------------------------

I. Training Error and Test Error

The purpose of machine learning (statistical learning) is to obtain a model that predicts well not only on known data but also on unknown data. Different learning methods produce different models, and two quantities based on the loss function are commonly used to judge how good a learning method is:

1. The training error of the model
2. The test error of the model

Assume that the learned model is $Y = \hat{f}(X)$.

Then the training error is the average loss of the model on the training dataset:

$$R_{emp}(\hat{f}) = \frac{1}{N} \sum_{i=1}^{N} L\left(y_i, \hat{f}(x_i)\right)$$

where $N$ is the size of the training sample.

The test error is the average loss of the model on the test dataset:

$$e_{test} = \frac{1}{N'} \sum_{i=1}^{N'} L\left(y_i, \hat{f}(x_i)\right)$$

where $N'$ is the size of the test sample.
When the loss function is the 0-1 loss, the test error is simply the error rate on the test dataset:

$$e_{test} = \frac{1}{N'} \sum_{i=1}^{N'} I\left(y_i \neq \hat{f}(x_i)\right) \quad \text{(1)}$$

Here $I$ is the indicator function: it takes the value 1 when the condition inside the parentheses is true, and 0 when it is false.

Correspondingly, the accuracy on the test dataset is:

$$r_{test} = \frac{1}{N'} \sum_{i=1}^{N'} I\left(y_i = \hat{f}(x_i)\right) \quad \text{(2)}$$

Obviously, accuracy + error rate = 1; that is, (1) + (2) = 1.

The size of the training error indicates whether a given problem is easy to learn, but it is not essential in itself. What matters is the test error, which reflects the prediction ability of the learning method on unknown data: given two learning methods, the one with the smaller test error predicts better and is the more effective method. The ability of a learning method to predict unknown data is generally called its generalization ability.
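To make equations (1) and (2) concrete, here is a minimal Python sketch; the function names and the toy model are my own illustration, not from the source. It computes the 0-1 test error rate and the accuracy of a fitted model:

```python
import numpy as np

def zero_one_test_error(model, x_test, y_test):
    """Error rate under 0-1 loss, eq. (1): mean of I(y_i != f(x_i))."""
    y_pred = np.array([model(x) for x in x_test])
    return np.mean(y_pred != np.array(y_test))

def test_accuracy(model, x_test, y_test):
    """Accuracy, eq. (2): mean of I(y_i == f(x_i))."""
    return 1.0 - zero_one_test_error(model, x_test, y_test)

# Toy usage: a "model" that predicts the sign of x.
model = lambda x: 1 if x >= 0 else -1
x_test = [-2.0, -0.5, 0.3, 1.7]
y_test = [-1, 1, 1, 1]          # one label disagrees with the model

print(zero_one_test_error(model, x_test, y_test))  # 0.25
print(test_accuracy(model, x_test, y_test))        # 0.75
```

By construction the two quantities sum to 1, matching the remark that (1) + (2) = 1.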

--------------------------------------------------------------------------------

II. Overfitting and Model Selection

We know that the hypothesis space of a model contains many candidate models of different complexity. Suppose the hypothesis space contains a true model, i.e., an ideal model. We hope to select and learn a model that approximates this ideal model as closely as possible, which means the learned model should have the same number of parameters as the ideal model, and its parameter vector should be close to that of the ideal model. However, because we learn the model from limited training data, blindly pursuing the model's ability to fit the training data may yield a model more complex than the ideal one. This phenomenon is called overfitting: the learned model contains too many parameters, fits the known data very well, but predicts unknown data poorly. The goal of model selection is to avoid overfitting and thereby improve the prediction ability of the model.

For example, assume that a training dataset of input-output observation pairs is given:

$$T = \{(x_1, y_1), (x_2, y_2), \ldots, (x_N, y_N)\}$$

The task of polynomial fitting is to assume that the data were generated by an M-degree polynomial function and to select the M-degree polynomial that is most likely to have generated them. That is, within the space of M-degree polynomial functions (the hypothesis space of the model), we select a function that predicts well on both known and unknown data. We can solve for this function as follows.

Let the M-degree polynomial be:

$$f_M(x, w) = w_0 + w_1 x + w_2 x^2 + \cdots + w_M x^M = \sum_{j=0}^{M} w_j x^j$$

where $x$ is the single-variable input and $w = (w_0, w_1, \ldots, w_M)^T$ is the vector of $M + 1$ parameters.

First, we fix the complexity of the model, i.e., the degree M of the polynomial. Then, under the given model complexity, we solve for the parameters (the polynomial coefficients) by the strategy of empirical risk minimization; that is, we minimize the empirical risk:

$$L(w) = \frac{1}{2} \sum_{i=1}^{N} \left( f_M(x_i, w) - y_i \right)^2$$

The coefficient 1/2 is there purely for computational convenience: it cancels when the squared term is differentiated. Taking the partial derivatives of $L(w)$ with respect to $w$ and setting them to zero then yields the fitted polynomial coefficients.
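Setting the partial derivatives to zero gives the standard normal equations, which can be solved numerically. Here is a small Python sketch, assuming NumPy; the helper names fit_poly and predict_poly and the noisy-sine toy data are my own illustration, not from the source:

```python
import numpy as np

def fit_poly(x, y, M):
    """Fit an M-degree polynomial by minimizing the empirical risk
    L(w) = 1/2 * sum_i (f_M(x_i, w) - y_i)^2.
    Setting dL/dw = 0 gives the normal equations A^T A w = A^T y,
    where A is the Vandermonde design matrix A[i, j] = x_i ** j."""
    A = np.vander(x, M + 1, increasing=True)   # columns: x^0, x^1, ..., x^M
    w, *_ = np.linalg.lstsq(A, y, rcond=None)  # least-squares solution for w
    return w

def predict_poly(w, x):
    """Evaluate f_M(x, w) = sum_j w_j * x^j."""
    return np.vander(x, len(w), increasing=True) @ w

# Toy usage: noisy samples of sin(2*pi*x), a classic fitting example.
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 10)
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(x.size)
w = fit_poly(x, y, M=3)
print(w)  # fitted coefficients w_0 ... w_3
```

Solving via `np.linalg.lstsq` rather than inverting $A^T A$ directly is the usual choice, since the Vandermonde matrix becomes ill-conditioned as M grows.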

It is also easy to see that when M = 0 the fitted curve is a constant (horizontal) line, which fits the data very poorly; when M = 1 the curve is a straight line; and when M is very large, the fitted curve passes through every training point and the training error is 0. This looks optimal for the training data, but it is in fact too good: the data themselves contain noise and are not ideal, so a training error of 0 is abnormal, and such a curve predicts unknown data poorly. When M is moderate, the curve fits the training data well enough while the model stays simple, which is a good choice.

In general, the above analysis shows that as M increases, the complexity of the model increases and the training error decreases toward 0, while the test error first decreases and then increases. Since the purpose of learning is to make the test error as small as possible, M should be chosen so as to avoid overfitting.
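To observe this behavior numerically, here is a sketch that reuses fit_poly, predict_poly, and rng from the previous example; the sine data, noise level, and choice of degrees are illustrative assumptions of mine:

```python
# Reuses fit_poly / predict_poly / rng from the previous sketch.
x_train = np.linspace(0, 1, 10)
y_train = np.sin(2 * np.pi * x_train) + 0.1 * rng.standard_normal(x_train.size)
x_test = np.linspace(0, 1, 100)
y_test = np.sin(2 * np.pi * x_test) + 0.1 * rng.standard_normal(x_test.size)

for M in (0, 1, 3, 9):
    w = fit_poly(x_train, y_train, M)
    train_err = np.mean((predict_poly(w, x_train) - y_train) ** 2)
    test_err = np.mean((predict_poly(w, x_test) - y_test) ** 2)
    print(f"M={M}: train={train_err:.4f}  test={test_err:.4f}")

# Typically: the training error keeps shrinking as M grows (essentially 0
# once M = N - 1), while the test error falls and then rises again --
# the overfitting pattern described above.
```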

