1. Training error: The error of the learner on the training set, also known as "empirical error"
2. Generalization error: The error of the learner on new samples
Obviously, our goal is a learner that performs well on new samples, that is, one with a small generalization error.
3. Overfitting: The learner learns the training samples "too well", which leads to a decline in generalization performance (it learns too much; think of a bookworm who memorizes texts by rote, is rigid, and cannot extrapolate to new cases).
Reason: The learning ability is too strong, so peculiarities of the training samples that are not general properties are learned as well
Measures: Overfitting is a key obstacle in machine learning and is hard to solve; it cannot be avoided entirely, only mitigated
4. Under-fitting: Even the training set has not been learned well, let alone generalization (a bit like glimpsing only one part of an elephant).
Reason: The learning ability is too low
Measures: Relatively easy to overcome, for example by expanding branches in decision tree learning or by increasing the number of training epochs in neural network learning.
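The contrast between training error and generalization error can be illustrated with a small experiment. The sketch below is only an illustration under assumptions that are not in the notes: the data come from a noisy sine curve, the learner is a least-squares polynomial, and the function name fit_errors and its parameters are hypothetical. A low degree stands in for low learning ability, a high degree for high learning ability.

```python
import numpy as np
from numpy.polynomial import Polynomial


def fit_errors(degrees=(1, 3, 9), n_train=15, n_new=200, noise=0.2, seed=0):
    """Return {degree: (training_mse, new_sample_mse)} for polynomial fits.

    Assumption for illustration: the true relation is y = sin(2*pi*x) plus noise.
    """
    rng = np.random.default_rng(seed)

    def sample(n):
        x = rng.uniform(0.0, 1.0, n)
        y = np.sin(2 * np.pi * x) + rng.normal(0.0, noise, n)
        return x, y

    x_train, y_train = sample(n_train)
    x_new, y_new = sample(n_new)  # stands in for "new samples"

    results = {}
    for d in degrees:
        # Least-squares polynomial of degree d fitted on the training set only.
        model = Polynomial.fit(x_train, y_train, d)
        train_mse = float(np.mean((model(x_train) - y_train) ** 2))
        new_mse = float(np.mean((model(x_new) - y_new) ** 2))
        results[d] = (train_mse, new_mse)
    return results


if __name__ == "__main__":
    for d, (train_mse, new_mse) in fit_errors().items():
        print(f"degree={d}: training MSE={train_mse:.4f}, new-sample MSE={new_mse:.4f}")
```

The training error cannot increase as the degree grows, because each higher-degree model contains the lower-degree ones. The error on new samples has no such guarantee, which is why it, and not the training error, is the quantity to compare.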
Second, model evaluation methods
Different learning algorithms + different parameter configurations = different models
So how do we find the model with the smallest generalization error (which is our ideal model)?
1. Hold-out method
Divide the data set we have into two mutually exclusive sets. Use one of them as the training set to train the model and the other as the test set to test it; the test error serves as an estimate of the generalization error.
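The hold-out split described above can be sketched in a few lines. This is a minimal sketch using only the standard library; the function name hold_out_split, the default test_ratio of 0.3, and the random shuffling before the split are assumptions for illustration, not something the notes prescribe.

```python
import random


def hold_out_split(samples, test_ratio=0.3, seed=0):
    """Split samples into two mutually exclusive lists: (train, test)."""
    if not 0.0 < test_ratio < 1.0:
        raise ValueError("test_ratio must be strictly between 0 and 1")

    # Shuffle the indices so the split does not depend on the original order.
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)

    n_test = round(len(samples) * test_ratio)
    test_idx = indices[:n_test]
    train_idx = indices[n_test:]
    return [samples[i] for i in train_idx], [samples[i] for i in test_idx]


if __name__ == "__main__":
    train, test = hold_out_split(list(range(10)), test_ratio=0.3)
    print("train:", train)
    print("test:", test)
```

The model is trained on the first list only, and its error on the second list is taken as the estimate of the generalization error.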
2. Cross-validation method
Divide the data set D into K mutually exclusive subsets. Each time, use K-1 of the subsets to train the model and the remaining subset to test it. A special case is the "leave-one-out" method, where K equals the number of samples.
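The K-fold procedure can be sketched as an index generator. This is a minimal sketch under stated assumptions: the names k_fold_indices and cross_val_error are hypothetical, the data are not shuffled, and averaging the K test errors is the usual convention rather than something stated in the notes. The evaluate argument is a placeholder for whatever trains a model on the training indices and returns its error on the test indices.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) for each of the k mutually exclusive folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must satisfy 2 <= k <= n_samples")

    indices = list(range(n_samples))
    # Spread the remainder so that fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]

    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


def cross_val_error(n_samples, k, evaluate):
    """Average the error returned by evaluate(train_idx, test_idx) over k folds."""
    errors = [evaluate(train_idx, test_idx)
              for train_idx, test_idx in k_fold_indices(n_samples, k)]
    return sum(errors) / len(errors)


if __name__ == "__main__":
    for train_idx, test_idx in k_fold_indices(10, 3):
        print("train:", train_idx, "test:", test_idx)
```

Calling k_fold_indices(n, n) gives the leave-one-out special case: every test fold then contains exactly one sample.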
3. Bootstrapping method
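The notes only name this method. In its usual form, a training set of the same size as D is drawn from D with replacement, and the samples that were never drawn are used for testing. The sketch below illustrates that usual form; the function name bootstrap_split is hypothetical.

```python
import random


def bootstrap_split(n_samples, seed=0):
    """Return (train_idx, test_idx) for one bootstrap sample of size n_samples.

    train_idx is drawn with replacement, so it may contain repeated indices.
    test_idx holds the indices that were never drawn.
    """
    rng = random.Random(seed)
    train_idx = [rng.randrange(n_samples) for _ in range(n_samples)]
    chosen = set(train_idx)
    test_idx = [i for i in range(n_samples) if i not in chosen]
    return train_idx, test_idx


if __name__ == "__main__":
    train_idx, test_idx = bootstrap_split(1000)
    print("fraction never drawn:", len(test_idx) / 1000)
```

For a data set of m samples, the probability that a given sample is never drawn is (1 - 1/m)^m, which approaches 1/e (about 0.368) as m grows, so roughly a third of the data is left for testing.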
There are several concepts to be aware of:
Validation set: The data set used for evaluation during model evaluation and selection
Test data: The data that the learned model encounters in actual use
Training data: Divided into a training set and a validation set
Third, performance measures
Evaluation criteria that measure the generalization ability of a model
"Machine Learning" Chapter 2 study notes: model evaluation and selection