2.1 Empirical Error and Overfitting
Basic concepts:
Error rate: the number of misclassified samples divided by the total number of samples.
Training error (empirical error): the error the learner produces on the training set.
Generalization error: the error the learner produces on new, previously unseen samples; the error on a test set is used to estimate it.
2.2 Evaluation methods
In practice there are many candidate learning algorithms, and each can be run under different parameter configurations. Deciding which algorithm and configuration to use for a given problem is the model selection problem in machine learning. The generalization error cannot be obtained directly, and the training error is unsuitable as a substitute because of overfitting, so we need principled ways to evaluate and select models.
A test set, mutually exclusive with the training set, is used to estimate generalization ability. A few common practices are described below for processing the dataset D to produce a training set S and a test set T.
2.2.1 Hold-out method (train on roughly 2/3~4/5 of the data)
Note: avoid introducing additional bias through the way the data is split, which would distort the results.
Method: stratified sampling (sample each class separately, so that class proportions are preserved in both S and T).
A single split can be unreliable, so the random division is typically repeated several times and the evaluation results averaged.
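The steps above can be sketched as follows; this is a minimal illustration, and the names (e.g. `stratified_holdout`) are my own, not from the source:

```python
import random
from collections import defaultdict

def stratified_holdout(labels, train_frac=0.7, seed=0):
    """Split sample indices into train/test, sampling each class
    separately so class proportions are preserved (stratified sampling)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train, test = [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        cut = int(round(train_frac * len(idx)))
        train.extend(idx[:cut])
        test.extend(idx[cut:])
    return sorted(train), sorted(test)

# Toy data: 10 positives, 10 negatives; a 70/30 stratified split keeps
# the 1:1 class ratio in both S (train) and T (test).
labels = [1] * 10 + [0] * 10
S, T = stratified_holdout(labels, train_frac=0.7)
```

With 10 samples per class and train_frac=0.7, S holds 7 of each class and T holds 3 of each, so the class ratio is identical in both sets.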
2.2.2 Cross-validation method (commonly 10-fold)
Method: divide the dataset D into k mutually exclusive subsets of similar size, use k-1 of them as the training set and the remaining one as the test set; repeat so that each subset serves once as the test set, and average the k results.
Note: the random partition itself is usually repeated as well, e.g. 10 times 10-fold cross-validation.
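A minimal sketch of generating k-fold splits (the helper name `kfold_indices` is illustrative, not from the source):

```python
import random

def kfold_indices(n, k, seed=0):
    """Partition indices 0..n-1 into k mutually exclusive folds of
    near-equal size; yield (train, test) index lists, with each fold
    serving exactly once as the test set."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test

splits = list(kfold_indices(20, k=5))
```

Each of the 5 test folds here has 4 samples, the folds are disjoint, and together they cover all 20 indices, which is exactly the "mutually exclusive subsets" property the method relies on.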
2.2.3 Bootstrapping (self-help method)
Method: from a dataset D of size m, draw m samples with replacement to form the training set D'; the samples that are never drawn (about 1/e ≈ 36.8% of D) form the test set.
Note: the data set generated by bootstrapping changes the distribution of the initial data set, which introduces estimation bias, so the hold-out and cross-validation methods are more common when the initial amount of data is sufficient; bootstrapping is mainly useful when data is scarce.
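The "roughly 36.8% left out" property can be checked empirically; a small sketch, with `bootstrap_split` as an illustrative name:

```python
import random

def bootstrap_split(n, seed=0):
    """Draw n samples with replacement from indices 0..n-1 to form the
    training set D'; indices never drawn form the out-of-bag test set.
    For large n, about (1 - 1/n)^n -> 1/e ~ 36.8% of samples are left out."""
    rng = random.Random(seed)
    train = [rng.randrange(n) for _ in range(n)]
    test = sorted(set(range(n)) - set(train))
    return train, test

train, test = bootstrap_split(10000)
oob_frac = len(test) / 10000  # should be close to 0.368
```

Note that `train` contains repeated indices (that is the point of sampling with replacement), which is why the resulting data distribution differs from that of D.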
2.3 Performance Metrics
A performance metric measures a model's generalization ability. When comparing different models, different performance metrics often lead to different judgments: "good" or "bad" is relative, and depends not only on the algorithm and the data but also on the demands of the task.
Error rate: the proportion of misclassified samples among all samples.
Recall (R): of the samples that truly are positive, the proportion the learner correctly identifies as positive, R = TP / (TP + FN).
Precision (P): of the samples the learner judges to be positive, the proportion that really are positive, P = TP / (TP + FP).
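A small sketch computing these metrics from confusion-matrix counts (the counts below are made-up illustrative numbers):

```python
def classification_metrics(tp, fp, fn, tn):
    """Error rate, precision, and recall from confusion-matrix counts.
    Precision P = TP/(TP+FP): of the samples predicted positive,
    the fraction that really are positive.
    Recall    R = TP/(TP+FN): of the truly positive samples,
    the fraction the learner found."""
    total = tp + fp + fn + tn
    error_rate = (fp + fn) / total
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return error_rate, precision, recall

# Hypothetical counts: TP=40, FP=10, FN=20, TN=30
err, p, r = classification_metrics(40, 10, 20, 30)
```

Here err = 30/100 = 0.3, P = 40/50 = 0.8, and R = 40/60 ≈ 0.667, showing that a learner can have high precision while still missing a third of the positives.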
P-R curve (and the area under it):
Sort the samples by the learner's predicted score: the sample ranked first is the one the learner considers "most likely" to be positive, and the one ranked last is the one it considers "least likely". Predict the samples as positive one by one in this order; at each step compute the current precision and recall and plot them as the vertical and horizontal coordinates, giving the P-R curve.
The break-even point (BEP) is the value at which precision equals recall, i.e. the intersection of the line y = x with the P-R curve; it gives a single number for comparing learners.
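Tracing the P-R curve as described can be sketched as below; this simple version assumes distinct scores (tied scores would need to be grouped) and the name `pr_curve` is illustrative:

```python
def pr_curve(scores, labels):
    """Rank samples by score (most-likely-positive first); after adding
    each sample to the predicted-positive set, record the current
    (recall, precision) pair. Returns the P-R points in order."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    n_pos = sum(labels)
    tp = fp = 0
    points = []
    for i in order:
        if labels[i] == 1:
            tp += 1
        else:
            fp += 1
        points.append((tp / n_pos, tp / (tp + fp)))  # (recall, precision)
    return points

# Toy scores for 4 samples, two of which are truly positive.
pts = pr_curve([0.9, 0.8, 0.6, 0.3], [1, 0, 1, 0])
```

The curve starts at high precision with low recall and, as more samples are declared positive, recall rises while precision tends to fall, which is the trade-off the P-R plot makes visible.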
F1: the harmonic mean of precision and recall, F1 = 2PR / (P + R).
Fβ: a weighted generalization of F1 that expresses a preference between recall and precision, Fβ = (1 + β²)PR / (β²P + R); β > 1 weights recall more heavily, β < 1 weights precision more heavily.
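The two formulas can be checked with a short sketch (`f_beta` is an illustrative name; P = 0.8 and R = 2/3 are made-up example values):

```python
def f_beta(p, r, beta=1.0):
    """F_beta = (1 + beta^2) * P * R / (beta^2 * P + R).
    beta = 1 gives F1, the harmonic mean of precision and recall;
    beta > 1 favors recall, beta < 1 favors precision."""
    return (1 + beta ** 2) * p * r / (beta ** 2 * p + r)

f1 = f_beta(0.8, 2 / 3)           # F1 = 2PR/(P+R) = 8/11
f2 = f_beta(0.8, 2 / 3, beta=2)   # recall-oriented F2 = 20/29
```

Because R < P in this example, the recall-oriented F2 (≈ 0.690) comes out lower than F1 (≈ 0.727), reflecting the heavier penalty on the weaker recall.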
The evaluation methods above (hold-out, cross-validation) yield multiple confusion matrices. There are two common ways to combine them: "macro-F1" first computes precision and recall on each confusion matrix, averages these to obtain macro-P and macro-R, and then computes F1 from the averages; "micro-F1" first averages the elements of the confusion matrices (TP, FP, TN, FN) and then computes F1 from the averaged counts.
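The difference between the two averaging orders can be made concrete; a sketch with made-up counts, where each matrix is given as (TP, FP, FN) and `macro_micro_f1` is an illustrative name:

```python
def macro_micro_f1(matrices):
    """matrices: list of (TP, FP, FN) counts, one per evaluation round.
    macro-F1: compute P and R per matrix, average them, then take F1.
    micro-F1: average TP/FP/FN element-wise first, then compute F1."""
    def f1(p, r):
        return 2 * p * r / (p + r)
    ps = [tp / (tp + fp) for tp, fp, fn in matrices]
    rs = [tp / (tp + fn) for tp, fp, fn in matrices]
    macro = f1(sum(ps) / len(ps), sum(rs) / len(rs))
    tp = sum(m[0] for m in matrices) / len(matrices)
    fp = sum(m[1] for m in matrices) / len(matrices)
    fn = sum(m[2] for m in matrices) / len(matrices)
    micro = f1(tp / (tp + fp), tp / (tp + fn))
    return macro, micro

# Two hypothetical rounds with different P/R trade-offs.
macro, micro = macro_micro_f1([(40, 10, 20), (30, 30, 10)])
```

The two values generally differ (here macro ≈ 0.678 vs micro ≈ 0.667), because macro-F1 weights each round equally while micro-F1 weights each individual sample equally.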
ROC and AUC
ROC (Receiver Operating Characteristic) curve: plot the true positive rate TPR = TP / (TP + FN) against the false positive rate FPR = FP / (TN + FP) as the classification threshold is varied; AUC (Area Under the ROC Curve) summarizes the curve as a single number.
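A minimal sketch of tracing the ROC curve and computing AUC by the trapezoidal rule; it assumes distinct scores, and the name `roc_auc` is illustrative:

```python
def roc_auc(scores, labels):
    """Sweep the threshold from high score to low; after each sample is
    declared positive, record (FPR, TPR) with FPR = FP/(FP+TN) and
    TPR = TP/(TP+FN). AUC is the area under the resulting curve,
    accumulated with the trapezoidal rule."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    tp = fp = 0
    points = [(0.0, 0.0)]
    for i in order:
        if labels[i] == 1:
            tp += 1
        else:
            fp += 1
        points.append((fp / n_neg, tp / n_pos))
    auc = sum((x2 - x1) * (y1 + y2) / 2
              for (x1, y1), (x2, y2) in zip(points, points[1:]))
    return points, auc

# A learner that ranks both positives above both negatives: AUC = 1.
pts, auc = roc_auc([0.9, 0.8, 0.3, 0.2], [1, 1, 0, 0])
```

An AUC of 1 corresponds to a perfect ranking of positives above negatives, while 0.5 corresponds to random ordering.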