We evaluate a learner by its generalization performance. To do so, we need an evaluation criterion that measures the model's ability to generalize, i.e. a performance metric.
Performance metrics reflect the requirements of the task: when comparing the capabilities of different models, different performance metrics often lead to different judgments. In other words, whether a model is "good" or "bad" is relative; it depends not only on the algorithm and the data, but also on what the task demands.

1. Error rate and accuracy
The error rate is the proportion of misclassified samples among all samples; accuracy is the proportion of correctly classified samples among all samples.

2. Precision, recall and F1
High precision means that whatever the model flags as positive should be as reliable as possible; high recall means picking out as many of the true positives as possible, i.e. "rather flag wrongly than let one slip through". For a binary classification problem, each sample falls into one of four cases according to the combination of its true class and the learner's predicted class: true positive, false positive, true negative, false negative, denoted TP, FP, TN, FN. Clearly TP + FP + TN + FN = total number of samples. The confusion matrix of the classification results is as follows:
Ground truth \ Prediction | Positive example | Negative example
Positive example          | TP               | FN
Negative example          | FP               | TN
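As a quick illustration, here is a minimal sketch (with made-up label arrays) that tallies the four confusion-matrix cells and derives accuracy and the error rate from them:

```python
# Minimal sketch (hypothetical labels): tally the four confusion-matrix
# cells for a binary task (1 = positive, 0 = negative), then derive
# accuracy and error rate from them.
y_true = [1, 1, 0, 0, 1, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 1]

tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)

total = tp + fp + tn + fn        # equals len(y_true)
accuracy = (tp + tn) / total     # proportion classified correctly
error_rate = (fp + fn) / total   # proportion misclassified; = 1 - accuracy
print(tp, fn, fp, tn)            # -> 3 1 2 2
print(accuracy, error_rate)      # -> 0.625 0.375
```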
Precision P and recall R are defined as

P = TP / (TP + FP)
R = TP / (TP + FN)

Precision and recall are a pair of conflicting measures: when one is high, the other tends to be low. Taking precision as the vertical axis and recall as the horizontal axis, we can draw the P-R curve. Properties of the P-R curve: if one learner's curve is completely enclosed by another learner's curve, the latter learner performs better than the former. If the curves cross, the learner whose curve encloses the larger area is generally considered better. The Break-Even Point is the point on the curve where recall equals precision, and comparing learners at this point is another option. A further criterion is

F1 = 2PR / (P + R) = 2TP / (total number of samples + TP − TN).

In more complex situations, one must also weigh the relative importance of recall versus precision (the Fβ measure), as well as how to average over multiple confusion matrices (macro vs. micro averaging).
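Continuing the sketch above and reusing its hypothetical tp, fp, fn, tn counts, precision, recall, and F1 follow directly from the definitions:

```python
# Precision, recall, and F1 computed from the counts in the sketch above.
precision = tp / (tp + fp)    # of everything predicted positive, how much is real
recall = tp / (tp + fn)       # of all real positives, how many were found
f1 = 2 * precision * recall / (precision + recall)

# Sanity check of the equivalent form given in the text:
assert abs(f1 - 2 * tp / (total + tp - tn)) < 1e-12
print(precision, recall, f1)  # -> 0.6 0.75 0.666...
```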
3. ROC and AUC
Many learners actually produce a real-valued score (or probability) for each test sample, which is then compared against a threshold, also called a cut point: above it the sample is classified as positive, below it as negative. Different tasks weigh precision and recall differently, and the ROC curve lets us compare learners across all possible cut points. We sort the samples by the learner's predicted score and, in that order, successively treat each sample's score as the cut point, each time computing the true positive rate (TPR) as the vertical axis and the false positive rate (FPR) as the horizontal axis:

TPR = TP / (TP + FN)
FPR = FP / (TN + FP)

For comparing learners, the ROC curve behaves much like the P-R curve. The AUC is the area under the ROC curve. Formally, the AUC reflects the quality of the ranking of the sample predictions, so it is closely related to the ranking loss.
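The sweep described above can be sketched in a few lines: sort the samples by predicted score, treat each score in turn as the cut point, and record an (FPR, TPR) point after each step. Scores and labels here are hypothetical, and score ties are ignored for simplicity:

```python
# ROC sketch: sort by score (descending), sweep the cut point over each
# sample, collect (FPR, TPR) points, then get AUC via the trapezoid rule.
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.1]   # hypothetical
labels = [1,   1,   0,   1,   0,    1,   0,   0]     # 1 = positive

pairs = sorted(zip(scores, labels), key=lambda sl: -sl[0])
pos = sum(labels)              # total positives = TP + FN
neg = len(labels) - pos        # total negatives = TN + FP

tp = fp = 0
roc = [(0.0, 0.0)]             # (FPR, TPR) points, starting at the origin
for _, label in pairs:         # each sample becomes the next cut point
    if label == 1:
        tp += 1
    else:
        fp += 1
    roc.append((fp / neg, tp / pos))

auc = sum((x2 - x1) * (y1 + y2) / 2
          for (x1, y1), (x2, y2) in zip(roc, roc[1:]))
print(auc)                     # -> 0.8125 for these made-up values
```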
4. Cost-sensitive error rate and cost curve
In practical applications one must also consider the cost of different kinds of misjudgment. In an access control system, for example, letting a bad person in and locking a good person out are both errors, but they harm the user (and security) to very different degrees. The usual approach is to assign a cost weight to each type of error; when computing the cost-sensitive error rate, each mistake is weighted by its cost rather than simply counting the number of errors.
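A minimal sketch of the idea, with a hypothetical 2x2 cost table cost[truth][prediction] (zero on the diagonal, since correct decisions cost nothing): each mistake contributes its cost instead of a flat count of 1.

```python
# Cost-sensitive error rate: weight each mistake by its cost instead of
# counting all errors equally. Labels and costs here are hypothetical.
y_true = [1, 1, 0, 0, 1, 0, 1, 0]   # say 1 = authorized person, 0 = intruder
y_pred = [1, 0, 0, 1, 1, 0, 1, 1]

# cost[t][p]: cost of predicting p when the truth is t.
# Letting an intruder in (t=0, p=1) is assumed far worse than
# locking an authorized person out (t=1, p=0).
cost = {1: {1: 0.0, 0: 1.0},
        0: {1: 5.0, 0: 0.0}}

m = len(y_true)
cs_error_rate = sum(cost[t][p] for t, p in zip(y_true, y_pred)) / m
plain_error_rate = sum(t != p for t, p in zip(y_true, y_pred)) / m
print(cs_error_rate, plain_error_rate)   # -> 1.375 0.375
```

With equal unit costs the two quantities coincide; the gap between them shows how much the costly error type (here, admitting an intruder) dominates the evaluation.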