ROC (receiver operating characteristic curve)
The receiver operating characteristic curve is shown in the following figure.
This diagram introduces the false positive rate and the true positive rate. (The names feel a bit like reading Dream of the Red Chamber.)
TPR: True Positive Rate, also called sensitivity (this is the recall, R, introduced earlier)
TPR = TP/(TP + FN) Number of positive samples predicted as positive / actual number of positive samples
TNR: True Negative Rate, also called specificity
TNR = TN/(TN + FP) Number of negative samples predicted as negative / actual number of negative samples
FPR: False Positive Rate
FPR = FP/(FP + TN) Number of negative samples predicted as positive / actual number of negative samples
FNR: False Negative Rate
FNR = FN/(TP + FN) Number of positive samples predicted as negative / actual number of positive samples
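The four rates above can be computed directly from the confusion-matrix counts. The following is a minimal sketch, not a reference implementation: the function names and the sample labels are made up for illustration, labels are assumed to be 1 for positive and 0 for negative, and both classes are assumed to be present (otherwise a denominator is zero).

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FN, FP, TN for binary labels (1 = positive, 0 = negative)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fn, fp, tn


def rates(y_true, y_pred):
    """Return TPR, TNR, FPR and FNR as defined in the text."""
    tp, fn, fp, tn = confusion_counts(y_true, y_pred)
    pos = tp + fn  # actual number of positive samples
    neg = fp + tn  # actual number of negative samples
    return {
        "TPR": tp / pos,
        "TNR": tn / neg,
        "FPR": fp / neg,
        "FNR": fn / pos,
    }


# Illustrative data: 4 positives and 6 negatives.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
r = rates(y_true, y_pred)
print(r)
```

Note that TPR + FNR = 1 and TNR + FPR = 1, since each pair shares a denominator, so the ROC curve only needs one rate from each pair.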
Ideal goal: TPR = 1 and FPR = 0, that is, the point (0,1) on the graph. At this point all positive samples are classified as positive and all negative samples are classified as negative. Conversely, at (1,0), where FPR = 1 and TPR = 0, a similar analysis shows that this is the worst possible classifier, because it gets every answer exactly wrong.
In other words, the closer the ROC curve is to the upper-left corner, the better the model.
AUC
AUC (Area Under Curve) is the area under the ROC curve, obtained by integrating the ROC curve. The larger the area, the better the model is considered. AP (average precision) plays the analogous role for the precision-recall curve: it summarizes that curve as a single number, and it often appears in image processing.
PRC (Precision-recall curve)
Generally speaking, a curve that lies above another is better (the green line is better than the red one). In other words, the closer the curve is to the upper-right corner, the better the model.
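Both curves come from the same procedure: sort the samples by score, lower the threshold one sample at a time, and record the running counts. The sketch below illustrates this under stated assumptions: the function names and the data are hypothetical, scores are assumed to have no ties, and AP is computed with one common step-wise definition (the sum of precision times the increase in recall), which may differ from interpolated variants.

```python
def sweep(y_true, scores):
    """Lower the threshold past one sample at a time; return cumulative (TP, FP)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = fp = 0
    counts = []
    for i in order:
        if y_true[i] == 1:
            tp += 1
        else:
            fp += 1
        counts.append((tp, fp))
    return counts


def roc_auc(y_true, scores):
    """Area under the ROC curve (FPR on x, TPR on y) by the trapezoidal rule."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    points = [(0.0, 0.0)] + [(fp / neg, tp / pos) for tp, fp in sweep(y_true, scores)]
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


def average_precision(y_true, scores):
    """Step-wise area under the PR curve: sum of precision * (increase in recall)."""
    pos = sum(y_true)
    ap = 0.0
    prev_recall = 0.0
    for tp, fp in sweep(y_true, scores):
        recall = tp / pos
        precision = tp / (tp + fp)
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


# Illustrative data: one negative (score 0.7) is ranked above one positive (0.6).
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
auc = roc_auc(y_true, scores)
ap = average_precision(y_true, scores)
print(round(auc, 3), round(ap, 3))
```

A classifier that ranks every positive above every negative reaches the (0,1) corner of the ROC plot and the upper-right corner of the PR plot, and both summaries equal 1.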
Summary:
1 ROC, PRC, and AUC are usually examined together. The TPR of the ROC curve is the recall of the PRC; this link brings other advantages, see below.
2 For both curves, the smoother the curve, the better the model; in essence, each point on a curve corresponds to a threshold setting.
Now the question is: which curve is better, ROC or PRC? Obviously PRC, because PRC is the People's Republic of China.
To give a direct conclusion:
When the numbers of positive and negative samples are similar, the ROC and PR curves show similar trends. When negative samples greatly outnumber positive ones, the two differ sharply: the ROC curve still looks very good, while the PR curve shows only mediocre performance. The explanation is simple. Suppose there is 1 positive case and 100 negative cases; then the TPR will basically stay at around 100% and then suddenly drop to 0. Panels (a) and (b) show the ROC curve and the PR curve with a 1:1 ratio of positive to negative samples, and the two are relatively close. Panels (c) and (d) show the ROC curve and the PR curve when negative samples far outnumber positive ones: the ROC curve still looks very good, but the PR curve performs relatively poorly. This indicates that the PR curve reflects classification performance better when the positive and negative classes are heavily imbalanced.
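The effect described above can be reproduced with a small numeric experiment. This is a sketch under assumptions, not the setup behind the original figure: the scores are made up, the 10x replication factor is chosen only for illustration, and no positive score is assumed to tie with a negative score. ROC AUC is computed as the fraction of (positive, negative) pairs ranked correctly, which equals the area under the ROC curve when there are no such ties.

```python
def roc_auc(pos_scores, neg_scores):
    """Fraction of (positive, negative) pairs in which the positive scores higher."""
    wins = sum(1 for p in pos_scores for n in neg_scores if p > n)
    return wins / (len(pos_scores) * len(neg_scores))


def average_precision(pos_scores, neg_scores):
    """Mean of the precision measured at the rank of each positive sample."""
    total = 0.0
    for k, p in enumerate(sorted(pos_scores, reverse=True), start=1):
        fp = sum(1 for n in neg_scores if n > p)  # negatives ranked above this positive
        total += k / (k + fp)                     # precision = TP / (TP + FP)
    return total / len(pos_scores)


pos = [0.9, 0.8, 0.6]
neg_balanced = [0.7, 0.4, 0.2]    # 1:1 ratio of positives to negatives
neg_skewed = neg_balanced * 10    # same score distribution, 10x as many negatives

auc_balanced = roc_auc(pos, neg_balanced)
auc_skewed = roc_auc(pos, neg_skewed)
ap_balanced = average_precision(pos, neg_balanced)
ap_skewed = average_precision(pos, neg_skewed)

print("ROC AUC:", round(auc_balanced, 3), "->", round(auc_skewed, 3))  # 0.889 -> 0.889
print("AP:     ", round(ap_balanced, 3), "->", round(ap_skewed, 3))    # 0.917 -> 0.744
```

TPR and FPR are each normalized by their own class size, so copying the negatives leaves the ROC curve unchanged. Precision mixes both classes in its denominator, so the extra false positives pull the PR curve down.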