Anyone who has done research in image recognition, machine learning, or information retrieval knows that the experimental section of a paper has to compare against other people's algorithms. But compare how? If I say my method is good and you say your method is good, and everyone just does their own thing, nothing ever gets settled. So over time a convention formed: use the ROC curve and the PR curve to measure the merits and shortcomings of an algorithm. A detailed introduction to ROC and PR curves can be found in the references:
- "ROC Analysis and the ROC Convex Hull"
- Tom Fawcett, "An Introduction to ROC Analysis"
- Jesse Davis, Mark Goadrich, "The Relationship Between Precision-Recall and ROC Curves", plus the PowerPoint notes accompanying the paper
These three references are enough: they are excellent on both the applied and the theoretical side.
Basic concepts
- True Positives (TP): the number of samples predicted as positive that are actually positive
- False Positives (FP): the number of samples predicted as positive that are actually negative (wrongly predicted as positive, hence "false")
- True Negatives (TN): the number of samples predicted as negative that are actually negative
- False Negatives (FN): the number of samples predicted as negative that are actually positive (wrongly predicted as negative, hence "false")
Then some primary-school arithmetic follows (a small code check comes after this list):
- TP + FP + TN + FN: total number of samples
- TP + FN: actual number of positive samples
- FP + TN: actual number of negative samples
- TP + FP: total number of samples predicted as positive
- TN + FN: total number of samples predicted as negative

This is a bit confusing, so here is a way to keep it straight: summing terms with the same suffix (P or N) gives the totals *predicted* as positive or negative, while the actual totals pair a T term with an F term whose four letters are all different (TP + FN, FP + TN). The actual class is read from the two letters together: TP ("true" + "positive") is an actual positive sample, while FP ("false" + "positive") is actually a negative sample.
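To make these counts concrete, here is a minimal Python sketch (my own illustration, not code from the article; the 'p'/'n' labels are an assumption, matching the score table further below):

```python
def confusion_counts(actual, predicted):
    """Count TP, FP, TN, FN for binary labels 'p' (positive) / 'n' (negative)."""
    tp = sum(1 for a, h in zip(actual, predicted) if a == 'p' and h == 'p')
    fp = sum(1 for a, h in zip(actual, predicted) if a == 'n' and h == 'p')
    tn = sum(1 for a, h in zip(actual, predicted) if a == 'n' and h == 'n')
    fn = sum(1 for a, h in zip(actual, predicted) if a == 'p' and h == 'n')
    return tp, fp, tn, fn

# Hypothetical toy data, just to check the identities from the list above.
actual    = ['p', 'p', 'p', 'n', 'n', 'n']
predicted = ['p', 'p', 'n', 'p', 'n', 'n']
tp, fp, tn, fn = confusion_counts(actual, predicted)

assert tp + fp + tn + fn == len(actual)    # total number of samples
assert tp + fn == actual.count('p')        # actual positives
assert fp + tn == actual.count('n')        # actual negatives
assert tp + fp == predicted.count('p')     # predicted positives
print(tp, fp, tn, fn)                      # 2 1 2 1
```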
ROC Curve and PR curve
True positive rate (TPR) and false positive rate (FPR) form the y-axis and x-axis of the ROC curve, respectively.
- TPR = TP / (TP + FN): the fraction of actual positive samples that are predicted correctly
- FPR = FP / (FP + TN): the fraction of actual negative samples that are wrongly predicted as positive
For an ideal learning algorithm that predicts everything correctly, TPR = 100% and FPR = 0, so the larger the TPR and the smaller the FPR, the better. Could we get away with only one of the two as a metric? Consider this situation: an image of 600x480 pixels in which the target (the positive samples) covers only 100 pixels. If an algorithm simply marks all 600x480 pixels as target, then TPR = 100%, but FPR is also 100%. The TPR looks satisfactory, yet the result is not at all what we want, because the FPR is far too high.
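To make the arithmetic concrete, here is the example worked out in a short Python snippet (a sketch of my own, using only the numbers given above):

```python
# The "mark every pixel as target" classifier from the example above.
total     = 600 * 480          # 288000 pixels in the image
positives = 100                # target pixels (actual positives)
negatives = total - positives  # 287900 background pixels (actual negatives)

tp = positives   # every actual positive is marked positive
fn = 0
fp = negatives   # every actual negative is also marked positive
tn = 0

tpr = tp / (tp + fn)   # = 1.0: perfect on the TPR axis
fpr = fp / (fp + tn)   # = 1.0: useless on the FPR axis
print(tpr, fpr)
```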
Precision and recall form the y-axis and x-axis of the PR curve, respectively.
- precision = TP / (TP + FP): of the samples predicted as positive, the fraction that are predicted correctly.
- recall = TP / (TP + FN): recall is interesting in that it is in fact equal to TPR; compared with precision, only the denominator changes, from the total number of predicted positives to the total number of actual positives.
Similarly, precision and recall are considered together to judge how good an algorithm is. All of these quantities can be summarized in one place:
Figure: Confusion Matrix
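As a small illustration (a minimal sketch reusing the hypothetical counts from the earlier snippet, not code from the article):

```python
def precision(tp, fp):
    return tp / (tp + fp)   # correct fraction of everything predicted positive

def recall(tp, fn):
    return tp / (tp + fn)   # correct fraction of everything actually positive (= TPR)

tp, fp, tn, fn = 2, 1, 2, 1        # counts from the earlier toy data
print(precision(tp, fp))           # 2/3
print(recall(tp, fn))              # 2/3, identical to TPR by definition
```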
Since ROC and PR each consider two indicators at once, what if one algorithm wins on one indicator and another algorithm wins on the other? Draw the ROC space and look: plot TPR and FPR on the two axes, and then, moving along the anti-diagonal, the closer a point lies to the upper-left corner, the better the algorithm. (Because ROC and PR are similar, only the ROC space and ROC curve are discussed below.)
Figure: ROC Space
For a classification algorithm, one fixed classification result corresponds to a single point in ROC space. However, the output of a classifier is usually a score, as with SVMs or neural networks, like the following prediction results:
Table: Typical classifier results, given as a score table
| no. | true | hyp | score   |
|-----|------|-----|---------|
| 1   | p    | y   | 0.99999 |
| 2   | p    | y   | 0.99999 |
| 3   | p    | y   | 0.99993 |
| 4   | p    | y   | 0.99986 |
| 5   | p    | y   | 0.99964 |
| 6   | p    | y   | 0.99955 |
| 7   | n    | y   | 0.68139 |
| 8   | n    | y   | 0.50961 |
| 9   | n    | n   | 0.48880 |
| 10  | n    | n   | 0.44951 |
true is the actual class of the sample and hyp is the predicted class; the fourth column is the score. hyp is usually obtained by setting a threshold on the score: in the table above the threshold is 0.5, so score > 0.5 is predicted as a positive sample and anything below 0.5 as a negative sample. In this way only a single point in ROC space can be computed. To evaluate the algorithm more comprehensively, take a series of different thresholds to obtain several points in ROC space; the curve these points trace out is the ROC curve.
Figure: ROC Curve plotting
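To show the threshold-sweeping idea in code, here is a short Python sketch using the data from the table above (an illustration of the idea only, not the prec_rec.m implementation linked below):

```python
# Each distinct score is taken as a threshold; every sample with a score at
# or above the threshold is predicted positive, giving one (FPR, TPR) point.
true  = ['p', 'p', 'p', 'p', 'p', 'p', 'n', 'n', 'n', 'n']
score = [0.99999, 0.99999, 0.99993, 0.99986, 0.99964,
         0.99955, 0.68139, 0.50961, 0.48880, 0.44951]

P = true.count('p')   # actual positives
N = true.count('n')   # actual negatives

roc_points = []
for thr in sorted(set(score), reverse=True):
    tp = sum(1 for t, s in zip(true, score) if s >= thr and t == 'p')
    fp = sum(1 for t, s in zip(true, score) if s >= thr and t == 'n')
    roc_points.append((fp / N, tp / P))   # (FPR, TPR)

for fpr, tpr in roc_points:
    print(f"FPR={fpr:.2f}  TPR={tpr:.2f}")
```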
Once this basic point is understood, plotting the ROC curve in detail is routine, and plenty of code already exists. Reference 1 provides Perl code that plots the ROC curve directly from the scores, and a MATLAB implementation is also available; download links:
- Local: prec_rec.m
- MathWorks: prec_rec.m
With the ROC curve, a richer evaluation indicator becomes available: in ROC space, the more an algorithm's ROC curve bulges toward the northwest (upper left), the better. Sometimes the ROC curves of different classification algorithms cross, which is why many papers use the AUC (Area Under the Curve), the area under the ROC curve, as the criterion for judging an algorithm's quality. For the convexity theory involved here, refer to reference 2 at the beginning of the article.
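Once the ROC points are known, AUC is straightforward to compute; here is a rough sketch using the trapezoidal rule (my own illustration, with the points produced by the sweep above hard-coded so the snippet stands alone):

```python
def auc(points):
    """Trapezoidal area under a list of (fpr, tpr) points, origin prepended."""
    pts = sorted(points)
    if pts[0] != (0.0, 0.0):
        pts.insert(0, (0.0, 0.0))
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0   # trapezoid between neighbors
    return area

# ROC points from the score table: all positives outrank all negatives, so
# the curve hugs the left and top edges and the area is exactly 1.0.
pts = [(0.0, 1/3), (0.0, 0.5), (0.0, 2/3), (0.0, 5/6), (0.0, 1.0),
       (0.25, 1.0), (0.5, 1.0), (0.75, 1.0), (1.0, 1.0)]
print(auc(pts))   # 1.0
```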
Unlike the ROC curve, where bulging toward the upper left is better, the PR curve is better the more it bulges toward the upper right. Here is a simple comparison of the convexity of the two curves:
Figure: Comparison of the same algorithms in ROC space and PR space
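The same threshold sweep also yields PR points if precision and recall are computed at each threshold; here is a sketch mirroring the ROC code above (again my own illustration, same table data):

```python
true  = ['p', 'p', 'p', 'p', 'p', 'p', 'n', 'n', 'n', 'n']
score = [0.99999, 0.99999, 0.99993, 0.99986, 0.99964,
         0.99955, 0.68139, 0.50961, 0.48880, 0.44951]
P = true.count('p')

# One (recall, precision) point per threshold, highest threshold first.
for thr in sorted(set(score), reverse=True):
    tp = sum(1 for t, s in zip(true, score) if s >= thr and t == 'p')
    fp = sum(1 for t, s in zip(true, score) if s >= thr and t == 'n')
    print(f"thr={thr:.5f}  recall={tp / P:.2f}  precision={tp / (tp + fp):.2f}")
```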
As a metric, either ROC or PR can be chosen. However, reference 3 shows that although ROC and PR start from the same confusion matrix, they do not necessarily lead to the same conclusion; when writing a paper, one can only choose by referring to what others have used.