Recall, Precision, F-measure, E value, Sensitivity, Specificity, Missed Diagnosis Rate, Misdiagnosis Rate, ROC, AUC


Reference: Berkeley Computer Vision Page, Performance Evaluation

Classification performance metrics for machine learning: ROC curve, AUC value, precision, recall

True Positives (TP): the number of samples predicted as positive that are actually positive.
False Positives (FP): the number of samples predicted as positive that are actually negative.
True Negatives (TN): the number of samples predicted as negative that are actually negative.
False Negatives (FN): the number of samples predicted as negative that are actually positive.

As shown in the figure below, the green semicircle is TP (true positives), the red semicircle is FP (false positives), the gray rectangle on the left (excluding the green semicircle) is FN (false negatives), and the light gray rectangle on the right (excluding the red semicircle) is TN (true negatives). Together, the green and red circle represents the samples that the model classifies as positive.
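
As a concrete illustration, here is a minimal Python sketch (the label and prediction lists are made up; the variable names are illustrative, not from the original article) that counts the four quantities:

    # Count TP, FP, TN, FN from binary labels and predictions (1 = positive, 0 = negative).
    y_true = [1, 1, 1, 0, 0, 0, 1, 0]   # hypothetical actual labels
    y_pred = [1, 0, 1, 1, 0, 0, 1, 0]   # hypothetical model predictions

    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    print(tp, fp, tn, fn)   # 3 1 3 1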


The two most basic indicators in information retrieval, classification, recognition, translation, and similar fields are:

Recall = (number of relevant documents retrieved) / (total number of relevant documents in the system); it measures the completeness (recall) of the retrieval system.

Precision = (number of relevant documents retrieved) / (total number of documents retrieved); it measures the precision of the retrieval system.

The diagram shows the following:

In general, precision measures how many of the retrieved items (documents, web pages, etc.) are relevant, while recall measures how many of the relevant items were actually retrieved.

Note: precision and recall influence each other. Ideally both would be high, but in practice high precision usually comes with low recall, and high recall with low precision; if both are low, something is wrong with the model or the data. Typically, precision and recall are computed at a series of different thresholds, as shown below:

If you are building a search system, improve precision while guaranteeing recall;
if you are building disease monitoring or anti-spam, improve recall while guaranteeing precision.
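
To make the trade-off concrete, here is a minimal Python sketch (the scores and labels are made up) that computes precision and recall at several thresholds; as the threshold rises, precision goes up while recall goes down:

    # Precision and recall at different thresholds (made-up scores and labels).
    y_true = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
    scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1, 0.55, 0.45]

    for thr in (0.3, 0.5, 0.7):
        y_pred = [1 if s > thr else 0 for s in scores]
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn)
        print(f"threshold={thr}: precision={precision:.2f}, recall={recall:.2f}")
    # precision rises (0.71 -> 0.80 -> 1.00) while recall falls (1.00 -> 0.80 -> 0.40)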

Comprehensive Evaluation Index (F-MEASURE)

The P and R indicators sometimes conflict with each other, so they need to be considered together; the most common method is the F-measure (also known as the F-score).

The F-measure is a weighted harmonic mean of precision and recall:

F = (a^2 + 1)/(1/p + a^2/r)
  = (a^2 + 1)*p*r/(a^2*p + r)

When the parameter a = 1, this becomes the most common F1:

F1 = 2*p*r/(p+r)

Sometimes we weigh precision and recall differently; for example, sometimes we care more about precision. A parameter β is used to express the relative importance of the two: if β > 1, recall has a greater impact, and if β < 1, precision has a greater impact. Naturally, when β = 1 the two are weighted equally and the measure reduces to the F1 form above. The F-measure with the weighting parameter β is written Fβ, and its strict mathematical definition is:

Fβ = (β^2 + 1)*p*r/(β^2*p + r)
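
For example, a minimal Python sketch of Fβ (the helper name f_beta is mine, not from the article), using the precision and recall values from the carp example further below:

    # F-beta from precision p and recall r: (1 + beta^2) * p * r / (beta^2 * p + r).
    def f_beta(p, r, beta=1.0):
        return (1 + beta**2) * p * r / (beta**2 * p + r)

    p, r = 0.7, 0.5                  # precision and recall from the carp example below
    print(f_beta(p, r))              # beta = 1 gives F1, about 0.583
    print(f_beta(p, r, beta=2.0))    # beta > 1 weights recall more, about 0.530
    print(f_beta(p, r, beta=0.5))    # beta < 1 weights precision more, about 0.648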

It is easy to see that F1 combines the results of P and R into a single score.
F1 is a common accuracy measure for imbalanced classification data, while the R^2 score measures the accuracy of regression models.
For more on R^2, see "Accurately Measuring Model Prediction Error".

The E value is a weighted measure of precision and recall; consistent with the Fβ definition above, b > 1 places more emphasis on r (recall):

E = 1 - (b^2 + 1)/(1/p + b^2/r)
  = 1 - (b^2 + 1)*p*r/(b^2*p + r)
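
Note that, comparing this with the F-measure formula above (taking a = b), E = 1 - F, so a smaller E value corresponds to a larger F-measure.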

Example: a pond contains 1400 carp, 300 shrimp, and 300 turtles. The goal is to catch carp. We cast a large net and catch 700 carp, 200 shrimp, and 100 turtles. The metrics are then:

Precision = 700/(700 + 200 + 100) = 70%
Recall = 700/1400 = 50%
F value = 2 * 70% * 50%/(70% + 50%) ≈ 58.3%

Now suppose we sweep up everything in the pond, catching all the carp, shrimp, and turtles. The metrics become:

Precision = 1400/(1400 + 300 + 300) = 70%
Recall = 1400/1400 = 100%
F value = 2 * 70% * 100%/(70% + 100%) ≈ 82.35%

As can be seen, precision is the proportion of the target category within what was caught; recall, as the name implies, is the proportion of the target category that was recalled from the area of concern; and the F value combines the two into a single overall indicator.

Of course, we would like both precision and recall to be as high as possible, but in practice the two can be contradictory. In the extreme case where we return only one result and it is correct, precision is 100% but recall is very low; if we return all results, recall is 100% but precision will be very low. Therefore, in different applications you need to decide whether you want higher precision or higher recall. If you are doing an experimental study, you can draw a precision-recall curve to help with the analysis, as in the sketch below.
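
A minimal sketch of drawing such a curve (assuming scikit-learn and matplotlib are available; the labels and scores are made-up example arrays):

    # Precision-recall curve (assumes scikit-learn and matplotlib are installed).
    import matplotlib.pyplot as plt
    from sklearn.metrics import precision_recall_curve

    y_true = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]                          # made-up labels
    scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1, 0.55, 0.45]    # made-up classifier scores

    precision, recall, thresholds = precision_recall_curve(y_true, scores)
    plt.plot(recall, precision)
    plt.xlabel("Recall")
    plt.ylabel("Precision")
    plt.title("Precision-recall curve")
    plt.show()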

Sensitivity and false negative rate (missed diagnosis rate), specificity and false positive rate (misdiagnosis rate)

Sensitivity (also known as the true positive rate) = true positives/(true positives + false negatives) * 100%.

It measures how well actual patients are identified, i.e., the percentage of people who actually have the disease and are correctly diagnosed as diseased.

Specificity (also known as the true negative rate) = true negatives/(true negatives + false positives) * 100%.

It measures how well non-patients are identified, i.e., the percentage of people who do not have the disease and are correctly diagnosed as disease-free.
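
As a quick numerical illustration (the screening counts below are hypothetical, not from the article):

    # Sensitivity and specificity from hypothetical screening counts.
    tp, fn = 90, 10     # 100 people with the disease: 90 detected, 10 missed
    tn, fp = 950, 50    # 1000 people without the disease: 950 cleared, 50 wrongly flagged

    sensitivity = tp / (tp + fn)   # 0.90, the true positive rate
    specificity = tn / (tn + fp)   # 0.95, the true negative rate
    print(sensitivity, specificity)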

ROC and AUC

ROC and AUC are indicators for evaluating classifiers.

ROC stands for Receiver Operating Characteristic.
The ROC curve focuses on two indicators:

True Positive Rate (TPR)  = TP/(TP + FN)

TPR is the probability that a positive example is classified correctly (as positive).

False Positive Rate (FPR) = FP/(FP + TN)

FPR is the probability that a negative example is misclassified as positive.

Vertical axis: true positive rate (hit rate), TPR, also called sensitivity: among all actual positive examples, the proportion correctly identified as positive.

Horizontal axis: false positive rate (false alarm rate), FPR, equal to (1 - specificity): among all actual negative examples, the proportion incorrectly identified as positive.

In ROC space, each point has FPR as its horizontal coordinate and TPR as its vertical coordinate, which depicts the classifier's trade-off between true positives (TP) and false positives (FP).

With TPR (recall) on the y-axis and FPR (1 - specificity) on the x-axis, we get the ROC curve. From the definitions of recall and FPR, the higher the recall and the lower the FPR, the better our model or algorithm; that is, the closer the ROC curve is to the upper left, the better, as shown in the diagram on the left. From a geometric point of view, the larger the area under the ROC curve, the better the model. For this reason, the area under the ROC curve, the AUC (Area Under Curve) value, is often used as the evaluation standard for algorithms and models.

The main analytical tool of ROC analysis is the ROC curve drawn in ROC space. For a binary classification problem, the classifier usually outputs a continuous score for each instance, and we assign instances to the positive or negative class by setting a threshold (for example, scores greater than the threshold are classified as positive). By varying the threshold, classifying according to each threshold, and computing the corresponding point in ROC space for each classification result, we can connect these points to form the ROC curve. The ROC curve passes through (0, 0) and (1, 1); the straight line between (0, 0) and (1, 1) represents a random classifier. In general, the curve should lie above the line through (0, 0) and (1, 1), as shown in the figure.

The ROC curve is used to evaluate classifier performance. Each classification result on the test set yields one (FPR, TPR) point. By adjusting the classifier's threshold (for example from 0.1 to 0.9), instances are divided into positive or negative classes (scores greater than the threshold are classified as positive). Different thresholds therefore produce different classification results and multiple points, which can be connected into a curve passing through (0, 0) and (1, 1).

The curve lies above the diagonal, and the farther it is from the diagonal, the better the classification. If it appears below the diagonal (toward the lower right), the intuitive remedy is to invert all predictions: the classifier is trying to identify positive examples but does so poorly, so we flip its output, treating predicted positives as negatives and predicted negatives as positives, to obtain a good classifier. In this sense, the worse the original classifier, the better the inverted one.

Using the ROC curve to represent classifier performance is intuitive and useful. However, people often want a single number to summarize the quality of a classifier, which is where the Area Under the ROC Curve (AUC) comes in. As the name implies, the AUC value is the area of the region under the ROC curve. In general, AUC is between 0.5 and 1.0, and a larger AUC indicates better performance.
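
A minimal sketch of computing the ROC curve points and the AUC value (assuming scikit-learn is available; the labels and scores are the same made-up arrays as in the earlier sketches):

    # ROC curve and AUC (assumes scikit-learn is installed).
    from sklearn.metrics import roc_curve, roc_auc_score

    y_true = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
    scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1, 0.55, 0.45]

    fpr, tpr, thresholds = roc_curve(y_true, scores)   # points of the ROC curve
    auc_value = roc_auc_score(y_true, scores)          # area under the ROC curve
    print(auc_value)                                   # 0.88 for this toy data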

P/R and ROC are two different evaluation indicators and calculation methods: in general, retrieval tasks use the former, while classification, recognition, and similar tasks use the latter.

With precision on the y-axis and recall on the x-axis, we get the PR curve. Again from the definitions of precision and recall, the higher both are, the better the model or algorithm; that is, the closer the PR curve is drawn to the upper right, the better.

Precision = TP/(TP + FP)

Reflects the proportion of samples judged positive by the classifier that are truly positive.

False alarm rate (False Alarm) FA = FP/(TP + FP) = 1 - precision

Reflects the proportion of samples judged positive by the classifier that are actually negative.

Recall = TP/(TP + FN) = 1 - FN/P, where P = TP + FN is the total number of actual positives.

Also known as the true positive rate; reflects the proportion of actual positive examples that are correctly identified.

Missed alarm rate (Missing Alarm) MA = FN/(TP + FN) = 1 - TP/P = 1 - recall

Reflects the proportion of actual positive examples that are missed (falsely judged negative).
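
The complementary relations FA = 1 - precision and MA = 1 - recall are easy to verify numerically; a small sketch with arbitrary example counts:

    # False-alarm and missing-alarm rates from arbitrary example counts.
    tp, fp, fn = 70, 30, 20

    precision = tp / (tp + fp)   # 0.70
    recall = tp / (tp + fn)      # about 0.78
    fa = fp / (tp + fp)          # false alarm rate,   equals 1 - precision = 0.30
    ma = fn / (tp + fn)          # missing alarm rate, equals 1 - recall, about 0.22
    print(fa, 1 - precision, ma, 1 - recall)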

F-measure or balanced F-score

F = 2 * precision * recall/(precision + recall)

This is what is traditionally called the F1 measure.

Sensitivity (also known as the true positive rate) = TP/(TP + FN)

True Positive Rate (TPR)  = TP/(TP + FN)

Measures how well actual patients are identified: the percentage of those who actually have the disease that are correctly judged as diseased by the screening criteria.

False negative rate (FNR) = FN/(TP + FN)
                          = 1 - sensitivity
                          = 1 - TPR

The false negative rate is also called the missed diagnosis rate: the percentage of people who actually have the disease but are judged to be non-patients by the screening criteria.

Specificity (true negative rate) = TN/(FP + TN)
True negative rate (TNR) = TN/(FP + TN)

That is, the percentage of people who do not have the disease and are correctly judged disease-free according to the diagnostic criteria.

False positive rate (FPR) = FP/(FP + TN)
                          = 1 - specificity
                          = 1 - TNR

The false positive rate is also called the misdiagnosis rate: the percentage of people who are actually disease-free but are judged diseased according to the diagnostic criteria.

To sum up, recall and sensitivity are the same concept; the other metrics each differ somewhat.

A reference on the ROC and PR indicators for classification algorithms:
http://blog.csdn.net/jiandanjinxin/article/details/51841726

MATLAB Implementation

AUC Calculation

ROC Calculation

Precision-recall and ROC Curves

Matlab code for Precision/recall, ROC, accuracy, f-measure

This article refers to the following pages:
Precision and recall, ROC curves and PR curves
http://bookshadow.com/weblog/2014/06/10/precision-recall-f-measure/
http://blog.csdn.net/marising/article/details/6543943
http://blog.sina.com.cn/s/blog_4dff58fc010176ax.html
http://blog.sina.com.cn/s/blog_49ea41a20102w4kd.html
http://blog.csdn.net/wangran51/article/details/7579100
