Recommender System Metrics: Precision, Recall, and F-measure



The following is a brief overview of common recommender system evaluation metrics:

1. Precision and Recall

Precision and recall are two measures widely used in information retrieval and statistical classification to evaluate the quality of results. Precision is the ratio of the number of relevant documents retrieved to the total number of documents retrieved, and measures how exact the retrieval system is; recall is the ratio of the number of relevant documents retrieved to the total number of relevant documents in the collection, and measures how complete the retrieval is.

In short, precision measures how many of the retrieved items (documents, web pages, etc.) are correct, while recall measures how many of all the correct items were actually retrieved.

Precision, recall, and the F value are important evaluation indexes for picking out targets from a mixed environment. First, their definitions:

1. Precision = number of correct entries extracted / total number of entries extracted

2. Recall = number of correct entries extracted / total number of correct entries in the sample

Both values lie between 0 and 1; the closer the value is to 1, the higher the precision or recall.

3. F value = 2 * precision * recall / (precision + recall) (the F value is the harmonic mean of precision and recall)

Take this example: a pond contains 1400 carp, 300 shrimp, and 300 turtles. The goal is to catch carp. A large net brings up 700 carp, 200 shrimp, and 100 turtles. The metrics are then:

Precision = 700 / (700 + 200 + 100) = 70%

Recall = 700 / 1400 = 50%

F value = 2 * 70% * 50% / (70% + 50%) ≈ 58.3%

Now suppose every carp, shrimp, and turtle in the pond is caught. How do the metrics change?

Precision = 1400 / (1400 + 300 + 300) = 70%

Recall = 1400 / 1400 = 100%

F value = 2 * 70% * 100% / (70% + 100%) ≈ 82.35%

Thus, precision is the proportion of target items among everything that was caught; recall, as the name suggests, is the proportion of the target class that was recovered from the area of concern; and the F value combines the two into a single comprehensive indicator.
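As a minimal sketch, the carp-pond numbers above can be reproduced with a few lines of Python (the function name is our own, not from the original post):

[Python]
# Precision, recall and F value from raw counts (no external libraries needed).
def precision_recall_f(relevant_retrieved, total_retrieved, total_relevant):
    precision = relevant_retrieved / total_retrieved
    recall = relevant_retrieved / total_relevant
    f = 2 * precision * recall / (precision + recall)
    return precision, recall, f

# First net: 700 carp out of 1000 animals caught; the pond holds 1400 carp.
print(precision_recall_f(700, 700 + 200 + 100, 1400))    # (0.7, 0.5, ~0.583)
# Second net: everything is caught.
print(precision_recall_f(1400, 1400 + 300 + 300, 1400))  # (0.7, 1.0, ~0.824)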

Of course, we would like both precision and recall to be as high as possible, but in some cases the two conflict. In an extreme case, if we return only a single result and it is correct, precision is 100% but recall is very low; if we return every item, recall is 100% but precision will be very low. Different settings therefore call for a judgement about whether precision or recall matters more. If you are doing experimental research, you can plot a precision-recall curve to help with the analysis.
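For example, such a curve can be plotted from a model's scores in a few lines. This is only a sketch: it assumes scikit-learn and matplotlib are available, and the labels and scores below are made up.

[Python]
# Plot a precision-recall curve from ground-truth labels and model scores.
import matplotlib.pyplot as plt
from sklearn.metrics import precision_recall_curve

y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]                          # made-up relevance labels
y_score = [0.9, 0.8, 0.75, 0.6, 0.55, 0.5, 0.4, 0.3, 0.2, 0.1]   # made-up model scores

precision, recall, thresholds = precision_recall_curve(y_true, y_score)
plt.plot(recall, precision)
plt.xlabel("Recall")
plt.ylabel("Precision")
plt.title("Precision-Recall curve")
plt.show()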

2. Comprehensive evaluation index (F-measure)

The P and R indicators sometimes conflict, so they need to be considered together; the most common way to do this is the F-measure (also known as the F-score).

The F-measure is the weighted harmonic mean of precision and recall:

F = (α² + 1) * P * R / (α² * P + R)

When the parameter α = 1, this becomes the most common form, F1:

F1 = 2 * P * R / (P + R)

F1 combines the results of P and R into a single number; the higher the F1, the more effective the method under test.
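As a small illustration, assuming scikit-learn is installed (note that scikit-learn calls the weight parameter beta rather than α) and using made-up labels:

[Python]
# fbeta_score with beta=1 is exactly F1; beta > 1 shifts the weight toward recall.
from sklearn.metrics import f1_score, fbeta_score

y_true = [1, 0, 1, 1, 0, 1]   # made-up ground truth
y_pred = [1, 0, 0, 1, 1, 1]   # made-up predictions

print(fbeta_score(y_true, y_pred, beta=1))   # same value as f1_score below
print(f1_score(y_true, y_pred))
print(fbeta_score(y_true, y_pred, beta=2))   # recall weighted more heavily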

3. E value

The E value is a weighted combination of precision P and recall R; when either of them is 0, the E value is 1. It is calculated as:

E = 1 - (1 + b²) * P * R / (b² * P + R)

The larger b is, the greater the weight given to recall; values of b below 1 weight precision more heavily.
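A tiny sketch of the E value as written above (pure Python; the function name is our own), showing that it is 1 whenever P or R is 0:

[Python]
# E value: 1 minus the weighted harmonic combination of precision and recall.
def e_value(p, r, b=1.0):
    if p == 0 or r == 0:
        return 1.0               # worst case: one of the two is zero
    return 1 - (1 + b**2) * p * r / (b**2 * p + r)

print(e_value(0.7, 0.5))   # 1 - F1 of the first fishing example, ≈ 0.417
print(e_value(0.0, 0.9))   # 1.0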

4. Average Precision (AP)

Average precision is the mean of the precision values obtained at different recall points; equivalently, it approximates the area under the precision-recall curve, AP = ∫₀¹ p(r) dr.
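For a single ranked result list, AP can be computed by averaging the precision at every rank where a relevant item appears; a minimal sketch (function name and example ranking are our own):

[Python]
# Average precision of one ranked list: the mean of precision@k over the ranks k
# at which a relevant item occurs.
def average_precision(relevance):        # relevance: 1/0 flag per ranked result
    hits, precisions = 0, []
    for k, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            precisions.append(hits / k)  # precision at this recall point
    return sum(precisions) / hits if hits else 0.0

print(average_precision([1, 0, 1, 1, 0]))  # (1/1 + 2/3 + 3/4) / 3 ≈ 0.806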



In information retrieval and classification systems there is a whole series of evaluation indicators. Being clear about these indicators is very important for evaluating retrieval and classification performance, so the following is a summary based on several blog posts.

Precision, Recall, F1

The two most basic indicators in information retrieval, classification, recognition, translation, and related fields are the recall rate (Recall) and the precision rate (Precision). The conceptual formulas are:

Recall = number of relevant documents retrieved by the system / total number of relevant documents in the collection

Precision = number of relevant documents retrieved by the system / total number of documents retrieved by the system

The diagram below illustrates this:

[Figure 1: a 2 × 2 contingency diagram. A = relevant and retrieved, B = irrelevant but retrieved, C = relevant but not retrieved, D = irrelevant and not retrieved. In this notation, Precision = A / (A + B) and Recall = A / (A + C).]
Note: Precision and recall influence each other. Ideally both would be high, but in general high precision comes with low recall and high recall comes with low precision (if both are low, something is wrong with the system). In practice, precision and recall are computed for a set of different decision thresholds and plotted, as shown below:

[Figure: precision and recall computed over a range of decision thresholds.]
For search, the goal is to improve precision while guaranteeing recall; for disease monitoring or spam filtering, the goal is to improve recall while guaranteeing precision.

Therefore, when both need to be high, F1 can be used as a single measure: F1 = 2 * P * R / (P + R)
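In code this is straightforward. A short sketch, assuming scikit-learn is installed and using made-up ground-truth and predicted labels; the comments map the results back to the A, B, C regions of Figure 1:

[Python]
from sklearn.metrics import precision_score, recall_score, f1_score

y_true = [1, 0, 1, 1, 0, 1, 0, 1]   # made-up manual labels
y_pred = [1, 0, 0, 1, 1, 1, 0, 0]   # made-up system output

print(precision_score(y_true, y_pred))   # A / (A + B)
print(recall_score(y_true, y_pred))      # A / (A + C)
print(f1_score(y_true, y_pred))          # 2 * P * R / (P + R)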

The formulas themselves are simple, but how are a, b, c, and d in Figure 1 obtained? They require manual labeling, which is time-consuming and tedious; if you are only running experiments, a ready-made annotated corpus can be used. Another option is to take the output of a more mature algorithm as the reference and compare against it, but this has an obvious flaw: if such a good algorithm already exists, there would be no need for further research.

AP and MAP (Mean Average Precision)

MAP addresses the limitation that P, R, and the F-measure are single-point values. To obtain an indicator that reflects global performance, look at the figure below, which shows the precision-recall curves of two retrieval systems, one marked with square points and the other with round points.

Although the two curves cross, the system marked with round points clearly performs better overall than the system marked with squares.

From this we can see that a well-performing system should have a curve that bulges outward as far as possible.

More concretely, the area enclosed between the curve and the axes should be as large as possible.

Ideally this area is 1, and any reasonable system encloses an area greater than 0. This is the most commonly used performance indicator for evaluating information retrieval systems. Mean average precision (MAP) is defined as follows (where p and r denote precision and recall): for a single query, AP = ∫₀¹ p(r) dr, i.e., the area under its precision-recall curve; MAP is the mean of the AP values over all queries.
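As a sketch (scikit-learn assumed; relevance labels and scores are made up), MAP is simply the mean of the per-query AP values:

[Python]
# MAP: average the AP over each query's ranked results.
from sklearn.metrics import average_precision_score

queries = [
    ([1, 0, 1, 0, 1], [0.9, 0.7, 0.6, 0.4, 0.2]),  # (relevance, scores) for query 1
    ([0, 1, 1, 0, 0], [0.8, 0.7, 0.5, 0.3, 0.1]),  # query 2
]
aps = [average_precision_score(y, s) for y, s in queries]
print(sum(aps) / len(aps))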

ROC and AUC

ROC and AUC are indicators for evaluating classifiers. They reuse the A, B, C, D regions of Figure 1 with slightly different labels: A = TP (true positive), B = FP (false positive), C = FN (false negative), D = TN (true negative).



Back to ROC. Its full name is Receiver Operating Characteristic.

ROC is concerned with two indicators:

True Positive Rate (TPR) = TP / (TP + FN); TPR is the probability that an actual positive example is classified as positive.

False Positive Rate (FPR) = FP / (FP + TN); FPR is the probability that an actual negative example is incorrectly classified as positive.

In ROC space, each point has FPR as its horizontal coordinate and TPR as its vertical coordinate, which depicts the classifier's trade-off between true positives and false positives. The main analytical tool is the ROC curve drawn in this space. For a binary classification problem, a classifier typically outputs a continuous score for each instance, and a threshold is set to assign instances to the positive or negative class (for example, scores above the threshold are classified as positive). By varying the threshold, classifying at each setting, and plotting the resulting (FPR, TPR) point in ROC space, we obtain the ROC curve by connecting these points. The curve runs from (0, 0) to (1, 1); in fact, the straight line joining (0, 0) and (1, 1) represents a random classifier, so in general the curve should lie above this diagonal, as shown in the figure below.

[Figure: an ROC curve lying above the diagonal from (0, 0) to (1, 1).]
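A minimal sketch of this threshold sweep, assuming scikit-learn and matplotlib are available and using made-up labels and scores:

[Python]
# Each threshold on the scores yields one (FPR, TPR) point; connecting the
# points gives the ROC curve.  The dashed diagonal is the random classifier.
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve

y_true = [1, 1, 0, 1, 0, 0, 1, 0]
y_score = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]

fpr, tpr, thresholds = roc_curve(y_true, y_score)
plt.plot(fpr, tpr, marker="o")
plt.plot([0, 1], [0, 1], linestyle="--")
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.show()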


The ROC curve expresses a classifier's performance in a very intuitive and easy-to-use way. Still, people often want a single number that summarizes the quality of a classifier.

This is where the Area Under the ROC Curve (AUC) comes in. As the name suggests, the AUC is the area under the ROC curve. Typically the AUC lies between 0.5 and 1.0, and a larger AUC indicates better performance.
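In practice the area is rarely measured by hand; for example, with scikit-learn (assumed installed, same made-up data as in the ROC sketch):

[Python]
from sklearn.metrics import roc_auc_score

y_true = [1, 1, 0, 1, 0, 0, 1, 0]
y_score = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]

print(roc_auc_score(y_true, y_score))   # 1.0 = perfect ranking, 0.5 = random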

AUC Computing Tools:

http://mark.goadrich.com/programs/AUC/

P/R and ROC are two different evaluation measures with different calculation methods. In general, retrieval uses the former, while classification, recognition, and similar tasks use the latter.

Reference Links:

http://www.vanjor.org/blog/2010/11/recall-precision/

http://bubblexc.com/y2011/148/

http://wenku.baidu.com/view/ef91f011cc7931b765ce15ec.html
