Calculation of AUC (Area Under the ROC Curve) and Its Relationship with ROC

Source: Internet
Author: User

Let's start at the beginning. AUC is a criterion for measuring the quality of a classification model. There are many such criteria: classification accuracy, which was the de facto standard in the machine learning literature about 10 years ago; recall and precision, which are commonly used in information retrieval (IR); and so on. A metric reflects what people are pursuing when they ask for "good" classification results. Different metrics in use during the same period reflect different views of what "good" fundamentally means, while shifts in which metric is popular over time reflect a deepening understanding of the problem.

In recent years, as machine learning has moved from the laboratory into practical applications, real-world problems have placed new demands on evaluation metrics. A prominent one is that samples are often distributed unevenly across classes (the class distribution imbalance problem). Under class imbalance, traditional metrics such as accuracy do not properly reflect classifier performance. For example, suppose the test set contains 90 samples of class A and 10 samples of class B. Classifier C1 assigns every test sample to class A. Classifier C2 correctly classifies 70 of the 90 class A samples and 5 of the 10 class B samples. The accuracy of C1 is then 90%, while the accuracy of C2 is 75%. Yet C2 is obviously more useful, because C1 never identifies a single class B sample. In addition, in some classification problems different kinds of errors carry different costs (cost-sensitive learning), in which case the default classification threshold of 0.5 is no longer appropriate either.
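The arithmetic in the C1/C2 example can be checked with a few lines of Python. This is only an illustrative sketch: the function name `accuracy_and_recalls` and the per-class recall figures are not part of the original text and are added here to show why C2 is the more useful classifier despite its lower accuracy.

```python
def accuracy_and_recalls(correct_a, total_a, correct_b, total_b):
    """Return (overall accuracy, recall on class A, recall on class B)."""
    accuracy = (correct_a + correct_b) / (total_a + total_b)
    recall_a = correct_a / total_a
    recall_b = correct_b / total_b
    return accuracy, recall_a, recall_b

# C1 labels every sample as class A: all 90 A samples right, all 10 B samples wrong.
c1 = accuracy_and_recalls(90, 90, 0, 10)
# C2 gets 70 of the 90 A samples and 5 of the 10 B samples right.
c2 = accuracy_and_recalls(70, 90, 5, 10)

print("C1: accuracy=%.2f recall_A=%.2f recall_B=%.2f" % c1)
print("C2: accuracy=%.2f recall_A=%.2f recall_B=%.2f" % c2)
```

C1 wins on accuracy (0.90 against 0.75), but its recall on class B is zero, which is exactly the information that accuracy hides.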

To address these problems, a new way of evaluating classification models, ROC analysis, was introduced from the field of medical analysis. ROC analysis is a rich subject in its own right, and interested readers can search for more on their own. Since my own understanding of ROC analysis is not deep, I give only a brief conceptual introduction here.

ROC stands for receiver operating characteristic. Its main analysis tool is a curve drawn on a two-dimensional plane, the ROC curve. The horizontal axis of the plane is the false positive rate (FPR), and the vertical axis is the true positive rate (TPR). For a given classifier, we can compute one (FPR, TPR) pair from its performance on the test samples, which maps the classifier to a point on the ROC plane. By varying the threshold the classifier uses, we obtain a curve passing through (0, 0) and (1, 1), which is the ROC curve of the classifier. In general, this curve should lie above the straight line connecting (0, 0) and (1, 1), because that line represents a random classifier. If you are unlucky enough to get a classifier that lies below this line, an intuitive remedy is to invert all of its predictions: when the classifier outputs the positive class, take the final result to be the negative class, and vice versa. Representing a classifier's performance with an ROC curve is intuitive and useful, but people always want a single number that indicates how good the classifier is. Hence the area under the ROC curve (AUC). As the name implies, the AUC is the area of the region below the ROC curve. In general the AUC lies between 0.5 and 1.0, and a larger AUC indicates better performance. That completes the introduction. What follows is the main topic of this post: a summary of the ways to calculate the AUC.
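Before turning to the calculations, the threshold sweep that produces the ROC curve can be sketched in Python. This is a minimal illustration rather than a reference implementation. It assumes labels are 1 for the positive class and 0 for the negative class, that a larger score means "more positive", and that both classes are present. The function name `roc_points` is made up for this example.

```python
def roc_points(scores, labels):
    """Return the (FPR, TPR) points of the ROC curve, from (0, 0) to (1, 1)."""
    num_pos = sum(labels)
    num_neg = len(labels) - num_pos
    # Lowering the threshold step by step is the same as walking through
    # the samples in order of decreasing score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Samples with equal scores cross the threshold together, so a point
        # is emitted only after the last sample of each tied group.
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((fp / num_neg, tp / num_pos))
    return points

print(roc_points([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0]))
```

Each threshold contributes one point, and joining the points in order gives the curve from (0, 0) to (1, 1).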

The most intuitive approach follows from the name: the AUC is the area below the ROC curve. This was in fact the common way to compute the AUC in the early machine learning literature. Because the number of test samples is finite, the ROC curve we obtain is necessarily a staircase, so the AUC is the sum of the areas under the steps. To compute it, we first sort the samples by score (assuming that a larger score means a higher probability of belonging to the positive class) and then scan through them once, accumulating the area. There is a complication, however: when several test samples of both classes share the same score, moving the threshold past them does not extend the curve one step up or one step to the right but moves it diagonally, forming a trapezoid whose area must be calculated. This shows that computing the AUC this way is actually somewhat troublesome.
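The sort-and-scan procedure, including the trapezoid that appears when scores are tied, might look like the following sketch. It makes the same assumptions as the text (larger score means more likely positive) plus two of its own: labels are encoded as 1 and 0, and both classes are present. The name `auc_trapezoid` is hypothetical.

```python
def auc_trapezoid(scores, labels):
    """Area under the ROC curve, accumulated one tied group at a time."""
    num_pos = sum(labels)
    num_neg = len(labels) - num_pos
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    area = 0.0
    tp = fp = 0            # counts at the current point
    prev_tp = prev_fp = 0  # counts at the previous point
    for i, (score, label) in enumerate(ranked):
        if label == 1:
            tp += 1
        else:
            fp += 1
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            # Trapezoid between the previous and current point, measured in
            # raw counts: width is the FP increase, height is the mean TP.
            # Without ties this is a rectangle or a zero-width strip.
            area += (fp - prev_fp) * (tp + prev_tp) / 2.0
            prev_tp, prev_fp = tp, fp
    # Convert from count units to rate units (TPR x FPR).
    return area / (num_pos * num_neg)

print(auc_trapezoid([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0]))
```

Working in raw counts and normalizing once at the end keeps the loop free of repeated divisions.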

A very interesting property of the AUC is that it is equivalent to the Wilcoxon-Mann-Whitney test statistic. The proof of this equivalence is left for the next post. The Wilcoxon-Mann-Whitney statistic measures the probability that, for a randomly chosen positive sample and a randomly chosen negative sample, the score of the positive sample is larger than the score of the negative sample. This definition gives another way to calculate the AUC: estimate this probability. With finite samples, the usual way to estimate a probability is by frequency, and the estimate approaches the true value as the sample size grows. This matches the first method, where more samples yield a more accurate AUC, and it is the same principle as numerical integration, where a finer partition of the interval gives a more accurate result. Concretely, among all M×N positive-negative pairs (M is the number of positive samples and N is the number of negative samples), count the pairs in which the positive sample's score is larger than the negative sample's score. A pair in which the two scores are equal counts as 0.5. Then divide the total by M×N. The complexity of this method is O(n^2), where n is the total number of samples (i.e. n = M + N).
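The pair-counting estimate translates almost directly into code. The sketch below assumes labels are 1 for positive and 0 for negative, and that both classes are present. The name `auc_pairwise` is chosen for this example only.

```python
def auc_pairwise(scores, labels):
    """Estimate P(score of positive > score of negative) over all M x N pairs."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0   # positive sample scored higher
            elif p == q:
                wins += 0.5   # a tie counts as half
    return wins / (len(pos) * len(neg))

print(auc_pairwise([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0]))
```

The two nested loops visit every positive-negative pair once, which is where the quadratic cost comes from.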

The third method is essentially the same as the second, but with lower complexity. First sort the samples by score from largest to smallest. The sample with the largest score is given rank n (n = M + N), the sample with the second largest score is given rank n-1, and so on, down to rank 1 for the smallest score. Then add up the ranks of all positive samples and subtract M(M+1)/2, which is the rank sum the positive samples would have if they occupied the M lowest ranks. The result is the number of positive-negative pairs in which the positive sample's score is larger than the negative sample's score. Finally, divide by M×N. That is:

$$\mathrm{AUC} = \frac{\sum_{i \in \text{positive}} \mathrm{rank}_i - \frac{M(M+1)}{2}}{M \times N}$$

Note in particular that samples with equal scores must be assigned the same rank, whether the tied samples belong to the same class or to different classes. To do this, give every sample in a tied group the average of the ranks that the group occupies, and then apply the formula above.
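A sketch of the rank-based method with averaged ranks for ties is given below. It assumes labels are 1 for positive and 0 for negative and that both classes are present. The function name `auc_by_rank` is made up for this example. Sorting in ascending order and numbering from 1 yields the same ranks as sorting in descending order and numbering down from n. The subtracted term is written as M(M+1)/2, the sum 1 + 2 + ... + M. Since sorting dominates the cost, the method runs in O(n log n) time.

```python
def auc_by_rank(scores, labels):
    """AUC from the rank sum of the positive samples, with average ranks for ties."""
    n = len(scores)
    # Indices of the samples in order of increasing score.
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the end j of the group of samples tied with position i.
        j = i
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        # The group occupies 1-based ranks i+1 .. j+1; share their average.
        avg_rank = (i + 1 + j + 1) / 2.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    num_pos = sum(labels)
    num_neg = n - num_pos
    rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum - num_pos * (num_pos + 1) / 2.0) / (num_pos * num_neg)

print(auc_by_rank([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0]))
```

On the same inputs this should agree with the pair-counting method, since each averaged rank contributes 0.5 for every tied positive-negative pair.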
