Using Python to Draw the ROC Curve and Calculate the AUC Value

Source: Internet
Author: User
Tags svm

Preface

The ROC curve and AUC are often used to evaluate the quality of a binary classifier. This article first briefly introduces ROC and AUC, and then uses an example to demonstrate how to draw an ROC curve and calculate AUC in Python.

AUC Introduction

AUC (Area Under Curve) is a very common evaluation metric for binary classification models in machine learning. Compared with the F1-score, it is more tolerant of class imbalance. Common machine learning libraries (such as scikit-learn) generally include a calculation of this metric. However, sometimes a model is standalone or hand-written, and to evaluate the quality of the trained model you have to build your own AUC calculation module. While researching this article, we found that libsvm-tools has a very easy-to-understand AUC calculation, so it is extracted here for future use.

AUC Calculation

AUC calculation is divided into the following three steps:

1. Prepare the data. If only a training set is available during model training, the predictions are generally obtained through cross-validation. If a separate evaluation set is available, the calculation can be done on it directly. The data generally consists of the prediction score and the target category of each sample (note that this is the target category, not the predicted category).

2. Obtain the horizontal (X: False Positive Rate) and vertical (Y: True Positive Rate) coordinates of each point by varying the threshold value.

3. Connect coordinate points into a curve and calculate the area under the curve, that is, the AUC value.
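The three steps above can be sketched on a handful of in-memory samples. This is a minimal illustration, not the libsvm-tools code: the function names `roc_points` and `area_under`, the sample scores, and the use of the trapezoidal rule for the area are all assumptions made for the example. Labels are assumed to be 1 for positive and 0 for negative.

```python
def roc_points(scores, labels):
    """Return ROC points [(fpr, tpr), ...] for labels in {0, 1}."""
    pos = sum(labels)
    neg = len(labels) - pos
    # Step 1: pair each prediction score with its target label, highest score first.
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    # Step 2: lower the threshold one distinct score at a time.
    for i, (score, label) in enumerate(ranked):
        tp += label
        fp += 1 - label
        last_of_this_score = i + 1 == len(ranked) or ranked[i + 1][0] != score
        if last_of_this_score:
            points.append((fp / neg, tp / pos))
    return points


def area_under(points):
    """Step 3: connect the points and sum the trapezoids under the curve."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Made-up example data.
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]
labels = [1, 1, 0, 1, 0, 0]
points = roc_points(scores, labels)
auc = area_under(points)
print("AUC = %.4f" % auc)
```

In this toy data, 8 of the 9 positive/negative pairs are ranked correctly, which matches the area the sketch computes.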

Here is the Python code directly:

# -*- coding: utf-8 -*-
import pylab as pl

evaluate_result = "your file path"
db = []  # each entry: [score, nonclk, clk]
pos, neg = 0, 0
with open(evaluate_result, 'r') as fs:
    for line in fs:
        nonclk, clk, score = line.strip().split('\t')
        nonclk = int(nonclk)
        clk = int(clk)
        score = float(score)
        db.append([score, nonclk, clk])
        pos += clk
        neg += nonclk

# Sort by score, highest first
db = sorted(db, key=lambda x: x[0], reverse=True)

# Calculate the ROC coordinate points
xy_arr = []
tp, fp = 0., 0.
for i in range(len(db)):
    tp += db[i][2]
    fp += db[i][1]
    xy_arr.append([fp / neg, tp / pos])

# Calculate the area under the curve (one rectangle per step along x)
auc = 0.
prev_x = 0
for x, y in xy_arr:
    if x != prev_x:
        auc += (x - prev_x) * y
        prev_x = x

print("the auc is %s." % auc)

x = [_v[0] for _v in xy_arr]
y = [_v[1] for _v in xy_arr]
pl.title("ROC curve of %s (AUC = %.4f)" % ('svm', auc))
pl.xlabel("False Positive Rate")
pl.ylabel("True Positive Rate")
pl.plot(x, y)  # use pylab to plot x and y
pl.show()  # show the plot on the screen
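The area loop in the script adds, for each step along the X axis, a rectangle whose height is the Y value at the right end of the step. When a single row contains both clicks and non-clicks, the ROC segment for that row is diagonal, so the rectangle counts more area than lies under the segment. The sketch below compares that rule with the trapezoidal rule, which follows the diagonal instead. The helper names and the three rows of data are hypothetical and are not part of the original script.

```python
def roc_from_rows(rows):
    """rows: list of (nonclk, clk, score). Returns ROC points starting at (0, 0)."""
    pos = sum(clk for _, clk, _ in rows)
    neg = sum(nonclk for nonclk, _, _ in rows)
    points = [(0.0, 0.0)]
    tp = fp = 0
    for nonclk, clk, _ in sorted(rows, key=lambda r: r[2], reverse=True):
        tp += clk
        fp += nonclk
        points.append((fp / neg, tp / pos))
    return points


def rectangle_auc(points):
    # Same rule as the script: step width times the height at the right end.
    return sum((x1 - x0) * y1 for (x0, _), (x1, y1) in zip(points, points[1:]))


def trapezoid_auc(points):
    # Average of the heights at both ends of each step.
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Made-up rows in (nonclk, clk, score) form.
rows = [(1, 3, 0.9), (2, 2, 0.5), (5, 1, 0.1)]
points = roc_from_rows(rows)
rect = rectangle_auc(points)
trap = trapezoid_auc(points)
print("rectangle rule: %.4f" % rect)
print("trapezoidal rule: %.4f" % trap)
```

The two rules agree whenever every row contains only clicks or only non-clicks. They differ most when the scores are coarsely grouped.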

For the input dataset, see svm prediction results.

The format is:

nonclk \t clk \t score

Where:
1. nonclk: the number of non-clicks, which can be viewed as the number of negative samples

2. clk: the number of clicks, which can be viewed as the number of positive samples

3. score: the predicted score. Grouping by this score and counting the positive and negative samples in advance reduces the amount of computation needed for AUC.
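As an illustration of the pre-counting described above, the following sketch turns per-sample `(score, label)` pairs into rows of the `nonclk \t clk \t score` form. The function name `aggregate_by_score`, the sample data, and in particular the rounding of scores to two decimal places are assumptions made for this example. The original article does not say how the scores were grouped.

```python
from collections import defaultdict


def aggregate_by_score(samples, ndigits=2):
    """samples: iterable of (score, label), label 1 = click, 0 = non-click.

    Returns rows (nonclk, clk, score), one per distinct rounded score,
    highest score first. Rounding to ndigits is an assumption of this sketch.
    """
    counts = defaultdict(lambda: [0, 0])  # score -> [nonclk, clk]
    for score, label in samples:
        counts[round(score, ndigits)][label] += 1
    return [(c[0], c[1], s) for s, c in sorted(counts.items(), reverse=True)]


# Made-up per-sample predictions.
samples = [(0.912, 1), (0.908, 0), (0.91, 1), (0.42, 0), (0.419, 0), (0.05, 1)]
rows = aggregate_by_score(samples)
for nonclk, clk, score in rows:
    print("%d\t%d\t%s" % (nonclk, clk, score))
```

Six samples collapse into three rows here. With real click logs the reduction is usually much larger, at the cost of treating samples in the same group as tied.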

The running result is a plot of the ROC curve, with the AUC value shown in the plot title (figure not included).

If pylab is not installed on the local machine, you can simply comment out the import and the plotting part.
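An alternative to commenting the lines out is to guard the import, so that the AUC is still printed when the plotting library is missing. This is a sketch of that alternative, not what the original script does. The function name `plot_roc_if_available` and its `show` parameter are hypothetical, and it imports `matplotlib.pyplot` (the library that provides pylab) rather than pylab itself.

```python
def plot_roc_if_available(x, y, auc, show=True):
    """Plot the ROC curve if matplotlib can be imported; return True if plotted."""
    try:
        import matplotlib.pyplot as plt  # optional dependency
    except ImportError:
        return False
    plt.figure()
    plt.title("ROC curve (AUC = %.4f)" % auc)
    plt.xlabel("False Positive Rate")
    plt.ylabel("True Positive Rate")
    plt.plot(x, y)
    if show:
        plt.show()
    return True


# show=False keeps this example non-interactive; pass show=True to open the window.
plotted = plot_roc_if_available([0.0, 0.5, 1.0], [0.0, 0.8, 1.0], 0.65, show=False)
print("plotted" if plotted else "matplotlib not installed, skipping the plot")
```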

Note:

The above code:

1. Only handles binary classification results (the binary class labels themselves can be encoded as needed).

2. Uses every score as a threshold, which is quite inefficient. You can subsample the data, or divide the score range into equal intervals when calculating the horizontal axis coordinates.
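The equal-interval idea in note 2 can be sketched as follows: divide the score range into a fixed number of equal-width bins, count positives and negatives per bin in one pass, and then accumulate the counts from the highest bin down. The function name `binned_roc`, the choice of four bins, and the sample data are assumptions for illustration. The result is an approximation, because samples that fall in the same bin are treated as tied.

```python
def binned_roc(scores, labels, num_bins=10):
    """Approximate ROC points using num_bins equal-width score intervals."""
    lo, hi = min(scores), max(scores)
    width = (hi - lo) / num_bins
    if width == 0:
        width = 1.0  # all scores identical: everything falls in bin 0
    pos_hist = [0] * num_bins
    neg_hist = [0] * num_bins
    for s, label in zip(scores, labels):
        b = min(int((s - lo) / width), num_bins - 1)  # clamp the maximum score
        if label == 1:
            pos_hist[b] += 1
        else:
            neg_hist[b] += 1
    pos, neg = sum(pos_hist), sum(neg_hist)
    points = [(0.0, 0.0)]
    tp = fp = 0
    for b in range(num_bins - 1, -1, -1):  # highest-score bin first
        tp += pos_hist[b]
        fp += neg_hist[b]
        points.append((fp / neg, tp / pos))
    return points


# Made-up example data.
scores = [0.95, 0.85, 0.72, 0.63, 0.51, 0.33, 0.27, 0.08]
labels = [1, 1, 0, 1, 1, 0, 0, 0]
points = binned_roc(scores, labels, num_bins=4)
auc = sum((x1 - x0) * (y0 + y1) / 2.0
          for (x0, y0), (x1, y1) in zip(points, points[1:]))
print("approximate AUC = %.4f" % auc)
```

The number of ROC points is then fixed by `num_bins` rather than by the number of samples, so the loop that builds the curve stays short however large the input is.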

Summary

The above is all the content of this article. I hope the content of this article will help you in your study or work. If you have any questions, please leave a message.
