Plotting the ROC Curve and Calculating the AUC in Python

Source: Internet
Author: User
Tags: svm
Preface

The ROC (Receiver Operating Characteristic) curve and AUC are often used to evaluate how well a binary classifier performs. This article first briefly introduces ROC and AUC, then uses an example to show how to plot the ROC curve and calculate the AUC in Python.
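For reference, here are the standard definitions behind the curve. At a threshold t, a sample is predicted positive when its score is at least t, and TP, FP, TN and FN count the true/false positives and negatives at that threshold (this notation is added here, not taken from the original). The ROC curve traces the points (FPR(t), TPR(t)) as t varies, and the AUC is the area under that curve:

```latex
\mathrm{TPR}(t) = \frac{\mathrm{TP}(t)}{\mathrm{TP}(t) + \mathrm{FN}(t)}, \qquad
\mathrm{FPR}(t) = \frac{\mathrm{FP}(t)}{\mathrm{FP}(t) + \mathrm{TN}(t)}, \qquad
\mathrm{AUC} = \int_0^1 \mathrm{TPR} \; d(\mathrm{FPR})
```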

AUC Introduction

AUC (Area Under the Curve) is a very common evaluation metric for binary classification models in machine learning. It is more tolerant of class imbalance than the F1-score, and common machine learning libraries (such as scikit-learn) generally provide it out of the box. Sometimes, however, the model stands on its own or was written from scratch, and to judge how good the trained model is you have to write your own AUC calculation module. While looking into this, I found that libsvm-tools contains a very easy-to-understand AUC calculation, so I have copied it out here for future use.
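When writing your own module, it helps to have an obviously correct reference to test against. The sketch below uses a well-known equivalent definition: AUC equals the probability that a randomly chosen positive sample gets a higher score than a randomly chosen negative one, with ties counting as one half. This brute-force O(P·N) version is not the libsvm-tools method; it is only meant as a cross-check for a faster implementation, and the function name `pairwise_auc` is hypothetical.

```python
def pairwise_auc(labels, scores):
    """AUC as P(score of a random positive > score of a random negative).

    labels: list of true classes, 1 = positive, 0 = negative.
    scores: list of predicted scores, same length as labels.
    A tie between a positive and a negative counts as 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:          # compare every positive ...
        for n in neg:      # ... with every negative
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


print(pairwise_auc([1, 0, 1, 0], [0.9, 0.8, 0.4, 0.2]))  # 0.75
```

Three of the four positive/negative pairs are ordered correctly (only 0.4 < 0.8 is not), hence 0.75.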

AUC calculation

Calculating the AUC takes three steps:

1. Prepare the data. If the model was trained with only a training set, cross-validation is generally used to produce the predictions; if there is a separate evaluation set, the predictions can be computed on it directly. The data generally needs to contain each predicted score together with its target class (note: the true target class, not the predicted class).

2. Using the scores as thresholds, compute the points of the curve: the horizontal coordinate (x: false positive rate) and the vertical coordinate (y: true positive rate).

3. Connect the points into a curve and calculate the area under it; that area is the AUC.
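The three steps above can be sketched on raw per-sample data, assuming step 1 has already produced a list of true labels (1 = positive, 0 = negative) and a list of predicted scores. Samples with equal scores are grouped so that they cross the threshold together, which makes the result independent of how ties happen to be ordered, and the area is computed with the trapezoidal rule. The names `roc_points` and `trapezoid_auc` are hypothetical helpers, not part of any library.

```python
from itertools import groupby


def roc_points(labels, scores):
    """Step 2: return (FPR, TPR) points, one per distinct score threshold."""
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    # Highest score first; each distinct score becomes one threshold.
    pairs = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    tp = fp = 0
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for _, group in groupby(pairs, key=lambda p: p[0]):
        for _, y in group:  # all samples sharing this score cross together
            if y == 1:
                tp += 1
            else:
                fp += 1
        points.append((fp / neg, tp / pos))
    return points


def trapezoid_auc(points):
    """Step 3: area under the polyline through the points (trapezoidal rule)."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


pts = roc_points([1, 0, 1, 0], [0.9, 0.8, 0.4, 0.2])
print(pts)                 # [(0.0, 0.0), (0.0, 0.5), (0.5, 0.5), (0.5, 1.0), (1.0, 1.0)]
print(trapezoid_auc(pts))  # 0.75
```

The result matches the pairwise definition of AUC; both helpers assume at least one positive and one negative sample, otherwise the division fails.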

Here is the Python code:

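```python
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import pylab as pl  # pylab ships with matplotlib

evaluate_result = "your file path"  # input: one "nonclk \t clk \t score" per line
db = []  # rows of [score, nonclk, clk]
pos, neg = 0, 0
with open(evaluate_result, 'r') as fs:
    for line in fs:
        if not line.strip():
            continue  # skip blank lines
        nonclk, clk, score = line.strip().split('\t')
        nonclk = int(nonclk)
        clk = int(clk)
        score = float(score)
        db.append([score, nonclk, clk])
        pos += clk     # total number of positives
        neg += nonclk  # total number of negatives

# Sort by score, highest first; each row's score acts as a threshold
db = sorted(db, key=lambda x: x[0], reverse=True)

# Compute the ROC points (x = FPR, y = TPR), starting from the origin
xy_arr = [[0.0, 0.0]]
tp, fp = 0., 0.
for i in range(len(db)):
    tp += db[i][2]
    fp += db[i][1]
    xy_arr.append([fp / neg, tp / pos])

# Area under the curve, using the trapezoidal rule between adjacent points
# (this also handles rows that contain both positives and negatives)
auc = 0.
prev_x, prev_y = 0., 0.
for x, y in xy_arr:
    auc += (x - prev_x) * (y + prev_y) / 2
    prev_x, prev_y = x, y
print("The AUC is %s." % auc)

x = [_v[0] for _v in xy_arr]
y = [_v[1] for _v in xy_arr]
pl.title("ROC Curve of %s (AUC = %.4f)" % ('SVM', auc))
pl.xlabel("False Positive Rate")
pl.ylabel("True Positive Rate")
pl.plot(x, y)  # use pylab to plot x and y
pl.show()      # show the plot on the screen
```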

For the input data you can use, for example, the prediction results of an SVM.

The format is:

nonclk \t clk \t score

where:
1. nonclk: the number of non-clicks, which can be treated as the number of negative samples

2. clk: the number of clicks, which can be treated as the number of positive samples

3. score: the predicted score. Grouping the positive and negative samples by score in advance (pre-aggregating their counts) reduces the amount of computation needed for the AUC.
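If the predictions are available per sample rather than pre-grouped, the `nonclk \t clk \t score` lines can be produced by counting negatives and positives for each distinct score. A minimal sketch, assuming labels are 1 for a click (positive) and 0 otherwise; `to_grouped_lines` is a hypothetical helper name:

```python
from collections import defaultdict


def to_grouped_lines(labels, scores):
    """Aggregate per-sample (label, score) pairs into tab-separated nonclk/clk/score lines."""
    counts = defaultdict(lambda: [0, 0])  # score -> [nonclk, clk]
    for y, s in zip(labels, scores):
        counts[s][1 if y == 1 else 0] += 1
    lines = []
    for s in sorted(counts, reverse=True):  # highest score first
        nonclk, clk = counts[s]
        lines.append("%d\t%d\t%s" % (nonclk, clk, s))
    return lines


for line in to_grouped_lines([1, 0, 1, 1, 0], [0.9, 0.9, 0.4, 0.4, 0.2]):
    print(line)
```

Writing these lines to a file, one per line, gives input in the format the script above reads; five samples collapse into three rows here because two scores are shared.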

Running the script prints the AUC value and opens a window showing the ROC curve.

If pylab (matplotlib) is not installed on the machine, you can simply comment out the import and the plotting code.

Notes

Limitations of the code above:

1. It only handles binary classification (how the two class labels are encoded does not matter, as long as each sample is counted as either positive or negative).

2. It uses every distinct score as a threshold, which is quite inefficient on large datasets. You can subsample the data, or coarsen the thresholds (for example, by bucketing the scores) when computing the coordinates.
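One way to apply the second note is to round scores to a fixed number of decimal places before grouping, which bounds the number of distinct thresholds. This is a sketch of that idea, assuming per-sample labels (1/0) and scores; the helper name `bucketed_auc` and its `decimals` parameter are hypothetical, and the result is an approximation because samples that land in the same bucket are treated as tied.

```python
from collections import defaultdict


def bucketed_auc(labels, scores, decimals=2):
    """Approximate AUC using rounded scores as thresholds (trapezoidal rule)."""
    counts = defaultdict(lambda: [0, 0])  # rounded score -> [negatives, positives]
    for y, s in zip(labels, scores):
        counts[round(s, decimals)][1 if y == 1 else 0] += 1
    pos = sum(c[1] for c in counts.values())
    neg = sum(c[0] for c in counts.values())
    tp = fp = 0
    prev_x = prev_y = 0.0
    auc = 0.0
    for key in sorted(counts, reverse=True):  # one threshold per bucket
        fp += counts[key][0]
        tp += counts[key][1]
        x, y = fp / neg, tp / pos
        auc += (x - prev_x) * (y + prev_y) / 2
        prev_x, prev_y = x, y
    return auc


print(bucketed_auc([1, 0, 1, 0], [0.91, 0.88, 0.42, 0.17], decimals=1))  # 0.625
```

With `decimals=1`, the scores 0.91 and 0.88 fall into the same bucket and count as a tie, so the result is 0.625; with `decimals=2` all scores stay distinct and the exact value 0.75 is recovered. Fewer decimals mean fewer thresholds but a coarser approximation.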
