Using Python to Draw ROC Curves and Compute AUC Values

Source: Internet
Author: User
Tags svm

Objective

ROC (Receiver Operating Characteristic) curves and AUC are often used to evaluate the quality of a binary classifier. This article first briefly introduces ROC and AUC, and then shows how to draw the ROC curve and calculate AUC in Python.

Introduction to AUC

AUC (Area Under Curve) is a very common evaluation metric for binary classification models in machine learning, and it is more tolerant of class imbalance than the F1-score. Common machine learning libraries (such as scikit-learn) generally include a calculation of this metric. Sometimes, however, the model is standalone or hand-written, and to evaluate whether the trained model is good or bad you have to write an AUC calculation module yourself. While looking into this, I found that libsvm-tools has a very easy-to-understand AUC calculation, so I extracted it for future use.

AUC calculation

The AUC calculation is divided into the following three steps:

1. Prepare the data. If only a training set was available when the model was trained, AUC is generally calculated with cross-validation; if there is a separate evaluation set, it can generally be calculated directly. The data needs to contain the predicted score and the target category of each sample (note: the target category, not the predicted category).

2. Calculate the horizontal coordinate (x: False Positive Rate) and the vertical coordinate (y: True Positive Rate) of each point according to the threshold value.

3. Connect the coordinate points into a curve and calculate the area under it; this is the value of AUC.

Here is the Python code:
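The three steps can be sketched on a small hand-made example. This is only an illustration, not the libsvm-tools code: the function names `roc_points` and `area_under`, the toy scores and labels, and the use of the trapezoidal rule for the area are all assumptions made for this sketch.

```python
def roc_points(scores, labels):
    """Return the (FPR, TPR) points obtained by using each distinct score as a threshold."""
    pos = sum(labels)              # number of positive samples (label 1)
    neg = len(labels) - pos        # number of negative samples (label 0)
    # Step 1: pair each score with its target label, highest score first.
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Step 2: emit a point once every sample sharing this score is counted.
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((fp / neg, tp / pos))
    return points


def area_under(points):
    """Step 3: area under the piecewise-linear curve (trapezoidal rule)."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Toy data: six samples, three positive and three negative.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
labels = [1, 1, 0, 1, 0, 0]
points = roc_points(scores, labels)
auc = area_under(points)
print("AUC = %.4f" % auc)  # AUC = 0.8889
```

In this toy data, 8 of the 9 (positive, negative) pairs are ranked correctly, which matches the area of 8/9.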

    # -*- coding: utf-8 -*-
    import pylab as pl

    evaluate_result = "your file path"
    db = []  # each entry: [score, nonclk, clk]
    pos, neg = 0, 0
    with open(evaluate_result, 'r') as fs:
        for line in fs:
            nonclk, clk, score = line.strip().split('\t')
            nonclk = int(nonclk)
            clk = int(clk)
            score = float(score)
            db.append([score, nonclk, clk])
            pos += clk
            neg += nonclk

    # Sort by score, highest first
    db = sorted(db, key=lambda x: x[0], reverse=True)

    # Calculate the ROC coordinate points
    xy_arr = []
    tp, fp = 0., 0.
    for i in range(len(db)):
        tp += db[i][2]
        fp += db[i][1]
        xy_arr.append([fp / neg, tp / pos])

    # Calculate the area under the curve
    auc = 0.
    prev_x = 0
    for x, y in xy_arr:
        if x != prev_x:
            auc += (x - prev_x) * y
            prev_x = x

    print("the auc is %s." % auc)

    x = [_v[0] for _v in xy_arr]
    y = [_v[1] for _v in xy_arr]
    pl.title("ROC curve of %s (AUC = %.4f)" % ('svm', auc))
    pl.xlabel("False Positive Rate")
    pl.ylabel("True Positive Rate")
    pl.plot(x, y)  # use pylab to plot x and y
    pl.show()  # show the plot on the screen
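One detail of the area loop is worth checking on your own data. It adds a rectangle whose height is the y value at the right end of each step, so when a single row contains both clicks and non-clicks (tied scores), the result comes out higher than the pairwise definition of AUC, in which ties count as half. The sketch below compares the two on made-up rows. The function names `auc_rectangle` and `auc_pairwise` and the sample rows are assumptions for illustration and are not part of libsvm-tools.

```python
def auc_rectangle(rows):
    """Right-endpoint rectangle rule, mirroring the loop in the script above.

    rows: list of (score, nonclk, clk) tuples.
    """
    pos = sum(clk for _, _, clk in rows)
    neg = sum(nonclk for _, nonclk, _ in rows)
    tp = fp = 0.0
    auc, prev_x = 0.0, 0.0
    for _, nonclk, clk in sorted(rows, key=lambda r: r[0], reverse=True):
        tp += clk
        fp += nonclk
        x, y = fp / neg, tp / pos
        if x != prev_x:
            auc += (x - prev_x) * y
            prev_x = x
    return auc


def auc_pairwise(rows):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = sum(clk for _, _, clk in rows)
    neg = sum(nonclk for _, nonclk, _ in rows)
    wins = 0.0
    for s_i, _, clk_i in rows:
        for s_j, nonclk_j, _ in rows:
            if s_i > s_j:
                wins += clk_i * nonclk_j
            elif s_i == s_j:
                wins += 0.5 * clk_i * nonclk_j
    return wins / (pos * neg)


# Hypothetical rows in which every score has both clicks and non-clicks.
tied_rows = [(0.9, 1, 3), (0.5, 2, 2), (0.1, 3, 1)]
print(auc_rectangle(tied_rows))  # 31/36, about 0.861
print(auc_pairwise(tied_rows))   # 26/36, about 0.722

# With one sample per row and no tied scores, the two agree.
untied_rows = [(0.9, 0, 1), (0.8, 0, 1), (0.7, 1, 0),
               (0.6, 0, 1), (0.5, 1, 0), (0.4, 1, 0)]
print(auc_rectangle(untied_rows), auc_pairwise(untied_rows))
```

The pairwise version is quadratic in the number of rows, so it is only meant as a cross-check on small inputs.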

For the input dataset, you can refer to the prediction results of an SVM.

The format is:

nonclk \t clk \t score

where:

1. nonclk: the number of samples that were not clicked, which can be regarded as the number of negative samples

2. clk: the number of clicks, which can be regarded as the number of positive samples

3. score: the predicted score. Pre-aggregating the positive and negative sample counts per score reduces the amount of AUC computation
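If your predictions are stored one sample per line, they can be aggregated into this format first. The following is a minimal sketch under that assumption: the helper names `group_predictions`, `format_rows` and `parse_line`, and the sample data, are hypothetical and only illustrate the tab-separated layout described above.

```python
from collections import defaultdict


def group_predictions(samples):
    """Aggregate (score, label) pairs into [nonclk, clk, score] rows.

    label is 1 for a click (positive) and 0 for a non-click (negative).
    """
    counts = defaultdict(lambda: [0, 0])  # score -> [nonclk, clk]
    for score, label in samples:
        counts[score][1 if label == 1 else 0] += 1
    return [[nonclk, clk, score]
            for score, (nonclk, clk) in sorted(counts.items())]


def format_rows(rows):
    """Render rows as tab-separated lines: nonclk, clk, score."""
    return ["%d\t%d\t%s" % (nonclk, clk, score) for nonclk, clk, score in rows]


def parse_line(line):
    """Parse one line of the file back into (nonclk, clk, score)."""
    nonclk, clk, score = line.strip().split('\t')
    return int(nonclk), int(clk), float(score)


# Hypothetical per-sample predictions: (score, label).
samples = [(0.91, 1), (0.91, 0), (0.35, 0), (0.35, 0), (0.62, 1)]
rows = group_predictions(samples)
lines = format_rows(rows)
for line in lines:
    print(line)
```

Five samples collapse into three rows here; on real data with many repeated scores the reduction is what makes the later AUC loop cheaper.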

The results of the run are:

If pylab is not installed on your machine, you can simply comment out the import and the plotting part.

Notes

About the code posted above:

1. It can only evaluate binary classification results (how the two class labels are encoded is up to you).

2. The code treats every score as a threshold, which is actually quite inefficient. You can subsample the data, or divide the scores into equal intervals when calculating the coordinates.

Summary

That is all for this article. I hope it is of some help in your study or work. If you have questions, feel free to leave a message.
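The equal-interval idea in note 2 could look like the sketch below, which merges rows into a fixed number of score buckets before the ROC points are calculated. The function name `bucket_rows`, the default of 100 buckets, and the choice to represent each bucket by its lower edge are assumptions for illustration. Bucketing is an approximation, since samples inside one bucket are treated as having the same score.

```python
def bucket_rows(rows, n_buckets=100):
    """Merge [score, nonclk, clk] rows into at most n_buckets equal-width buckets.

    Returns rows in the same layout, highest score first.
    """
    lo = min(r[0] for r in rows)
    hi = max(r[0] for r in rows)
    # Fall back to a width of 1.0 when all scores are identical.
    width = (hi - lo) / n_buckets or 1.0
    merged = {}
    for score, nonclk, clk in rows:
        # The maximum score would land in bucket n_buckets, so clamp it.
        idx = min(int((score - lo) / width), n_buckets - 1)
        bucket = merged.setdefault(idx, [0, 0])
        bucket[0] += nonclk
        bucket[1] += clk
    return [[lo + idx * width, nonclk, clk]
            for idx, (nonclk, clk) in sorted(merged.items(), reverse=True)]


# Hypothetical rows: [score, nonclk, clk].
rows = [[0.95, 0, 2], [0.90, 1, 1], [0.15, 2, 0], [0.10, 3, 1]]
bucketed = bucket_rows(rows, n_buckets=2)
for score, nonclk, clk in bucketed:
    print("%.3f\t%d\t%d" % (score, nonclk, clk))
```

The number of ROC points is then bounded by `n_buckets` regardless of how many distinct scores the model produces.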
