A very common task in machine learning is binary classification. Many binary classifiers produce a probabilistic prediction rather than a hard 0/1 label, so we can pick a threshold (for example, 0.5) to decide which predictions are labeled 1 and which are labeled 0. Once the binary predictions are obtained, a confusion matrix can be constructed to evaluate the classifier. All the training data fall into this matrix, and the numbers on the diagonal are the counts of correct predictions, i.e. true positives + true negatives. From the matrix we can compute the TPR (true positive rate, or sensitivity) and the TNR (true negative rate, or specificity). We would naturally like both indicators to be as large as possible, but unfortunately they trade off against each other: as one rises, the other tends to fall. Besides the classifier's training parameters, the choice of threshold also greatly affects TPR and TNR. Sometimes a specific threshold can be chosen based on the specific problem and its needs.
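As a minimal sketch of the idea above (the labels and probabilities here are made up for illustration), the confusion matrix and the two rates can be computed in base R with `table()`:

```r
# hypothetical observed labels and predicted probabilities
obs  <- c(1, 0, 1, 1, 0, 0, 1, 0)
prob <- c(0.9, 0.4, 0.7, 0.2, 0.6, 0.1, 0.8, 0.3)
pred <- as.integer(prob > 0.5)                  # threshold at 0.5
cm <- table(predicted = pred, observed = obs)   # 2x2 confusion matrix
tp <- cm["1", "1"]; tn <- cm["0", "0"]          # diagonal: correct predictions
fn <- cm["0", "1"]; fp <- cm["1", "0"]
tpr <- tp / (tp + fn)  # true positive rate (sensitivity)
tnr <- tn / (tn + fp)  # true negative rate (specificity)
```

Raising the threshold makes `pred` more conservative, which pushes TNR up and TPR down, which is exactly the trade-off described above.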
If we choose a series of thresholds, we obtain a series of (TPR, TNR) pairs; connecting the corresponding points forms the ROC curve. The ROC curve helps us understand the performance of a classifier, and makes it easy to compare the performance of different classifiers. By convention, the ROC curve is drawn with 1-TNR (the false positive rate) on the horizontal axis and TPR on the vertical axis. Let's look at how to draw the ROC curve in R.
# fit a logistic regression to get probabilistic predictions
model1 <- glm(y ~ ., data = newdata, family = "binomial")
pre <- predict(model1, type = "response")
# put the predicted probability prob and the observed outcome y in a data frame
data <- data.frame(prob = pre, obs = newdata$y)
# sort by predicted probability, lowest to highest
data <- data[order(data$prob), ]
n <- nrow(data)
tpr <- fpr <- rep(0, n)
# compute TPR and FPR at each threshold, then plot
for (i in 1:n) {
  threshold <- data$prob[i]
  tp <- sum(data$prob > threshold & data$obs == 1)
  fp <- sum(data$prob > threshold & data$obs == 0)
  tn <- sum(data$prob <= threshold & data$obs == 0)
  fn <- sum(data$prob <= threshold & data$obs == 1)
  tpr[i] <- tp / (tp + fn)  # true positive rate
  fpr[i] <- fp / (fp + tn)  # false positive rate
}
plot(fpr, tpr, type = "l")
abline(a = 0, b = 1)
R also has packages specially designed for ROC curves, such as the popular ROCR package. It can be used not only for plotting, but also for computing the area under the ROC curve (AUC), a summary measure of the classifier's overall performance that lies between 0 and 1; the larger, the better.
library(ROCR)
pred <- prediction(pre, newdata$y)
performance(pred, "auc")@y.values  # AUC value
perf <- performance(pred, "tpr", "fpr")
plot(perf)
The ROCR package's plotting functions are relatively basic; the author prefers the more powerful pROC package.
It makes it easy to compare two classifiers, automatically labels the optimal threshold, and produces a nicer-looking figure.
library(pROC)
modelroc <- roc(newdata$y, pre)
plot(modelroc, print.auc = TRUE, auc.polygon = TRUE, grid = c(0.1, 0.2),
     grid.col = c("green", "red"), max.auc.polygon = TRUE,
     auc.polygon.col = "skyblue", print.thres = TRUE)
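The comparison of two classifiers mentioned above can be sketched as follows. This is a self-contained illustration with simulated data (the labels `y` and the two prediction vectors are made up, and the second "classifier" is just random noise); it overlays two ROC curves and runs pROC's `roc.test()` (by default a DeLong test) on the difference in AUC:

```r
library(pROC)
set.seed(1)
y <- rbinom(100, 1, 0.5)
# hypothetical model 1: informative scores; hypothetical model 2: random guessing
pre1 <- ifelse(y == 1, rbeta(100, 3, 1), rbeta(100, 1, 3))
pre2 <- runif(100)
roc1 <- roc(y, pre1)
roc2 <- roc(y, pre2)
plot(roc1, col = "blue")
lines(roc2, col = "red")   # overlay the second curve
roc.test(roc1, roc2)       # test whether the two AUCs differ
auc(roc1); auc(roc2)
```

With real data, `pre1` and `pre2` would be the predicted probabilities from two fitted models on the same observations.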