In general, the error rate of the classification results can serve as the criterion for evaluating a classifier. However, when the numbers of positive and negative examples in the training data are far from equal, this criterion becomes misleading: a classifier that always predicts the majority class achieves a low error rate while never detecting the minority class. This situation is known as class imbalance (unbalanced classification). The following measures are available for such cases.
(1) Precision and recall
Precision is the proportion of examples predicted as positive that are truly positive, which equals TP/(TP + FP). Recall is the proportion of truly positive examples that are predicted as positive, which equals TP/(TP + FN). Here TP, FP, and FN are the counts of true positives, false positives, and false negatives. It is easy to construct a classifier with either high precision or high recall, but hard to guarantee both. For example, if every sample is judged positive, recall reaches 100% while precision is very low. In this case the F-score, F = 2 * precision * recall / (precision + recall), can be used as a single measure. The larger the value, the better.
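The trade-off described above can be checked with a few lines of Python. This is a minimal sketch, not code from the original article: the function name `precision_recall_f`, the `positive` parameter, and the sample labels are hypothetical, and the F-score uses the common F1 form with the factor of 2.

```python
def precision_recall_f(labels, predictions, positive=1):
    """Return (precision, recall, F-score) for the given positive class."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if p == positive and y == positive)
    fp = sum(1 for y, p in pairs if p == positive and y != positive)
    fn = sum(1 for y, p in pairs if p != positive and y == positive)
    # Guard against empty denominators (no predicted or no actual positives).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    total = precision + recall
    f_score = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_score


# Hypothetical imbalanced data: 2 positives and 8 negatives.
labels = [1, 1, -1, -1, -1, -1, -1, -1, -1, -1]
# A classifier that judges every sample as positive.
all_positive = [1] * len(labels)

precision, recall, f_score = precision_recall_f(labels, all_positive)
print("precision:", precision, "recall:", recall, "F-score:", f_score)
```

With every sample judged positive, recall is perfect but precision falls to the share of positives in the data, and the F-score stays low as a result.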
(2) ROC curve
    import numpy as np
    import matplotlib.pyplot as plt

    def plotroc(predstrengths, classlabels):
        """Plot the ROC curve and print the area under it (AUC)."""
        cur = (1.0, 1.0)  # cursor, starts at the top-right corner
        ysum = 0.0  # variable to calculate AUC
        numposclas = np.sum(np.array(classlabels) == 1.0)
        ystep = 1 / float(numposclas)
        xstep = 1 / float(len(classlabels) - numposclas)
        # get sorted indices in ascending order of strength, so the curve is
        # drawn from (1, 1) back toward (0, 0); flattening accepts both a
        # 1 x N matrix and a plain 1-D array
        sortedindicies = np.asarray(predstrengths).ravel().argsort()
        fig = plt.figure()  # these three lines set up the figure and axes
        fig.clf()
        ax = plt.subplot(111)
        # loop through all the values, drawing a line segment at each point
        for index in sortedindicies.tolist():
            if classlabels[index] == 1.0:
                delx = 0
                dely = ystep
            else:
                delx = xstep
                dely = 0
                ysum += cur[1]
            # draw line from cur to (cur[0] - delx, cur[1] - dely)
            ax.plot([cur[0], cur[0] - delx], [cur[1], cur[1] - dely], c='b')
            cur = (cur[0] - delx, cur[1] - dely)
        ax.plot([0, 1], [0, 1], 'b--')  # diagonal of a random classifier
        plt.xlabel('False positive rate')
        plt.ylabel('True positive rate')
        plt.title('ROC curve for AdaBoost horse colic detection system')
        ax.axis([0, 1, 0, 1])
        plt.show()
        print("the area under the curve is:", ysum * xstep)
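The area that plotroc accumulates can also be computed without drawing anything, which makes the calculation easier to follow. The sketch below repeats the same walk in plain Python: each negative example adds a rectangle of width xstep whose height is the current true positive rate. The function name `auc_from_scores` and the sample scores are hypothetical, and tied scores are processed in whatever order the sort leaves them.

```python
def auc_from_scores(scores, labels, positive=1):
    """Area under the ROC curve, accumulated the same way as in plotroc."""
    num_pos = sum(1 for y in labels if y == positive)
    num_neg = len(labels) - num_pos
    if num_pos == 0 or num_neg == 0:
        raise ValueError("need at least one positive and one negative example")
    x_step = 1.0 / num_neg  # width of one step along the false positive axis
    y_step = 1.0 / num_pos  # height of one step along the true positive axis
    height = 1.0            # current true positive rate, starting at (1, 1)
    height_sum = 0.0
    # Visit examples from the weakest to the strongest prediction.
    for i in sorted(range(len(scores)), key=lambda k: scores[k]):
        if labels[i] == positive:
            height -= y_step      # move down: one positive is now missed
        else:
            height_sum += height  # move left: add one rectangle's height
    return height_sum * x_step


# Hypothetical prediction strengths and true labels.
scores = [0.9, 0.8, 0.3, 0.1]
labels = [1, -1, 1, -1]
print("AUC:", auc_from_scores(scores, labels))
```

In this sample, three of the four (positive, negative) pairs are ranked correctly, which matches the interpretation of AUC as the fraction of such pairs that the classifier orders correctly.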
Author: Small Village Chief. Source: http://blog.csdn.net/lu597203933. Reprinting and sharing are welcome, but please credit the source of the article. (Sina Weibo: Small Village Chief Zack. Thank you!)