Machine Learning in Action notes: unbalanced classification

Source: Internet
Author: User

Generally, the error rate of the classification results can be used as the criterion for evaluating a classifier. However, when the numbers of positive and negative examples in the training data are not equal, this criterion becomes misleading. This situation is known as unbalanced classification. The following measures are available.
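Before turning to these measures, here is a minimal sketch of why the error rate alone can mislead. The data set (95 negatives, 5 positives) and the always-negative "classifier" are hypothetical, chosen only for illustration.

```python
def error_rate(y_true, y_pred):
    """Fraction of samples whose predicted label differs from the true label."""
    wrong = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return wrong / len(y_true)


# Hypothetical unbalanced data: 95 negatives (0) and 5 positives (1).
y_true = [0] * 95 + [1] * 5
# A trivial "classifier" that always predicts the majority class.
y_pred = [0] * 100

err = error_rate(y_true, y_pred)
found_positives = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
print("error rate:", err)
print("positives found:", found_positives)
```

The error rate looks good even though this classifier never detects a single positive example, which is why other measures are needed.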

(1) Precision and recall

Precision is the proportion of samples predicted as positive that are truly positive, which is equal to TP/(TP + FP). Recall is the proportion of truly positive samples that are predicted as positive, which is equal to TP/(TP + FN). Here TP, FP, and FN are the numbers of true positives, false positives, and false negatives. It is easy to construct a classifier with either high precision or high recall, but it is difficult to guarantee both. For example, if every sample is judged to be positive, recall reaches 100% while precision is very low. Constructing a classifier that maximizes both precision and recall is challenging. In this case, the F-score = 2 * precision * recall/(precision + recall) can be used as a single measure. The larger the value, the better.
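The definitions above can be sketched in a few lines of Python. The function name `precision_recall_f1` and the sample data are assumptions made for illustration, and the F-score here uses the standard F1 definition, 2 * precision * recall/(precision + recall).

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall, and F1 from true and predicted labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Guard against division by zero when a denominator is empty.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical data: 5 positives, 95 negatives, and a classifier
# that judges every sample to be positive.
y_true = [1] * 5 + [0] * 95
all_positive = [1] * 100

precision, recall, f1 = precision_recall_f1(y_true, all_positive)
print(precision, recall, f1)
```

With every sample judged positive, recall is perfect but precision equals the share of positives in the data, so the F-score stays low.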

(2) ROC curve



The following function plots the ROC curve from the classifier's prediction strengths and prints the area under the curve (AUC):

    import numpy as np
    import matplotlib.pyplot as plt


    def plotroc(predstrengths, classlabels):
        """Plot the ROC curve and print the area under it (AUC)."""
        cur = (1.0, 1.0)  # cursor, starts at the top-right corner
        ysum = 0.0  # variable to calculate AUC
        numposclas = np.sum(np.array(classlabels) == 1.0)
        ystep = 1 / float(numposclas)
        xstep = 1 / float(len(classlabels) - numposclas)
        # get sorted index (ascending strength), so the curve is drawn
        # in reverse, from (1, 1) down to (0, 0); ravel() accepts both a
        # 1-D array and a 1 x N matrix of prediction strengths
        sortedindicies = np.asarray(predstrengths).ravel().argsort()
        # these three lines set up the figure and axes to draw on
        fig = plt.figure()
        fig.clf()
        ax = plt.subplot(111)
        # loop through all the values, drawing a line segment at each point
        for index in sortedindicies.tolist():
            if classlabels[index] == 1.0:
                delx = 0
                dely = ystep
            else:
                delx = xstep
                dely = 0
                ysum += cur[1]
            # draw line from cur to (cur[0] - delx, cur[1] - dely)
            ax.plot([cur[0], cur[0] - delx], [cur[1], cur[1] - dely], c='b')
            cur = (cur[0] - delx, cur[1] - dely)
        ax.plot([0, 1], [0, 1], 'b--')  # diagonal reference line
        plt.xlabel('false positive rate')
        plt.ylabel('true positive rate')
        plt.title('ROC curve for AdaBoost horse colic detection system')
        ax.axis([0, 1, 0, 1])
        plt.show()
        print("the area under the curve is:", ysum * xstep)
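To check the AUC arithmetic without opening a plot window, the same walk over the sorted prediction strengths can be sketched in plain Python. The helper name `roc_points_and_auc` and the four-sample data set are hypothetical, and the sketch assumes labels of 1 for positive and anything else for negative.

```python
def roc_points_and_auc(scores, labels):
    """Walk from (1, 1) to (0, 0) over samples sorted by ascending score.

    A positive sample moves the cursor down by one y step; a negative
    sample moves it left by one x step and adds the current height to
    the running sum used for the area under the curve.
    """
    num_pos = sum(1 for y in labels if y == 1)
    num_neg = len(labels) - num_pos
    y_step = 1.0 / num_pos
    x_step = 1.0 / num_neg
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    x, y = 1.0, 1.0
    points = [(x, y)]
    y_sum = 0.0
    for i in order:
        if labels[i] == 1:
            y -= y_step
        else:
            y_sum += y
            x -= x_step
        points.append((x, y))
    return points, y_sum * x_step


# Hypothetical scores: one positive is ranked below one negative.
scores = [0.1, 0.4, 0.35, 0.8]
labels = [0, 0, 1, 1]
points, auc = roc_points_and_auc(scores, labels)
print(points)
print("AUC:", auc)
```

Each negative sample contributes a rectangle of width x_step and height equal to the current true positive rate, so summing the heights and multiplying by x_step once gives the area.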


Author: Small Village Chief. Source: http://blog.csdn.net/lu597203933. You are welcome to reprint or share this article, but please credit the source. (Sina Weibo: Small Village Chief Zack. Thank you!)
