"Data mining" naive Bayesian algorithm for calculating the area of ROC curves

Source: Internet
Author: User


While studying data mining recently, I learned how to produce an ROC curve with the naive Bayes algorithm. That is also the subject of this section's experiment: the calculation principle of the ROC curve, and how to compute the statistics TP, FP, TN, FN, TPR, FPR and the ROC area. The ROC area is often used to assess the accuracy of a model. In general, the closer the area is to 0.5, the lower the accuracy of the model; the closer it is to 1, the better, and a perfectly correct model has an area of 1. The following is an introduction:

First, the calculation principle of the area under the ROC curve

Workflow diagram of the naive Bayes method

Second, using the Weka tool to obtain the preprocessed training data

1. Process the Weather.nominal.arff file with the naive Bayes algorithm: open the file in Weka, select Temperature, then select Edit to view the preprocessed data, as shown in Figure 1-1:

Figure 1-1 The full weather data

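2. From the training tuples above, calculate the prior probability of each class with the formula P(C_i) = |C_i,D| / |D|, that is, the number of training tuples of class C_i divided by the total number of training tuples.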

2.1. Calculate the prior probability

P(play=yes) = 9/14 = 0.643

P(play=no) = 5/14 = 0.357
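The two priors above can be checked with a few lines of Python. This is only a sketch: the 14 rows below are typed in by hand as an assumption about the contents of the weather file (they are not read from the .arff file), and the names `WEATHER` and `class_priors` are made up for this illustration.

```python
from collections import Counter

# Assumed contents of the nominal weather data, one tuple per row:
# (outlook, temperature, humidity, windy, play)
WEATHER = [
    ("sunny", "hot", "high", "FALSE", "no"),
    ("sunny", "hot", "high", "TRUE", "no"),
    ("overcast", "hot", "high", "FALSE", "yes"),
    ("rainy", "mild", "high", "FALSE", "yes"),
    ("rainy", "cool", "normal", "FALSE", "yes"),
    ("rainy", "cool", "normal", "TRUE", "no"),
    ("overcast", "cool", "normal", "TRUE", "yes"),
    ("sunny", "mild", "high", "FALSE", "no"),
    ("sunny", "cool", "normal", "FALSE", "yes"),
    ("rainy", "mild", "normal", "FALSE", "yes"),
    ("sunny", "mild", "normal", "TRUE", "yes"),
    ("overcast", "mild", "high", "TRUE", "yes"),
    ("overcast", "hot", "normal", "FALSE", "yes"),
    ("rainy", "mild", "high", "TRUE", "no"),
]


def class_priors(rows):
    """Return {class label: fraction of rows with that label}.

    The class label is assumed to be the last field of each row.
    """
    counts = Counter(row[-1] for row in rows)
    total = len(rows)
    return {label: n / total for label, n in counts.items()}


priors = class_priors(WEATHER)
for label, p in sorted(priors.items()):
    print(f"P(play={label}) = {p:.3f}")
```

With the rows as listed, the class counts are 9 "yes" and 5 "no", which matches the fractions 9/14 and 5/14 used in step 2.1.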

2.2. Calculate the conditional probability according to the formula P(X|C_i) = P(x_1|C_i) * P(x_2|C_i) * ... * P(x_n|C_i)

3. Then substitute the above data into the formula. One tuple is shown for probability classification: X = (outlook=sunny, temperature=mild, humidity=yes, windy=sunny).

3.1. P(X|play=yes) = P(outlook=sunny|play=yes) * P(temperature=mild|play=yes) * P(humidity=yes|play=yes) * P(windy=sunny|play=yes)

P(X|play=no) is calculated in the same way.

3.2. By comparing the results, determine the play class of the tuple.

3.3. Then calculate the probability of the tuple.
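Steps 2.2 to 3.3 can be sketched in Python as follows. Several things here are assumptions made for illustration: the 14 rows are typed in by hand, the tuple `x` is a hypothetical one built from attribute values that occur in those rows, the function names are invented, and the probabilities are raw relative frequencies. Weka's own naive Bayes implementation may smooth its estimates, so its numbers can differ from these.

```python
from collections import Counter

# Assumed contents of the nominal weather data:
# (outlook, temperature, humidity, windy, play)
WEATHER = [
    ("sunny", "hot", "high", "FALSE", "no"),
    ("sunny", "hot", "high", "TRUE", "no"),
    ("overcast", "hot", "high", "FALSE", "yes"),
    ("rainy", "mild", "high", "FALSE", "yes"),
    ("rainy", "cool", "normal", "FALSE", "yes"),
    ("rainy", "cool", "normal", "TRUE", "no"),
    ("overcast", "cool", "normal", "TRUE", "yes"),
    ("sunny", "mild", "high", "FALSE", "no"),
    ("sunny", "cool", "normal", "FALSE", "yes"),
    ("rainy", "mild", "normal", "FALSE", "yes"),
    ("sunny", "mild", "normal", "TRUE", "yes"),
    ("overcast", "mild", "high", "TRUE", "yes"),
    ("overcast", "hot", "normal", "FALSE", "yes"),
    ("rainy", "mild", "high", "TRUE", "no"),
]


def likelihood(rows, x, label):
    """P(X | play=label): product of per-attribute relative frequencies."""
    in_class = [r for r in rows if r[-1] == label]
    prob = 1.0
    for i, value in enumerate(x):
        matches = sum(1 for r in in_class if r[i] == value)
        prob *= matches / len(in_class)
    return prob


def posterior(rows, x):
    """Normalise P(X|C) * P(C) over the classes so the values sum to 1.

    Assumes at least one class gives a non-zero product for x.
    """
    class_counts = Counter(r[-1] for r in rows)
    scores = {
        c: likelihood(rows, x, c) * n / len(rows)
        for c, n in class_counts.items()
    }
    total = sum(scores.values())
    return {c: s / total for c, s in scores.items()}


# Hypothetical tuple: (outlook, temperature, humidity, windy)
x = ("sunny", "cool", "high", "TRUE")
post = posterior(WEATHER, x)
predicted = max(post, key=post.get)  # step 3.2: the larger value wins
print(post, predicted)
```

The value `post["yes"]` is the kind of per-tuple probability that step 3.3 refers to, and it is what the tuples are later ranked by when the ROC curve is built.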

4. Then apply the method from page 244 of Data Mining: Concepts and Techniques, as shown in Figure 1-2:

Figure 1-2 Sample of the returned data

The figure uses example data rather than the real data. Because the probability of each tuple can be computed as in 3.3, the tuples are sorted by probability. The real values of TP, FP, TN and FN are then obtained based on the prior probability, and TPR and FPR are easy to calculate from them: TPR = TP / (TP + FN) and FPR = FP / (FP + TN).
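The ranking procedure described above can be sketched as follows. The scores and labels are hypothetical and chosen only for illustration; they are not the data in Figure 1-2. The function name `confusion_at_each_cut` is invented, and the sketch assumes that no two tuples share the same probability.

```python
def confusion_at_each_cut(scored):
    """Compute TP, FP, TN, FN, TPR and FPR after each ranked tuple.

    scored: list of (probability_of_positive, actual_label), label 1 = positive.
    Tuples are sorted by decreasing probability. At each cut, every tuple at or
    above the cut is predicted positive and every tuple below it negative.
    """
    ranked = sorted(scored, key=lambda t: t[0], reverse=True)
    positives = sum(label for _, label in ranked)
    negatives = len(ranked) - positives
    rows = []
    tp = fp = 0
    for prob, label in ranked:
        if label == 1:
            tp += 1
        else:
            fp += 1
        rows.append({
            "cut": prob,
            "TP": tp,
            "FP": fp,
            "TN": negatives - fp,
            "FN": positives - tp,
            "TPR": tp / positives,  # TP / (TP + FN)
            "FPR": fp / negatives,  # FP / (FP + TN)
        })
    return rows


# Hypothetical (probability, actual class) pairs: 5 positives, 5 negatives.
scored = [
    (0.90, 1), (0.80, 1), (0.70, 0), (0.60, 1), (0.55, 1),
    (0.54, 0), (0.53, 0), (0.51, 0), (0.50, 1), (0.40, 0),
]
rows = confusion_at_each_cut(scored)
for r in rows:
    print(r)
```

Each row gives one (FPR, TPR) pair, so the list as a whole is the set of points from which an ROC curve is drawn.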

5. Again following page 245 of Data Mining: Concepts and Techniques, plot the ROC curve with FPR on the x-axis and TPR on the y-axis. Substituting the data from step 4 gives the curve shown in Figure 1-3:

Figure 1-3 Graph of the returned data

From the shape of the curve, the area under the ROC curve is computed mathematically as 0.9222. Then check the value reported by Weka, as shown in Figure 1-4:

Figure 1-4 Data returned by Weka
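One common way to compute the area from the shape of the curve is the trapezoid rule: sum the areas of the strips between consecutive ROC points. The sketch below assumes that method; the article does not say which mathematical method was used. The scores are hypothetical, so the result is not the 0.9222 reported for the weather data, and the function names are invented for this illustration.

```python
def roc_points(scored):
    """Return (FPR, TPR) points, starting at (0, 0), for ranked scores.

    scored: list of (probability_of_positive, actual_label), label 1 = positive.
    Assumes no two tuples share the same probability.
    """
    ranked = sorted(scored, key=lambda t: t[0], reverse=True)
    positives = sum(label for _, label in ranked)
    negatives = len(ranked) - positives
    points = [(0.0, 0.0)]
    tp = fp = 0
    for _, label in ranked:
        if label == 1:
            tp += 1
        else:
            fp += 1
        points.append((fp / negatives, tp / positives))
    return points


def auc_trapezoid(points):
    """Area under a piecewise-linear curve given as (x, y) points sorted by x."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical (probability, actual class) pairs: 5 positives, 5 negatives.
scored = [
    (0.90, 1), (0.80, 1), (0.70, 0), (0.60, 1), (0.55, 1),
    (0.54, 0), (0.53, 0), (0.51, 0), (0.50, 1), (0.40, 0),
]
auc = auc_trapezoid(roc_points(scored))
print(f"ROC area = {auc:.4f}")
```

For these hypothetical scores the area equals the fraction of (positive, negative) pairs in which the positive tuple is ranked higher, 19 of 25, which is a quick way to sanity-check a hand calculation.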


[1] Data mining using Weka (http://www.cnblogs.com/bluewelkin/p/3538599.html)

[2] Weka use (basic configuration + spam filtering + Cluster Analysis + association Mining) (http://www.cnblogs.com/bitpeach/p/3770606.html)

[3] The calculation method of the area under the ROC curve (Http://wenku.baidu.com/view/3d2ac9202f60ddccda38a07a.html?re=view)

[4] Jiawei Han, Data Mining: Concepts and Techniques, pp. 243-245.

[5] Classification (data mining) (Http://wenku.baidu.com/link?url=EdT7Xxs-a_ 423om-48ih-kxtteprxeejci0-xsm1yk9xbkzgtvwqyiznpzwua8a-dlf-krehls63u9pxxxudjfcsdmbpz2kex5bhwtyswhe&qq-pf-to =PCQQ.C2C)
