Naive Bayesian algorithm and its implementation

1. Introduction to the naive Bayesian algorithm

Given an item to classify, x = (a1, a2, a3, ...), we want to decide which of the categories y1, y2, y3, ... it belongs to.

Bayes' formula:

P(y|x) = P(x|y) P(y) / P(x)

The algorithm is defined as follows:

(1) Let x = {a1, a2, a3, ...} be the item to classify, where a1, a2, a3, ... are the feature attributes of x.

(2) Let C = {y1, y2, y3, ...} be the set of categories.

(3) Compute P(y1|x), P(y2|x), P(y3|x), ....

(4) If P(yk|x) = max{P(y1|x), P(y2|x), P(y3|x), ...}, then x belongs to category yk.

Calculation:

(1) Find a set of items whose classifications are already known; this is the training set.

(2) From the training set, estimate the conditional probability of each feature attribute under each category, that is:

P(a1|y1), ..., P(am|y1)
...
P(a1|yn), ..., P(am|yn)

(3) If the feature attributes are conditionally independent given the category, then by Bayes' formula:

P(yi|x) = P(x|yi) P(yi) / P(x) = P(a1|yi) P(a2|yi) ... P(am|yi) P(yi) / P(x)

Since the denominator P(x) is the same for every category, it is enough to compare the numerators P(a1|yi) ... P(am|yi) P(yi).
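As a minimal sketch of this decision rule (the categories, features, and probability values below are illustrative placeholders, not taken from the article):

# Minimal sketch of the naive Bayes decision rule.
# priors[y] approximates P(y); cond[y][j][a] approximates P(aj|y).
def classify(x, priors, cond):
    best_category, best_score = None, float('-inf')
    for y, prior in priors.items():
        score = prior
        for j, a in enumerate(x):
            score *= cond[y][j].get(a, 0.0)   # P(aj|y); 0 if this value was never seen for y
        if score > best_score:
            best_category, best_score = y, score
    return best_category

# Made-up example with two binary features and two categories:
priors = {'y1': 0.5, 'y2': 0.5}
cond = {'y1': [{0: 0.8, 1: 0.2}, {0: 0.3, 1: 0.7}],
        'y2': [{0: 0.4, 1: 0.6}, {0: 0.9, 1: 0.1}]}
print(classify((1, 0), priors, cond))   # picks the y with the largest P(y) * prod_j P(aj|y)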

2. Example: patient classification

Let me start with an example; you will see that the Bayesian classifier is easy to understand and not difficult at all.

A hospital received six outpatients in one morning, as shown in the following table.

Symptom      Occupation             Disease

Sneezing     Nurse                  Cold
Sneezing     Farmer                 Allergy
Headache     Construction worker    Concussion
Headache     Construction worker    Cold
Sneezing     Teacher                Cold
Headache     Teacher                Concussion

Now there's a seventh patient, a sneezing construction worker. What is the probability of his catching a cold?

According to Bayes' theorem:

P(A|B) = P(B|A) P(A) / P(B)

we can get:

P(Cold | Sneezing × Construction worker)
= P(Sneezing × Construction worker | Cold) × P(Cold)
/ P(Sneezing × Construction worker)

It is assumed that the two features "sneezing" and "construction worker" are conditionally independent given the disease, so the above equation becomes

P(Cold | Sneezing × Construction worker)
= P(Sneezing | Cold) × P(Construction worker | Cold) × P(Cold)
/ (P(Sneezing) × P(Construction worker))

This can now be calculated from the table:

P(Cold | Sneezing × Construction worker)
= (0.66 × 0.33 × 0.5) / (0.5 × 0.33)
= 0.66

As a result, the sneezing construction worker has a 66% chance of having a cold. In the same way, you can calculate the probability that he is suffering from an allergy or a concussion. By comparing these probabilities, you can tell which disease he most likely has.
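To make the arithmetic concrete, here is a short sketch (the variable names are mine) that recomputes these quantities directly from the six-patient table:

# Recompute P(Cold | Sneezing, Construction worker) from the six-patient table.
patients = [
    ("sneezing", "nurse", "cold"),
    ("sneezing", "farmer", "allergy"),
    ("headache", "construction worker", "concussion"),
    ("headache", "construction worker", "cold"),
    ("sneezing", "teacher", "cold"),
    ("headache", "teacher", "concussion"),
]

n = len(patients)
colds = [p for p in patients if p[2] == "cold"]

p_cold = len(colds) / n                                                                # P(Cold) = 3/6
p_sneeze_given_cold = sum(p[0] == "sneezing" for p in colds) / len(colds)              # 2/3
p_worker_given_cold = sum(p[1] == "construction worker" for p in colds) / len(colds)   # 1/3
p_sneeze = sum(p[0] == "sneezing" for p in patients) / n                               # 3/6
p_worker = sum(p[1] == "construction worker" for p in patients) / n                    # 2/6

posterior = (p_sneeze_given_cold * p_worker_given_cold * p_cold) / (p_sneeze * p_worker)
print(round(posterior, 2))   # 0.67 with exact fractions; 0.66 if you round to 0.66 and 0.33 first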

This is the basic method of Bayesian classifier: on the basis of statistical data, according to some characteristics, the probability of each category is calculated and the classification is realized.

3. Python implementation

from numpy import *

def loadDataSet():
    postingList = [['my', 'dog', 'has', 'flea', 'problems', 'help', 'please'],
                   ['maybe', 'not', 'take', 'him', 'to', 'dog', 'park', 'stupid'],
                   ['my', 'dalmation', 'is', 'so', 'cute', 'I', 'love', 'him'],
                   ['stop', 'posting', 'stupid', 'worthless', 'garbage'],
                   ['mr', 'licks', 'ate', 'my', 'steak', 'how', 'to', 'stop', 'him'],
                   ['quit', 'buying', 'worthless', 'dog', 'food', 'stupid']]
    classVec = [0, 1, 0, 1, 0, 1]    # 1 is abusive, 0 not
    return postingList, classVec

def createVocabList(dataSet):
    vocabSet = set([])               # create empty set
    for document in dataSet:
        vocabSet = vocabSet | set(document)    # union of the two sets
    return list(vocabSet)

def setOfWords2Vec(vocabList, inputSet):
    returnVec = [0] * len(vocabList)
    for word in inputSet:
        if word in vocabList:
            returnVec[vocabList.index(word)] = 1
        else:
            print("the word: %s is not in my vocabulary!" % word)
    return returnVec

def trainNB0(trainMatrix, trainCategory):
    numTrainDocs = len(trainMatrix)
    numWords = len(trainMatrix[0])
    pAbusive = sum(trainCategory) / float(numTrainDocs)
    p0Num = ones(numWords); p1Num = ones(numWords)    # change to ones() for Laplace smoothing
    p0Denom = 2.0; p1Denom = 2.0                      # change to 2.0
    for i in range(numTrainDocs):
        if trainCategory[i] == 1:
            p1Num += trainMatrix[i]
            p1Denom += sum(trainMatrix[i])
        else:
            p0Num += trainMatrix[i]
            p0Denom += sum(trainMatrix[i])
    p1Vect = log(p1Num / p1Denom)    # change to log() to avoid underflow
    p0Vect = log(p0Num / p0Denom)    # change to log()
    return p0Vect, p1Vect, pAbusive

def classifyNB(vec2Classify, p0Vec, p1Vec, pClass1):
    p1 = sum(vec2Classify * p1Vec) + log(pClass1)        # element-wise mult
    p0 = sum(vec2Classify * p0Vec) + log(1.0 - pClass1)
    if p1 > p0:
        return 1
    else:
        return 0

def bagOfWords2VecMN(vocabList, inputSet):
    returnVec = [0] * len(vocabList)
    for word in inputSet:
        if word in vocabList:
            returnVec[vocabList.index(word)] += 1
    return returnVec

def testingNB():
    listOPosts, listClasses = loadDataSet()
    myVocabList = createVocabList(listOPosts)
    trainMat = []
    for postinDoc in listOPosts:
        trainMat.append(setOfWords2Vec(myVocabList, postinDoc))
    p0V, p1V, pAb = trainNB0(array(trainMat), array(listClasses))
    testEntry = ['love', 'my', 'dalmation']
    thisDoc = array(setOfWords2Vec(myVocabList, testEntry))
    print(testEntry, 'classified as:', classifyNB(thisDoc, p0V, p1V, pAb))
    testEntry = ['stupid', 'garbage']
    thisDoc = array(setOfWords2Vec(myVocabList, testEntry))
    print(testEntry, 'classified as:', classifyNB(thisDoc, p0V, p1V, pAb))
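To try it out, you can add a standard main guard (not part of the original listing) and run the file:

if __name__ == '__main__':
    testingNB()
    # With the training posts above, the expected output is:
    # ['love', 'my', 'dalmation'] classified as: 0
    # ['stupid', 'garbage'] classified as: 1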
