1. Introduction to the naive Bayes algorithm
Given an item to classify x = (a1, a2, a3, ...), the task is to decide which of the categories y1, y2, y3, ... x belongs to.
Bayes' formula: P(y|x) = P(x|y) P(y) / P(x)
The algorithm is defined as follows:
(1) Let x = {a1, a2, a3, ...} be an item to classify, where a1, a2, a3, ... are the feature attributes of x.
(2) Let the category set be C = {y1, y2, y3, ...}.
(3) Compute P(y1|x), P(y2|x), P(y3|x), ....
(4) If P(yk|x) = max{P(y1|x), P(y2|x), P(y3|x), ...}, then x belongs to category yk.
To compute these probabilities:
(1) Collect a set of items whose categories are already known, i.e. a training set.
(2) From the training set, estimate the conditional probability of each feature attribute under each category:
P(a1|y1), ..., P(am|y1)
.
.
.
P(a1|yn), ..., P(am|yn)
(3) If the feature attributes are conditionally independent, then by Bayes' formula:
P(yi|x) = P(x|yi) P(yi) / P(x) = P(a1|yi) P(a2|yi) ... P(am|yi) P(yi) / P(x)
Since P(x) is the same for every category, it suffices to compare the numerators P(a1|yi) ... P(am|yi) P(yi) and pick the largest.
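The steps above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation from section 3: the helper names train and classify are hypothetical, and the frequency estimates use no smoothing, so an unseen feature value gives a class a score of zero.

```python
from collections import Counter, defaultdict

def train(samples, labels):
    """Estimate P(y) and P(a_j|y) by counting over a labelled training set.

    samples: list of feature tuples; labels: list of class labels.
    """
    priors = Counter(labels)
    cond = defaultdict(Counter)  # (class, feature index) -> value counts
    for x, y in zip(samples, labels):
        for j, a in enumerate(x):
            cond[(y, j)][a] += 1
    n = len(labels)
    # Normalize counts into probabilities.
    priors = {y: c / n for y, c in priors.items()}
    cond = {k: {a: c / sum(v.values()) for a, c in v.items()}
            for k, v in cond.items()}
    return priors, cond

def classify(x, priors, cond):
    """Return the class y maximizing P(y) * prod_j P(a_j|y)."""
    def score(y):
        p = priors[y]
        for j, a in enumerate(x):
            p *= cond[(y, j)].get(a, 0.0)  # unseen value -> probability 0
        return p
    return max(priors, key=score)
```

Because P(x) is dropped, the scores are proportional to the true posteriors but do not sum to 1; that is enough for picking the most likely class.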
2. Example: classifying patients
Let me start with an example; you will see that the Bayesian classifier is not difficult to understand at all.
A hospital received six outpatients in the morning, as in the following table.

Symptom  | Occupation          | Disease
---------|---------------------|-----------
Sneezing | Nurse               | Cold
Sneezing | Farmer              | Allergy
Headache | Construction worker | Concussion
Headache | Construction worker | Cold
Sneezing | Teacher             | Cold
Headache | Teacher             | Concussion
Now a seventh patient arrives: a sneezing construction worker. What is the probability that he has a cold?
According to Bayes' theorem:
P(A|B) = P(B|A) P(A) / P(B)
we can write
P(Cold | Sneezing × Construction worker)
  = P(Sneezing × Construction worker | Cold) × P(Cold) / P(Sneezing × Construction worker)
Assume that the two features "sneezing" and "construction worker" are independent; the equation then becomes
P(Cold | Sneezing × Construction worker)
  = P(Sneezing | Cold) × P(Construction worker | Cold) × P(Cold) / (P(Sneezing) × P(Construction worker))
Reading the counts off the table: P(Sneezing|Cold) = 2/3 ≈ 0.66, P(Construction worker|Cold) = 1/3 ≈ 0.33, P(Cold) = 3/6 = 0.5, P(Sneezing) = 3/6 = 0.5, and P(Construction worker) = 2/6 ≈ 0.33. This gives
P(Cold | Sneezing × Construction worker)
  = (0.66 × 0.33 × 0.5) / (0.5 × 0.33)
  = 0.66
As a result, the sneezing construction worker has about a 66% chance of having a cold. In the same way, you can compute the probability that he has an allergy or a concussion; by comparing these probabilities, you can decide which disease he most likely has.
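The arithmetic can be checked directly. In the snippet below the variable names are ours; note that with exact fractions the answer is 2/3 ≈ 0.67, which the rounded factors in the text truncate to 0.66.

```python
# Counts read off the table of six patients.
p_sneeze_given_cold = 2 / 3    # 2 of the 3 cold patients sneeze
p_builder_given_cold = 1 / 3   # 1 of the 3 cold patients is a construction worker
p_cold = 3 / 6                 # 3 of 6 patients have a cold
p_sneeze = 3 / 6               # 3 of 6 patients sneeze
p_builder = 2 / 6              # 2 of 6 patients are construction workers

p_cold_given_evidence = (p_sneeze_given_cold * p_builder_given_cold * p_cold) \
                        / (p_sneeze * p_builder)
print(round(p_cold_given_evidence, 2))  # → 0.67
```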
This is the basic method of the Bayesian classifier: on the basis of statistical data, and given some observed features, compute the probability of each category and pick the most likely one.
3. Python implementation
from numpy import *

def loadDataSet():
    postingList = [['my', 'dog', 'has', 'flea', 'problems', 'help', 'please'],
                   ['maybe', 'not', 'take', 'him', 'to', 'dog', 'park', 'stupid'],
                   ['my', 'dalmation', 'is', 'so', 'cute', 'I', 'love', 'him'],
                   ['stop', 'posting', 'stupid', 'worthless', 'garbage'],
                   ['mr', 'licks', 'ate', 'my', 'steak', 'how', 'to', 'stop', 'him'],
                   ['quit', 'buying', 'worthless', 'dog', 'food', 'stupid']]
    classVec = [0, 1, 0, 1, 0, 1]  # 1 is abusive, 0 is not
    return postingList, classVec

def createVocabList(dataSet):
    vocabSet = set([])  # create an empty set
    for document in dataSet:
        vocabSet = vocabSet | set(document)  # union of the two sets
    return list(vocabSet)

def setOfWords2Vec(vocabList, inputSet):
    returnVec = [0] * len(vocabList)
    for word in inputSet:
        if word in vocabList:
            returnVec[vocabList.index(word)] = 1
        else:
            print("the word: %s is not in my vocabulary!" % word)
    return returnVec

def trainNB0(trainMatrix, trainCategory):
    numTrainDocs = len(trainMatrix)
    numWords = len(trainMatrix[0])
    pAbusive = sum(trainCategory) / float(numTrainDocs)
    p0Num = ones(numWords); p1Num = ones(numWords)  # initialize counts to 1 (Laplace smoothing)
    p0Denom = 2.0; p1Denom = 2.0                    # and denominators to 2
    for i in range(numTrainDocs):
        if trainCategory[i] == 1:
            p1Num += trainMatrix[i]
            p1Denom += sum(trainMatrix[i])
        else:
            p0Num += trainMatrix[i]
            p0Denom += sum(trainMatrix[i])
    p1Vect = log(p1Num / p1Denom)  # take logs to avoid numerical underflow
    p0Vect = log(p0Num / p0Denom)
    return p0Vect, p1Vect, pAbusive

def classifyNB(vec2Classify, p0Vec, p1Vec, pClass1):
    p1 = sum(vec2Classify * p1Vec) + log(pClass1)  # element-wise multiplication
    p0 = sum(vec2Classify * p0Vec) + log(1.0 - pClass1)
    if p1 > p0:
        return 1
    else:
        return 0

def bagOfWords2VecMN(vocabList, inputSet):
    returnVec = [0] * len(vocabList)
    for word in inputSet:
        if word in vocabList:
            returnVec[vocabList.index(word)] += 1
    return returnVec

def testingNB():
    listOPosts, listClasses = loadDataSet()
    myVocabList = createVocabList(listOPosts)
    trainMat = []
    for postinDoc in listOPosts:
        trainMat.append(setOfWords2Vec(myVocabList, postinDoc))
    p0V, p1V, pAb = trainNB0(array(trainMat), array(listClasses))
    testEntry = ['love', 'my', 'dalmation']
    thisDoc = array(setOfWords2Vec(myVocabList, testEntry))
    print(testEntry, 'classified as:', classifyNB(thisDoc, p0V, p1V, pAb))
    testEntry = ['stupid', 'garbage']
    thisDoc = array(setOfWords2Vec(myVocabList, testEntry))
    print(testEntry, 'classified as:', classifyNB(thisDoc, p0V, p1V, pAb))
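setOfWords2Vec and bagOfWords2VecMN differ only in whether a repeated word raises the count or merely flags presence. A small standalone illustration with a hypothetical three-word vocabulary:

```python
# Set-of-words vs. bag-of-words vectors; vocab and doc are made-up examples.
vocab = ['stupid', 'dog', 'my']
doc = ['stupid', 'stupid', 'dog']

setVec = [0] * len(vocab)
bagVec = [0] * len(vocab)
for word in doc:
    if word in vocab:
        setVec[vocab.index(word)] = 1   # presence only
        bagVec[vocab.index(word)] += 1  # occurrence count
print(setVec)  # → [1, 1, 0]
print(bagVec)  # → [2, 1, 0]
```

The bag-of-words variant usually works better for longer documents, where how often a word appears carries information.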