Bayesian classification is a family of algorithms that uses probability and statistics to classify data, and its classification principle is Bayes' theorem, which has the following form:
P(A|B) = P(B|A) P(A) / P(B)
The formula shows that, given the prior probability P(A), the conditional probability P(B|A), and the evidence P(B), we can compute the posterior probability P(A|B).
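As a quick illustration, here is Bayes' theorem applied to some hypothetical numbers (all three inputs below are made up for the example, not taken from the article):

```python
# Hypothetical inputs: none of these numbers come from the article.
p_A = 0.01          # prior P(A): 1% of items belong to class A
p_B_given_A = 0.99  # conditional P(B|A): evidence B is seen in 99% of class A
p_B = 0.0594        # evidence P(B): B is seen in 5.94% of all items

# Bayes' theorem: posterior P(A|B) = P(B|A) * P(A) / P(B)
p_A_given_B = p_B_given_A * p_A / p_B
print(p_A_given_B)  # about 0.167: observing B still leaves P(A|B) well below 1
```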
The naive Bayesian classifier rests on the assumption that the feature attributes are conditionally independent of each other given the class; it computes the posterior probability of each class and assigns the target evidence to the class with the largest posterior.
The steps to build a naïve Bayesian classifier are as follows:
1. From the training samples, compute the probability of each category, P(Ai);
2. For each feature attribute, compute the conditional probability of each of its values, P(Bj|Ai);
3. For each category, compute P(B|Ai) * P(Ai);
4. Select the category Ak for which the value from step 3 is largest as the class of B.
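The steps above can be sketched as a small, self-contained function (a generic illustration, not the author's implementation below, which stores raw frequencies instead):

```python
from collections import Counter

def naive_bayes_predict(x, X, y):
    """Classify categorical sample x given training rows X and labels y."""
    m = len(X)
    classCounts = Counter(y)                   # step 1: counts behind each P(Ai)
    scores = {}
    for c, mc in classCounts.items():
        score = mc / m                         # prior P(Ai)
        for j, v in enumerate(x):              # step 2: conditional P(Bj|Ai)
            nj = sum(1 for i in range(m) if y[i] == c and X[i][j] == v)
            score *= nj / mc
        scores[c] = score                      # step 3: P(B|Ai) * P(Ai)
    return max(scores, key=scores.get)         # step 4: class Ak with largest value
```

Note that this sketch has no smoothing: a feature value never seen with a class drives that class's score to zero.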
In the actual code, the probabilities are not computed up front. Instead, the frequency of each attribute value within each category is stored in a tree of dictionaries, and the probabilities for the target's feature values are derived from those counts on demand; the advantage is that the counts are easy to store, read, and reuse. The code is as follows:
from numpy import arange, array, zeros

def bayesian(inX, tranSet, labels):
    """Naive Bayesian classifier.
    :param inX: feature vector to classify
    :param tranSet: feature matrix, one sample per row
    :param labels: category of each sample
    :return: dict of per-class scores P(B|Ak) * P(Ak), and those scores
             normalized by the estimated evidence P(B)
    """
    labelsTree = {}    # labelsTree[label][j][value] -> frequency of value in column j
    labelsCount = {}   # labelsCount[label][j] -> number of samples with this label
    m, n = tranSet.shape
    xCount = zeros(n)  # xCount[j] -> how often inX[j] appears in column j
    for i in arange(m):
        if labels[i] not in labelsTree:
            labelsTree[labels[i]] = {}
            labelsCount[labels[i]] = {}
        for j in arange(n):
            if j not in labelsTree[labels[i]]:
                labelsTree[labels[i]][j] = {}
            labelsTree[labels[i]][j][tranSet[i, j]] = \
                labelsTree[labels[i]][j].get(tranSet[i, j], 0) + 1
            labelsCount[labels[i]][j] = labelsCount[labels[i]].get(j, 0) + 1
            if inX[j] == tranSet[i, j]:
                xCount[j] += 1
    # P(B) estimated, under the independence assumption, as the product of
    # the marginal frequencies of the target's feature values
    xProp = (xCount / m).prod()
    pVector = {}
    for key in labelsTree:
        pVector[key] = labelsCount[key][0] / m   # prior P(Ak)
        for j in arange(n):
            # conditional P(Bj|Ak); an unseen value falls back to a count of 1
            pVector[key] *= labelsTree[key][j].get(inX[j], 1) / labelsCount[key][j]
    return pVector, array(list(pVector.values()), dtype='float') / xProp
The test code is as follows:
from numpy import *
import ml

data = [['<=30',    'high',   'no',  'fair'],
        ['<=30',    'high',   'no',  'excellent'],
        ['31...40', 'high',   'no',  'fair'],
        ['>40',     'medium', 'no',  'fair'],
        ['>40',     'low',    'yes', 'fair'],
        ['>40',     'low',    'yes', 'excellent'],
        ['31...40', 'low',    'yes', 'excellent'],
        ['<=30',    'medium', 'no',  'fair'],
        ['<=30',    'low',    'yes', 'fair'],
        ['>40',     'medium', 'yes', 'fair'],
        ['<=30',    'medium', 'yes', 'excellent'],
        ['31...40', 'medium', 'no',  'excellent'],
        ['31...40', 'high',   'yes', 'fair'],
        ['>40',     'medium', 'no',  'excellent']]
label = ['no', 'no', 'yes', 'yes', 'yes', 'no', 'yes',
         'no', 'yes', 'yes', 'yes', 'yes', 'yes', 'no']
inX = ['<=30', 'medium', 'yes', 'fair']

pv = ml.bayesian(array(inX), array(data), array(label))
print(pv)
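As a sanity check, the posterior numerators P(B|Ak) * P(Ak) for inX can be worked out by hand from the 14 training rows (9 samples are 'yes' and 5 are 'no'; the per-feature counts below are read directly off the table above):

```python
# Hand counts for inX = ['<=30', 'medium', 'yes', 'fair'] on the 14-row set above
p_yes = (9/14) * (2/9) * (4/9) * (6/9) * (6/9)  # P(yes) * product of P(bj|yes)
p_no  = (5/14) * (3/5) * (2/5) * (1/5) * (2/5)  # P(no)  * product of P(bj|no)
print(p_yes, p_no)  # roughly 0.0282 vs 0.0069, so the answer should be 'yes'
```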
This article is from the "Go one stop two look back three" blog; please be sure to keep this source: http://janwool.blog.51cto.com/5694960/1895088
Classification algorithm--naive Bayesian classification