Classification algorithm: naive Bayesian classification


Bayesian classification is a family of algorithms that use probability and statistics to classify data; its classifying principle is Bayes' theorem, which is stated as follows:

P(A|B) = P(B|A) · P(A) / P(B)

The formula shows that the posterior probability P(A|B) can be computed from the prior probability P(A), the conditional probability P(B|A), and the evidence P(B).
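As a quick numeric check of the formula, the snippet below plugs in some illustrative numbers (the prior, likelihood, and evidence values are assumptions chosen for the example, not taken from the article):

```python
# Illustrative numbers only; they are not from any real dataset.
p_a = 0.3          # prior probability P(A)
p_b_given_a = 0.8  # conditional probability (likelihood) P(B|A)
p_b = 0.5          # evidence P(B)

# Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)
posterior = p_b_given_a * p_a / p_b   # ≈ 0.48
```

Note that observing B raised the probability of A from 0.3 to 0.48, because B is much more likely under A than overall.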

The naive Bayesian classifier adds the assumption that the feature attributes are conditionally independent of one another given the class, so that P(B|Ai) = P(b1|Ai) · P(b2|Ai) · … · P(bn|Ai). It computes the posterior probability of each class and chooses the class with the largest posterior as the category of the target evidence.

The steps to build a naive Bayesian classifier are as follows:

1. From the training samples, compute the prior probability P(Ai) of each class.

2. For each feature attribute, compute the conditional probability P(bj|Ai) of every attribute value under every class.

3. For each class, compute P(B|Ai) · P(Ai) = P(Ai) · P(b1|Ai) · … · P(bn|Ai).

4. Choose the class Ak for which the value from step 3 is largest as the class of B.
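The four steps above can be sketched with plain dictionaries before looking at the article's NumPy version. The toy weather data below is a hypothetical example invented for illustration, not the article's dataset:

```python
from collections import Counter, defaultdict

# Toy data: each entry is (feature vector, class label). Hypothetical example.
samples = [(('sunny', 'hot'), 'no'),
           (('rainy', 'cool'), 'yes'),
           (('sunny', 'cool'), 'yes'),
           (('rainy', 'hot'), 'no')]
target = ('sunny', 'cool')

# Step 1: prior probabilities P(Ai)
class_counts = Counter(label for _, label in samples)
priors = {c: n / len(samples) for c, n in class_counts.items()}

# Step 2: conditional frequencies for P(bj|Ai)
cond = defaultdict(Counter)          # cond[(class, j)][value] -> frequency
for features, label in samples:
    for j, value in enumerate(features):
        cond[(label, j)][value] += 1

# Steps 3-4: score P(B|Ai) * P(Ai) per class and pick the largest
scores = {}
for c in priors:
    score = priors[c]
    for j, value in enumerate(target):
        score *= cond[(c, j)][value] / class_counts[c]
    scores[c] = score
prediction = max(scores, key=scores.get)
print(prediction)   # 'yes'
```

Here the 'no' class scores zero because no 'no' sample has the value 'cool'; in practice one would add Laplace smoothing to avoid zero probabilities for unseen values.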

In the actual code we do not compute each probability directly. Instead, we build frequency tables that record how often each attribute value occurs within each class, and derive the probabilities of the target's features from those counts. The advantage is that the tables are easy to store, read, and use. The specific code is as follows:

from numpy import arange, array, zeros

def bayesian(inX, tranSet, labels):
    '''
    Naive Bayesian classifier
    :param inX: feature vector of the sample to classify
    :param tranSet: feature matrix (one training sample per row)
    :param labels: class label of each training sample
    :return: dict of un-normalised posteriors P(B|Ai)*P(Ai), and the same
             values divided by an estimate of the evidence P(B)
    '''
    labelsTree = {}    # labelsTree[class][feature][value] -> frequency
    labelsCount = {}   # labelsCount[class][feature] -> samples of that class
    m, n = tranSet.shape
    xCount = zeros(n)  # how often each attribute value of inX appears overall
    for i in arange(m):
        if labels[i] not in labelsTree:
            labelsTree[labels[i]] = {}
            labelsCount[labels[i]] = {}
        for j in arange(n):
            if j not in labelsTree[labels[i]]:
                labelsTree[labels[i]][j] = {}
            labelsTree[labels[i]][j][tranSet[i, j]] = \
                labelsTree[labels[i]][j].get(tranSet[i, j], 0) + 1
            labelsCount[labels[i]][j] = labelsCount[labels[i]].get(j, 0) + 1
            if inX[j] == tranSet[i, j]:
                xCount[j] += 1
    pVector = {}
    # crude evidence estimate: product of the marginal frequencies of inX's values
    xProp = (xCount / m).prod()
    for key in labelsTree:
        p = 1.0
        for j in arange(n):
            # P(bj|Ai); unseen values fall back to 1 (no Laplace smoothing)
            p *= labelsTree[key][j].get(inX[j], 1) / labelsCount[key][j]
        pVector[key] = p * labelsCount[key][0] / m  # multiply by the prior P(Ai)
    return pVector, array(list(pVector.values()), dtype='float') / xProp

The test code is as follows:

from numpy import array
import ml

data = [['<=30', 'high', 'no', 'fair'],
        ['<=30', 'high', 'no', 'excellent'],
        ['31...40', 'high', 'no', 'fair'],
        ['>40', 'medium', 'no', 'fair'],
        ['>40', 'low', 'yes', 'fair'],
        ['>40', 'low', 'yes', 'excellent'],
        ['31...40', 'low', 'yes', 'excellent'],
        ['<=30', 'medium', 'no', 'fair'],
        ['<=30', 'low', 'yes', 'fair'],
        ['>40', 'medium', 'yes', 'fair'],
        ['<=30', 'medium', 'yes', 'excellent'],
        ['31...40', 'medium', 'no', 'excellent'],
        ['31...40', 'high', 'yes', 'fair'],
        ['>40', 'medium', 'no', 'excellent']]
label = ['no', 'no', 'yes', 'yes', 'yes', 'no', 'yes',
         'no', 'yes', 'yes', 'yes', 'yes', 'yes', 'no']
inX = ['<=30', 'medium', 'yes', 'fair']
pv = ml.bayesian(array(inX), array(data), array(label))
print(pv)


This article is from the "Go one stop two look back three" blog; please keep this source: http://janwool.blog.51cto.com/5694960/1895088
