Advantages and disadvantages of the algorithm
Pros: still effective with little data; can handle multi-class problems
Cons: sensitive to how the input data is prepared
Applicable data type: nominal data
Algorithm idea:
Naive Bayes
Bayesian classification is a general term for a family of classification algorithms; they are all based on Bayes' theorem, hence the collective name.
For example, suppose we want to determine whether an e-mail message is spam. We know the distribution of words within this message, and we also know how often those words appear in spam; Bayes' theorem then lets us compute the probability that the message is spam.
The naive Bayes classifier makes the simplifying assumptions that the features are independent of one another (this is what "naive" refers to) and that each feature is equally important.
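As a rough numeric sketch of that Bayes'-theorem step (all the probabilities below are made-up illustrative values, not estimates from real mail):

```python
# Hypothetical values for illustration only:
p_spam = 0.4                 # prior: 40% of mail is spam
p_word_given_spam = 0.6      # the word appears in 60% of spam
p_word_given_ham = 0.05      # and in 5% of normal mail

# Bayes' theorem: P(spam|word) = P(word|spam) * P(spam) / P(word),
# where P(word) is expanded by the law of total probability.
p_word = p_word_given_spam * p_spam + p_word_given_ham * (1 - p_spam)
p_spam_given_word = p_word_given_spam * p_spam / p_word
print(round(p_spam_given_word, 3))   # → 0.889
```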
Functions
loadDataSet()
Creates the dataset: a list of tokenized sentences representing user comments on a forum, where label 1 marks an abusive comment.
createVocabList(dataSet)
Collects all the unique words across the sentences, which determines the size of our word vectors.
setOfWords2Vec(vocabList, inputSet)
Converts a sentence into a vector over the vocabulary. This is the Bernoulli model: it records only whether each word is present.
bagOfWords2VecMN(vocabList, inputSet)
Another way of turning a sentence into a vector: the multinomial model, which takes the number of occurrences of each word into account.
trainNB0(trainMatrix, trainCategory)
Computes P(c) along with P(w[i]|c=1) and P(w[i]|c=0). There are two tricks here. First, the numerator and denominator counts are not initialized to 0: they start at 1 and 2 respectively, so a single zero probability cannot force the whole product to zero. Second, logarithms are taken and log-probabilities are summed instead of multiplied, preventing floating-point underflow from driving the result to 0.
classifyNB(vec2Classify, p0Vec, p1Vec, pClass1)
Uses Bayes' formula to compute the probability that the vector belongs to each of the two classes and returns the more likely class.
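To make the two vector models concrete before the full listing, here is a minimal standalone sketch (the vocabulary and sentence below are illustrative, not the article's data):

```python
# Illustrative vocabulary and input sentence:
vocab = ['my', 'dog', 'stupid', 'garbage']
sentence = ['stupid', 'dog', 'stupid']

# Bernoulli (set-of-words) model: presence only
set_vec = [1 if w in sentence else 0 for w in vocab]
# Multinomial (bag-of-words) model: occurrence counts
bag_vec = [sentence.count(w) for w in vocab]

print(set_vec)   # → [0, 1, 1, 0]
print(bag_vec)   # → [0, 1, 2, 0]
```

The only difference shows up for repeated words: "stupid" occurs twice, which the bag-of-words vector records but the set-of-words vector does not.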
# coding=utf-8
from numpy import array, log, ones


def loadDataSet():
    """Toy dataset of tokenized forum comments; label 1 marks an abusive post."""
    postingList = [
        ['my', 'dog', 'has', 'flea', 'problems', 'help', 'please'],
        ['maybe', 'not', 'take', 'him', 'to', 'dog', 'park', 'stupid'],
        ['my', 'dalmation', 'is', 'so', 'cute', 'i', 'love', 'him'],
        ['stop', 'posting', 'stupid', 'worthless', 'garbage'],
        ['mr', 'licks', 'ate', 'my', 'steak', 'how', 'to', 'stop', 'him'],
        ['quit', 'buying', 'worthless', 'dog', 'food', 'stupid'],
    ]
    classVec = [0, 1, 0, 1, 0, 1]  # 1 is abusive, 0 is not
    return postingList, classVec


def createVocabList(dataSet):
    """Create a list of all unique words in the dataset."""
    vocabSet = set()
    for document in dataSet:
        vocabSet = vocabSet | set(document)
    return list(vocabSet)


def setOfWords2Vec(vocabList, inputSet):
    """Bernoulli model: record only whether each vocabulary word is present."""
    returnVec = [0] * len(vocabList)
    for word in inputSet:
        if word in vocabList:
            returnVec[vocabList.index(word)] = 1
        else:
            print('word %s not in dict' % word)
    return returnVec


def bagOfWords2VecMN(vocabList, inputSet):
    """Multinomial model: count how many times each vocabulary word occurs."""
    returnVec = [0] * len(vocabList)
    for word in inputSet:
        if word in vocabList:
            returnVec[vocabList.index(word)] += 1
    return returnVec


def trainNB0(trainMatrix, trainCategory):
    """Estimate P(c=1) and the per-word conditionals P(w[i]|c) for both classes."""
    numTrainDocs = len(trainMatrix)
    numWords = len(trainMatrix[0])
    pAbusive = sum(trainCategory) / float(numTrainDocs)
    # Initialize counts to 1 (denominators to 2) so that a single zero
    # probability cannot zero out the whole product
    p0Num = ones(numWords)
    p1Num = ones(numWords)
    p0Denom = 2.0
    p1Denom = 2.0
    for i in range(numTrainDocs):
        if trainCategory[i] == 1:
            p1Num += trainMatrix[i]
            p1Denom += sum(trainMatrix[i])
        else:
            p0Num += trainMatrix[i]
            p0Denom += sum(trainMatrix[i])
    # Take logs for precision; otherwise the product can underflow to zero
    p1Vect = log(p1Num / p1Denom)
    p0Vect = log(p0Num / p0Denom)
    return p0Vect, p1Vect, pAbusive


def classifyNB(vec2Classify, p0Vec, p1Vec, pClass1):
    """Compare the log posterior of each class and return the more likely one."""
    p1 = sum(vec2Classify * p1Vec) + log(pClass1)        # element-wise multiply
    p0 = sum(vec2Classify * p0Vec) + log(1.0 - pClass1)
    return 1 if p1 > p0 else 0


def testingNB():
    listOPosts, listClasses = loadDataSet()
    myVocabList = createVocabList(listOPosts)
    trainMat = []
    for postinDoc in listOPosts:
        trainMat.append(setOfWords2Vec(myVocabList, postinDoc))
    p0V, p1V, pAb = trainNB0(array(trainMat), array(listClasses))
    testEntry = ['love', 'my', 'dalmation']
    thisDoc = array(setOfWords2Vec(myVocabList, testEntry))
    print(testEntry, 'classified as:', classifyNB(thisDoc, p0V, p1V, pAb))
    testEntry = ['stupid', 'garbage']
    thisDoc = array(setOfWords2Vec(myVocabList, testEntry))
    print(testEntry, 'classified as:', classifyNB(thisDoc, p0V, p1V, pAb))


def main():
    testingNB()


if __name__ == '__main__':
    main()
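The logarithm trick used during training can be seen in isolation: multiplying many small per-word probabilities underflows a 64-bit float to zero, while summing their logarithms stays finite and still lets the two classes be compared. A minimal sketch (the probabilities are made up for illustration):

```python
import math

# 80 per-word probabilities of 1e-5 each (illustrative values)
probs = [1e-5] * 80

direct = 1.0
for p in probs:
    direct *= p              # 1e-400 is below float64 range: underflows to 0.0

log_sum = sum(math.log(p) for p in probs)   # finite, comparable across classes

print(direct)    # → 0.0
print(log_sum)
```

Because log is monotonic, comparing log-probability sums gives the same winner as comparing the raw products would, without ever hitting zero.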