Python Implementation of the Naive Bayes Algorithm

Last Update:2014-11-17 Source: Internet

Author: User

Advantages and disadvantages of Naive Bayes Algorithms

Advantages: remains effective when little data is available and can handle multi-class problems
Disadvantage: sensitive to how the input data is prepared
Applicable data type: nominal data

Algorithm idea:

Naive Bayes
For example, to decide whether an email is spam, we look at the words it contains. If we also know how often each word appears in spam and in non-spam emails, Bayes' theorem gives the probability that the email is spam.
The Naive Bayes classifier assumes that the features are independent of one another given the class, which is why it is called "naive", and that each feature is equally important.
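In symbols, the idea can be sketched as follows. The notation is chosen here for illustration: c is a class (spam or not spam) and w = (w_1, ..., w_n) is the word vector of a message. Bayes' theorem relates the two, and treating the words as independent given the class lets the likelihood be written as a product of per-word probabilities:

```latex
P(c \mid \mathbf{w}) = \frac{P(\mathbf{w} \mid c)\, P(c)}{P(\mathbf{w})},
\qquad
P(\mathbf{w} \mid c) \approx \prod_{i=1}^{n} P(w_i \mid c)
```

Because P(w) is the same for every class, only the numerator needs to be compared when choosing a class.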

Functions

loadDataSet()

Creates the dataset: a list of sentences that have already been split into words. Each sentence represents a user comment on a forum, and the label 1 marks a comment as abusive.

createVocabList(dataSet)

Collects the distinct words that appear across all sentences. The number of distinct words determines the length of our word vectors.

setOfWords2Vec(vocabList, inputSet)

Converts a sentence into a vector based on the words it contains. This uses the Bernoulli model, which records only whether each word is present.

bagOfWords2VecMN(vocabList, inputSet)

This is another model for converting sentences into vectors: the multinomial model, which takes into account how many times each word occurs.
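The difference between the two vector models is easiest to see on a tiny input. The sketch below is an illustration only: the helper names `set_of_words` and `bag_of_words`, the three-word vocabulary, and the sample sentence are all made up for this example and are not part of the article's code.

```python
def set_of_words(vocab, words):
    """Bernoulli-style vector: 1 if the vocabulary word occurs at all, else 0."""
    present = set(words)
    return [1 if term in present else 0 for term in vocab]


def bag_of_words(vocab, words):
    """Multinomial-style vector: how many times each vocabulary word occurs."""
    return [words.count(term) for term in vocab]


vocab = ['dog', 'stupid', 'love']              # hypothetical vocabulary
sentence = ['stupid', 'dog', 'stupid', 'cat']  # 'cat' is not in the vocabulary

binary_vec = set_of_words(vocab, sentence)
count_vec = bag_of_words(vocab, sentence)
print(binary_vec)  # [1, 1, 0]
print(count_vec)   # [1, 2, 0]
```

The repeated word 'stupid' counts once in the first vector and twice in the second, and the out-of-vocabulary word 'cat' is ignored by both.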

trainNB0(trainMatrix,trainCatergory)

Calculates P(c1), P(w_i | c1), and P(w_i | c0). Two tricks are used here. First, the word counts are initialized to 1 and the denominators to 2 rather than 0, so that a single zero probability cannot make the whole product 0. Second, logarithms are taken so that multiplying many small probabilities does not underflow to 0.
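Both tricks can be checked in isolation. The following is a minimal sketch with made-up numbers: the word counts, the probability 0.01, and the 500 repetitions are assumptions chosen only to make the effect visible, and the smoothing constants (1 for each count, 2 for the denominator) mirror the ones used in trainNB0.

```python
import math

# Hypothetical word counts for one class over a 4-word vocabulary.
counts = [3, 0, 1, 0]
total = float(sum(counts))

# Without smoothing, unseen words get probability 0, which zeroes any product.
raw = [c / total for c in counts]

# Start every count at 1 and the denominator at 2, as trainNB0 does.
smoothed = [(c + 1) / (total + 2) for c in counts]

# Multiplying many small probabilities underflows to 0.0 in floating point ...
product = 1.0
for _ in range(500):
    product *= 0.01

# ... while summing their logarithms stays well within range.
log_sum = sum(math.log(0.01) for _ in range(500))

print(raw)      # [0.75, 0.0, 0.25, 0.0]
print(smoothed)
print(product)  # 0.0
print(log_sum)
```

Since the logarithm is monotonic, comparing log scores picks the same class as comparing the original products would.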

classifyNB(vec2Classify, p0Vec, p1Vec, pClass1)

Uses Bayes' formula to decide which of the two classes has the higher probability.
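Written out, the comparison that classifyNB performs can be sketched as the rule below. The symbols are chosen here for illustration: x_i is the i-th entry of vec2Classify and P(w_i | c) is the per-word probability estimated during training. The evidence term P(w) is dropped because it is identical for both classes.

```latex
\hat{c} = \arg\max_{c \in \{0, 1\}} \Big[ \log P(c) + \sum_{i=1}^{n} x_i \log P(w_i \mid c) \Big]
```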

# coding=UTF-8
import numpy as np


def loadDataSet():
    postingList = [['my', 'dog', 'has', 'flea', 'problems', 'help', 'please'],
                   ['maybe', 'not', 'take', 'him', 'to', 'dog', 'park', 'stupid'],
                   ['my', 'dalmation', 'is', 'so', 'cute', 'I', 'love', 'him'],
                   ['stop', 'posting', 'stupid', 'worthless', 'garbage'],
                   ['mr', 'licks', 'ate', 'my', 'steak', 'who', 'to', 'stop', 'him'],
                   ['quit', 'bucket', 'worthless', 'dog', 'food', 'stupid']]
    # 1 is abusive, 0 not. The label values are missing from the source listing;
    # alternating labels are assumed here, marking the posts that contain
    # 'stupid' as abusive.
    classVec = [0, 1, 0, 1, 0, 1]
    return postingList, classVec


# create a list with all words
def createVocabList(dataSet):
    vocabSet = set()
    for document in dataSet:
        vocabSet = vocabSet | set(document)
    return list(vocabSet)


# Bernoulli model: mark each vocabulary word as present (1) or absent (0)
def setOfWords2Vec(vocabList, inputSet):
    retVocabList = [0] * len(vocabList)
    for word in inputSet:
        if word in vocabList:
            retVocabList[vocabList.index(word)] = 1
        else:
            print('word %s not in dict' % word)
    return retVocabList


# Another model: count how many times each vocabulary word occurs
def bagOfWords2VecMN(vocabList, inputSet):
    returnVec = [0] * len(vocabList)
    for word in inputSet:
        if word in vocabList:
            returnVec[vocabList.index(word)] += 1
    return returnVec


def trainNB0(trainMatrix, trainCatergory):
    numTrainDoc = len(trainMatrix)
    numWords = len(trainMatrix[0])
    pAbusive = sum(trainCatergory) / float(numTrainDoc)
    # start counts at 1 and denominators at 2 so that no probability is 0
    p0Num = np.ones(numWords)
    p1Num = np.ones(numWords)
    p0Denom = 2.0
    p1Denom = 2.0
    for i in range(numTrainDoc):
        if trainCatergory[i] == 1:
            p1Num += trainMatrix[i]
            p1Denom += sum(trainMatrix[i])
        else:
            p0Num += trainMatrix[i]
            p0Denom += sum(trainMatrix[i])
    # take logs so that products of small probabilities do not underflow to 0
    p1Vect = np.log(p1Num / p1Denom)
    p0Vect = np.log(p0Num / p0Denom)
    return p0Vect, p1Vect, pAbusive


def classifyNB(vec2Classify, p0Vec, p1Vec, pClass1):
    p1 = sum(vec2Classify * p1Vec) + np.log(pClass1)  # element-wise mult
    p0 = sum(vec2Classify * p0Vec) + np.log(1.0 - pClass1)
    if p1 > p0:
        return 1
    else:
        return 0


def testingNB():
    listOPosts, listClasses = loadDataSet()
    myVocabList = createVocabList(listOPosts)
    trainMat = []
    for postinDoc in listOPosts:
        trainMat.append(setOfWords2Vec(myVocabList, postinDoc))
    p0V, p1V, pAb = trainNB0(np.array(trainMat), np.array(listClasses))
    testEntry = ['love', 'my', 'dalmation']
    thisDoc = np.array(setOfWords2Vec(myVocabList, testEntry))
    print('%s classified as: %d' % (testEntry, classifyNB(thisDoc, p0V, p1V, pAb)))
    testEntry = ['stupid', 'garbage']
    thisDoc = np.array(setOfWords2Vec(myVocabList, testEntry))
    print('%s classified as: %d' % (testEntry, classifyNB(thisDoc, p0V, p1V, pAb)))


def main():
    testingNB()


if __name__ == '__main__':
    main()

From Weizhi note (Wiz)

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
