Python Implementation of the Naive Bayes Algorithm

Source: Internet
Author: User

This article describes a Python implementation of the Naive Bayes algorithm, shared here for reference. The implementation is as follows:

Advantages and disadvantages of Naive Bayes

Advantages: remains effective when little training data is available, and can handle multi-class problems.

Disadvantage: sensitive to how the input data is prepared.

Applicable data type: nominal values.

Algorithm idea:

For example, to decide whether an email is spam, we look at the words it contains. If we also know how often each word occurs in spam and in non-spam email, Bayes' theorem can be used to compute the probability that the email is spam.

The Naive Bayes classifier assumes that the features are independent of one another given the class (this is what makes it "naive"), and that each feature is equally important.
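
In symbols, the idea above can be written as follows. This is the standard statement of Bayes' theorem combined with the independence assumption, not a formula taken from the original article. The notation is an assumption made here: c is a class (for example spam or not spam) and w = (w_1, ..., w_n) are the word features of one document.

```latex
p(c \mid \mathbf{w}) = \frac{p(\mathbf{w} \mid c)\, p(c)}{p(\mathbf{w})},
\qquad
p(\mathbf{w} \mid c) = \prod_{i=1}^{n} p(w_i \mid c)
\quad \text{(independence assumption)}
```

Because p(w) is the same for every class, comparing two classes only requires comparing the numerators p(w | c) p(c).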

Functions
loadDataSet()

Creates the dataset: a list of forum comments, each already split into words, together with a list of labels. Label 1 marks a comment as abusive, and label 0 marks it as not abusive.

createVocabList(dataSet)

Collects every distinct word that appears in the comments. The number of distinct words determines the length of the word vectors.

setOfWords2Vec(vocabList, inputSet)

Converts a comment into a vector over the vocabulary. This uses the Bernoulli (set-of-words) model: each entry records only whether the word occurs.

bagOfWords2VecMN(vocabList, inputSet)

Another way to convert a comment into a vector. This is the multinomial (bag-of-words) model, which counts how many times each word occurs.
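
The difference between the two models can be seen on a comment that repeats a word. The following is a minimal sketch, not the article's code: the vocabulary, the comment, and the function names set_of_words and bag_of_words are made up for illustration.

```python
# Hypothetical vocabulary and comment, chosen only for illustration
vocab = ['dog', 'stupid', 'my', 'cute']
post = ['stupid', 'dog', 'stupid']


def set_of_words(vocab_list, words):
    """Bernoulli model: 1 if the word occurs at least once, else 0."""
    vec = [0] * len(vocab_list)
    for word in words:
        if word in vocab_list:
            vec[vocab_list.index(word)] = 1
    return vec


def bag_of_words(vocab_list, words):
    """Multinomial model: number of times each word occurs."""
    vec = [0] * len(vocab_list)
    for word in words:
        if word in vocab_list:
            vec[vocab_list.index(word)] += 1
    return vec


set_vec = set_of_words(vocab, post)   # [1, 1, 0, 0]
bag_vec = bag_of_words(vocab, post)   # [1, 2, 0, 0]
print(set_vec, bag_vec)
```

The two vectors differ only in the entry for 'stupid', which occurs twice in the comment.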

trainNB0(trainMatrix, trainCatergory)

Calculates the class prior P(C[1]) and the conditional probabilities P(w[i] | C[1]) and P(w[i] | C[0]). Two tricks are used here. First, the word counts are initialized to 1 and the denominators to 2.0 rather than 0, so that a single zero probability cannot make the whole product zero. Second, the logarithm of each probability is taken, so that multiplying many small probabilities does not underflow to 0 because of limited floating-point precision.
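
Both tricks can be checked in isolation. The sketch below is illustrative only: the probabilities, the counts, and the helper name smoothed are assumptions, not values from the article's dataset.

```python
import math

# Hypothetical per-word probabilities for a 400-word document
probs = [1e-3] * 400

# Multiplying them directly underflows to exactly 0.0 in double precision
product = 1.0
for p in probs:
    product *= p

# Summing logarithms keeps the value representable
log_sum = sum(math.log(p) for p in probs)

print(product)   # 0.0
print(log_sum)   # about -2763.1


def smoothed(count, total):
    """Add 1 to the word count and 2 to the class total, as trainNB0 does."""
    return (count + 1.0) / (total + 2.0)


# A word never seen in a class still gets a non-zero probability (1/21 here)
unseen = smoothed(0, 19)
print(unseen)
```

Working with sums of logarithms does not change which class wins, because the logarithm is monotonically increasing.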

classifyNB(vec2Classify, p0Vec, p1Vec, pClass1)

Uses Bayes' formula to compute a log-probability score for each of the two classes and returns the class with the higher score.
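
The decision rule can be sketched without NumPy as follows. This is a simplified illustration under stated assumptions, not the article's function: the name classify, the three-word vocabulary, and all probabilities are hypothetical.

```python
import math


def classify(doc_vec, log_p_w_c0, log_p_w_c1, p_class1):
    """Return 1 if class 1 has the larger log-posterior score, else 0."""
    score1 = sum(x * lp for x, lp in zip(doc_vec, log_p_w_c1)) + math.log(p_class1)
    score0 = sum(x * lp for x, lp in zip(doc_vec, log_p_w_c0)) + math.log(1.0 - p_class1)
    return 1 if score1 > score0 else 0


# Hypothetical three-word vocabulary: ['love', 'stupid', 'dog']
log_p_w_c0 = [math.log(p) for p in (0.5, 0.1, 0.4)]   # P(word | class 0)
log_p_w_c1 = [math.log(p) for p in (0.1, 0.6, 0.3)]   # P(word | class 1)

friendly = classify([1, 0, 1], log_p_w_c0, log_p_w_c1, 0.5)   # 'love', 'dog'
abusive = classify([0, 1, 1], log_p_w_c0, log_p_w_c1, 0.5)    # 'stupid', 'dog'
print(friendly, abusive)
```

Only words present in the document (entries equal to 1) contribute to each score, plus the logarithm of the class prior.
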
The code is as follows:
# coding=utf-8
from numpy import array, log, ones


def loadDataSet():
    # Six forum comments, already split into words
    postingList = [['my', 'dog', 'has', 'flea', 'problems', 'help', 'please'],
                   ['maybe', 'not', 'take', 'him', 'to', 'dog', 'park', 'stupid'],
                   ['my', 'dalmation', 'is', 'so', 'cute', 'I', 'love', 'him'],
                   ['stop', 'posting', 'stupid', 'worthless', 'garbage'],
                   ['mr', 'licks', 'ate', 'my', 'steak', 'who', 'to', 'stop', 'him'],
                   ['quit', 'bucket', 'worthless', 'dog', 'food', 'stupid']]
    # One label per comment: 1 is abusive, 0 is not
    classVec = [0, 1, 0, 1, 0, 1]
    return postingList, classVec

# Create a list of all distinct words in the dataset
def createVocabList(dataSet):
    vocabSet = set()
    for document in dataSet:
        vocabSet = vocabSet | set(document)  # union with this comment's words
    return list(vocabSet)

# Bernoulli (set-of-words) model: 1 if the word occurs, else 0
def setOfWords2Vec(vocabList, inputSet):
    retVocabList = [0] * len(vocabList)
    for word in inputSet:
        if word in vocabList:
            retVocabList[vocabList.index(word)] = 1
        else:
            print('word', word, 'not in dict')
    return retVocabList

# Another model: multinomial (bag-of-words), counts every occurrence
def bagOfWords2VecMN(vocabList, inputSet):
    returnVec = [0] * len(vocabList)
    for word in inputSet:
        if word in vocabList:
            returnVec[vocabList.index(word)] += 1
    return returnVec

def trainNB0(trainMatrix, trainCatergory):
    numTrainDoc = len(trainMatrix)
    numWords = len(trainMatrix[0])
    pAbusive = sum(trainCatergory) / float(numTrainDoc)  # prior of class 1
    # Start the counts at 1 and the denominators at 2 so that no single
    # probability is 0, which would make the whole product 0
    p0Num = ones(numWords)
    p1Num = ones(numWords)
    p0Denom = 2.0
    p1Denom = 2.0
    for i in range(numTrainDoc):
        if trainCatergory[i] == 1:
            p1Num += trainMatrix[i]
            p1Denom += sum(trainMatrix[i])
        else:
            p0Num += trainMatrix[i]
            p0Denom += sum(trainMatrix[i])
    # Take logarithms for precision; otherwise the product of many small
    # probabilities is likely to underflow to zero
    p1Vect = log(p1Num / p1Denom)
    p0Vect = log(p0Num / p0Denom)
    return p0Vect, p1Vect, pAbusive

def classifyNB(vec2Classify, p0Vec, p1Vec, pClass1):
    # Element-wise multiplication, then sum the log-probabilities
    p1 = sum(vec2Classify * p1Vec) + log(pClass1)
    p0 = sum(vec2Classify * p0Vec) + log(1.0 - pClass1)
    if p1 > p0:
        return 1
    else:
        return 0

def testingNB():
    listOPosts, listClasses = loadDataSet()
    myVocabList = createVocabList(listOPosts)
    trainMat = []
    for postinDoc in listOPosts:
        trainMat.append(setOfWords2Vec(myVocabList, postinDoc))
    p0V, p1V, pAb = trainNB0(array(trainMat), array(listClasses))
    testEntry = ['love', 'my', 'dalmation']
    thisDoc = array(setOfWords2Vec(myVocabList, testEntry))
    print(testEntry, 'classified as: ', classifyNB(thisDoc, p0V, p1V, pAb))
    testEntry = ['stupid', 'garbage']
    thisDoc = array(setOfWords2Vec(myVocabList, testEntry))
    print(testEntry, 'classified as: ', classifyNB(thisDoc, p0V, p1V, pAb))


def main():
    testingNB()


if __name__ == '__main__':
    main()

I hope this article will help you with Python programming.
