Machine Learning Theory and Practice (3) Naive Bayes


Bayesian decision-making has long been controversial. This year marks the 250th anniversary of Bayes' theorem, and after many ups and downs its applications are becoming increasingly active. If you are interested, take a look at the reflections of Dr. Brad Efron of Stanford in two articles: "Bayes' Theorem in the 21st Century" and "A 250-Year Argument: Belief, Behavior, and the Bootstrap". Now let's look at the naive Bayes classifier.

Sometimes we want to know the probability that a sample belongs to each category, that is, P(Ci | X), where Ci denotes a category and X denotes the test sample. With these probabilities we can simply pick the category with the highest one. Computing this probability requires the classic Bayes formula, shown in (Formula 1):

(Formula 1) P(Ci | X) = P(X | Ci) P(Ci) / P(X)

Each term on the right of (Formula 1) can be computed. For example, the two buckets in (Figure 1) contain black and gray balls:

(Figure 1)

Assume that bucket A and bucket B are the classes C1 and C2. Given a ball, we want to determine which bucket it most likely came from, in other words, which class it belongs to. This can be computed from (Formula 1), since each term on the right-hand side can be evaluated, for example P(gray | bucket A) = 2/4 and P(gray | bucket B) = 1/3. The stricter calculation goes as follows:

P(gray | bucket B) = P(gray and bucket B) / P(bucket B)

P(gray and bucket B) = 1/7, P(bucket B) = 3/7

So P(gray | bucket B) = P(gray and bucket B) / P(bucket B) = (1/7) / (3/7) = 1/3
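To make the bucket example concrete, here is a minimal sketch in Python (the counts are read off the description above: bucket A holds 2 gray and 2 black balls, bucket B holds 1 gray and 2 black balls, 7 balls in total; the function name is just for illustration):

# Which bucket (class) most likely produced a gray ball?
def bucket_posteriors():
    counts = {'A': {'gray': 2, 'black': 2},    # bucket A
              'B': {'gray': 1, 'black': 2}}    # bucket B
    total = sum(sum(c.values()) for c in counts.values())    # 7 balls overall
    scores = {}
    for bucket, c in counts.items():
        prior = sum(c.values()) / total              # P(bucket)
        likelihood = c['gray'] / sum(c.values())     # P(gray | bucket)
        scores[bucket] = likelihood * prior          # proportional to P(bucket | gray)
    return scores

print(bucket_posteriors())    # {'A': 0.2857..., 'B': 0.1428...} -> bucket A is more likely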

This is the principle of naive Bayes: based on the posterior probabilities, pick the class Ci with the maximum P(Ci | X) as the class of X. Naive Bayes is called "naive" because it assumes that the features are mutually independent, as shown in (Figure 2):

(Figure 2)
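To make the "naive" assumption explicit, here is the factorization it implies, written out in LaTeX notation (W = (w1, ..., wn) is the feature vector, matching the formulas below):

P(W \mid c_i) = P(w_1, w_2, \ldots, w_n \mid c_i) \approx \prod_{k=1}^{n} P(w_k \mid c_i)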

Although this assumption is not strictly true, it is still very effective in practice, for example in text classification. Let's look at an actual text-classification task: determining whether a chat message is abusive (that is, whether its class is abusive or not). Before that, note that the feature vector of naive Bayes can be multi-dimensional; the formula above is one-dimensional, and the multi-dimensional case shown in (Formula 2) is computed in the same way:

(Formula 2) P(Ci | W) = P(W | Ci) P(Ci) / P(W), where W = (w1, w2, ..., wn) is the feature vector

For text classification, the first task is to convert the text into a numerical vector, that is, to extract features. A feature can be, for example, the number of times a keyword appears in a document (bag of words); the words "company" and "reward" often appear in spam. Features are diverse and can be designed as needed. In this example the token approach is used: a token can be any combination of characters, such as a URL, a word, or an IP address, although for judging whether a message is abusive the tokens are mostly ordinary words. The code below walks through the process (a possible tokenizer is sketched right after this paragraph).
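The original post does not show its tokenizer, so here is a minimal sketch of what one might look like (the helper name textParse and the two-character cutoff are assumptions for illustration):

import re

def textParse(bigString):
    # split on any run of non-alphanumeric characters and keep
    # lower-cased tokens longer than two characters
    listOfTokens = re.split(r'\W+', bigString)
    return [tok.lower() for tok in listOfTokens if len(tok) > 2]

print(textParse('My dog has FLEA problems, help please!'))
# ['dog', 'has', 'flea', 'problems', 'help', 'please']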

First, we obtain some training sets:

from numpy import *

def loadDataSet():
    postingList = [['my', 'dog', 'has', 'flea', 'problems', 'help', 'please'],
                   ['maybe', 'not', 'take', 'him', 'to', 'dog', 'park', 'stupid'],
                   ['my', 'dalmation', 'is', 'so', 'cute', 'I', 'love', 'him'],
                   ['stop', 'posting', 'stupid', 'worthless', 'garbage'],
                   ['mr', 'licks', 'ate', 'my', 'steak', 'how', 'to', 'stop', 'him'],
                   ['quit', 'buying', 'worthless', 'dog', 'food', 'stupid']]
    classVec = [0, 1, 0, 1, 0, 1]    # 1 is abusive, 0 not
    return postingList, classVec

The training set consists of six sentences taken from a chat room. Each sentence has a label of 0 or 1 indicating whether it is an abusive message. Of course, each message could just as well be a whole document with many more words; the procedure is the same. Next, we process the training set to find out how many distinct (unique) words it contains. The code is as follows:

def createVocabList(dataSet):
    vocabSet = set([])                       # create empty set
    for document in dataSet:
        vocabSet = vocabSet | set(document)  # union of the two sets
    return list(vocabSet)

This function returns a vocabulary composed of the unique words. The next step is the key feature-processing step; again, the code first:

def setOfWords2Vec(vocabList, inputSet):
    returnVec = [0] * len(vocabList)
    for word in inputSet:
        if word in vocabList:
            returnVec[vocabList.index(word)] = 1
        else:
            print("the word: %s is not in my Vocabulary!" % word)
    return returnVec

This function takes the vocabulary and a message, walks through the vocabulary, and checks whether each vocabulary word appears in the message: if it does, the corresponding position is marked 1, otherwise 0. In this way each message is converted into a 0/1 feature vector of the same length as the vocabulary, as shown in (Figure 3):

(Figure 3)
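As a quick usage sketch (assuming the functions above have been defined in the same session), we can build the vocabulary and convert the first post into its 0/1 feature vector:

listOPosts, listClasses = loadDataSet()
myVocabList = createVocabList(listOPosts)
print(myVocabList)                                  # the unique words; set order is arbitrary
print(setOfWords2Vec(myVocabList, listOPosts[0]))   # 0/1 vector, same length as the vocabulary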

With the feature vectors we can train the naive Bayes classifier, which in fact means computing the three probabilities on the right-hand side of (Formula 3):

(Formula 3) P(Ci | W) = P(W | Ci) P(Ci) / P(W)

W is the feature vector.

The Code is as follows:

def trainNB0(trainMatrix, trainCategory):
    numTrainDocs = len(trainMatrix)
    numWords = len(trainMatrix[0])
    pAbusive = sum(trainCategory) / float(numTrainDocs)
    p0Num = ones(numWords); p1Num = ones(numWords)    # change to ones()
    p0Denom = 2.0; p1Denom = 2.0                      # change to 2.0
    for i in range(numTrainDocs):
        if trainCategory[i] == 1:
            p1Num += trainMatrix[i]
            p1Denom += sum(trainMatrix[i])
        else:
            p0Num += trainMatrix[i]
            p0Denom += sum(trainMatrix[i])
    p1Vect = log(p1Num / p1Denom)    # change to log()
    p0Vect = log(p0Num / p0Denom)    # change to log()
    return p0Vect, p1Vect, pAbusive

In the code above, the input is a matrix of feature vectors and a vector of labels. pAbusive is the class probability P(Ci); since there are only two classes, the other one is simply 1 - P. Next we initialize the numerators and denominators of P(wi | C1) and P(wi | C0). The one curious point is why the numerators are initialized to 1 and the denominators p0Denom and p1Denom to 2. After computing the word probabilities P(wi | Ci) on the right-hand side of (Formula 3), where wi denotes a single word in the message, we still need the probability that the whole message belongs to a class, which under the independence assumption takes the form P(w0 | 1) P(w1 | 1) P(w2 | 1) ... If any single P(wi | Ci) were 0, the whole product would be 0, so the counts are initialized to 1 and the denominators to 2 (Laplace smoothing) to avoid this. In addition, a product of many small probabilities can underflow and round off to 0, which would ruin the comparison, so the calculation is moved into log space. Logarithms are used often in machine learning: they avoid numerical problems while preserving monotonicity, and they turn multiplication into addition, which also speeds up the computation. Whether the accumulation starts from 0 or 1 in log space does not affect the final comparison between the two classes.
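As a short sketch of the training step on the toy data set above (the variable names here are only for illustration), note that the returned vectors hold log-probabilities, so their entries are negative:

listOPosts, listClasses = loadDataSet()
myVocabList = createVocabList(listOPosts)
trainMat = [setOfWords2Vec(myVocabList, post) for post in listOPosts]
p0V, p1V, pAb = trainNB0(array(trainMat), array(listClasses))
print(pAb)    # 0.5 -- three of the six posts are labeled abusive
print(p1V)    # log P(word | abusive) for every word in the vocabulary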

Finally, the classification code:

def classifyNB(vec2Classify, p0Vec, p1Vec, pClass1):
    p1 = sum(vec2Classify * p1Vec) + log(pClass1)    # element-wise mult
    p0 = sum(vec2Classify * p0Vec) + log(1.0 - pClass1)
    if p1 > p0:
        return 1
    else:
        return 0

The classification code also computes the posterior (up to the shared denominator P(W)) in log space, and the message is assigned to whichever class scores higher.
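Putting the pieces together, here is an end-to-end sketch (the two test sentences are made up for illustration; any words drawn from the training vocabulary would do):

def testingNB():
    listOPosts, listClasses = loadDataSet()
    myVocabList = createVocabList(listOPosts)
    trainMat = [setOfWords2Vec(myVocabList, post) for post in listOPosts]
    p0V, p1V, pAb = trainNB0(array(trainMat), array(listClasses))
    for testEntry in [['love', 'my', 'dalmation'], ['stupid', 'garbage']]:
        thisDoc = array(setOfWords2Vec(myVocabList, testEntry))
        print(testEntry, 'classified as:', classifyNB(thisDoc, p0V, p1V, pAb))

testingNB()
# ['love', 'my', 'dalmation'] classified as: 0
# ['stupid', 'garbage'] classified as: 1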

Summary:

Advantage: effective even with small amounts of data, and handles multiple classes.

Disadvantage: sensitive to how the input data is prepared.

(In the taxonomy of probabilistic graphical models, naive Bayes is classified as a generative model.)

References:

[1] Peter Harrington. Machine Learning in Action.

[2] Daphne Koller. Probabilistic Graphical Models.

When reprinting, please indicate the source: http://blog.csdn.net/cuoqu/article/details/9262445
