The Naive Bayes algorithm follows the idea of generative learning algorithms from the previous article. Unlike linear regression, it does not fit a hypothesis to the data directly; it only computes the probability of each hypothesis and then predicts the class belonging to the most probable one. It also adds the Naive Bayes assumption: the attribute values $x_j$ are conditionally independent of each other given the target value $y$. A classifier built on this assumption is called a Naive Bayes classifier.
1. Naive Bayes algorithm
In the Naive Bayes model, given a training set $\{(x^{(i)}, y^{(i)});\ i = 1, \ldots, m\}$, we can estimate $p(y)$ and $p(x_j \mid y)$. Because of the Naive Bayes assumption, the joint likelihood function can be computed:

$$\mathcal{L}(\varphi_y, \varphi_{j|y=0}, \varphi_{j|y=1}) = \prod_{i=1}^{m} p\big(x^{(i)}, y^{(i)}\big)$$
Maximizing the joint likelihood function gives the estimates:

$$\varphi_{j|y=1} = \frac{\sum_{i=1}^{m} 1\{x_j^{(i)} = 1 \wedge y^{(i)} = 1\}}{\sum_{i=1}^{m} 1\{y^{(i)} = 1\}}, \qquad
\varphi_{j|y=0} = \frac{\sum_{i=1}^{m} 1\{x_j^{(i)} = 1 \wedge y^{(i)} = 0\}}{\sum_{i=1}^{m} 1\{y^{(i)} = 0\}}, \qquad
\varphi_y = \frac{\sum_{i=1}^{m} 1\{y^{(i)} = 1\}}{m}$$
Then we can make predictions on new data. The prediction formula is:

$$p(y = 1 \mid x) = \frac{p(x \mid y = 1)\, p(y = 1)}{p(x \mid y = 1)\, p(y = 1) + p(x \mid y = 0)\, p(y = 0)}, \qquad p(x \mid y) = \prod_{j=1}^{n} p(x_j \mid y)$$
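To make the estimates and the prediction formula concrete, here is a minimal NumPy sketch of this (Bernoulli) Naive Bayes classifier; the function names and toy data are illustrative assumptions, not from the lecture notes:

```python
import numpy as np

def train_naive_bayes(X, y):
    """Maximum likelihood estimates for Bernoulli Naive Bayes.
    X: (m, n) binary feature matrix; y: (m,) binary labels."""
    phi_y = np.mean(y)                      # p(y = 1)
    phi_j_y1 = X[y == 1].mean(axis=0)       # p(x_j = 1 | y = 1)
    phi_j_y0 = X[y == 0].mean(axis=0)       # p(x_j = 1 | y = 0)
    return phi_y, phi_j_y1, phi_j_y0

def predict(x, phi_y, phi_j_y1, phi_j_y0):
    """Posterior p(y = 1 | x) under the independence assumption."""
    # p(x | y) = prod_j p(x_j | y), with each x_j in {0, 1}
    px_y1 = np.prod(np.where(x == 1, phi_j_y1, 1 - phi_j_y1))
    px_y0 = np.prod(np.where(x == 1, phi_j_y0, 1 - phi_j_y0))
    num = px_y1 * phi_y
    return num / (num + px_y0 * (1 - phi_y))

# Toy data: 4 messages, 3 dictionary words; label 1 = spam
X = np.array([[1, 1, 0], [1, 0, 0], [0, 1, 1], [0, 0, 1]])
y = np.array([1, 1, 0, 0])
params = train_naive_bayes(X, y)
print(predict(np.array([1, 1, 0]), *params))  # 1.0 => classified as spam
```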
If x takes only two values, p(x|y) follows a Bernoulli distribution. If x takes multiple discrete values, p(x|y) follows a multinomial distribution. When x is continuous, its range can be discretized into intervals, and each interval is then treated as one discrete value.
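For the continuous case, a brief sketch of the discretization step (the feature and bin edges below are arbitrary, for illustration only):

```python
import numpy as np

# A continuous feature (e.g. living area); bin edges are illustrative
living_area = np.array([890, 1250, 1600, 2100, 3050])
bins = np.array([1000, 1500, 2000, 2500])  # 5 resulting intervals

# np.digitize maps each value to the index of the interval it falls in,
# turning the continuous feature into a discrete one for Naive Bayes
x_discrete = np.digitize(living_area, bins)
print(x_discrete)  # [0 1 2 3 4]
```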
2. Laplace smoothing
Given a training set of size $m$, suppose $z$ takes one of $k$ values $\{1, \ldots, k\}$, with $\varphi_i = p(z = i)$. Without Laplace smoothing, the maximum likelihood estimate is:

$$\varphi_i = \frac{\sum_{j=1}^{m} 1\{z^{(j)} = i\}}{m}$$
So when a certain feature value x never appears in the training set, its estimated probability is zero under both classes, and the posterior degenerates to:

$$p(y = 1 \mid x) = \frac{0}{0 + 0}$$

which is undefined.
To avoid this situation, we use Laplace smoothing: add 1 to each count in the numerator and $k$ to the denominator, giving:

$$\varphi_i = \frac{\sum_{j=1}^{m} 1\{z^{(j)} = i\} + 1}{m + k}$$

Applied to the Naive Bayes estimates (each $x_j \in \{0, 1\}$, so $k = 2$):

$$\varphi_{j|y=1} = \frac{\sum_{i=1}^{m} 1\{x_j^{(i)} = 1 \wedge y^{(i)} = 1\} + 1}{\sum_{i=1}^{m} 1\{y^{(i)} = 1\} + 2}, \qquad
\varphi_{j|y=0} = \frac{\sum_{i=1}^{m} 1\{x_j^{(i)} = 1 \wedge y^{(i)} = 0\} + 1}{\sum_{i=1}^{m} 1\{y^{(i)} = 0\} + 2}$$
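The earlier training sketch only needs its counts adjusted; a minimal smoothed version, under the same assumed setup as before:

```python
import numpy as np

def train_naive_bayes_smoothed(X, y):
    """Bernoulli Naive Bayes estimates with Laplace smoothing (k = 2)."""
    phi_y = np.mean(y)
    # Add 1 to each word count and 2 to each class count, so no
    # estimated probability can be exactly 0
    phi_j_y1 = (X[y == 1].sum(axis=0) + 1) / ((y == 1).sum() + 2)
    phi_j_y0 = (X[y == 0].sum(axis=0) + 1) / ((y == 0).sum() + 2)
    return phi_y, phi_j_y1, phi_j_y0
```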
3. Comparison of Naive Bayes and Multinomial event model
In spam classification, we first build a spam dictionary index, and then decide whether a message is spam based on which dictionary words appear in it. Naive Bayes only needs to model, for each training message, whether each word in the spam dictionary index appears in the text in order to compute the probability of spam; the multinomial event model additionally considers how many times each word in the spam dictionary index appears in each training message.
For example, suppose an e-mail message reads "a nip ..." and the spam dictionary index is {a, ...., nip, ....} ("a" is the 1st word in the dictionary and "nip" is the 35000th). For Naive Bayes, the message is represented as a binary vector whose 1st element is 1 and whose 35000th element is also 1:

$$x = (1, 0, \ldots, 0, 1, 0, \ldots)^{T}, \qquad x_1 = 1,\ x_{35000} = 1$$
In the multinomial event model, the message is instead represented as $x = (x_1, x_2, \ldots)$ with $x_1 = 1$ and $x_2 = 35000$, meaning the 1st word of the message is "a" and the 2nd word is "nip", the 35000th dictionary word. Now if the 3rd word of the message is also "a", the Naive Bayes representation is unchanged, but in the multinomial event model $x_3 = 1$. This representation counts how many times each word in the spam dictionary index occurs in a message, so its probability estimates are better than those of the Naive Bayes algorithm. Concretely, the multinomial event model's Laplace-smoothed estimate for a dictionary of size $|V|$ is:

$$\varphi_{k|y=1} = \frac{\sum_{i=1}^{m} \sum_{j=1}^{n_i} 1\{x_j^{(i)} = k \wedge y^{(i)} = 1\} + 1}{\sum_{i=1}^{m} 1\{y^{(i)} = 1\}\, n_i + |V|}$$
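To make the contrast concrete, here is a small sketch of both representations and the multinomial estimate; the tiny dictionary and messages are invented for illustration:

```python
import numpy as np

V = 5  # dictionary size (tiny, for illustration)
# Each message is a list of dictionary word indices, in order
messages = [[0, 3, 0], [1, 3]]   # e.g. "a nip a", "buy nip"
labels = np.array([1, 1])        # both spam here

# Naive Bayes (Bernoulli) representation: did word k appear at all?
bernoulli = np.zeros((len(messages), V), dtype=int)
for i, msg in enumerate(messages):
    bernoulli[i, msg] = 1
print(bernoulli)   # [[1 0 0 1 0], [0 1 0 1 0]] -- repeat of word 0 is lost

# Multinomial event model: Laplace-smoothed per-word probabilities
# phi_{k|y=1} = (count of word k in spam messages + 1) / (total spam words + V)
counts = np.zeros(V)
total = 0
for msg, y in zip(messages, labels):
    if y == 1:
        for word in msg:
            counts[word] += 1
        total += len(msg)
phi_k_y1 = (counts + 1) / (total + V)
print(phi_k_y1)    # word 0 ("a") gets extra weight for appearing twice
```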