Common Machine Learning Algorithms: Principles + Practice, Series 6 (Naive Bayes Classification)


Naive Bayes (NB)

Naive Bayes is a simple and effective classification algorithm. Bayes' theorem is expressed by the following conditional probability formula:

P(A|B) = P(B|A) * P(A) / P(B)

Here P(A|B) is the probability that A occurs given that B has occurred, and P(A) and P(B) are the probabilities that A and B occur at all; in practice these are estimated from the input samples. Bayesian classification is widely used in scenarios such as e-mail classification and text categorization. Take e-mail classification: it can be understood very simply. If an e-mail is represented by a combination of words W = (w1, w2, ..., wn), what is the probability that this message is spam? We are really asking for P(spam|W), and by the formula, P(spam|W) = P(W|spam) * P(spam) / P(W). The three probabilities on the right-hand side can all be obtained from the input samples during training. (Often we only need to compare the probabilities of the different classes, so P(W) need not be computed: for a given input it is constant and the same for every class, so it does not affect the comparison.)
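As a quick illustration of the formula (not from the source; all numbers below are made up), here is a minimal Python sketch of Bayes' theorem applied to a single word:

# Hypothetical numbers for illustration only.
p_spam = 0.3                 # P(spam): 30% of training messages are spam
p_w_spam = 0.5               # P("offer" | spam)
p_w_normal = 0.05            # P("offer" | normal mail)

# P("offer") by the law of total probability.
p_w = p_w_spam * p_spam + p_w_normal * (1 - p_spam)

# Bayes' theorem: P(spam | "offer") = P("offer" | spam) * P(spam) / P("offer")
p_spam_w = p_w_spam * p_spam / p_w
print(p_spam_w)              # ~0.81: seeing the word makes spam much more likely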

The following Python examples illustrate the entire Bayesian classification process:

1. Use vectors to represent a message or a text

Assuming there are n words in the vocabulary, we can represent a message as a 1 x n vector, where each entry indicates whether the corresponding word appears in the message: 1 means it appears, 0 means it does not.

First create the vocabulary, represented as a set, as sketched below:
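The source omits the code here; the following is a minimal sketch (the function name createVocabList and the input format are assumptions):

def createVocabList(dataSet):
    """Build the vocabulary: the set of unique words across all documents.

    dataSet -- a list of tokenized messages, e.g. [['buy', 'now'], ...]
    """
    vocabSet = set()
    for document in dataSet:
        vocabSet = vocabSet | set(document)  # set union collects unique words
    return list(vocabSet)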

Then represent a message with a vector, such as [0, 1, 0, 0, 1, 1, ...]:
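Again the source code is missing; a minimal sketch, assuming the hypothetical name setOfWords2Vec and the vocabulary list from above:

def setOfWords2Vec(vocabList, inputSet):
    """Convert a tokenized message into a 1 x n presence/absence vector."""
    returnVec = [0] * len(vocabList)
    for word in inputSet:
        if word in vocabList:
            # 1 means the word appears; 0 means it does not.
            # (Using += 1 here instead would give the word-count variant
            # mentioned under the improvements at the end.)
            returnVec[vocabList.index(word)] = 1
    return returnVec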

2. Find the three probability values during the training phase
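The training code is not shown in the source; below is a minimal sketch in the spirit of the text (names such as trainNB0 are assumptions). It estimates P(wi|spam), P(wi|normal mail) and the prior P(spam), with Laplace smoothing added so a word unseen in one class does not force the whole product to zero:

import numpy as np

def trainNB0(trainMatrix, trainCategory):
    """Training phase: estimate the three probability values.

    trainMatrix   -- list of presence/absence vectors, one per message
    trainCategory -- list of labels: 1 = spam, 0 = normal mail
    """
    numTrainDocs = len(trainMatrix)
    numWords = len(trainMatrix[0])
    pSpam = sum(trainCategory) / float(numTrainDocs)  # prior P(spam)
    # Laplace smoothing: start counts at 1 and denominators at 2.
    p0Num = np.ones(numWords)
    p1Num = np.ones(numWords)
    p0Denom = 2.0
    p1Denom = 2.0
    for i in range(numTrainDocs):
        if trainCategory[i] == 1:
            p1Num += trainMatrix[i]
            p1Denom += sum(trainMatrix[i])
        else:
            p0Num += trainMatrix[i]
            p0Denom += sum(trainMatrix[i])
    # Return log-probabilities; the classification step below sums them.
    p1Vect = np.log(p1Num / p1Denom)  # ln P(wi | spam)
    p0Vect = np.log(p0Num / p0Denom)  # ln P(wi | normal mail)
    return p0Vect, p1Vect, pSpam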

3. Use the probability values returned by the training phase to classify

Naive Bayes assumes that the features (i.e., the individual words w1, w2, ...) are independent of one another, so P(W|spam) is equivalent to:

P(w1|spam) * P(w2|spam) * ... Suppose a two-class problem: given W, compute P(spam|W) and P(normal mail|W); whichever value is larger determines the classification. In practical applications, to guarantee precision, the rule is applied more flexibly; for example, a message might be judged spam only when p > 0.99.

Our decision rule is then:

P(spam|W) > P(normal mail|W) ? spam : normal mail

That is, we compare the following two values:

P(W|spam) * P(spam) / P(W)

P(W|normal mail) * P(normal mail) / P(W)

Since the denominator P(W) is the same in both expressions, we can drop it. Taking the natural logarithm of what remains (which turns the long product into a sum and avoids numerical underflow), we compare the following two values:

ln(P(W|spam) * P(spam)) = ln(P(W|spam)) + ln(P(spam)) = ln(P(w1|spam)) + ... + ln(P(wn|spam)) + ln(P(spam))

ln(P(W|normal mail) * P(normal mail)) = ln(P(W|normal mail)) + ln(P(normal mail)) = ln(P(w1|normal mail)) + ... + ln(P(wn|normal mail)) + ln(P(normal mail))
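Putting step 3 together, a minimal classification sketch (using the output of the hypothetical trainNB0 above):

import numpy as np

def classifyNB(vec2Classify, p0Vect, p1Vect, pSpam):
    """Compare ln(P(W|spam)) + ln(P(spam)) with the normal-mail value."""
    vec = np.asarray(vec2Classify)
    # Multiplying by the 0/1 vector keeps only the log-probabilities of
    # the words that actually appear; the sum is the ln-product above.
    p1 = np.sum(vec * p1Vect) + np.log(pSpam)
    p0 = np.sum(vec * p0Vect) + np.log(1.0 - pSpam)
    return 1 if p1 > p0 else 0  # 1 = spam, 0 = normal mail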

Some possible improvements:

    1. The example above represents a message only by the presence (1) or absence (0) of each word, which in fact limits accuracy. The number of occurrences of each word can be used instead, or even the word's TF-IDF value.
    2. Above we solve for P(c|W), the probability that a message belongs to class c given the word combination W. When a message contains many words, accuracy becomes a problem. We can decompose this into a combined probability: compute P(c|wi) for each word individually, then take the top-n of those probabilities (for example n=10 or n=15) and combine them with the formula below (see the sketch after the formula):

p = p1*p2*p3*...*pn / (p1*p2*p3*...*pn + (1-p1)*(1-p2)*(1-p3)*...*(1-pn)), where p1 ... pn are the chosen top-n probabilities.
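A minimal sketch of this combination (the function name is hypothetical; the source's "top-n" is read here as the n largest values, though a common variant takes the n values farthest from the uninformative 0.5):

def combineTopN(wordProbs, n=10):
    """Combine the top-n per-word probabilities P(c|wi) into one score.

    wordProbs -- per-word probabilities P(spam|wi), one value per word
    n         -- how many values to keep, e.g. n=10 or n=15
    """
    top = sorted(wordProbs, reverse=True)[:n]  # the n largest probabilities
    prod = 1.0
    prodComplement = 1.0
    for p in top:
        prod *= p
        prodComplement *= 1.0 - p
    # p = p1*...*pn / (p1*...*pn + (1-p1)*...*(1-pn))
    return prod / (prod + prodComplement)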

