Naive Bayesian classification algorithm


One of my previous exams included a manual calculation of naive Bayesian classification. I did not answer it correctly at the time; I understood it later, but then almost forgot it again. So I am writing up an example here to walk through it.
First, let's derive Bayes' formula.
Suppose we observe that two events have both occurred, and write this as P(AB). We can think of it either as event A occurring first and event B occurring on that basis, or as event B occurring first and event A occurring on that basis. The probability that both events occur can therefore be written as P(AB) = P(A|B) * P(B) and P(BA) = P(B|A) * P(A), where P(A|B) and P(B|A) are conditional probabilities: the probability of A given that B has occurred, and the probability of B given that A has occurred. Clearly P(AB) = P(BA), so combining the two expressions gives P(A|B) * P(B) = P(B|A) * P(A). Rearranging yields P(A|B) = P(B|A) * P(A) / P(B), or equivalently P(B|A) = P(A|B) * P(B) / P(A). To keep the formula from looking abstract, let's work through an example.
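
As a quick sanity check, here is a minimal sketch that verifies the identity P(A|B) = P(B|A) * P(A) / P(B) on a small joint distribution; the four counts are made up purely for illustration:

```python
# Verify Bayes' rule on a tiny, made-up joint distribution.
# The four numbers below are hypothetical counts of how often
# events A and B did or did not co-occur in 100 trials.
n_ab, n_a_not_b, n_not_a_b, n_not_ab = 30, 20, 10, 40
total = n_ab + n_a_not_b + n_not_a_b + n_not_ab

p_a = (n_ab + n_a_not_b) / total          # P(A)  = 0.5
p_b = (n_ab + n_not_a_b) / total          # P(B)  = 0.4
p_a_given_b = n_ab / (n_ab + n_not_a_b)   # P(A|B), counted directly
p_b_given_a = n_ab / (n_ab + n_a_not_b)   # P(B|A), counted directly

# Bayes' rule recovers P(A|B) from the other three quantities.
assert abs(p_a_given_b - p_b_given_a * p_a / p_b) < 1e-12
print(p_a_given_b)  # 0.75
```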

Look again at P(B|A) = P(A|B) * P(B) / P(A). Setting aside P(A) and P(B), one side contains P(B|A) and the other contains P(A|B): if we know P(A), P(B), and one of the two conditional probabilities, we can compute the other. This is the theoretical basis of the naive Bayesian algorithm: predicting the class of new data from probabilities estimated on existing data. For example, consider the following table:




                   Observation   Category (assume only two classes)
Existing data 1    000           C1
Existing data 2    001           C1
Existing data 3    010           C1
Existing data 4    011           C2
Existing data 5    100           C1
Existing data 6    101           C1
Existing data 7    110           C2
...                ...           ...
New data           111           ?

For the existing data, P(observation | category) is known: the probability of each observation under each category can be counted directly. For the new data, we calculate P(category | observation), the probability of each category given the observation, and select the category with the maximum probability as the "prediction" result.
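To make the counting concrete, here is a minimal sketch (the data layout and names are my own, not from the original post) that computes the class priors and per-class observation probabilities from the table above:

```python
from collections import Counter

# The seven existing rows from the table: (observation, category).
data = [
    ("000", "C1"), ("001", "C1"), ("010", "C1"), ("011", "C2"),
    ("100", "C1"), ("101", "C1"), ("110", "C2"),
]

class_counts = Counter(cat for _, cat in data)
total = len(data)

# Priors: P(C1) = 5/7, P(C2) = 2/7.
priors = {cat: n / total for cat, n in class_counts.items()}

# Conditionals: P(observation | category), counted per class.
obs_given_class = {
    (obs, cat): sum(1 for o, c in data if o == obs and c == cat) / class_counts[cat]
    for obs, cat in data
}

print(priors)                          # {'C1': 0.714..., 'C2': 0.285...}
print(obs_given_class[("000", "C1")])  # 0.2, i.e. 1/5
```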

Now back to the formula: are P(A) and P(B) knowable? Of course they are, because the amount of data we have is finite; the probability of a given observation, or of a given class, is a simple division of counts. In fact we do not even need to calculate the denominator: we only want to compare which class has the greater probability, and since the denominator is the same for every class, comparing the numerators is enough.
From the 7 known rows in the table above: P(C1) = 5/7 and P(C2) = 2/7.
P(000|C1) = 1/5, P(001|C1) = 1/5, P(010|C1) = 1/5, P(011|C2) = 1/2, P(100|C1) = 1/5, P(101|C1) = 1/5, P(110|C2) = 1/2; all of these can be counted directly. Ideally, we would calculate P(C1|111) and P(C2|111) separately and compare them. P(C1|111) = P(111|C1) * P(C1) / P(111) and P(C2|111) = P(111|C2) * P(C2) / P(111); the denominators are the same, so we ignore them and only calculate and compare P(111|C1) * P(C1) against P(111|C2) * P(C2). We have P(C1) and P(C2), but P(111|C1) and P(111|C2) are both 0: note that 111 is a "pure" new observation that has never been seen before. This is a very common situation, yet 0 times any number is 0, and there is no way to compare 0 against 0. So in this case a method called Laplace smoothing is used: assume that each observation appears at least once under each category in the known data. It looks a bit crude, but when the amount of data is large it has almost no effect on the correctness of the result. With the smoothed counts, P(111|C1) = (0 + 1) / (5 + 1) = 1/6 and P(111|C2) = (0 + 1) / (2 + 1) = 1/3, so P(C1|111) ∝ P(111|C1) * P(C1) = 1/6 * 5/7 ≈ 0.12 and P(C2|111) ∝ P(111|C2) * P(C2) = 1/3 * 2/7 ≈ 0.09. Since 0.12 > 0.09, we infer that 111 belongs to class C1. (I actually changed the data several times while writing this, because with so little data even the +1 from smoothing can change the result.) So this completely new observation 111 is predicted to be class C1. Does that make sense? Only somewhat: after all, most (5/7) of the observed data are class C1, so a new observation with no telltale "signs" can only be assigned to the most likely category.
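Here is a minimal sketch of that calculation. The helper name is my own, and the smoothing is applied only when an observation/class pair was never seen, which matches the post's arithmetic rather than textbook add-one smoothing of every count:

```python
data = [
    ("000", "C1"), ("001", "C1"), ("010", "C1"), ("011", "C2"),
    ("100", "C1"), ("101", "C1"), ("110", "C2"),
]

def smoothed_likelihood(obs, cat):
    """P(obs | cat), pretending every unseen observation occurred once."""
    n_cat = sum(1 for _, c in data if c == cat)
    n_obs = sum(1 for o, c in data if o == obs and c == cat)
    if n_obs == 0:
        # The post's smoothing: one phantom occurrence of the unseen
        # observation, which also grows the class total by one.
        return 1 / (n_cat + 1)
    return n_obs / n_cat

for cat, prior in (("C1", 5 / 7), ("C2", 2 / 7)):
    score = smoothed_likelihood("111", cat) * prior
    print(cat, round(score, 3))  # C1 0.119, C2 0.095 -> predict C1
```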
In practice, we usually have more than one observation item; that is to say, the table mostly looks like this:




                   Observation 1   Observation 2   Observation 3   Category
Existing data 1    000             0               M               C1
Existing data 2    001             -               A               C1
Existing data 3    010             0               A               C1
Existing data 4    011             0               M               C2
Existing data 5    100             0               A               C1
Existing data 6    101             0               A               C1
Existing data 7    110             -               M               C2
...                ...             ...             ...             ...
New data           111             0               A               ?

As before, P(observation | category) can be counted from the existing data, and for the new data we calculate P(category | observation) for each category, selecting the category with the maximum probability as the "prediction" result.

Computing P(C1|111,0,A) and P(C2|111,0,A) requires P(111,0,A|C1) and P(111,0,A|C2), and here we need the precondition for applying the naive Bayesian algorithm: the observation items are assumed to be independent of each other. Because if they are independent, the joint probability can be split like this: P(111,0,A|C1) = P(111|C1) * P(0|C1) * P(A|C1), which is easier to calculate and much less likely to be 0.
P(C1|111,0,A) ∝ P(111,0,A|C1) * P(C1) = P(111|C1) * P(0|C1) * P(A|C1) * P(C1) = 1/6 * 4/5 * 4/5 * 5/7 ≈ 0.076
P(C2|111,0,A) ∝ P(111,0,A|C2) * P(C2) = P(111|C2) * P(0|C2) * P(A|C2) * P(C2) = 1/3 * 1/2 * 1/3 * 2/7 ≈ 0.016
(Here 1/6 and 1/3 are the smoothed values for the unseen observation 111, as before, and P(A|C2) is likewise smoothed from 0 to 1/3 because neither C2 row contains an A.)
The probability for class C1 is much larger than for class C2, so naturally the prediction is C1. You can also feel this with the naked eye: although 111 never appeared in observation 1, most of the 0s and most of the As are in class C1, so the possibility that this new data belongs to class C1 is also very large. This is how three mutually independent observations work together on the result.
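Putting it all together, here is a minimal end-to-end sketch of the multi-feature case (the structure and names are my own; it reuses the post's per-feature smoothing, so it reproduces the 0.076 and 0.016 above):

```python
# Rows of the second table: (obs1, obs2, obs3, category).
data = [
    ("000", "0", "M", "C1"), ("001", "-", "A", "C1"),
    ("010", "0", "A", "C1"), ("011", "0", "M", "C2"),
    ("100", "0", "A", "C1"), ("101", "0", "A", "C1"),
    ("110", "-", "M", "C2"),
]
classes = ("C1", "C2")

def likelihood(value, feature_idx, cat):
    """P(value | cat) for one feature, smoothing unseen pairs to 1/(n+1)."""
    rows = [r for r in data if r[-1] == cat]
    hits = sum(1 for r in rows if r[feature_idx] == value)
    return hits / len(rows) if hits else 1 / (len(rows) + 1)

def predict(new_row):
    scores = {}
    for cat in classes:
        prior = sum(1 for r in data if r[-1] == cat) / len(data)
        score = prior
        # Naive independence: multiply the per-feature likelihoods.
        for i, value in enumerate(new_row):
            score *= likelihood(value, i, cat)
        scores[cat] = score
    return scores

print(predict(("111", "0", "A")))
# {'C1': 0.076..., 'C2': 0.015...} -> predict C1
```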
What is the actual category? Nobody knows. In fact, when I designed the first table, the rule I had in mind was: a number containing two consecutive 1s is class C2, otherwise class C1 (under that rule, 111 would actually be C2). As you can see, it is hard for the algorithm to guess the underlying rule, especially when the amount of training data is small. All we can do is try several algorithms on the existing data, experiment with the parameters, pick the one that evaluates to the highest accuracy, and then apply it to future data. A sketch of that evaluation loop follows below.
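Here is one hedged way to run such an evaluation on this tiny dataset: leave-one-out accuracy for the smoothed naive Bayes described above (self-contained, with my own helper names; with only 7 rows the score is not very meaningful, it only shows the mechanics):

```python
# Leave-one-out evaluation: hold out each row in turn, train on the
# rest, and check whether the held-out row is classified correctly.
data = [
    ("000", "0", "M", "C1"), ("001", "-", "A", "C1"),
    ("010", "0", "A", "C1"), ("011", "0", "M", "C2"),
    ("100", "0", "A", "C1"), ("101", "0", "A", "C1"),
    ("110", "-", "M", "C2"),
]

def predict(train, features):
    best_cat, best_score = None, -1.0
    for cat in ("C1", "C2"):
        rows = [r for r in train if r[-1] == cat]
        score = len(rows) / len(train)  # prior P(cat)
        for i, v in enumerate(features):
            hits = sum(1 for r in rows if r[i] == v)
            # Same smoothing as before for unseen feature/class pairs.
            score *= hits / len(rows) if hits else 1 / (len(rows) + 1)
        if score > best_score:
            best_cat, best_score = cat, score
    return best_cat

correct = sum(
    predict(data[:i] + data[i + 1:], row[:-1]) == row[-1]
    for i, row in enumerate(data)
)
print(f"leave-one-out accuracy: {correct}/{len(data)}")
```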


This article is from the "Empty" blog. Please do not reprint.
