Naive Bayesian classification algorithm


One of my previous exams included a manual calculation of naive Bayesian classification. I did not answer it correctly at the time; I understood it later, but then almost forgot it again. So I am writing up an example here to walk through it.
First, let's derive Bayes' formula.
Suppose we observe that two events have both occurred, and write this as P(AB). We can think of it either as event A occurring first and event B occurring on that basis, or as event B occurring first and event A occurring on that basis. The probability that both events occur can therefore be written as P(AB) = P(A|B) * P(B) and P(BA) = P(B|A) * P(A), where P(A|B) and P(B|A) are conditional probabilities: the probability of A given that B has occurred, and the probability of B given that A has occurred. Clearly P(AB) = P(BA), so combining the two expressions gives P(A|B) * P(B) = P(B|A) * P(A). Rearranging yields P(A|B) = P(B|A) * P(A) / P(B), or equivalently P(B|A) = P(A|B) * P(B) / P(A). To keep the formula from looking abstract, let's work through an example.
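
As a quick sanity check, here is a minimal sketch that verifies the identity P(A|B) = P(B|A) * P(A) / P(B) on a small joint distribution; the four counts are made up purely for illustration:

```python
# Verify Bayes' rule on a tiny, made-up joint distribution.
# The four numbers below are hypothetical counts of how often
# events A and B did or did not co-occur in 100 trials.
n_ab, n_a_not_b, n_not_a_b, n_not_ab = 30, 20, 10, 40
total = n_ab + n_a_not_b + n_not_a_b + n_not_ab

p_a = (n_ab + n_a_not_b) / total          # P(A)  = 0.5
p_b = (n_ab + n_not_a_b) / total          # P(B)  = 0.4
p_a_given_b = n_ab / (n_ab + n_not_a_b)   # P(A|B), counted directly
p_b_given_a = n_ab / (n_ab + n_a_not_b)   # P(B|A), counted directly

# Bayes' rule recovers P(A|B) from the other three quantities.
assert abs(p_a_given_b - p_b_given_a * p_a / p_b) < 1e-12
print(p_a_given_b)  # 0.75
```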

Look again at P(B|A) = P(A|B) * P(B) / P(A). Setting aside P(A) and P(B), one side contains P(B|A) and the other contains P(A|B): if we know P(A), P(B), and one of the two conditional probabilities, we can compute the other. This is the theoretical basis of the naive Bayesian algorithm: predicting the class of new data from probabilities estimated on existing data. For example, consider the following table:




                   Observation   Category (assume only two classes)
Existing data 1    000           C1
Existing data 2    001           C1
Existing data 3    010           C1
Existing data 4    011           C2
Existing data 5    100           C1
Existing data 6    101           C1
Existing data 7    110           C2
...                ...           ...
New data           111           ?

For the existing data, P(observation | category) is known: the probability of each observation under each category can be counted directly. For the new data, we calculate P(category | observation), the probability of each category given the observation, and select the category with the maximum probability as the "prediction" result.
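To make the counting concrete, here is a minimal sketch (the data layout and names are my own, not from the original post) that computes the class priors and per-class observation probabilities from the table above:

```python
from collections import Counter

# The seven existing rows from the table: (observation, category).
data = [
    ("000", "C1"), ("001", "C1"), ("010", "C1"), ("011", "C2"),
    ("100", "C1"), ("101", "C1"), ("110", "C2"),
]

class_counts = Counter(cat for _, cat in data)
total = len(data)

# Priors: P(C1) = 5/7, P(C2) = 2/7.
priors = {cat: n / total for cat, n in class_counts.items()}

# Conditionals: P(observation | category), counted per class.
obs_given_class = {
    (obs, cat): sum(1 for o, c in data if o == obs and c == cat) / class_counts[cat]
    for obs, cat in data
}

print(priors)                          # {'C1': 0.714..., 'C2': 0.285...}
print(obs_given_class[("000", "C1")])  # 0.2, i.e. 1/5
```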

Now back to the formula: are P(A) and P(B) knowable? Of course they are, because the amount of data we have is finite; the probability of a given observation, or of a given class, is a simple division of counts. In fact we do not even need to calculate the denominator: we only want to compare which class has the greater probability, and since the denominator is the same for every class, comparing the numerators is enough.
From the 7 known rows in the table above: P(C1) = 5/7 and P(C2) = 2/7.
P(000|C1) = 1/5, P(001|C1) = 1/5, P(010|C1) = 1/5, P(011|C2) = 1/2, P(100|C1) = 1/5, P(101|C1) = 1/5, P(110|C2) = 1/2; all of these can be counted directly. Ideally, we would calculate P(C1|111) and P(C2|111) separately and compare them. P(C1|111) = P(111|C1) * P(C1) / P(111) and P(C2|111) = P(111|C2) * P(C2) / P(111); the denominators are the same, so we ignore them and only calculate and compare P(111|C1) * P(C1) against P(111|C2) * P(C2). We have P(C1) and P(C2), but P(111|C1) and P(111|C2) are both 0: note that 111 is a "pure" new observation that has never been seen before. This is a very common situation, yet 0 times any number is 0, and there is no way to compare 0 against 0. So in this case a method called Laplace smoothing is used: assume that each observation appears at least once under each category in the known data. It looks a bit crude, but when the amount of data is large it has almost no effect on the correctness of the result. With the smoothed counts, P(111|C1) = (0 + 1) / (5 + 1) = 1/6 and P(111|C2) = (0 + 1) / (2 + 1) = 1/3, so P(C1|111) ∝ P(111|C1) * P(C1) = 1/6 * 5/7 ≈ 0.12 and P(C2|111) ∝ P(111|C2) * P(C2) = 1/3 * 2/7 ≈ 0.09. Since 0.12 > 0.09, we infer that 111 belongs to class C1. (I actually changed the data several times while writing this, because with so little data even the +1 from smoothing can change the result.) So this completely new observation 111 is predicted to be class C1. Does that make sense? Only somewhat: after all, most (5/7) of the observed data are class C1, so a new observation with no telltale "signs" can only be assigned to the most likely category.
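Here is a minimal sketch of that calculation. The helper name is my own, and the smoothing is applied only when an observation/class pair was never seen, which matches the post's arithmetic rather than textbook add-one smoothing of every count:

```python
data = [
    ("000", "C1"), ("001", "C1"), ("010", "C1"), ("011", "C2"),
    ("100", "C1"), ("101", "C1"), ("110", "C2"),
]

def smoothed_likelihood(obs, cat):
    """P(obs | cat), pretending every unseen observation occurred once."""
    n_cat = sum(1 for _, c in data if c == cat)
    n_obs = sum(1 for o, c in data if o == obs and c == cat)
    if n_obs == 0:
        # The post's smoothing: one phantom occurrence of the unseen
        # observation, which also grows the class total by one.
        return 1 / (n_cat + 1)
    return n_obs / n_cat

for cat, prior in (("C1", 5 / 7), ("C2", 2 / 7)):
    score = smoothed_likelihood("111", cat) * prior
    print(cat, round(score, 3))  # C1 0.119, C2 0.095 -> predict C1
```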
In practice, we usually have more than one observation item; that is to say, the table mostly looks like this:




                   Observation 1   Observation 2   Observation 3   Category
Existing data 1    000             0               M               C1
Existing data 2    001             -               A               C1
Existing data 3    010             0               A               C1
Existing data 4    011             0               M               C2
Existing data 5    100             0               A               C1
Existing data 6    101             0               A               C1
Existing data 7    110             -               M               C2
...                ...             ...             ...             ...
New data           111             0               A               ?

As before, P(observation | category) can be counted from the existing data, and for the new data we calculate P(category | observation) for each category, selecting the category with the maximum probability as the "prediction" result.

Computing P(C1|111,0,A) and P(C2|111,0,A) requires P(111,0,A|C1) and P(111,0,A|C2), and here we need the precondition for applying the naive Bayesian algorithm: the observation items are assumed to be independent of each other. Because if they are independent, the joint probability can be split like this: P(111,0,A|C1) = P(111|C1) * P(0|C1) * P(A|C1), which is easier to calculate and much less likely to be 0.
P(C1|111,0,A) ∝ P(111,0,A|C1) * P(C1) = P(111|C1) * P(0|C1) * P(A|C1) * P(C1) = 1/6 * 4/5 * 4/5 * 5/7 ≈ 0.076
P(C2|111,0,A) ∝ P(111,0,A|C2) * P(C2) = P(111|C2) * P(0|C2) * P(A|C2) * P(C2) = 1/3 * 1/2 * 1/3 * 2/7 ≈ 0.016
(Here 1/6 and 1/3 are the smoothed values for the unseen observation 111, as before, and P(A|C2) is likewise smoothed from 0 to 1/3 because neither C2 row contains an A.)
The probability for class C1 is much larger than for class C2, so naturally the prediction is C1. You can also feel this with the naked eye: although 111 never appeared in observation 1, most of the 0s and most of the As are in class C1, so the possibility that this new data belongs to class C1 is also very large. This is how three mutually independent observations work together on the result.
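Putting it all together, here is a minimal end-to-end sketch of the multi-feature case (the structure and names are my own; it reuses the post's per-feature smoothing, so it reproduces the 0.076 and 0.016 above):

```python
# Rows of the second table: (obs1, obs2, obs3, category).
data = [
    ("000", "0", "M", "C1"), ("001", "-", "A", "C1"),
    ("010", "0", "A", "C1"), ("011", "0", "M", "C2"),
    ("100", "0", "A", "C1"), ("101", "0", "A", "C1"),
    ("110", "-", "M", "C2"),
]
classes = ("C1", "C2")

def likelihood(value, feature_idx, cat):
    """P(value | cat) for one feature, smoothing unseen pairs to 1/(n+1)."""
    rows = [r for r in data if r[-1] == cat]
    hits = sum(1 for r in rows if r[feature_idx] == value)
    return hits / len(rows) if hits else 1 / (len(rows) + 1)

def predict(new_row):
    scores = {}
    for cat in classes:
        prior = sum(1 for r in data if r[-1] == cat) / len(data)
        score = prior
        # Naive independence: multiply the per-feature likelihoods.
        for i, value in enumerate(new_row):
            score *= likelihood(value, i, cat)
        scores[cat] = score
    return scores

print(predict(("111", "0", "A")))
# {'C1': 0.076..., 'C2': 0.015...} -> predict C1
```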
What is the actual category? Nobody knows. In fact, when I designed the first table, the rule I had in mind was: a number containing two consecutive 1s is class C2, otherwise class C1 (under that rule, 111 would actually be C2). As you can see, it is hard for the algorithm to guess the underlying rule, especially when the amount of training data is small. All we can do is try several algorithms on the existing data, experiment with the parameters, pick the one that evaluates to the highest accuracy, and then apply it to future data. A sketch of that evaluation loop follows below.
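Here is one hedged way to run such an evaluation on this tiny dataset: leave-one-out accuracy for the smoothed naive Bayes described above (self-contained, with my own helper names; with only 7 rows the score is not very meaningful, it only shows the mechanics):

```python
# Leave-one-out evaluation: hold out each row in turn, train on the
# rest, and check whether the held-out row is classified correctly.
data = [
    ("000", "0", "M", "C1"), ("001", "-", "A", "C1"),
    ("010", "0", "A", "C1"), ("011", "0", "M", "C2"),
    ("100", "0", "A", "C1"), ("101", "0", "A", "C1"),
    ("110", "-", "M", "C2"),
]

def predict(train, features):
    best_cat, best_score = None, -1.0
    for cat in ("C1", "C2"):
        rows = [r for r in train if r[-1] == cat]
        score = len(rows) / len(train)  # prior P(cat)
        for i, v in enumerate(features):
            hits = sum(1 for r in rows if r[i] == v)
            # Same smoothing as before for unseen feature/class pairs.
            score *= hits / len(rows) if hits else 1 / (len(rows) + 1)
        if score > best_score:
            best_cat, best_score = cat, score
    return best_cat

correct = sum(
    predict(data[:i] + data[i + 1:], row[:-1]) == row[-1]
    for i, row in enumerate(data)
)
print(f"leave-one-out accuracy: {correct}/{len(data)}")
```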


This article is from the "Empty" blog. Please do not reprint.
