Naive Bayes Classifier


 

I. Bayes' Theorem

The so-called conditional probability is the probability of event A given that event B has occurred, written P(A|B).

From the definition of conditional probability, you can find

P(A|B) = P(A∩B) / P(B)

Likewise,

P(B|A) = P(A∩B) / P(A)

So,

P(A∩B) = P(B|A) P(A)

That is,

P(A|B) = P(B|A) P(A) / P(B)
 

Where:

P(A) is called the "prior probability": our judgment of the likelihood of event A before event B occurs;

P(A|B) is called the "posterior probability": our re-evaluated judgment of the likelihood of event A after event B has occurred;

P(B|A)/P(B) is called the "likelihood function". This is an adjustment factor that brings the estimated probability closer to the true probability.

Therefore, the conditional probability can be understood as the following formula:

Posterior probability = prior probability × adjustment factor

This is the meaning of Bayesian inference: we first estimate a "prior probability", then incorporate the experimental result to see whether the experiment strengthens or weakens that prior, and thereby obtain a "posterior probability" that is closer to the facts.

Here, if the "likelihood function" P(B|A)/P(B) > 1, the "prior probability" is strengthened and event A becomes more likely; if the "likelihood function" = 1, event B gives no help in judging the likelihood of event A; if the "likelihood function" < 1, the "prior probability" is weakened and event A becomes less likely.
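As a toy numeric illustration of this update rule (all numbers are hypothetical, chosen only to show the mechanics), suppose event A = "the account is real" and evidence B = "the account uses a real avatar":

```python
# Hypothetical numbers for illustrating posterior = prior * adjustment factor
p_a = 0.9            # prior P(A)
p_b_given_a = 0.8    # P(B|A)
p_b = 0.75           # P(B), over all accounts

adjustment = p_b_given_a / p_b   # > 1, so the evidence supports A
posterior = p_a * adjustment     # P(A|B) = P(B|A) P(A) / P(B)
print(round(adjustment, 4), round(posterior, 2))  # 1.0667 0.96
```

Since the adjustment factor exceeds 1, the prior 0.9 is strengthened to a posterior of 0.96.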

 

II. Naive Bayes Classifier Principle

Assume that an individual has n features: F1, F2, ..., Fn, and that there are m categories: C1, C2, ..., Cm. The Bayes classifier picks the category with the highest probability, that is, the category maximizing the formula below:

P(C|F1F2...Fn)
= P(F1F2...Fn|C) P(C) / P(F1F2...Fn)

Since P(F1F2...Fn) is the same for all categories, it can be omitted, and the problem reduces to maximizing

P(F1F2...Fn|C) P(C)

The naive Bayes classifier goes one step further and assumes that all features are independent of each other, so

P(F1F2...Fn|C) P(C)
= P(F1|C) P(F2|C) ... P(Fn|C) P(C)

Each factor on the right-hand side can be estimated from the statistical data, so the probability of each category can be computed and the category with the highest probability chosen.

Although the assumption that "all features are independent from each other" is unlikely to be true in reality, it can greatly simplify the computation, and studies have shown that it has little impact on the accuracy of classification results.
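The decision rule above can be sketched in a few lines of Python. This is only a minimal illustration (the function name is ours, not from the source), assuming the priors and the per-feature conditional probabilities, already looked up for the observed feature values, are given:

```python
from math import prod

def classify(priors, cond_probs):
    """Pick the class c maximizing P(F1|c) ... P(Fn|c) P(c).

    priors:     {class: P(class)}
    cond_probs: {class: [P(F1|class), ..., P(Fn|class)]}, the conditional
                probabilities already looked up for the observed values.
    """
    scores = {c: priors[c] * prod(cond_probs[c]) for c in priors}
    return max(scores, key=scores.get)
```

In practice the product is usually computed as a sum of logarithms to avoid floating-point underflow when the number of features is large.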

 

III. Application

This example is taken from Zhang Yang's "Algorithm Grocery Store: Naive Bayes Classification".

According to sampling statistics from a social-networking site, out of 10,000 accounts, 89% are real accounts (category C0) and 11% are false accounts (category C1):

P(C0) = 0.89

P(C1) = 0.11

Next, we want to judge the authenticity of a particular account from statistics. Assume an account has the following three features:

F1: number of log entries / number of days registered
F2: number of friends / number of days registered
F3: whether a real avatar is used (real avatar = 1, otherwise 0)

For the account in question:

F1 = 0.1
F2 = 0.2
F3 = 0

Is this account a real account or a false account?

The method is to use the naive Bayes classifier to compute, for each category C, the value of

P(F1|C) P(F2|C) P(F3|C) P(C)

Although these values can be obtained from statistics, there is a problem: F1 and F2 are continuous variables, so it is not appropriate to compute a probability for one exact value.

One technique is to convert continuous values into discrete ones and compute the probability of each interval. For example, F1 can be divided into three intervals [0, 0.05], (0.05, 0.2), and [0.2, +∞), and the probability of each interval estimated. In our example F1 = 0.1, which falls in the second interval, so the probability of the second interval is used in the calculation.
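This discretization step can be written as a small helper (the function name is ours; the interval edges are the ones from the example):

```python
def f1_bin(x):
    """Map a continuous F1 value to one of three intervals:
    bin 0: [0, 0.05],  bin 1: (0.05, 0.2),  bin 2: [0.2, +inf)."""
    if x <= 0.05:
        return 0
    return 1 if x < 0.2 else 2

print(f1_bin(0.1))  # 0.1 falls in the second interval -> 1
```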

Based on the statistical data, you can obtain:

P(F1|C0) = 0.5, P(F1|C1) = 0.1
P(F2|C0) = 0.7, P(F2|C1) = 0.2
P(F3|C0) = 0.2, P(F3|C1) = 0.9

Therefore,

P(F1|C0) P(F2|C0) P(F3|C0) P(C0)
= 0.5 × 0.7 × 0.2 × 0.89
= 0.0623

P(F1|C1) P(F2|C1) P(F3|C1) P(C1)
= 0.1 × 0.2 × 0.9 × 0.11
= 0.00198

As you can see, even though this user does not use a real avatar, the account is more than 30 times more likely to be real than false, so it is judged to be a real account.
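The two scores can be checked directly; this is just the arithmetic from the text:

```python
# Scores for the account with F1 = 0.1, F2 = 0.2, F3 = 0
p_real  = 0.5 * 0.7 * 0.2 * 0.89   # P(F1|C0) P(F2|C0) P(F3|C0) P(C0)
p_false = 0.1 * 0.2 * 0.9 * 0.11   # P(F1|C1) P(F2|C1) P(F3|C1) P(C1)

print(round(p_real, 4), round(p_false, 5))  # 0.0623 0.00198
print(round(p_real / p_false, 1))           # ratio -> 31.5
```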

 

The next example, taken from Wikipedia, shows another way of handling continuous variables.

The following is a set of statistics on human physical characteristics.

Gender   Height (feet)   Weight (lbs)   Foot size (inches)

Male     6.00            180            12
Male     5.92            190            11
Male     5.58            170            12
Male     5.92            165            10
Female   5.00            100             6
Female   5.50            150             8
Female   5.42            130             7
Female   5.75            150             9

It is known that a person is 6 feet tall, weighs 130 lbs, and has a foot size of 8 inches. Is this person male or female?

Using the naive Bayes classifier, we compute, for each gender, the value of

P(height | gender) × P(weight | gender) × P(foot size | gender) × P(gender)

The difficulty is that height, weight, and foot size are continuous variables, so the probabilities cannot be computed with the discrete-variable method. Moreover, there are too few samples to divide them into intervals. What can be done?

In this case, we can assume that the height, weight, and foot size of each gender follow normal distributions, and estimate the mean and variance from the sample, giving the normal density function. With the density function, any value can be substituted in to obtain its density at that point.

For example, male height follows a normal distribution with mean 5.855 and variance 0.035, so the relative likelihood of a male height of 6 feet is the density value 1.5789 (a value greater than 1 does not matter, because this is a density, used only to reflect the relative likelihood of each value).

With this data, you can perform computational classification.

P(height = 6 | male) × P(weight = 130 | male) × P(foot size = 8 | male) × P(male)
= 6.1984 × 10⁻⁹

P(height = 6 | female) × P(weight = 130 | female) × P(foot size = 8 | female) × P(female)
= 5.3778 × 10⁻⁴

We can see that the female score is tens of thousands of times (about 87,000×) the male score, so the person is judged to be female.
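The whole calculation can be reproduced with Python's statistics module (variable names are ours; note that statistics.variance uses the n−1 sample variance, matching the hand calculation above):

```python
import math
from statistics import NormalDist, mean, variance

# Training data from the table above: (height ft, weight lbs, foot size in)
data = {
    "male":   [(6.00, 180, 12), (5.92, 190, 11), (5.58, 170, 12), (5.92, 165, 10)],
    "female": [(5.00, 100, 6), (5.50, 150, 8), (5.42, 130, 7), (5.75, 150, 9)],
}
prior = {"male": 0.5, "female": 0.5}  # four samples of each gender

def score(sample, cls):
    """P(height|cls) * P(weight|cls) * P(foot|cls) * P(cls), with a
    Gaussian density per feature (sample variance, n-1 denominator)."""
    s = prior[cls]
    for i, x in enumerate(sample):
        col = [row[i] for row in data[cls]]
        s *= NormalDist(mean(col), math.sqrt(variance(col))).pdf(x)
    return s

person = (6, 130, 8)
print(score(person, "male"))    # ≈ 6.1984e-9
print(score(person, "female"))  # ≈ 5.3778e-4
```

The Gaussian assumption replaces the interval-probability tables of the previous example; everything else in the classifier is unchanged.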

 

 

Reference:

http://www.ruanyifeng.com/blog/2011/08/bayesian_inference_part_one.html

http://www.ruanyifeng.com/blog/2013/12/naive_bayes_classifier.html

 
