Naive Bayesian classifier


Reference blogs:
http://www.open-open.com/doc/view/2952280b0327489684c0be7e96d2eadd
http://www.cnblogs.com/leoo2sk/archive/2010/09/17/naive-bayesian-classifier.html
http://www.ruanyifeng.com/blog/2013/12/naive_bayes_classifier.html

First, some basic knowledge:

The probability that event A occurs given that event B has already occurred is called the conditional probability of A given B, written P(A|B). Its basic formula is:

P(A|B) = P(A∩B) / P(B)

Bayes' theorem is useful because we often encounter this situation in life: P(A|B) is easy to obtain directly, while P(B|A) is hard to obtain directly, yet P(B|A) is what we actually care about. Bayes' theorem opens the road from P(A|B) to P(B|A).

Bayes' theorem is given directly below, without proof:

P(B|A) = P(A|B) P(B) / P(A)

1. The principle of the naive Bayesian classifier

Naive Bayes classification is a very simple classification algorithm; it is called naive because the idea behind the method really is simple. The naive Bayesian idea is this: for a given item to be classified, compute the probability of each category given that item, and assign the item to the category whose probability is largest. In layman's terms: you see a black man on the street, and I ask you to guess where he comes from. You will probably guess Africa. Why? Because among black people the proportion of Africans is highest. Of course, he may also be American or Asian, but with no other information available, we choose the category with the largest conditional probability. This is the ideological foundation of naive Bayes. As an example:

A hospital received six outpatients in the morning, as in the following table.

Symptom      Occupation              Disease

Sneezing     Nurse                   Cold
Sneezing     Farmer                  Allergy
Headache     Construction worker     Concussion
Headache     Construction worker     Cold
Sneezing     Teacher                 Cold
Headache     Teacher                 Concussion

Now there's a seventh patient, a sneezing construction worker. What is the probability of his catching a cold?

According to Bayes' theorem:

P(A|B) = P(B|A) P(A) / P(B)

Applying it here, we get:

P(Cold | Sneezing × Construction worker)
= P(Sneezing × Construction worker | Cold) × P(Cold)
/ P(Sneezing × Construction worker)

Assume that the two features "sneezing" and "construction worker" are independent of each other; the above equation then becomes

P(Cold | Sneezing × Construction worker)
= P(Sneezing | Cold) × P(Construction worker | Cold) × P(Cold)
/ ( P(Sneezing) × P(Construction worker) )

This can now be calculated from the table:

P(Cold | Sneezing × Construction worker)
= 0.66 × 0.33 × 0.5 / (0.5 × 0.33)
= 0.66

Therefore, this sneezing construction worker has about a 66% chance of having a cold. In the same way, you can calculate the probabilities that he suffers from an allergy or a concussion; by comparing these probabilities, you know which disease he most likely has.

This is the basic method of a Bayesian classifier: on the basis of statistical data, and according to certain features, calculate the probability of each category and thereby realize the classification.
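
As an illustration, here is a minimal Python sketch of the outpatient computation above. The record layout and the helper name posterior_numerator are our own, not from the referenced blogs:

```python
# The six outpatients from the table: (symptom, occupation, disease).
records = [
    ("sneezing", "nurse",               "cold"),
    ("sneezing", "farmer",              "allergy"),
    ("headache", "construction worker", "concussion"),
    ("headache", "construction worker", "cold"),
    ("sneezing", "teacher",             "cold"),
    ("headache", "teacher",             "concussion"),
]

def posterior_numerator(symptom, occupation, disease):
    """P(symptom|disease) * P(occupation|disease) * P(disease).

    The denominator P(symptom) * P(occupation) is the same for every
    disease, so it can be dropped when we only compare categories.
    """
    in_class = [r for r in records if r[2] == disease]
    p_disease = len(in_class) / len(records)
    p_symptom = sum(r[0] == symptom for r in in_class) / len(in_class)
    p_occupation = sum(r[1] == occupation for r in in_class) / len(in_class)
    return p_symptom * p_occupation * p_disease

for disease in ("cold", "allergy", "concussion"):
    print(disease, posterior_numerator("sneezing", "construction worker", disease))
# cold -> 2/3 * 1/3 * 1/2 ~= 0.111, the largest numerator; dividing by
# P(sneezing) * P(construction worker) = 1/2 * 1/3 gives the 0.66 above.
```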

2. The process of naive Bayesian classifier

The formal definition of Naive Bayes classification is as follows:

1. Let x = {a1, a2, ..., am} be an item to be classified, where each a is a feature attribute of x. (x is the sample; the a's are the individual attributes of the sample.)

2. There is a category set C = {y1, y2, ..., yn}. (The y's are the labels.)

3. Calculate P(y1|x), P(y2|x), ..., P(yn|x). (The probability of each category given the sample's attributes.)

4. If P(yk|x) = max{ P(y1|x), P(y2|x), ..., P(yn|x) }, then x belongs to yk. (Take the category with the maximum probability as the final classification result.)

So the key now is how to calculate the conditional probabilities in step 3. We can do this:

1. Find a set of items to be classified whose categories are already known; this set is called the training sample set.

2. From statistics on the training samples, obtain the conditional probability estimates of each feature attribute under each category, that is:

P(a1|y1), P(a2|y1), ..., P(am|y1); P(a1|y2), P(a2|y2), ..., P(am|y2); ...; P(a1|yn), P(a2|yn), ..., P(am|yn)

3. If each feature attribute is conditionally independent given the category, then by Bayes' theorem we have the following derivation:

P(yi|x) = P(x|yi) P(yi) / P(x)

Because the denominator is constant for all categories, we only need to maximize the numerator. And because the feature attributes are conditionally independent, we have:

P(x|yi) P(yi) = P(a1|yi) P(a2|yi) ... P(am|yi) P(yi) = P(yi) · ∏(j=1..m) P(aj|yi)

where m denotes the number of attributes (for example, 19 features).

Based on the above analysis, the naive Bayesian classification process can be represented by the following flowchart (validation is not considered for now):

[Figure: flowchart of the three stages of naive Bayesian classification, not reproduced here]
As you can see, the entire naive Bayesian classification is divided into three stages:

The first stage: the preparation stage. The task of this stage is to make the necessary preparations for naive Bayesian classification. The main work is to determine the feature attributes according to the concrete situation, divide each feature attribute appropriately, and then manually classify a portion of the items to be classified to form a training sample set. The input of this stage is all the data to be classified; the output is the feature attributes and the training samples. This is the only stage in the whole process that must be completed manually, and its quality has an important influence on everything that follows: the quality of the classifier is determined to a great extent by the feature attributes, the division of the feature attributes, and the quality of the training samples.

The second stage: the classifier training stage. The task of this stage is to generate the classifier. The main work is to calculate the frequency of each category in the training samples and the conditional probability estimate of each feature attribute division for each category, and to record the results. The input is the feature attributes and the training samples; the output is the classifier. This stage is mechanical: following the formulas discussed above, it can be completed automatically by a program.

The third stage: the application stage. The task of this stage is to classify items using the classifier. The input is the classifier and the items to be classified; the output is the mapping between items and categories. This stage is also mechanical and is completed by a program.
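
To make the three stages concrete, here is a minimal sketch of a discrete naive Bayes classifier in Python. The class and method names (NaiveBayes, fit, predict) are our own illustration, not an API from the referenced blogs; fit corresponds to the second stage and predict to the third:

```python
from collections import Counter, defaultdict

class NaiveBayes:
    """Minimal discrete naive Bayes: a training stage and an application stage."""

    def fit(self, samples, labels):
        # Second stage: record P(y) and the counts behind each P(a_j | y).
        self.class_counts = Counter(labels)
        self.priors = {y: c / len(labels) for y, c in self.class_counts.items()}
        self.cond = defaultdict(lambda: defaultdict(Counter))  # cond[y][j][a]
        for x, y in zip(samples, labels):
            for j, a in enumerate(x):
                self.cond[y][j][a] += 1
        return self

    def predict(self, x):
        # Third stage: argmax over P(y) * prod_j P(a_j | y); P(x) is dropped.
        best, best_score = None, -1.0
        for y, prior in self.priors.items():
            score = prior
            for j, a in enumerate(x):
                score *= self.cond[y][j][a] / self.class_counts[y]
            if score > best_score:
                best, best_score = y, score
        return best

samples = [("sneezing", "nurse"), ("sneezing", "farmer"),
           ("headache", "construction worker"), ("headache", "construction worker"),
           ("sneezing", "teacher"), ("headache", "teacher")]
labels = ["cold", "allergy", "concussion", "cold", "cold", "concussion"]
print(NaiveBayes().fit(samples, labels).predict(("sneezing", "construction worker")))
# -> cold
```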

3. Estimating the conditional probability of feature attribute divisions under each category, and Laplace calibration

This section discusses how to estimate P(a|y).

As can be seen from the above, calculating the conditional probability P(a|y) of each division is the key step of naive Bayesian classification. When the feature attribute takes discrete values, it is enough to count the frequency with which each division appears within each category in the training samples and use that frequency to estimate P(a|y). The following focuses on feature attributes with continuous values.

When a feature attribute is a continuous value, it is generally assumed to follow a Gaussian distribution (also known as a normal distribution), that is:

g(x, μ, σ) = 1 / (√(2π) σ) · exp( −(x − μ)² / (2σ²) )

and

P(ak|yi) = g(ak, μ_yi, σ_yi)
So, for each category in the training samples, we only need to calculate the mean and standard deviation of this feature attribute (note: the mean and standard deviation within each category separately) and substitute them into the formula above to obtain the desired estimate. The calculation of the mean and standard deviation is not covered here.
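
A minimal sketch of this estimate, assuming a helper gaussian_density of our own naming:

```python
import math

def gaussian_density(x, mu, var):
    """Value at x of the normal density with mean mu and variance var.
    It is a density, not a probability, so it may exceed 1."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# Male height from the example in section 4 below: mean 5.855, variance 0.035.
print(gaussian_density(6, 5.855, 0.035))  # ~1.5789
```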

Another issue that needs to be discussed is what happens when P(a|y) = 0. This occurs when a feature attribute division never appears under some category, and it drags the whole product down to zero, greatly reducing classifier quality. To solve this problem we introduce Laplace calibration. Its idea is very simple: add 1 to the count of every division under every category. For example, in the earlier outpatient example, P(Farmer | Cold) = 0, so we add 1 to that count. (This applies only to the discrete case; in the continuous case no probability value is exactly 0.) If the training sample set is sufficiently large, this does not affect the results, and it resolves the embarrassing situation of zero frequencies.
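
A sketch of the calibrated estimate with add-one counts (the function name is our own illustration):

```python
def laplace_smoothed(count_a_and_y, count_y, num_divisions):
    """Laplace-calibrated estimate of P(a|y): add 1 to every division's count,
    and add the number of divisions to the class count so the estimates
    still sum to 1 across the divisions."""
    return (count_a_and_y + 1) / (count_y + num_divisions)

# Outpatient example: no farmer among the 3 cold patients, and the
# occupation attribute has 4 divisions (nurse, farmer, worker, teacher).
print(laplace_smoothed(0, 3, 4))  # P(Farmer|Cold) goes from 0 to 1/7
```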

4. Application examples

The following is a set of statistics on human body features. Task: according to height, weight, foot size, and so on, determine whether a person is male or female.

Gender    Height (feet)    Weight (lbs)    Foot size (inches)

Male      6                180             12
Male      5.92             190             11
Male      5.58             170             12
Male      5.92             165             10
Female    5                100             6
Female    5.5              150             8
Female    5.42             130             7
Female    5.75             150             9

Given someone who is known to be 6 feet tall, weighing 130 pounds, with 8-inch feet, is this person male or female?

Based on the naive Bayes classifier, we calculate the value of the following expression for each gender:

P(Height | Gender) × P(Weight | Gender) × P(Foot size | Gender) × P(Gender)

The difficulty here is that height, weight, and foot size are continuous variables, so their probabilities cannot be calculated by counting as with discrete variables. And because the sample is too small, the values cannot be binned into intervals either. What to do?

At this point, we can assume that the heights, weights, and foot sizes of males and of females are each normally distributed, and calculate the means and variances from the sample, which gives the density function of each normal distribution. With the density function, we can substitute a value and obtain the density at that point.

For example, male height follows a normal distribution with mean 5.855 and variance 0.035. The relative likelihood that a male is 6 feet tall is therefore the density value 1.5789. (It does not matter that this is greater than 1: it is a value of the density function, used only to reflect the relative likelihood of each value.)

With this data, the gender classification can be calculated.

P(Height = 6 | Male) × P(Weight = 130 | Male) × P(Foot size = 8 | Male) × P(Male)
= 6.1984 × 10^-9

P(Height = 6 | Female) × P(Weight = 130 | Female) × P(Foot size = 8 | Female) × P(Female)
= 5.3778 × 10^-4

As you can see, the value for female is roughly 87,000 times the value for male, so this person is judged to be female.
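
A minimal end-to-end sketch that reproduces these two numbers; the data layout and helper names are our own:

```python
import math

def gaussian_density(x, mu, var):
    # Normal density with mean mu and variance var, evaluated at x.
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def mean_and_variance(values):
    mu = sum(values) / len(values)
    var = sum((v - mu) ** 2 for v in values) / (len(values) - 1)  # sample variance
    return mu, var

# (height ft, weight lbs, foot size in) for each gender, from the table above.
data = {
    "male":   [(6.00, 180, 12), (5.92, 190, 11), (5.58, 170, 12), (5.92, 165, 10)],
    "female": [(5.00, 100, 6), (5.50, 150, 8), (5.42, 130, 7), (5.75, 150, 9)],
}
sample = (6, 130, 8)
total = sum(len(rows) for rows in data.values())

for gender, rows in data.items():
    score = len(rows) / total  # prior P(gender) = 0.5
    for j, value in enumerate(sample):
        mu, var = mean_and_variance([row[j] for row in rows])
        score *= gaussian_density(value, mu, var)
    print(gender, score)
# male   ~6.1984e-09
# female ~5.3778e-04
```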

5. Advantages and disadvantages of the naive Bayesian classifier

The biggest advantage of the naive Bayesian classifier is its speed when training on and querying large amounts of data, especially when the training data keeps growing: it can be trained incrementally, in several batches, whereas some other methods, such as decision trees and support vector machines, must be given the entire training data set at once. Another advantage is that what the classifier has learned has a relatively simple interpretation: we can understand its classification principle simply by querying the probability values accumulated during learning.

The biggest flaw of naive Bayesian classification is that it cannot handle outcomes that depend on combinations of features (that is, the mutual-independence assumption mentioned earlier, which is difficult to satisfy in practice).

Advantages:

First, the naive Bayesian model originates in classical mathematical theory, so it has a solid mathematical foundation and stable classification efficiency.

Second, the NBC model needs to estimate only a few parameters, is not very sensitive to missing data, and the algorithm is relatively simple.


Disadvantages:

First, in theory the NBC model has the smallest error rate compared with other classification methods. In practice, however, this is not always the case, because the NBC model assumes that attributes are independent of each other, an assumption that often does not hold in real applications (one workaround that has been considered is to first cluster attributes with strong correlation), and this affects the NBC model's classification accuracy. When the number of attributes is large or the correlation between attributes is strong, the NBC model performs worse than a decision tree model; when the attribute correlations are small, the NBC model performs best.

Second, it needs to know the prior probabilities.

Third, there is an error rate in the classification decision.





