Naive Bayesian classification

Last Update:2015-07-19 Source: Internet

Author: User

Naive Bayesian classification is a very simple classification algorithm. It is called naive because the idea behind it really is simple (more precisely, the name refers to its simplifying assumption that the feature attributes are conditionally independent given the category). The basic idea is this: for a given item to be classified, compute the probability of each category conditional on the item's observed features, and assign the item to the category whose probability is largest. In lay terms: suppose you see a black man on the street and someone asks you to guess where he comes from. You would most likely guess Africa. Why? Because, among black people, the share who come from Africa is the largest. He might of course be American or Asian, but with no other information available we choose the category with the largest conditional probability. That is the basic idea of naive Bayes.

The formal definition of Naive Bayes classification is as follows:

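1. Let $x = \{a_1, a_2, \ldots, a_m\}$ be an item to be classified, where each $a_j$ is a feature attribute of $x$.

2. Let $C = \{y_1, y_2, \ldots, y_n\}$ be the set of categories.

3. Compute $P(y_1 \mid x), P(y_2 \mid x), \ldots, P(y_n \mid x)$.

4. If $P(y_k \mid x) = \max\{P(y_1 \mid x), P(y_2 \mid x), \ldots, P(y_n \mid x)\}$, then $x \in y_k$.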

The key question is how to compute each of the conditional probabilities in step 3. We can proceed as follows:

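1. Find a set of items whose categories are already known. This set is called the training sample set.

2. From the training sample set, estimate the conditional probability of each feature attribute in each category, that is, $P(a_j \mid y_i)$ for every feature attribute $a_j$ ($j = 1, \ldots, m$) and every category $y_i$ ($i = 1, \ldots, n$).

3. If the feature attributes are conditionally independent given the category, Bayes' theorem gives the following derivation:

$$P(y_i \mid x) = \frac{P(x \mid y_i)\,P(y_i)}{P(x)}$$

Because the denominator is the same for all categories, it suffices to maximize the numerator. And because the feature attributes are conditionally independent:

$$P(x \mid y_i)\,P(y_i) = P(a_1 \mid y_i)\,P(a_2 \mid y_i) \cdots P(a_m \mid y_i)\,P(y_i) = P(y_i) \prod_{j=1}^{m} P(a_j \mid y_i)$$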
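As a rough illustration of the product rule above, the following Python sketch computes the unnormalized score, the prior of a category times the product of the conditional probabilities of the item's attributes, for each category and picks the largest. The function name `posterior_scores`, the categories "spam" and "ham", and all the probability values are made up for this example and do not come from the original article.

```python
from math import prod

def posterior_scores(priors, cond_probs, item):
    """Return P(y) * product over j of P(a_j | y) for every category y.

    priors:     {category: P(y)}
    cond_probs: {category: [ {value: P(a_j = value | y)} for each attribute j ]}
    item:       tuple of attribute values (a_1, ..., a_m)
    """
    return {
        y: priors[y] * prod(cond_probs[y][j][a] for j, a in enumerate(item))
        for y in priors
    }

# Hypothetical probability tables for two categories and two attributes.
priors = {"spam": 0.4, "ham": 0.6}
cond_probs = {
    "spam": [{"yes": 0.8, "no": 0.2}, {"short": 0.7, "long": 0.3}],
    "ham":  [{"yes": 0.1, "no": 0.9}, {"short": 0.4, "long": 0.6}],
}

item = ("yes", "short")
scores = posterior_scores(priors, cond_probs, item)

# The shared denominator P(x) is skipped: it does not change which score is largest.
best = max(scores, key=scores.get)
print(best)  # spam
```

Here the score for "spam" is 0.4 x 0.8 x 0.7 and the score for "ham" is 0.6 x 0.1 x 0.4, so "spam" wins. Dividing each score by their sum would recover the actual posterior probabilities, but that is not needed to choose the category.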

As you can see, the whole naive Bayesian classification process can be divided into three stages:

The first stage is the preparatory stage. Its task is to make the necessary preparations for naive Bayesian classification. The main work is to determine the feature attributes according to the specific situation, divide each feature attribute appropriately, and then manually classify a portion of the items to be classified to form the training sample set. The input of this stage is all the data to be classified, and the output is the feature attributes and the training samples. This is the only stage in the whole process that has to be done manually, and its quality has an important influence on everything that follows: the quality of the classifier is determined to a great extent by the feature attributes, the way they are divided, and the quality of the training samples.

The second stage is the classifier training stage. Its task is to generate the classifier. The main work is to compute the frequency of each category in the training samples and the conditional probability estimate of each feature attribute division within each category, and to record the results. The input is the feature attributes and the training samples, and the output is the classifier. This stage is mechanical: using the formulas discussed above, it can be completed automatically by a program.

The third stage is the application stage. Its task is to classify items using the classifier. The input is the classifier and the items to be classified, and the output is the mapping between the items to be classified and the categories. This stage is also mechanical and is completed by a program.
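The second and third stages can be sketched in Python roughly as follows. This is a minimal illustration that assumes every feature attribute is discrete and already divided into values. The function names `train` and `classify` and the small weather-style training set are invented for this example, and probabilities are estimated as plain relative frequencies.

```python
from collections import Counter, defaultdict

def train(samples):
    """Stage 2: build the classifier from (features, label) pairs.

    Returns (priors, cond_probs), both estimated as relative frequencies.
    """
    label_counts = Counter(label for _, label in samples)
    # value_counts[label][j][value] = number of samples in `label`
    # whose j-th attribute equals `value`
    value_counts = defaultdict(lambda: defaultdict(Counter))
    for features, label in samples:
        for j, value in enumerate(features):
            value_counts[label][j][value] += 1

    total = len(samples)
    priors = {y: n / total for y, n in label_counts.items()}
    cond_probs = {
        y: {j: {v: c / label_counts[y] for v, c in counter.items()}
            for j, counter in value_counts[y].items()}
        for y in label_counts
    }
    return priors, cond_probs

def classify(priors, cond_probs, features):
    """Stage 3: return the category with the largest prior-times-likelihood score."""
    best_label, best_score = None, -1.0
    for y, prior in priors.items():
        score = prior
        for j, value in enumerate(features):
            # A value never seen in category y gets probability 0 here.
            score *= cond_probs[y][j].get(value, 0.0)
        if score > best_score:
            best_label, best_score = y, score
    return best_label

# Stage 1 (done by hand): hypothetical items as (outlook, temperature) with labels.
samples = [
    (("sunny", "hot"), "no"),
    (("sunny", "mild"), "no"),
    (("rainy", "mild"), "yes"),
    (("overcast", "hot"), "yes"),
    (("rainy", "cool"), "yes"),
    (("sunny", "cool"), "yes"),
]

priors, cond_probs = train(samples)
print(classify(priors, cond_probs, ("sunny", "hot")))   # no
print(classify(priors, cond_probs, ("rainy", "cool")))  # yes
```

Keeping the classifier as two plain dictionaries mirrors the description of the training output as recorded frequencies; a real implementation would typically also work with logarithms of probabilities to avoid numerical underflow when there are many attributes.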

Another issue that needs to be discussed is what happens when P(a|y) = 0. This occurs when some division of a feature attribute never appears in a category in the training samples. Because the conditional probabilities are multiplied together, a single zero makes the whole product zero, which greatly reduces the quality of the classifier. To solve this problem, Laplace calibration (also known as Laplace smoothing) is introduced. The idea is very simple: add 1 to the count of every division under each category. If the number of training samples is sufficiently large, this has no noticeable effect on the results, and it avoids the awkward situation of a zero frequency.
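A small Python sketch of this add-one correction for a single feature attribute within a single category is shown below. The function name `smoothed_cond_probs` and the example values are hypothetical. Since 1 is added to the count of every division, the denominator grows by the number of divisions so that the estimates still sum to 1.

```python
from collections import Counter

def smoothed_cond_probs(values_in_category, all_values):
    """Estimate P(a = v | y) with add-one (Laplace) smoothing.

    values_in_category: attribute values observed for training samples in category y
    all_values:         every possible division (value) of this attribute
    """
    counts = Counter(values_in_category)
    # Each of the len(all_values) divisions gets one extra count.
    denom = len(values_in_category) + len(all_values)
    return {v: (counts[v] + 1) / denom for v in all_values}

# Hypothetical category in which only "sunny" was ever observed.
probs = smoothed_cond_probs(["sunny", "sunny"], ["sunny", "rainy", "overcast"])
print(probs)
```

With these inputs "sunny" gets (2 + 1) / (2 + 3) and the two unseen values each get 1 / 5, so no division has probability zero.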

Reference: Algorithm Grocery Store: Classification Algorithms, Naive Bayesian Classification
