The Naive Bayes Algorithm for Data Mining: A Classification Algorithm


Bayesian classification is a statistical classification method that performs well on classification problems. Naive Bayes is built on Bayes' theorem, so we begin with a brief review of that theorem.

Before looking at the calculation, recall the definition: the conditional probability P(A|B) is the probability that event A occurs given that event B has occurred.


The probability that event A occurs given that event B has occurred is calculated as:

P(A|B) = P(A∩B) / P(B)


With this, we can rearrange the conditional probability formula. Since P(A∩B) = P(A|B)·P(B) and, symmetrically, P(A∩B) = P(B|A)·P(A), dividing through by P(B) gives Bayes' theorem:

P(A|B) = P(B|A) · P(A) / P(B)
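To make the formula concrete, here is a small invented example (the numbers are made up purely for illustration). Suppose a disease affects P(A) = 0.01 of a population, a test detects it with probability P(B|A) = 0.99, and healthy people test positive with probability 0.05. The overall rate of positive tests is then P(B) = 0.99 × 0.01 + 0.05 × 0.99 = 0.0594, so P(A|B) = 0.99 × 0.01 / 0.0594 ≈ 0.167. Even after a positive test, the probability of disease is only about 17%, because the prior P(A) is so small.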
In Bayes' law, each term has a conventional name:
P(A) is the prior probability or marginal probability of A. It is called "prior" because it does not take any information about B into account.
P(A|B) is the conditional probability of A given B; because it is derived from the value of B, it is also called the posterior probability of A.
P(B|A) is the conditional probability of B given A; because it is derived from A, it is also called the posterior probability of B.
P(B) is the prior probability or marginal probability of B, and it also acts as the normalizing constant.
In these terms, Bayes' rule can be expressed as:
posterior probability = (likelihood × prior probability) / normalizing constant
In other words, the posterior probability is proportional to the product of the prior probability and the likelihood.
In addition, the ratio P(B|A)/P(B) is sometimes called the standardised likelihood, so Bayes' rule can also be expressed as: posterior probability = standardised likelihood × prior probability.
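To map these names onto an actual calculation, here is a minimal Python sketch; the numbers are the invented ones from the disease-test illustration above, not data from any real source:

    # Bayes' rule: posterior = likelihood * prior / evidence
    prior = 0.01        # P(A): prior probability of the disease
    likelihood = 0.99   # P(B|A): probability of a positive test given the disease
    evidence = 0.0594   # P(B): marginal probability of a positive test (normalizing constant)

    posterior = likelihood * prior / evidence   # P(A|B)
    print(round(posterior, 3))                  # 0.167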

Bayes' rule (Bayes' theorem / Bayesian law) is the basic tool here. Although it is a mathematical formula, its underlying principle can be grasped without any numbers: if you see a person consistently doing good deeds, that person is most likely a good person. In other words, when the essential nature of a thing cannot be known precisely, you can judge its essential attributes from the probability of events associated with that nature. Expressed in mathematical language: the more events there are that support an attribute, the greater the likelihood that the attribute holds.

After this refresher on Bayes' theorem, the natural question is why a "naive" Bayes exists. Naive Bayes classification assumes that the effect of an attribute value on a given class is independent of the values of the other attributes. This assumption is called class-conditional independence. It is clearly intended to simplify the computation, and in this sense the method is called "naive". The learning process of the naive Bayes classifier is explained below.
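Concretely, class-conditional independence means the joint likelihood of a feature vector factorizes into a product of per-feature conditionals: P(x1, x2, ..., xn | C) = P(x1|C) · P(x2|C) · ... · P(xn|C). Below is a minimal Python sketch of how a class is scored under this assumption; the class names and probability tables are hypothetical:

    from math import prod

    # Hypothetical per-feature conditionals P(x_i | C) for one observed item
    cond = {
        "spam":     [0.8, 0.3],   # P(x1|spam), P(x2|spam)
        "not_spam": [0.1, 0.6],   # P(x1|not_spam), P(x2|not_spam)
    }
    prior = {"spam": 0.4, "not_spam": 0.6}

    # Naive Bayes: P(C|x) is proportional to P(C) * product of P(x_i|C)
    score = {c: prior[c] * prod(cond[c]) for c in prior}
    print(max(score, key=score.get))   # prints the higher-scoring class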

The entire naive Bayes classification process is divided into three stages:

The first stage is preparation. Its task is to make the necessary preparations for naive Bayes classification. The main work is to determine the feature attributes according to the specific situation, divide each feature attribute appropriately, and then manually classify a portion of the items to be classified, forming a training sample set. The input of this stage is all the data to be classified; the output is the feature attributes and the training samples. This is the only stage in the whole process that must be completed manually, and its quality has an important influence on everything that follows: the quality of the classifier is determined to a great extent by the feature attributes, how they are divided, and the quality of the training samples.
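As an illustration of what "dividing" a feature attribute can look like, one common choice is to bucket a continuous value into a few ranges and store the manually labeled samples as (feature tuple, class label) pairs. The attribute names, cut points, and labels below are all invented:

    def divide_age(age):
        # Map a continuous attribute onto one of a few divisions (hypothetical cut points)
        if age < 30:
            return "young"
        elif age < 60:
            return "middle"
        return "senior"

    # A manually classified training sample set: (feature divisions, class label)
    training_set = [
        ((divide_age(25), "high_income"), "buys"),
        ((divide_age(45), "high_income"), "buys"),
        ((divide_age(70), "low_income"),  "does_not_buy"),
    ]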

The second stage is classifier training. Its task is to generate the classifier. The main work is to calculate the frequency of each class in the training samples and, for each division of each feature attribute, the conditional probability estimate for each class, and to record the results. The input is the feature attributes and the training samples; the output is the classifier. This stage is mechanical: based on the formulas discussed above, it can be completed automatically by a program.
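Here is a minimal sketch of this training step, continuing the hypothetical training_set from the preparation stage: it records the class frequencies (the priors) and, for each division of each feature attribute, a conditional probability estimate based on simple counts:

    from collections import Counter, defaultdict

    def train(training_set):
        class_counts = Counter(label for _, label in training_set)
        feature_counts = defaultdict(Counter)
        for features, label in training_set:
            for i, value in enumerate(features):
                feature_counts[label][(i, value)] += 1
        n = len(training_set)
        priors = {c: k / n for c, k in class_counts.items()}
        conditionals = {
            c: {fv: k / class_counts[c] for fv, k in counts.items()}
            for c, counts in feature_counts.items()
        }
        return priors, conditionals   # this pair of tables is the "classifier"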
The third stage is application. Its task is to classify items using the classifier. The input is the classifier and the items to be classified; the output is the mapping between each item and its class. This stage is also mechanical and is completed by the program.
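And the application stage, continuing the same sketch: score each class by its prior times the product of the matching conditionals, and output the highest-scoring class. Falling back to a tiny constant for divisions never seen with a class is just one simple choice made here for the sketch, not part of the method as described above:

    def classify(priors, conditionals, features):
        best_class, best_score = None, 0.0
        for c, prior in priors.items():
            score = prior
            for i, value in enumerate(features):
                # Division never seen with class c: fall back to a tiny constant
                score *= conditionals[c].get((i, value), 1e-9)
            if score > best_score:
                best_class, best_score = c, score
        return best_class

    priors, conditionals = train(training_set)
    print(classify(priors, conditionals, ("young", "high_income")))   # "buys"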


