The Naive Bayes Algorithm for Data Mining: A Classification Algorithm


Bayesian classification is a statistical classification method that performs well on classification problems. Naive Bayes is built on Bayes' theorem, so we begin with a brief review of that theorem.

Before looking at the calculation, recall the definition: the conditional probability P(A|B) is the probability that event A occurs given that event B has occurred.


The probability that event A occurs given that event B has occurred is calculated as:

P(A|B) = P(A∩B) / P(B)


With this, we can rearrange the conditional probability formula. Since P(A∩B) = P(A|B)·P(B) and, symmetrically, P(A∩B) = P(B|A)·P(A), dividing through by P(B) gives Bayes' theorem:

P(A|B) = P(B|A) · P(A) / P(B)
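To make the formula concrete, here is a small invented example (the numbers are made up purely for illustration). Suppose a disease affects P(A) = 0.01 of a population, a test detects it with probability P(B|A) = 0.99, and healthy people test positive with probability 0.05. The overall rate of positive tests is then P(B) = 0.99 × 0.01 + 0.05 × 0.99 = 0.0594, so P(A|B) = 0.99 × 0.01 / 0.0594 ≈ 0.167. Even after a positive test, the probability of disease is only about 17%, because the prior P(A) is so small.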
In Bayes' law, each term has a conventional name:
P(A) is the prior probability or marginal probability of A. It is called "prior" because it does not take any information about B into account.
P(A|B) is the conditional probability of A given B; because it is derived from the value of B, it is also called the posterior probability of A.
P(B|A) is the conditional probability of B given A; because it is derived from A, it is also called the posterior probability of B.
P(B) is the prior probability or marginal probability of B, and it also acts as the normalizing constant.
In these terms, Bayes' rule can be expressed as:
posterior probability = (likelihood × prior probability) / normalizing constant
In other words, the posterior probability is proportional to the product of the prior probability and the likelihood.
In addition, the ratio P(B|A)/P(B) is sometimes called the standardised likelihood, so Bayes' rule can also be expressed as: posterior probability = standardised likelihood × prior probability.
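To map these names onto an actual calculation, here is a minimal Python sketch; the numbers are the invented ones from the disease-test illustration above, not data from any real source:

    # Bayes' rule: posterior = likelihood * prior / evidence
    prior = 0.01        # P(A): prior probability of the disease
    likelihood = 0.99   # P(B|A): probability of a positive test given the disease
    evidence = 0.0594   # P(B): marginal probability of a positive test (normalizing constant)

    posterior = likelihood * prior / evidence   # P(A|B)
    print(round(posterior, 3))                  # 0.167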

Bayes' rule (Bayes' theorem / Bayesian law) is the basic tool here. Although it is a mathematical formula, its underlying principle can be grasped without any numbers: if you see a person consistently doing good deeds, that person is most likely a good person. In other words, when the essential nature of a thing cannot be known precisely, you can judge its essential attributes from the probability of events associated with that nature. Expressed in mathematical language: the more events there are that support an attribute, the greater the likelihood that the attribute holds.

After this refresher on Bayes' theorem, the natural question is why a "naive" Bayes exists. Naive Bayes classification assumes that the effect of an attribute value on a given class is independent of the values of the other attributes. This assumption is called class-conditional independence. It is clearly intended to simplify the computation, and in this sense the method is called "naive". The learning process of the naive Bayes classifier is explained below.
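Concretely, class-conditional independence means the joint likelihood of a feature vector factorizes into a product of per-feature conditionals: P(x1, x2, ..., xn | C) = P(x1|C) · P(x2|C) · ... · P(xn|C). Below is a minimal Python sketch of how a class is scored under this assumption; the class names and probability tables are hypothetical:

    from math import prod

    # Hypothetical per-feature conditionals P(x_i | C) for one observed item
    cond = {
        "spam":     [0.8, 0.3],   # P(x1|spam), P(x2|spam)
        "not_spam": [0.1, 0.6],   # P(x1|not_spam), P(x2|not_spam)
    }
    prior = {"spam": 0.4, "not_spam": 0.6}

    # Naive Bayes: P(C|x) is proportional to P(C) * product of P(x_i|C)
    score = {c: prior[c] * prod(cond[c]) for c in prior}
    print(max(score, key=score.get))   # prints the higher-scoring class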

The entire naive Bayes classification process is divided into three stages:

The first stage is preparation. Its task is to make the necessary preparations for naive Bayes classification. The main work is to determine the feature attributes according to the specific situation, divide each feature attribute appropriately, and then manually classify a portion of the items to be classified, forming a training sample set. The input of this stage is all the data to be classified; the output is the feature attributes and the training samples. This is the only stage in the whole process that must be completed manually, and its quality has an important influence on everything that follows: the quality of the classifier is determined to a great extent by the feature attributes, how they are divided, and the quality of the training samples.
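As an illustration of what "dividing" a feature attribute can look like, one common choice is to bucket a continuous value into a few ranges and store the manually labeled samples as (feature tuple, class label) pairs. The attribute names, cut points, and labels below are all invented:

    def divide_age(age):
        # Map a continuous attribute onto one of a few divisions (hypothetical cut points)
        if age < 30:
            return "young"
        elif age < 60:
            return "middle"
        return "senior"

    # A manually classified training sample set: (feature divisions, class label)
    training_set = [
        ((divide_age(25), "high_income"), "buys"),
        ((divide_age(45), "high_income"), "buys"),
        ((divide_age(70), "low_income"),  "does_not_buy"),
    ]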

The second stage is classifier training. Its task is to generate the classifier. The main work is to calculate the frequency of each class in the training samples and, for each division of each feature attribute, the conditional probability estimate for each class, and to record the results. The input is the feature attributes and the training samples; the output is the classifier. This stage is mechanical: based on the formulas discussed above, it can be completed automatically by a program.
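Here is a minimal sketch of this training step, continuing the hypothetical training_set from the preparation stage: it records the class frequencies (the priors) and, for each division of each feature attribute, a conditional probability estimate based on simple counts:

    from collections import Counter, defaultdict

    def train(training_set):
        class_counts = Counter(label for _, label in training_set)
        feature_counts = defaultdict(Counter)
        for features, label in training_set:
            for i, value in enumerate(features):
                feature_counts[label][(i, value)] += 1
        n = len(training_set)
        priors = {c: k / n for c, k in class_counts.items()}
        conditionals = {
            c: {fv: k / class_counts[c] for fv, k in counts.items()}
            for c, counts in feature_counts.items()
        }
        return priors, conditionals   # this pair of tables is the "classifier"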
The third stage is application. Its task is to classify items using the classifier. The input is the classifier and the items to be classified; the output is the mapping between each item and its class. This stage is also mechanical and is completed by the program.
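And the application stage, continuing the same sketch: score each class by its prior times the product of the matching conditionals, and output the highest-scoring class. Falling back to a tiny constant for divisions never seen with a class is just one simple choice made here for the sketch, not part of the method as described above:

    def classify(priors, conditionals, features):
        best_class, best_score = None, 0.0
        for c, prior in priors.items():
            score = prior
            for i, value in enumerate(features):
                # Division never seen with class c: fall back to a tiny constant
                score *= conditionals[c].get((i, value), 1e-9)
            if score > best_score:
                best_class, best_score = c, score
        return best_class

    priors, conditionals = train(training_set)
    print(classify(priors, conditionals, ("young", "high_income")))   # "buys"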


