Ten Classic Data Mining Algorithms (9): Naive Bayes Classifier


Bayesian classifier

The principle of Bayesian classification is this: starting from an object's prior probability, compute its posterior probability with Bayes' formula, i.e., the probability that the object belongs to each class, and select the class with the maximum posterior probability as the object's class. The most studied Bayesian classifiers today are four: Naive Bayes, TAN, BAN, and GBN.


A Bayesian network is a directed acyclic graph annotated with probabilities. Each node in the graph represents a random variable. An arc between two nodes indicates that the corresponding random variables are probabilistically dependent; the absence of an arc indicates that they are conditionally independent. Each node X in the network carries a conditional probability table (CPT), which gives the conditional probability of X for every combination of values of its parent nodes. If node X has no parents, its CPT is simply its prior probability distribution.

The structure of a Bayesian network together with the CPT of each node defines the joint probability distribution of the variables in the network.
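
As a minimal sketch of this idea (the two-node network and all probabilities below are made up for illustration), a CPT can be stored as a mapping from each combination of parent values to a distribution over the node's own values:
----
# Toy network: Rain -> WetGrass; all probabilities are hypothetical.
# A node with no parents has a CPT that is just its prior distribution.
cpt_rain = {(): {"yes": 0.2, "no": 0.8}}

# WetGrass has one parent (Rain), so its CPT has one row per parent value.
cpt_wet_grass = {
    ("yes",): {"yes": 0.9, "no": 0.1},  # P(WetGrass | Rain = yes)
    ("no",):  {"yes": 0.2, "no": 0.8},  # P(WetGrass | Rain = no)
}

def p(cpt, parents, value):
    """Look up P(node = value | parents) in a CPT."""
    return cpt[tuple(parents)][value]

# The structure plus the CPTs define the joint distribution, e.g.
# P(Rain=yes) * P(WetGrass=yes | Rain=yes) = 0.2 * 0.9 = 0.18
print(p(cpt_rain, (), "yes") * p(cpt_wet_grass, ("yes",), "yes"))
----
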
A Bayesian network classifier is a Bayesian network used for classification. The network contains a class node C, whose value is taken from the set of classes (c1, c2, ..., cm), together with a group of nodes X = (X1, X2, ..., Xn) representing the attributes used for classification.

For a Bayesian network classifier, if the sample to be classified is D with attribute values x = (x1, x2, ..., xn), then the probabilities P(C = ci | X1 = x1, X2 = x2, ..., Xn = xn), i = 1, 2, ..., m, that D belongs to class ci should satisfy the following formula:
P(C = ci | X = x) = max{ P(C = c1 | X = x), P(C = c2 | X = x), ..., P(C = cm | X = x) }
By Bayes' formula:
P(C = ci | X = x) = P(X = x | C = ci) P(C = ci) / P(X = x)
Here P(C = ci) can be obtained from the experience of domain experts, while P(X = x | C = ci) and P(X = x) are harder to compute.
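
As a small numeric sketch (the priors and likelihoods below are made-up values for a single observed x), the posterior of each class follows directly from Bayes' formula, with P(X = x) obtained by the law of total probability:
----
# Hypothetical priors P(C = ci) and likelihoods P(X = x | C = ci)
# for one observed sample x.
priors      = {"c1": 0.6, "c2": 0.4}
likelihoods = {"c1": 0.05, "c2": 0.20}

# P(X = x) by the law of total probability: 0.6*0.05 + 0.4*0.20 = 0.11
evidence = sum(priors[c] * likelihoods[c] for c in priors)

# Posteriors P(C = ci | X = x) via Bayes' formula.
posteriors = {c: priors[c] * likelihoods[c] / evidence for c in priors}
print(posteriors)                           # c1: ~0.27, c2: ~0.73
print(max(posteriors, key=posteriors.get))  # "c2" has the max posterior
----
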
Working with a Bayesian network classifier consists of two main stages.

The first stage is learning: the classifier is constructed from the sample data, which involves both structure learning and CPT learning. The second stage is inference: the conditional probability of the class node is computed, and the data to be classified are assigned to a class. The time complexity of both stages depends on the degree of dependence among the attributes and is NP-hard in the general case, so the Bayesian network classifier must be simplified in practical applications.

Depending on the degree of dependence assumed among the attributes, a variety of Bayesian classifiers can be obtained; Naive Bayes, TAN, BAN, and GBN are the most typical ones in current research.

Naive Bayes

Classification is the process of assigning an unknown sample to one of several predefined classes.

Solving a data classification problem is a two-step process. The first step is to build a model that describes a predetermined set of data classes or concepts.

The model is constructed by analyzing data tuples described by attributes (samples, instances, objects, and so on).

Each sample is assumed to have a predefined class, determined by an attribute called the class label.

The data tuples analyzed to build the model form the training data set; because each training sample carries a class label, this step is also known as supervised learning.
Among the many classification models, the two most widely used are the decision tree model and the naive Bayesian model (Naive Bayesian Classifier, NBC). The decision tree model solves the classification problem by constructing a tree.

First, the training data set is used to construct a decision tree; once the tree is built, it can classify unknown samples.

Using the decision tree model for classification has many advantages: decision trees are easy to use and efficient; rules can be derived from a tree straightforwardly, and those rules are usually easy to interpret and understand; decision trees scale well to large databases, and the size of the tree is independent of the size of the database; and decision trees can be constructed on data sets with many attributes. The decision tree model also has drawbacks, such as difficulty handling missing data, susceptibility to overfitting, and ignoring dependencies between attributes in the data set.
Compared with the decision tree model, the naive Bayesian model originates in classical mathematical theory: it has a solid mathematical foundation and stable classification efficiency. At the same time, the NBC model requires very few parameters to be estimated, is not very sensitive to missing data, and is algorithmically simple.

In theory, the NBC model has the smallest error rate of any classification method. In practice this is not always the case, because the NBC model assumes that the attributes are independent of one another given the class. This assumption often fails to hold in real applications, which hurts the NBC model's classification accuracy.

When the number of attributes is large or the correlations between attributes are strong, the classification performance of the NBC model is inferior to that of the decision tree model; when the attribute correlations are weak, the NBC model performs best.
Naive Bayesian Model:
----
vMAP = argmax P(vj | a1, a2, ..., an)
where vj ranges over the set V of target values.
vMAP is the most probable target value for a given example,
and a1, ..., an are the attribute values describing that example.
Since vMAP is the target value whose posterior probability is largest, the argmax is taken over all vj in V.
----
Applying Bayes' formula to P(vj | a1, a2, ..., an)
gives vMAP = argmax P(a1, a2, ..., an | vj) P(vj) / P(a1, a2, ..., an).
The denominator P(a1, a2, ..., an) does not depend on vj, so it has no effect on the result [all the posteriors being compared are divided by the same quantity] and can be dropped,
giving vMAP = argmax P(a1, a2, ..., an | vj) P(vj).
Next, the naive Bayesian classifier adds a simple assumption: given the target value, the attribute values are conditionally independent of one another. In other words, given the target value of the instance, the probability of observing the conjunction a1, a2, ..., an is just the product of the probabilities of the individual attributes: P(a1, a2, ..., an | vj) = Π i P(ai | vj).
Substituting this product into the formula above yields the naive Bayesian classifier:
vNB = argmax P(vj) Π i P(ai | vj)
Here, for example, vj takes values in {yes, no} and each sample corresponds to a day's weather observations.
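
Below is a minimal runnable sketch of vNB = argmax P(vj) Π i P(ai | vj); the tiny weather data set and its attribute names are made up for illustration. The prior P(vj) and the conditionals P(ai | vj) are estimated by frequency counting:
----
from collections import Counter, defaultdict

# Made-up toy weather data: (outlook, temperature, wind) -> play: yes/no.
train = [
    (("sunny",    "hot",  "weak"),   "no"),
    (("sunny",    "hot",  "strong"), "no"),
    (("overcast", "hot",  "weak"),   "yes"),
    (("rain",     "mild", "weak"),   "yes"),
    (("rain",     "cool", "weak"),   "yes"),
    (("rain",     "cool", "strong"), "no"),
    (("overcast", "cool", "strong"), "yes"),
    (("sunny",    "mild", "weak"),   "no"),
]

# Estimate P(vj) and P(ai | vj) by frequency counting.
class_counts = Counter(v for _, v in train)
attr_counts = defaultdict(Counter)   # attr_counts[vj][(i, ai)] = count
for attrs, v in train:
    for i, a in enumerate(attrs):
        attr_counts[v][(i, a)] += 1

def classify(attrs):
    """Return vNB = argmax P(vj) * prod_i P(ai | vj)."""
    scores = {}
    for v, n_v in class_counts.items():
        p = n_v / len(train)                   # prior P(vj)
        for i, a in enumerate(attrs):
            p *= attr_counts[v][(i, a)] / n_v  # P(ai | vj); 0 if unseen
        scores[v] = p
    return max(scores, key=scores.get)

print(classify(("rain", "mild", "weak")))  # -> "yes"
----
In practice one would apply Laplace smoothing when estimating P(ai | vj), so that an attribute value never seen with a class does not force the whole product to zero.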

