Top 10 classic algorithms for data mining (9): the Naive Bayes classifier

Bayesian Classifier

The Bayesian classifier works by computing an object's posterior probability from its prior probability, that is, the probability that the object belongs to each class, and then assigning the object to the class with the highest posterior probability. Four main types of Bayesian classifiers are currently studied: Naive Bayes, TAN, BAN, and GBN.
A Bayesian network is a directed acyclic graph annotated with probabilities. Each node in the graph represents a random variable. If an arc connects two nodes, the corresponding random variables are probabilistically dependent; otherwise they are conditionally independent. Each node X in the network has an associated conditional probability table (CPT), which gives the conditional probability of X for each combination of values its parent nodes can take. If X has no parents, its CPT is simply its prior probability distribution. The structure of the Bayesian network together with the CPTs of all nodes defines the joint probability distribution of the variables in the network.
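To make the CPT idea concrete, here is a minimal Python sketch of how such a table can be represented; the rain/sprinkler/wet-grass node and its numbers are illustrative assumptions, not taken from this article:

```python
# A node's conditional probability table (CPT): a mapping from each
# combination of parent values to a distribution over the node's values.

# Hypothetical node "grass" with parents ("rain", "sprinkler").
cpt_grass = {
    # (rain, sprinkler) -> distribution over {"wet", "dry"}
    (True,  True):  {"wet": 0.99, "dry": 0.01},
    (True,  False): {"wet": 0.80, "dry": 0.20},
    (False, True):  {"wet": 0.90, "dry": 0.10},
    (False, False): {"wet": 0.00, "dry": 1.00},
}

# A root node (no parents) just stores its prior distribution.
prior_rain = {True: 0.2, False: 0.8}

# Look up P(grass = "wet" | rain = True, sprinkler = False)
print(cpt_grass[(True, False)]["wet"])  # 0.8
```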
A Bayesian classifier is a Bayesian network used for classification. The network contains a class node C, whose value is drawn from the class set (c1, c2, ..., cm), and a group of nodes X = (X1, X2, ..., Xn) representing the features used for classification. A Bayesian network classifier assigns a sample D with feature values x = (x1, x2, ..., xn) to the class ci whose posterior probability P(C = ci | X1 = x1, X2 = x2, ..., Xn = xn), i = 1, 2, ..., m, satisfies:
P(C = ci | X = x) = max{ P(C = c1 | X = x), P(C = c2 | X = x), ..., P(C = cm | X = x) }
By Bayes' formula:
P(C = ci | X = x) = P(X = x | C = ci) * P(C = ci) / P(X = x)
P(C = ci) can be obtained from the experience of domain experts, while P(X = x | C = ci) and P(X = x) are difficult to compute directly.
A Bayesian network classifier performs classification in two stages. The first stage is learning, that is, constructing the classifier from sample data, which includes structure learning and CPT learning. The second stage is inference: computing the conditional probability of the class node and classifying new data. The time complexity of both stages depends on the degree of dependence among the feature variables and can, in the worst case, be NP-complete. Therefore, in practical applications, the Bayesian network classifier usually has to be simplified. Depending on the degree of association assumed among feature values, various Bayesian classifiers can be obtained; Naive Bayes, TAN, BAN, and GBN are typical and well-studied examples.
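As a concrete illustration of the MAP decision rule above, here is a minimal Python sketch; it assumes the priors P(C = ci) and likelihoods P(X = x | C = ci) are already available (in practice they come from expert knowledge or from learning), and the numbers are made up for the example:

```python
priors = {"c1": 0.6, "c2": 0.4}          # P(C = ci), illustrative values
likelihoods = {"c1": 0.05, "c2": 0.20}   # P(X = x | C = ci) for one sample x

# Unnormalized posteriors P(X = x | C = ci) * P(C = ci); the evidence
# P(X = x) is the same for every class, so it can be ignored for the argmax.
scores = {c: likelihoods[c] * priors[c] for c in priors}

best_class = max(scores, key=scores.get)
print(best_class)  # "c2": 0.20 * 0.4 = 0.08 beats 0.05 * 0.6 = 0.03
```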

 

Naive Bayes

Classification is the process of assigning an unknown sample to one of several pre-defined classes. Solving a data classification problem is a two-step process. The first step is to build a model describing a predetermined set of data classes or concepts. The model is constructed by analyzing samples (or instances, objects, etc.) described by their attributes. Each sample is assumed to belong to a pre-defined class, determined by an attribute called the class label. The data tuples analyzed to build the model form the training dataset; this step is therefore also called supervised learning.
Among the many classification models, the two most widely used are the decision tree model and the naive Bayes model (naive Bayesian classifier, NBC). The decision tree model solves the classification problem by constructing a tree: a training dataset is first used to build a decision tree, and once the tree is established it can classify unknown samples. Decision tree models have many advantages: they are easy to use and efficient; rules can be readily derived from them, and such rules are usually easy to interpret and understand; they scale well to large databases, and the size of the tree is independent of the size of the database; and they can handle datasets with many attributes. Decision tree models also have some disadvantages, such as difficulty in handling missing data, susceptibility to over-fitting, and ignoring correlations between attributes in a dataset.
Compared with the decision tree model, the naive Bayes model originates from classical mathematical theory, so it has a solid mathematical foundation and stable classification efficiency. At the same time, the NBC model requires very few parameters to estimate, is not sensitive to missing data, and the algorithm itself is relatively simple. In theory, the NBC model has the minimum error rate compared with other classification methods. In practice, however, this is not always the case, because the NBC model assumes that the attributes are independent of each other given the class, an assumption that often does not hold in real applications; this affects the classification accuracy of the NBC model. When the number of attributes is large or the correlation between attributes is strong, the classification performance of the NBC model falls behind that of the decision tree model; when the attribute correlations are weak, the NBC model performs best.
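To see the two model families side by side, here is a small illustrative sketch using scikit-learn (assumed installed); the iris dataset, the split, and the default hyperparameters are arbitrary choices for demonstration, not a benchmark:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Fit a naive Bayes model and a decision tree on the same data
# and compare test-set accuracy.
for model in (GaussianNB(), DecisionTreeClassifier(random_state=0)):
    model.fit(X_train, y_train)
    print(type(model).__name__, model.score(X_test, y_test))
```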
Naive Bayes model:
----
v_MAP = argmax P(vj | a1, a2, ..., an), where vj ranges over the set V of target values

Here v_MAP is the most probable target value for the given example, and a1, ..., an are the example's attribute values. The subscript MAP (maximum a posteriori) indicates that the chosen target value is the one with the highest posterior probability, hence the argmax.
----
Applying Bayes' formula to P(vj | a1, a2, ..., an), we obtain:

v_MAP = argmax P(a1, a2, ..., an | vj) P(vj) / P(a1, a2, ..., an)

The denominator P(a1, a2, ..., an) is the same for every vj [every candidate is divided by the same quantity, so it has no effect on which vj maximizes the expression], and can therefore be dropped:

v_MAP = argmax P(a1, a2, ..., an | vj) P(vj)

The naive Bayes classifier is then based on a simple assumption: the attributes are conditionally independent of one another given the target value. In other words, under this assumption, given the target value, the joint probability of a1, a2, ..., an is exactly the product of the probabilities of the individual attributes:

P(a1, a2, ..., an | vj) = Πi P(ai | vj)

Substituting this in yields the naive Bayes classifier:

v_NB = argmax P(vj) Πi P(ai | vj), where vj ranges over V

Here vj takes values such as Yes or No, as in the classic weather ("play tennis") example.
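As a minimal sketch of this rule in plain Python: the tiny weather table below is an illustrative subset in the spirit of the classic play-tennis data, not taken from this article; relative frequencies are used as probability estimates, and no smoothing is applied, so an attribute value unseen for a class gets probability 0:

```python
from collections import Counter, defaultdict

# (outlook, temperature, humidity, wind) -> play tennis? (the target vj)
data = [
    (("sunny", "hot", "high", "weak"), "no"),
    (("sunny", "hot", "high", "strong"), "no"),
    (("overcast", "hot", "high", "weak"), "yes"),
    (("rain", "mild", "high", "weak"), "yes"),
    (("rain", "cool", "normal", "weak"), "yes"),
    (("rain", "cool", "normal", "strong"), "no"),
    (("overcast", "cool", "normal", "strong"), "yes"),
    (("sunny", "mild", "high", "weak"), "no"),
]

# Learning: estimate P(vj) and P(ai | vj) by counting frequencies.
class_counts = Counter(label for _, label in data)
attr_counts = defaultdict(Counter)  # (attribute index, label) -> value counts
for attrs, label in data:
    for i, value in enumerate(attrs):
        attr_counts[(i, label)][value] += 1

def classify(attrs):
    """Return the vj maximizing P(vj) * prod_i P(ai | vj)."""
    best, best_score = None, -1.0
    for label, count in class_counts.items():
        score = count / len(data)  # P(vj)
        for i, value in enumerate(attrs):
            score *= attr_counts[(i, label)][value] / count  # P(ai | vj)
        if score > best_score:
            best, best_score = label, score
    return best

print(classify(("sunny", "cool", "high", "strong")))  # prints "no"
```

In a production setting one would typically add Laplace smoothing (adding 1 to every count) so that a single unseen attribute value cannot zero out an entire class's score.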
