Bayesian classifier
The classification principle of a Bayesian classifier is as follows: starting from the prior probability of an object, Bayes' formula is used to compute the posterior probability, that is, the probability that the object belongs to each class, and the class with the maximum posterior probability is selected as the class of the object. At present there are four common kinds of Bayesian classifiers: Naive Bayes, TAN, BAN, and GBN.
A Bayesian network is a directed acyclic graph annotated with probabilities. Each node in the graph represents a random variable; if there is an arc between two nodes, the corresponding random variables are probabilistically dependent, otherwise they are conditionally independent. Each node X in the network has an associated conditional probability table (CPT), which gives the conditional probability of X for each possible combination of values of its parent nodes. If node X has no parents, its CPT is simply its prior probability distribution. The structure of a Bayesian network together with the CPT of each node defines the probability distribution of every variable in the network.
A Bayesian network classifier is a Bayesian network used for classification. The network includes a class node C, whose value is taken from the class set (c1, c2, ..., cm), and a set of nodes X = (X1, X2, ..., Xn) representing the features used for classification. For a Bayesian network classifier, if a sample D with feature values x = (x1, x2, ..., xn) is to be classified, then the probability P(C = ci | X1 = x1, X2 = x2, ..., Xn = xn), for i = 1, 2, ..., m, should satisfy the following formula:
P(C = ci | X = x) = max{ P(C = c1 | X = x), P(C = c2 | X = x), ..., P(C = cm | X = x) }
By Bayes' formula:

P(C = ci | X = x) = P(X = x | C = ci) * P(C = ci) / P(X = x)

Here P(C = ci) can be obtained from the experience of domain experts, while P(X = x | C = ci) and P(X = x) are harder to compute.
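As a minimal sketch, with made-up prior and likelihood numbers purely for illustration, the posterior of each class follows directly from Bayes' formula:

```python
# Minimal sketch of Bayes' formula with made-up numbers.
# Priors P(C = ci), e.g. supplied by a domain expert.
priors = {"c1": 0.6, "c2": 0.4}
# Likelihoods P(X = x | C = ci) for one particular observation x.
likelihoods = {"c1": 0.10, "c2": 0.30}

# Evidence P(X = x) via the law of total probability.
evidence = sum(priors[c] * likelihoods[c] for c in priors)

# Posterior P(C = ci | X = x) for each class.
posteriors = {c: priors[c] * likelihoods[c] / evidence for c in priors}
print(posteriors)                           # {'c1': 0.333..., 'c2': 0.666...}
print(max(posteriors, key=posteriors.get))  # class with maximum posterior: 'c2'
```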
The work of a Bayesian network classifier is divided into two stages. The first stage is learning: the classifier is constructed from sample data, which includes both structure learning and CPT learning. The second stage is inference: the conditional probability of the class node is computed and the data to be classified are assigned to a class. The time complexity of both stages depends on the degree of dependence between the features, and in the worst case the problem is NP-complete, so the Bayesian network classifier must be simplified in practical applications. According to the degree of correlation allowed between the features, a variety of Bayesian classifiers can be obtained; Naive Bayes, TAN, BAN, and GBN are the most typical ones.
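To make the CPT idea concrete, here is a hedged sketch (the two-node network structure and all numbers are invented) of how a learned node's conditional probability table might be stored and queried during inference:

```python
# Hypothetical CPT for a binary node X with a single parent node P.
# Keys are values of the parent; each entry is the distribution P(X | P = p).
cpt_x = {
    "parent=0": {"x=0": 0.8, "x=1": 0.2},
    "parent=1": {"x=0": 0.3, "x=1": 0.7},
}

# A root node (no parents) just stores its prior distribution.
cpt_parent = {"parent=0": 0.5, "parent=1": 0.5}

# Inference step: look up P(X = 1 | P = 1) in the CPT.
print(cpt_x["parent=1"]["x=1"])  # 0.7

# Joint probability of one full assignment, by the chain rule:
# P(P = 1, X = 1) = P(P = 1) * P(X = 1 | P = 1)
print(cpt_parent["parent=1"] * cpt_x["parent=1"]["x=1"])  # 0.35
```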
Naive Bayes
Classification is the process of assigning an unknown sample to one of several known classes. Solving a data classification problem is a two-step process. The first step is to build a model that describes a predetermined set of data classes or concepts. The model is constructed by analyzing samples (also called instances or objects) described by their attributes. Each sample is assumed to have a predefined class, determined by an attribute called the class label. The data tuples analyzed to build the model form the training data set; because the class label of each training sample is known, this is also called supervised learning.
Among the many classification models, the two most widely used are the decision tree model and the naive Bayes model (naive Bayesian classifier, NBC). The decision tree model solves the classification problem by constructing a tree: first a decision tree is built from the training data set, and once the tree is built it can classify unknown samples. The decision tree model has many advantages: it is easy to use and efficient; rules can easily be derived from the tree, and those rules are usually easy to interpret and understand; it scales well to large databases, and its size is independent of the size of the database; and it can handle data sets with many attributes. Decision tree models also have some drawbacks, such as difficulty handling missing data, over-fitting, and ignoring correlations between attributes in a data set.
Compared with the decision tree model, the naive Bayes model originates from classical mathematical theory, has a solid mathematical foundation, and offers stable classification efficiency. The NBC model also requires very few parameters to estimate, is less sensitive to missing data, and uses a simpler algorithm. In theory, the NBC model has the smallest error rate of any classification method. In practice this is not always the case, because the NBC model assumes that attributes are independent of each other given the class, which is often untrue and affects its classification accuracy. When the number of attributes is large or the correlation between attributes is strong, the NBC model is less efficient than the decision tree model; when the attribute correlations are weak, the NBC model performs best.
Naive Bayesian Model:
----
v_MAP = argmax_{vj ∈ V} P(vj | a1, a2, ..., an)

Here v_MAP is the most probable target value for a given example, and a1, ..., an are the attribute values describing the example. The argmax selects the target value vj with the largest posterior probability, hence the "max".
----
Applying Bayes' formula to P(vj | a1, a2, ..., an) gives:

v_MAP = argmax_{vj ∈ V} P(a1, a2, ..., an | vj) P(vj) / P(a1, a2, ..., an)

The denominator P(a1, a2, ..., an) does not depend on vj [every candidate's score is divided by the same constant, so the comparison is unaffected], and can therefore be dropped:

v_MAP = argmax_{vj ∈ V} P(a1, a2, ..., an | vj) P(vj)
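A quick numeric check (with invented probabilities) confirms that dividing by P(a1, ..., an) cannot change which vj wins, since every candidate's score is scaled by the same constant:

```python
# Invented values for two candidate target values v1, v2.
scores = {"v1": 0.02 * 0.6,   # P(a1,...,an | v1) * P(v1)
          "v2": 0.05 * 0.4}   # P(a1,...,an | v2) * P(v2)
evidence = sum(scores.values())  # P(a1, ..., an)

unnormalized = max(scores, key=scores.get)
normalized = max(scores, key=lambda v: scores[v] / evidence)
assert unnormalized == normalized  # same argmax either way
print(unnormalized)  # 'v2'
```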
The naive Bayes classifier then rests on a simple assumption: given the target value, the attribute values are conditionally independent of one another. In other words, given the target value of the example, the probability of observing the conjunction a1, a2, ..., an is exactly the product of the probabilities of the individual attributes:

P(a1, a2, ..., an | vj) = Π_i P(ai | vj)
Substituting this product into the expression for v_MAP gives the naive Bayes classifier:
v_NB = argmax_{vj ∈ V} P(vj) Π_i P(ai | vj)
"
VNB = arg max P (VJ)
Here VJ (yes | no), the corresponding weather examples.
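Below is a self-contained sketch of this classifier on a tiny, invented weather-style data set (the records and attribute names here are illustrative stand-ins, not the actual PlayTennis table):

```python
from collections import Counter, defaultdict

# Toy training data: attribute dict -> target value ("yes"/"no"). Values invented.
train = [
    ({"outlook": "sunny", "windy": "false"}, "yes"),
    ({"outlook": "sunny", "windy": "true"},  "no"),
    ({"outlook": "rain",  "windy": "true"},  "no"),
    ({"outlook": "rain",  "windy": "false"}, "yes"),
    ({"outlook": "sunny", "windy": "false"}, "yes"),
]

# Learning: estimate P(vj) and P(ai | vj) by relative frequencies.
class_counts = Counter(label for _, label in train)
cond_counts = defaultdict(Counter)  # (attribute, class) -> counts of values
for attrs, label in train:
    for attr, value in attrs.items():
        cond_counts[(attr, label)][value] += 1

def classify(attrs):
    """Return argmax_{vj} P(vj) * prod_i P(ai | vj)."""
    best, best_score = None, -1.0
    for vj, n_vj in class_counts.items():
        score = n_vj / len(train)  # prior P(vj)
        for attr, value in attrs.items():
            score *= cond_counts[(attr, vj)][value] / n_vj  # P(ai | vj)
        if score > best_score:
            best, best_score = vj, score
    return best

print(classify({"outlook": "sunny", "windy": "true"}))  # 'no'
```

Note that a relative frequency can be zero when a value never co-occurs with a class (as with windy=true under yes above), which zeroes out the whole product; in practice a Laplace correction is usually added to the estimate of P(ai | vj) to avoid this.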