Bayesian Classifier
The classification principle of a Bayesian classifier is to use an object's prior probability to compute its posterior probability, that is, the probability that the object belongs to each class, and then to select the class with the highest posterior probability as the class of the object. Four main types of Bayesian classifiers are currently studied: Naive Bayes, TAN, BAN, and GBN.
A Bayesian network is a directed acyclic graph annotated with probabilities. Each node in the graph represents a random variable. If an arc exists between two nodes, the corresponding random variables are probabilistically dependent; otherwise, the two random variables are conditionally independent. Every node X in the network has a conditional probability table (CPT) that gives the probability of each value of X conditioned on the possible values of its parent nodes. If node X has no parents, its CPT is simply its prior probability distribution. The network structure, together with the CPT of each node, defines the probability distribution of every variable in the network.
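To make the CPT idea concrete, here is a minimal Python sketch of one parent-child pair of nodes; the Rain/WetGrass network and all of its numbers are invented for illustration, not taken from the text:
----
# Illustrative sketch: one node of a Bayesian network represented as a
# conditional probability table (CPT) keyed by its parents' values.

# Node "Rain" has no parents, so its CPT is just its prior distribution.
cpt_rain = {(): {"yes": 0.2, "no": 0.8}}

# Node "WetGrass" has one parent, "Rain"; each row of the CPT conditions
# on one combination of parent values.
cpt_wet_grass = {
    ("yes",): {"yes": 0.9, "no": 0.1},  # P(WetGrass | Rain = yes)
    ("no",):  {"yes": 0.1, "no": 0.9},  # P(WetGrass | Rain = no)
}

def p(cpt, value, parent_values=()):
    """Look up P(node = value | parents = parent_values) in a CPT."""
    return cpt[parent_values][value]

# The joint probability of a full assignment is the product of CPT entries:
# P(Rain = yes, WetGrass = yes) = P(Rain = yes) * P(WetGrass = yes | Rain = yes)
joint = p(cpt_rain, "yes") * p(cpt_wet_grass, "yes", ("yes",))
print(joint)  # 0.2 * 0.9 = 0.18
----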
A Bayesian network classifier is a Bayesian network used for classification. The network contains a class node C, whose value is drawn from the class set (c1, c2, ..., cm), and a group of nodes X = (X1, X2, ..., Xn) representing the features used for classification. For a Bayesian network classifier, if a sample D to be classified has feature values x = (x1, x2, ..., xn), then the class ci assigned to D, with posterior probability P(C = ci | X1 = x1, X2 = x2, ..., Xn = xn), i = 1, 2, ..., m, must satisfy:

P(C = ci | X = x) = max{ P(C = c1 | X = x), P(C = c2 | X = x), ..., P(C = cm | X = x) }
By Bayes' theorem:

P(C = ci | X = x) = P(X = x | C = ci) * P(C = ci) / P(X = x)
P(C = ci) can be obtained from domain experts, while P(X = x | C = ci) and P(X = x) are difficult to compute.
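As a hedged illustration of the formula above, the following sketch computes the posteriors for two classes from made-up priors and likelihoods; note that P(X = x) is recovered here by summing P(X = x | C = ci) P(C = ci) over the classes (law of total probability):
----
# Made-up numbers: computing P(C = ci | X = x) by Bayes' rule.
priors = {"c1": 0.7, "c2": 0.3}         # P(C = ci), e.g. from domain experts
likelihoods = {"c1": 0.05, "c2": 0.20}  # P(X = x | C = ci) for one observed x

# P(X = x), by summing over all classes; computing this directly is one
# reason the general problem is hard when the features are dependent.
evidence = sum(priors[c] * likelihoods[c] for c in priors)

posteriors = {c: priors[c] * likelihoods[c] / evidence for c in priors}
print(posteriors)                            # {'c1': 0.368..., 'c2': 0.631...}
print(max(posteriors, key=posteriors.get))   # 'c2', the class with highest posterior
----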
A Bayesian network classifier is used for classification in two stages. The first stage is learning: the classifier is constructed from sample data, which involves both structure learning and CPT learning. The second stage is inference: the conditional probability of the class node is computed and the data to be classified are assigned to classes. The time complexity of both stages depends on the degree of dependency among the feature values, and the problem can even be NP-complete. Therefore, Bayesian network classifiers must be simplified in practical applications. Based on different assumptions about how strongly the feature values are correlated, a variety of Bayesian classifiers have been developed; Naive Bayes, TAN, BAN, and GBN are typical, well-studied ones.
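As an illustrative sketch of the first (learning) stage, assuming the network structure is already fixed so that only the CPTs must be estimated, parameter learning can reduce to counting; the tiny data set and feature names below are invented:
----
# CPT learning from counts, under an assumed fixed structure C -> feature.
from collections import Counter

data = [  # (feature value, class label); invented toy data
    ("sunny", "yes"), ("sunny", "no"), ("rain", "yes"),
    ("rain", "yes"), ("sunny", "yes"), ("rain", "no"),
]

class_counts = Counter(label for _, label in data)
total = sum(class_counts.values())
prior = {c: n / total for c, n in class_counts.items()}  # CPT of class node C

feature_counts = Counter(data)  # counts of (feature value, class) pairs
cpt = {  # P(feature = f | C = c), the feature node's CPT
    (f, c): n / class_counts[c] for (f, c), n in feature_counts.items()
}
print(prior)  # {'yes': 0.666..., 'no': 0.333...}
print(cpt)
----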
Naive Bayes
Classification is the process of assigning an unknown sample to one of several predefined classes. Solving a data classification problem is a two-step process. The first step is to build a model that describes a predetermined set of data or concepts. The model is constructed by analyzing samples (also called instances or objects) described by their attributes. Each sample is assumed to belong to a predefined class, determined by an attribute called the class label. The data tuples analyzed to build the model form the training data set. This step is also called supervised learning.
Among the many classification models, the two most widely used are the decision tree model and the Naive Bayes model (Naive Bayesian Classifier, NBC). The decision tree model solves the classification problem by constructing a tree: a training data set is first used to build the tree, and once the tree is built, it can classify unknown samples. The decision tree model has many advantages for classification problems. It is easy to use and efficient; rules are easily derived from a decision tree, and such rules are usually easy to interpret and understand; decision trees scale well to large databases, and the size of the tree is independent of the size of the database; and a decision tree can be constructed for data sets with many attributes. Decision tree models also have some disadvantages, such as difficulty handling missing data, susceptibility to overfitting, and disregard for correlations between the attributes of the data set.
Compared with the decision tree model, the Naive Bayes model originated in classical mathematical theory and therefore has a solid mathematical foundation and stable classification efficiency. The NBC model also requires few parameters, is insensitive to missing data, and is algorithmically simple. In theory, the NBC model has the minimum error rate of any classification method, but this is not always the case in practice, because the NBC model assumes that attributes are independent of one another. This assumption often fails in real applications, which hurts the model's classification accuracy. When the number of attributes is large or the correlation among attributes is strong, the classification efficiency of the NBC model falls below that of the decision tree model; when the attribute correlations are weak, the NBC model performs best.
Naive Bayes model:
----
vMAP = arg max P(vj | a1, a2, ..., an), where vj ranges over the target value set V

vMAP is the most probable target value for the given example.
a1, ..., an are the attribute values of the example.
The target value vMAP is the one whose computed posterior probability is highest; hence the arg max.
----
Applying Bayes' theorem to P(vj | a1, a2, ..., an), we obtain:

vMAP = arg max P(a1, a2, ..., an | vj) P(vj) / P(a1, a2, ..., an)

The denominator P(a1, a2, ..., an) is the same for every vj, so it has no effect on the comparison [all the candidate posteriors are divided by the same quantity, so the final ranking is unchanged] and can be dropped, giving:

vMAP = arg max P(a1, a2, ..., an | vj) P(vj)
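A quick numeric check (with made-up scores) that dropping the shared denominator cannot change which vj wins the arg max:
----
# Unnormalized scores P(a1..an | vj) * P(vj) for three hypothetical target values.
scores = {"v1": 0.03, "v2": 0.12, "v3": 0.05}
evidence = sum(scores.values())  # P(a1..an), identical for every vj

normalized = {v: s / evidence for v, s in scores.items()}
# Normalizing rescales every score by the same factor, so the winner is the same.
assert max(scores, key=scores.get) == max(normalized, key=normalized.get)
print(max(scores, key=scores.get))  # 'v2' either way
----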
The naive Bayes classifier then adds an independence assumption:

"The naive Bayes classifier is based on the simplifying assumption that the attribute values are conditionally independent of one another given the target value. In other words, given the target value of the instance, the probability of observing the conjunction a1, a2, ..., an is just the product of the probabilities of the individual attributes: P(a1, a2, ..., an | vj) = Π_i P(ai | vj). Substituting this in yields the naive Bayes classifier:

vNB = arg max P(vj) Π_i P(ai | vj)"
Here, vj takes the values (Yes | No), as in the weather example.
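Putting the pieces together, here is a minimal sketch of the vNB rule on a toy weather-style data set; the rows are invented for illustration rather than taken from the actual weather example:
----
# Minimal naive Bayes classifier: vNB = arg max P(vj) * prod_i P(ai | vj).
from collections import Counter, defaultdict

# Each sample: (attribute values a1..an, target value vj in {"yes", "no"}).
samples = [
    (("sunny", "hot"), "no"), (("sunny", "mild"), "no"),
    (("rain", "mild"), "yes"), (("rain", "cool"), "yes"),
    (("overcast", "hot"), "yes"), (("sunny", "cool"), "yes"),
]

class_counts = Counter(v for _, v in samples)
attr_counts = defaultdict(Counter)  # attr_counts[(position, value)][vj]
for attrs, v in samples:
    for i, a in enumerate(attrs):
        attr_counts[(i, a)][v] += 1

def v_nb(attrs):
    """Return the target value maximizing P(vj) * prod_i P(ai | vj)."""
    total = sum(class_counts.values())
    best, best_score = None, -1.0
    for v, n in class_counts.items():
        score = n / total  # P(vj), estimated from class frequencies
        for i, a in enumerate(attrs):
            score *= attr_counts[(i, a)][v] / n  # P(ai | vj); 0 if unseen
        if score > best_score:
            best, best_score = v, score
    return best

print(v_nb(("sunny", "mild")))  # classify an unseen day
----
In practice, the counts are usually smoothed (for example, with Laplace smoothing) so that a single unseen attribute value does not drive the whole product to zero.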