1. The definition of the model
Naive Bayes is a classification method based on Bayes' theorem and the assumption of conditional independence among features. First, let's go over Bayes' theorem and the model we want to build, starting from a given training data set.
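In the usual statistical-learning notation, the training data set is written as

T = {(x_1, y_1), (x_2, y_2), ..., (x_N, y_N)}

where each input x_i = (x_i^(1), x_i^(2), ..., x_i^(n)) is an n-dimensional feature vector and y_i is its class label.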
Suppose the output category y_i ∈ {c_1, c_2, ..., c_K}. Naive Bayes learns the joint probability distribution P(X, Y) from the training data set. However, the joint distribution P(X, Y) is generally difficult to obtain directly, so we instead learn the prior probability distribution and the conditional probability distribution. The prior probability distribution is as follows.
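In this notation, the prior over the K classes is

P(Y = c_k),  k = 1, 2, ..., K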
By the law of large numbers, the prior probability can be estimated as the proportion of each class in the overall sample. The conditional probability distribution is as follows.
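In the same notation, the conditional distribution of the feature vector given the class is

P(X = x | Y = c_k) = P(X^(1) = x^(1), ..., X^(n) = x^(n) | Y = c_k),  k = 1, 2, ..., K

where x^(j) denotes the j-th feature.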
With the prior probability and the conditional probability in hand, we can recover the joint probability (the joint probability is the product of the prior and the conditional probability). However, the conditional probability is hard to estimate directly: the number of parameters to estimate is the product of the numbers of possible values of all the features, which is exponentially large. This is where the naive Bayes assumption is introduced.
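To make the count explicit (assuming the j-th feature can take S_j distinct values and Y can take K values), the unrestricted conditional distribution has

K · Π_{j=1}^{n} S_j

parameters, which grows exponentially with the number of features n.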
The naive Bayes method assumes that the features are conditionally independent of each other given the class. Under this assumption the conditional probability can be expanded into a product over the features, as follows.
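In the notation above:

P(X = x | Y = c_k) = Π_{j=1}^{n} P(X^(j) = x^(j) | Y = c_k)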
The naive Bayes method is thus a procedure for learning the joint probability distribution and then obtaining the posterior probability (which is also a conditional probability) from it, so this classifier belongs to the generative models. In contrast, discriminative models, such as decision trees, logistic regression, and SVM, directly learn the output, namely the conditional probability P(Y|X) or a decision function f(X). Knowing how the prior and conditional probabilities are computed, let us now look at how Bayes' theorem yields the posterior probability.
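By Bayes' theorem, the posterior probability of class c_k given input x is

P(Y = c_k | X = x) = P(X = x | Y = c_k) P(Y = c_k) / Σ_k P(X = x | Y = c_k) P(Y = c_k)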
Introducing the naive assumption that the features are mutually independent given the class, the posterior becomes
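Using the product form of the conditional probability:

P(Y = c_k | X = x) = P(Y = c_k) Π_j P(X^(j) = x^(j) | Y = c_k) / Σ_k [ P(Y = c_k) Π_j P(X^(j) = x^(j) | Y = c_k) ]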
This is the basic formula for naive Bayes classification, so our model can be built as
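That is, we take the class with the largest posterior:

y = f(x) = argmax_{c_k} P(Y = c_k) Π_j P(X^(j) = x^(j) | Y = c_k) / Σ_k [ P(Y = c_k) Π_j P(X^(j) = x^(j) | Y = c_k) ]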
The denominator on the right-hand side does not depend on the class, i.e., it is the same for every c_k. Since we only want the class with the maximum probability, removing this term does not affect the result (scaling all the expressions by the same factor does not change which one is largest). The final formula can therefore be written as
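In the same notation:

y = argmax_{c_k} P(Y = c_k) Π_{j=1}^{n} P(X^(j) = x^(j) | Y = c_k)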
2. Maximum a posteriori probability
Let's first look at the 0-1 loss function:
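In its standard form, this loss is

L(Y, f(X)) = 1 if Y ≠ f(X),  0 if Y = f(X)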
The expected risk function is then defined as below; when optimizing the model, we aim to minimize this expected loss.
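That is, the expectation of the loss over the joint distribution of (X, Y):

R_exp(f) = E[L(Y, f(X))]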
For the naive Bayes model, taking the expectation with respect to the conditional distribution P(c_k | X), the expected loss can be expressed as
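In the same notation:

R_exp(f) = E_X Σ_{k=1}^{K} L(c_k, f(X)) P(c_k | X)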
This differs from the plain 0-1 loss: it can be viewed as the 0-1 loss weighted by the probability of each class, i.e., among the K terms exactly one has L equal to 0 and the rest are 1. We therefore want the class whose loss is 0 to be the one with the largest conditional probability P(c_k | x), so that the overall expected loss is smallest. The detailed derivation is as follows.
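Minimizing the expected loss pointwise for each x gives

f(x) = argmin_y Σ_{k=1}^{K} L(c_k, y) P(c_k | X = x)
     = argmin_y Σ_{c_k ≠ y} P(c_k | X = x)
     = argmin_y (1 − P(Y = y | X = x))
     = argmax_y P(Y = y | X = x)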
Thus, minimizing the expected risk is equivalent to maximizing the posterior probability.
3. Estimation of the parameters of naive Bayes
We use maximum likelihood estimation to obtain the prior probability and the conditional probability. The maximum likelihood estimate of the prior probability is
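With I(·) denoting the indicator function and N the number of training samples:

P(Y = c_k) = Σ_{i=1}^{N} I(y_i = c_k) / N,  k = 1, 2, ..., K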
The maximum likelihood estimate of the conditional probability is
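Assuming the j-th feature x^(j) takes values in {a_j1, a_j2, ..., a_jS_j}:

P(X^(j) = a_jl | Y = c_k) = Σ_{i=1}^{N} I(x_i^(j) = a_jl, y_i = c_k) / Σ_{i=1}^{N} I(y_i = c_k)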
However, maximum likelihood estimation may yield a probability of exactly 0, which distorts the calculation of the posterior probability: the posterior is a product of factors, so once any factor is 0 the whole product becomes 0 (the conditional probability comes out as 0). To avoid this we use Bayesian estimation instead. The Bayesian estimate of the prior probability is
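With a smoothing parameter λ ≥ 0 (λ = 1 gives Laplace smoothing):

P_λ(Y = c_k) = (Σ_{i=1}^{N} I(y_i = c_k) + λ) / (N + Kλ)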
The Bayesian estimate of the conditional probability is
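With S_j the number of values the j-th feature can take:

P_λ(X^(j) = a_jl | Y = c_k) = (Σ_{i=1}^{N} I(x_i^(j) = a_jl, y_i = c_k) + λ) / (Σ_{i=1}^{N} I(y_i = c_k) + S_j λ)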
So training a naive Bayes model amounts to computing these quantities on the training set: the prior probability of each class and the frequency of each feature value within each class (which give the conditional probabilities). Prediction is then carried out using these learned values.
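To make the procedure concrete, below is a minimal Python sketch of a categorical naive Bayes classifier using the smoothed estimates above. The class name NaiveBayes and its fit/predict interface are illustrative choices for this sketch, not code from any particular library.

from collections import Counter, defaultdict

class NaiveBayes:
    # Minimal categorical naive Bayes with Laplace-style smoothing (illustrative sketch).
    def __init__(self, lam=1.0):
        self.lam = lam  # smoothing parameter lambda; lam = 1 is Laplace smoothing

    def fit(self, X, y):
        # X: list of feature vectors (discrete values), y: list of class labels
        self.classes = sorted(set(y))
        self.n_samples = len(y)
        self.n_features = len(X[0])
        self.class_counts = Counter(y)            # class counts for the prior
        self.value_counts = defaultdict(Counter)  # value counts per (class, feature index)
        self.feature_values = [set() for _ in range(self.n_features)]  # distinct values per feature (S_j)
        for xi, yi in zip(X, y):
            for j, v in enumerate(xi):
                self.value_counts[(yi, j)][v] += 1
                self.feature_values[j].add(v)
        return self

    def _prior(self, c):
        # P_lambda(Y = c) = (count(c) + lambda) / (N + K * lambda)
        return (self.class_counts[c] + self.lam) / (self.n_samples + len(self.classes) * self.lam)

    def _conditional(self, j, v, c):
        # P_lambda(X^(j) = v | Y = c) = (count(v, c) + lambda) / (count(c) + S_j * lambda)
        s_j = len(self.feature_values[j])
        return (self.value_counts[(c, j)][v] + self.lam) / (self.class_counts[c] + s_j * self.lam)

    def predict(self, x):
        # pick the class maximizing P(Y = c) * prod_j P(X^(j) = x^(j) | Y = c)
        best_class, best_score = None, float("-inf")
        for c in self.classes:
            score = self._prior(c)
            for j, v in enumerate(x):
                score *= self._conditional(j, v, c)
            if score > best_score:
                best_class, best_score = c, score
        return best_class

In practice one would typically sum log-probabilities instead of multiplying raw probabilities, to avoid numerical underflow when there are many features.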
4. Summary of naive Bayes
The advantages of Naive Bayes:
1) The naive Bayes model is simple, and its classification efficiency is stable
2) It performs very well on small data sets, can handle multi-class problems, and is suitable for incremental training; in particular, when the data set does not fit in memory, it can be trained batch by batch
3) It is not very sensitive to missing data, the algorithm is simple, and it is often used for text classification
The shortcomings of naive Bayes:
1) In theory, naive Bayes has the smallest error rate compared with other models, but this is not always true in practice, because naive Bayes assumes that the features are mutually independent. When the features are strongly correlated, naive Bayes performs only moderately well; when the features are largely independent, it performs very well
2) The classification is decided from the posterior probability, which is determined by the prior probability and the data, so the classification decision has a certain error rate
3) It is sensitive to the representation of the input data