1. Overview
Naive Bayes is a Bayesian classifier. Bayesian classification is a statistical classification method that uses probability theory: the classification principle is to apply the Bayesian formula to an object's prior probability in order to compute its posterior probability (the probability that the object belongs to each class), and then to select the class with the maximum posterior probability as the object's class. In general, when the number of features or the correlation between features is large, naive Bayes classifies less efficiently than a decision tree; it performs best when the correlation between features is small. In addition, the computations in naive Bayes, such as the conditional probabilities, are independent of one another, so the algorithm is particularly well suited to distributed computing. This article describes the statistical principles of naive Bayesian classification and implements the algorithm for text classification. For text classification, a naive Bayesian classifier comes in two variants: the multinomial model (bag of words) and the Bernoulli model (set of words).
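As a rough illustration of the difference between the two variants (a minimal sketch, not the implementation given later; the vocabulary and document below are made up), the multinomial model counts how often each word occurs, while the Bernoulli model only records whether it occurs:

```python
def bag_of_words_vector(vocab, doc_tokens):
    """Multinomial model: each feature counts how many times a vocabulary word occurs."""
    return [doc_tokens.count(word) for word in vocab]

def set_of_words_vector(vocab, doc_tokens):
    """Bernoulli model: each feature only records whether the word occurs (0/1)."""
    return [1 if word in doc_tokens else 0 for word in vocab]

vocab = ["free", "offer", "meeting", "project"]
doc = ["free", "free", "offer", "project"]

print(bag_of_words_vector(vocab, doc))  # [2, 1, 0, 1]
print(set_of_words_vector(vocab, doc))  # [1, 1, 0, 1]
```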
2. Naive Bayes basic knowledge
For a random experiment E with two random events A and B, the probability that A occurs given that B has occurred is:
P(A|B) = P(AB) / P(B)
where P(AB) is the joint probability of the two events A and B. Using the multiplication rule P(AB) = P(B|A)P(A), the formula above can be rewritten as:
P(A|B) = P(B|A)P(A) / P(B)
This is the Bayesian formula. Bayesian text classification is based on this formula, using prior probabilities to obtain the class of a text:
P(Ci|w1,w2,...,wn) = P(w1,w2,...,wn|Ci)P(Ci) / P(w1,w2,...,wn)
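For reference, the derivation of the Bayesian formula can be written compactly in LaTeX as follows (this restates the two formulas above):

```latex
\[
P(A \mid B) = \frac{P(AB)}{P(B)}, \qquad
P(AB) = P(B \mid A)\,P(A)
\quad\Longrightarrow\quad
P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}.
\]
```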
where P(Ci) is the prior probability of the i-th text category, P(w1,w2,...,wn|Ci) is the probability that the feature vector (w1,w2,...,wn) appears when the text category is Ci, and P(w1,w2,...,wn) is the probability that the feature vector appears at all. It is generally assumed that the feature words appear in a text independently of one another, i.e. one word is unrelated to another, so the joint probabilities can be written as products, as follows:
P(Ci|w1,w2,...,wn) = P(Ci)P(w1|Ci)P(w2|Ci)...P(wn|Ci) / (P(w1)P(w2)...P(wn))
For a specific training set, P(w1)P(w2)...P(wn) in the formula above is a fixed constant, so the denominator can be omitted when classifying, giving:
P(Ci|w1,w2,...,wn) ∝ P(Ci)P(w1|Ci)P(w2|Ci)...P(wn|Ci)
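A minimal sketch of this decision rule in Python (the priors and conditional probabilities below are made-up placeholder values, not estimates from a real training set; a full implementation follows in the next section):

```python
# Naive Bayes decision rule: choose the class Ci that maximizes
# P(Ci) * prod_j P(wj | Ci). The probabilities here are illustrative only.

priors = {"spam": 0.4, "ham": 0.6}                       # P(Ci)
cond_prob = {                                            # P(wj | Ci)
    "spam": {"free": 0.30, "offer": 0.20, "meeting": 0.01},
    "ham":  {"free": 0.02, "offer": 0.03, "meeting": 0.25},
}

def classify(words):
    scores = {}
    for label, prior in priors.items():
        score = prior
        for w in words:
            # Unseen words get a small default probability; in practice,
            # Laplace smoothing is used when estimating P(wj | Ci).
            score *= cond_prob[label].get(w, 1e-6)
        scores[label] = score                             # proportional to the posterior
    return max(scores, key=scores.get), scores

print(classify(["free", "offer"]))   # -> ('spam', ...)
print(classify(["meeting"]))         # -> ('ham', ...)
```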
Case Explanation:
Suppose there is a jar containing 7 stones, of which 3 are gray and 4 are black. If a stone is taken at random from the jar, the probability of drawing a gray stone is 3/7 and the probability of drawing a black stone is 4/7. Now suppose the 7 stones are split between two buckets: bucket A holds 2 gray and 2 black stones, and bucket B holds 1 gray and 2 black stones. Conditional probability comes into play when we ask for the probability of drawing a gray stone given that we draw from bucket B. This probability is written P(gray|bucketB) and read as "the probability of drawing a gray stone, given that the stone comes from bucket B". Then P(gray|bucketA) = 2/4 and P(gray|bucketB) = 1/3.
The formula for conditional probability is as follows:
P(gray|bucketB) = P(gray and bucketB) / P(bucketB)
Explanation: first, divide the number of gray stones in bucket B by the total number of stones in the two buckets, giving P(gray and bucketB) = 1/7; next, bucket B holds 3 of the 7 stones in total, so P(bucketB) = 3/7; therefore P(gray|bucketB) = P(gray and bucketB) / P(bucketB) = (1/7) / (3/7) = 1/3.
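A quick numeric check of the bucket example (a throwaway sketch; the counts are exactly those given above):

```python
# Verify the bucket example numerically.
# Bucket A: 2 gray, 2 black; bucket B: 1 gray, 2 black; 7 stones in total.
total_stones = 7
gray_in_b = 1
stones_in_b = 3

p_gray_and_bucket_b = gray_in_b / total_stones        # P(gray and bucketB) = 1/7
p_bucket_b = stones_in_b / total_stones               # P(bucketB) = 3/7
p_gray_given_bucket_b = p_gray_and_bucket_b / p_bucket_b

print(p_gray_given_bucket_b)   # 0.333... = 1/3
```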
3. Implementation of naive Bayesian classification in Python