The naive Bayes method is a classification method based on Bayes' theorem and the assumption of conditional independence between features. Simply put, the naive Bayes classifier assumes that each feature of a sample is unrelated to every other feature. For example, a fruit may be judged to be an apple if it is red, round, and about 4 inches in diameter. Even though these features may depend on each other, or some may be determined by others, the naive Bayes classifier treats each of them as contributing independently to the probability that the fruit is an apple. Despite this simple idea and its simplistic assumption, the naive Bayes classifier still achieves fairly good results in many complex real-world situations. One advantage of the naive Bayes classifier is that it only needs to estimate the necessary parameters from a small amount of training data (for discrete variables, the prior probabilities and class conditional probabilities; for continuous variables, the mean and variance of each variable).
1. Bayesian Classification Model
The Bayesian classification model is as follows:
P(Y | X) = P(X | Y) × P(Y) / P(X)
where X represents the attribute set and Y represents the class variable. P(Y) is the prior probability, P(X | Y) is the class conditional probability, P(X) is the evidence, and P(Y | X) is the posterior probability. The Bayesian classification model expresses the posterior probability in terms of the prior probability P(Y), the class conditional probability P(X | Y), and the evidence P(X). When comparing the posterior probabilities for different values of Y, the evidence P(X) in the denominator is always the same constant and can therefore be ignored. The prior probability P(Y) can easily be estimated from the proportion of training records belonging to each class. For estimating the class conditional probability P(X | Y), different implementations lead to different Bayesian classification methods; the most common are the naive Bayes classifier and the Bayesian belief network.
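As a quick illustration of why the evidence term can be dropped (the class labels and probability values below are made up purely for illustration), dividing every class's P(Y) × P(X | Y) by the same P(X) does not change which class has the largest posterior:

```python
# Made-up prior and class conditional probabilities for one observed X.
priors = {"Yes": 0.3, "No": 0.7}            # P(Y)
conditionals = {"Yes": 0.02, "No": 0.005}   # P(X | Y)

# Unnormalized posteriors P(Y) * P(X | Y)
scores = {y: priors[y] * conditionals[y] for y in priors}

# Evidence P(X) = sum over classes of P(Y) * P(X | Y)
evidence = sum(scores.values())
posteriors = {y: s / evidence for y, s in scores.items()}

print(max(scores, key=scores.get))          # Yes
print(max(posteriors, key=posteriors.get))  # Yes -- same winner either way
```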
2. Naive Bayesian Classification Model
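The naive Bayes classification model combines the Bayesian model above with the conditional independence assumption described in the introduction: given the class Y, the attributes x_1, x_2, …, x_d are treated as independent, so the class conditional probability factorizes as P(X | Y) = P(x_1 | Y) × P(x_2 | Y) × … × P(x_d | Y), and the predicted class is the one that maximizes P(Y) × P(x_1 | Y) × … × P(x_d | Y). A minimal sketch of this scoring rule (the function names are mine, and the probability values passed in are placeholders, not estimates from any particular data set):

```python
def naive_bayes_predict(priors, cond_probs):
    """Pick the class maximizing P(Y) * P(x_1|Y) * ... * P(x_d|Y).

    priors:     {class: P(class)}
    cond_probs: {class: [P(x_1|class), ..., P(x_d|class)]} for one record
    """
    scores = {}
    for cls, prior in priors.items():
        score = prior
        for p in cond_probs[cls]:   # independence assumption: just multiply
            score *= p
        scores[cls] = score
    return max(scores, key=scores.get), scores

# Placeholder numbers, for illustration only.
label, scores = naive_bayes_predict(
    priors={"A": 0.6, "B": 0.4},
    cond_probs={"A": [0.1, 0.5], "B": [0.3, 0.4]},
)
print(label, scores)  # B {'A': 0.03, 'B': 0.048}
```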
3. Examples
The data set is as follows:
The prior probabilities, the class conditional probabilities of each discrete attribute, and the parameters (sample mean and variance) of the class conditional distribution of the continuous attribute, all estimated from the data set, are as follows:
Prior probabilities: P(Yes) = 0.3; P(No) = 0.7
P(Home owner = yes | No) = 3/7
P(Home owner = no | No) = 4/7
P(Home owner = yes | Yes) = 0
P(Home owner = no | Yes) = 1
P(Marital status = single | No) = 2/7
P(Marital status = divorced | No) = 1/7
P(Marital status = married | No) = 4/7
P(Marital status = single | Yes) = 2/3
P(Marital status = divorced | Yes) = 1/3
P(Marital status = married | Yes) = 0
Annual income (in thousands, K):
If class = No: sample mean = 110, sample variance = 2975
If class = Yes: sample mean = 90, sample variance = 25
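For the continuous attribute (annual income), the class conditional probability is modeled with a Gaussian distribution whose parameters are the sample mean and variance above; its density at income = 120 produces the values used in the calculation below. A quick check of those numbers (a sketch, assuming the standard Gaussian density):

```python
import math

def gaussian_density(x, mean, variance):
    """Gaussian (normal) probability density with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2 * variance)) / math.sqrt(2 * math.pi * variance)

# Density of annual income = 120 under each class's estimated Gaussian.
print(gaussian_density(120, mean=110, variance=2975))  # ~0.0072
print(gaussian_density(120, mean=90, variance=25))     # ~1.2e-09
```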
--"To be predicted record: x={have room = no, marital status = married, annual income =120k}
P(No) × P(Home owner = no | No) × P(Marital status = married | No) × P(Annual income = 120K | No) = 0.7 × 4/7 × 4/7 × 0.0072 ≈ 0.0016
P(Yes) × P(Home owner = no | Yes) × P(Marital status = married | Yes) × P(Annual income = 120K | Yes) = 0.3 × 1 × 0 × 1.2 × 10^-9 = 0
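The same comparison in code, plugging in the prior probabilities, conditional probabilities, and Gaussian income densities listed above (the variable names are mine, chosen for illustration):

```python
# All numbers are taken from the estimates listed earlier.
priors = {"No": 0.7, "Yes": 0.3}
p_home_owner_no = {"No": 4/7, "Yes": 1.0}       # P(Home owner = no | class)
p_married = {"No": 4/7, "Yes": 0.0}             # P(Marital status = married | class)
p_income_120 = {"No": 0.0072, "Yes": 1.2e-9}    # Gaussian density at income = 120

scores = {cls: priors[cls] * p_home_owner_no[cls] * p_married[cls] * p_income_120[cls]
          for cls in priors}

print(scores)                       # {'No': ~0.0016, 'Yes': 0.0}
print(max(scores, key=scores.get))  # No
```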
Because 0.0016 is greater than 0, the record is classified as No. As this example shows, if the class conditional probability of even one attribute value is 0, the posterior probability of the whole class becomes 0. Estimating class conditional probabilities purely from the observed fractions of records is therefore too fragile, especially when the training set is small and the number of attributes is large. The solution to this problem is to use the m-estimate of the conditional probability: P(x_i | y_j) = (n_c + m × p) / (n + m), where n is the number of training records belonging to class y_j, n_c is the number of those records that take the value x_i, m is a parameter called the equivalent sample size, and p is a user-specified prior estimate of the probability (for example, 1 / the number of values the attribute can take).
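A minimal sketch of the m-estimate (the function and argument names are mine); with, say, m = 3 and p = 1/3 (since marital status has three values), the problematic P(Marital status = married | Yes) becomes (0 + 3 × 1/3) / (3 + 3) = 1/6 instead of 0:

```python
def m_estimate(n_c, n, m, p):
    """m-estimate of a conditional probability: (n_c + m*p) / (n + m).

    n_c: number of class-y_j training records with attribute value x_i
    n:   number of class-y_j training records
    m:   equivalent sample size
    p:   prior estimate of the probability (e.g. 1 / number of attribute values)
    """
    return (n_c + m * p) / (n + m)

# P(Marital status = married | Yes): none of the 3 "Yes" records are married.
print(m_estimate(n_c=0, n=3, m=3, p=1/3))  # ~0.167 instead of 0
```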