Naive Bayes method (handling continuous and discrete attributes)

The naive Bayes method is a classification method based on Bayes' theorem and the assumption of conditional independence among features. Simply put, a naive Bayes classifier assumes that each feature of a sample is unrelated to every other feature. For example, a fruit may be judged to be an apple if it is red, round, and about 4 inches in diameter. Even though these characteristics may depend on one another, or some may be determined by others, the naive Bayes classifier treats each of them as contributing independently to the probability that the fruit is an apple. Despite this simplistic assumption, the naive Bayes classifier achieves fairly good results in many complex real-world situations. One advantage of the naive Bayes classifier is that only a small amount of training data is needed to estimate the necessary parameters (the prior probabilities and class-conditional probabilities for discrete variables, and the mean and variance for continuous variables).

1. Bayesian Classification Model

The Bayesian classification model is as follows:

P(Y | X) = P(X | Y) · P(Y) / P(X)

Here X denotes the attribute set and Y the class variable; P(Y) is the prior probability, P(X | Y) is the class-conditional probability, P(X) is the evidence, and P(Y | X) is the posterior probability. The Bayesian classification model expresses the posterior probability in terms of the prior probability P(Y), the class-conditional probability P(X | Y), and the evidence P(X). When comparing the posterior probabilities of different values of Y, the evidence P(X) in the denominator is the same for every class and can therefore be ignored. The prior probability P(Y) is easily estimated from the proportion of training records that belong to each class. For the class-conditional probability P(X | Y), different estimation approaches lead to different Bayesian classification methods; the most common are the naive Bayes classifier and the Bayesian belief network.
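As a quick illustration of why the evidence can be ignored, the following minimal sketch compares two classes using only the unnormalized posterior P(Y) · P(X | Y). The priors and class-conditional values are made-up numbers, used purely for illustration:

# Minimal sketch of Bayes-rule classification with the evidence P(X) dropped.
# The numbers below are hypothetical, not taken from the example data set.

priors = {"Yes": 0.3, "No": 0.7}                # P(Y)
class_conditional = {"Yes": 0.10, "No": 0.02}   # P(X | Y) for one observed X

# Unnormalized posterior: P(Y | X) is proportional to P(Y) * P(X | Y)
scores = {y: priors[y] * class_conditional[y] for y in priors}
prediction = max(scores, key=scores.get)

print(scores)      # {'Yes': 0.03, 'No': 0.014} (up to floating-point rounding)
print(prediction)  # 'Yes' -- the class with the larger unnormalized posterior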
2. Naive Bayesian Classification Model

The naive Bayes classifier assumes that the attributes are conditionally independent given the class, so the class-conditional probability factors into a product over the individual attributes:

P(X | Y = y) = P(X₁ | Y = y) · P(X₂ | Y = y) · … · P(X_d | Y = y)

Since the evidence P(X) is the same for every class, it is enough to choose the class y that maximizes P(Y = y) multiplied by this product. For a discrete attribute, P(X_i = x_i | Y = y) is estimated as the fraction of training records of class y that take the value x_i. For a continuous attribute, a common choice is to assume a Gaussian distribution and estimate its mean μ and variance σ² from the training records of class y:

P(X_i = x_i | Y = y) = 1/√(2πσ²) · exp(−(x_i − μ)² / (2σ²))

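A minimal sketch of such a classifier is shown below. It treats string-valued attributes as discrete (estimated by relative frequency) and numeric attributes as continuous (modeled with a Gaussian density from the per-class sample mean and variance). The class name NaiveBayes and its layout are illustrative, not from any particular library:

import math
from collections import Counter, defaultdict

class NaiveBayes:
    """Naive Bayes for mixed discrete (str) and continuous (numeric) attributes.

    Discrete attributes:   P(x_i | y) estimated by relative frequency.
    Continuous attributes: P(x_i | y) modeled by a Gaussian density whose
                           mean and sample variance are estimated per class.
    """

    def fit(self, X, y):
        n = len(y)
        self.priors = {c: cnt / n for c, cnt in Counter(y).items()}   # P(Y)
        self.discrete = defaultdict(Counter)   # (class, attr index) -> value counts
        self.gaussian = {}                     # (class, attr index) -> (mean, variance)

        for c in self.priors:
            rows = [x for x, label in zip(X, y) if label == c]
            for i in range(len(X[0])):
                col = [r[i] for r in rows]
                if isinstance(col[0], str):                     # discrete attribute
                    self.discrete[(c, i)] = Counter(col)
                else:                                           # continuous attribute
                    mean = sum(col) / len(col)
                    var = sum((v - mean) ** 2 for v in col) / (len(col) - 1)
                    self.gaussian[(c, i)] = (mean, var)
        return self

    def _likelihood(self, c, i, value):
        if isinstance(value, str):
            counts = self.discrete[(c, i)]
            return counts[value] / sum(counts.values())          # relative frequency
        mean, var = self.gaussian[(c, i)]
        return math.exp(-(value - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

    def predict(self, x):
        # Unnormalized posterior: P(y) * product of P(x_i | y); P(X) is ignored.
        scores = {}
        for c, prior in self.priors.items():
            score = prior
            for i, value in enumerate(x):
                score *= self._likelihood(c, i, value)
            scores[c] = score
        return max(scores, key=scores.get), scores

With this sketch, the worked example in the next section amounts to calling fit on the ten training records and predict on the new record.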
3. Examples

The data set (ten training records; Annual Income in thousands) is as follows:

Tid   Home Owner   Marital Status   Annual Income   Class
1     Yes          Single           125K            No
2     No           Married          100K            No
3     No           Single           70K             No
4     Yes          Married          120K            No
5     No           Divorced         95K             Yes
6     No           Married          60K             No
7     Yes          Divorced         220K            No
8     No           Single           85K             Yes
9     No           Married          75K             No
10    No           Single           90K             Yes

The prior probabilities, the class-conditional probabilities of the discrete attributes, and the parameters (sample mean and variance) of the class-conditional distribution of the continuous attribute, all estimated from the data set, are as follows:

Prior probabilities: P(Yes) = 0.3; P(No) = 0.7

P(Home Owner = Yes | No) = 3/7

P(Home Owner = No | No) = 4/7

P(Home Owner = Yes | Yes) = 0

P(Home Owner = No | Yes) = 1

P(Marital Status = Single | No) = 2/7

P(Marital Status = Divorced | No) = 1/7

P(Marital Status = Married | No) = 4/7

P(Marital Status = Single | Yes) = 2/3

P(Marital Status = Divorced | Yes) = 1/3

P(Marital Status = Married | Yes) = 0

Annual Income:

If class = No: sample mean = 110, sample variance = 2975

If class = Yes: sample mean = 90, sample variance = 25
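Assuming the ten training records shown in the table above, these estimates can be reproduced with a short script (the tuple layout is just an illustrative encoding of the table):

# Reproduce the estimates above from the ten training records.
# Each record: (home_owner, marital_status, annual_income_k, class)
data = [
    ("Yes", "Single",   125, "No"),  ("No", "Married", 100, "No"),
    ("No",  "Single",    70, "No"),  ("Yes","Married", 120, "No"),
    ("No",  "Divorced",  95, "Yes"), ("No", "Married",  60, "No"),
    ("Yes", "Divorced", 220, "No"),  ("No", "Single",   85, "Yes"),
    ("No",  "Married",   75, "No"),  ("No", "Single",   90, "Yes"),
]

for cls in ("Yes", "No"):
    rows = [r for r in data if r[3] == cls]
    print(f"P({cls}) = {len(rows) / len(data)}")                       # prior

    n_not_owner = sum(1 for r in rows if r[0] == "No")
    print(f"P(Home Owner = No | {cls}) = {n_not_owner}/{len(rows)}")   # discrete attribute

    incomes = [r[2] for r in rows]
    mean = sum(incomes) / len(incomes)
    var = sum((v - mean) ** 2 for v in incomes) / (len(incomes) - 1)   # sample variance
    print(f"Income | {cls}: mean = {mean:.0f}, variance = {var:.0f}")  # 90/25 and 110/2975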

--"To be predicted record: x={have room = no, marital status = married, annual income =120k}

P(No) × P(Home Owner = No | No) × P(Marital Status = Married | No) × P(Annual Income = 120K | No) = 0.7 × 4/7 × 4/7 × 0.0072 ≈ 0.0016

P(Yes) × P(Home Owner = No | Yes) × P(Marital Status = Married | Yes) × P(Annual Income = 120K | Yes) = 0.3 × 1 × 0 × 1.2×10⁻⁹ = 0
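Both products can be checked directly. The sketch below plugs the estimated mean and variance into the Gaussian density for the income attribute and multiplies the factors, using only the numbers listed above:

import math

def gaussian(x, mean, var):
    """Gaussian class-conditional density for a continuous attribute."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# P(Annual Income = 120K | class) from the per-class mean and variance
p_income_no  = gaussian(120, mean=110, var=2975)   # ~0.0072
p_income_yes = gaussian(120, mean=90,  var=25)     # ~1.2e-9

score_no  = 0.7 * (4/7) * (4/7) * p_income_no      # ~0.0016
score_yes = 0.3 * 1.0   * 0.0   * p_income_yes     # 0

print(p_income_no, p_income_yes)   # ~0.00719  ~1.2e-09
print(score_no, score_yes)         # ~0.00164  0.0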

Because 0.0016 is greater than 0, the record is classified as No. As this example shows, if the class-conditional probability of even one attribute is 0, the posterior probability of the entire class becomes 0. Estimating class-conditional probabilities from observed fractions alone is therefore too fragile, especially when the training sample is small and the number of attributes is large. One solution to this problem is to use the m-estimate of the conditional probability:

P(x_i | y_j) = (n_c + m·p) / (n + m)

where n is the number of training records of class y_j, n_c is the number of those records that take the attribute value x_i, p is a user-specified prior estimate of the probability, and m is a parameter called the equivalent sample size, which controls the trade-off between the prior p and the observed fraction n_c / n.
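As an illustration, the sketch below applies the m-estimate to the zero-valued probability P(Marital Status = Married | Yes) from the example. The settings m = 3 and p = 1/3 (a uniform prior over the three marital-status values) are assumptions chosen here for illustration, not values given in the text:

def m_estimate(n_c, n, p, m):
    """m-estimate of a conditional probability P(x_i | y).

    n_c : training records of class y with attribute value x_i
    n   : training records of class y
    p   : prior estimate of the probability (e.g. uniform over attribute values)
    m   : equivalent sample size controlling how much weight p receives
    """
    return (n_c + m * p) / (n + m)

# Zero-frequency case from the example: Marital Status = Married given class Yes.
# Assumed settings: p = 1/3 (uniform over Single/Divorced/Married), m = 3.
print(m_estimate(n_c=0, n=3, p=1/3, m=3))   # 1/6 instead of 0

# The plain relative-frequency estimate is recovered with m = 0.
print(m_estimate(n_c=0, n=3, p=1/3, m=0))   # 0.0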