The naive Bayes method is a classification method based on Bayes' theorem and the assumption of conditional independence between features. Simply put, the naive Bayes classifier assumes that each feature of a sample is unrelated to every other feature. For example, a fruit may be judged to be an apple if it is red, round, and about 4 inches in diameter. Even though these features may depend on each other, or some may be determined by others, the naive Bayes classifier treats each of them as contributing independently to the probability that the fruit is an apple. Despite this simple idea and its simplistic assumption, the naive Bayes classifier still achieves fairly good results in many complex real-world situations. One advantage of the naive Bayes classifier is that it only needs to estimate the necessary parameters from a small amount of training data (for discrete variables, the prior probabilities and class conditional probabilities; for continuous variables, the mean and variance of each variable).
1. Bayesian Classification Model
The Bayesian classification model is as follows:
P(Y | X) = P(X | Y) × P(Y) / P(X)
where X represents the attribute set and Y represents the class variable. P(Y) is the prior probability, P(X | Y) is the class conditional probability, P(X) is the evidence, and P(Y | X) is the posterior probability. The Bayesian classification model expresses the posterior probability in terms of the prior probability P(Y), the class conditional probability P(X | Y), and the evidence P(X). When comparing the posterior probabilities for different values of Y, the evidence P(X) in the denominator is always the same constant and can therefore be ignored. The prior probability P(Y) can easily be estimated from the proportion of training records belonging to each class. For estimating the class conditional probability P(X | Y), different implementations lead to different Bayesian classification methods; the most common are the naive Bayes classifier and the Bayesian belief network.
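As a quick illustration of why the evidence term can be dropped (the class labels and probability values below are made up purely for illustration), dividing every class's P(Y) × P(X | Y) by the same P(X) does not change which class has the largest posterior:

```python
# Made-up prior and class conditional probabilities for one observed X.
priors = {"Yes": 0.3, "No": 0.7}            # P(Y)
conditionals = {"Yes": 0.02, "No": 0.005}   # P(X | Y)

# Unnormalized posteriors P(Y) * P(X | Y)
scores = {y: priors[y] * conditionals[y] for y in priors}

# Evidence P(X) = sum over classes of P(Y) * P(X | Y)
evidence = sum(scores.values())
posteriors = {y: s / evidence for y, s in scores.items()}

print(max(scores, key=scores.get))          # Yes
print(max(posteriors, key=posteriors.get))  # Yes -- same winner either way
```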
2. Naive Bayesian Classification Model
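The naive Bayes classification model combines the Bayesian model above with the conditional independence assumption described in the introduction: given the class Y, the attributes x_1, x_2, …, x_d are treated as independent, so the class conditional probability factorizes as P(X | Y) = P(x_1 | Y) × P(x_2 | Y) × … × P(x_d | Y), and the predicted class is the one that maximizes P(Y) × P(x_1 | Y) × … × P(x_d | Y). A minimal sketch of this scoring rule (the function names are mine, and the probability values passed in are placeholders, not estimates from any particular data set):

```python
def naive_bayes_predict(priors, cond_probs):
    """Pick the class maximizing P(Y) * P(x_1|Y) * ... * P(x_d|Y).

    priors:     {class: P(class)}
    cond_probs: {class: [P(x_1|class), ..., P(x_d|class)]} for one record
    """
    scores = {}
    for cls, prior in priors.items():
        score = prior
        for p in cond_probs[cls]:   # independence assumption: just multiply
            score *= p
        scores[cls] = score
    return max(scores, key=scores.get), scores

# Placeholder numbers, for illustration only.
label, scores = naive_bayes_predict(
    priors={"A": 0.6, "B": 0.4},
    cond_probs={"A": [0.1, 0.5], "B": [0.3, 0.4]},
)
print(label, scores)  # B {'A': 0.03, 'B': 0.048}
```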
3. Examples
The data set is as follows:
The prior probabilities, the class conditional probabilities of each discrete attribute, and the parameters (sample mean and variance) of the class conditional distribution of the continuous attribute, all estimated from the data set, are as follows:
Prior probabilities: P(Yes) = 0.3; P(No) = 0.7
P(Home owner = yes | No) = 3/7
P(Home owner = no | No) = 4/7
P(Home owner = yes | Yes) = 0
P(Home owner = no | Yes) = 1
P(Marital status = single | No) = 2/7
P(Marital status = divorced | No) = 1/7
P(Marital status = married | No) = 4/7
P(Marital status = single | Yes) = 2/3
P(Marital status = divorced | Yes) = 1/3
P(Marital status = married | Yes) = 0
Annual income (in thousands, K):
If class = No: sample mean = 110, sample variance = 2975
If class = Yes: sample mean = 90, sample variance = 25
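For the continuous attribute (annual income), the class conditional probability is modeled with a Gaussian distribution whose parameters are the sample mean and variance above; its density at income = 120 produces the values used in the calculation below. A quick check of those numbers (a sketch, assuming the standard Gaussian density):

```python
import math

def gaussian_density(x, mean, variance):
    """Gaussian (normal) probability density with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2 * variance)) / math.sqrt(2 * math.pi * variance)

# Density of annual income = 120 under each class's estimated Gaussian.
print(gaussian_density(120, mean=110, variance=2975))  # ~0.0072
print(gaussian_density(120, mean=90, variance=25))     # ~1.2e-09
```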
--"To be predicted record: x={have room = no, marital status = married, annual income =120k}
P(No) × P(Home owner = no | No) × P(Marital status = married | No) × P(Annual income = 120K | No) = 0.7 × 4/7 × 4/7 × 0.0072 ≈ 0.0016
P(Yes) × P(Home owner = no | Yes) × P(Marital status = married | Yes) × P(Annual income = 120K | Yes) = 0.3 × 1 × 0 × 1.2 × 10^-9 = 0
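The same comparison in code, plugging in the prior probabilities, conditional probabilities, and Gaussian income densities listed above (the variable names are mine, chosen for illustration):

```python
# All numbers are taken from the estimates listed earlier.
priors = {"No": 0.7, "Yes": 0.3}
p_home_owner_no = {"No": 4/7, "Yes": 1.0}       # P(Home owner = no | class)
p_married = {"No": 4/7, "Yes": 0.0}             # P(Marital status = married | class)
p_income_120 = {"No": 0.0072, "Yes": 1.2e-9}    # Gaussian density at income = 120

scores = {cls: priors[cls] * p_home_owner_no[cls] * p_married[cls] * p_income_120[cls]
          for cls in priors}

print(scores)                       # {'No': ~0.0016, 'Yes': 0.0}
print(max(scores, key=scores.get))  # No
```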
Because 0.0016 is greater than 0, the record is classified as No. As this example shows, if the class conditional probability of even one attribute value is 0, the posterior probability of the whole class becomes 0. Estimating class conditional probabilities purely from the observed fractions of records is therefore too fragile, especially when the training set is small and the number of attributes is large. The solution to this problem is to use the m-estimate of the conditional probability: P(x_i | y_j) = (n_c + m × p) / (n + m), where n is the number of training records belonging to class y_j, n_c is the number of those records that take the value x_i, m is a parameter called the equivalent sample size, and p is a user-specified prior estimate of the probability (for example, 1 / the number of values the attribute can take).
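A minimal sketch of the m-estimate (the function and argument names are mine); with, say, m = 3 and p = 1/3 (since marital status has three values), the problematic P(Marital status = married | Yes) becomes (0 + 3 × 1/3) / (3 + 3) = 1/6 instead of 0:

```python
def m_estimate(n_c, n, m, p):
    """m-estimate of a conditional probability: (n_c + m*p) / (n + m).

    n_c: number of class-y_j training records with attribute value x_i
    n:   number of class-y_j training records
    m:   equivalent sample size
    p:   prior estimate of the probability (e.g. 1 / number of attribute values)
    """
    return (n_c + m * p) / (n + m)

# P(Marital status = married | Yes): none of the 3 "Yes" records are married.
print(m_estimate(n_c=0, n=3, m=3, p=1/3))  # ~0.167 instead of 0
```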