Statistical learning notes (4): The naive Bayes method

Source: Internet
Author: User

The naive Bayes method is a classification method based on Bayes' theorem and the assumption of conditional independence among features. In simple terms, a naive Bayes classifier assumes that each feature of a sample is unrelated to the others. For example, a fruit that is red, round, and roughly 4 inches in diameter can be judged to be an apple. Even though these features may depend on one another, or some may be determined by others, the naive Bayes classifier treats each of them as contributing independently to the probability that the fruit is an apple. Despite these naive and simplistic assumptions, naive Bayes classifiers still achieve quite good results in many complex real-world situations. One advantage of the naive Bayes classifier is that it needs only a small amount of training data to estimate the necessary parameters (for discrete variables, the prior probabilities and class-conditional probabilities; for continuous variables, the mean and variance of each variable).

1. The Bayesian classification model

The Bayesian classification model is as follows:

P(Y|X) = P(X|Y) · P(Y) / P(X)
Here X is the attribute set and Y is the class variable; P(Y) is the prior probability, P(X|Y) is the class-conditional probability, P(X) is the evidence, and P(Y|X) is the posterior probability. The Bayesian classification model expresses the posterior probability in terms of the prior probability P(Y), the class-conditional probability P(X|Y), and the evidence P(X). When comparing the posterior probabilities for different values of Y, the evidence P(X) in the denominator is always constant and can therefore be ignored. The prior probability P(Y) is easily estimated by computing the fraction of training records that belong to each class. For estimating the class-conditional probability P(X|Y), different implementations give different Bayesian classification methods; common ones are the naive Bayes classifier and the Bayesian belief network.
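As a concrete illustration of the model (with made-up numbers, not taken from the data set below), the posterior follows directly from Bayes' theorem once the prior, the class-conditional probability, and the evidence are known:

```python
# Posterior via Bayes' theorem: P(Y|X) = P(X|Y) * P(Y) / P(X).
# All numbers here are illustrative only.
def posterior(prior, class_conditional, evidence):
    return class_conditional * prior / evidence

p_y = 0.3          # prior P(Y)
p_x_given_y = 0.5  # class-conditional P(X|Y)
# Evidence P(X) via the law of total probability over the two classes,
# assuming P(X | not Y) = 0.2:
p_x = p_x_given_y * p_y + 0.2 * (1 - p_y)
print(posterior(p_y, p_x_given_y, p_x))  # ~0.517
```

Note that when only the class with the largest posterior is needed, the division by the evidence p_x can be skipped, since it is the same for every class.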
2. The naive Bayes classification model

The naive Bayes classifier assumes that the attributes are conditionally independent given the class, so the class-conditional probability factorizes as P(X|Y) = P(X1|Y) · P(X2|Y) · … · P(Xd|Y). A record is then assigned to the class Y that maximizes P(Y) · P(X1|Y) · … · P(Xd|Y).
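The factorized model can be sketched in a few lines for discrete attributes. This is a minimal illustration using hypothetical training records (not the data set below), estimating each probability as a relative frequency:

```python
from collections import defaultdict

# Tiny naive Bayes for discrete attributes. Probabilities are estimated
# as relative frequencies from (features, label) training pairs.
def train(records):
    class_counts = defaultdict(int)
    feature_counts = defaultdict(int)  # (class, attr index, value) -> count
    for features, label in records:
        class_counts[label] += 1
        for i, v in enumerate(features):
            feature_counts[(label, i, v)] += 1
    return class_counts, feature_counts

def predict(class_counts, feature_counts, features):
    n = sum(class_counts.values())
    best, best_score = None, -1.0
    for label, c in class_counts.items():
        score = c / n  # prior P(Y)
        for i, v in enumerate(features):
            score *= feature_counts[(label, i, v)] / c  # P(Xi | Y)
        if score > best_score:
            best, best_score = label, score
    return best

# Hypothetical records: (home owner, marital status) -> class
data = [(("yes", "single"), "no"), (("no", "married"), "no"),
        (("no", "single"), "yes"), (("no", "divorced"), "yes")]
cc, fc = train(data)
print(predict(cc, fc, ("no", "single")))  # -> "yes"
```

A real implementation would work with log-probabilities to avoid numerical underflow when the number of attributes is large.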



3. Example

The data set is as follows:


The prior probabilities, the class-conditional probabilities of each discrete attribute, and the parameters of the class-conditional distribution of the continuous attribute (sample mean and variance), all computed from the data set, are as follows:

Prior probabilities: P(Yes) = 0.3; P(No) = 0.7

P(Home Owner = Yes | No) = 3/7

P(Home Owner = No | No) = 4/7

P(Home Owner = Yes | Yes) = 0

P(Home Owner = No | Yes) = 1

P(Marital Status = Single | No) = 2/7

P(Marital Status = Divorced | No) = 1/7

P(Marital Status = Married | No) = 4/7

P(Marital Status = Single | Yes) = 2/3

P(Marital Status = Divorced | Yes) = 1/3

P(Marital Status = Married | Yes) = 0

Annual income:

If class = No: sample mean = 110; sample variance = 2975

If class = Yes: sample mean = 90; sample variance = 25
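For the continuous attribute, the class-conditional density is modeled as a Gaussian with the sample mean and variance above. The following sketch reproduces the two densities at Annual Income = 120K that are used in the calculation below:

```python
import math

# Gaussian class-conditional density used for a continuous attribute.
def gaussian_density(x, mean, variance):
    return math.exp(-(x - mean) ** 2 / (2 * variance)) / math.sqrt(2 * math.pi * variance)

p_no = gaussian_density(120, 110, 2975)   # P(income = 120K | No)
p_yes = gaussian_density(120, 90, 25)     # P(income = 120K | Yes)
print(round(p_no, 4))  # 0.0072
print(p_yes)           # ~1.2e-9
```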

Record to be predicted: X = {Home Owner = No, Marital Status = Married, Annual Income = 120K}

P(No) × P(Home Owner = No | No) × P(Marital Status = Married | No) × P(Annual Income = 120K | No) = 0.7 × 4/7 × 4/7 × 0.0072 ≈ 0.0016

P(Yes) × P(Home Owner = No | Yes) × P(Marital Status = Married | Yes) × P(Annual Income = 120K | Yes) = 0.3 × 1 × 0 × 1.2×10⁻⁹ = 0

Since 0.0016 is greater than 0, the record is classified as No. As the example shows, if the class-conditional probability of a single attribute equals 0, the posterior probability of the entire class becomes 0. Estimating class-conditional probabilities purely from relative frequencies in the training records is too fragile, especially when the training samples are few and the number of attributes is large. A solution to this problem is to use the m-estimate of the conditional probabilities:

P(Xi | Yj) = (nc + m · p) / (n + m)

where n is the number of training records in class Yj, nc is the number of those records with attribute value Xi, p is a prior estimate of the probability, and m is a parameter called the equivalent sample size.
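With the m-estimate, a zero count no longer forces the whole product to zero. A small sketch applying it to P(Marital Status = Married | Yes), using the counts above; the values m = 3 and p = 1/3 are illustrative smoothing parameters, not prescribed by the example:

```python
# m-estimate of a conditional probability: (n_c + m*p) / (n + m), where
# n is the number of training records in the class, n_c the number of
# those records with the given attribute value, p a prior estimate of
# the probability, and m the equivalent sample size.
def m_estimate(n_c, n, m, p):
    return (n_c + m * p) / (n + m)

# P(Marital Status = Married | Yes): raw estimate is 0/3 = 0;
# the smoothed estimate is (0 + 3 * 1/3) / (3 + 3) = 1/6.
print(m_estimate(0, 3, 3, 1/3))
```

Setting m = 1 and p = 1/k (with k the number of attribute values) recovers the familiar Laplace smoothing as a special case.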



