Statistical Study Notes (4) -- Naive Bayes

Source: Internet
Author: User

Naive Bayes is a classification method based on Bayes' theorem and the assumption that features are conditionally independent given the class. Simply put, a naive Bayes classifier assumes that each feature of a sample is unrelated to every other feature. For example, a fruit that is red, round, and about 4 inches in diameter can be regarded as an apple. Even though these features depend on one another, or some are determined by others, the naive Bayes classifier treats them all as contributing independently to the probability that the fruit is an apple. Despite this simple idea and its oversimplified assumption, the naive Bayes classifier still achieves very good results in many complex real-world situations. One advantage of the naive Bayes classifier is that it needs only a small amount of training data to estimate the necessary parameters (for discrete variables, the prior and class-conditional probabilities; for continuous variables, the mean and variance of each variable).

1. Bayesian Classification Model

The Bayesian classification model is as follows:

P(y | X) = P(y) P(X | y) / P(X)

where X denotes the attribute set, y the class variable, P(y) the prior probability, P(X | y) the class-conditional probability, P(X) the evidence, and P(y | X) the posterior probability. The Bayesian classification model expresses the posterior probability in terms of the prior P(y), the class-conditional probability P(X | y), and the evidence P(X). When comparing the posterior probabilities of different classes, the evidence P(X) in the denominator is the same constant for every class, so it can be ignored. The prior P(y) is easily estimated as the fraction of training records that belong to each class. Different ways of estimating the class-conditional probability P(X | y) give rise to different Bayesian classification methods; common ones include the naive Bayes classifier and the Bayesian belief network.
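As a quick numerical illustration of Bayes' rule, the sketch below computes posteriors for a two-class problem; the likelihood values are assumed for illustration and are not taken from the example dataset later in these notes:

```python
# Minimal sketch of Bayes' rule: posterior = prior * likelihood / evidence.
def posterior(prior, likelihood, evidence):
    """P(y | x) = P(y) * P(x | y) / P(x)."""
    return prior * likelihood / evidence

priors = {"yes": 0.3, "no": 0.7}
likelihoods = {"yes": 0.10, "no": 0.40}  # assumed P(x | y) values

# The evidence P(x) is the total probability of x over all classes.
evidence = sum(priors[c] * likelihoods[c] for c in priors)

posteriors = {c: posterior(priors[c], likelihoods[c], evidence) for c in priors}
print(posteriors)  # the posteriors over all classes sum to 1
```

Note that the evidence only rescales the scores; the ranking of the classes, and hence the predicted class, is already determined by prior × likelihood.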

2. Naive Bayes Classification Model

The naive Bayes classifier estimates the class-conditional probability by assuming that the attributes are conditionally independent given the class label:

P(X | y) = P(x_1 | y) × P(x_2 | y) × ... × P(x_d | y)

so a record X is assigned to the class y that maximizes P(y) × P(x_1 | y) × ... × P(x_d | y). For a discrete attribute, P(x_i | y) is estimated as the fraction of class-y training records that take value x_i; for a continuous attribute, a common choice is a Gaussian distribution whose mean and variance are estimated separately for each class.

3. Example

The dataset is as follows (ten training records; "Defaulted Borrower" is the class label y):

Tid | Home Owner | Marital Status | Annual Income | Defaulted Borrower
----|------------|----------------|---------------|-------------------
1   | Yes        | Single         | 125K          | No
2   | No         | Married        | 100K          | No
3   | No         | Single         | 70K           | No
4   | Yes        | Married        | 120K          | No
5   | No         | Divorced       | 95K           | Yes
6   | No         | Married        | 60K           | No
7   | Yes        | Divorced       | 220K          | No
8   | No         | Single         | 85K           | Yes
9   | No         | Married        | 75K           | No
10  | No         | Single         | 90K           | Yes

The prior probabilities, the class-conditional probabilities of each discrete attribute, and the distribution parameters of the continuous attribute (sample mean and variance), all computed from this dataset, are as follows:

Prior probabilities: P(Yes) = 0.3; P(No) = 0.7

P(Home Owner = Yes | No) = 3/7

P(Home Owner = No | No) = 4/7

P(Home Owner = Yes | Yes) = 0

P(Home Owner = No | Yes) = 1

P(Marital Status = Single | No) = 2/7

P(Marital Status = Divorced | No) = 1/7

P(Marital Status = Married | No) = 4/7

P(Marital Status = Single | Yes) = 2/3

P(Marital Status = Divorced | Yes) = 1/3

P(Marital Status = Married | Yes) = 0

Annual Income:

If class = No: sample mean = 110, sample variance = 2975

If class = Yes: sample mean = 90, sample variance = 25

Record to be predicted: X = {Home Owner = No, Marital Status = Married, Annual Income = 120K}

P(No) × P(Home Owner = No | No) × P(Marital Status = Married | No) × P(Annual Income = 120K | No) = 0.7 × 4/7 × 4/7 × 0.0072 ≈ 0.0016

P(Yes) × P(Home Owner = No | Yes) × P(Marital Status = Married | Yes) × P(Annual Income = 120K | Yes) = 0.3 × 1 × 0 × 1.2×10⁻⁹ = 0

Because the score for class No is greater than the score for class Yes (which is 0), this record is classified as No.
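The whole computation above can be sketched in a few lines of Python. The class-conditional probabilities and Gaussian parameters are the ones listed in the text; the income attribute is modeled with a Gaussian density per class:

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a Gaussian with the given mean and variance, evaluated at x."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

priors = {"no": 0.7, "yes": 0.3}
p_home_no = {"no": 4 / 7, "yes": 1.0}   # P(Home Owner = No | class)
p_married = {"no": 4 / 7, "yes": 0.0}   # P(Marital Status = Married | class)
income_params = {"no": (110, 2975), "yes": (90, 25)}  # (sample mean, sample variance)

x_income = 120
scores = {}
for c in priors:
    mean, var = income_params[c]
    # Naive Bayes score: prior times the product of per-attribute conditionals.
    scores[c] = (priors[c] * p_home_no[c] * p_married[c]
                 * gaussian_pdf(x_income, mean, var))

print(scores)                        # "no" scores about 0.0016, "yes" exactly 0
print(max(scores, key=scores.get))   # predicted class: "no"
```

Because the score for class Yes contains the factor P(Marital Status = Married | Yes) = 0, it is zero no matter what the other attributes say, which is exactly the fragility the next paragraph addresses.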

As the preceding example shows, if the class-conditional probability of even one attribute is zero, the posterior probability of the entire class becomes zero. Estimating class-conditional probabilities by raw record fractions is therefore too fragile, especially when the number of training samples is small and the number of attributes is large. To solve this problem, the m-estimate can be used for the conditional probability:

P(x_i | y) = (n_c + m·p) / (n + m)

where n is the number of training records in class y, n_c is the number of those records that take the value x_i, p is a prior estimate of the probability (often uniform over the attribute's values), and m is the equivalent sample size, which controls how strongly the prior p is weighted against the observed counts.
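A minimal sketch of the m-estimate, applied to the zero count P(Marital Status = Married | Yes) from the example (n = 3 class-Yes records, n_c = 0 of them married; m = 3 and the uniform prior p = 1/3 over the three marital statuses are assumed choices, not values from the text):

```python
# m-estimate of a conditional probability: (n_c + m*p) / (n + m).
def m_estimate(n_c, n, p, m):
    return (n_c + m * p) / (n + m)

# Smoothing the zero count P(Marital Status = Married | Yes):
smoothed = m_estimate(n_c=0, n=3, p=1 / 3, m=3)
print(smoothed)  # 1/6 instead of 0, so the class Yes is no longer wiped out
```

With m = 0 the m-estimate reduces to the raw fraction n_c / n; larger m pulls every estimate toward the prior p, trading a little bias for robustness against zero counts.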
(Sina Weibo: @quanliang_machine learning)
