Naive Bayes classifier (I)

I have been reading about the naive Bayes classifier over the past two days. Here I write some brief notes based on my own understanding to organize my thoughts.

I. Introduction

1. What is a naive Bayes classifier?
The naive Bayes classifier is a simple probabilistic classifier that applies Bayes' theorem under a strong independence assumption: each feature of a sample is assumed to be independent of every other feature given the class. For example, if an object has color, size, weight, and material as features, these are treated as unrelated to each other: whatever the color, it does not affect the size, and whatever the size, it does not affect the color.
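A minimal sketch of such a classifier in Python, assuming categorical features and plain counting with no smoothing (so a feature value never seen with a class gives that class probability 0). It scores each class by its prior times the product of the per-feature conditional probabilities, then normalizes the scores. `train`, `posterior`, and `RECORDS` are illustrative names, and the records are the six rows of the table in the Application section.

```python
from collections import Counter, defaultdict

# Training data: the six rows of the table in the Application section,
# written as ((gender, symptom), cause).
RECORDS = [
    (("male", "sore throat"), "cold"),
    (("female", "fever"), "cold"),
    (("male", "sore throat"), "pharyngitis"),
    (("female", "fever"), "pharyngitis"),
    (("female", "fever"), "cold"),
    (("female", "sore throat"), "pharyngitis"),
]


def train(records):
    """Count how often each class occurs and how often each feature value occurs per class."""
    class_counts = Counter(label for _, label in records)
    # feature_counts[label][i][value]: times feature i took `value` among records of `label`
    feature_counts = defaultdict(lambda: defaultdict(Counter))
    for features, label in records:
        for i, value in enumerate(features):
            feature_counts[label][i][value] += 1
    return class_counts, feature_counts


def posterior(features, class_counts, feature_counts):
    """P(class | features) for each class, assuming features are independent given the class."""
    total = sum(class_counts.values())
    scores = {}
    for label, n in class_counts.items():
        score = n / total  # prior P(class)
        for i, value in enumerate(features):
            score *= feature_counts[label][i][value] / n  # P(feature i = value | class)
        scores[label] = score
    norm = sum(scores.values())
    if norm == 0:
        raise ValueError("every class has probability 0 (unseen feature value, no smoothing)")
    return {label: s / norm for label, s in scores.items()}


class_counts, feature_counts = train(RECORDS)
probs = posterior(("male", "sore throat"), class_counts, feature_counts)
print(max(probs, key=probs.get), probs)
```

For a male patient with a sore throat this assigns pharyngitis a probability of 2/3. Normalizing over the two causes is the usual way a naive Bayes classifier avoids estimating the denominator directly; on this particular data it happens to match the 67% obtained in the Application section.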

2. What is Bayes' theorem?

Bayes' theorem is a result in probability theory concerning the conditional probabilities and marginal probability distributions of random variables. For two random events A and B, it relates the conditional probability of A given B to the conditional probability of B given A and the marginal probabilities of A and B.

A conditional probability is the probability that event A occurs given that event B has occurred, written P(A | B).

Bayes' theorem can be derived as follows.

The probability P(A ∩ B) that events A and B both occur equals the probability P(B) of event B multiplied by the probability P(A | B) of A given B. It also equals the probability P(A) of event A multiplied by the probability P(B | A) of B given A:

$$P(A \cap B) = P(A)\,P(B \mid A) = P(B)\,P(A \mid B)$$

Rearranging this equation gives:

$$P(A \mid B) = \frac{P(A)\,P(B \mid A)}{P(B)}$$

This is Bayes' formula, where:

P(A | B) is the conditional probability of A given that B has occurred. Because this value depends on B, it is called the posterior probability.

P(A) is the prior probability (or marginal probability) of A. It does not take event B into account; when Bayes' theorem is applied, this value is treated as known (an estimate based on the situation at hand).

The ratio P(B | A)/P(B) can be understood as a likelihood factor: it adjusts the prior probability so that the result is closer to the true probability.
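The derivation can be checked by counting. The sketch below uses six made-up observations of two events A and B (hypothetical data, not from the source), estimates each probability as an exact fraction, and confirms that both factorizations of P(A ∩ B) agree and that the rearranged formula recovers P(A | B).

```python
from fractions import Fraction

# Hypothetical observations: each pair records whether event A and event B occurred.
outcomes = [
    (True, True), (True, False), (False, True),
    (True, True), (False, False), (False, True),
]

n = len(outcomes)
p_a = Fraction(sum(a for a, _ in outcomes), n)          # P(A)
p_b = Fraction(sum(b for _, b in outcomes), n)          # P(B)
p_ab = Fraction(sum(a and b for a, b in outcomes), n)   # P(A ∩ B)

p_b_given_a = p_ab / p_a  # P(B | A)
p_a_given_b = p_ab / p_b  # P(A | B), computed directly from the counts

# Product rule: both factorizations give the joint probability.
assert p_a * p_b_given_a == p_b * p_a_given_b == p_ab
# Bayes' formula recovers P(A | B) from P(A), P(B | A) and P(B).
assert p_a * p_b_given_a / p_b == p_a_given_b
print(p_a_given_b)
```

Using `Fraction` instead of floats keeps the equality checks exact rather than approximate.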

Bayes' theorem can therefore be summarized as:

Posterior probability = prior probability × likelihood factor

In practice, a prior probability is estimated first, and the likelihood factor is then computed from statistical data. If the likelihood factor is less than 1, observing B lowers the probability of A below its prior; if it is greater than 1, observing B raises it above the prior; if it equals 1, event B has no effect on the probability of A.
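A small sketch of the update rule and the three cases for the likelihood factor. The inputs are hypothetical and the names `bayes_update` and `effect` are illustrative; the inputs must be mutually consistent probabilities (here P(B) ≥ P(A) × P(B | A)), otherwise the result can exceed 1.

```python
def bayes_update(prior, p_b_given_a, p_b):
    """Return (posterior, likelihood factor) using P(A | B) = P(A) * P(B | A) / P(B)."""
    factor = p_b_given_a / p_b
    return prior * factor, factor


def effect(factor):
    """Describe how observing B changes the probability of A."""
    if factor > 1:
        return "raises"
    if factor < 1:
        return "lowers"
    return "does not change"


# Hypothetical inputs: P(A) = 0.3, P(B | A) = 0.8, P(B) = 0.4.
posterior, factor = bayes_update(0.3, 0.8, 0.4)
print(f"factor = {factor:.2f}, posterior = {posterior:.2f}; observing B {effect(factor)} P(A)")
```

With these numbers the factor is 2, so the prior of 0.3 doubles to a posterior of 0.6.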

 

Note: when applying Bayes' theorem, the richer the statistical data, the closer the result tends to be to the actual probability. Even if the result is inaccurate at first, the likelihood factor is continually adjusted as data accumulates (this calibration is also called training), and the result is progressively corrected toward the true probability.
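The effect of more data can be illustrated with a simple frequency estimate: simulate trials of an event whose probability is known and watch the estimate settle as the sample grows. This is a minimal sketch of estimating a single probability from data (the law of large numbers), not of the full classifier; `TRUE_P`, the seed, and the sample sizes are arbitrary assumptions.

```python
import random

TRUE_P = 0.3             # hypothetical "true" probability, chosen only for this simulation
rng = random.Random(42)  # fixed seed so runs are reproducible


def estimate(n):
    """Estimate TRUE_P as the share of n simulated trials in which the event occurs."""
    hits = sum(rng.random() < TRUE_P for _ in range(n))
    return hits / n


# The error tends to shrink as n grows, though not strictly
# monotonically within any single run.
for n in (10, 100, 1_000, 100_000):
    est = estimate(n)
    print(f"n={n:>7}: estimate={est:.4f}, error={abs(est - TRUE_P):.4f}")
```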

 

II. Application:

Given the following statistics, what is the probability that a man with a sore throat has pharyngitis?

Gender   Symptom       Cause
Male     Sore throat   Cold
Female   Fever         Cold
Male     Sore throat   Pharyngitis
Female   Fever         Pharyngitis
Female   Fever         Cold
Female   Sore throat   Pharyngitis
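Every probability used in the calculation below is a relative frequency from this table. A minimal sketch that counts them as exact fractions; the helper names and lowercase labels are illustrative.

```python
from fractions import Fraction

# The six records from the table above, as (gender, symptom, cause).
records = [
    ("male", "sore throat", "cold"),
    ("female", "fever", "cold"),
    ("male", "sore throat", "pharyngitis"),
    ("female", "fever", "pharyngitis"),
    ("female", "fever", "cold"),
    ("female", "sore throat", "pharyngitis"),
]


def is_male(r):
    return r[0] == "male"


def has_sore_throat(r):
    return r[1] == "sore throat"


def has_pharyngitis(r):
    return r[2] == "pharyngitis"


def prob(event, given=None):
    """Relative frequency of `event` among records satisfying `given` (all records if None)."""
    pool = [r for r in records if given is None or given(r)]
    return Fraction(sum(1 for r in pool if event(r)), len(pool))


estimates = {
    "P(pharyngitis)": prob(has_pharyngitis),
    "P(male | pharyngitis)": prob(is_male, has_pharyngitis),
    "P(sore throat | pharyngitis)": prob(has_sore_throat, has_pharyngitis),
    "P(male)": prob(is_male),
    "P(sore throat)": prob(has_sore_throat),
}
for name, value in estimates.items():
    print(f"{name:<30} = {value}")
```

This yields P(pharyngitis) = 1/2, P(male | pharyngitis) = 1/3, P(sore throat | pharyngitis) = 2/3, P(male) = 1/3 and P(sore throat) = 1/2, the values behind the rounded 0.50, 0.33, 0.67, 0.33 and 0.50 used below.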

 

Treating "gender", "symptom", and "cause" as events, the question asks for the probability of pharyngitis given that the patient is male and has a sore throat, that is, P(pharyngitis | male ∩ sore throat).

By Bayes' theorem:

$$P(\text{pharyngitis} \mid \text{male} \cap \text{sore throat}) = \frac{P(\text{pharyngitis})\,P(\text{male} \cap \text{sore throat} \mid \text{pharyngitis})}{P(\text{male} \cap \text{sore throat})}$$

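Assuming that gender and symptom are independent of each other (both overall and among pharyngitis cases), this becomes:

$$P(\text{pharyngitis} \mid \text{male} \cap \text{sore throat}) = P(\text{pharyngitis}) \times \frac{P(\text{male} \mid \text{pharyngitis})\,P(\text{sore throat} \mid \text{pharyngitis})}{P(\text{male})\,P(\text{sore throat})}$$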

Here P(pharyngitis) is the prior probability, and P(male | pharyngitis) × P(sore throat | pharyngitis) / (P(male) × P(sore throat)) is the likelihood factor. Assuming the prior probability of pharyngitis is 50%, the data above give:

$$P(\text{pharyngitis} \mid \text{male} \cap \text{sore throat}) = 0.50 \times \frac{0.33 \times 0.67}{0.33 \times 0.50} \approx 0.67$$

The estimated probability that this man has pharyngitis is therefore about 67%.
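The same calculation with exact fractions, as a sketch of the formula above (gender and symptom treated as independent in both the numerator and the denominator). `naive_posterior` is an illustrative name; the inputs are the frequencies read off the six-record table, with the prior fixed at the assumed 50%.

```python
from fractions import Fraction


def naive_posterior(prior, likelihoods, marginals):
    """prior × product of P(feature | cause) / product of P(feature)."""
    result = Fraction(prior)
    for p in likelihoods:
        result *= p
    for p in marginals:
        result /= p
    return result


p = naive_posterior(
    Fraction(1, 2),                                # assumed prior P(pharyngitis)
    likelihoods=[Fraction(1, 3), Fraction(2, 3)],  # P(male | ph), P(sore throat | ph)
    marginals=[Fraction(1, 3), Fraction(1, 2)],    # P(male), P(sore throat)
)
print(p, float(p))  # exact fraction, then its decimal value
```

It returns exactly 2/3, so the rounded intermediate values (0.33, 0.67) do not affect the 67% result.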

If we add another record:

Gender   Symptom       Cause
Male     Sore throat   Pharyngitis

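Recomputing the factors from all seven records, while keeping the assumed 50% prior, gives:

$$P(\text{pharyngitis} \mid \text{male} \cap \text{sore throat}) = 0.50 \times \frac{0.50 \times 0.75}{0.43 \times 0.57} \approx 0.77$$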

As more statistical data is collected, the factors are re-estimated and the result is revised accordingly; with enough data, the estimate approaches the actual probability.
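A sketch of the recalculation with the added record, counting directly from all seven rows as exact fractions. The helper names are illustrative. It keeps the prior at 50% as the text does; note that pharyngitis now accounts for 4 of the 7 records, so re-estimating the prior from the data would change the result.

```python
from fractions import Fraction

# The original six records plus the added one, as (gender, symptom, cause).
records = [
    ("male", "sore throat", "cold"),
    ("female", "fever", "cold"),
    ("male", "sore throat", "pharyngitis"),
    ("female", "fever", "pharyngitis"),
    ("female", "fever", "cold"),
    ("female", "sore throat", "pharyngitis"),
    ("male", "sore throat", "pharyngitis"),  # the added record
]


def share(pred, pool):
    """Fraction of the records in `pool` that satisfy `pred`."""
    return Fraction(sum(1 for r in pool if pred(r)), len(pool))


def is_male(r):
    return r[0] == "male"


def has_sore_throat(r):
    return r[1] == "sore throat"


pharyngitis_cases = [r for r in records if r[2] == "pharyngitis"]
prior = Fraction(1, 2)  # kept at the assumed 50%, as in the text

likelihood = share(is_male, pharyngitis_cases) * share(has_sore_throat, pharyngitis_cases)
evidence = share(is_male, records) * share(has_sore_throat, records)
p = prior * likelihood / evidence
print(p, round(float(p), 2))
```

The exact value is 49/64 = 0.765625, which rounds to the 0.77 above.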
