Nine Algorithms for Machine Learning --- the Naive Bayes Classifier

Understanding Naive Bayes classification

Bayesian classification is a general term for a family of classification algorithms that are all based on Bayes' theorem, hence the collective name. Naive Bayes classification is the simplest and most common method in this family. In this article I summarize the naive Bayes classification algorithm in plain words, hoping to help others understand it.

1 A summary of classification problems

Classification problems are familiar to everyone; we classify things every day. For example, when you see a person, your brain subconsciously judges whether he is a student or a working adult; or you may walk down the street and tell your friend, "you can tell at a glance that this person is rich." Both are classification operations.

Since this is a Bayesian classification algorithm, what is the mathematical description of classification?

From a mathematical point of view, the classification problem can be defined as follows: given a set of categories C = {y1, y2, ..., yn} and a set of items I = {x1, x2, ..., xm}, determine a mapping rule y = f(x) such that for every xi in I there is one and only one yj in C for which yj = f(xi) holds.

Here C is called the set of categories, each element of which is a category; I is called the set of items (described by their features), each element of which is an item to be classified; and f is called the classifier. The task of a classification algorithm is to construct the classifier f.

The job of a classification algorithm is to map a given set of features to a category; this is the key to every classification problem. How to go from the features to the final category is exactly what distinguishes the different classification algorithms: each embodies a different core idea.

In this article, I will use one concrete example to explain almost all of the important points of the naive Bayes algorithm.

2 Naive Bayes classification

So what is the core of the naive Bayes classification algorithm?

It is the following Bayes formula:

P(B | A) = P(A | B) * P(B) / P(A)

The formula becomes much clearer when expressed in a different form:

P(category | features) = P(features | category) * P(category) / P(features)

If we can compute P(category | features), our task is complete!
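Translated into code, the formula is a single expression. A minimal sketch (the function name and the toy numbers are mine, purely for illustration):

```python
def posterior(likelihood, prior, evidence):
    """Bayes' rule: P(category | features)
    = P(features | category) * P(category) / P(features)."""
    return likelihood * prior / evidence

# Toy numbers: P(features | category) = 0.3, P(category) = 0.5,
# P(features) = 0.25.
print(posterior(0.3, 0.5, 0.25))  # 0.6
```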

3 Example Analysis

Let me start with an example problem.

The given data are 12 training samples, each recording four features of a man (looks, character, height, drive) and whether the girl married him. Reconstructed from the counts used in the calculations below, the table is:

    looks          character   height   drive             marry?
    handsome       bad         short    not progressive   no
    not handsome   good        short    progressive       no
    handsome       good        short    progressive       yes
    not handsome   very good   tall     progressive       yes
    handsome       bad         short    progressive       no
    handsome       bad         short    progressive       no
    handsome       good        tall     not progressive   yes
    not handsome   good        medium   progressive       yes
    handsome       very good   medium   progressive       yes
    not handsome   bad         tall     progressive       yes
    handsome       good        short    not progressive   no
    handsome       good        short    not progressive   no

Now the question: a boy and a girl are dating, and the boy wants to propose. The boy's four features are: not handsome, bad character, short, and not progressive. Should the girl marry him or not?

This is a typical classification problem. Mathematically, it reduces to comparing P(marry | not handsome, bad character, short, not progressive) with P(not marry | not handsome, bad character, short, not progressive); whichever probability is larger gives the answer, marry or not marry!

Here we bring in the naive Bayes formula:

P(marry | not handsome, bad character, short, not progressive)
    = P(not handsome, bad character, short, not progressive | marry) * P(marry)
      / P(not handsome, bad character, short, not progressive)

We do not know P(marry | not handsome, bad character, short, not progressive) directly, but the naive Bayes formula converts it into three quantities that are easy to obtain:

P(not handsome, bad character, short, not progressive | marry), P(not handsome, bad character, short, not progressive), and P(marry). (Why these can be computed is explained below; converting an unknown quantity into obtainable ones is exactly what solves our problem!)

4 A plain-words explanation of the naive Bayes algorithm

So how are these three quantities calculated?

They are estimated from the known training data. A detailed, worked solution follows.

Recall the formula we want to evaluate:

P(marry | not handsome, bad character, short, not progressive)
    = P(not handsome, bad character, short, not progressive | marry) * P(marry)
      / P(not handsome, bad character, short, not progressive)

So I only need P(not handsome, bad character, short, not progressive | marry), P(not handsome, bad character, short, not progressive), and P(marry). Below I work out these probabilities one by one and combine them into the final result.

P(not handsome, bad character, short, not progressive | marry) = P(not handsome | marry) * P(bad character | marry) * P(short | marry) * P(not progressive | marry). Once I have counted the four probabilities on the right, I also have the probability on the left!
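Sketched in code, this factorization is just a product of four single-feature conditionals (the numbers plugged in here are the ones counted from the training data later in this article):

```python
import math

# P(feature | marry) for each of the four features, as counted
# from the training data later in this article.
cond_given_marry = {
    "not handsome": 1/2,
    "bad character": 1/6,
    "short": 1/6,
    "not progressive": 1/6,
}

# Independence assumption: the joint conditional is the product.
joint_given_marry = math.prod(cond_given_marry.values())
print(joint_given_marry)  # 1/432 ≈ 0.00231
```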

Wait, why does this equation hold? Readers who have studied probability theory will sense it: the equation requires the features being conditioned on to be mutually independent.

That's right! This is the origin of the word "naive": the naive Bayes algorithm assumes that the features are mutually independent, and under that assumption the equation holds!

But why do we need to assume that features are independent of each other?

1. Consider what happens without this assumption: we would have to estimate the probabilities on the right directly over the joint feature space. In this example there are 4 features: looks takes values in {handsome, not handsome}, character in {bad, good, very good}, height in {short, medium, tall}, and drive in {progressive, not progressive}. The joint distribution over the four features therefore has 2*3*3*2 = 36 combinations.

With 36 combinations a computer can still scan and count. But in real life there are often many features, each with many possible values, and estimating the joint probabilities by counting becomes practically impossible. This is why the independence assumption is needed.
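A two-line computation shows the saving (the short feature names stand for the four features above):

```python
# Number of values each feature can take, per the example above.
feature_values = {"looks": 2, "character": 3, "height": 3, "drive": 2}

joint = 1
for n in feature_values.values():
    joint *= n           # size of the joint feature space (per class)

# With the independence assumption, we only need one probability
# per feature value per class instead.
per_feature = sum(feature_values.values())

print(joint, per_feature)  # 36 10
```

With many features the gap between the product and the sum grows explosively, which is the point of the assumption.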

2. Even if we did not assume independence, the counting itself would fail. To estimate, say, P(not handsome, bad character, short, not progressive | marry), we would need to find, among the married samples, the people who are simultaneously not handsome, of bad character, short, and not progressive. Because the data is sparse, such joint counts easily come out as 0, which is unusable.

For these two reasons, the naive Bayes method makes the conditional-independence assumption about the conditional probability distribution. Because this is a strong assumption, the method is called "naive". The assumption makes naive Bayes simple, but it sometimes sacrifices some classification accuracy.

OK, I have now explained why we may split the joint conditional into a product. Let's start solving!

Rearranging the formula above gives:

P(marry | not handsome, bad character, short, not progressive)
    = P(not handsome | marry) * P(bad character | marry) * P(short | marry) * P(not progressive | marry) * P(marry)
      / (P(not handsome) * P(bad character) * P(short) * P(not progressive))

Below I compute each term by counting. (When the amount of data is large, by the law of large numbers the observed frequency approaches the true probability; here the sample is tiny and the counts are only for illustration.)

P (married) =?

First we organize the training data and count the samples labeled marry; there are 6 of them.

Then P(marry) = 6/12 (of all samples) = 1/2.

P(not handsome | marry) = ? Among the 6 marry samples, 3 are not handsome, so P(not handsome | marry) = 3/6 = 1/2; that is, among those who marry, half of the men are not handsome.

P(bad character | marry) = ? Among the 6 marry samples, 1 has bad character, so P(bad character | marry) = 1/6.

P(short | marry) = ? Among the 6 marry samples, 1 is short, so P(short | marry) = 1/6.

P(not progressive | marry) = ? Among the 6 marry samples, 1 is not progressive, so P(not progressive | marry) = 1/6.

Next we compute the denominator terms: P(not handsome), P(bad character), P(short), P(not progressive).

Counting over all 12 samples:

4 samples are not handsome, so P(not handsome) = 4/12 = 1/3.

4 samples have bad character, so P(bad character) = 4/12 = 1/3.

7 samples are short, so P(short) = 7/12.

4 samples are not progressive, so P(not progressive) = 4/12 = 1/3.

At this point, everything needed for P(marry | not handsome, bad character, short, not progressive) has been found. Substituting:

P(marry | not handsome, bad character, short, not progressive)
    = (1/2 * 1/6 * 1/6 * 1/6 * 1/2) / (1/3 * 1/3 * 7/12 * 1/3)
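As a quick check of the arithmetic (variable names are mine):

```python
prior_marry = 1/2
likelihood = (1/2) * (1/6) * (1/6) * (1/6)   # P(features | marry)
evidence = (1/3) * (1/3) * (7/12) * (1/3)    # P(features)

p_marry_given_features = likelihood * prior_marry / evidence
print(round(p_marry_given_features, 4))  # 0.0536
```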

Next we compute P(not marry | not handsome, bad character, short, not progressive) by the same method. To make it easy to follow, I again go through it step by step. The formula is:

P(not marry | not handsome, bad character, short, not progressive)
    = P(not handsome | not marry) * P(bad character | not marry) * P(short | not marry) * P(not progressive | not marry) * P(not marry)
      / (P(not handsome) * P(bad character) * P(short) * P(not progressive))

I compute the terms one by one. The denominator is the same as above, so it does not need to be recalculated!

P(not marry) = ? 6 of the 12 samples are labeled not marry, so P(not marry) = 6/12 = 1/2.

P(not handsome | not marry) = ? Among the 6 not-marry samples, 1 is not handsome, so P(not handsome | not marry) = 1/6.

P(bad character | not marry) = ? Among the 6 not-marry samples, 3 have bad character, so P(bad character | not marry) = 3/6 = 1/2.

P(short | not marry) = ? Among the 6 not-marry samples, all 6 are short, so P(short | not marry) = 6/6 = 1.

P(not progressive | not marry) = ? Among the 6 not-marry samples, 3 are not progressive, so P(not progressive | not marry) = 3/6 = 1/2.

Then according to the formula:

P(not marry | not handsome, bad character, short, not progressive)
    = (1/6 * 1/2 * 1 * 1/2 * 1/2) / (1/3 * 1/3 * 7/12 * 1/3)

Since the denominators are identical, we only need to compare numerators, and clearly (1/6 * 1/2 * 1 * 1/2 * 1/2) = 1/48 > 1/864 = (1/2 * 1/6 * 1/6 * 1/6 * 1/2).

Therefore P(not marry | not handsome, bad character, short, not progressive) > P(marry | not handsome, bad character, short, not progressive).

So, based on the naive Bayes algorithm, we can give the girl her answer: do not marry!
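The whole procedure can be wrapped into a small counting-based classifier. A sketch: the training rows below are my own reconstruction, chosen to reproduce exactly the per-class counts used in this article (only those counts are guaranteed by the text).

```python
from collections import Counter, defaultdict

# Each row: (looks, character, height, drive, label).
data = [
    ("handsome", "bad", "short", "not progressive", "not marry"),
    ("not handsome", "good", "short", "progressive", "not marry"),
    ("handsome", "good", "short", "progressive", "marry"),
    ("not handsome", "very good", "tall", "progressive", "marry"),
    ("handsome", "bad", "short", "progressive", "not marry"),
    ("handsome", "bad", "short", "progressive", "not marry"),
    ("handsome", "good", "tall", "not progressive", "marry"),
    ("not handsome", "good", "medium", "progressive", "marry"),
    ("handsome", "very good", "medium", "progressive", "marry"),
    ("not handsome", "bad", "tall", "progressive", "marry"),
    ("handsome", "good", "short", "not progressive", "not marry"),
    ("handsome", "good", "short", "not progressive", "not marry"),
]

def naive_bayes_scores(rows, x):
    """Return the Bayes-rule numerator P(x | label) * P(label) for each
    label, with the class-conditional factorized over features (the
    naive step). The shared denominator P(x) is omitted because it does
    not affect the comparison."""
    priors = Counter(label for *_, label in rows)
    cond = defaultdict(Counter)   # (feature index, label) -> value counts
    for *features, label in rows:
        for i, v in enumerate(features):
            cond[(i, label)][v] += 1
    total = sum(priors.values())
    scores = {}
    for label, n in priors.items():
        score = n / total                        # P(label)
        for i, v in enumerate(x):
            score *= cond[(i, label)][v] / n     # P(feature_i = v | label)
        scores[label] = score
    return scores

query = ("not handsome", "bad", "short", "not progressive")
scores = naive_bayes_scores(data, query)
print(max(scores, key=scores.get))  # not marry
```

The two scores come out as 1/864 for marry and 1/48 for not marry, matching the hand calculation above.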

5 Advantages and disadvantages of naive Bayesian classification

Advantages:

(1) The algorithmic logic is simple and easy to implement (you essentially just apply the Bayes formula).

(2) Small time and space overhead during classification (because the features are assumed independent, only a two-dimensional table of per-feature, per-class probabilities needs to be stored).

Disadvantages:

In theory, the naive Bayes model has the smallest error rate among classification methods. In practice this is not always so, because the model assumes that the attributes are mutually independent, and this assumption often fails in real applications; when the number of attributes is large or the correlation between attributes is strong, classification quality suffers.

When attribute correlation is small, however, naive Bayes performs at its best. There are also semi-naive Bayes algorithms that improve on this by modestly taking some of the correlations into account.

This completes the worked example of classification with the naive Bayes algorithm; I hope it helps your understanding ~

Reference: Dr. Li Hang, "Statistical Learning Methods"

Algorithm Grocery --- classification algorithms: Naive Bayesian classification

Acknowledgements: Tokugawa, Hao Yu, Hao, Shi

Original address: https://mp.weixin.qq.com/s?__biz=MzI4MDYzNzg4Mw==&mid=2247483819&idx=1&sn=7f1859c0a00248a4c658fa65f846f341&chksm=ebb4397fdcc3b06933816770b928355eb9119c4c80a1148b92a42dc3c08de5098fd6f278e61e#rd
