The Naive Bayes Algorithm for Machine Learning (1)

Source: Internet
Author: User

This is the third machine learning algorithm in this series. The name "naive Bayes" may not mean much to everyone at first. But if you have studied probability theory and mathematical statistics, you have probably seen Bayes' theorem, even if you cannot remember exactly where. Strangely, a theorem this important usually gets only a tiny amount of space in probability textbooks. That is not because the books are odd, but because the theorem's statement is so simple.

Let's take a look at its mathematical representation:

P(AB) = P(B|A) P(A) = P(A|B) P(B)

Yes, that is Bayes' theorem. It is really just a formula for computing probabilities. So how do we apply it? We know that most people in Africa are black; in other words, a person from Africa is likely to be black. Turn it around: if we meet a black person and have to guess where he came from, Africa is a reasonable guess. That inversion of a conditional probability is exactly the logic Bayes' theorem captures.
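To make the inversion concrete, here is a tiny sketch of Bayes' theorem with made-up numbers (all three probabilities below are invented purely for illustration, not real statistics):

```python
# Hypothetical numbers, only to illustrate Bayes' theorem:
# P(black | Africa): probability a person is black given they are from Africa
p_black_given_africa = 0.9
# P(Africa): prior probability a person in our sample is from Africa
p_africa = 0.2
# P(black): overall probability a person in our sample is black
p_black = 0.3

# Bayes' theorem: P(Africa | black) = P(black | Africa) * P(Africa) / P(black)
p_africa_given_black = p_black_given_africa * p_africa / p_black
print(p_africa_given_black)  # 0.6
```

So even though we only stated how likely a black skin color is *given* an African origin, the theorem lets us compute the reverse: how likely an African origin is *given* black skin.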

So what makes naive Bayes "naive"? It treats the features of an object as independent of each other. Sticking with people as the example: a person has skin color, height, build, and so on. Are these features really independent of each other? Of course not. Average height differs between populations, athletic ability correlates with other traits, and in general features are related to one another. But naive Bayes treats them as if they were independent.

In principle, naive Bayes should achieve a low error rate, because it requires the fewest parameters. But precisely because of this independence assumption, its error rate in practice is often unsatisfactory; further optimization is a topic for later. For now, let's look at how naive Bayes is applied in machine learning.

We've all seen the formula above. Here's another form of expression:

P(c_i | w) = P(w | c_i) P(c_i) / P(w)

Here w is a feature vector containing many individual features w_j, and c_i denotes one of the classes. Notice that P(w) is the same constant for every class, so it can be ignored when comparing classes: to rank the left-hand side, we only need to compare the numerators P(w | c_i) P(c_i).

Because naive Bayes assumes all features are independent of each other, P(w | c_i) factors into the product of the individual P(w_j | c_i). This factorization is exactly what the naivety buys us; without the independence assumption there would be no way to decompose it like this. As for P(c_i), it is simply the fraction of training samples belonging to that class. For example, if there are 2 classes, with 6 samples in the first and 4 in the second, then P(c_1) = 0.6 and P(c_2) = 0.4; that part is easy to compute.

That leaves P(w_j | c_i): the probability that feature w_j appears in class c_i. Estimating these probabilities is precisely what naive Bayes training does. Once they have been obtained from the training set, a new sample can be classified by looking at which features it contains and multiplying the corresponding probabilities together.
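The training and classification steps described above can be sketched in a few lines of Python. This is a minimal illustration, not a production classifier: the toy word-list data, the function names, and the add-one (Laplace) smoothing in `cond_prob` are all my own assumptions, added so the example runs without zero probabilities.

```python
from collections import Counter, defaultdict

def train(samples, labels):
    """Estimate P(c_i) and P(w_j | c_i) from the training set."""
    class_counts = Counter(labels)
    n = len(labels)
    # P(c_i): fraction of training samples in each class.
    priors = {c: class_counts[c] / n for c in class_counts}

    # Count how often each feature appears within each class.
    feature_counts = defaultdict(Counter)
    for x, c in zip(samples, labels):
        feature_counts[c].update(x)

    def cond_prob(word, c):
        # P(w_j | c_i) with add-one smoothing (an assumption of this
        # sketch) so an unseen feature never zeroes out the product.
        return (feature_counts[c][word] + 1) / (class_counts[c] + 2)

    return priors, cond_prob

def classify(x, priors, cond_prob):
    """Pick the class maximizing P(c_i) * prod_j P(w_j | c_i); P(w) is ignored."""
    best_class, best_score = None, -1.0
    for c, prior in priors.items():
        score = prior
        for word in x:
            score *= cond_prob(word, c)
        if score > best_score:
            best_class, best_score = c, score
    return best_class

# Toy data: 6 samples of class 1 and 4 of class 0, matching the
# P(c_1) = 0.6, P(c_2) = 0.4 example from the text.
samples = [["cheap", "buy"], ["buy", "now"], ["cheap", "now"],
           ["cheap"], ["buy"], ["now", "cheap"],
           ["meeting"], ["report", "meeting"], ["report"], ["meeting", "report"]]
labels = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]

priors, cond_prob = train(samples, labels)
print(priors[1])                                      # 0.6
print(classify(["cheap", "buy"], priors, cond_prob))  # 1
```

Note that `classify` never computes P(w): since it is the same for every class, dropping it does not change which class wins, exactly as argued above.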


Of course, the more training samples there are, the more accurate the estimated feature probabilities become. That is the whole secret of naive Bayes.
