Why does naive Bayes have high bias and low variance?

Source: Internet
Author: User

First, you need to understand the relationship between the training set and the test set. In short, we learn a model on the training set and then apply it to the test set; how good the model is gets measured by its error rate on the test set. Most of the time, however, we can only assume that the test set follows the same data distribution as the training set, because the real test data is not available. So how do you estimate the test error rate when all you can see is the training error rate?

Because the training sample is small (or at least not large enough), the model learned from the training set is not always the true one. Even a 100% correct rate on the training set does not mean the model captures the real data distribution, and capturing the real distribution is our goal, not merely fitting the limited data points of the training set. Moreover, in practice training samples often contain some noise. If you chase perfection on the training set by adopting a very complex model, the model will treat the errors in the training set as characteristics of the real data distribution and so arrive at a wrong estimate of it. Such a model then performs badly on the real test set (this phenomenon is called overfitting). But you cannot use a model that is too simple either: when the data distribution is more complex, the model cannot describe it, which shows up as a high error rate even on the training set (this phenomenon is called underfitting). Overfitting indicates that the model is more complex than the real data distribution; underfitting indicates that it is simpler.
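The contrast between the two failure modes can be sketched with a toy experiment. Everything here is an assumption made for illustration: the one-dimensional data, the true rule "class 1 when x > 0.5", the 20% label noise, and the three hypothetical models (a constant predictor, a memorizer that copies the label of the nearest training point, and a single learned threshold).

```python
import random

def sample(n, rng, noise=0.2):
    # Toy data (an assumption): the true class is 1 when x > 0.5,
    # and each observed label is flipped with probability `noise`.
    data = []
    for _ in range(n):
        x = rng.random()
        y = 1 if x > 0.5 else 0
        if rng.random() < noise:
            y = 1 - y
        data.append((x, y))
    return data

def error_rate(predict, data):
    # Fraction of points whose predicted label differs from the observed one.
    return sum(predict(x) != y for x, y in data) / len(data)

rng = random.Random(0)
train = sample(200, rng)
test = sample(2000, rng)

# Too simple: always predict the majority training label (underfitting).
majority = 1 if 2 * sum(y for _, y in train) >= len(train) else 0
def constant_model(x):
    return majority

# Too complex: copy the label of the nearest training point, noise included (overfitting).
def memorizer(x):
    return min(train, key=lambda p: abs(p[0] - x))[1]

# About as complex as the true rule: one threshold, chosen to minimize training error.
candidates = [i / 20 for i in range(21)]
best_t = min(candidates, key=lambda t: error_rate(lambda x: 1 if x > t else 0, train))
def threshold_model(x):
    return 1 if x > best_t else 0

for name, model in [("constant", constant_model),
                    ("memorizer", memorizer),
                    ("threshold", threshold_model)]:
    print(f"{name:10s} train={error_rate(model, train):.3f} "
          f"test={error_rate(model, test):.3f}")
```

The memorizer reaches zero training error because every training point is its own nearest neighbor, yet it cannot do that on fresh data, since it has copied the flipped labels too. The constant model is poor on both sets, which is the signature of underfitting.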

In the framework of statistical learning, one common way to describe model complexity is the view that Error = Bias + Variance. The error here can be understood roughly as the model's prediction error rate, and it is made up of two parts: the inaccuracy that arises because the model is too simple (bias), and the sensitivity to the particular training sample that arises because the model is too complex (variance).
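"Error = Bias + Variance" is a shorthand. For squared-error loss the standard decomposition can be written out exactly, as sketched below. The notation is introduced here and is not from the original text: f is the true function, \hat{f} is the model fitted on a random training set, y = f(x) + \varepsilon with zero-mean noise of variance \sigma^2, and the expectation is taken over training sets and noise.

```latex
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{noise}}
```

For a classification error rate the split is not exactly additive in this way, so the formula is best read as intuition for the discussion here, not as a literal identity.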

With this, naive Bayes is easy to analyze. It simply assumes that the features are independent of one another given the class, which makes it a severely simplified model. For such a simple model, the bias part of the error is larger than the variance part in most situations; that is, it has high bias and low variance.
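One way to see how severe the simplification is: count the free parameters. The sketch below assumes binary features and compares naive Bayes with a model that stores a full probability table over every feature combination per class. The function names are hypothetical and exist only for this illustration.

```python
def naive_bayes_params(num_features, num_classes):
    # One Bernoulli parameter P(x_j = 1 | y) per feature and class,
    # plus (num_classes - 1) free class-prior parameters.
    return num_classes * num_features + (num_classes - 1)

def full_joint_params(num_features, num_classes):
    # A full table over all 2**d binary feature combinations per class
    # (2**d - 1 free probabilities each), plus the class priors.
    return num_classes * (2 ** num_features - 1) + (num_classes - 1)

for d in (5, 10, 20):
    print(d, naive_bayes_params(d, 2), full_joint_params(d, 2))
```

With so few parameters, each one is estimated from many training examples, so the fitted model changes little from one training set to another (low variance). The price is that the model cannot represent interactions between features, which is where the bias comes from.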

In practice, to make the error as small as possible, we need to balance bias against variance when choosing a model, that is, to balance overfitting against underfitting.
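A common way to strike this balance, which is not described in the original text, is to hold out part of the labeled data and pick the model complexity that does best on it. The sketch below is a minimal illustration under assumed toy data; the k-nearest-neighbor classifier and the candidate values of k are choices made here, with small k standing in for a flexible model and large k for a rigid one.

```python
import random

def sample(n, rng, noise=0.2):
    # Toy data (an assumption): the true class is 1 when x > 0.5,
    # and each observed label is flipped with probability `noise`.
    data = []
    for _ in range(n):
        x = rng.random()
        y = 1 if x > 0.5 else 0
        if rng.random() < noise:
            y = 1 - y
        data.append((x, y))
    return data

def knn_predict(train, x, k):
    # Majority vote among the k training points closest to x (k is odd, so no ties).
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    votes = sum(y for _, y in nearest)
    return 1 if 2 * votes > k else 0

def error_rate(train, data, k):
    # Error of the k-NN model built on `train`, measured on `data`.
    return sum(knn_predict(train, x, k) != y for x, y in data) / len(data)

rng = random.Random(1)
train = sample(200, rng)
valid = sample(200, rng)  # held out: used only to compare models, never to fit them

candidates = [1, 5, 15, 51, 199]  # small k = flexible model, large k = rigid model
valid_errors = {k: error_rate(train, valid, k) for k in candidates}
best_k = min(candidates, key=lambda k: valid_errors[k])

for k in candidates:
    print(f"k={k:3d} validation error={valid_errors[k]:.3f}")
print("selected k:", best_k)
```

The held-out error stands in for the test error that cannot be observed directly, so the selected k is the complexity level that neither memorizes the noise nor ignores the structure in this particular sample.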
