Understanding Bagging and Boosting


Bagging and boosting are two methods of ensemble learning. Their implementations are quite easy to understand, but the theoretical justification takes more work. Both methods are described below.

Ensemble learning combines multiple weak classifiers into a strong classifier in order to improve classification performance. Strictly speaking, ensemble learning is not a classifier itself but a method of combining classifiers.

1. Bagging

Bagging is one of the most basic ensemble learning methods. It was proposed to improve classifier performance, and it also works well on class-imbalance problems.


For example, the original dataset is randomly sampled (with replacement) T times, producing T sub-datasets of the same size as the original. A weak classifier is trained on each sub-dataset, and the T weak classifiers are then combined into a strong classifier.
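A minimal sketch of this procedure in Python, assuming scikit-learn decision stumps as the weak learners and majority voting as the combination rule (both choices are illustrative, not prescribed above):

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    def bagging_fit(X, y, T=10, seed=0):
        """Train T weak classifiers, each on a bootstrap sample of (X, y)."""
        rng = np.random.default_rng(seed)
        n = len(X)
        models = []
        for _ in range(T):
            idx = rng.integers(0, n, size=n)          # sample n indices with replacement
            model = DecisionTreeClassifier(max_depth=1)  # a weak learner (decision stump)
            model.fit(X[idx], y[idx])
            models.append(model)
        return models

    def bagging_predict(models, X):
        """Combine the weak classifiers by majority vote."""
        votes = np.stack([m.predict(X) for m in models])  # shape (T, n_samples)
        # per sample, take the most frequent label (assumes non-negative integer labels)
        return np.apply_along_axis(lambda col: np.bincount(col).argmax(), 0, votes)

Because each bootstrap sample omits some of the original points, the T classifiers differ from one another, which is what the vote averages out.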

A probabilistic interpretation and analysis of the effect of this random sampling follows.

This uses the bootstrap idea from probability theory: estimates from a small sample are inaccurate, but given modern computing power, repeated resampling and re-estimation can improve the precision obtained from a small sample.

The original small sample does not fully reflect the true distribution of the data; the T rounds of random resampling approximate that distribution.
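As a small, self-contained illustration of the bootstrap idea (the statistic, sample size, and number of resamples here are arbitrary choices for this sketch):

    import numpy as np

    rng = np.random.default_rng(0)
    sample = rng.normal(loc=5.0, scale=2.0, size=30)   # one small observed sample

    # repeatedly resample with replacement and re-estimate the statistic
    boot_means = [rng.choice(sample, size=sample.size, replace=True).mean()
                  for _ in range(1000)]
    print(np.mean(boot_means), np.std(boot_means))     # point estimate and its spread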

Write φ(x, L) for a weak classifier trained on one resampled dataset L. The strong (aggregated) classifier equals the expectation of the weak classifiers' estimates over the resampling:

    φ_A(x) = E_L[φ(x, L)]
The expected squared error between the true label y and a single weak classifier expands on the right-hand side as:

    E_L[(y − φ(x, L))²] = y² − 2y·E_L[φ(x, L)] + E_L[φ(x, L)²]
Because E_L[φ(x, L)²] ≥ (E_L[φ(x, L)])² = φ_A(x)², substituting into the expansion above gives:

    E_L[(y − φ(x, L))²] ≥ (y − φ_A(x))²

In other words, the expected error of a single weak classifier is at least the error of the strong classifier; in short, the strong classifier fits at least as well.
The upshot: provided the original data reflect the true distribution, a bagged ensemble can always improve performance. The size of the improvement depends on the stability of the base classifier: the less stable the classifier, the larger the improvement. Neural networks are an example of an unstable classifier.
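A quick numerical check of the inequality above, under toy assumptions (a scalar target with Gaussian prediction noise standing in for the weak classifiers):

    import numpy as np

    rng = np.random.default_rng(0)
    y = 1.0                                        # true target value
    phi = y + rng.normal(0.0, 0.5, size=10_000)    # simulated weak predictions φ(x, L)

    single_err = np.mean((y - phi) ** 2)           # E_L[(y − φ)²]
    agg_err = (y - phi.mean()) ** 2                # (y − φ_A)²
    print(single_err, agg_err, single_err >= agg_err)   # inequality holds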

Of course, the above assumes the data are close to the true distribution and that resampling draws each of the N samples with equal probability [1/N, 1/N, ..., 1/N].

If the training data do not reflect the true distribution, bagging may perform worse than not bagging.

The next question is how to combine the T weak classifiers into a strong classifier.

The simplest method is majority voting. For a test sample, each of the T weak classifiers outputs a class label, and the sample is assigned the class that receives the most votes. For example, with T = 10, the predicted labels might be [3, 3, 3, 3, 5, 5, 6, 7, 1, 8], in which case the sample is assigned class 3.
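A minimal sketch of the voting rule (the function name is illustrative):

    from collections import Counter

    def majority_vote(labels):
        # most_common(1) returns [(label, count)] for the most frequent label
        return Counter(labels).most_common(1)[0][0]

    print(majority_vote([3, 3, 3, 3, 5, 5, 6, 7, 1, 8]))   # prints 3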


2. Boosting

Like bagging, boosting obtains multiple weak classifiers by resampling and finally combines them into a strong classifier. The difference is that boosting combines the weak classifiers using weights.


The original post shows a boosting flowchart; the procedure can be briefly summarized as follows:

1. E represents the misclassification rate of a weak classifier. From it are computed the classifier's confidence weight α and the updated sampling weights D.

2. D represents the weight vector over the original data and is used for random sampling. Initially every sample has the same sampling probability, 1/m. After a weak classifier is trained, D is increased or decreased according to E, depending on whether each sample was classified wrongly or correctly. A misclassified sample's D rises, so it is more likely to be drawn in the next round, which increases the chance that the next classifier gets it right.

3. α is the credibility of a weak classifier. In bagging, α is effectively 1 for every classifier; in boosting, α is set according to each weak classifier's performance (lower E gives higher α) and determines that classifier's weight in the overall result. More accurate classifiers naturally receive more weight.

Finally, the overall result is obtained by combining each weak classifier's estimate h(x) according to its reliability α, as sketched below.
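A minimal AdaBoost-style sketch of steps 1-3. The text describes the updates only qualitatively, so the specific formulas below (α = ½·ln((1 − E)/E) and the exponential reweighting of D) are the standard AdaBoost choices; weighted fitting is used here in place of literal resampling by D, and labels in {−1, +1} are assumed:

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    def adaboost_fit(X, y, T=10):
        """Labels y are assumed to be in {-1, +1}. Returns (models, alphas)."""
        m = len(X)
        D = np.full(m, 1.0 / m)                      # start with equal weights 1/m
        models, alphas = [], []
        for _ in range(T):
            stump = DecisionTreeClassifier(max_depth=1)
            stump.fit(X, y, sample_weight=D)         # weighted fit (stand-in for resampling by D)
            pred = stump.predict(X)
            eps = D[pred != y].sum()                 # weighted error E of this weak classifier
            eps = np.clip(eps, 1e-10, 1.0 - 1e-10)   # guard against division by zero
            alpha = 0.5 * np.log((1.0 - eps) / eps)  # confidence weight: lower E -> higher alpha
            D *= np.exp(-alpha * y * pred)           # raise D on mistakes, lower it on hits
            D /= D.sum()                             # renormalize to a probability distribution
            models.append(stump)
            alphas.append(alpha)
        return models, alphas

    def adaboost_predict(models, alphas, X):
        # weighted vote: H(x) = sign(sum_t alpha_t * h_t(x))
        score = sum(a * m.predict(X) for m, a in zip(models, alphas))
        return np.sign(score)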


As the flowchart indicates, boosting has two main parts: updating the sampling weights D and computing the classifier weights α. The former gives previously misclassified samples a greater chance of appearing in the next classifier's training data, improving the probability that they are classified correctly later. The latter assigns different weights to different weak classifiers according to their performance, finally yielding a weighted strong classifier.

The probabilistic proof of boosting's effectiveness is omitted here; the conclusion is that continued iterative updating brings the final result arbitrarily close to the optimal classification. However, boosting tends to concentrate on persistently misclassified samples, so if the data contain outliers, boosting can perform poorly.

To summarize: this post discussed two ensemble learning methods, bagging and boosting. Boosting is somewhat like an improved version of bagging, adding the concepts of weighted sampling and a weighted strong classifier. Both are implemented through resampling and the fusion of weak classifiers.

