Statistical Learning Methods -- The Boosting Method and the AdaBoost Algorithm (Ensemble Learning)


1. Main content

This article first introduces ensemble learning, then describes the differences and connections between boosting and bagging, derives AdaBoost and GBDT, and finally compares random forests with GBDT.

2. Ensemble Learning

Ensemble learning accomplishes a learning task by building and combining multiple learners. The general structure is: first produce a set of "individual learners", then combine them with some strategy. The individual learners are produced by an existing algorithm, such as the C4.5 decision tree or the BP neural network. Depending on whether the individual learners are of the same type, ensembles are divided into homogeneous and heterogeneous ensembles: the individual learners in a homogeneous ensemble are called "base learners", and those in a heterogeneous ensemble are called "component learners".

Ensemble learning obtains strong performance by combining multiple learners. Usually each individual learner is a weak classifier: its results are not especially good, but they are better than random guessing (accuracy above 50% for binary classification). There are two main types of ensemble methods:

1) Boosting: strong dependencies exist between individual learners, which therefore must be generated serially; represented by AdaBoost and GBDT.

2) Bagging: individual learners are generated independently, with no dependence between them; represented by random forests.

3. Boosting and Bagging

Boosting works as follows: first train a base learner from the initial training set; then adjust the sample weights according to that learner's performance, so that the samples it misclassified receive more attention; then train the next base learner on the re-weighted samples. Repeat until a sufficient number of base learners has been produced.

Bagging works as follows: sample the training set with replacement (bootstrap sampling) to produce a new training set of the same size; repeat this to obtain N training sets, and train one base learner on each. The base learners are independent of each other, with no dependence between them.
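As an illustration, here is a minimal sketch of bagging with bootstrap sampling. It assumes scikit-learn decision trees as the base learners and labels in {-1, +1}; the function names are illustrative and not from the original text.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagging_fit(X, y, n_learners=25, seed=0):
    """Train each tree on its own bootstrap sample of the data."""
    rng = np.random.default_rng(seed)
    learners = []
    for _ in range(n_learners):
        idx = rng.integers(0, len(X), size=len(X))  # sample with replacement
        learners.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    return learners

def bagging_predict(learners, X):
    """Majority vote; assumes labels in {-1, +1}."""
    votes = np.stack([clf.predict(X) for clf in learners])
    return np.sign(votes.sum(axis=0))
```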

4. The AdaBoost Algorithm

Input: training data set $T = \{(x_1, y_1), (x_2, y_2), \ldots, (x_N, y_N)\}$, where $x_i \in \mathcal{X} \subseteq \mathbb{R}^n$ and $y_i \in \mathcal{Y} = \{-1, +1\}$; a weak learning algorithm.

Output: the final classifier $G(x)$.

(1) Initialize the weight distribution of the training data:

    $D_1 = (w_{11}, w_{12}, \ldots, w_{1i}, \ldots, w_{1N}), \qquad w_{1i} = \frac{1}{N}, \quad i = 1, 2, \ldots, N$

(2) To generate $M$ base learners, for $m = 1, 2, \ldots, M$:

(a) Train a base learner using the training data weighted by distribution $D_m$:

    $G_m(x): \mathcal{X} \to \{-1, +1\}$

(b) Compute the classification error rate of $G_m(x)$ on the training data set:

    $e_m = \sum_{i=1}^{N} P(G_m(x_i) \neq y_i) = \sum_{i=1}^{N} w_{mi} \, I(G_m(x_i) \neq y_i)$

The error rate is computed differently for different kinds of problems; the formula above is for classification. It can equivalently be written as:

    $e_m = \sum_{G_m(x_i) \neq y_i} w_{mi}$

This expression shows that the classification error rate is the sum of the weights of the samples misclassified by $G_m(x)$, making the relationship between the error rate and the sample weights explicit. In practice one must check whether the error rate is below 0.5: if $e_m \ge 0.5$, the base learner is no better than random guessing, and adding it to the final classifier would seriously degrade performance, so that learner must be discarded.

(c) Compute the coefficient of the base learner $G_m(x)$:

    $\alpha_m = \frac{1}{2} \ln \frac{1 - e_m}{e_m}$

The coefficient of the base learner $G_m(x)$ is half the logarithm of the ratio of the probability of correct classification to the probability of misclassification. The smaller $e_m$ is, the larger $\alpha_m$ becomes: base learners with lower classification error rates play a larger role in the final classifier, which is why the final classifier performs strongly. At the same time, if $e_m \ge 0.5$, then $\alpha_m \le 0$, which is not allowed; this is why base learners with an error rate of 0.5 or more are discarded, as noted above.
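As a quick numeric check: a base learner with $e_m = 0.3$ receives coefficient $\alpha_m = \frac{1}{2}\ln\frac{0.7}{0.3} \approx 0.42$; $e_m = 0.5$ gives $\alpha_m = 0$, and any $e_m > 0.5$ gives a negative coefficient.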

(d) Update the weight distribution of the training data set:

    $D_{m+1} = (w_{m+1,1}, \ldots, w_{m+1,i}, \ldots, w_{m+1,N})$

    $w_{m+1,i} = \frac{w_{mi}}{Z_m} \exp\left(-\alpha_m \, y_i \, G_m(x_i)\right), \quad i = 1, 2, \ldots, N$

where $Z_m$ is the normalization factor:

    $Z_m = \sum_{i=1}^{N} w_{mi} \exp\left(-\alpha_m \, y_i \, G_m(x_i)\right)$

From the weight-update formula it can be seen that the weight of a correctly classified sample is decreased (multiplied by $e^{-\alpha_m} < 1$), while the weight of a misclassified sample is increased (multiplied by $e^{\alpha_m} > 1$). Misclassified samples therefore receive more attention in the next round; if a sample keeps being misclassified, the error rate grows large, which forces the next base learner to change its classification strategy to bring the error rate down. By updating weights rather than changing the training data itself, AdaBoost lets the same training data play different roles at different stages of learning, which is a distinctive feature of the algorithm.
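Continuing the $e_m = 0.3$ example: before normalization, correctly classified samples have their weights multiplied by $e^{-0.42} \approx 0.65$ and misclassified samples by $e^{0.42} \approx 1.53$. Here $Z_m \approx 0.7 \times 0.65 + 0.3 \times 1.53 \approx 0.92$, and after dividing by it the misclassified samples carry exactly half of the total weight, which is a general property of the AdaBoost update.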

(3) Construct a linear combination of the base classifiers:

    $f(x) = \sum_{m=1}^{M} \alpha_m G_m(x)$

and obtain the final classifier:

    $G(x) = \operatorname{sign}(f(x)) = \operatorname{sign}\left( \sum_{m=1}^{M} \alpha_m G_m(x) \right)$

It is important to note that the $\alpha_m$ here do not sum to 1; they indicate the importance of each base classifier. The sign of $f(x)$ indicates the class of $x$, and $|f(x)|$ indicates the confidence of the classification: the larger it is, the more reliable the result.
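To make the steps above concrete, here is a minimal sketch of the binary AdaBoost loop, using depth-1 decision trees (stumps) from scikit-learn as the weak learners. The function names are illustrative, and labels are assumed to be numpy arrays with values in {-1, +1}.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_fit(X, y, M=50):
    """Binary AdaBoost; labels y must be in {-1, +1}."""
    n = len(X)
    w = np.full(n, 1.0 / n)              # (1) initialize weights w_1i = 1/N
    learners, alphas = [], []
    for _ in range(M):
        # (a) train a weak learner (a decision stump) on the weighted data
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y, sample_weight=w)
        pred = stump.predict(X)
        # (b) weighted classification error rate e_m
        e = w[pred != y].sum()
        if e >= 0.5:                     # no better than random guessing: discard
            break
        # (c) coefficient alpha_m = (1/2) ln((1 - e_m) / e_m)
        alpha = 0.5 * np.log((1.0 - e) / max(e, 1e-12))
        # (d) re-weight the samples and normalize by Z_m
        w = w * np.exp(-alpha * y * pred)
        w = w / w.sum()
        learners.append(stump)
        alphas.append(alpha)
    return learners, alphas

def adaboost_predict(learners, alphas, X):
    """(3) Final classifier G(x) = sign(sum_m alpha_m * G_m(x))."""
    f = sum(a * clf.predict(X) for a, clf in zip(alphas, learners))
    return np.sign(f)
```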

For AdaBoost regression, the following is taken from http://www.cnblogs.com/pinard/p/6133937.html

The regression problem differs from the classification problem mainly in how the error rate is computed: classification generally uses the 0/1 loss function, while regression usually uses the squared or linear loss function, so the formulas below differ considerably from the classification case.

For the $k$-th weak learner, compute its maximum error on the training set:

    $E_k = \max_{1 \le i \le N} |y_i - G_k(x_i)|$

Then compute the relative error of each sample:

    $e_{ki} = \frac{|y_i - G_k(x_i)|}{E_k}$

This is the linear error; if the squared error is used, then:

    $e_{ki} = \frac{(y_i - G_k(x_i))^2}{E_k^2}$

If the exponential error function is used, then:

    $e_{ki} = 1 - \exp\!\left(-\frac{|y_i - G_k(x_i)|}{E_k}\right)$

Finally, the error rate of the $k$-th weak learner is obtained:

    $e_k = \sum_{i=1}^{N} w_{ki} \, e_{ki}$

Given the error rate, compute the weight coefficient of the learner:

    $\alpha_k = \frac{e_k}{1 - e_k}$

After computing the coefficient, update the sample weights to obtain $D_{k+1}$:

    $w_{k+1,i} = \frac{w_{ki}}{Z_k} \, \alpha_k^{1 - e_{ki}}, \qquad Z_k = \sum_{i=1}^{N} w_{ki} \, \alpha_k^{1 - e_{ki}}$

Finally, the weak learners are combined into the strong regressor; in AdaBoost.R2 this is a weighted median rather than a simple linear combination:

    $f(x) = G_{k^*}(x)$

where $k^*$ is the index whose $\ln \frac{1}{\alpha_k}$, $k = 1, 2, \ldots, K$, is the (weighted) median.
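In practice this procedure need not be hand-rolled: scikit-learn's AdaBoostRegressor implements AdaBoost.R2, with a loss parameter matching the three relative-error definitions above. A minimal usage sketch (parameter values are illustrative):

```python
from sklearn.ensemble import AdaBoostRegressor
from sklearn.tree import DecisionTreeRegressor

# loss may be "linear", "square", or "exponential";
# note: in scikit-learn < 1.2 the first argument is named base_estimator
reg = AdaBoostRegressor(
    estimator=DecisionTreeRegressor(max_depth=3),
    n_estimators=100,
    loss="linear",
)
# reg.fit(X_train, y_train) and reg.predict(X_test) would follow,
# given training arrays X_train, y_train and a test array X_test
```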

Other AdaBoost variants can be consulted at: http://www.cnblogs.com/jcchen1987/p/4581651.html

  

The following is a summary of the pros and cons of the AdaBoost algorithm.

The main advantages of AdaBoost are:

1) As a classifier, AdaBoost achieves high classification accuracy.

2) The AdaBoost framework is flexible: various classification and regression models can be used to construct the weak learners.

3) As a simple binary classifier, its structure is simple and its results are easy to interpret.

4) It is not prone to overfitting.

The main drawbacks of AdaBoost are:

1) It is sensitive to outliers: anomalous samples may receive ever-higher weights during iteration, which degrades the prediction accuracy of the final strong learner.

5. Random Forest

The typical representative of bagging. (To be continued.)
