AdaBoost Study Summary


The basic background material can be found online.

I. Origins

The boosting family of algorithms originates from PAC learnability (Probably Approximately Correct learning). This body of theory focuses on when a problem can be learned and, of course, also explores specific algorithms that can be used for learnable problems. The theory was proposed by Valiant, for which (together with other contributions) he won the 2010 Turing Award.

PAC theory defines the strength of a learning algorithm:

Weak learning algorithm: one whose error rate is only slightly less than 1/2 (that is, its accuracy is only slightly better than random guessing).

Strong learning algorithm: one that achieves high accuracy and runs in polynomial time.

At the same time, Kearns and Valiant first raised the question of whether weak and strong learning algorithms are equivalent in the PAC learning model: can any weak learning algorithm that is slightly better than random guessing be boosted into a strong learning algorithm? If the two are equivalent, then it suffices to find a weak learner that is slightly better than random guessing and boost it, instead of searching for strong learning algorithms that are hard to obtain directly. That conjecture spurred many researchers to design algorithms to test the theory.

But for a long time there was no practical way to realize this ideal. Details determine success or failure, and a good theory needs an effective algorithm to carry it out. In the end the effort paid off: Schapire, together with Freund, proposed an effective algorithm that truly realized this long-cherished wish, and its name is AdaBoost. AdaBoost combines a number of different decision trees in a non-random way and shows amazing performance. First, it greatly improves the accuracy of decision trees, making them comparable to SVMs. Second, it is fast, and essentially no parameters need tuning. Third, it shows almost no overfitting. I imagine Breiman and Friedman must have been pleased, because their CART was now being compared with SVMs; AdaBoost brought decision trees back from the dead! Breiman could not help praising AdaBoost in his paper as the best "off-the-shelf" method.

II. The AdaBoost Algorithm

AdaBoost, short for "Adaptive Boosting", was proposed by Yoav Freund and Robert Schapire in 1995. AdaBoost is an iterative algorithm whose core idea is to train different classifiers (weak classifiers) on the same training set and then assemble these weak classifiers into a stronger final classifier (the strong classifier). The algorithm works by changing the distribution of the data: the weight of each sample is determined by whether it was classified correctly in the previous round and by the accuracy of the previous overall classification. The data set with the modified weights is passed to the next classifier for training, and finally the classifiers obtained in each round are fused into the final decision classifier. Using AdaBoost, one can filter out some unnecessary features of the training data and focus attention on the key training data.

(Many blog posts summarize this idea with the saying that "three cobblers with their wits combined equal one Zhuge Liang": several weak learners, combined, can match a strong one.)

The entire AdaBoost iterative algorithm has three steps (a minimal code sketch follows the list):

    1. Initialize the weight distribution of the training data. If there are N samples, each training sample is given the same weight at the very beginning: 1/N.
    2. Train weak classifiers. During training, if a sample point is classified correctly, its weight is decreased when constructing the next training set; conversely, if a sample point is misclassified, its weight is increased. The re-weighted sample set is then used to train the next classifier, and the whole training process continues iteratively in this way.
    3. Combine the weak classifiers from each round into a strong classifier. After training finishes, the weight of a weak classifier with a small classification error rate is enlarged, so it plays a larger role in the final classification function, while the weight of a weak classifier with a large error rate is reduced, so it plays a smaller role. In other words, a weak classifier with a low error rate occupies a larger weight in the final classifier, and vice versa.
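
As a rough illustration of these three steps, here is a minimal sketch in Python (a toy implementation assuming NumPy, labels in {+1, -1}, and simple single-feature threshold "stumps" as the weak classifiers; the function names are illustrative, not from any particular library):

    import numpy as np

    def stump_predict(X, feature, threshold, polarity):
        # Predict +1/-1 by thresholding a single feature (a "decision stump").
        return np.where(polarity * X[:, feature] < polarity * threshold, 1.0, -1.0)

    def fit_stump(X, y, w):
        # Pick the stump (feature, threshold, polarity) with the lowest weighted error.
        best, best_err = None, np.inf
        for feature in range(X.shape[1]):
            for threshold in np.unique(X[:, feature]):
                for polarity in (1.0, -1.0):
                    pred = stump_predict(X, feature, threshold, polarity)
                    err = w[pred != y].sum()          # weighted error: sum of weights of misclassified points
                    if err < best_err:
                        best_err, best = err, (feature, threshold, polarity)
        return best, best_err

    def adaboost_fit(X, y, n_rounds=3):
        # X: (n, d) float array; y: (n,) array of +1/-1 labels.
        n = len(y)
        w = np.full(n, 1.0 / n)                       # step 1: uniform initial weights 1/N
        ensemble = []
        for _ in range(n_rounds):
            (f, t, p), eps = fit_stump(X, y, w)       # step 2: train a weak classifier on the weighted data
            alpha = 0.5 * np.log((1 - eps) / max(eps, 1e-12))   # weight of this weak classifier
            pred = stump_predict(X, f, t, p)
            w = w * np.exp(-alpha * y * pred)         # raise weights of misclassified points, lower the rest
            w = w / w.sum()                           # renormalize so the weights sum to 1
            ensemble.append((alpha, f, t, p))
        return ensemble

    def adaboost_predict(ensemble, X):
        # Step 3: the final strong classifier is the sign of the alpha-weighted vote.
        score = sum(a * stump_predict(X, f, t, p) for a, f, t, p in ensemble)
        return np.sign(score)

Calling adaboost_fit(X, y) and then adaboost_predict on new points mirrors the three steps above.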

III. A Specific Example of the AdaBoost Procedure

Perhaps the introduction above still leaves you foggy about the AdaBoost algorithm. That does not matter: an expert has given a very simple example, and once you have seen it the algorithm as a whole will be very clear.

Let's take a simple example to see the implementation of AdaBoost:

In the diagrams, "+" and "-" represent the two classes, and we use horizontal or vertical lines as the weak classifiers that separate them.

If the final output is 1, the point is classified as "+"; if it is -1, the point is classified as "-".

Step One:

Based on the classification accuracy, we obtain a new sample distribution D2 and a sub-classifier H1.

The circled samples are the ones that were misclassified. In the diagram on the right, those samples are drawn with a larger "+" to indicate their increased weight.

"Calculation of ε and α"

1) At the beginning the distribution is uniform over the 10 points: m = 10, so the weights are D1(i) = 1/m = 1/10 = 0.1.

2) The error ε1 equals the sum of the weights of the misclassified points: ε1 = 0.1 + 0.1 + 0.1 = 0.3.

3) α1 is computed from the formula α = (1/2) ln((1 - ε)/ε), giving α1 = (1/2) ln(0.7/0.3) ≈ 0.42.

4) Update the weights; the weights of the misclassified points increase. Each weight is multiplied by exp(-α_t * y_i * h_t(x_i)). When y_i ≠ h_t(x_i), the true label y and the hypothesis h disagree (for example y = 1 and h = -1), meaning the point was misclassified, so the factor is exp(α_t) and the weight increases; when they agree, the factor is exp(-α_t) and the weight decreases. So:

The three misclassified "+" points: 0.1 * e^0.42 ≈ 0.152.

The seven correctly classified "+" and "-" points: 0.1 * e^(-0.42) ≈ 0.066.

Since the distribution must sum to 1, the weights are normalized: each misclassified point gets 0.152 / (0.152*3 + 0.066*7) ≈ 0.17, and each correct point gets 0.066 / (0.152*3 + 0.066*7) ≈ 0.07.

Check: 0.17*3+0.07*7=1
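
These round-1 numbers can be reproduced with a few lines of Python (a toy check of the arithmetic above for the 10-point example with three misclassified points):

    import math

    m = 10                            # ten training points
    w1 = 1.0 / m                      # step 1: uniform weight 0.1 per point
    n_wrong = 3                       # H1 misclassifies three '+' points

    eps1 = n_wrong * w1                              # weighted error = 0.3
    alpha1 = 0.5 * math.log((1 - eps1) / eps1)       # ≈ 0.42

    up = w1 * math.exp(alpha1)        # misclassified points: weight raised to ≈ 0.152
    down = w1 * math.exp(-alpha1)     # correct points: weight lowered to ≈ 0.066
    Z = n_wrong * up + (m - n_wrong) * down          # normalizer ≈ 0.92

    print(round(alpha1, 2), round(up / Z, 2), round(down / Z, 2))   # 0.42 0.17 0.07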

Step Two:

Based on the classification accuracy, we obtain a new sample distribution D3 and a sub-classifier H2.

"Calculation of ε and α"

At this point the weights are: two "+" at 0.07, two "-" at 0.07, three "+" at 0.17, and three "-" at 0.07.

The three points misclassified by H2 were classified correctly in the previous round, so each carries weight 0.07, and the new error is ε2 = 0.07 * 3 = 0.21.

α2 is calculated to be about 0.65.

Update the weights and renormalize them so they sum to 1 (e^0.65 = 1.91554; e^(-0.65) = 0.522):

Misclassified points (three of them): 0.07 * 1.91554 ≈ 0.134 each.

Correctly classified points: four at 0.07 * 0.522 ≈ 0.03654 each; three at 0.17 * 0.522 ≈ 0.08874 each.

Normalizer: 3*0.134 + 4*0.03654 + 3*0.08874 = 0.81438.

0.134 / 0.81438 ≈ 0.1645 (the three misclassified "-" points)

0.03654 / 0.81438 ≈ 0.044868 (the remaining four 0.07-weight points: two "+" and two "-")

0.08874 / 0.81438 ≈ 0.108966 (the three 0.17-weight "+" points)

Check: 0.1645*3 + 0.044868*4 + 0.108966*3 = 0.99987 ≈ 1
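
The round-2 numbers can be checked the same way (again only a toy verification; note that the exact formula gives α2 ≈ 0.66, which the worked example rounds to 0.65):

    import math

    # Weights entering round 2: seven points at ≈ 0.07 and three at ≈ 0.17.
    eps2 = 3 * 0.07                   # H2 misclassifies three of the 0.07-weight points -> 0.21
    alpha2 = 0.65                     # 0.5 * math.log((1 - eps2) / eps2) ≈ 0.66; the example uses 0.65

    wrong   = [0.07 * math.exp(alpha2)] * 3           # raised to ≈ 0.134 each
    right07 = [0.07 * math.exp(-alpha2)] * 4          # lowered to ≈ 0.0365 each
    right17 = [0.17 * math.exp(-alpha2)] * 3          # lowered to ≈ 0.0887 each
    Z = sum(wrong + right07 + right17)                # normalizer ≈ 0.814

    print([round(x / Z, 3) for x in (wrong[0], right07[0], right17[0])])   # [0.165, 0.045, 0.109]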

Step Three:

Get a sub-classifier H3.

"Calculation of ε and α"

ε3 = 0.044868 * 3 = 0.1346 ≈ 0.14

α3 is calculated to be about 0.92.

Finally, all the sub-classifiers are combined:

This gives the integrated result. From the result we can see that even such simple classifiers, once combined, can achieve a good classification effect. That is the whole example.
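
As a sketch of that final combination, using the α values 0.42, 0.65, and 0.92 computed above (the three line classifiers below are hypothetical stand-ins, since the original diagrams are not reproduced here):

    alphas = [0.42, 0.65, 0.92]       # classifier weights from the three rounds above

    def strong_classify(point, weak_classifiers, alphas):
        # H(x) = sign( sum_t alpha_t * h_t(x) ): an alpha-weighted vote of the weak classifiers.
        score = sum(a * h(point) for a, h in zip(alphas, weak_classifiers))
        return 1 if score >= 0 else -1

    # Hypothetical stand-ins for the line classifiers H1, H2, H3 from the diagrams;
    # the thresholds are purely illustrative, not taken from the original figures.
    H1 = lambda p: 1 if p[0] < 2 else -1      # a vertical line
    H2 = lambda p: 1 if p[0] < 8 else -1      # another vertical line
    H3 = lambda p: 1 if p[1] > 5 else -1      # a horizontal line

    print(strong_classify((1.0, 6.0), [H1, H2, H3], alphas))   # 0.42 + 0.65 + 0.92 > 0, so output 1, i.e. "+"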

By now you probably have a general understanding of the AdaBoost algorithm. But you may still have a question: why must each iteration increase the weights of the misclassified points? What good does that do? Would it not work otherwise?

Look at the AdaBoost algorithm again. Note the final form of the strong classifier: it is a weighted combination of the weak classifiers, where the weight of each weak classifier is obtained from its error as α_t = (1/2) ln((1 - ε_t)/ε_t). Here is the clearer answer: everything points back to the error. If we raise the weights of the misclassified points and the next classifier gets those same points wrong again, its overall weighted error rate increases (because the misclassified points now carry more weight, their mistakes contribute more to the error), which makes its α smaller and gives it a lower weight in the final combined classifier. In other words, the algorithm lets good classifiers take a larger share of the overall weight, while poorer classifiers get a smaller one. That is a very sensible piece of logic. Once you see this, I think you have a thorough understanding of AdaBoost.
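
To see this relationship concretely, here is a small illustrative computation of α = (1/2) ln((1 - ε)/ε) for a few error values:

    import math

    for eps in (0.1, 0.2, 0.3, 0.4, 0.49):
        alpha = 0.5 * math.log((1 - eps) / eps)
        print(f"error {eps:.2f} -> classifier weight alpha = {alpha:.2f}")
    # As the error approaches 1/2 (random guessing), alpha approaches 0, so that weak
    # classifier contributes almost nothing; low-error classifiers receive a large alpha.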

IV. For the analysis of the values of α and ε, see Chapter II of "Boosting Methods for Automatic Segmentation of Focal Liver Lesions"; it is very elegant.

V. An article to read later: http://blog.csdn.net/tiandijun/article/details/48036025
