Summary of the AdaBoost Algorithm Principle in Ensemble Learning


In the summary of ensemble learning principles, we discussed that there are two kinds of dependencies between individual learners: the first is a strong dependency between the individual learners, and the other is no strong dependency between them. The representative algorithms of the former are the boosting family, and among the boosting algorithms AdaBoost is one of the most famous. AdaBoost can be used for either classification or regression. In this article, we summarize the AdaBoost algorithm.

1. Review the fundamentals of the boosting algorithm

In the summary of ensemble learning principles, we already covered the basic idea of the boosting family of algorithms, which can be described as follows:

The working mechanism of a boosting algorithm is: train weak learner 1 from the training set with initial weights, then update the training-sample weights according to the learning error of weak learner 1, so that the samples that weak learner 1 got wrong receive higher weights and therefore get more attention from weak learner 2. Weak learner 2 is then trained on the re-weighted training set. This process repeats until the number of weak learners reaches the pre-specified number $T$; finally the $T$ weak learners are integrated through a combination strategy to obtain the final strong learner.

However, there are several specific questions that the boosting framework does not answer.

1) How to calculate the learning error rate $e$?

2) How to obtain the weak learner weight coefficient $\alpha$?

3) How to update the sample weights $D$?

4) What combination strategy is used?

Any algorithm in the boosting family must address these four questions. So how does AdaBoost solve them?

2. Basic idea of AdaBoost algorithm

Here we explain how AdaBoost solves the four questions from the previous section.

Suppose our training set is $$T = \{(x_1,y_1), (x_2,y_2), \ldots, (x_m,y_m)\}$$

The weight distribution over the training set for the $k$-th weak learner is $$D(k) = (w_{k1}, w_{k2}, \ldots, w_{km}); \;\; w_{1i}=\frac{1}{m};\;\; i =1,2,\ldots,m$$

First, let's look at AdaBoost for classification.

The error rate of a classification problem is easy to understand and to compute. Since multi-class classification is a generalization of binary classification, assume we have a binary classification problem with outputs in $\{-1,1\}$. Then the $k$-th weak classifier $G_k(x)$ has a weighted error rate on the training set of $$e_k = P(G_k(x_i) \neq y_i) = \sum\limits_{i=1}^{m} w_{ki} I(G_k(x_i) \neq y_i)$$

Next we look at the weight coefficient of the weak learner. For the binary classification problem, the $k$-th weak classifier $G_k(x)$ has a weight coefficient of $$\alpha_k = \frac{1}{2}\log\frac{1-e_k}{e_k}$$

Why is the weight coefficient of the weak learner computed this way? From the formula above, if the classification error rate $e_k$ is larger, then the corresponding weak classifier weight coefficient $\alpha_k$ is smaller. In other words, the lower the error rate, the larger the weight coefficient of the weak classifier. Exactly why this formula is used will be explained when we discuss the optimization of the AdaBoost loss function.

The third issue is updating the sample weights $D$. Assuming the sample weight distribution for the $k$-th weak classifier is $D(k) = (w_{k1}, w_{k2}, \ldots, w_{km})$, the sample weights for the $(k+1)$-th weak classifier are $$w_{k+1,i} = \frac{w_{ki}}{Z_k}\exp(-\alpha_k y_i G_k(x_i))$$

Here $Z_k$ is the normalization factor $$Z_k = \sum\limits_{i=1}^{m} w_{ki}\exp(-\alpha_k y_i G_k(x_i))$$

From the formula for $w_{k+1,i}$ we can see that if the $i$-th sample is misclassified, then $y_i G_k(x_i) < 0$, so the weight of that sample increases for the $(k+1)$-th weak classifier; if it is classified correctly, its weight decreases for the $(k+1)$-th weak classifier. Exactly why this weight-update formula is used will also be explained when we discuss the optimization of the AdaBoost loss function.

The last issue is the combination strategy. AdaBoost classification uses the weighted average method, and the final strong classifier is $$f(x) = \mathrm{sign}\left(\sum\limits_{k=1}^{K}\alpha_k G_k(x)\right)$$
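
To make these four answers concrete, here is a minimal NumPy sketch of a single AdaBoost classification round; the labels, weak-classifier predictions, and weights are made up purely for illustration:

```python
import numpy as np

# Toy binary labels in {-1, +1} and predictions from a hypothetical weak classifier G_k
y   = np.array([ 1, -1,  1,  1, -1])
G_k = np.array([ 1,  1,  1, -1, -1])      # samples 2 and 4 are misclassified
w_k = np.full(5, 1.0 / 5)                 # initial weights w_{1i} = 1/m

# 1) weighted error rate e_k = sum_i w_ki * I(G_k(x_i) != y_i)
e_k = np.sum(w_k * (G_k != y))

# 2) weak classifier coefficient alpha_k = 1/2 * log((1 - e_k) / e_k)
alpha_k = 0.5 * np.log((1 - e_k) / e_k)

# 3) weight update w_{k+1,i} = w_ki * exp(-alpha_k * y_i * G_k(x_i)) / Z_k
w_next = w_k * np.exp(-alpha_k * y * G_k)
w_next /= w_next.sum()                    # Z_k is the normalization factor

# 4) combination: sign of the alpha-weighted vote
# (with a single weak learner so far this is just sign(alpha_1 * G_1(x)))
f_x = np.sign(alpha_k * G_k)

print(e_k, alpha_k, w_next, f_x)
```

Running several such rounds and accumulating $\alpha_k G_k(x)$ before taking the sign gives exactly the strong classifier above.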

    

Then we look at AdaBoost for regression. Since there are many variants of the AdaBoost regression problem, we take the AdaBoost R2 algorithm as the standard here.

Let's first look at the error rate of the regression problem. For the $k$-th weak learner, compute its maximum error on the training set: $$E_k = \max|y_i - G_k(x_i)| \;\; i=1,2,\ldots,m$$

Then compute the relative error of each sample: $$e_{ki} = \frac{|y_i - G_k(x_i)|}{E_k}$$

This is the case when the error is measured linearly. If we use the squared error, then $e_{ki}= \frac{(y_i-G_k(x_i))^2}{E_k^2}$; if we use the exponential error, then $e_{ki}= 1 - \exp\left(\frac{-|y_i - G_k(x_i)|}{E_k}\right)$

Finally, the error rate of the $k$-th weak learner is $$e_k = \sum\limits_{i=1}^{m} w_{ki} e_{ki}$$

Now let's look at how to obtain the weak learner weight coefficient $\alpha$. Here it is: $$\alpha_k =\frac{e_k}{1-e_k}$$

For updating the sample weights $D$, the sample weights for the $(k+1)$-th weak learner are $$w_{k+1,i} = \frac{w_{ki}}{Z_k}\alpha_k^{1-e_{ki}}$$

Here $Z_k$ is the normalization factor $$Z_k = \sum\limits_{i=1}^{m} w_{ki}\alpha_k^{1-e_{ki}}$$

Finally, the combination strategy. As with the classification problem, it is a weighted average method; the final strong regressor is $$f(x) = \sum\limits_{k=1}^{K}\left(\ln\frac{1}{\alpha_k}\right) G_k(x)$$
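
As a sanity check on these formulas, here is a minimal NumPy sketch of one AdaBoost R2 round with the linear error; the targets and weak-learner predictions are invented for illustration, and the combination weight follows the $\ln\frac{1}{\alpha_k}$ formula given above:

```python
import numpy as np

# Toy regression targets and predictions of a hypothetical weak regressor G_k
y   = np.array([1.0, 2.0, 3.0, 4.0])
G_k = np.array([1.2, 1.5, 3.1, 3.0])
w_k = np.full(4, 1.0 / 4)                 # initial weights w_{1i} = 1/m

# Maximum error E_k and linear relative errors e_ki
E_k  = np.max(np.abs(y - G_k))
e_ki = np.abs(y - G_k) / E_k
# squared error would be ((y - G_k) ** 2) / E_k ** 2,
# exponential error 1 - np.exp(-np.abs(y - G_k) / E_k)

# Regression error rate and weak learner coefficient
e_k     = np.sum(w_k * e_ki)
alpha_k = e_k / (1 - e_k)

# Weight update w_{k+1,i} = w_ki * alpha_k ** (1 - e_ki) / Z_k
w_next = w_k * alpha_k ** (1 - e_ki)
w_next /= w_next.sum()

# Each learner enters the final combination with weight ln(1 / alpha_k)
print(e_k, alpha_k, np.log(1 / alpha_k), w_next)
```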

3. Loss function optimization for adaboost classification problem

In the previous section, we gave the weak learner weight coefficient formula and the sample weight update formula for AdaBoost classification, but did not explain why these particular formulas were chosen, which can make them feel like magic. In fact, they can be derived from the AdaBoost loss function.

From another point of view, AdaBoost is an additive model whose learning algorithm is the forward stagewise learning algorithm and whose loss function is the exponential loss for the classification problem.

The additive model part is easy to understand: our final strong classifier is obtained as a weighted average of several weak classifiers.

The forward stagewise learning algorithm is also easy to understand: the algorithm learns one weak learner per round and uses the result of the previous rounds to update the training-set weights for the next weak learner. In other words, the strong learner of round $k-1$ is $$f_{k-1}(x) = \sum\limits_{i=1}^{k-1}\alpha_i G_{i}(x)$$

and the strong learner of round $k$ is $$f_{k}(x) = \sum\limits_{i=1}^{k}\alpha_i G_{i}(x)$$

Comparing the two expressions gives $$f_{k}(x) = f_{k-1}(x) + \alpha_k G_k(x)$$

So the strong learner is indeed obtained step by step through the forward stagewise learning algorithm.

The AdaBoost loss function is the exponential loss; that is, the optimization objective is $$\underbrace{\arg\min}_{\alpha, G} \sum\limits_{i=1}^{m}\exp(-y_i f_{k}(x_i))$$

Using the relationship of the forward stagewise learning algorithm, the loss function becomes $$(\alpha_k, G_k(x)) = \underbrace{\arg\min}_{\alpha, G}\sum\limits_{i=1}^{m}\exp\left[-y_i\left(f_{k-1}(x_i) + \alpha G(x_i)\right)\right]$$

Let $w_{ki}^{'} = \exp(-y_i f_{k-1}(x_i))$. Its value does not depend on $\alpha$ or $G$, so it is irrelevant to the minimization; it depends only on $f_{k-1}(x)$ and changes with each iteration.

Substituting this into the loss function, it becomes $$(\alpha_k, G_k(x)) = \underbrace{\arg\min}_{\alpha, G}\sum\limits_{i=1}^{m} w_{ki}^{'}\exp\left[-y_i\alpha G(x_i)\right]$$

First, we solve for $G_k(x)$, which gives $$G_k(x) = \underbrace{\arg\min}_{G}\sum\limits_{i=1}^{m} w_{ki}^{'} I(y_i \neq G(x_i))$$

Substituting $G_k(x)$ into the loss function, taking the derivative with respect to $\alpha$, and setting it to 0, we get $$\alpha_k = \frac{1}{2}\log\frac{1-e_k}{e_k}$$

where $e_k$ is the classification error rate from before: $$e_k = \frac{\sum\limits_{i=1}^{m} w_{ki}^{'} I(y_i \neq G(x_i))}{\sum\limits_{i=1}^{m} w_{ki}^{'}} = \sum\limits_{i=1}^{m} w_{ki} I(y_i \neq G(x_i))$$
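
For completeness, here is the omitted intermediate step (a brief sketch, assuming the weights $w_{ki}^{'}$ have been normalized to sum to 1 as above): splitting the sum over correctly and incorrectly classified samples gives $$L(\alpha) = \sum\limits_{y_i = G_k(x_i)} w_{ki}^{'} e^{-\alpha} + \sum\limits_{y_i \neq G_k(x_i)} w_{ki}^{'} e^{\alpha} = (1-e_k)e^{-\alpha} + e_k e^{\alpha}$$ Setting $\frac{\partial L}{\partial \alpha} = -(1-e_k)e^{-\alpha} + e_k e^{\alpha} = 0$ gives $e^{2\alpha} = \frac{1-e_k}{e_k}$, which is exactly $\alpha_k = \frac{1}{2}\log\frac{1-e_k}{e_k}$.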

Finally, let's look at the update of the sample weights. Using $f_{k}(x) = f_{k-1}(x) + \alpha_k G_k(x)$ and $w_{ki}^{'} = \exp(-y_i f_{k-1}(x_i))$, we get $$w_{k+1,i}^{'} = w_{ki}^{'}\exp\left[-y_i \alpha_k G_k(x_i)\right]$$

Dividing by the normalization factor, this gives us the sample weight update formula from Section 2.

4. AdaBoost binary classification algorithm flow

Here we summarize the algorithm flow of AdaBoost for binary classification.

Input: sample set $T=\{(x_1,y_1), (x_2,y_2), \ldots, (x_m,y_m)\}$ with outputs in $\{-1, +1\}$, a weak classifier algorithm, and the number of weak classifier iterations $K$.

Output: the final strong classifier $f(x)$

1) Initialize the sample weights: $$D(1) = (w_{11}, w_{12}, \ldots, w_{1m}); \;\; w_{1i}=\frac{1}{m};\;\; i =1,2,\ldots,m$$

2) for k=1,2, ... K:

a) Train on the data using the sample set with weights $D_k$ to obtain a weak classifier $G_k(x)$

b) Calculate the classification error rate of $G_k(x)$: $$e_k = P(G_k(x_i) \neq y_i) = \sum\limits_{i=1}^{m} w_{ki} I(G_k(x_i) \neq y_i)$$

c) Calculate the weak classifier coefficient $$\alpha_k = \frac{1}{2}\log\frac{1-e_k}{e_k}$$

d) Update the weight distribution of the sample set $$w_{k+1,i} = \frac{w_{ki}}{Z_k}\exp(-\alpha_k y_i G_k(x_i)) \;\; i =1,2,\ldots,m$$

Here $Z_k$ is the normalization factor $$Z_k = \sum\limits_{i=1}^{m} w_{ki}\exp(-\alpha_k y_i G_k(x_i))$$

3) Build the final classifier: $$f(x) = \mathrm{sign}\left(\sum\limits_{k=1}^{K}\alpha_k G_k(x)\right)$$
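
A compact from-scratch sketch of this flow, using a depth-1 decision tree (a stump) from scikit-learn as the weak classifier; the dataset is synthetic and only meant to exercise the loop:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import make_classification

# Synthetic binary data with labels mapped to {-1, +1}
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
y = np.where(y == 1, 1, -1)

m, K = len(y), 20
w = np.full(m, 1.0 / m)            # 1) initialize weights w_1i = 1/m
stumps, alphas = [], []

for k in range(K):                 # 2) for k = 1..K
    # a) train a weak classifier on the weighted sample set
    stump = DecisionTreeClassifier(max_depth=1)
    stump.fit(X, y, sample_weight=w)
    pred = stump.predict(X)

    # b) weighted classification error rate e_k
    e_k = np.sum(w * (pred != y))
    if e_k <= 0:                   # perfect weak classifier: keep it and stop
        stumps.append(stump)
        alphas.append(1.0)
        break
    if e_k >= 0.5:                 # no better than random guessing: stop
        break

    # c) weak classifier coefficient alpha_k
    alpha_k = 0.5 * np.log((1 - e_k) / e_k)

    # d) update and renormalize the sample weights (division by Z_k)
    w = w * np.exp(-alpha_k * y * pred)
    w /= w.sum()

    stumps.append(stump)
    alphas.append(alpha_k)

# 3) final strong classifier: sign of the alpha-weighted vote
def predict(X_new):
    scores = sum(a * s.predict(X_new) for a, s in zip(alphas, stumps))
    return np.sign(scores)

print("training accuracy:", np.mean(predict(X) == y))
```

On real problems one would of course reach for scikit-learn's AdaBoostClassifier, which implements the same idea with more care; the sketch above is only meant to mirror the steps 1) to 3) listed here.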

For the AdaBoost multi-class classification algorithm, the principle is similar to binary classification; the most important difference is the weak classifier coefficient. For example, in the AdaBoost SAMME algorithm, the weak classifier coefficient is $$\alpha_k = \frac{1}{2}\log\frac{1-e_k}{e_k} + \log(R-1)$$

where $R$ is the number of categories. As can be seen, for binary classification ($R=2$), this formula reduces to the weak classifier coefficient of our binary classification algorithm above.
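
A quick numerical check of this relationship (following the coefficient formula as written above, which reduces to the binary coefficient at $R = 2$):

```python
import numpy as np

def samme_alpha(e_k, R):
    # Weak classifier coefficient as written above: binary term plus log(R - 1)
    return 0.5 * np.log((1 - e_k) / e_k) + np.log(R - 1)

e_k = 0.3
print(samme_alpha(e_k, R=2))          # log(R - 1) = 0: identical to the binary alpha_k
print(0.5 * np.log((1 - e_k) / e_k))  # binary coefficient, for comparison
print(samme_alpha(e_k, R=5))          # larger coefficient when there are more classes
```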

5. Algorithm flow of the AdaBoost regression problem

Here we summarize the algorithm flow of AdaBoost for regression. There are many variants of AdaBoost regression; the following is the flow of the AdaBoost R2 regression algorithm.

Input: sample set $T=\{(x_1,y_1), (x_2,y_2), \ldots, (x_m,y_m)\}$, a weak learner algorithm, and the number of weak learner iterations $K$.

Output: the final strong learner $f(x)$

1) Initialize the sample weights: $$D(1) = (w_{11}, w_{12}, \ldots, w_{1m}); \;\; w_{1i}=\frac{1}{m};\;\; i =1,2,\ldots,m$$

2) for k=1,2, ... K:

a) Train on the data using the sample set with weights $D_k$ to obtain the weak learner $G_k(x)$

b) Calculate the maximum error on the training set $$E_k = \max|y_i - G_k(x_i)| \;\; i=1,2,\ldots,m$$

c) Calculate the relative error of each sample:

If using the linear error, then $e_{ki}= \frac{|y_i-G_k(x_i)|}{E_k}$;

If using the squared error, then $e_{ki}= \frac{(y_i-G_k(x_i))^2}{E_k^2}$;

If using the exponential error, then $e_{ki}= 1 - \exp\left(\frac{-|y_i - G_k(x_i)|}{E_k}\right)$

d) Calculate the regression error rate $$e_k = \sum\limits_{i=1}^{m} w_{ki} e_{ki}$$

e) Calculate the weak learner coefficient $$\alpha_k =\frac{e_k}{1-e_k}$$

f) Update the sample weight distribution $$w_{k+1,i} = \frac{w_{ki}}{Z_k}\alpha_k^{1-e_{ki}}$$

Here $Z_k$ is the normalization factor $$Z_k = \sum\limits_{i=1}^{m} w_{ki}\alpha_k^{1-e_{ki}}$$

3) Build the final strong learner: $$f(x) = \sum\limits_{k=1}^{K}\left(\ln\frac{1}{\alpha_k}\right) G_k(x)$$
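
In practice, scikit-learn's AdaBoostRegressor implements the AdaBoost R2 idea; a minimal usage sketch on synthetic data (the dataset and hyperparameters here are arbitrary):

```python
from sklearn.ensemble import AdaBoostRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.datasets import make_regression

# Synthetic regression data just to exercise the API
X, y = make_regression(n_samples=300, n_features=5, noise=10.0, random_state=0)

# CART regression trees as the weak learners; loss selects the relative error form
# ("linear", "square", or "exponential", matching the three error forms above)
reg = AdaBoostRegressor(
    estimator=DecisionTreeRegressor(max_depth=3),  # named base_estimator on scikit-learn < 1.2
    n_estimators=100,
    learning_rate=0.5,
    loss="linear",
    random_state=0,
)
reg.fit(X, y)
print("training R^2:", reg.score(X, y))
```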

6. Regularization of the AdaBoost algorithm

To prevent AdaBoost from overfitting, we usually add a regularization term, often called the step size (learning rate) and denoted $\nu$. The previous iterative update of the weak learner was $$f_{k}(x) = f_{k-1}(x) + \alpha_k G_k(x)$$

With the regularization term, this becomes $$f_{k}(x) = f_{k-1}(x) + \nu\alpha_k G_k(x)$$

The range of $\nu$ is $0 < \nu \leq 1$. For the same learning effect on the training set, a smaller $\nu$ means we need more weak learner iterations. Usually we use the step size and the maximum number of iterations together to control the fitting effect of the algorithm.
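
As a small illustration (with hypothetical values), the shrinkage simply scales each round's contribution before it is added to the ensemble:

```python
import numpy as np

nu = 0.1                                    # learning rate (step size)
alpha_k = 0.8                               # hypothetical round-k coefficient
G_k_x   = np.array([1, -1, 1])              # hypothetical round-k predictions on three samples
f_prev  = np.array([0.3, -0.2, -0.1])       # hypothetical f_{k-1}(x) on the same samples

f_k = f_prev + nu * alpha_k * G_k_x         # regularized additive update
print(f_k)
```

This shrinkage is what scikit-learn exposes as the learning_rate parameter of its AdaBoost estimators.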

7. AdaBoost Summary

This concludes AdaBoost. One thing not mentioned earlier is the type of weak learner. In principle, any learner can be used with AdaBoost, but in practice the most widely used weak learners are decision trees and neural networks. For decision trees, AdaBoost classification usually uses CART classification trees, and AdaBoost regression uses CART regression trees.

Here is a summary of the pros and cons of the AdaBoost algorithm.

The main advantages of AdaBoost are:

1) AdaBoost as a classifier achieves high classification accuracy.

2) Under the AdaBoost framework, various regression and classification models can be used flexibly to construct the weak learners.

3) As a simple binary classifier, the construction is simple and the results are easy to interpret.

4) Not prone to overfitting.

The main drawbacks of AdaBoost are:

1) Sensitive to outlier samples: outliers may receive high weights during the iterations, which affects the prediction accuracy of the final strong learner.

(Reprints are welcome, but please indicate the source. Feedback is welcome: [email protected])
