AdaBoost algorithm and MATLAB implementation

I. Introduction to AdaBoost

Boosting is an important ensemble learning technique. It can promote weak learners, whose accuracy is only slightly better than random guessing, into strong learners with high accuracy. In situations where it is very difficult to construct a strong learner directly, boosting provides an effective new idea and method for designing learning algorithms. One of its most successful applications is the AdaBoost algorithm proposed by Yoav Freund and Robert Schapire in 1995.
AdaBoost is short for "Adaptive Boosting". "Adaptive" refers to the fact that the weights of the samples misclassified by the previous basic classifier are increased, while the weights of the correctly classified samples are decreased, and the re-weighted samples are used to train the next basic classifier. At the same time, a new weak classifier is added in each iteration, until a predetermined, sufficiently small error rate or a predetermined maximum number of iterations is reached; the weak classifiers obtained in this way determine the final strong classifier.
The AdaBoost algorithm can be described in three steps:
(1) First, initialize the weight distribution D1 of the training data. Assuming there are N training samples, each training sample is given the same weight at the very beginning: w1 = 1/N.
(2) Then, train the weak classifiers iteratively. The specific process is: if a training sample is classified correctly by the current weak classifier, its weight is decreased when constructing the next training set; conversely, if a training sample is misclassified, its weight is increased. The updated set of weights is used to train the next classifier, and the whole training process continues iteratively in this way.
(3) Finally, the weak classifiers obtained from each round of training are combined into a strong classifier. After the training of all weak classifiers has finished, the weight of a weak classifier with a small classification error rate is enlarged, so that it plays a larger role in the final classification function, while the weight of a weak classifier with a large classification error rate is reduced, so that it plays a smaller role in the final classification function.
In other words, a weak classifier with a low error rate occupies a larger weight in the final classifier, and a weak classifier with a high error rate occupies a smaller one.
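To make these three steps concrete, here is a minimal, self-contained MATLAB sketch of the generic AdaBoost loop. It is an illustration only, not the code of this article's example: the weak learner is an exhaustive decision stump (a single-feature threshold test), and the function name adaboostSketch is made up for this sketch.

function [stumps, alpha] = adaboostSketch(X, Y, T)
% X: N-by-d samples, Y: N-by-1 labels in {-1,+1}, T: number of boosting rounds
[N, d] = size(X);
D = ones(N, 1) / N;                        % step (1): uniform initial weights
alpha = zeros(T, 1);
stumps = zeros(T, 3);                      % each row: [feature, threshold, direction]
for t = 1:T
    bestErr = inf;
    for j = 1:d                            % step (2): pick the stump with the lowest weighted error
        for thr = (unique(X(:, j)) + 0.5)'
            for s = [1, -1]
                h = s * ((X(:, j) > thr) * 2 - 1);   % stump prediction in {-1,+1}
                e = sum(D(h ~= Y));                  % weighted error rate
                if e < bestErr
                    bestErr = e; stumps(t, :) = [j, thr, s]; bestH = h;
                end
            end
        end
    end
    alpha(t) = 0.5 * log((1 - bestErr) / max(bestErr, eps));  % weight of this weak classifier
    D = D .* exp(-alpha(t) .* Y .* bestH);   % raise weights of misclassified samples,
    D = D / sum(D);                          % lower the others, and renormalize
end
end
% step (3): the strong classifier is sign( sum over t of alpha(t)*h_t(x) )

The weight update here is written in exponential form; the two-branch update formula derived in the next section is the same update written out separately for correctly and incorrectly classified samples.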

II. AdaBoost algorithm process

Given a training data set {(x1, y1), (x2, y2), ..., (xN, yN)}, where yi ∈ {-1, +1} is the class label of training sample xi, i = 1, ..., N. The purpose of AdaBoost is to learn a series of weak classifiers (basic classifiers) from the training data, and then combine these weak classifiers into a strong classifier.

Related symbol definitions:

Dt(i) denotes the weight of training sample i in round t; ht(x) ∈ {-1, +1} denotes the basic classifier obtained in round t; et denotes the weighted error rate of ht(x) on the training set; αt denotes the coefficient (weight) of ht(x) in the final classifier; Zt denotes the normalization factor of the weight update; and Hfinal(x) denotes the final strong classifier.

The AdaBoost algorithm flow is as follows:

(1) Initialize the weight distribution of the training data: D1(i) = 1/N, i = 1, ..., N.

(2) For t = 1, 2, ..., T:
(a) learn a basic classifier ht(x) from the training data weighted by Dt;
(b) compute the error rate of ht(x) on the training data, et = sum of Dt(i) over the samples misclassified by ht;
(c) compute the coefficient of ht(x), αt = 0.5*ln((1 - et)/et);
(d) update the weight distribution, Dt+1(i) = Dt(i)*exp(-αt*yi*ht(xi)) / Zt, where Zt is the normalization factor that makes Dt+1 sum to 1.

(3) Combine the basic classifiers into the final strong classifier: Hfinal(x) = sign( sum over t of αt*ht(x) ).

Related explanations:

The error rate et is simply the sum of the weights of the training samples misclassified by ht(x). The coefficient αt grows as et decreases, so a more accurate basic classifier receives a larger say in the final classifier (αt > 0 as long as et < 1/2). The normalization factor Zt keeps Dt+1 a valid probability distribution; with αt chosen as above, Zt = 2*sqrt(et*(1 - et)).

Combining the above derivation, the weight-update formula can be written separately for the cases where a sample is classified correctly and incorrectly:

Dt+1(i) = Dt(i) / (2*(1 - et))   if sample i is classified correctly by ht(x),
Dt+1(i) = Dt(i) / (2*et)         if sample i is misclassified by ht(x).
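As a quick numerical check of these two branches (using the round-1 numbers from the worked example in the next section: error rate 0.3 and uniform initial weight 0.1), the updated weights come out to 1/14 and 1/6 and still sum to one:

e1 = 0.3;                            % weighted error of the selected weak classifier
w  = 0.1;                            % weight of every sample before the update
wCorrect = w / (2 * (1 - e1))        % = 1/14, new weight of a correctly classified sample
wWrong   = w / (2 * e1)              % = 1/6,  new weight of a misclassified sample
total    = 7 * wCorrect + 3 * wWrong % = 1, the distribution stays normalized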

III. AdaBoost example explanation

Example: given the training samples below, use straight lines parallel to the coordinate axes as weak classifiers, and use the AdaBoost algorithm to build a strong classifier.

Data analysis:

The ten training samples, read off the MATLAB code at the end of the article, are:

No.   1   2   3   4   5   6   7   8   9   10
X1    1   2   3   4   6   6   7   8   9   10
X2    5   2   1   6   8   5   9   7   8   2
Y     1   1  -1  -1   1  -1   1   1  -1  -1

Using these 10 samples as training data, and according to the correspondence between X and Y, the data can be divided into two classes, with "+" marking class 1 and "o" marking class -1. In this example, a horizontal or vertical straight line is used as the classifier, and three weak classifiers are given (their thresholds are the ones used in the MATLAB code below):

H1(x) = 1 if x1 < 2.5, -1 otherwise
H2(x) = 1 if x1 < 8.5, -1 otherwise
H3(x) = 1 if x2 >= 6.5, -1 otherwise

Initialization

First, the weight distribution of the training sample data must be initialized, with each training sample given the same weight at the very beginning, wi = 1/N, so the initial weight distribution of the training sample set is D1(i):

Thus each weight w1i = 1/N = 0.1, where N = 10 and i = 1, 2, ..., 10. The algorithm is then iterated for t = 1, 2, 3, ... (t denotes the iteration round); the weight distribution of the training samples obtained after each round is listed below.

First iteration, t = 1:

The initial weight distribution D1 is uniform, 1/N (10 data points, each weight initialized to 0.1):

D1=[0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1]

Under the weight distribution D1, the classifier with the lowest error rate among the three known weak classifiers H1, H2 and H3 is taken as the first basic classifier H1(x). (All three weak classifiers have an error rate of 0.3, so the first one is taken.)

The classifier H1(x) misclassifies the sample points 5, 7 and 8, so the error rate of the basic classifier H1(x) is

e1 = 0.1 + 0.1 + 0.1 = 0.3,

and its coefficient in the final classifier is

α1 = 0.5*ln((1 - e1)/e1) = 0.5*ln(7/3) = 0.4236.

It can be seen that the error rate e is simply the sum of the weights of the misclassified samples, and that e in turn determines the weight α of the basic classifier in the final classifier.

Then, the weight distribution of the training samples is updated for the next iteration. The weights of the correctly classified training samples 1, 2, 3, 4, 6, 9 and 10 (7 in total) are updated to

D2(i) = D1(i) / (2*(1 - e1)) = 0.1 / (2*0.7) = 1/14,

while the weights of the misclassified samples 5, 7 and 8 are updated to

D2(i) = D1(i) / (2*e1) = 0.1 / (2*0.3) = 1/6.

In this way, after the 1st iteration, the new weight distribution of the sample data is obtained:

D2=[1/14,1/14,1/14,1/14,1/6,1/14,1/6,1/6,1/14,1/14]

Since the sample data 5, 7 and 8 are misclassified by H1(x), their weights increase from the previous 0.1 to 1/6, whereas the other data are classified correctly, so their weights decrease from the previous 0.1 to 1/14 (compare D1 and D2 above).

The classification function obtained so far is f1(x) = α1*H1(x) = 0.4236*H1(x). At this point, using the single basic classifier sign(f1(x)) as the strong classifier leaves 3 misclassified points on the training data set (namely 5, 7 and 8), so the training error of the strong classifier is 0.3.
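The round-1 numbers can be checked directly in MATLAB by evaluating H1 (the x1 < 2.5 stump) on the ten samples; this short sketch simply hard-codes the data and threshold that appear in the full code at the end of the article:

XData = [1 5; 2 2; 3 1; 4 6; 6 8; 6 5; 7 9; 8 7; 9 8; 10 2];
Y  = [1 1 -1 -1 1 -1 1 1 -1 -1]';
h1 = ones(10, 1); h1(XData(:, 1) >= 2.5) = -1;   % H1(x): +1 if x1 < 2.5, otherwise -1
find(h1 ~= Y)'                   % misclassified samples: 5 7 8
e1 = mean(h1 ~= Y)               % training error of sign(f1(x)) = 0.3
a1 = 0.5 * log((1 - e1) / e1)    % coefficient alpha1 = 0.4236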

Second iteration, t = 2:

Under the weight distribution D2, the classifier H1(x) now has an error rate of 3 × 1/6 = 0.5. H2(x) misclassifies the samples 3, 4 and 6, which according to D2 have weights D2(3) = 1/14, D2(4) = 1/14 and D2(6) = 1/14, so the error rate of H2(x) on the training data set is

e2 = 1/14 + 1/14 + 1/14 = 3/14.

This is the lowest error rate in this round (H3(x) ties with it, and the earlier classifier is selected), so H2 is taken as the second basic classifier H2(x), with coefficient

α2 = 0.5*ln((1 - e2)/e2) = 0.5*ln(11/3) = 0.6496.

Updating the weights as before (correctly classified samples are scaled by 1/(2*(1 - e2)) = 7/11, misclassified samples by 1/(2*e2) = 7/3), after the 2nd iteration the new weight distribution of the sample data is obtained:

D3=[1/22,1/22,1/6,1/6,7/66,1/6,7/66,7/66,1/22,1/22]

The weights of samples 3, 4 and 6 thus rise to 1/6 each, the weights of samples 5, 7 and 8 fall to 7/66, and the weights of the remaining samples fall to 1/22 (compare D2 and D3 above).

The classification function obtained so far is f2(x) = 0.4236*H1(x) + 0.6496*H2(x). At this point, using the combination of two basic classifiers sign(f2(x)) as the strong classifier still leaves 3 misclassified points on the training data set (namely 3, 4 and 6), so the training error of the strong classifier is 0.3.

Third iteration, t = 3:

Under the weight distribution D3, H1(x) misclassifies samples 5, 7 and 8 (error rate 3 × 7/66 = 7/22) and H2(x) misclassifies samples 3, 4 and 6 (error rate 3 × 1/6 = 0.5), while H3(x) misclassifies samples 1, 2 and 9, whose weights are D3(1) = D3(2) = D3(9) = 1/22. Therefore the classifier with the currently smallest error rate, H3, is taken as the 3rd basic classifier H3(x):

e3 = 1/22 + 1/22 + 1/22 = 3/22,

α3 = 0.5*ln((1 - e3)/e3) = 0.5*ln(19/3) = 0.9229.

Updating the weights once more (correctly classified samples are scaled by 1/(2*(1 - e3)) = 11/19, misclassified samples by 1/(2*e3) = 11/3), after the 3rd iteration the new weight distribution of the sample data is:

D4=[1/6,1/6,11/114,11/114,7/114,11/114,7/114,7/114,1/6,1/38]

Samples 1, 2 and 9 now carry the largest weights (1/6 each), while the weight of sample 10 has dropped to 1/38 (compare D3 and D4 above).

The classification function obtained so far is f3(x) = 0.4236*H1(x) + 0.6496*H2(x) + 0.9229*H3(x). At this point, using the combination of three basic classifiers sign(f3(x)) as the strong classifier leaves 0 misclassified points on the training data set, so the whole training process is over.

Integrating all the classifiers, the final strong classifier is:

Hfinal(x) = sign(f3(x)) = sign(0.4236*H1(x) + 0.6496*H2(x) + 0.9229*H3(x)).

This strong classifier Hfinal reduces the error rate on the training samples to 0.
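This can be confirmed with a few lines of MATLAB that apply the three stumps and the coefficients derived above to the ten samples (again hard-coding the data and thresholds from the full code below):

XData = [1 5; 2 2; 3 1; 4 6; 6 8; 6 5; 7 9; 8 7; 9 8; 10 2];
Y  = [1 1 -1 -1 1 -1 1 1 -1 -1]';
h1 = ones(10, 1);  h1(XData(:, 1) >= 2.5) = -1;   % +1 if x1 < 2.5
h2 = ones(10, 1);  h2(XData(:, 1) >= 8.5) = -1;   % +1 if x1 < 8.5
h3 = -ones(10, 1); h3(XData(:, 2) >= 6.5) = 1;    % +1 if x2 >= 6.5
f3 = 0.4236 * h1 + 0.6496 * h2 + 0.9229 * h3;
nErrors = sum(sign(f3) ~= Y)     % = 0, every training sample is classified correctly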

The MATLAB code for this example is as follows.

First, create the MATLAB function files wcH1.m, wcH2.m and wcH3.m, which define the three weak classifiers H1, H2 and H3:

function kind = wcH1(x, th)   % weak classifier H1
x1 = x(1);
if x1 < th
    kind = 1;
else
    kind = -1;
end
end

function kind = wcH2(x, th)   % weak classifier H2
x1 = x(1);
if x1 < th
    kind = 1;
else
    kind = -1;
end
end

function kind = wcH3(x, th)   % weak classifier H3
x2 = x(2);
if x2 < th
    kind = -1;
else
    kind = 1;
end
end
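These function files can be sanity-checked from the command window before running the main script; for example, with the thresholds used in the main program below:

wcH1([1 5], 2.5)   % returns  1: the first sample has x1 = 1 < 2.5
wcH3([1 5], 6.5)   % returns -1: the first sample has x2 = 5 < 6.5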

Main program MATLAB code:

clc, clear all;
%% Training sample data
XData = [1 5; 2 2; 3 1; 4 6; 6 8; 6 5; 7 9; 8 7; 9 8; 10 2];  % sample points, numbered 1,...,10
Y = [1 1 -1 -1 1 -1 1 1 -1 -1]';    % corresponding sample categories, denoted by 1 and -1
xNum = 1:10;                        % sample numbers
format rat
%% Draw the sample distribution map
L1 = find(Y == 1);
x = XData(L1, 1); y = XData(L1, 2);
plot(x, y, 'b+', 'LineWidth', 3, 'MarkerSize', 12);
hold on;
L2 = find(Y == -1);
x = XData(L2, 1); y = XData(L2, 2);
plot(x, y, 'ro', 'LineWidth', 3, 'MarkerSize', 12);
xlabel('X1'); ylabel('X2');
axis([0 10 0 10])
%% ******************** Preliminary process ********************
H1 = zeros(10, 1); H2 = H1; H3 = H1;
for i = 1:10
    x = XData(i, :);
    H1(i) = wcH1(x, 2.5);   % weak classifier H1
    H2(i) = wcH2(x, 8.5);   % weak classifier H2
    H3(i) = wcH3(x, 6.5);   % weak classifier H3
end
errDataH1 = find(H1 ~= Y);  % indices of the samples misclassified by H1
errDataH2 = find(H2 ~= Y);  % indices of the samples misclassified by H2
errDataH3 = find(H3 ~= Y);  % indices of the samples misclassified by H3
accDataH1 = find(H1 == Y);  % indices of the samples correctly classified by H1
accDataH2 = find(H2 == Y);  % indices of the samples correctly classified by H2
accDataH3 = find(H3 == Y);  % indices of the samples correctly classified by H3
errDataAll = [errDataH1, errDataH2, errDataH3];
accDataAll = [accDataH1, accDataH2, accDataH3];
N = 10;
D1 = zeros(10, 1) + 1/N     % initialize the weight distribution
%% ******************** First iteration ********************
err1 = sum(D1(errDataH1, :));   % error rate = sum of the weights of the samples misclassified by H1
err2 = sum(D1(errDataH2, :));   % error rate of H2
err3 = sum(D1(errDataH3, :));   % error rate of H3
errAll = [err1, err2, err3];
[minErr, minIndex] = min(errAll);
% compute the coefficient of the selected classifier from the error rate e1
a1 = 0.5 * log((1 - minErr) / minErr)
minErrData = errDataAll(:, minIndex);
minAccData = accDataAll(:, minIndex);
D2 = D1;
for i = minAccData'
    D2(i) = D2(i) / (2 * (1 - minErr));   % correctly classified: divide by 2*(1-e)
end
for i = minErrData'
    D2(i) = D2(i) / (2 * minErr);         % misclassified: divide by 2*e
end
D2
% classification function
f1 = a1 .* H1;
kindFinal = sign(f1)   % classification result of the strong classifier after this round
%% ******************** Second iteration ********************
err1 = sum(D2(errDataH1, :));   % error rates under D2
err2 = sum(D2(errDataH2, :));
err3 = sum(D2(errDataH3, :));
errAll = [err1, err2, err3];
[minErr, minIndex] = min(errAll);
% compute the coefficient of H2 from the error rate e2
a2 = 0.5 * log((1 - minErr) / minErr)
minErrData = errDataAll(:, minIndex);
minAccData = accDataAll(:, minIndex);
D3 = D2;
for i = minAccData'
    D3(i) = D3(i) / (2 * (1 - minErr));
end
for i = minErrData'
    D3(i) = D3(i) / (2 * minErr);
end
D3
% classification function
f2 = a1 .* H1 + a2 .* H2;
kindFinal = sign(f2)   % classification result of the strong classifier after this round
%% ******************** Third iteration ********************
err1 = sum(D3(errDataH1, :));   % error rates under D3
err2 = sum(D3(errDataH2, :));
err3 = sum(D3(errDataH3, :));
errAll = [err1, err2, err3];
[minErr, minIndex] = min(errAll);
% compute the coefficient of H3 from the error rate e3
a3 = 0.5 * log((1 - minErr) / minErr)
minErrData = errDataAll(:, minIndex);
minAccData = accDataAll(:, minIndex);
D4 = D3;
for i = minAccData'
    D4(i) = D4(i) / (2 * (1 - minErr));
end
for i = minErrData'
    D4(i) = D4(i) / (2 * minErr);
end
D4
% classification function
f3 = a1 .* H1 + a2 .* H2 + a3 .* H3;
kindFinal = sign(f3)   % classification result of the final strong classifier

IV. Advantages and disadvantages of AdaBoost

Advantages:

(1) AdaBoost provides a framework within which sub-classifiers can be constructed by a variety of methods. Simple weak classifiers can be used, without feature filtering and without an obvious overfitting phenomenon.

(2) The AdaBoost algorithm does not require prior knowledge about the weak classifiers, and the classification accuracy of the final strong classifier depends on all the weak classifiers. Whether applied to artificial data or real data, AdaBoost can significantly improve learning accuracy.

(3) The AdaBoost algorithm does not need to know the error-rate bound of the weak classifiers in advance, and the classification accuracy of the strong classifier finally obtained depends on the classification accuracy of all the weak classifiers, so the potential of the weak classifiers can be exploited deeply. AdaBoost adjusts the assumed error rate adaptively according to the feedback from the weak classifiers, and its execution efficiency is high.

(4) AdaBoost trains different weak classifiers on the same training sample set and assembles these weak classifiers in a certain way to construct a classifier with strong classification ability; as the saying goes, "three cobblers with their wits combined equal Zhuge Liang."

Disadvantages:

During training, AdaBoost increases the weights of samples that are hard to classify, so training becomes biased towards such difficult samples, which makes the AdaBoost algorithm susceptible to noise. In addition, AdaBoost depends on its weak classifiers, and the training time of the weak classifiers is often very long.

Reprinted from 70995333
