**Source: *Machine Learning in Action***

**Basic concepts**

AdaBoost is an algorithm for turning a weak learning algorithm into a strong one; "weak" and "strong" here refer, of course, to classifiers. First, a few concepts need a brief introduction.

1: Weak learner: in the two-class case, a weak classifier has an error rate only slightly below 50%. In principle any classifier can serve as the weak learner — the previously introduced kNN, decision trees, Naïve Bayes, logistic regression, and SVM all qualify. The weak classifier used here is a decision stump, i.e. a single-node, single-split decision tree. It is the most popular weak classifier for AdaBoost, though certainly not the only choice. A stump selects the one feature and threshold that minimize the error rate; note that the error rate here is the *weighted* error rate — the sum of the weights of the misclassified samples (this will become clear in the code below).

More rigorous definitions:

Strong learnability: a concept is strongly learnable if there exists a polynomial-time learning algorithm that can learn it with high accuracy. Weak learnability: a concept is weakly learnable if there exists a polynomial-time learning algorithm whose accuracy is only slightly better than random guessing (just above 50%). It has been proved that strong learnability is equivalent to weak learnability, and that several weak classifiers can be boosted into a strong classifier by a weighted linear combination.

2: Strong learner: a learning algorithm with high recognition accuracy that runs in polynomial time.

3: Ensemble methods: combining different classifiers together; such a combination is called an ensemble method or meta-algorithm. It can be an ensemble of different algorithms, of the same algorithm under different settings, or of classifiers trained on different parts of the dataset.

4: The two common ensemble methods are bagging and boosting, and AdaBoost is the representative boosting algorithm. The difference: bagging (bootstrap aggregating) draws S new datasets from the original dataset by resampling with replacement (so samples can repeat), each new dataset the same size as the original. Training on these yields S classifiers, which are then combined with equal weights — for example by majority vote (the S classifiers may even use different classification algorithms) — to obtain a strong learner.
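The bagging idea can be sketched in a few lines (my own minimal illustration, not code from the book): draw bootstrap samples, train one simple classifier per sample, and combine them by equal-weight majority vote. The `train_stump` base learner here is a hypothetical toy that assumes both classes appear in every bootstrap sample; the bootstrap index sets are hand-picked for reproducibility, where a real implementation would draw them randomly.

```python
def train_stump(data, labels):
    """Toy base learner: threshold halfway between the two classes.
    Predicts +1 for x >= thresh, -1 otherwise. Assumes both classes
    are present in the training sample."""
    neg = [x for x, y in zip(data, labels) if y == -1]
    pos = [x for x, y in zip(data, labels) if y == 1]
    return (max(neg) + min(pos)) / 2.0

def bagging_predict(stumps, x):
    """Equal-weight majority vote over the S trained stumps."""
    votes = sum(1 if x >= t else -1 for t in stumps)
    return 1 if votes >= 0 else -1

data = [0.5, 1.0, 1.5, 3.0, 3.5, 4.0]
labels = [-1, -1, -1, 1, 1, 1]

# Hand-picked bootstrap index sets (sampling with replacement, same size
# as the original dataset); normally these would be drawn at random.
bootstrap_indices = [
    [0, 0, 2, 3, 4, 5],
    [1, 2, 2, 3, 3, 5],
    [0, 1, 4, 4, 5, 5],
]
stumps = [train_stump([data[i] for i in idx], [labels[i] for i in idx])
          for idx in bootstrap_indices]

print(bagging_predict(stumps, 3.8))   # 1
print(bagging_predict(stumps, 0.2))   # -1
```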

Boosting, in contrast, obtains each new classifier by concentrating on the data misclassified by the classifiers built so far, and the classifier weights in boosting are not equal: each weight reflects how well its classifier performed in the previous round. If this is not yet clear, it will make sense after reading the AdaBoost algorithm below.

5: The AdaBoost algorithm: the description in *Machine Learning in Action* is consistent with the description in the original paper; both are given below (the first from the book, the second from the paper).

**AdaBoost runs as follows. Each sample in the training data is given a weight; together these weights form the vector D. Initially, all weights are set equal. A weak classifier is first trained on the training data and its error rate is computed; then a second weak classifier is trained on the same dataset. Before this second round, the weight of every sample is re-adjusted: the weights of correctly classified samples are decreased and the weights of misclassified samples are increased. To combine all weak classifiers into a final classification result, AdaBoost assigns each classifier a weight value alpha, computed from that weak classifier's error rate.**

The error rate is defined as

ε = (number of misclassified samples) / (total number of samples)

or, in weighted form, ε = the sum of the weights of the misclassified samples. Alpha is then calculated as

α = (1/2) · ln((1 − ε) / ε)

Why this particular formula for alpha? It is the value that minimizes AdaBoost's exponential loss at each round; it also guarantees that after the weight update, the misclassified samples carry exactly half of the total weight, so the next weak classifier cannot succeed by repeating the same mistakes.
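A quick numeric check of the two formulas (my own sketch, not from the book): for a weak classifier with weighted error ε = 0.3, α = ½ ln(0.7/0.3) ≈ 0.4236, and after the weight update the misclassified samples' weights grow by e^α while the correct ones shrink by e^(−α), leaving the misclassified group with exactly half of the normalized weight.

```python
import math

def alpha_from_error(eps):
    """AdaBoost classifier weight: alpha = 0.5 * ln((1 - eps) / eps).
    The max() guards against division by zero when eps is 0."""
    return 0.5 * math.log((1.0 - eps) / max(eps, 1e-16))

eps = 0.3
alpha = alpha_from_error(eps)
print(round(alpha, 4))   # 0.4236

# Weight update: 10 samples with uniform weights, 3 misclassified (eps = 0.3).
correct = [True] * 7 + [False] * 3
w = [0.1] * 10
w = [wi * math.exp(-alpha if c else alpha) for wi, c in zip(w, correct)]
total = sum(w)
w = [wi / total for wi in w]

# After renormalization the misclassified samples carry half the total weight.
print(round(sum(wi for wi, c in zip(w, correct) if not c), 6))   # 0.5
```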

(1) Constructing a weak classifier based on a decision stump

```python
from numpy import *

def loadSimpData():
    datMat = matrix([[1.0, 2.1],
                     [2.0, 1.1],
                     [1.3, 1.0],
                     [1.0, 1.0],
                     [2.0, 1.0]])
    classLabels = [1.0, 1.0, -1.0, -1.0, 1.0]
    return datMat, classLabels

# Decision-stump generation functions
def stumpClassify(dataMatrix, dimen, threshVal, threshIneq):
    """Classify by comparing one feature against a threshold."""
    retArray = ones((shape(dataMatrix)[0], 1))
    if threshIneq == 'lt':
        retArray[dataMatrix[:, dimen] <= threshVal] = -1.0
    else:
        retArray[dataMatrix[:, dimen] > threshVal] = -1.0
    return retArray

def buildStump(dataArr, classLabels, D):
    """Find the decision stump with the lowest weighted error under D."""
    dataMatrix = mat(dataArr); labelMat = mat(classLabels).T
    m, n = shape(dataMatrix)
    numSteps = 10.0; bestStump = {}; bestClasEst = mat(zeros((m, 1)))
    minError = inf
    for i in range(n):                            # loop over every feature
        rangeMin = dataMatrix[:, i].min()
        rangeMax = dataMatrix[:, i].max()
        stepSize = (rangeMax - rangeMin) / numSteps
        for j in range(-1, int(numSteps) + 1):    # loop over threshold steps
            for inequal in ['lt', 'gt']:          # loop over both inequality directions
                threshVal = rangeMin + float(j) * stepSize
                predictedVals = stumpClassify(dataMatrix, i, threshVal, inequal)
                errArr = mat(ones((m, 1)))
                errArr[predictedVals == labelMat] = 0
                # the weighted error rate: the 0/1 error vector multiplied
                # element-wise by the weight vector D
                weightedError = D.T * errArr
                # print("split: dim %d, thresh %.2f, ineq: %s, weighted error %.3f"
                #       % (i, threshVal, inequal, weightedError))
                if weightedError < minError:
                    minError = weightedError
                    bestClasEst = predictedVals.copy()
                    bestStump['dim'] = i
                    bestStump['thresh'] = threshVal
                    bestStump['ineq'] = inequal
    return bestStump, minError, bestClasEst
```

Note that three nested loops construct the decision stump: the outermost loop iterates over the features, the middle loop over the threshold steps, and the inner loop over the two inequality directions (less-than and greater-than). **The error being minimized is the weighted error rate. This is why the weights of misclassified samples are increased: once those weights grow, a stump that makes the same mistakes in the next round incurs a large weighted error and therefore cannot be the minimizer of the weighted error rate. Each iteration is thus pushed toward classifiers that correct the previous rounds' mistakes.**
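To make that point concrete (a standalone sketch of mine, not code from the book): fix a stump's predictions, increase the weight of the one sample it misclassifies, and its weighted error rises, so the next round's minimization will avoid it.

```python
def weighted_error(weights, predictions, labels):
    """Sum of the weights of the misclassified samples."""
    return sum(w for w, p, y in zip(weights, predictions, labels) if p != y)

labels      = [1, 1, -1, -1, 1]
predictions = [1, 1, -1, -1, -1]   # this stump misclassifies only the last sample

uniform = [0.2] * 5
print(weighted_error(uniform, predictions, labels))      # 0.2

# After boosting increases the misclassified sample's weight:
reweighted = [0.125, 0.125, 0.125, 0.125, 0.5]
print(weighted_error(reweighted, predictions, labels))   # 0.5
```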

(2) Complete AdaBoost algorithm

```python
def adaBoostTrainDS(dataArr, classLabels, numIt=40):
    weakClassArr = []
    m = shape(dataArr)[0]
    D = mat(ones((m, 1)) / m)             # initialize all sample weights equally
    aggClassEst = mat(zeros((m, 1)))
    for i in range(numIt):
        bestStump, error, classEst = buildStump(dataArr, classLabels, D)
        # print("D:", D.T)
        # max(error, 1e-16) prevents a divide-by-zero when the error is 0
        alpha = float(0.5 * log((1.0 - error) / max(error, 1e-16)))
        bestStump['alpha'] = alpha
        weakClassArr.append(bestStump)
        # print("classEst:", classEst.T)
        # the sign of the exponent distinguishes correct from misclassified samples
        expon = multiply(-1 * alpha * mat(classLabels).T, classEst)
        D = multiply(D, exp(expon))
        D = D / D.sum()                   # renormalize the weights
        aggClassEst += alpha * classEst   # accumulate into the strong classifier
        aggErrors = multiply(sign(aggClassEst) != mat(classLabels).T, ones((m, 1)))
        errorRate = aggErrors.sum() / m
        print("total error:", errorRate)
        if errorRate == 0.0:
            break
    return weakClassArr, aggClassEst
```

Note a trick in the code: `max(error, 1e-16)` ensures there is no divide-by-zero error when the error rate is 0.
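Running the training loop on the toy dataset can be reproduced end to end; the following is my own compact, self-contained NumPy condensation of the two listings above (same search order, same update rules), not the book's exact code. On this dataset it converges to zero training error in three rounds.

```python
import numpy as np

def stump_predict(X, dim, thresh, ineq):
    """Predict +1/-1 by thresholding one feature."""
    pred = np.ones(len(X))
    if ineq == 'lt':
        pred[X[:, dim] <= thresh] = -1.0
    else:
        pred[X[:, dim] > thresh] = -1.0
    return pred

def best_stump(X, y, D):
    """Exhaustively search features/thresholds for the lowest weighted error."""
    m, n = X.shape
    best, best_pred, min_err = None, None, np.inf
    for dim in range(n):
        lo, hi = X[:, dim].min(), X[:, dim].max()
        step = (hi - lo) / 10.0
        for j in range(-1, 11):
            for ineq in ('lt', 'gt'):
                thresh = lo + j * step
                pred = stump_predict(X, dim, thresh, ineq)
                err = D[pred != y].sum()   # weighted error rate
                if err < min_err:
                    best, best_pred, min_err = (dim, thresh, ineq), pred, err
    return best, best_pred, min_err

X = np.array([[1.0, 2.1], [2.0, 1.1], [1.3, 1.0], [1.0, 1.0], [2.0, 1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0, 1.0])
D = np.full(len(X), 1.0 / len(X))          # equal initial weights
agg = np.zeros(len(X))
classifiers = []
for _ in range(40):
    stump, pred, err = best_stump(X, y, D)
    alpha = 0.5 * np.log((1.0 - err) / max(err, 1e-16))
    classifiers.append((stump, alpha))
    D = D * np.exp(-alpha * y * pred)      # raise wrong, lower right
    D = D / D.sum()
    agg += alpha * pred
    if np.mean(np.sign(agg) != y) == 0.0:
        break

print(len(classifiers))                    # 3
print(float(np.mean(np.sign(agg) != y)))   # 0.0
```

The three stumps it selects match the book's reported run on this dataset (splits on feature 0 at 1.3, feature 1 at 1.0, then feature 0 at 0.9).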

(3) Testing the algorithm: classification with AdaBoost

```python
# AdaBoost classification function
def adaClassify(datToClass, classifierArr):
    dataMatrix = mat(datToClass)
    m = shape(dataMatrix)[0]
    aggClassEst = mat(zeros((m, 1)))
    for i in range(len(classifierArr)):
        classEst = stumpClassify(dataMatrix,
                                 classifierArr[i]['dim'],
                                 classifierArr[i]['thresh'],
                                 classifierArr[i]['ineq'])
        aggClassEst += classifierArr[i]['alpha'] * classEst
        print(aggClassEst)
    return sign(aggClassEst)
```

Finally, how can it be proved theoretically that weak classifiers can be boosted into a strong classifier by linear combination? There are two proof routes: (1) via an upper bound on the training error, and (2) via AdaBoost's loss function (the exponential loss). For the derivations, see http://blog.csdn.net/v_july_v/article/details/40718799; I have only worked through the derivation of the error upper bound.

The conclusion of the error bound is that AdaBoost's training error decreases at an exponential rate. Moreover, AdaBoost does not need to know the lower bound γ in advance — AdaBoost is adaptive: it adapts itself to the training error rates of the weak classifiers.

**As long as every weak classifier's error rate is below 0.5 — that is, ε_t ≤ 1/2 − γ for some γ > 0 — the training error bound e^(−2Tγ²) keeps shrinking as the number of rounds T grows.**
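The bound can be checked numerically (my sketch; γ is the edge over random guessing): each round multiplies the training error by at most 2√(ε_t(1−ε_t)), and with ε_t = 1/2 − γ this per-round factor is below e^(−2γ²), so the product over T rounds stays under e^(−2Tγ²).

```python
import math

def error_bound(gamma, T):
    """AdaBoost training-error bound: exp(-2 * T * gamma^2)."""
    return math.exp(-2.0 * T * gamma ** 2)

def round_factor(gamma):
    """Per-round factor 2 * sqrt(eps * (1 - eps)) with eps = 0.5 - gamma."""
    eps = 0.5 - gamma
    return 2.0 * math.sqrt(eps * (1.0 - eps))

gamma = 0.1
for T in (10, 50, 100):
    exact = round_factor(gamma) ** T       # product of per-round factors
    bound = error_bound(gamma, T)
    print(T, exact <= bound, round(bound, 6))
```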

Improving classifier performance with the AdaBoost meta-algorithm: boosting based on errors.