Improving classification performance by using AdaBoost meta-algorithm


When facing an important decision, you usually consult several experts rather than relying on a single person's opinion. Can machine learning do the same when solving problems? This is the idea behind meta-algorithms: meta-algorithms are a way of combining other algorithms.

Bootstrap aggregating, also known as the bagging method, is a technique that builds S new datasets by sampling from the original dataset. Each new dataset is the same size as the original and is formed by randomly drawing samples from the original dataset with replacement, so a given sample may appear several times in a new dataset while another may not appear at all. After the S datasets are built, a learning algorithm is applied to each of them, producing S classifiers. To classify new data, we run all S classifiers on it and take the class that receives the most votes as the final classification result.
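
For illustration only, here is a minimal sketch of bagging with majority voting; the helper trainClassifier(dataSet, labels) is a hypothetical stand-in for any learning algorithm that returns a callable classifier:

import numpy as np

def bagging(dataSet, labels, trainClassifier, S=10):
    m = dataSet.shape[0]
    classifiers = []
    for _ in range(S):
        # draw m samples with replacement to form one bootstrap dataset
        idx = np.random.randint(0, m, m)
        classifiers.append(trainClassifier(dataSet[idx], labels[idx]))
    return classifiers

def baggingClassify(x, classifiers):
    # each trained classifier votes; the majority class is the final prediction
    votes = [clf(x) for clf in classifiers]
    return max(set(votes), key=votes.count)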

Boosting is a technique similar to bagging, and the multiple classifiers it combines are of the same type. In boosting, however, the classifiers are obtained by serial training: each new classifier is trained according to the performance of the classifiers trained before it, and boosting obtains the new classifier by focusing on the data that the existing classifiers have misclassified.

AdaBoost is one of the most popular boosting meta-algorithms.

AdaBoost algorithm:

Advantages: low generalization error rate, easy to implement, can be applied to most classifiers, no parameters to tune.

Disadvantages: sensitive to outliers.

Applicable data types: numeric and nominal data.

AdaBoost is short for adaptive boosting, and it runs as follows:

Each sample in the training data is given a weight, and these weights form the vector D. Initially, all weights are set to equal values. First, a weak classifier is trained on the training data and its error rate is calculated; then the weak classifier is trained again on the same dataset. In this second round of training, the weight of each sample is readjusted: the weights of samples that were classified correctly in the first round are decreased, and the weights of samples that were misclassified in the first round are increased. To obtain the final classification result from all weak classifiers, AdaBoost assigns each classifier a weight value alpha, which is calculated from that weak classifier's error rate.

Error rate = (number of misclassified samples) / (total number of samples)

alpha = 0.5 * ln((1 - error rate) / error rate)
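
For example, if a weak classifier misclassifies 2 out of 10 training samples, its error rate is 0.2 and its weight is alpha = 0.5 * ln(0.8 / 0.2) ≈ 0.693.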

Weight update formulas:

If a sample is correctly classified, its weight is changed to:

D_i = (D_i * e^(-alpha)) / Sum(D)

If a sample is incorrectly classified, its weight is changed to:

D_i = (D_i * e^(alpha)) / Sum(D)

where Sum(D) is the sum of all the updated weights, so the new weight vector D again sums to 1.

After D is recalculated, AdaBoost begins the next iteration. The algorithm repeats this process of training and adjusting weights until the training error rate reaches 0 or the number of weak classifiers reaches a specified value.
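
To make the update concrete, here is a toy one-round sketch of the weight update in NumPy (the numbers are invented for illustration and are not from the original text):

import numpy as np

# toy example: 5 samples, weak classifier gets 1 of 5 wrong (error rate 0.2)
D = np.ones(5) / 5                       # initial weights, all equal
labels = np.array([1, 1, -1, -1, 1])     # true class labels
pred   = np.array([1, 1,  1, -1, 1])     # the third sample is misclassified

error = D[pred != labels].sum()          # weighted error rate = 0.2
alpha = 0.5 * np.log((1 - error) / error)

D = D * np.exp(-alpha * labels * pred)   # decrease correct, increase misclassified
D = D / D.sum()                          # renormalize so the weights sum to 1
print(alpha, D)                          # alpha ≈ 0.693, misclassified weight ≈ 0.5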

Constructing weak classifiers from single-layer decision trees: a single-layer decision tree (also called a decision stump) is a simple decision tree that makes a decision based on a single feature, with only one split. A single stump is a weak classifier, but by combining multiple single-layer decision trees you can build a classifier that correctly classifies the dataset.

Single-layer decision tree algorithm pseudo-code:

Set the minimum error rate minError to +∞
For each feature in the dataset (first loop):
    For each step size (second loop):
        For each inequality sign (third loop):
            Build a single-layer decision tree and test it on the weighted dataset
            If the error rate is less than minError:
                Set the current single-layer decision tree as the best single-layer decision tree
Return the best single-layer decision tree

AdaBoost algorithm pseudo-code:

For each iteration:
    Find the best single-layer decision tree
    Add the best single-layer decision tree to the array of single-layer decision trees
    Calculate the new weight vector D
    Update the cumulative class estimate
    If the error rate is 0.0, exit the loop

Single-layer decision tree algorithm code implementation:

from numpy import *

def stumpClassify(dataMatrix, dimen, threshVal, threshIneq):
    # classify all samples on one feature against a threshold;
    # samples on the "wrong" side of the inequality get class -1.0
    retArray = ones((shape(dataMatrix)[0], 1))
    if threshIneq == 'lt':
        retArray[dataMatrix[:, dimen] <= threshVal] = -1.0
    else:
        retArray[dataMatrix[:, dimen] > threshVal] = -1.0
    return retArray

def buildStump(dataArr, classLabels, D):
    dataMatrix = mat(dataArr); labelMat = mat(classLabels).T
    m, n = shape(dataMatrix)
    numSteps = 10.0; bestStump = {}; bestClasEst = mat(zeros((m, 1)))
    minError = inf                                  # init minimum error to +infinity
    for i in range(n):                              # first loop: over all features
        rangeMin = dataMatrix[:, i].min()
        rangeMax = dataMatrix[:, i].max()
        stepSize = (rangeMax - rangeMin) / numSteps
        for j in range(-1, int(numSteps) + 1):      # second loop: over all thresholds
            for inequal in ['lt', 'gt']:            # third loop: less than and greater than
                threshVal = rangeMin + float(j) * stepSize
                predictedVals = stumpClassify(dataMatrix, i, threshVal, inequal)
                errArr = mat(ones((m, 1)))
                errArr[predictedVals == labelMat] = 0
                weightedError = D.T * errArr        # total error weighted by D
                if weightedError < minError:
                    minError = weightedError
                    bestClasEst = predictedVals.copy()
                    bestStump['dim'] = i
                    bestStump['thresh'] = threshVal
                    bestStump['ineq'] = inequal
    return bestStump, minError, bestClasEst
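
As a quick usage sketch (the small toy dataset below is assumed for illustration and is not part of the original text), buildStump can be called like this:

datMat = mat([[1.0, 2.1],
              [2.0, 1.1],
              [1.3, 1.0],
              [1.0, 1.0],
              [2.0, 1.0]])
classLabels = [1.0, 1.0, -1.0, -1.0, 1.0]
D = mat(ones((5, 1)) / 5)                    # equal initial sample weights
bestStump, minError, bestClasEst = buildStump(datMat, classLabels, D)
print(bestStump, minError)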

AdaBoost algorithm code implementation:

def adaBoostTrainDS(dataArr, classLabels, numIt=40):
    weakClassArr = []
    m = shape(dataArr)[0]
    D = mat(ones((m, 1)) / m)                 # start with equal sample weights
    aggClassEst = mat(zeros((m, 1)))          # cumulative (aggregate) class estimate
    for i in range(numIt):
        bestStump, error, classEst = buildStump(dataArr, classLabels, D)
        print("D:", D.T)
        alpha = float(0.5 * log((1.0 - error) / max(error, 1e-16)))  # avoid division by zero
        bestStump['alpha'] = alpha
        weakClassArr.append(bestStump)
        print("classEst:", classEst.T)
        expon = multiply(-1 * alpha * mat(classLabels).T, classEst)  # -alpha if correct, +alpha if wrong
        D = multiply(D, exp(expon))
        D = D / D.sum()                       # renormalize the weight vector
        aggClassEst += alpha * classEst
        print("aggClassEst:", aggClassEst.T)
        aggErrors = multiply(sign(aggClassEst) != mat(classLabels).T, ones((m, 1)))
        errorRate = aggErrors.sum() / m
        print("total error:", errorRate, "\n")
        if errorRate == 0.0:
            break
    return weakClassArr
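
The training function only returns the array of weak classifiers; applying them to new data requires a companion classification function. Below is a minimal sketch (the function name adaClassify and the example points are assumptions for illustration) that accumulates the alpha-weighted votes of all stumps and takes the sign of the sum:

def adaClassify(datToClass, classifierArr):
    # run every weak classifier on the data, accumulate alpha-weighted votes,
    # and take the sign of the total as the final class
    dataMatrix = mat(datToClass)
    m = shape(dataMatrix)[0]
    aggClassEst = mat(zeros((m, 1)))
    for classifier in classifierArr:
        classEst = stumpClassify(dataMatrix, classifier['dim'],
                                 classifier['thresh'], classifier['ineq'])
        aggClassEst += classifier['alpha'] * classEst
    return sign(aggClassEst)

# example: train on the toy dataset above, then classify two new points
classifierArr = adaBoostTrainDS(datMat, classLabels, 30)
print(adaClassify([[5.0, 5.0], [0.0, 0.0]], classifierArr))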

  
