The idea behind a meta-algorithm is to combine several other algorithms.
```python
from numpy import *

def loadSimpData():
    datMat = matrix([[1., 2.1],
                     [2., 1.1],
                     [1.3, 1.],
                     [1., 1.],
                     [2., 1.]])
    classLabels = [1.0, 1.0, -1.0, -1.0, 1.0]
    return datMat, classLabels

def loadDataSet(fileName):
    # general function to parse tab-delimited floats
    numFeat = len(open(fileName).readline().split('\t'))  # get number of fields
    dataMat = []; labelMat = []
    fr = open(fileName)
    for line in fr.readlines():
        lineArr = []
        curLine = line.strip().split('\t')
        for i in range(numFeat - 1):
            lineArr.append(float(curLine[i]))
        dataMat.append(lineArr)
        labelMat.append(float(curLine[-1]))
    return dataMat, labelMat

def stumpClassify(dataMatrix, dimen, threshVal, threshIneq):
    # just classify the data on one feature against a threshold
    retArray = ones((shape(dataMatrix)[0], 1))
    if threshIneq == 'lt':
        retArray[dataMatrix[:, dimen] <= threshVal] = -1.0
    else:
        retArray[dataMatrix[:, dimen] > threshVal] = -1.0
    return retArray

def buildStump(dataArr, classLabels, D):
    dataMatrix = mat(dataArr); labelMat = mat(classLabels).T
    m, n = shape(dataMatrix)
    numSteps = 10.0; bestStump = {}; bestClasEst = mat(zeros((m, 1)))
    minError = inf  # init error sum to +infinity
    for i in range(n):  # loop over all dimensions
        rangeMin = dataMatrix[:, i].min(); rangeMax = dataMatrix[:, i].max()
        stepSize = (rangeMax - rangeMin) / numSteps
        for j in range(-1, int(numSteps) + 1):  # loop over the range of the current dimension
            for inequal in ['lt', 'gt']:  # go over less-than and greater-than
                threshVal = rangeMin + float(j) * stepSize
                predictedVals = stumpClassify(dataMatrix, i, threshVal, inequal)
                errArr = mat(ones((m, 1)))
                errArr[predictedVals == labelMat] = 0
                weightedError = D.T * errArr  # total error weighted by D
                if weightedError < minError:
                    minError = weightedError
                    bestClasEst = predictedVals.copy()
                    bestStump['dim'] = i
                    bestStump['thresh'] = threshVal
                    bestStump['ineq'] = inequal
    return bestStump, minError, bestClasEst

def adaBoostTrainDS(dataArr, classLabels, numIt=40):
    weakClassArr = []
    m = shape(dataArr)[0]
    D = mat(ones((m, 1)) / m)  # init D to all equal
    aggClassEst = mat(zeros((m, 1)))
    for i in range(numIt):
        bestStump, error, classEst = buildStump(dataArr, classLabels, D)  # build stump
        # calc alpha; max(error, eps) accounts for error = 0
        alpha = float(0.5 * log((1.0 - error) / max(error, 1e-16)))
        bestStump['alpha'] = alpha
        weakClassArr.append(bestStump)  # store stump params in array
        expon = multiply(-1 * alpha * mat(classLabels).T, classEst)  # exponent for D calc
        D = multiply(D, exp(expon))  # calc new D for next iteration
        D = D / D.sum()
        # calc training error of all classifiers; if it is 0, quit the loop early
        aggClassEst += alpha * classEst
        aggErrors = multiply(sign(aggClassEst) != mat(classLabels).T, ones((m, 1)))
        errorRate = aggErrors.sum() / m
        print("total error:", errorRate)
        if errorRate == 0.0:
            break
    return weakClassArr, aggClassEst

def adaClassify(datToClass, classifierArr):
    # do stuff similar to the last aggClassEst in adaBoostTrainDS
    dataMatrix = mat(datToClass)
    m = shape(dataMatrix)[0]
    aggClassEst = mat(zeros((m, 1)))
    for i in range(len(classifierArr)):
        classEst = stumpClassify(dataMatrix, classifierArr[i]['dim'],
                                 classifierArr[i]['thresh'],
                                 classifierArr[i]['ineq'])  # call stump classify
        aggClassEst += classifierArr[i]['alpha'] * classEst
        print(aggClassEst)
    return sign(aggClassEst)

def plotROC(predStrengths, classLabels):
    import matplotlib.pyplot as plt
    cur = (1.0, 1.0)  # cursor
    ySum = 0.0  # variable to calculate AUC
    numPosClas = sum(array(classLabels) == 1.0)
    yStep = 1 / float(numPosClas)
    xStep = 1 / float(len(classLabels) - numPosClas)
    sortedIndicies = predStrengths.argsort()  # get sorted index, it's reversed
    fig = plt.figure()
    fig.clf()
    ax = plt.subplot(111)
    # loop through all the values, drawing a line segment at each point
    for index in sortedIndicies.tolist()[0]:
        if classLabels[index] == 1.0:
            delX = 0; delY = yStep
        else:
            delX = xStep; delY = 0
            ySum += cur[1]
        # draw line from cur to (cur[0]-delX, cur[1]-delY)
        ax.plot([cur[0], cur[0] - delX], [cur[1], cur[1] - delY], c='b')
        cur = (cur[0] - delX, cur[1] - delY)
    ax.plot([0, 1], [0, 1], 'b--')
    plt.xlabel('False positive rate'); plt.ylabel('True positive rate')
    plt.title('ROC curve for AdaBoost horse colic detection system')
    ax.axis([0, 1, 0, 1])
    plt.show()
    print("the area under the curve is:", ySum * xStep)
```
AdaBoost is the most popular meta-algorithm and one of the most powerful tools in machine learning.
The combination can be of different algorithms, of the same algorithm under different settings, or of classifiers trained on different parts of the dataset.
Advantages: low generalization error, easy to code, works with most classifiers, and has no parameters to tune.
Disadvantages: sensitive to outliers.
Works with: numeric values and nominal values.
Bagging is the technique of building S new datasets from the original dataset, each the same size as the original. Each new dataset is formed by sampling from the original with replacement, so a sample may be selected multiple times while other samples may not appear at all.
After the S datasets are built, a learning algorithm is applied to each one to obtain S classifiers. To classify new data, we apply all S classifiers and take the class with the most votes as the final result.
A more advanced bagging method is the random forest.
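The bootstrap sampling and majority vote described above can be sketched as follows (a minimal illustration; the helper names are hypothetical, not from the book's listing):

```python
import random
from collections import Counter

def bootstrap_sample(dataset):
    # draw len(dataset) samples with replacement: duplicates can appear,
    # and some original samples may be missing from the new dataset
    return [random.choice(dataset) for _ in dataset]

def bagging_predict(classifiers, x):
    # each of the S classifiers votes; the most common label wins
    votes = Counter(clf(x) for clf in classifiers)
    return votes.most_common(1)[0][0]
```

Each classifier here is any callable that maps a sample to a class label; bagging places no restriction on which base learner is used.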
Boosting is a technique similar to bagging, but while bagging trains its classifiers independently, boosting trains them sequentially: each new classifier concentrates on the data that the previous classifiers misclassified.
The output of boosting is the weighted sum of all classifiers' results. In bagging the weights are equal; in boosting they differ, with each weight reflecting how well its classifier performed in the previous round.
AdaBoost is one such boosting algorithm.
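The weighted-sum combination can be sketched like this (an illustrative sketch; the function name and the (alpha, h) pair representation are assumptions, not part of the book's listing):

```python
def boosted_predict(weak_learners, x):
    # weak_learners: list of (alpha, h) pairs, where h maps x to +1 or -1;
    # the strong classifier is the sign of the alpha-weighted vote
    total = sum(alpha * h(x) for alpha, h in weak_learners)
    return 1 if total >= 0 else -1
```

A single high-alpha learner can outvote several low-alpha learners, which is exactly how boosting differs from bagging's equal-weight vote.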
The AdaBoost algorithm can be described in three steps:
(1) First, initialize the weight distribution D1 over the training data. With N training samples, each sample is given the same weight at the very beginning: w1 = 1/N.
(2) Then, train a weak classifier hi. If a training sample is classified correctly by hi, its weight is decreased when constructing the next training set; conversely, if it is classified incorrectly, its weight is increased. The updated weights are used to train the next classifier, and the whole training process iterates in this way.
(3) Finally, combine the weak classifiers from each round into a strong classifier. After training, a weak classifier with a small classification error rate is given a larger weight, so it plays a larger role in the final classification function, while a weak classifier with a large error rate is given a smaller weight and plays a smaller role.
In other words, a weak classifier with a low error rate receives a large weight in the final classifier, and one with a high error rate receives a small weight.
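The steps above can be sketched numerically. The formulas alpha = 0.5 * ln((1 - error) / error) and the exponential weight update mirror the ones used in adaBoostTrainDS; the helper names here are hypothetical:

```python
import math

def classifier_weight(error, eps=1e-16):
    # alpha = 0.5 * ln((1 - error) / error); eps guards against error = 0
    return 0.5 * math.log((1.0 - error) / max(error, eps))

def update_weights(D, alpha, correct):
    # correctly classified samples are scaled by e^-alpha (weight shrinks),
    # misclassified ones by e^+alpha (weight grows); then renormalize
    newD = [d * math.exp(-alpha if ok else alpha) for d, ok in zip(D, correct)]
    total = sum(newD)
    return [d / total for d in newD]
```

For example, with five samples (D = [0.2] * 5) and one mistake (error = 0.2), alpha = 0.5 * ln(4) ≈ 0.693, and after the update the misclassified sample's weight rises from 0.2 to 0.5 while each correct sample's weight drops to 0.125, so the next stump concentrates on the hard sample.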
Machine learning (using the AdaBoost meta-algorithm to improve classification performance)