Simple and Easy-to-Learn Machine Learning Algorithms -- AdaBoost


I. Ensemble Methods (Ensemble Method)

The ensemble approach mainly includes two families of methods: Bagging and Boosting. The random forest algorithm is a machine learning algorithm based on the Bagging idea. In Bagging, the training set is randomly resampled to form different new datasets, a weak learner is trained on each new dataset, and the resulting series of predictions is averaged or voted on to make the final prediction. The AdaBoost algorithm and the GBDT (Gradient Boosted Decision Tree) algorithm are machine learning algorithms based on the Boosting idea. In Boosting, samples are assigned different weights: misclassified samples are given larger weights so that subsequent learners focus on the samples that are hard to learn. The final result combines a series of predictions, each carrying a weight, where a larger weight indicates a better predictor. Detailed ideas can be found in the post "Easy-to-Learn Machine Learning Algorithms -- Ensemble Methods (Ensemble Method)".

II. The AdaBoost Algorithm Idea

AdaBoost, short for Adaptive Boosting, is a machine learning algorithm based on the Boosting idea. It is an iterative algorithm whose core idea is to train different learners -- the weak learners -- on the same training set, and then assemble these weak learners into a stronger final learner. To construct the strong learner, a weak learning algorithm is chosen and repeatedly trained on the same (reweighted) training set so that its performance improves. In the AdaBoost algorithm there are two kinds of weights. First, each of the $N$ samples in the training set has a weight, called the sample weight, collected in the vector $D = (w_1, \dots, w_N)$; second, each weak learner $h_t$ has a weight $\alpha_t$. Initially, every sample weight is set equal, i.e. $w_i = 1/N$. The first weak learner is trained on this set, and once training is complete its error rate is computed:

$$\varepsilon = \frac{n_{err}}{N}$$
where $n_{err}$ represents the number of misclassified samples and $N$ represents the total number of samples. (In later rounds each misclassified sample contributes its current weight rather than $1/N$, i.e. $\varepsilon_t = \sum_{i:\, h_t(x_i) \neq y_i} w_i^{(t)}$, which is exactly the weighted error computed in the code below.) The error rate then gives the weight of the weak learner:

$$\alpha = \frac{1}{2}\ln\left(\frac{1-\varepsilon}{\varepsilon}\right)$$
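As a quick sanity check against the example in Section IV below: the first decision stump found there misclassifies exactly one of the five training samples, so $\varepsilon = 1/5 = 0.2$ and

$$\alpha = \frac{1}{2}\ln\left(\frac{1-0.2}{0.2}\right) = \frac{1}{2}\ln 4 \approx 0.6931,$$

which matches the alpha of the first weak classifier in the output printed at the end of Section IV.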
After the first round of training is complete, the sample weights are re-adjusted so that the samples misclassified in the first round are emphasized in the subsequent rounds:

$$w_i^{(t+1)} = \frac{w_i^{(t)}}{Z_t} \times \begin{cases} e^{-\alpha_t}, & h_t(x_i) = y_i \\ e^{\alpha_t}, & h_t(x_i) \neq y_i \end{cases} = \frac{w_i^{(t)}\, e^{-\alpha_t y_i h_t(x_i)}}{Z_t}$$
where $h_t(x_i) = y_i$ means that the $i$-th sample was classified correctly and $h_t(x_i) \neq y_i$ means that it was misclassified, and $Z_t$ is a normalization factor that keeps the weights summing to one:

$$Z_t = \sum_{i=1}^{N} w_i^{(t)}\, e^{-\alpha_t y_i h_t(x_i)}$$
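To make the update concrete, here is a minimal NumPy sketch (not the article's code, which follows in Section IV) that carries out one round of these formulas for the five-sample dataset used later, assuming the weak learner misclassifies exactly the first sample:

import numpy as np

# one round of the AdaBoost sample-weight update
y = np.array([1, 1, -1, -1, 1])   # true labels (same as loadSimpleData below)
h = np.array([-1, 1, -1, -1, 1])  # weak learner output; first sample is wrong
w = np.full(5, 1.0 / 5)           # uniform initial sample weights

eps = w[h != y].sum()                  # weighted error rate: 0.2
alpha = 0.5 * np.log((1 - eps) / eps)  # learner weight: ~0.6931
w = w * np.exp(-alpha * y * h)         # e^{-alpha} if correct, e^{alpha} if wrong
Z = w.sum()                            # normalization factor: 0.8
w = w / Z                              # wrong sample -> 0.5, each correct -> 0.125

print(alpha, Z, w)

After normalization the misclassified sample carries weight 0.5 while each correctly classified sample carries only 0.125, which is what pushes the next weak learner to concentrate on the hard sample.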
Training then proceeds to the second round with the re-weighted samples. After $T$ rounds of learning, $T$ weak learners $h_1, \dots, h_T$ and their weights $\alpha_1, \dots, \alpha_T$ have been obtained. For new data to be classified, the output of each weak classifier is computed separately, and the final output of the AdaBoost algorithm is:

$$H(x) = \mathrm{sign}\left(\sum_{t=1}^{T} \alpha_t h_t(x)\right)$$
where $\mathrm{sign}(\cdot)$ is the sign function. The whole process is illustrated in the figure below:

(figure: the AdaBoost training and combination process; image from Reference 1)
III. AdaBoost Algorithm Flow

The above is the basic principle of AdaBoost; the flow of the AdaBoost algorithm is given in the pseudocode figure below.

(figure: AdaBoost pseudocode; from Reference 2)
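Since the figure is not reproduced here, the flow it depicts (the standard statement from Reference 2) can be summarized with the notation established above:

1. Input: training samples $(x_1, y_1), \dots, (x_N, y_N)$ with $y_i \in \{-1, +1\}$; initialize the sample weights to $w_i^{(1)} = 1/N$.
2. For $t = 1, \dots, T$: train a weak learner $h_t$ on the weighted samples; compute its weighted error $\varepsilon_t$ and its weight $\alpha_t = \frac{1}{2}\ln\frac{1-\varepsilon_t}{\varepsilon_t}$; update the sample weights to $w_i^{(t+1)}$ as in the update formula above.
3. Output the final classifier $H(x) = \mathrm{sign}\left(\sum_{t=1}^{T} \alpha_t h_t(x)\right)$.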
IV. Practical Example

The AdaBoost algorithm is a classifier with high accuracy; in fact, AdaBoost provides a framework into which different weak classifiers can be plugged to build a strong classifier. Below, single-layer decision trees (decision stumps) are used to build a classifier for a small two-dimensional dataset of five points (the data appear in loadSimpleData() in the code below).

The main decision tree algorithms are ID3, C4.5, and CART, where ID3 and C4.5 are mainly used for classification and CART can also solve regression problems. The ID3 algorithm is described in the post "Easy-to-Learn Machine Learning Algorithms -- Decision Tree ID3 Algorithm", and the CART algorithm in the post "Easy-to-Learn Machine Learning Algorithms -- CART Regression Tree". A single-layer decision tree cannot solve this classification problem on its own: the positive sample at (1, 2.1) and the negative sample at (1, 1) share the same first coordinate, and the positive sample at (2, 1) and the negative sample at (1, 1) share the same second coordinate, so any single axis-parallel threshold misclassifies at least one sample, and several stumps must be boosted together.
Python code:
#coding:UTF-8
'''
Created on June 15, 2015
@author: zhaozhiyong
'''
from numpy import *

def loadSimpleData():
    datMat = mat([[1. , 2.1],
                  [2. , 1.1],
                  [1.3, 1. ],
                  [1. , 1. ],
                  [2. , 1. ]])
    classLabels = mat([1.0, 1.0, -1.0, -1.0, 1.0])
    return datMat, classLabels

def singleStumpClassify(dataMat, dim, threshold, thresholdIneq):
    classMat = ones((shape(dataMat)[0], 1))
    # assign class '-1' to one side of the threshold according to thresholdIneq
    if thresholdIneq == 'left':
        # points on the left of the threshold belong to class '-1'
        classMat[dataMat[:, dim] <= threshold] = -1.0
    else:
        classMat[dataMat[:, dim] > threshold] = -1.0
    return classMat

def singleStump(dataArr, classLabels, D):
    dataMat = mat(dataArr)
    labelMat = mat(classLabels).T
    m, n = shape(dataMat)
    numSteps = 10.0
    bestStump = {}
    bestClasEst = zeros((m, 1))
    minError = inf
    for i in xrange(n):  # for every feature
        # the minimum and maximum of the i-th feature determine the step size
        rangeMin = dataMat[:, i].min()
        rangeMax = dataMat[:, i].max()
        stepSize = (rangeMax - rangeMin) / numSteps
        for j in xrange(-1, int(numSteps) + 1):
            # it is not known which side belongs to class '-1' and which to
            # class '1', so both cases are tried
            for inequal in ['left', 'right']:
                threshold = rangeMin + j * stepSize  # threshold of this split
                predictionClass = singleStumpClassify(dataMat, i, threshold, inequal)
                errorMat = ones((m, 1))
                errorMat[predictionClass == labelMat] = 0
                weightedError = D.T * errorMat  # D holds the sample weights
                if weightedError < minError:
                    minError = weightedError
                    bestClasEst = predictionClass.copy()
                    bestStump['dim'] = i
                    bestStump['threshold'] = threshold
                    bestStump['inequal'] = inequal
    return bestStump, minError, bestClasEst

def adaBoostTrain(dataArr, classLabels, G):
    weakClassArr = []
    m = shape(dataArr)[0]  # number of samples
    # initialize D, i.e. the weight of each sample
    D = mat(ones((m, 1)) / m)
    aggClasEst = mat(zeros((m, 1)))
    for i in xrange(G):  # G is the number of iterations
        bestStump, minError, bestClasEst = singleStump(dataArr, classLabels, D)
        print 'D:', D.T
        # compute the weight of this weak classifier
        alpha = float(0.5 * log((1.0 - minError) / max(minError, 1e-16)))
        bestStump['alpha'] = alpha
        weakClassArr.append(bestStump)
        print 'bestClasEst:', bestClasEst.T
        # recompute the weight of each sample
        expon = multiply(-1 * alpha * mat(classLabels).T, bestClasEst)
        D = multiply(D, exp(expon))
        D = D / D.sum()
        aggClasEst += alpha * bestClasEst
        print 'aggClasEst:', aggClasEst
        aggErrors = multiply(sign(aggClasEst) != mat(classLabels).T, ones((m, 1)))
        errorRate = aggErrors.sum() / m
        print 'total error:', errorRate
        if errorRate == 0.0:
            break
    return weakClassArr

def adaBoostClassify(testData, weakClassify):
    dataMat = mat(testData)
    m = shape(dataMat)[0]
    aggClassEst = mat(zeros((m, 1)))
    for i in xrange(len(weakClassify)):  # weakClassify is a list of stumps
        classEst = singleStumpClassify(dataMat, weakClassify[i]['dim'],
                                       weakClassify[i]['threshold'],
                                       weakClassify[i]['inequal'])
        aggClassEst += weakClassify[i]['alpha'] * classEst
        print aggClassEst
    return sign(aggClassEst)

if __name__ == '__main__':
    datMat, classLabels = loadSimpleData()
    # the round count was lost from the original post; 30 is used here, and
    # training stops early once the training error reaches 0
    weakClassArr = adaBoostTrain(datMat, classLabels, 30)
    print "weakClassArr:", weakClassArr
    # test
    result = adaBoostClassify([1, 1], weakClassArr)
    print result

The final sequence of decision stumps:

weakClassArr: [{'threshold': 1.3, 'dim': 0, 'inequal': 'left', 'alpha': 0.6931471805599453}, {'threshold': 1.0, 'dim': 1, 'inequal': 'left', 'alpha': 0.9729550745276565}, {'threshold': 0.90000000000000002, 'dim': 0, 'inequal': 'left', 'alpha': 0.8958797346140273}]
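For comparison, the same experiment can be run with scikit-learn's built-in AdaBoostClassifier, whose default base estimator is likewise a depth-1 decision tree (a stump). This is a minimal sketch, assuming scikit-learn is installed; it is not part of the original post:

import numpy as np
from sklearn.ensemble import AdaBoostClassifier

X = np.array([[1., 2.1], [2., 1.1], [1.3, 1.], [1., 1.], [2., 1.]])
y = np.array([1, 1, -1, -1, 1])

# 30 boosting rounds of decision stumps, mirroring adaBoostTrain above
clf = AdaBoostClassifier(n_estimators=30).fit(X, y)
print(clf.predict([[1., 1.]]))  # should print [-1], like adaBoostClassify above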





References

1. Machine Learning in Action
2. Freund, Y. and Schapire, R. E., "A Short Introduction to Boosting"
