"Machine learning Combat" study notes: Using AdaBoost meta-algorithm to improve classification performance

Source: Internet
Author: User
Tags: svm

I. About the origins of the boosting algorithm

The boosting family of algorithms originates in PAC learnability (Probably Approximately Correct learning). This theory is concerned with when a problem can be learned at all.

We know that computability is already defined in the theory of computation, and learnability is what PAC theory defines. Moreover, a large part of computation theory is devoted to asking whether a problem is computable and what its complexity is; correspondingly, computational learning theory also studies the complexity of learning problems, chiefly sample complexity.

Finally, just as concrete algorithms for carrying out a computation are an important part of computation theory, in the field of machine learning we mostly explore concrete learning algorithms for learning problems.

This may sound complicated; in short, the PAC model provides a rigorous formal language for stating and characterizing the learnability and (sample) complexity questions mentioned above.

The theory was introduced by Leslie Valiant, who won the 2010 Turing Award.

PAC theory defines the strength of a learning algorithm:

Weak learning algorithm: one whose error rate is only slightly below 1/2, i.e. whose accuracy is only slightly better than random guessing;
Strong learning algorithm: one that achieves high accuracy and runs in polynomial time.

A more rigorous definition:

Weakly learnable: a concept is weakly learnable if there exists a polynomial-time learning algorithm that learns it with accuracy only slightly better than random guessing (just above 50%);

Strongly learnable: a concept is strongly learnable if there exists a polynomial-time learning algorithm that learns it with high accuracy.

At the same time, Kearns and Valiant posed the question of whether weak and strong learnability are equivalent in the PAC model: given any weak learning algorithm that is only slightly better than random guessing, can it be boosted into a strong learning algorithm? If the two are equivalent, then it suffices to find a weak learner that is slightly better than random guessing and boost it, rather than searching directly for a strong learner, which is hard to obtain. It was this conjecture that drove many researchers to design algorithms trying to realize it.

But for a long time there was no practical way to realize this idea. Details determine success or failure, and a good theory needs an effective algorithm to carry it out. The effort finally paid off: in 1996 Freund and Schapire proposed an effective algorithm that truly fulfilled this long-cherished wish; its name is AdaBoost. AdaBoost combines several different decision trees in a non-random way and shows remarkable performance and benefits:

    1. The accuracy of decision trees is greatly improved and becomes comparable to that of SVMs.
    2. When simple base classifiers are used, the results remain interpretable, and the weak classifiers are extremely easy to construct.
    3. It is fast, and essentially no parameters need to be tuned.
    4. It is simple and requires no feature selection.
    5. There is little need to worry about overfitting.

"I guess Breiman and Friedman were certainly happy, because seeing their cart is being compared to SVM, adaboost the decision tree back to the dead!" Breiman can not help but in his thesis praised AdaBoost is the best spot method (off-the-shelf, that is, "take it Off" means). (This passage is excerpted from the statistical study of those things)

II. Classifiers built by sampling the data set multiple times

We have already designed several classifiers; if we combine different classifiers, the result is an ensemble method (or meta-algorithm). Ensembles can take several forms: combining different algorithms, combining the same algorithm under different settings, or assigning different parts of the data set to different classifiers. The most common ensemble methods are bagging and boosting.

1. Bagging: building classifiers by random resampling of the data

Bagging, short for bootstrap aggregating, is a technique that resamples the original data set with replacement to obtain S new data sets, each the same size as the original. Each new data set is built by repeatedly drawing samples at random from the original data set, so an individual sample may appear several times or not at all.

Once the S data sets are built, a learning algorithm is applied to each of them, yielding S classifiers. These S classifiers are combined with equal weights (if the S classifiers use different classification algorithms, voting can be used instead), which produces a stronger learner.

The main steps of the bagging method are as follows (a small code sketch follows the list):

    1. Repeatedly draw n samples with replacement from the sample set D;
    2. Run the learning algorithm on each resampled set to obtain a hypothesis hi;
    3. Combine the hypotheses into a final hypothesis hfinal;
    4. Use the final hypothesis for the classification task at hand.
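
To make these steps concrete, here is a minimal bagging sketch (not code from the book): it bootstrap-resamples the training set S times, fits one base classifier per resample, and combines them by an unweighted vote. scikit-learn's DecisionTreeClassifier is used only as a convenient stand-in for "a learning algorithm", and the function names are my own.

import numpy as np
from sklearn.tree import DecisionTreeClassifier  # stand-in base learner

def bagging_fit(X, y, S=25, random_state=0):
    """Fit S base classifiers, each on a bootstrap resample of (X, y)."""
    rng = np.random.default_rng(random_state)
    n = X.shape[0]
    models = []
    for _ in range(S):
        idx = rng.integers(0, n, size=n)           # draw n samples with replacement
        clf = DecisionTreeClassifier(max_depth=1)  # a single-layer tree (stump)
        clf.fit(X[idx], y[idx])
        models.append(clf)
    return models

def bagging_predict(models, X):
    """Combine the classifiers by an unweighted vote on +1/-1 labels."""
    votes = np.sum([m.predict(X) for m in models], axis=0)
    return np.where(votes >= 0, 1, -1)

With max_depth=1 each base learner is itself a single-layer decision tree, the same kind of weak classifier used for AdaBoost later in these notes.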

2. The boosting method

The book gives the following explanation: boosting is a technique similar to bagging, and the type of classifier used is the same whether we apply boosting or bagging. The difference is that in bagging the classifiers are trained independently on the resampled data sets, whereas boosting obtains its classifiers through sequential training, with each new classifier trained according to the performance of the classifiers already trained: boosting builds a new classifier by concentrating on the data that the existing classifiers have misclassified.

The output of boosting is a weighted sum of the results of all the classifiers. Unlike in bagging, these weights are not equal: each weight reflects how successful the corresponding classifier was in the previous round of iteration.
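
In code, that final decision is simply the sign of the alpha-weighted sum of the individual predictions. A minimal illustration (the names alphas, predictions and weighted_vote are hypothetical, not from the book):

import numpy as np

def weighted_vote(alphas, predictions):
    """predictions: one row of +1/-1 votes per weak classifier; alphas: their weights."""
    return np.sign(np.dot(alphas, predictions))  # sign of the weighted sum

For example, with alphas = [0.7, 0.3] and per-classifier votes [[1, -1, 1], [1, 1, -1]], the weighted sums are [1.0, -0.4, 0.4], so the ensemble predicts [1, -1, 1].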

The boosting method has multiple versions, and AdaBoost is one of them.

III. AdaBoost: improving the classifier by focusing on its errors

AdaBoost is an iterative algorithm. Its core idea is to train a sequence of different classifiers (weak classifiers) on the same training set and then assemble these weak classifiers into a stronger final classifier, whose error rate is much lower than that of any individual weak classifier.

AdaBoost works by changing the distribution over the data: the weight of each sample is determined by whether it was classified correctly in the previous round of training and by the overall accuracy of that round. The re-weighted data is handed to the next classifier for training, and the classifiers obtained in each round are finally combined into the decision classifier. The AdaBoost algorithm proceeds as follows:

    1. Assign each training sample a weight; these weights form the vector D and are initialized to equal values;
    2. A weak classifier is first trained on the training data and its error rate is calculated;
    3. A weak classifier is trained again on the same data set, with the sample weights re-adjusted between the two rounds: the weights of samples classified correctly in the first round are decreased, while the weights of misclassified samples are increased;
    4. To obtain the final classification result from all the weak classifiers, AdaBoost assigns each classifier a weight alpha, computed from that classifier's error rate.

The error rate ε is defined by the following formula:

ε = (number of misclassified samples) / (total number of samples)

And alpha is calculated from ε as follows:

α = (1/2) · ln((1 − ε) / ε)
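
As a quick sanity check on this formula: a weak classifier with ε = 0.3 receives α = 0.5 · ln(0.7 / 0.3) ≈ 0.42, a more accurate one with ε = 0.1 receives α ≈ 1.10, and one that is no better than random guessing (ε = 0.5) receives α = 0, so it contributes nothing to the final vote.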

The flowchart of the AdaBoost algorithm is as follows:

The left side of the figure represents the data set, where the varying widths of the bars indicate the different weights on the individual samples. The predictions of each classifier are weighted by the alpha value shown in the corresponding triangle, the weighted results from all the triangles are summed in the circle, and that sum is the final output.

After alpha has been calculated, the weight vector D can be updated so that the weights of correctly classified samples decrease and the weights of misclassified samples increase. The update is as follows:

For a correctly classified sample, its weight becomes:

D_i ← D_i · e^(−α) / Sum(D)

For a misclassified sample, its weight becomes:

D_i ← D_i · e^(α) / Sum(D)

where Sum(D) is the sum of all the updated weights, so that D is renormalized to sum to 1.

After the weight vector D has been updated, AdaBoost enters the next iteration. It keeps repeating this process of training and re-weighting until the training error rate reaches 0 (or a user-specified condition, such as a maximum number of weak classifiers, is met). A code sketch of this loop is given below.
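
Tying the formulas above together, here is a minimal sketch of the train/re-weight loop (in the spirit of the book's implementation, not its verbatim listing). It assumes the buildStump and stumpClassify functions from section IV below; the function name adaBoostTrain and the numIt parameter are my own.

from numpy import mat, ones, zeros, log, exp, multiply, sign, shape

def adaBoostTrain(dataArr, classLabels, numIt=40):
    """Train up to numIt weak classifiers (decision stumps) with AdaBoost."""
    weakClassArr = []                       # the weak classifiers collected so far
    m = shape(mat(dataArr))[0]
    D = mat(ones((m, 1)) / m)               # start with equal sample weights
    aggClassEst = mat(zeros((m, 1)))        # running alpha-weighted vote
    for _ in range(numIt):
        bestStump, error, classEst = buildStump(dataArr, classLabels, D)
        eps = float(error)                  # weighted error rate of this stump
        # alpha = 0.5 * ln((1 - eps) / eps); max() guards against division by zero
        alpha = float(0.5 * log((1.0 - eps) / max(eps, 1e-16)))
        bestStump['alpha'] = alpha
        weakClassArr.append(bestStump)
        # decrease the weights of correctly classified samples, increase the
        # weights of misclassified ones, then renormalize so D again sums to 1
        expon = multiply(-1 * alpha * mat(classLabels).T, classEst)
        D = multiply(D, exp(expon))
        D = D / D.sum()
        # accumulate the weighted votes and stop once the training error is 0
        aggClassEst += alpha * classEst
        aggErrors = multiply(sign(aggClassEst) != mat(classLabels).T, ones((m, 1)))
        if aggErrors.sum() / m == 0.0:
            break
    return weakClassArr, aggClassEst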

A blog post gives an easy-to-understand analogy: AdaBoost's training process is like a student working through a problem set, where each training sample is one exercise and the whole training set is the problem set. On the first pass, no problem has been seen before, so every problem gets the same attention; after checking the answers, many problems may turn out to be wrong. The wrongly solved problems are specially marked and given a higher degree of attention, which is what the weight w records. On the second pass, the student concentrates on the problems that were missed, while the problems solved correctly the first time are considered relatively easy, are only skimmed, and have their weights reduced. In addition, the overall score of each pass measures how well that pass went, and this is alpha. On each subsequent pass, attention and time are allocated to the problems according to the weights adjusted in the previous pass. Practicing like this round after round, the problem set is gradually conquered. Every round brings some gain, but the amount of knowledge gained differs from round to round (alpha). In the end we obtain M classifiers, and combining them according to their weights gives a classifier with "good academic performance".

IV. Building a classifier from a single-layer decision tree

Implemented using Python:

from numpy import *

def loadSimpData():
    """Create a small toy data set of five points with +1/-1 labels."""
    datMat = matrix([[1.0, 2.1],
                     [2.0, 1.1],
                     [1.3, 1.0],
                     [1.0, 1.0],
                     [2.0, 1.0]])
    classLabels = [1.0, 1.0, -1.0, -1.0, 1.0]
    return datMat, classLabels

# Single-layer decision tree (decision stump) classification function
def stumpClassify(dataMatrix, dimen, threshVal, threshIneq):
    """Classify every sample by comparing one feature against a threshold."""
    retArray = ones((shape(dataMatrix)[0], 1))
    if threshIneq == 'lt':
        retArray[dataMatrix[:, dimen] <= threshVal] = -1.0
    else:
        retArray[dataMatrix[:, dimen] > threshVal] = -1.0
    return retArray

def buildStump(dataArr, classLabels, D):
    """Find the decision stump with the lowest weighted error for weights D."""
    dataMatrix = mat(dataArr); labelMat = mat(classLabels).T
    m, n = shape(dataMatrix)
    numSteps = 10.0; bestStump = {}; bestClasEst = mat(zeros((m, 1)))
    minError = inf
    for i in range(n):                           # loop over every feature
        rangeMin = dataMatrix[:, i].min()
        rangeMax = dataMatrix[:, i].max()
        stepSize = (rangeMax - rangeMin) / numSteps
        for j in range(-1, int(numSteps) + 1):   # loop over candidate thresholds
            for inequal in ['lt', 'gt']:         # try both inequality directions
                threshVal = rangeMin + float(j) * stepSize
                predictedVals = stumpClassify(dataMatrix, i, threshVal, inequal)
                errArr = mat(ones((m, 1)))
                errArr[predictedVals == labelMat] = 0
                # the weighted error rate: element-wise product of the error
                # vector errArr and the weight vector D
                weightedError = D.T * errArr
                # print("split: dim %d, thresh %.2f, ineq: %s, weighted error %.3f"
                #       % (i, threshVal, inequal, weightedError))
                if weightedError < minError:
                    minError = weightedError
                    bestClasEst = predictedVals.copy()
                    bestStump['dim'] = i
                    bestStump['thresh'] = threshVal
                    bestStump['ineq'] = inequal
    return bestStump, minError, bestClasEst
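
As a quick check of the stump builder (a hypothetical usage example, not from the book), one can run it on the toy data set above with uniform initial weights:

from numpy import mat, ones

datMat, classLabels = loadSimpData()
D = mat(ones((5, 1)) / 5)      # five samples, equal initial weights of 0.2
bestStump, minError, bestClasEst = buildStump(datMat, classLabels, D)
# On this toy data the best stump splits on feature 0 at threshold 1.3 ('lt'),
# misclassifying one of the five samples, so the weighted error is 0.2.
print(bestStump, minError)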

The next section will apply the AdaBoost algorithm to imbalanced classification problems.

Reference Links:
http://blog.csdn.net/lu597203933/article/details/38666303
http://blog.csdn.net/lskyne/article/details/8425507


"Machine learning Combat" study notes: Using AdaBoost meta-algorithm to improve classification performance
