AdaBoost Algorithm
The basic idea: a single classification algorithm may struggle to judge a complex problem well, so we let a group of classifiers vote and combine their judgments to obtain the result — in the spirit of the proverb "three cobblers with their wits combined equal Zhuge Liang the mastermind."
In more formal terms:
Strongly learnable: there exists a polynomial-time algorithm that can learn the concept with high accuracy.
Weakly learnable: there exists a polynomial-time algorithm that can learn the concept, but only slightly more accurately than random guessing.
Moreover, it can be proved that strong learnability and weak learnability are equivalent.
A weak learning algorithm is usually easy to find, so a natural question arises: can we boost a weak learner into a strong learner?
AdaBoost is exactly such an algorithm: through repeated rounds of learning it obtains a group of weak classifiers, then combines them into a strong classifier.
The first question is: how do we get a group of different weak classifiers?
One option is to train with different classification algorithms.
Another is to use different training sets: Bagging, for example, performs M rounds of random sampling with replacement on the training set to obtain M new training sets.
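The Bagging-style resampling just described can be sketched as follows (a minimal illustration, not from the source; the helper name `bootstrap_samples` and the toy data are chosen here):

```python
import random

def bootstrap_samples(dataset, m):
    """Draw m bootstrap samples: each new training set is built by
    sampling with replacement and has the same size as the original."""
    n = len(dataset)
    return [[random.choice(dataset) for _ in range(n)] for _ in range(m)]

# A toy training set of (x, y) pairs; each of the 3 resampled sets
# may repeat some points and omit others.
train = [(0.0, -1), (1.0, -1), (2.0, 1), (3.0, 1)]
new_sets = bootstrap_samples(train, 3)
```

Training one classifier per resampled set is what gives Bagging its diversity; AdaBoost, as described next, gets diversity from reweighting instead.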
AdaBoost instead uses the same algorithm and the same training set, but changes the weight of each training sample. Since the classifier is fit by minimizing the weighted error, different weights yield different classifier parameters.
The specific rule: after each round of classification, increase the weights of the misclassified samples and decrease the weights of the correctly classified ones, keeping the sum of all sample weights equal to 1.
This way, the next round's classifier focuses more on the samples the previous round got wrong, achieving a divide-and-conquer effect.
Note that this makes the algorithm sensitive to outliers and prone to overfitting.
Each weak classifier also carries a weight that reflects its error rate; the final result is obtained by a weighted majority vote.
The algorithm in detail:
Given a training set T = {(x_1, y_1), ..., (x_N, y_N)} with labels y_i in {−1, +1}:
1. Initialize the training-sample weights to the uniform distribution, w_1i = 1/N, so every sample counts equally.
2. Obtain M weak classifiers by iterating for m = 1, ..., M:
2.1 Train a classifier G_m(x) on the training set under the current weights.
2.2 Compute the weak classifier's weighted error, e_m = Σ_i w_mi · I(G_m(x_i) ≠ y_i).
2.3 Compute the weak classifier's weight, α_m = (1/2)·ln((1 − e_m)/e_m). From the log it is clear that the smaller the error, the larger the weight, i.e. the bigger the classifier's role in the final strong classifier.
2.4 Key step: update the training-sample weights, w_{m+1,i} = (w_mi / Z_m) · exp(−α_m · y_i · G_m(x_i)).
The exponential factor exp(−α_m · y_i · G_m(x_i)) takes a value in (0, 1) when its argument is less than 0, and a value greater than 1 when the argument is greater than 0.
Therefore when G_m(x_i) = y_i (the sample is classified correctly), the weight shrinks;
when G_m(x_i) ≠ y_i (the sample is misclassified), the weight grows.
The division by Z_m is there because the weights must sum to 1: Z_m = Σ_i w_mi · exp(−α_m · y_i · G_m(x_i)) is the normalization factor.
3. We now have M weak classifiers. How do we combine them?
Simple: a weighted majority vote, G(x) = sign(Σ_m α_m · G_m(x)).
Here sign is the sign function: sign(x) = −1 for x < 0, 0 for x = 0, and 1 for x > 0.
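Putting steps 1–3 together, here is a compact sketch using one-dimensional decision stumps as the weak classifiers (an illustrative implementation under that assumption, not the book's code; names like `train_stump` are chosen here):

```python
import math

def train_stump(X, y, w):
    """Steps 2.1-2.2: exhaustively pick the 1-D threshold/polarity stump
    that minimizes the weighted error under the current weights w."""
    best = None
    for t in sorted(set(X)):
        for pol in (1, -1):
            pred = [pol if x > t else -pol for x in X]
            err = sum(wi for wi, p, yi in zip(w, pred, y) if p != yi)
            if best is None or err < best[0]:
                best = (err, t, pol)
    return best

def adaboost(X, y, M):
    n = len(X)
    w = [1.0 / n] * n                            # step 1: uniform weights
    classifiers = []
    for _ in range(M):
        err, t, pol = train_stump(X, y, w)       # steps 2.1-2.2
        err = max(err, 1e-10)                    # guard against log of 0
        alpha = 0.5 * math.log((1 - err) / err)  # step 2.3
        pred = [pol if x > t else -pol for x in X]
        # step 2.4: shrink weights of correct samples, grow the wrong ones,
        # then divide by Z_m so the weights sum to 1 again
        w = [wi * math.exp(-alpha * yi * p) for wi, yi, p in zip(w, y, pred)]
        z = sum(w)
        w = [wi / z for wi in w]
        classifiers.append((alpha, t, pol))
    return classifiers

def predict(classifiers, x):
    # step 3: weighted majority vote, then take the sign
    s = sum(a * (pol if x > t else -pol) for a, t, pol in classifiers)
    return 1 if s > 0 else -1 if s < 0 else 0

# Classic 10-point toy set: three rounds of stumps already classify
# every training point correctly, even though each stump alone cannot.
X = list(range(10))
y = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]
model = adaboost(X, y, 3)
```

Note how the strong classifier fits data that no single axis-threshold stump can separate: each round's reweighting forces the next stump to concentrate on the previous round's mistakes.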
Notes on Statistical Learning Methods: the boosting method