AdaBoost is an iterative algorithm. Its core idea is to train a series of different classifiers (weak classifiers) on the same training set and then to combine these weak classifiers into a stronger final classifier (the strong classifier). The algorithm is implemented by changing the data distribution: the weight of each sample is determined by whether the sample was classified correctly in the previous round and by the accuracy of the previous overall classification. The dataset with the modified weights is then passed to the next-level classifier for training, and the classifiers obtained from all training rounds are finally fused into the final decision classifier. Using an AdaBoost classifier can thus exclude some unnecessary features of the training data and place the emphasis on the key training samples.
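In practice, a strong classifier of this kind can be obtained directly from a library; the sketch below uses scikit-learn's AdaBoostClassifier with decision stumps as the weak classifiers. The dataset, the choice of stumps, and all parameter values are only illustrative assumptions.

# Minimal sketch: AdaBoost with decision stumps as weak classifiers (scikit-learn).
# Dataset and hyperparameters are illustrative assumptions.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Each weak classifier is a depth-1 decision tree (a "stump");
# 50 boosting rounds combine them into the final strong classifier.
# Note: scikit-learn versions before 1.2 call this parameter base_estimator.
clf = AdaBoostClassifier(
    estimator=DecisionTreeClassifier(max_depth=1),
    n_estimators=50,
    random_state=0,
)
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))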
At present, research on and applications of the AdaBoost algorithm are focused mostly on classification problems, while in recent years some applications to regression problems have also been developed. The AdaBoost family mainly addresses the following problems: multi-class single-label, multi-class multi-label, and regression. It uses all of the training samples for learning.
This algorithm is in fact a simple process for strengthening a weak classification algorithm: through repeated training, its ability to classify the data is gradually improved. The process is as follows (a toy sketch is given after this list):
1. Obtain the first weak classifier by learning from N training samples;
2. Combine the samples misclassified by the first classifier with other new data to form a new set of N training samples, and obtain the second weak classifier by learning from this set;
3. Combine the samples misclassified by both previous classifiers with other new samples to form another set of N training samples, and obtain the third weak classifier by learning from it;
4. The final boosted strong classifier: the class to which a sample is assigned is decided by the votes of all the weak classifiers together.
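A toy sketch of these four steps is given below. The decision-stump weak classifiers, the synthetic data, the way "other new data" is drawn from a reserve pool, and the simple majority vote at the end are all assumptions made only for illustration; AdaBoost itself replaces the resampling and the plain vote with weighted versions, as described in the next subsection.

# Toy sketch of the three-round process described above.
# Stumps, synthetic data, and the majority vote are illustrative assumptions.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=3000, n_features=10, random_state=0)
train_X, train_y = X[:1000], y[:1000]    # first set of N training samples
pool_X, pool_y = X[1000:], y[1000:]      # reservoir of "other new data"
N = 1000

classifiers = []
for round_ in range(3):
    clf = DecisionTreeClassifier(max_depth=1).fit(train_X, train_y)
    classifiers.append(clf)
    # collect the samples this round misclassified ...
    wrong = clf.predict(train_X) != train_y
    # ... and top them up with new samples to form the next set of N samples
    idx = rng.choice(len(pool_X), size=N - wrong.sum(), replace=False)
    train_X = np.vstack([train_X[wrong], pool_X[idx]])
    train_y = np.concatenate([train_y[wrong], pool_y[idx]])

# step 4: the combined classifier decides by the votes of the weak classifiers
votes = np.mean([c.predict(X[:1000]) for c in classifiers], axis=0)
print("accuracy of the simple vote on the original N samples:",
      np.mean((votes > 0.5) == y[:1000]))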
2.3 Adaboost (Adaptive boosting) Algorithm
There are two problems with the boosting algorithm:
1. How to adjust the training set so that weak classifiers can be trained on it;
2. How to combine the trained weak classifiers to form a strong classifier.
To address these two problems, AdaBoost makes the following adjustments:
1. Use training data selected by weighting instead of randomly selected training samples, so that training is concentrated on the training samples that are difficult to classify;
2. Combine the weak classifiers with a weighted voting mechanism instead of an average voting mechanism, so that a weak classifier with good classification performance receives a larger weight and one with poor performance receives a smaller weight (illustrated below).
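As a small illustration of the difference between the two voting mechanisms, the sketch below compares an unweighted (average) vote with a weighted vote; the weak-classifier outputs and the weights alpha are made-up numbers used only for illustration.

# Weighted voting vs. average (majority) voting over T weak classifiers.
# Each weak classifier is assumed to output +1 or -1; alpha_t is its weight.
import numpy as np

def average_vote(weak_outputs):
    # every weak classifier contributes equally
    return np.sign(np.sum(weak_outputs, axis=0))

def weighted_vote(weak_outputs, alphas):
    # a weak classifier with a larger alpha has more influence on the result
    return np.sign(alphas @ weak_outputs)

# toy example: three weak classifiers voting on four samples
outputs = np.array([[+1, -1, +1, -1],
                    [+1, +1, -1, -1],
                    [-1, +1, +1, +1]])
alphas = np.array([1.2, 0.4, 0.3])    # first classifier is the most reliable
print(average_vote(outputs))          # simple majority
print(weighted_vote(outputs, alphas)) # majority weighted by alpha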
The AdaBoost algorithm was proposed by Freund and Schapire on the basis of the online allocation algorithm. They also gave an upper bound on the error rate of AdaBoost and on the maximum number of iterations required by the algorithm. Unlike the earlier boosting algorithms, AdaBoost does not need to know in advance a lower bound on the accuracy of the weak learning algorithm, i.e. the error of the weak classifiers, and the final classification accuracy of the strong classifier depends on the classification accuracy of all the weak classifiers, so the capability of the weak classification algorithm can be exploited in depth.
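For reference, the training-error bound shown by Freund and Schapire for AdaBoost can be stated as follows, where $\epsilon_t$ is the weighted error of the $t$-th weak classifier and $\gamma_t = 1/2 - \epsilon_t$ is its advantage over random guessing:

\[
\frac{1}{N}\bigl|\{\,i : H(x_i)\neq y_i\,\}\bigr|
\;\le\; \prod_{t=1}^{T} 2\sqrt{\epsilon_t\,(1-\epsilon_t)}
\;=\; \prod_{t=1}^{T} \sqrt{1-4\gamma_t^{2}}
\;\le\; \exp\Bigl(-2\sum_{t=1}^{T}\gamma_t^{2}\Bigr).
\]

The bound shrinks as long as every weak classifier is even slightly better than random guessing, which is why no lower bound on the weak learner's accuracy has to be known in advance.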
In the AdaBoost algorithm, the different training sets are realized by adjusting the weight of each sample. At the beginning, every sample has the same weight 1/N, where N is the number of samples, and a weak classifier is trained under this sample distribution. For the samples this classifier misclassifies, the corresponding weights are increased; for the correctly classified samples, the weights are decreased, so that the misclassified samples are emphasized and a new sample distribution is obtained. Under the new sample distribution the weak learner is trained again to obtain the next weak classifier, and so on. After T rounds we obtain T weak classifiers, which are superimposed (boosted) with certain weights to give the desired strong classifier.
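In the standard two-class formulation (labels $y_i \in \{-1,+1\}$, weak classifier $h_t$, classifier weight $\alpha_t$), this reweighting is usually written as

\[
D_{t+1}(i) \;=\; \frac{D_t(i)\,\exp\bigl(-\alpha_t\, y_i\, h_t(x_i)\bigr)}{Z_t},
\qquad
\alpha_t \;=\; \frac{1}{2}\ln\frac{1-\epsilon_t}{\epsilon_t},
\]

where $Z_t$ is a normalization factor that keeps $D_{t+1}$ a probability distribution: the weight of a sample shrinks when $h_t$ classifies it correctly ($y_i h_t(x_i) = +1$) and grows when it does not.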
The specific steps of the AdaBoost algorithm are as follows (a minimal implementation sketch follows the list):
1. Given the training sample set $\{(x_1, y_1), \dots, (x_N, y_N)\}$, where $y_i = +1$ for positive samples and $y_i = -1$ for negative samples, let $T$ be the maximum number of training rounds;
2. Initialize the sample weights $D_1(i) = 1/N$, i.e. the initial probability distribution of the training samples;
3. In the $t$-th iteration ($t = 1, \dots, T$):
(1) train a weak classifier $h_t$ under the current probability distribution $D_t$ of the training samples;
(2) compute the weighted error rate of the weak classifier, $\epsilon_t = \sum_{i:\,h_t(x_i) \neq y_i} D_t(i)$;
(3) select the classifier weight $\alpha_t = \frac{1}{2}\ln\frac{1-\epsilon_t}{\epsilon_t}$, which minimizes the normalization factor $Z_t$ and hence the error bound;
(4) update the sample weights, $D_{t+1}(i) = D_t(i)\exp(-\alpha_t y_i h_t(x_i))/Z_t$;
(5) after the $T$ rounds, the final strong classifier is $H(x) = \mathrm{sign}\bigl(\sum_{t=1}^{T}\alpha_t h_t(x)\bigr)$.
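Putting these steps together, a minimal NumPy sketch could look as follows; the use of decision stumps as weak classifiers and the synthetic data in the usage lines at the end are assumptions made only for illustration.

# Minimal AdaBoost sketch following the steps above.
# Weak classifiers are decision stumps (a threshold on one feature);
# labels are assumed to be +1 / -1.
import numpy as np

def train_stump(X, y, D):
    """Weak learner: pick the (feature, threshold, polarity) with the
    smallest weighted error under the sample distribution D."""
    best, best_err = None, np.inf
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            for polarity in (+1, -1):
                pred = np.where(polarity * (X[:, j] - thr) >= 0, 1, -1)
                err = np.sum(D[pred != y])
                if err < best_err:
                    best_err, best = err, (j, thr, polarity)
    return best, best_err

def stump_predict(stump, X):
    j, thr, polarity = stump
    return np.where(polarity * (X[:, j] - thr) >= 0, 1, -1)

def adaboost(X, y, T=20):
    n = len(y)
    D = np.full(n, 1.0 / n)                      # step 2: uniform initial weights
    stumps, alphas = [], []
    for _ in range(T):                           # step 3: T boosting rounds
        stump, eps = train_stump(X, y, D)        # (1)-(2) weak classifier and its error
        eps = np.clip(eps, 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1 - eps) / eps)    # (3) classifier weight
        pred = stump_predict(stump, X)
        D *= np.exp(-alpha * y * pred)           # (4) reweight the samples
        D /= D.sum()
        stumps.append(stump)
        alphas.append(alpha)
    return stumps, np.array(alphas)

def predict(stumps, alphas, X):
    # (5) final strong classifier: sign of the weighted vote
    votes = sum(a * stump_predict(s, X) for s, a in zip(stumps, alphas))
    return np.sign(votes)

# usage on a small synthetic problem (assumption: two Gaussian blobs)
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(2, 1, (100, 2))])
y = np.concatenate([-np.ones(100), np.ones(100)])
stumps, alphas = adaboost(X, y, T=20)
print("training accuracy:", np.mean(predict(stumps, alphas, X) == y))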
AdaBoost is an adaptive boosting algorithm: it adapts itself to the errors of the weak classifiers obtained by weak learning. The main loop of the algorithm above is iterated $T$ times; in each cycle, the current weight distribution determines a probability distribution $P$ over the samples $X$, and the weak learning algorithm is then applied to the samples under this distribution to obtain a weak classifier with error rate $\epsilon_t$. For the weak learning algorithm used here it is only required that $\epsilon_t < 1/2$ for all $t$; an upper bound on the error rate does not need to be known in advance. The weights are updated at every iteration; the update rule is to reduce the probability of the data that the weak classifier classifies well and to increase the probability of the data that it classifies poorly. The final classifier is a weighted average (weighted vote) of the weak classifiers.
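To observe this weighted combination in practice, scikit-learn's AdaBoostClassifier exposes staged predictions, i.e. the output of the weighted vote after each boosting round; the dataset and parameter values below are again only illustrative.

# How the weighted vote of the weak classifiers improves with more rounds.
# Dataset and hyperparameters are illustrative assumptions.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, flip_y=0.1, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=1)

clf = AdaBoostClassifier(n_estimators=100, random_state=1).fit(X_tr, y_tr)
# staged_predict yields the prediction of the weighted vote after 1, 2, ... rounds
for t, y_pred in enumerate(clf.staged_predict(X_te), start=1):
    if t % 20 == 0:
        print(f"after {t:3d} weak classifiers: "
              f"test accuracy = {accuracy_score(y_te, y_pred):.3f}")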