First, an illustrated example shows the AdaBoost process:
1. Each round produces a weak classifier, and the samples it misclassifies are given larger weights for the next round
2. The next round then concentrates on the samples the previous round got wrong, yielding another weak classifier
After T rounds, the T weak classifiers learned along the way are combined into a single strong classifier.
Because the sample weights change from round to round, the objective function the classifier learns changes as well:
Both SVM and logistic regression can be trained this way, simply by giving each sample a different weight.
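As a small illustration of how per-sample weights change the objective (the labels, predictions, and weights below are made-up toy numbers), compare the plain 0/1 error with its weighted version:

```python
import numpy as np

# Toy labels and predictions from some weak classifier (made-up numbers)
y    = np.array([ 1,  1, -1, -1,  1])   # true labels
pred = np.array([ 1, -1, -1,  1,  1])   # wrong on samples 1 and 3

# The unweighted 0/1 error treats every sample equally
plain_error = np.mean(pred != y)                   # 2/5 = 0.4

# With per-sample weights u, the same mistakes can cost more or less,
# pushing the learner toward getting the heavy samples right
u = np.array([0.1, 0.4, 0.1, 0.3, 0.1])
weighted_error = np.sum(u[pred != y]) / np.sum(u)  # (0.4 + 0.3) / 1.0 ≈ 0.7

print(plain_error, weighted_error)
```

The same two mistakes cost 0.4 under uniform weighting but 0.7 here, because they happen to fall on heavily weighted samples.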
The question then becomes: how should the sample weights be adjusted, and to what end?
Lin introduces a principle:
Objective: when combining multiple classifiers, the classifiers should differ from one another as much as possible.
Method: down-weight the samples the last round classified correctly and up-weight the samples it misclassified, so that g_t and g_{t+1} are good at classifying different samples.
Lin then introduces a re-weighting method that is feasible in practice:
Multiply the weights of the correctly classified samples by the error rate and the weights of the misclassified samples by (1 − error rate). As the previous slide notes, the result is that under u_{t+1} the classifier g_t performs no better than random guessing, while g_{t+1}, trained this round on u_{t+1}, must (if it really learns) do better than random. This not only guarantees diversity between the classifiers, but also avoids adjusting the weights too drastically.
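A quick numeric check of that claim (the toy labels and weights are mine): after scaling the correct weights by ε_t and the wrong ones by (1 − ε_t), g_t's weighted error under the new weights comes out to exactly 1/2:

```python
import numpy as np

y   = np.array([ 1, -1,  1,  1, -1, -1])          # true labels (toy)
g_t = np.array([ 1, -1, -1,  1, -1,  1])          # g_t is wrong on samples 2 and 5
u   = np.array([0.3, 0.1, 0.1, 0.2, 0.2, 0.1])    # current weights u_t

wrong = g_t != y
eps = u[wrong].sum() / u.sum()                    # weighted error rate of g_t

# Re-weighting: misclassified samples *= (1 - eps), correct ones *= eps
u_next = np.where(wrong, u * (1 - eps), u * eps)

# Under u_{t+1}, g_t looks like random guessing: its weighted error is 1/2
new_eps = u_next[wrong].sum() / u_next.sum()
print(new_eps)                                    # ≈ 0.5
```

The wrong mass becomes ε(1 − ε) and the correct mass (1 − ε)ε, so they are equal regardless of the original ε, which is exactly why g_t is "random" for the new weights.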
This settles how the classifier is learned in each round, but not yet how the classifiers are combined.
Lin describes a method for linear aggregation on the fly:
In this way, after each round, the weight placed in front of that round's classifier is:
Take the natural log of the scaling factor as the classifier's weight, α_t = ln(♦_t)
1) A positive coefficient means the classifier is better than random and contributes to correct classification
2) A zero coefficient means the classifier performs the same as random guessing, so it gets no vote
3) A negative coefficient means the classifier's judgment is more often the reverse of the truth, so its vote is flipped
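In Lin's notation the scaling factor is ♦_t = √((1 − ε_t)/ε_t), so α_t = ln(♦_t), and the three cases above follow from the sign of the log (a sketch; the toy error rates are mine):

```python
import math

def alpha(eps):
    """Vote weight alpha_t = ln(scaling factor),
    where the scaling factor is sqrt((1 - eps) / eps)."""
    return math.log(math.sqrt((1 - eps) / eps))

print(alpha(0.2))   # positive: better than random, gets a positive vote
print(alpha(0.5))   # zero: a coin-flip classifier gets no vote
print(alpha(0.8))   # negative: worse than random, its vote is reversed
```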
In an engineering implementation, the case error rate = 0 needs special handling here, since the scaling factor would otherwise divide by zero.
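Putting the re-weighting and the on-the-fly linear aggregation together, here is a minimal sketch of AdaBoost over decision stumps (the stump search, the clamp constant, and the toy data are my own choices, not code from the course):

```python
import numpy as np

def best_stump(X, y, u):
    """Exhaustive search for the decision stump (feature, threshold, sign)
    with the smallest weighted classification error."""
    best = (np.inf, 0, 0.0, 1)
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            for s in (1, -1):
                pred = s * np.where(X[:, j] >= thr, 1, -1)
                err = u[pred != y].sum()
                if err < best[0]:
                    best = (err, j, thr, s)
    return best

def adaboost(X, y, T):
    u = np.full(len(y), 1.0 / len(y))            # start from uniform weights
    model = []                                   # (alpha, feature, threshold, sign)
    for t in range(T):
        err, j, thr, s = best_stump(X, y, u)
        eps = err / u.sum()                      # weighted error rate this round
        eps = min(max(eps, 1e-10), 1 - 1e-10)    # special handling: eps = 0 (or 1)
                                                 # would blow up the scaling factor
        diamond = np.sqrt((1 - eps) / eps)       # scaling factor
        pred = s * np.where(X[:, j] >= thr, 1, -1)
        u = np.where(pred != y, u * diamond, u / diamond)   # re-weight samples
        model.append((np.log(diamond), j, thr, s))          # alpha = ln(scaling factor)
    return model

def predict(model, X):
    score = sum(a * s * np.where(X[:, j] >= thr, 1, -1)
                for a, j, thr, s in model)
    return np.where(score >= 0, 1, -1)

# Toy data that no single stump can separate; three boosted stumps can
X = np.array([[0.], [1.], [2.], [3.], [4.], [5.]])
y = np.array([1, 1, -1, -1, 1, 1])
model = adaboost(X, y, T=3)
print(predict(model, X))                         # matches y
```

Each round's eps is computed relative to the current (unnormalized) weight total, so the weights never need explicit renormalization.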
Finally, Lin discusses the theoretical basis of AdaBoost:
Why does this approach work?
1) E_in keeps getting smaller as the rounds proceed
2) With a large enough sample, the VC bound ensures that E_in and E_out stay close (good generalization)
Lin then introduces a classic application of AdaBoost:
The weak classifier used is nothing stronger than a one-dimensional decision stump, yet combining these very weak classifiers produced remarkable work.
That work is real-time face detection.
"Adaptive Boosting", Hsuan-Tien Lin, Machine Learning Techniques