The Real AdaBoost classifier is an extension and improvement of the classic AdaBoost classifier. Each weak classifier in classic AdaBoost outputs only a discrete value such as {0, 1} or {+1, -1}, so its classification ability is weak. Each weak classifier in Real AdaBoost outputs a real value (hence the name "real"), which can be regarded as a confidence level. Combined with a LUT (look-up table), its ability to express complex functions is stronger than that of classic AdaBoost.
The rest of this article is divided into three parts: the first explains classic AdaBoost, the second explains Real AdaBoost, and the third gives an example.
I. Classic AdaBoost
The training process of the classic AdaBoost classifier is as follows:
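Since the weak learners are usually decision stumps built on single feature dimensions (as described next), here is a minimal Python sketch of the standard discrete AdaBoost training loop. The function name `train_adaboost` and the brute-force stump search are illustrative choices of mine, not code from any particular library:

```python
import numpy as np

def train_adaboost(X, y, n_rounds=50):
    """Discrete AdaBoost with decision stumps (labels y in {-1, +1})."""
    n, d = X.shape
    w = np.full(n, 1.0 / n)          # sample weights, start uniform
    ensemble = []                    # list of (alpha, dim, thresh, sign)

    for _ in range(n_rounds):
        # pick the stump (feature dim, threshold, polarity) with the lowest weighted error
        best = None
        for dim in range(d):
            for thresh in np.unique(X[:, dim]):
                for sign in (+1, -1):
                    pred = np.where(X[:, dim] < thresh, sign, -sign)
                    err = np.sum(w[pred != y])
                    if best is None or err < best[0]:
                        best = (err, dim, thresh, sign)

        err, dim, thresh, sign = best
        err = np.clip(err, 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)        # weight of this weak classifier
        pred = np.where(X[:, dim] < thresh, sign, -sign)

        # re-weight samples: misclassified samples get larger weights
        w *= np.exp(-alpha * y * pred)
        w /= w.sum()

        ensemble.append((alpha, dim, thresh, sign))
    return ensemble
```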
Although there is no restriction on the form of the weak classifier (it can be a decision tree over multiple feature dimensions, or even an SVM), usually each weak classifier is built on a single dimension of the feature vector, and its output is only +1 or -1 (for binary classification). Therefore, during training, each iteration selects the weak classifier corresponding to the single feature dimension that classifies best under the current training-set weight distribution.
During prediction, a sample is fed in, and classic AdaBoost takes the weighted sum of the {-1, +1} values output by all weak classifiers as the final result. Different thresholds on this sum yield different precision and recall. For example, if the output is 0.334 and the threshold is set to 0, the classification result is +1; if the threshold is 0.5, the classification result is -1.
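A matching sketch of the prediction step, with the threshold exposed as a parameter to illustrate the precision/recall trade-off (again, the name `predict_adaboost` is illustrative):

```python
def predict_adaboost(x, ensemble, threshold=0.0):
    """Weighted vote of all weak classifiers; the threshold trades precision against recall."""
    score = sum(alpha * (sign if x[dim] < thresh else -sign)
                for alpha, dim, thresh, sign in ensemble)
    return +1 if score > threshold else -1
```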
II. Real AdaBoost
The training process of the Real AdaBoost classifier is as follows:
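The standard Real AdaBoost scheme with domain partitioning works like this: the feature range is split into mutually exclusive bins; for each bin, the weighted sums of positive and negative samples W(+1) and W(-1) give the real-valued output h = 1/2 · ln(W(+1)/W(-1)); and in each round the weak classifier with the smallest normalization factor Z = 2 · Σ √(W(+1)·W(-1)) is selected. Below is a minimal sketch under these assumptions (features pre-scaled to [0, 1], 64 bins, illustrative names):

```python
import numpy as np

def train_real_adaboost(X, y, n_rounds=50, n_bins=64, eps=1e-7):
    """Real AdaBoost with LUT weak classifiers (y in {-1, +1}, features scaled to [0, 1])."""
    n, d = X.shape
    w = np.full(n, 1.0 / n)                      # sample weights
    ensemble = []                                # list of (feature dim, LUT of real outputs)

    for _ in range(n_rounds):
        best = None
        for dim in range(d):
            bins = np.minimum((X[:, dim] * n_bins).astype(int), n_bins - 1)
            w_pos = np.zeros(n_bins)             # W(+1) per bin
            w_neg = np.zeros(n_bins)             # W(-1) per bin
            np.add.at(w_pos, bins[y == +1], w[y == +1])
            np.add.at(w_neg, bins[y == -1], w[y == -1])

            Z = 2.0 * np.sum(np.sqrt(w_pos * w_neg))               # normalization factor
            if best is None or Z < best[0]:
                lut = 0.5 * np.log((w_pos + eps) / (w_neg + eps))  # real-valued outputs per bin
                best = (Z, dim, lut)

        Z, dim, lut = best
        bins = np.minimum((X[:, dim] * n_bins).astype(int), n_bins - 1)
        h = lut[bins]                            # weak-classifier output for each sample
        w *= np.exp(-y * h)                      # re-weight samples
        w /= w.sum()
        ensemble.append((dim, lut))
    return ensemble
```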
III. Example
The Real AdaBoost training procedure introduced in the second part may seem confusing, so let me explain it with an example. Real AdaBoost is used in the classic paper Fast Rotation Invariant Multi-View Face Detection Based on Real AdaBoost. First, the paper extracts many Haar features from the sliding window (if you do not know what Haar features are, see my blog post on the Viola-Jones face detector). Then each Haar feature is normalized to [0, 1], and that interval is divided into 64 equal parts, giving 64 mutually exclusive sub-spaces.

The computation that follows is the same as in the second part: in each of the 64 sub-spaces, the weighted sums of positive and negative samples, W(+1) and W(-1), are calculated; these two values are used to compute the output of the weak classifier and the normalization factor Z. Finally, the Haar feature whose weak classifier has the smallest Z is selected as the weak classifier for this iteration. This weak classifier therefore has 64 real-valued outputs, one for each of the 64 sub-spaces.

During prediction, if we store these 64 values in an array, the classifier output for any input feature can be obtained by a table lookup. Suppose the input Haar feature is 0.376 (after normalization); 0.376 / (1/64) = 24.064, so the value falls into the 24th sub-space, and the output of the current weak classifier is the 24th element of the array. Finally, we sum the outputs of all the weak classifiers and compare the sum against a threshold b to obtain the final output of the strong classifier. It is that simple.
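Here is a short sketch of this table lookup at prediction time, assuming each weak classifier is stored as a feature index plus its 64-entry array of real outputs (the format used in the training sketch above); the threshold b is left as a parameter:

```python
def predict_real_adaboost(x, ensemble, b=0.0, n_bins=64):
    """Sum the LUT outputs of all weak classifiers and compare against threshold b."""
    score = 0.0
    for dim, lut in ensemble:
        # map the normalized feature value in [0, 1] to one of the 64 bins
        idx = min(int(x[dim] * n_bins), n_bins - 1)   # e.g. 0.376 * 64 = 24.064 -> index 24
        score += lut[idx]
    return +1 if score > b else -1
```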