Bagging, short for Bootstrap Aggregation, is another ensemble learning approach. A previous post introduced ensemble learning and its evaluation criteria, where "diversity" means that the individual base learners should differ from each other as much as possible. Bagging achieves this as follows: given a training set $D$, apply bootstrap sampling to $D$ to obtain a number of different subsets; bootstrap sampling ensures that these subsets still share a certain amount of overlap. A base classifier is trained on each subset, and the classifiers are combined to make a joint decision. This is the idea behind Bagging.
Bootstrap sampling works as follows: given a dataset $D$ with $m$ samples, draw $m$ samples from $D$ with replacement to obtain a dataset $D'$. Because the sampling is done with replacement, some samples appear in $D'$ more than once while others never appear. A simple estimate: the probability that a given sample is never picked in the $m$ draws is $(1-\frac{1}{m})^m$; taking the limit:
\[\lim_{m \rightarrow \infty} (1-\frac{1}{m}) ^m = \frac{1}{e} \approx 0.368\]
That is, each sample of $D$ has about a 63.2% chance of appearing in $D'$.
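To make the 63.2% figure concrete, here is a minimal simulation sketch in Python (an illustrative check, not part of the original text): draw a bootstrap sample of indices with NumPy and count what fraction of the distinct samples of $D$ shows up in $D'$.

```python
import numpy as np

rng = np.random.default_rng(0)
m = 10_000                                 # size of the dataset D
idx = rng.integers(0, m, size=m)           # bootstrap sample D' (indices into D)
frac_present = len(np.unique(idx)) / m     # fraction of D that appears in D'
print(f"fraction of D present in D': {frac_present:.3f}")  # approx. 0.632
```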
Draw $B$ bootstrap sample sets $D_1, D_2, \ldots, D_B$, train a base learner $T_b(x)$ on each of the $B$ sample sets, and combine these base learners to make decisions. For classification tasks the decision is usually made by voting; if the two classes receive the same number of votes, the simplest way is to pick one at random. Regression tasks generally use simple averaging. The entire process is as follows:
The learning algorithm of Bagging is given below:
Input: training set $D = \left\{(x_i, y_i)\right\}_{i=1}^{n}$ and the number of base learners $B$
1. Sample $B$ bootstrap training sets $D_b$, $b = 1, \ldots, B$;
2. for $b = 1, \ldots, B$ do:
   train a base learner $T_b(x)$ on the bootstrap training set $D_b$;
3. Combine the $B$ base learners into the final model $T(x) = \sum_b T_b(x)$.
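Below is a minimal Python sketch of this algorithm (an illustrative implementation, not the original author's code). It assumes a binary classification task, uses a CART-style decision tree from scikit-learn as the base learner $T_b$, and combines the trees by majority vote as described above.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=500, random_state=0)  # toy binary dataset

B = 25                                          # number of bootstrap rounds
learners = []
for b in range(B):
    idx = rng.integers(0, len(X), size=len(X))  # bootstrap training set D_b
    learners.append(DecisionTreeClassifier(random_state=b).fit(X[idx], y[idx]))

# Combine by majority vote (classification); a regression task would average.
votes = np.stack([t.predict(X) for t in learners])   # shape (B, n_samples)
y_pred = (votes.mean(axis=0) >= 0.5).astype(int)     # majority of the 0/1 votes
print("training accuracy:", (y_pred == y).mean())
```

Since each $T_b$ depends only on its own bootstrap set $D_b$, the loop over $b$ could be run in parallel, which is the property discussed next.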
Because the base learners in Bagging are independent of one another, Bagging is naturally suited to parallel training and is therefore very fast. The figure on the left shows the improvement Bagging brings to CART, although Bagging is usually not as effective as Boosting; the figure on the right compares the two.
Random Forest
Random Forest is a concept built on top of Bagging: during decision tree training it further introduces random attribute selection on top of the Bagging ensemble. Concretely, suppose the node to be split has $d$ candidate features. A decision tree inside plain Bagging picks the optimal feature among all $d$ features when splitting, whereas a random forest first randomly selects a subset of $k$ features out of the $d$ features for the node to be split, and then picks the optimal feature within this subset of $k$ to partition the data. The parameter $k$ controls the degree of randomness: if $k = d$, the random forest reduces to Bagging; if $k = 1$, a single feature is picked at random for the split. In general $k = \log_2 d$ is recommended. That is essentially all there is to Random Forest.
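To illustrate the role of $k$, here is a small sketch using scikit-learn's RandomForestClassifier on a synthetic dataset (an illustrative setup, not part of the original text): max_features=None corresponds to $k = d$, i.e. plain Bagging of trees, while max_features="log2" gives the recommended $k = \log_2 d$.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_features=30, n_informative=10,
                           random_state=0)

bagging_like = RandomForestClassifier(n_estimators=100, max_features=None,
                                      random_state=0)      # k = d (Bagging)
random_forest = RandomForestClassifier(n_estimators=100, max_features="log2",
                                       random_state=0)     # k = log2(d)

print("k = d      :", cross_val_score(bagging_like, X, y, cv=5).mean())
print("k = log2(d):", cross_val_score(random_forest, X, y, cv=5).mean())
```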
Random forests perform very well on many tasks, are easy to implement, and carry little extra overhead, requiring only a small modification of Bagging. In Bagging the "diversity" comes solely from perturbing the samples, whereas a random forest adds a feature perturbation on top of that. It is this change that makes the base learners even more "diverse" and gives random forests a smaller generalization error than Bagging. Still, random forests are usually not as effective as Gradient Boosting, as shown in the figure below:
Bias and Variance Analysis
Let us analyse Bagging and Boosting from the angle of bias and variance. Bagging resamples the training set, trains a base learner on each resampled subset, and finally averages their outputs. Because the sub-sample sets are similar and the same type of learner is used, every base learner has approximately equal bias and variance (although the learners are not independent). Suppose Bagging combines $B$ models and the loss of the $b$-th model is denoted $L_b$. Since:
\[E[\bar{X}] = E[X]\]
the bias of the Bagging ensemble is about the same as that of a single base learner, so Bagging does not significantly reduce bias. However, if the base learners were independent, then by $\mathrm{Var}(\bar{X}) = \frac{1}{n}\mathrm{Var}(X)$ we would have:
\[\mathrm{Var}\Big(\frac{1}{B}\sum_b L_b\Big) = \frac{1}{B}\mathrm{Var}(L_b)\]
So Bagging can significantly reduce Variance.
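A quick numerical check of this argument (an illustrative sketch, not from the original text): averaging $B$ noisy estimators that share the same bias leaves the bias unchanged but shrinks the variance by roughly a factor of $B$.

```python
import numpy as np

rng = np.random.default_rng(0)
B, trials = 25, 10_000
true_value, bias = 1.0, 0.2            # each base estimator is biased by 0.2

single = true_value + bias + rng.normal(0.0, 1.0, size=trials)
averaged = (true_value + bias
            + rng.normal(0.0, 1.0, size=(trials, B))).mean(axis=1)

print("bias,     single vs averaged:", single.mean() - true_value,
      averaged.mean() - true_value)    # both approx. 0.2 -> bias unchanged
print("variance, single vs averaged:", single.var(), averaged.var())
# approx. 1.0 vs approx. 1/B = 0.04 -> variance shrinks by a factor of B
```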
Boosting, viewed from an optimization angle, makes each iteration more accurate than the previous one: AdaBoost achieves this by changing the sample weights, and Gradient Boosting by reducing the residuals. Boosting therefore improves prediction accuracy mainly by reducing bias.
In conclusion, Bagging mainly reduces variance, while Boosting mainly reduces bias.