In simple terms, a random forest is bagging combined with decision trees (usually CART trees): a forest of mutually independent decision trees. Because the trees are independent of one another, each tree carries equal weight in the final model combination, and the final classification result is determined by voting.
The main steps of the random forest algorithm:
1. Selection of the sample sets
Assume the original sample set contains n samples. In each round, n samples are drawn from the original set by bootstrapping (sampling with replacement), giving a training set of size n. Because the sampling is with replacement, some samples may be drawn repeatedly while others are never drawn at all.
After K rounds of sampling, we obtain the training sets T1, T2, ..., TK.
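A minimal sketch of this bootstrap step, assuming NumPy; the names n, K, and the helper bootstrap_sample are illustrative, not from the original post:

```python
import numpy as np

def bootstrap_sample(n, rng):
    """Draw n indices with replacement from range(n): one round of bootstrapping."""
    return rng.integers(0, n, size=n)

rng = np.random.default_rng(0)
n, K = 1000, 10
training_sets = [bootstrap_sample(n, rng) for _ in range(K)]  # T1, ..., TK

# With replacement, some indices repeat and others never appear:
# on average only ~63.2% (1 - 1/e) of the samples occur in each set.
unique_frac = len(np.unique(training_sets[0])) / n
print(f"distinct samples in T1: {unique_frac:.1%}")
```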
2. Generation of the decision trees
If the feature space has D features, then in each round d features (d < D) are randomly selected from the D features to form a new feature subset, and a decision tree is grown using this feature subset.
The K rounds produce K decision trees. Because both the training-set sampling and the feature selection are random, these K decision trees are mutually independent.
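A sketch of one round of tree generation, using scikit-learn's DecisionTreeClassifier as the CART implementation (an assumption; the post does not name a library). The per-tree feature subset of size d follows the description above:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def grow_tree(X, y, d, rng):
    """Grow one CART tree on a bootstrap sample restricted to d random features."""
    n, D = X.shape
    rows = rng.integers(0, n, size=n)            # step 1: bootstrap the rows
    cols = rng.choice(D, size=d, replace=False)  # random feature subset, d < D
    tree = DecisionTreeClassifier().fit(X[rows][:, cols], y[rows])
    return tree, cols  # keep cols to project features at prediction time

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))                   # toy data: n=200, D=10
y = (X[:, 0] + X[:, 1] > 0).astype(int)
forest = [grow_tree(X, y, d=3, rng=rng) for _ in range(5)]  # K=5 trees
```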
3. Combination of models
Since the resulting K decision trees are mutually independent, each tree is equally important, so no weights need to be considered when combining them (equivalently, all trees share the same weight). For classification problems, the final result is determined by a vote over all the decision trees; for regression problems, the mean of all the trees' outputs is used as the final output.
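A sketch of this unweighted combination, continuing from the forest of (tree, cols) pairs above; integer class labels are assumed:

```python
import numpy as np

def predict_forest(forest, X, task="classification"):
    """Combine the equally weighted trees: majority vote or plain mean."""
    # Each tree predicts on its own feature subset.
    all_preds = np.array([tree.predict(X[:, cols]) for tree, cols in forest])
    if task == "classification":
        # Column-wise majority vote over the K trees (integer labels assumed).
        vote = lambda col: np.bincount(col.astype(int)).argmax()
        return np.apply_along_axis(vote, 0, all_preds)
    # Regression: the mean of all tree outputs.
    return all_preds.mean(axis=0)

# print(predict_forest(forest, X)[:10])  # majority vote of the K=5 trees above
```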
4. Validation of the model
Validating the model requires a validation set, but here there is no need to obtain a separate one: we simply use the samples from the original sample set that were never selected.
When the training sets are drawn from the original sample set, some samples are never selected (the so-called out-of-bag samples), and during feature selection some features may likewise go unused; the final model can be validated with this unused data.
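A sketch of out-of-bag validation under the same assumptions: for each sample, only the trees whose bootstrap set did not contain it get to vote, so no separate validation set is needed. The bootstrap_rows bookkeeping (one saved index array per tree) is hypothetical, not from the original post:

```python
import numpy as np

def oob_accuracy(forest, bootstrap_rows, X, y):
    """Estimate accuracy using, for each sample, only trees that never saw it."""
    correct = counted = 0
    for i in range(len(y)):
        # A tree may vote on sample i only if i was absent from its bootstrap set.
        votes = [tree.predict(X[i:i + 1, cols])[0]
                 for (tree, cols), rows in zip(forest, bootstrap_rows)
                 if i not in rows]
        if votes:
            counted += 1
            correct += np.bincount(np.asarray(votes, dtype=int)).argmax() == y[i]
    return correct / counted  # the out-of-bag (OOB) error is 1 minus this
```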
5. Summary
The main characteristics of random forests are:
1) There are two random processes: the training set for each tree is drawn at random (with replacement) from the original sample set, and the features used to grow each tree are a randomly selected subset of all the features.
2) The decision trees are mutually independent. Because of this independence, the trees can be generated in parallel, which greatly improves the time efficiency of the algorithm; see the sketch below.
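Since the K fits share nothing, they can run on separate cores. A sketch using joblib (an assumption; any process pool would do), reusing grow_tree from the earlier sketch:

```python
import numpy as np
from joblib import Parallel, delayed

def grow_tree_seeded(X, y, d, seed):
    """Give each parallel worker its own independent random stream."""
    return grow_tree(X, y, d, np.random.default_rng(seed))

# Each of the K independent trees is grown on its own core:
# forest = Parallel(n_jobs=-1)(
#     delayed(grow_tree_seeded)(X, y, 3, seed) for seed in range(K)
# )
```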
6. Online random forests (online forests)
What differs from an ordinary random forest is the trees themselves: each tree in an online random forest is an online random decision tree.
I have not studied this clearly enough yet, so I will write about it later!