References:
http://blog.csdn.net/nieson2012/article/details/51279332
http://www.cnblogs.com/wentingtu/archive/2011/12/22/2297405.html
http://www.cnblogs.com/pinard/p/6156009.html
Algorithm Description:
1. Load the data (training data and test data); assume the training set contains n samples in total.
2. Remove useless features from the dataset.
3. Temporarily remove the prediction label from the dataset.
4. Set the number of features to select per tree; for example, if each sample has M features, use only 2 of them at a time.
5. Loop to create each tree:
    Randomly select 2 features from the M features, then add the label column back.
    Draw n samples with replacement from the training set to form a new data subset; each of the n samples now has only 3 columns (the 2 selected features plus the label).
    Create a tree from this data subset (see the Python sketches after this outline):
        To split the data subset:
            First compute the initial Gini impurity of the data subset.
            For each of the 2 selected features:
                For each value taken by that feature:
                    Split the data subset on that feature value.
                    Compute the Gini impurity of the partitions produced by the split.
                    The Gini reduction is the initial Gini impurity minus the weighted Gini impurity of the partitions.
            Record the maximum reduction and the corresponding split feature and feature value.
            If the maximum reduction meets the threshold, split the data subset on that feature and value.
            Recursively apply this step ("create a tree from this data subset") to each partition produced by the split.
        When the recursion terminates, the tree is complete.
    Save every tree; together the trees make up the forest.
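The Gini impurity that drives every split in the outline can be computed directly from the class counts. The sketch below is mine, not from the referenced posts; it assumes a data subset represented as a list of rows with the class label in the last column, and all names are illustrative.

    from collections import Counter

    def gini(rows):
        """Gini impurity of a set of rows: 1 - sum of squared class proportions."""
        if not rows:
            return 0.0
        counts = Counter(row[-1] for row in rows)
        total = len(rows)
        return 1.0 - sum((c / total) ** 2 for c in counts.values())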
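Continuing the same sketch, the split search and the recursive tree construction mirror the outline: try every value of every candidate feature, keep the split with the largest Gini reduction, and recurse only while that reduction meets the threshold. The numeric "<" comparison and the min_gain value are assumptions of mine; the outline fixes neither.

    from collections import Counter

    def split_on(rows, feature, value):
        """Binary split: rows with feature < value go left, the rest go right."""
        left = [r for r in rows if r[feature] < value]
        right = [r for r in rows if r[feature] >= value]
        return left, right

    def best_split(rows, features, min_gain=1e-3):
        """Try every value of every candidate feature; keep the split whose
        weighted Gini gives the largest reduction from the parent's Gini."""
        parent = gini(rows)
        best_gain, best_feature, best_value = 0.0, None, None
        n = len(rows)
        for f in features:
            for value in set(r[f] for r in rows):
                left, right = split_on(rows, f, value)
                if not left or not right:
                    continue  # degenerate split, skip
                weighted = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
                gain = parent - weighted
                if gain > best_gain:
                    best_gain, best_feature, best_value = gain, f, value
        # Split only when the Gini reduction meets the threshold, as the outline requires.
        if best_gain < min_gain:
            return 0.0, None, None
        return best_gain, best_feature, best_value

    def build_tree(rows, features, min_gain=1e-3):
        """Recursively split the data subset; stop when no split reduces the
        Gini impurity enough, and return a leaf holding the majority class."""
        gain, f, value = best_split(rows, features, min_gain)
        if f is None:
            majority = Counter(r[-1] for r in rows).most_common(1)[0][0]
            return {"leaf": majority}
        left, right = split_on(rows, f, value)
        return {"feature": f, "value": value,
                "left": build_tree(left, features, min_gain),
                "right": build_tree(right, features, min_gain)}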
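Finally, the forest loop itself: for each tree, randomly pick 2 of the M features, bootstrap-sample n rows with replacement, and grow a tree on the resulting 3-column view. Prediction by majority vote is standard random-forest practice; the outline stops at saving the trees, so predict and forest_predict are additions of mine.

    import random
    from collections import Counter

    def build_forest(train, n_trees=10, n_features=2):
        """One tree per iteration: n_features random features plus a bootstrap
        sample of n rows, exactly as in the outline above."""
        n = len(train)
        all_features = list(range(len(train[0]) - 1))  # last column is the label
        forest = []
        for _ in range(n_trees):
            features = random.sample(all_features, n_features)   # 2 of the M features
            bootstrap = [random.choice(train) for _ in range(n)]  # n rows, with replacement
            forest.append(build_tree(bootstrap, features))
        return forest

    def predict(tree, row):
        """Walk one tree down to a leaf."""
        while "leaf" not in tree:
            tree = tree["left"] if row[tree["feature"]] < tree["value"] else tree["right"]
        return tree["leaf"]

    def forest_predict(forest, row):
        """Majority vote over all trees."""
        return Counter(predict(t, row) for t in forest).most_common(1)[0][0]

Because every tree sees both a different bootstrap sample and a different random pair of features, the trees are de-correlated, which is what makes averaging their votes effective.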