Randomness in a random forest shows up in two places: 1. the training data for each tree is sampled at random; 2. the attributes considered at each split are chosen at random.
Random forests can handle both classification and regression problems, and generally give good predictive performance on both.
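The two sources of randomness can be sketched in a few lines of Python. This is only an illustration, not Mahout's code: the function names, the toy rows and the seed are made up for the example.

```python
import random

def bootstrap_sample(rows, rng):
    """Randomness of the training data: draw len(rows) rows with replacement."""
    return [rng.choice(rows) for _ in rows]

def random_attributes(n_attributes, m, rng):
    """Randomness of the split: pick m distinct attribute indices to try."""
    return sorted(rng.sample(range(n_attributes), m))

rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical toy data: two numerical attributes and a label.
rows = [(5.1, 3.5, "a"), (4.9, 3.0, "a"), (6.2, 2.9, "b"), (5.9, 3.2, "b")]

sample = bootstrap_sample(rows, rng)                        # data for one tree
candidates = random_attributes(n_attributes=2, m=1, rng=rng)  # attributes for one split

print(len(sample), candidates)
```

Each tree gets its own bootstrap sample, and each split inside a tree gets its own attribute subset, which is what keeps the trees different from one another.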
1. Generating a data description file
mahout describe -p input.csv -f input.info -d 2 I 3 N I 5 N I 3 C L (runs describe to generate the description file for the data)
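The value passed to -d is a compact descriptor: I marks an ignored column, N a numerical one, C a categorical one and L the label, and a number in front of a letter repeats it. The sketch below expands the descriptor from the command so it can be checked against the columns of input.csv. The helper expand_descriptor is a hypothetical name written for this note, not part of Mahout.

```python
def expand_descriptor(descriptor):
    """Expand a compact descriptor such as '2 I 3 N L' into one type per column."""
    columns = []
    repeat = 1
    for token in descriptor.split():
        if token.isdigit():
            # A number applies to the next type letter only.
            repeat = int(token)
        else:
            columns.extend([token.upper()] * repeat)
            repeat = 1
    return columns

columns = expand_descriptor("2 I 3 N I 5 N I 3 C L")
print(len(columns), " ".join(columns))
```

For this descriptor the expansion has 16 entries, so input.csv is expected to have 16 columns, the last one being the label.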
2. Training the model
mahout buildforest -d input.csv -ds input.info -sl 5 -p -t 5 -o forest_result (builds the random forest model and writes it to forest_result)
3. Testing
mahout testforest -i input.csv -ds input.info -m forest_result -a -o predictions
-a: after the command runs, the console shows the analysis results, including information such as the accuracy rate
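As a reminder of what the accuracy figure means, here is a minimal sketch that computes it, together with confusion counts, from a list of true labels and a list of predictions. The labels are invented for the example, and the output does not imitate the format that testforest prints.

```python
from collections import Counter

def accuracy(actual, predicted):
    """Fraction of instances whose predicted label equals the true label."""
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)

def confusion_counts(actual, predicted):
    """Count how often each (true label, predicted label) pair occurs."""
    return Counter(zip(actual, predicted))

# Hypothetical labels, for illustration only.
actual = ["yes", "no", "yes", "yes", "no"]
predicted = ["yes", "no", "no", "yes", "no"]

acc = accuracy(actual, predicted)
counts = confusion_counts(actual, predicted)
print(acc, dict(counts))
```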
Options of buildforest:
-d: path to the data
-ds: path to the dataset description file
-sl: optional, number of variables selected at random at each tree node; for classification problems the default is the square root of the number of variables, for regression problems it is one third of all variables
-nc: optional, the tree is not complemented
-ms: optional, a tree node is not split if the size of the branch data is smaller than this value (default is 2)
-mp: optional, a tree node is not split if the proportion of variance of the branch data is smaller than this value; this value is used for regression problems, and the default is 1/1000 (0.001)
-sd: optional, seed value used to initialize the random number generator
-p: use the partial data implementation
-t: total number of trees to build
-o: output path, which will contain the decision forest model
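The defaults for -sl and the stopping rules behind -ms and -mp can be written out as small functions. This is a sketch under assumptions, not Mahout's implementation: the function names are made up, the square-root rule is used for classification, fractional results are rounded up, and the -mp check compares a node's variance against the variance of the whole training set.

```python
import math

def default_selection(n_variables, regression):
    """Number of variables tried at each node when -sl is not given."""
    if regression:
        return math.ceil(n_variables / 3)      # one third of all variables
    return math.ceil(math.sqrt(n_variables))   # square-root rule (classification)

def should_split(node_size, node_variance, total_variance,
                 minsplit=2, minprop=0.001, regression=True):
    """Decide whether a node may still be split under -ms and -mp."""
    if node_size < minsplit:
        return False  # too few instances left in the branch (-ms)
    if regression and node_variance < minprop * total_variance:
        return False  # variance already negligible (-mp, regression only)
    return True

print(default_selection(12, regression=True))    # 12 / 3 = 4
print(default_selection(16, regression=False))   # sqrt(16) = 4
print(should_split(node_size=10, node_variance=0.5, total_variance=1.0))
```

Raising -ms or -mp makes the trees shallower, which shortens training at the cost of a coarser fit.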