OpenCV Random Forest Parameters
[Original source]: http://blog.csdn.net/sangni007/article/details/7488727
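Inheritance structure in OpenCV 2.3 (the original diagram is not reproduced): CvRTParams is derived from CvDTreeParams, and CvRTrees is derived from CvStatModel.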
API:
CvRTParams
Training parameters for random trees (R.T.). CvRTParams extends CvDTreeParams, the parameter class for a single decision tree, but does not use every CvDTreeParams field. For example, random trees are usually not pruned, so the pruning parameters are ignored.
- max_depth: maximum depth a single tree can reach.
- min_sample_count: minimum number of samples a node must contain to be split further. A node with fewer samples is not split and becomes a leaf.
- regression_accuracy: termination criterion for regression trees. A node is not split further once the values of its training samples are all within regression_accuracy of the node's estimate.
- use_surrogates: whether to build surrogate splits. Usually false; set it to true when the data has missing values or when variable importance must be computed accurately. For example, if a variable is color and some regions of the image are completely black because of the lighting, the color there is effectively missing.
- max_categories: for a categorical variable that takes many values, the values are clustered into at most max_categories groups to keep training fast, and the tree then grows using a suboptimal split. This only applies to classification problems with more than two classes.
- priors: a priori class probabilities. Raising the prior of a class you particularly care about makes training pay more attention to classifying that class correctly. Usually not set.
- calc_var_importance: whether to compute variable importance (retrieved later with get_var_importance()). Usually set to true.
- nactive_vars: number of variables selected at random at each tree node; the best split is searched only among these variables. If set to 0, the square root of the total number of variables is used.
- max_num_of_trees_in_the_forest: maximum number of trees in the forest.
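To make nactive_vars concrete, here is a minimal standard-C++ sketch (not OpenCV's internal code) of the random-subspace step: resolving nactive_vars = 0 to the square root of the variable count, then drawing that many distinct variables for one node. The helpers `effective_active_vars` and `sample_active_vars` are hypothetical; truncating the square root to an integer and clamping the result to the variable count are assumptions of this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <numeric>
#include <random>
#include <utility>
#include <vector>

// Hypothetical helper: nactive_vars == 0 means "use the square root of the
// number of variables". Truncating the root and clamping to [1, total_vars]
// are assumptions of this sketch.
int effective_active_vars(int nactive_vars, int total_vars)
{
    assert(total_vars > 0 && nactive_vars >= 0);
    int k = nactive_vars > 0 ? nactive_vars
                             : (int)std::sqrt((double)total_vars);
    return std::max(1, std::min(k, total_vars));
}

// Hypothetical helper: draw k distinct variable indices uniformly at random
// with a partial Fisher-Yates shuffle. A node then searches for its best
// split only among these variables.
std::vector<int> sample_active_vars(int k, int total_vars, std::mt19937& rng)
{
    assert(k >= 1 && k <= total_vars);
    std::vector<int> idx(total_vars);
    std::iota(idx.begin(), idx.end(), 0);   // 0, 1, ..., total_vars - 1
    for (int i = 0; i < k; ++i)
    {
        std::uniform_int_distribution<int> pick(i, total_vars - 1);
        std::swap(idx[i], idx[pick(rng)]);  // move a random unused index to slot i
    }
    idx.resize(k);
    return idx;
}
```

With 16 variables and nactive_vars = 0, each node in this model considers 4 randomly chosen variables. Choosing a fresh subset at every node is what decorrelates the trees.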
Forest-level termination:
- forest_accuracy: required accuracy of the forest (its out-of-bag error), used as a termination criterion.
- termcrit_type: which termination criterion applies:
  - CV_TERMCRIT_ITER: stop when the number of trees reaches max_num_of_trees_in_the_forest.
  - CV_TERMCRIT_EPS: stop when forest_accuracy is reached.
  - CV_TERMCRIT_ITER | CV_TERMCRIT_EPS: use both criteria.
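The termination settings can be read as a simple stopping rule. The sketch below is a conceptual model of the rule as described above, not OpenCV's training loop. The constants mirror the values of CV_TERMCRIT_ITER (1) and CV_TERMCRIT_EPS (2), and `should_stop` is a hypothetical helper that compares the out-of-bag error with forest_accuracy.

```cpp
#include <cassert>

// Hypothetical constants mirroring CV_TERMCRIT_ITER and CV_TERMCRIT_EPS;
// they are combined with bitwise OR.
const int TERMCRIT_ITER = 1;
const int TERMCRIT_EPS  = 2;

// Hypothetical helper: decide whether to stop adding trees.
// ITER: stop once max_trees trees exist.
// EPS:  stop once the OOB error is at or below forest_accuracy.
// ITER | EPS: stop as soon as either condition holds.
bool should_stop(int termcrit_type, int trees_built, int max_trees,
                 double oob_error, double forest_accuracy)
{
    assert(termcrit_type & (TERMCRIT_ITER | TERMCRIT_EPS));
    bool stop = false;
    if (termcrit_type & TERMCRIT_ITER)
        stop = stop || trees_built >= max_trees;
    if (termcrit_type & TERMCRIT_EPS)
        stop = stop || oob_error <= forest_accuracy;
    return stop;
}
```

In this model, CV_TERMCRIT_ITER alone ignores the error entirely, which matches the example at the end of this post (forest_accuracy = 0, training stops after 100 trees).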
CvRTrees::train
Trains the random trees model. Returns true if training succeeded.
- train_data: training samples stored in a Mat of type CV_32FC1. Each sample is described by a fixed number of variables, and samples are arranged either as rows or as columns.
- tflag: layout of train_data: CV_ROW_SAMPLE (one sample per row) or CV_COL_SAMPLE (one sample per column).
- responses: the output value of each sample, stored as a one-dimensional Mat of type CV_32FC1 or CV_32SC1 that corresponds to train_data. For classification, the responses are class labels; for regression, they are the values of the function being approximated.
- var_idx: the subset of variables to use; empty (NULL) means all variables.
- sample_idx: the subset of samples to use; empty (NULL) means all samples.
- var_type: types of the variables, including the response: CV_VAR_CATEGORICAL for class labels, CV_VAR_ORDERED (CV_VAR_NUMERICAL) for numeric values such as regression targets.
- missing_mask: an 8-bit Mat of the same size as train_data that marks missing values.
- params: training parameters, given as a CvRTParams object.
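The tflag parameter only changes which matrix index is the sample and which is the variable. Here is a minimal sketch of that, assuming a continuous row-major float buffer like a CV_32FC1 cv::Mat. The `Layout` enum and `get_value` helper are hypothetical stand-ins, not OpenCV API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for CV_ROW_SAMPLE / CV_COL_SAMPLE.
enum Layout { ROW_SAMPLE, COL_SAMPLE };

// Read variable `var` of sample `sample` from a row-major float matrix
// that has `cols` columns.
// ROW_SAMPLE: one sample per row, so the matrix is n_samples x n_vars.
// COL_SAMPLE: one sample per column, so the matrix is n_vars x n_samples.
float get_value(const std::vector<float>& data, int cols, Layout tflag,
                int sample, int var)
{
    int row = (tflag == ROW_SAMPLE) ? sample : var;
    int col = (tflag == ROW_SAMPLE) ? var : sample;
    assert(row >= 0 && col >= 0 && col < cols);
    assert((std::size_t)row * cols + col < data.size());
    return data[(std::size_t)row * cols + col];
}
```

The same two samples of three variables can be stored as a 2x3 matrix with ROW_SAMPLE or as its 3x2 transpose with COL_SAMPLE. responses has one entry per sample in either case.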
CvRTrees::train (short version)
Trains the random trees model from a CvMLData object. Returns true if training succeeded.
- data: a CvMLData object. It can read training data from an external .csv file and stores it internally as matrices (values, responses, missing mask).
- params: training parameters, given as a CvRTParams object.
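To show what "values / responses / missing mask" means for a CSV file, here is a minimal standard-C++ sketch of the split. It is not CvMLData's implementation. `CsvData` and `parse_csv` are hypothetical, only numeric cells are handled, and a cell consisting of the missing-value character (default '?') is recorded as missing with value 0.

```cpp
#include <cassert>
#include <cstdlib>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical result type holding the three parts described above.
struct CsvData {
    std::vector<std::vector<float>> values;   // predictors, one row per sample
    std::vector<std::vector<bool>>  missing;  // true where the cell was miss_ch
    std::vector<float>              responses;
};

// Hypothetical parser: column `response_idx` becomes the response and
// every other column becomes a predictor.
CsvData parse_csv(const std::string& text, int response_idx, char miss_ch = '?')
{
    assert(response_idx >= 0);
    CsvData out;
    std::istringstream lines(text);
    std::string line;
    while (std::getline(lines, line)) {
        if (line.empty()) continue;           // skip blank lines
        std::istringstream cells(line);
        std::string cell;
        std::vector<float> row;
        std::vector<bool> miss;
        for (int col = 0; std::getline(cells, cell, ','); ++col) {
            bool is_missing = (cell.size() == 1 && cell[0] == miss_ch);
            float v = is_missing ? 0.0f : std::strtof(cell.c_str(), nullptr);
            if (col == response_idx) {
                out.responses.push_back(v);
            } else {
                row.push_back(v);
                miss.push_back(is_missing);
            }
        }
        out.values.push_back(row);
        out.missing.push_back(miss);
    }
    return out;
}
```

For the input "1,0.5,2\n0,?,3\n" with response column 0, this yields responses {1, 0}, predictor rows {0.5, 2} and {0, 3}, and the first predictor of the second sample is marked missing.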
CvRTrees::predict
Predicts the output (classification or regression) for an input sample. Returns the prediction result as a double.
- sample: the input sample, in the same format as train_data in CvRTrees::train.
- missing_mask: marks missing values in the sample.
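Conceptually, as in Breiman's random forests, the forest's prediction combines the outputs of its trees: a majority vote for classification and the mean for regression. The sketch below shows only that combination step and takes the per-tree outputs as given. `majority_vote` and `mean_prediction` are hypothetical helpers, and the tie-breaking rule is an arbitrary choice of this sketch.

```cpp
#include <cassert>
#include <map>
#include <numeric>
#include <vector>

// Classification: return the label predicted by the most trees.
// Ties go to the smallest label (an arbitrary choice for this sketch).
int majority_vote(const std::vector<int>& tree_labels)
{
    assert(!tree_labels.empty());
    std::map<int, int> votes;                 // label -> number of trees
    for (int label : tree_labels) ++votes[label];
    int best = votes.begin()->first, best_count = 0;
    for (const auto& kv : votes)              // labels visited in ascending order
        if (kv.second > best_count) { best = kv.first; best_count = kv.second; }
    return best;
}

// Regression: return the mean of the trees' outputs.
double mean_prediction(const std::vector<double>& tree_values)
{
    assert(!tree_values.empty());
    return std::accumulate(tree_values.begin(), tree_values.end(), 0.0)
           / tree_values.size();
}
```

Averaging many decorrelated trees is what lowers the variance compared with a single deep tree.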
Example:
```cpp
#include <cv.h>     // OpenCV 2.x core (cv::Mat, cv::sortIdx)
#include <ml.h>     // CvRTrees, CvRTParams, CvMLData
#include <stdio.h>
#include <map>

// Print the training and test errors and, if available, the variable
// importance values sorted from most to least important.
void print_result(float train_err, float test_err,
                  const CvMat* _var_imp)
{
    printf("Train error %f\n", train_err);
    printf("Test error %f\n", test_err);

    if (_var_imp)
    {
        // Wrap the CvMat without copying, then get the indices that
        // sort the importance values in descending order.
        cv::Mat var_imp(_var_imp), sorted_idx;
        cv::sortIdx(var_imp, sorted_idx, CV_SORT_EVERY_ROW +
                    CV_SORT_DESCENDING);

        printf("Variable importance:\n");
        int i, n = (int)var_imp.total();
        int type = var_imp.type();
        CV_Assert(type == CV_32F || type == CV_64F);

        for (i = 0; i < n; i++)
        {
            int k = sorted_idx.at<int>(i);
            printf("%d\t%f\n", k, type == CV_32F ?
                   var_imp.at<float>(k) :
                   var_imp.at<double>(k));
        }
    }
    printf("\n");
}

int main()
{
    const char* filename = "data.csv";  // comma-separated input file
    int response_idx = 0;               // column holding the response

    CvMLData data;
    data.set_miss_ch('?');              // missing-value character; set it before reading
    if (data.read_csv(filename) != 0)   // read data; returns 0 on success
    {
        printf("Cannot read %s\n", filename);
        return -1;
    }
    data.set_response_idx(response_idx);                     // set response index
    data.change_var_type(response_idx, CV_VAR_CATEGORICAL);  // classification: response is a class label

    // Split the samples: half for training, half for testing
    CvTrainTestSplit spl(0.5f);
    data.set_train_test_split(&spl);

    CvRTrees rtrees;
    rtrees.train(&data, CvRTParams(10,     // max_depth
                                   2,      // min_sample_count
                                   0,      // regression_accuracy
                                   false,  // use_surrogates
                                   16,     // max_categories
                                   0,      // priors (not set)
                                   true,   // calc_var_importance
                                   0,      // nactive_vars (0 = sqrt of variable count)
                                   100,    // max_num_of_trees_in_the_forest
                                   0,      // forest_accuracy
                                   CV_TERMCRIT_ITER));  // stop after 100 trees
    print_result(rtrees.calc_error(&data, CV_TRAIN_ERROR),
                 rtrees.calc_error(&data, CV_TEST_ERROR),
                 rtrees.get_var_importance());

    return 0;
}
```
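In print_result, cv::sortIdx with CV_SORT_DESCENDING returns the positions of the importance values from largest to smallest. Here is a standard-library equivalent for a single row of values, offered as a sketch. `sort_idx_descending` is a hypothetical helper, and its stable tie order is a choice of this sketch, not a claim about OpenCV.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

// Hypothetical helper: indices of `importance` ordered from the largest
// value to the smallest; equal values keep their original order.
std::vector<int> sort_idx_descending(const std::vector<float>& importance)
{
    std::vector<int> idx(importance.size());
    std::iota(idx.begin(), idx.end(), 0);     // 0, 1, ..., n - 1
    std::stable_sort(idx.begin(), idx.end(),
                     [&importance](int a, int b) {
                         return importance[a] > importance[b];
                     });
    return idx;
}
```

For importance values {0.1, 0.7, 0.2}, the order is 1, 2, 0: variable 1 is the most important, which is the first line print_result would print.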