Detailed Parameters of the Random Forest Algorithm

This article is not an explanation of random forests (RF); there are plenty of easy-to-understand explanations of RF online, and in textbooks such as Zhou Zhihua's Machine Learning (the "watermelon book") and Li Hang's Statistical Learning Methods. This article only records how to use RandomForestClassifier in sklearn.
First, how to write the code. The constructor signature is:

    class sklearn.ensemble.RandomForestClassifier(n_estimators=10, criterion='gini',
        max_depth=None, min_samples_split=2, min_samples_leaf=1,
        min_weight_fraction_leaf=0.0, max_features='auto', max_leaf_nodes=None,
        bootstrap=True, oob_score=False, n_jobs=1, random_state=None,
        verbose=0, warm_start=False, class_weight=None)
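For reference, a minimal fitting sketch; the iris dataset and the train/test split here are arbitrary choices for illustration:

    from sklearn.datasets import load_iris
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    # Illustrative data: any (X, y) classification dataset works the same way.
    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    clf.fit(X_train, y_train)
    print(clf.score(X_test, y_test))  # mean accuracy on the held-out split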
Second, the parameters. Among them, the parameters that belong to the underlying decision trees:

criterion: "gini" or "entropy" (default="gini"). Whether to use Gini impurity or entropy (information gain) to select the most appropriate split at a node.

splitter: "best" or "random" (default="best"). Chooses either the split with the highest purity or a random split; the default is recommended. (Note: this is a parameter of the underlying DecisionTreeClassifier; RandomForestClassifier itself does not expose it.)

max_features: the number of features considered when looking for the best split; the features examined at a split cannot exceed this value. A sketch follows the list below.

When an integer, it is the maximum number of features; when a float, it is that fraction of the training set's features (max_features * n_features).

If "auto", then max_features=sqrt(n_features).

If "sqrt", then max_features=sqrt(n_features).

If "log2", then max_features=log2(n_features).

If None, then max_features=n_features.
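A sketch of how these settings are passed; the looped-over values are arbitrary, and max_features_ on a fitted tree shows the per-split feature count each setting resolves to:

    from sklearn.datasets import load_iris
    from sklearn.ensemble import RandomForestClassifier

    X, y = load_iris(return_X_y=True)  # 4 features
    for mf in ['sqrt', 'log2', None, 2, 0.5]:
        clf = RandomForestClassifier(n_estimators=10, criterion='entropy',
                                     max_features=mf, random_state=0)
        clf.fit(X, y)
        # max_features_ is the integer number of features each tree uses per split
        print(mf, clf.estimators_[0].max_features_)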

max_depth: (default=None) sets the maximum depth of the trees. With the default None, a tree is grown until every leaf is pure (contains a single class) or contains fewer than min_samples_split samples.

min_samples_split: the minimum number of samples a node must contain before it can be split on an attribute.

min_samples_leaf: the minimum number of samples required at a leaf node.

max_leaf_nodes: (default=None) the maximum number of leaf nodes per tree.

min_weight_fraction_leaf: (default=0) the minimum weighted fraction of the input samples required at a leaf node.

verbose: (default=0) whether progress is printed while fitting.
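A sketch of how the growth-limiting parameters above constrain each tree; the specific limits are arbitrary:

    from sklearn.datasets import load_iris
    from sklearn.ensemble import RandomForestClassifier

    X, y = load_iris(return_X_y=True)
    clf = RandomForestClassifier(n_estimators=100,
                                 max_depth=4,           # no tree deeper than 4 levels
                                 min_samples_split=10,  # need >= 10 samples to split a node
                                 min_samples_leaf=5,    # every leaf keeps >= 5 samples
                                 max_leaf_nodes=16,     # at most 16 leaves per tree
                                 random_state=0)
    clf.fit(X, y)
    print(max(t.get_depth() for t in clf.estimators_))  # verifies the depth cap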
About the random forest-specific parameters:

n_estimators=10: the number of decision trees. More is generally better, but training becomes slower; around 100 trees (I forget where this specific number comes from) is enough to reach acceptable performance and error rate.

bootstrap=True: whether samples are drawn with replacement.

oob_score=False: whether to use OOB (out-of-bag) data, i.e., the samples that the bootstrap did not select when an individual tree was trained. To tune the many individual models, cross validation (CV) could be used, but it is especially time-consuming and not really necessary here; validating the trees on OOB data costs far less and works well.

n_jobs=1: the number of parallel jobs. This is very important in ensemble algorithms, especially bagging (not boosting, where each iteration depends on the previous one, so it is hard to parallelize), because fitting can be parallelized to improve performance. 1 means no parallelism; -1 launches as many jobs as there are CPU cores.
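A sketch combining these parameters: OOB data gives a cheap generalization estimate, and n_jobs=-1 fits the trees on all cores (oob_score=True requires bootstrap=True):

    from sklearn.datasets import load_iris
    from sklearn.ensemble import RandomForestClassifier

    X, y = load_iris(return_X_y=True)
    clf = RandomForestClassifier(n_estimators=200, bootstrap=True,
                                 oob_score=True, n_jobs=-1, random_state=0)
    clf.fit(X, y)
    print(clf.oob_score_)  # accuracy estimated from out-of-bag samples, no CV needed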

warm_start=False: warm start; decides whether to reuse the result of the previous call to fit and add new estimators to it, rather than training a whole new forest.
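A sketch of warm_start growing a forest from 100 to 200 trees; the tree counts are arbitrary:

    from sklearn.datasets import load_iris
    from sklearn.ensemble import RandomForestClassifier

    X, y = load_iris(return_X_y=True)
    clf = RandomForestClassifier(n_estimators=100, warm_start=True, random_state=0)
    clf.fit(X, y)
    clf.set_params(n_estimators=200)  # raise the target ensemble size
    clf.fit(X, y)                     # fits only the 100 new trees, keeps the old ones
    print(len(clf.estimators_))       # 200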

class_weight=None: the weight associated with each class label.
Third, prediction. There are several ways to make predictions:

predict_proba(x): gives the result as probability values. Each sample receives a probability for every class, and these probabilities sum to 1.

predict(x): gives the predicted class directly. Internally it calls predict_proba() and outputs whichever class has the highest probability.

predict_log_proba(x): basically the same as predict_proba, except the result is passed through log().
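A sketch calling all three methods on a fitted forest (the slice X[:3] is arbitrary):

    from sklearn.datasets import load_iris
    from sklearn.ensemble import RandomForestClassifier

    X, y = load_iris(return_X_y=True)
    clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

    proba = clf.predict_proba(X[:3])
    print(proba, proba.sum(axis=1))      # one probability per class; rows sum to 1
    print(clf.predict(X[:3]))            # the class with the highest probability
    print(clf.predict_log_proba(X[:3]))  # log() of predict_proba (log(0) -> -inf)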
