Detailed Parameters of the Random Forest Algorithm

This article is not an explanation of random forests (RF); there are plenty of easy-to-understand explanations of RF online, and in textbooks such as Zhou Zhihua's Machine Learning (the "watermelon book") and Li Hang's Statistical Learning Methods. This article only records how to use RandomForestClassifier in sklearn.
First, how to write the code. The constructor signature is:

    class sklearn.ensemble.RandomForestClassifier(n_estimators=10, criterion='gini',
        max_depth=None, min_samples_split=2, min_samples_leaf=1,
        min_weight_fraction_leaf=0.0, max_features='auto', max_leaf_nodes=None,
        bootstrap=True, oob_score=False, n_jobs=1, random_state=None,
        verbose=0, warm_start=False, class_weight=None)
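For reference, a minimal fitting sketch; the iris dataset and the train/test split here are arbitrary choices for illustration:

    from sklearn.datasets import load_iris
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    # Illustrative data: any (X, y) classification dataset works the same way.
    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    clf.fit(X_train, y_train)
    print(clf.score(X_test, y_test))  # mean accuracy on the held-out split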
Second, the parameters. Among them, the parameters that belong to the underlying decision trees:

criterion: "gini" or "entropy" (default="gini"). Whether to use Gini impurity or entropy (information gain) to select the most appropriate split at a node.

splitter: "best" or "random" (default="best"). Chooses either the split with the highest purity or a random split; the default is recommended. (Note: this is a parameter of the underlying DecisionTreeClassifier; RandomForestClassifier itself does not expose it.)

max_features: the number of features considered when looking for the best split; the features examined at a split cannot exceed this value. A sketch follows the list below.

When an integer, it is the maximum number of features; when a float, it is that fraction of the training set's features (max_features * n_features).

If "auto", then max_features=sqrt(n_features).

If "sqrt", then max_features=sqrt(n_features).

If "log2", then max_features=log2(n_features).

If None, then max_features=n_features.
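A sketch of how these settings are passed; the looped-over values are arbitrary, and max_features_ on a fitted tree shows the per-split feature count each setting resolves to:

    from sklearn.datasets import load_iris
    from sklearn.ensemble import RandomForestClassifier

    X, y = load_iris(return_X_y=True)  # 4 features
    for mf in ['sqrt', 'log2', None, 2, 0.5]:
        clf = RandomForestClassifier(n_estimators=10, criterion='entropy',
                                     max_features=mf, random_state=0)
        clf.fit(X, y)
        # max_features_ is the integer number of features each tree uses per split
        print(mf, clf.estimators_[0].max_features_)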

max_depth: (default=None) sets the maximum depth of the trees. With the default None, a tree is grown until every leaf is pure (contains a single class) or contains fewer than min_samples_split samples.

min_samples_split: the minimum number of samples a node must contain before it can be split on an attribute.

min_samples_leaf: the minimum number of samples required at a leaf node.

max_leaf_nodes: (default=None) the maximum number of leaf nodes per tree.

min_weight_fraction_leaf: (default=0) the minimum weighted fraction of the input samples required at a leaf node.

verbose: (default=0) whether progress is printed while fitting.
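A sketch of how the growth-limiting parameters above constrain each tree; the specific limits are arbitrary:

    from sklearn.datasets import load_iris
    from sklearn.ensemble import RandomForestClassifier

    X, y = load_iris(return_X_y=True)
    clf = RandomForestClassifier(n_estimators=100,
                                 max_depth=4,           # no tree deeper than 4 levels
                                 min_samples_split=10,  # need >= 10 samples to split a node
                                 min_samples_leaf=5,    # every leaf keeps >= 5 samples
                                 max_leaf_nodes=16,     # at most 16 leaves per tree
                                 random_state=0)
    clf.fit(X, y)
    print(max(t.get_depth() for t in clf.estimators_))  # verifies the depth cap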
About the random forest-specific parameters:

n_estimators=10: the number of decision trees. More is generally better, but training becomes slower; around 100 trees (I forget where this specific number comes from) is enough to reach acceptable performance and error rate.

bootstrap=True: whether samples are drawn with replacement.

oob_score=False: whether to use OOB (out-of-bag) data, i.e., the samples that the bootstrap did not select when an individual tree was trained. To tune the many individual models, cross validation (CV) could be used, but it is especially time-consuming and not really necessary here; validating the trees on OOB data costs far less and works well.

n_jobs=1: the number of parallel jobs. This is very important in ensemble algorithms, especially bagging (not boosting, where each iteration depends on the previous one, so it is hard to parallelize), because fitting can be parallelized to improve performance. 1 means no parallelism; -1 launches as many jobs as there are CPU cores.
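A sketch combining these parameters: OOB data gives a cheap generalization estimate, and n_jobs=-1 fits the trees on all cores (oob_score=True requires bootstrap=True):

    from sklearn.datasets import load_iris
    from sklearn.ensemble import RandomForestClassifier

    X, y = load_iris(return_X_y=True)
    clf = RandomForestClassifier(n_estimators=200, bootstrap=True,
                                 oob_score=True, n_jobs=-1, random_state=0)
    clf.fit(X, y)
    print(clf.oob_score_)  # accuracy estimated from out-of-bag samples, no CV needed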

warm_start=False: warm start; decides whether to reuse the result of the previous call to fit and add new estimators to it, rather than training a whole new forest.
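A sketch of warm_start growing a forest from 100 to 200 trees; the tree counts are arbitrary:

    from sklearn.datasets import load_iris
    from sklearn.ensemble import RandomForestClassifier

    X, y = load_iris(return_X_y=True)
    clf = RandomForestClassifier(n_estimators=100, warm_start=True, random_state=0)
    clf.fit(X, y)
    clf.set_params(n_estimators=200)  # raise the target ensemble size
    clf.fit(X, y)                     # fits only the 100 new trees, keeps the old ones
    print(len(clf.estimators_))       # 200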

class_weight=None: the weight associated with each class label.
Third, prediction. There are several ways to make predictions:

predict_proba(x): gives the result as probability values. Each sample receives a probability for every class, and these probabilities sum to 1.

predict(x): gives the predicted class directly. Internally it calls predict_proba() and outputs whichever class has the highest probability.

predict_log_proba(x): basically the same as predict_proba, except the result is passed through log().
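A sketch calling all three methods on a fitted forest (the slice X[:3] is arbitrary):

    from sklearn.datasets import load_iris
    from sklearn.ensemble import RandomForestClassifier

    X, y = load_iris(return_X_y=True)
    clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

    proba = clf.predict_proba(X[:3])
    print(proba, proba.sum(axis=1))      # one probability per class; rows sum to 1
    print(clf.predict(X[:3]))            # the class with the highest probability
    print(clf.predict_log_proba(X[:3]))  # log() of predict_proba (log(0) -> -inf)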
