In simple terms, a random forest is bagging combined with decision trees (usually CART trees): a forest of mutually independent decision trees. Because the trees are independent of one another, each tree carries equal weight in the final model combination, and the final classification result is determined by voting.
The main steps of the random forest algorithm:
1. Selection of the sample sets
Assume the original sample set contains n samples. In each round, n samples are drawn from the original set by bootstrapping (sampling with replacement), giving a training set of size n. Because the sampling is with replacement, some samples may be drawn repeatedly while others are never drawn at all.
After K rounds of sampling, we obtain the training sets T1, T2, ..., TK.
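A minimal sketch of this bootstrap step, assuming NumPy; the names n, K, and the helper bootstrap_sample are illustrative, not from the original post:

```python
import numpy as np

def bootstrap_sample(n, rng):
    """Draw n indices with replacement from range(n): one round of bootstrapping."""
    return rng.integers(0, n, size=n)

rng = np.random.default_rng(0)
n, K = 1000, 10
training_sets = [bootstrap_sample(n, rng) for _ in range(K)]  # T1, ..., TK

# With replacement, some indices repeat and others never appear:
# on average only ~63.2% (1 - 1/e) of the samples occur in each set.
unique_frac = len(np.unique(training_sets[0])) / n
print(f"distinct samples in T1: {unique_frac:.1%}")
```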
2. Generation of the decision trees
If the feature space has D features, then in each round d features (d < D) are randomly selected from the D features to form a new feature subset, and a decision tree is grown using this feature subset.
The K rounds produce K decision trees. Because both the training-set sampling and the feature selection are random, these K decision trees are mutually independent.
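A sketch of one round of tree generation, using scikit-learn's DecisionTreeClassifier as the CART implementation (an assumption; the post does not name a library). The per-tree feature subset of size d follows the description above:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def grow_tree(X, y, d, rng):
    """Grow one CART tree on a bootstrap sample restricted to d random features."""
    n, D = X.shape
    rows = rng.integers(0, n, size=n)            # step 1: bootstrap the rows
    cols = rng.choice(D, size=d, replace=False)  # random feature subset, d < D
    tree = DecisionTreeClassifier().fit(X[rows][:, cols], y[rows])
    return tree, cols  # keep cols to project features at prediction time

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))                   # toy data: n=200, D=10
y = (X[:, 0] + X[:, 1] > 0).astype(int)
forest = [grow_tree(X, y, d=3, rng=rng) for _ in range(5)]  # K=5 trees
```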
3. Combination of models
Since the resulting K decision trees are mutually independent, each tree is equally important, so no weights need to be considered when combining them (equivalently, all trees share the same weight). For classification problems, the final result is determined by a vote over all the decision trees; for regression problems, the mean of all the trees' outputs is used as the final output.
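A sketch of this unweighted combination, continuing from the forest of (tree, cols) pairs above; integer class labels are assumed:

```python
import numpy as np

def predict_forest(forest, X, task="classification"):
    """Combine the equally weighted trees: majority vote or plain mean."""
    # Each tree predicts on its own feature subset.
    all_preds = np.array([tree.predict(X[:, cols]) for tree, cols in forest])
    if task == "classification":
        # Column-wise majority vote over the K trees (integer labels assumed).
        vote = lambda col: np.bincount(col.astype(int)).argmax()
        return np.apply_along_axis(vote, 0, all_preds)
    # Regression: the mean of all tree outputs.
    return all_preds.mean(axis=0)

# print(predict_forest(forest, X)[:10])  # majority vote of the K=5 trees above
```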
4. Validation of the model
Validating the model requires a validation set, but here there is no need to obtain a separate one: we simply use the samples from the original sample set that were never selected.
When the training sets are drawn from the original sample set, some samples are never selected (the so-called out-of-bag samples), and during feature selection some features may likewise go unused; the final model can be validated with this unused data.
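A sketch of out-of-bag validation under the same assumptions: for each sample, only the trees whose bootstrap set did not contain it get to vote, so no separate validation set is needed. The bootstrap_rows bookkeeping (one saved index array per tree) is hypothetical, not from the original post:

```python
import numpy as np

def oob_accuracy(forest, bootstrap_rows, X, y):
    """Estimate accuracy using, for each sample, only trees that never saw it."""
    correct = counted = 0
    for i in range(len(y)):
        # A tree may vote on sample i only if i was absent from its bootstrap set.
        votes = [tree.predict(X[i:i + 1, cols])[0]
                 for (tree, cols), rows in zip(forest, bootstrap_rows)
                 if i not in rows]
        if votes:
            counted += 1
            correct += np.bincount(np.asarray(votes, dtype=int)).argmax() == y[i]
    return correct / counted  # the out-of-bag (OOB) error is 1 minus this
```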
5. Summary
The main characteristics of random forests are:
1) There are two random processes: the training set for each tree is drawn at random (with replacement) from the original sample set, and the features used to grow each tree are a randomly selected subset of all the features.
2) The decision trees are mutually independent. Because of this independence, the trees can be generated in parallel, which greatly improves the time efficiency of the algorithm; see the sketch below.
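Since the K fits share nothing, they can run on separate cores. A sketch using joblib (an assumption; any process pool would do), reusing grow_tree from the earlier sketch:

```python
import numpy as np
from joblib import Parallel, delayed

def grow_tree_seeded(X, y, d, seed):
    """Give each parallel worker its own independent random stream."""
    return grow_tree(X, y, d, np.random.default_rng(seed))

# Each of the K independent trees is grown on its own core:
# forest = Parallel(n_jobs=-1)(
#     delayed(grow_tree_seeded)(X, y, 3, seed) for seed in range(K)
# )
```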
6. Online random forests (online forests)
What differs from an ordinary random forest is the trees themselves: each tree in an online random forest is an online random decision tree.
I have not studied this clearly enough yet, so I will write about it later!