Random Forests

Source: Internet
Author: User
[Basic algorithm] Random forests

August 9, 2011

Random forest (RF), also called random trees [2][3], is an ensemble prediction model composed of multiple decision trees, and it naturally serves as a fast and effective multiclass classifier. Each decision tree in an RF consists of split nodes and leaf nodes: a split node routes the input to its left or right child according to a test on the input value; a leaf node determines the final output of that single tree. In classification problems the leaf stores a probability distribution over the classes (or the most probable class); in regression problems it stores a function value. The output of the whole forest is determined jointly by all of its trees, via argmax over the combined class distributions or by averaging.
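The aggregation step can be sketched as follows (a minimal illustration; the function names are mine, and each tree is assumed to output either a class distribution or a scalar):

```python
import numpy as np

def forest_predict_class(tree_distributions):
    """Classification: average the per-tree class distributions,
    then take the argmax over the averaged distribution."""
    avg = np.mean(tree_distributions, axis=0)
    return int(np.argmax(avg)), avg

def forest_predict_regression(tree_values):
    """Regression: average the per-tree scalar outputs."""
    return float(np.mean(tree_values))
```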

Node Test
A node test is usually very simple, but many simple factors become very powerful in combination; that is the essence of an ensemble prediction model. The node test varies with the application. For example, [1] recognizes human body parts from depth images, and its node test is a depth-comparison test around a pixel x:

f_θ(I, x) = d_I(x + u / d_I(x)) − d_I(x + v / d_I(x))

Simply put, the test compares the depths at pixel x displaced by u and by v, and asks whether their difference exceeds a threshold. The displacements u and v are divided by the depth value at x, which makes the feature invariant to the depth of x, i.e., to the distance between the human body and the camera. At first glance this node test looks meaningless, and in fact a single test carries little information: its classification result may be only slightly better than random. But like the extremely weak Haar features, the key lies in the subsequent boosting or bagging, which combines many weak tests into an effective ensemble.
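A minimal sketch of such a node test, assuming a NumPy depth map indexed as depth[row, col] (function and parameter names here are illustrative, not taken from [1]):

```python
import numpy as np

def depth_feature(depth, x, u, v, tau, big_depth=1e6):
    """Depth-comparison test as described above (illustrative sketch).
    depth: 2-D array of depth values; x = (col, row): the probe pixel;
    u, v: pixel offsets; tau: threshold."""
    d_x = depth[x[1], x[0]]

    def probe(offset):
        # Offsets are divided by the depth at x, making the test
        # invariant to the body-camera distance.
        cx = int(x[0] + offset[0] / d_x)
        cy = int(x[1] + offset[1] / d_x)
        h, w = depth.shape
        if 0 <= cx < w and 0 <= cy < h:
            return depth[cy, cx]
        return big_depth  # probes outside the image read as "very far"

    # True -> branch right, False -> branch left
    return bool(probe(u) - probe(v) > tau)
```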

Training
RF is a type of bagging model, so its general training process is similar to that of bagging: the key is random sampling of the training set to avoid overfitting. Each decision tree in an RF is trained independently, with no interaction between trees. For each tree, a sample subset is first drawn with replacement; in this subset, some samples may appear multiple times while others may not appear at all. A single decision tree is then trained on this subset using the standard greedy decision-tree training algorithm.
A single decision tree is generated as follows:
1) randomly draw a bootstrap subset of the samples;
2) split the current node into left and right children: compare all candidate splits and select the optimal one;
3) repeat (2) until the maximum tree depth is reached, or the classification accuracy at the current node meets the requirement.
This process is greedy.
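The bootstrap step (1) above can be sketched as follows (a hypothetical helper, not code from [1]):

```python
import random

def bootstrap_subset(samples, rng=None):
    """Draw len(samples) samples with replacement, so some samples
    appear several times and others not at all."""
    rng = rng or random
    n = len(samples)
    return [samples[rng.randrange(n)] for _ in range(n)]
```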
Of course, different application scenarios differ in the details of the training process, such as how the sample subset is generated and how the optimal split is defined.
In [1], the actual training samples of a decision tree are the pixels x of depth images, and the test variables are the node tests described above. For an image of fixed size there are only finitely many pixels x, but the candidate offsets (u, v) and depth-difference thresholds are practically infinite. Therefore, before training a single decision tree, [1] randomly generates not only the subset of pixels x, but also the set of candidate offsets (u, v), the candidate depth-difference thresholds, and even the training depth images themselves.
The optimal split is usually defined as the one that maximizes the information gain, defined in [1] as:

G(θ) = H(S) − Σ_{s ∈ {L, R}} (|S_s(θ)| / |S|) H(S_s(θ))

where H denotes Shannon entropy, computed from the label distribution of each split subset.
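This criterion can be sketched in Python (illustrative helper names; labels are assumed to be hashable class ids):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n)
                for c in Counter(labels).values())

def information_gain(parent, left, right):
    """Gain of a candidate split: the parent's entropy minus the
    size-weighted entropies of the left/right subsets."""
    n = len(parent)
    return entropy(parent) - sum(len(s) / n * entropy(s)
                                 for s in (left, right))
```

Training then simply evaluates this gain for every candidate split at a node and keeps the maximizer.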

References:
[1] J. Shotton, A. Fitzgibbon, M. Cook, T. Sharp, M. Finocchio, R. Moore, A. Kipman, and A. Blake. Real-time human pose recognition in parts from a single depth image. In CVPR, 2011.
[2] L. Breiman. Random forests. Machine Learning, 45(1):5-32, 2001.
[3] T. Hastie, R. Tibshirani, and J. H. Friedman. The Elements of Statistical Learning. Springer, ISBN-13 978-0387952840, 2003.
[4] V. Lepetit, P. Lagger, and P. Fua. Randomized trees for real-time keypoint recognition. In Proc. CVPR, pages 2:775-781, 2005.

 

Transferred from http://lincccc.com/?p=47
