Random Forests

Source: Internet
Author: User
[Basic algorithm] Random forests

August 9, 2011

Random forest (RF), also called random trees [2][3], is an ensemble prediction model composed of multiple decision trees, and it naturally serves as a fast and effective multiclass classifier. Each decision tree in an RF consists of split nodes and leaf nodes: a split node routes the input to its left or right child according to a test on the input value; a leaf node determines the final output of that single tree. In classification problems the leaf stores a probability distribution over the classes (or the most probable class); in regression problems it stores a function value. The output of the whole forest is determined jointly by all of its trees, via argmax over the combined class distributions or by averaging.
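The aggregation step can be sketched as follows (a minimal illustration; the function names are mine, and each tree is assumed to output either a class distribution or a scalar):

```python
import numpy as np

def forest_predict_class(tree_distributions):
    """Classification: average the per-tree class distributions,
    then take the argmax over the averaged distribution."""
    avg = np.mean(tree_distributions, axis=0)
    return int(np.argmax(avg)), avg

def forest_predict_regression(tree_values):
    """Regression: average the per-tree scalar outputs."""
    return float(np.mean(tree_values))
```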

Node Test
A node test is usually very simple, but many simple factors become very powerful in combination; that is the essence of an ensemble prediction model. The node test varies with the application. For example, [1] recognizes human body parts from depth images, and its node test is a depth-comparison test around a pixel x:

f_θ(I, x) = d_I(x + u / d_I(x)) − d_I(x + v / d_I(x))

Simply put, the test compares the depths at pixel x displaced by u and by v, and asks whether their difference exceeds a threshold. The displacements u and v are divided by the depth value at x, which makes the feature invariant to the depth of x, i.e., to the distance between the human body and the camera. At first glance this node test looks meaningless, and in fact a single test carries little information: its classification result may be only slightly better than random. But like the extremely weak Haar features, the key lies in the subsequent boosting or bagging, which combines many weak tests into an effective ensemble.
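A minimal sketch of such a node test, assuming a NumPy depth map indexed as depth[row, col] (function and parameter names here are illustrative, not taken from [1]):

```python
import numpy as np

def depth_feature(depth, x, u, v, tau, big_depth=1e6):
    """Depth-comparison test as described above (illustrative sketch).
    depth: 2-D array of depth values; x = (col, row): the probe pixel;
    u, v: pixel offsets; tau: threshold."""
    d_x = depth[x[1], x[0]]

    def probe(offset):
        # Offsets are divided by the depth at x, making the test
        # invariant to the body-camera distance.
        cx = int(x[0] + offset[0] / d_x)
        cy = int(x[1] + offset[1] / d_x)
        h, w = depth.shape
        if 0 <= cx < w and 0 <= cy < h:
            return depth[cy, cx]
        return big_depth  # probes outside the image read as "very far"

    # True -> branch right, False -> branch left
    return bool(probe(u) - probe(v) > tau)
```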

Training
RF is a type of bagging model, so its general training process is similar to that of bagging: the key is random sampling of the training set to avoid overfitting. Each decision tree in an RF is trained independently, with no interaction between trees. For each tree, a sample subset is first drawn with replacement; in this subset, some samples may appear multiple times while others may not appear at all. A single decision tree is then trained on this subset using the standard greedy decision-tree training algorithm.
A single decision tree is generated as follows:
1) randomly draw a bootstrap subset of the samples;
2) split the current node into left and right children: compare all candidate splits and select the optimal one;
3) repeat (2) until the maximum tree depth is reached, or the classification accuracy at the current node meets the requirement.
This process is greedy.
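The bootstrap step (1) above can be sketched as follows (a hypothetical helper, not code from [1]):

```python
import random

def bootstrap_subset(samples, rng=None):
    """Draw len(samples) samples with replacement, so some samples
    appear several times and others not at all."""
    rng = rng or random
    n = len(samples)
    return [samples[rng.randrange(n)] for _ in range(n)]
```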
Of course, different application scenarios differ in the details of the training process, such as how the sample subset is generated and how the optimal split is defined.
In [1], the actual training samples of a decision tree are the pixels x of depth images, and the test variables are the node tests described above. For an image of fixed size there are only finitely many pixels x, but the candidate offsets (u, v) and depth-difference thresholds are practically infinite. Therefore, before training a single decision tree, [1] randomly generates not only the subset of pixels x, but also the set of candidate offsets (u, v), the candidate depth-difference thresholds, and even the training depth images themselves.
The optimal split is usually defined as the one that maximizes the information gain, defined in [1] as:

G(θ) = H(S) − Σ_{s ∈ {L, R}} (|S_s(θ)| / |S|) H(S_s(θ))

where H denotes Shannon entropy, computed from the label distribution of each split subset.
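This criterion can be sketched in Python (illustrative helper names; labels are assumed to be hashable class ids):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n)
                for c in Counter(labels).values())

def information_gain(parent, left, right):
    """Gain of a candidate split: the parent's entropy minus the
    size-weighted entropies of the left/right subsets."""
    n = len(parent)
    return entropy(parent) - sum(len(s) / n * entropy(s)
                                 for s in (left, right))
```

Training then simply evaluates this gain for every candidate split at a node and keeps the maximizer.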

References:
[1] J. Shotton, A. Fitzgibbon, M. Cook, T. Sharp, M. Finocchio, R. Moore, A. Kipman, and A. Blake. Real-time human pose recognition in parts from a single depth image. In CVPR, 2011.
[2] L. Breiman. Random forests. Machine Learning, 45(1):5-32, 2001.
[3] T. Hastie, R. Tibshirani, and J. H. Friedman. The Elements of Statistical Learning. Springer, ISBN-13 978-0387952840, 2003.
[4] V. Lepetit, P. Lagger, and P. Fua. Randomized trees for real-time keypoint recognition. In Proc. CVPR, pages 2:775-781, 2005.

 

Transferred from http://lincccc.com/?p=47
