Random Forests


[Basic Algorithm] Random Forests

August 9, 2011

Random Forests, also called random trees [2][3], is an ensemble prediction model composed of multiple decision trees, and it can serve as a fast and effective multi-class classification model. Each decision tree in an RF consists of a number of split nodes and leaf nodes: a split node directs an input to the left or right branch according to the outcome of a test on the input, while a leaf node determines the final output of that single tree, which in classification problems is a probability distribution over classes (or simply the most probable class) and in regression problems is an estimate of the target function. The output of the entire RF is aggregated from its decision trees, by argmax for classification or by averaging for regression.
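As a rough illustration of this aggregation step, here is a minimal Python sketch; the `trees` list and its per-tree distribution interface are assumptions of the sketch, not something specified by [1]-[3]:

```python
import numpy as np

def rf_predict(trees, x):
    """Aggregate the outputs of individually trained decision trees.

    `trees` is assumed to be a list of callables mapping an input x to
    a class-probability distribution (a 1-D numpy array); this is an
    illustrative interface, not code from the cited papers.
    """
    # Average the per-tree class distributions (the "Avg" rule) ...
    avg_dist = np.mean([tree(x) for tree in trees], axis=0)
    # ... then return the most probable class (the "argmax" rule).
    return int(np.argmax(avg_dist))
```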

Node Test
Node tests are usually very simple, yet many simple tests combined can be surprisingly powerful, and that is exactly the character of ensemble prediction models. The node test varies from application to application. For example, [1] recognizes human body parts from depth maps, and the node test it uses is a depth-comparison test centered at a pixel x:

$$ f_\theta(I, \mathbf{x}) = d_I\!\left(\mathbf{x} + \frac{\mathbf{u}}{d_I(\mathbf{x})}\right) - d_I\!\left(\mathbf{x} + \frac{\mathbf{v}}{d_I(\mathbf{x})}\right) > \tau $$
Put simply, the test checks whether the difference between the depths at two displacements u and v from pixel x exceeds a threshold τ, where d_I denotes the depth map of image I. The displacements are divided by the depth at x itself, which makes the depth difference independent of the depth of x, i.e., of how far the body is from the camera. This node test looks meaningless at first glance, and indeed a single test on its own performs only slightly better than random classification. But just as the Haar feature is a very weak feature on its own, the key to making it work is the subsequent boosting or bagging, which combines many weak tests into an effective ensemble.
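To make the test concrete, here is a minimal Python sketch of such a depth-comparison feature; the function name, the (row, col) pixel convention, and the large-constant handling of out-of-image probes are assumptions of this sketch rather than the exact implementation of [1]:

```python
import numpy as np

def depth_feature(depth, x, u, v, tau, background=1e6):
    """Depth-comparison node test in the style of [1] (a sketch).

    `depth` is a 2-D depth map, `x` a (row, col) pixel, `u`/`v` pixel
    offsets, and `tau` the split threshold.
    """
    d_x = depth[x]
    # Normalize the offsets by the depth at x so the feature is
    # invariant to how far the body is from the camera.
    def probe(offset):
        r = int(x[0] + offset[0] / d_x)
        c = int(x[1] + offset[1] / d_x)
        if 0 <= r < depth.shape[0] and 0 <= c < depth.shape[1]:
            return depth[r, c]
        return background  # out-of-image probes read as far background
    # The test: depth difference at the two displaced probes vs. tau.
    return probe(u) - probe(v) > tau
```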

Training
RF belongs to the bagging family of models, so its general training process is similar to bagging; the key point is the random selection of samples, which helps the model avoid overfitting. Each decision tree in an RF is trained separately and independently of the others. For each decision tree, a subset of the samples is drawn before training; because the drawing is done with replacement, some samples may appear multiple times in this subset while others may not appear at all. A conventional decision-tree training algorithm is then applied to the sample subset of each single tree.
The creation of a single decision tree roughly follows this process:
1) randomly generate a subset of the samples;
2) divide the current node into left and right children: compare all the candidate splits and select the best one;
3) repeat 2) until the maximum node depth is reached, or the classification accuracy at the current node is satisfactory.
This process is greedy; a minimal sketch of it follows below.
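The sketch below illustrates steps 1)-3) in Python; the `splits` list of boolean test functions and the `gain` scoring function are assumed interfaces of this sketch, not code from the cited papers:

```python
import numpy as np

def bootstrap_subset(n, rng):
    """Draw n sample indices with replacement: some samples appear
    several times, others not at all (standard bagging; a sketch)."""
    return rng.integers(0, n, size=n)

def grow_tree(X, y, splits, gain, depth=0, max_depth=10):
    """Greedy top-down tree growth (illustrative sketch)."""
    # Stop on pure nodes or at the maximum depth; a leaf stores the
    # empirical class distribution of the samples reaching it.
    if depth >= max_depth or len(np.unique(y)) == 1:
        return ('leaf', np.bincount(y, minlength=2) / len(y))
    # Compare all candidate splits and keep the best one (greedy step).
    best = max(splits, key=lambda s: gain(s, X, y))
    mask = np.array([best(x) for x in X])
    if mask.all() or not mask.any():  # degenerate split: make a leaf
        return ('leaf', np.bincount(y, minlength=2) / len(y))
    return ('split', best,
            grow_tree(X[~mask], y[~mask], splits, gain, depth + 1, max_depth),
            grow_tree(X[mask], y[mask], splits, gain, depth + 1, max_depth))
```

Because each tree sees its own bootstrap subset and only a limited pool of candidate splits at each node, the trees end up decorrelated, which is what the bagging aggregation relies on [2].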
Of course, for different applications, there will be differences in the details of the training process, such as the generation of sample subsets and the definition of optimal segmentation.
In [1], the training samples of a decision tree are actually the pixels x of the images, and the feature values are given by the node tests described above. However, even for a fixed-size image the number of candidate pixels x is very large, and the candidate displacements (u, v) and depth-difference thresholds are practically infinite. Therefore, when training a single decision tree, [1] forms the sample subset by randomly sampling the set of pixels x, randomly generating combinations of displacements (u, v) and depth-difference thresholds, and randomly sampling the set of training depth maps itself.
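This randomized proposal step might look like the following sketch; the offset and threshold ranges and the number of candidates are illustrative assumptions, not the parameters used in [1]:

```python
import numpy as np

def random_candidates(n, rng, max_offset=100.0, max_tau=0.5):
    """Randomly propose n (u, v, tau) feature candidates for one tree.

    The offset and threshold ranges here are illustrative assumptions;
    the paper's actual parameter ranges are not reproduced.
    """
    u = rng.uniform(-max_offset, max_offset, size=(n, 2))  # pixel offsets
    v = rng.uniform(-max_offset, max_offset, size=(n, 2))
    tau = rng.uniform(-max_tau, max_tau, size=n)           # depth thresholds
    return list(zip(u, v, tau))

# e.g. candidates = random_candidates(2000, np.random.default_rng(0))
```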
The optimal split is typically defined as the one that maximizes the information gain, such as the definition in [1]:
$$ G(\phi) = H(Q) - \sum_{s \in \{l, r\}} \frac{|Q_s(\phi)|}{|Q|}\, H(Q_s(\phi)) $$
Here Q is the set of samples at the node, Q_l(φ) and Q_r(φ) are the subsets sent to the left and right children by split φ, and H denotes the Shannon entropy, computed from the distribution of body-part labels within the corresponding subset.
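Evaluated directly from this definition, the gain computation is short; here is a hedged Python sketch (the boolean `test` interface is an assumption of the sketch):

```python
import numpy as np

def entropy(y):
    """Shannon entropy of the empirical label distribution of y."""
    p = np.bincount(y) / len(y)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def information_gain(test, X, y):
    """Information gain G(phi) of splitting (X, y) with a boolean
    `test`, matching the formula above (illustrative sketch)."""
    mask = np.array([test(x) for x in X])
    if mask.all() or not mask.any():
        return 0.0  # degenerate split: one side would be empty
    n = len(y)
    return (entropy(y)
            - (mask.sum() / n) * entropy(y[mask])
            - ((~mask).sum() / n) * entropy(y[~mask]))
```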

References:
[1] J. Shotton, A. Fitzgibbon, M. Cook, T. Sharp, M. Finocchio, R. Moore, A. Kipman, and A. Blake. Real-Time Human Pose Recognition in Parts from Single Depth Images. In Proc. CVPR, 2011.
[2] L. Breiman. Random Forests. Machine Learning, 45(1):5–32, 2001.
[3] T. Hastie, R. Tibshirani, and J. H. Friedman. The Elements of Statistical Learning. Springer, 2003. ISBN-13 978-0387952840.
[4] V. Lepetit, P. Lagger, and P. Fua. Randomized Trees for Real-Time Keypoint Recognition. In Proc. CVPR, pages 2:775–781, 2005.

Reposted from http://lincccc.com/?p=47

From: http://blog.csdn.net/yangtrees/article/details/7488937
