Machine Learning Techniques: Random Forest


Course address: https://class.coursera.org/ntumltwo-002/lecture


I. Random Forest (RF)

1. RF Introduction

    • RF combines many CART trees by bagging. Computational cost aside, more trees is usually better.
    • The CART trees used in RF are not pruned, so each tree generally has large variance. The averaging effect of bagging reduces this variance.
    • When training each CART tree, the randomness and diversity of the trees g_t can be increased by using randomly sampled examples (bootstrapping), randomly sampled features, or even by projecting the features through a mapping matrix P into a random subspace.
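
The bullets above can be sketched in code. The following is a minimal illustration, not the course's reference implementation: it grows unpruned CART-style trees (Gini impurity) on bootstrap samples and considers a random subset of features at each split. The names grow_tree, train_forest, predict_forest and make_toy_data, the parameter n_features, and the toy data are all assumptions made for this example. The random-subspace projection with P is not shown.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def grow_tree(X, y, n_features, rng):
    """Grow an unpruned tree; every split looks at a random subset of features.
    Internal nodes are tuples, leaves are plain labels (labels must not be tuples)."""
    if len(set(y)) == 1:
        return y[0]  # pure leaf
    best = None
    for j in rng.sample(range(len(X[0])), n_features):
        # Dropping the largest value keeps both sides of every split non-empty.
        for threshold in sorted({row[j] for row in X})[:-1]:
            left = [i for i, row in enumerate(X) if row[j] <= threshold]
            right = [i for i, row in enumerate(X) if row[j] > threshold]
            score = (len(left) * gini([y[i] for i in left])
                     + len(right) * gini([y[i] for i in right]))
            if best is None or score < best[0]:
                best = (score, j, threshold, left, right)
    if best is None:  # sampled features are constant here: fall back to a majority leaf
        return Counter(y).most_common(1)[0][0]
    _, j, threshold, left, right = best
    return (j, threshold,
            grow_tree([X[i] for i in left], [y[i] for i in left], n_features, rng),
            grow_tree([X[i] for i in right], [y[i] for i in right], n_features, rng))

def predict_tree(tree, x):
    """Walk from the root to a leaf and return the leaf label."""
    while isinstance(tree, tuple):
        j, threshold, left, right = tree
        tree = left if x[j] <= threshold else right
    return tree

def train_forest(X, y, n_trees=25, n_features=1, seed=0):
    """Bagging: each tree is trained on N examples drawn with replacement."""
    rng = random.Random(seed)
    n = len(X)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        forest.append(grow_tree([X[i] for i in idx], [y[i] for i in idx], n_features, rng))
    return forest

def predict_forest(forest, x):
    """G(x): majority vote over the trees g_t."""
    return Counter(predict_tree(t, x) for t in forest).most_common(1)[0][0]

def make_toy_data(n, seed=0):
    """Hypothetical toy data: label is 1 when the two features sum to more than 1."""
    rng = random.Random(seed)
    X = [[rng.random(), rng.random()] for _ in range(n)]
    return X, [int(a + b > 1.0) for a, b in X]

if __name__ == "__main__":
    X, y = make_toy_data(60, seed=1)
    forest = train_forest(X, y)
    accuracy = sum(predict_forest(forest, x) == t for x, t in zip(X, y)) / len(X)
    print("training accuracy:", accuracy)
```

Training accuracy is printed only as a sanity check; it says little about generalization.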

2. RF Algorithm Structure and Benefits

II. OOB (Out-of-Bag) and Automatic Validation

1. The bootstrap sampling used in RF means that some samples are never drawn when training a given tree. These unused samples are called OOB (out-of-bag) samples for that tree.

When the bootstrap sample has the same size N as the sample set, the probability that a given sample is never drawn is (1 - 1/N)^N, which approaches 1/e ≈ 0.368 as N grows. For a large sample set, the OOB set of each tree is therefore roughly one third of the sample set.
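
The one-third figure can be checked numerically. The sketch below is only an illustration; the function name oob_fraction and the chosen values of N are assumptions for this example. It draws N indices with replacement and compares the share of indices never drawn with (1 - 1/N)^N and with 1/e.

```python
import math
import random

def oob_fraction(n, seed=0):
    """Draw n indices with replacement from range(n); return the share never drawn."""
    rng = random.Random(seed)
    drawn = {rng.randrange(n) for _ in range(n)}
    return 1 - len(drawn) / n

if __name__ == "__main__":
    for n in (10, 100, 10_000):
        exact = (1 - 1 / n) ** n  # probability that one fixed sample is never drawn
        print(n, round(oob_fraction(n), 3), round(exact, 3), round(1 / math.e, 3))
```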

2. RF Validation

RF is not concerned with the classification performance of each individual tree, so it does not use the OOB data to validate each g_t. Instead, it uses the OOB data to validate the combined model G.

To ensure that the validation data is never "peeked at" during training, the G used to validate a given sample is restricted: it combines only those g_t for which that sample is OOB, that is, the trees that never saw the sample during training.

Finally, the OOB validation errors of all samples are averaged to give E_oob. "In practice, E_oob is usually very accurate," Lin said.
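
A minimal sketch of this self-validation follows. To keep it short, each g_t here is a one-split decision stump standing in for a full CART tree, which is a simplification and not what RF actually uses. The names fit_stump, oob_error and make_toy_data and the toy data are assumptions for this example.

```python
import random
from collections import Counter

def fit_stump(X, y):
    """Stand-in for a CART tree: the single (feature, threshold) split with fewest errors."""
    best = None
    for j in range(len(X[0])):
        for threshold in sorted({row[j] for row in X}):
            left = [y[i] for i, row in enumerate(X) if row[j] <= threshold]
            right = [y[i] for i, row in enumerate(X) if row[j] > threshold]
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0] if right else left_label
            errors = (sum(v != left_label for v in left)
                      + sum(v != right_label for v in right))
            if best is None or errors < best[0]:
                best = (errors, j, threshold, left_label, right_label)
    _, j, threshold, left_label, right_label = best
    return lambda x: left_label if x[j] <= threshold else right_label

def oob_error(X, y, n_trees=50, seed=0):
    """E_oob: each example is judged only by trees whose bootstrap sample missed it."""
    rng = random.Random(seed)
    n = len(X)
    trees, bags = [], []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample for this tree
        trees.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
        bags.append(set(idx))
    mistakes, counted = 0, 0
    for i in range(n):
        # Vote only among the trees for which example i is out-of-bag.
        votes = Counter(g(X[i]) for g, bag in zip(trees, bags) if i not in bag)
        if not votes:
            continue  # example i was in every bag, so it has no OOB vote
        counted += 1
        mistakes += int(votes.most_common(1)[0][0] != y[i])
    return mistakes / counted if counted else float("nan")

def make_toy_data(n, seed=0):
    """Hypothetical toy data: the label depends on the first feature only."""
    rng = random.Random(seed)
    X = [[rng.random(), rng.random()] for _ in range(n)]
    return X, [int(row[0] > 0.5) for row in X]

if __name__ == "__main__":
    X, y = make_toy_data(80, seed=2)
    print("E_oob:", oob_error(X, y))
```

No separate validation set is held out: every example serves as validation data for the trees that did not train on it.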

III. Feature Selection and Permutation Test

    • In practice, when the samples have very many features, it is often desirable to remove redundant or irrelevant features and keep the relatively important ones.
    • In a linear model, the importance of feature i can be measured by |w_i|. In a nonlinear model, feature importance is generally harder to measure.
    • RF borrows a tool from statistics, the permutation test, to measure feature importance.
    • Given N samples with d dimensions each, the importance of feature i is measured by shuffling the values of feature i across the N samples. The importance of the feature is the difference between the error after shuffling and the error before shuffling.
    • RF usually does not apply the permutation test during training, since that would require retraining. Instead, it shuffles the feature values of the OOB samples during validation and re-evaluates to obtain the importance of the feature.
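
The shuffle-and-compare step can be sketched as follows. This is a hedged illustration only: it permutes a feature on a plain validation set and uses a fixed stand-in model instead of a trained forest, whereas RF as described above would permute the values seen by the OOB samples. The names error_rate, permutation_importance, stand_in_model and make_toy_data are assumptions for this example.

```python
import random

def error_rate(predict, X, y):
    """Share of examples that the model labels incorrectly."""
    return sum(predict(x) != t for x, t in zip(X, y)) / len(X)

def permutation_importance(predict, X, y, feature, seed=0):
    """Error after shuffling one feature column minus the error before shuffling."""
    rng = random.Random(seed)
    column = [row[feature] for row in X]
    rng.shuffle(column)  # breaks the link between this feature and the labels
    X_perm = [row[:feature] + [v] + row[feature + 1:] for row, v in zip(X, column)]
    return error_rate(predict, X_perm, y) - error_rate(predict, X, y)

def stand_in_model(x):
    """Hypothetical fixed model that only looks at feature 0."""
    return int(x[0] > 0.5)

def make_toy_data(n, seed=0):
    """Hypothetical toy data: feature 0 determines the label, feature 1 is noise."""
    rng = random.Random(seed)
    X = [[rng.random(), rng.random()] for _ in range(n)]
    return X, [int(row[0] > 0.5) for row in X]

if __name__ == "__main__":
    X, y = make_toy_data(200, seed=3)
    for i in range(2):
        print("feature", i, "importance:", permutation_importance(stand_in_model, X, y, i))
```

A feature the model ignores gets importance zero, because shuffling it cannot change any prediction.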

IV. Application of RF

    • On a simple data set, the RF decision boundary is smoother and its margin is larger than that of a single CART tree.
    • On complex, noisy data sets, a single decision tree often performs poorly. RF suppresses noise well and performs considerably better by comparison.
    • How many trees should RF use? In general, the more the better. In practice, enough trees are needed for G to be stable, so the stability of G can be used to judge whether the number of trees is sufficient.
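
One possible way to watch the stability of G, offered as an assumption rather than the course's procedure, is to track how many majority votes change each time a tree is added. The function name vote_flip_curve and the input layout tree_votes[t][n] are hypothetical.

```python
from collections import Counter

def vote_flip_curve(tree_votes):
    """tree_votes[t][n] is the label that tree t assigns to example n.
    For each forest size T >= 2, return the share of examples whose majority
    vote differs between the first T-1 trees and the first T trees."""
    n = len(tree_votes[0])
    counters = [Counter() for _ in range(n)]
    previous, curve = None, []
    for votes in tree_votes:
        for counter, label in zip(counters, votes):
            counter[label] += 1
        # Ties go to the label that was seen first.
        current = [counter.most_common(1)[0][0] for counter in counters]
        if previous is not None:
            curve.append(sum(a != b for a, b in zip(previous, current)) / n)
        previous = current
    return curve

if __name__ == "__main__":
    # Three hypothetical trees voting on one example: the vote flips at the third tree.
    print(vote_flip_curve([[0], [1], [1]]))
```

When the curve stays near zero over many added trees, further trees are changing G very little.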
