Machine learning: Using random forests to select features

Source: Internet
Author: User

Introduction

An earlier discussion of how a decision tree chooses the best feature on which to partition a data set mentioned that the same idea can be used for feature selection. I later read the related introduction on Breiman's home page. As you would expect from the person who proposed the random forest algorithm, it is authoritative and very clearly explained. The URL is:

http://www.stat.berkeley.edu/~breiman/RandomForests/cc_home.htm

Feature importance

The importance of a feature X in a random forest is calculated as follows:

First, for each decision tree in the random forest, use its out-of-bag (OOB) data, that is, the samples left out of the bootstrap sample used to train that tree, to compute the out-of-bag error, recorded as errOOB1. Each decision tree yields one errOOB1, so a forest of ktree decision trees yields ktree values of errOOB1.
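The first step can be sketched in plain Python. This is only an illustration under stated assumptions: `bootstrap_split`, `oob_error`, and `toy_tree` are hypothetical names, the data is made up, and `toy_tree` is a hand-written rule standing in for a decision tree trained on the in-bag rows.

```python
import random

def bootstrap_split(n_samples, rng):
    """Draw a bootstrap sample; return (in_bag, oob) lists of row indices."""
    in_bag = [rng.randrange(n_samples) for _ in range(n_samples)]
    drawn = set(in_bag)
    # Rows never drawn into the bootstrap sample are this tree's OOB data.
    oob = [i for i in range(n_samples) if i not in drawn]
    return in_bag, oob

def oob_error(predict, X, y, oob):
    """Fraction of out-of-bag rows that `predict` misclassifies."""
    if not oob:
        raise ValueError("no out-of-bag samples for this tree")
    wrong = sum(1 for i in oob if predict(X[i]) != y[i])
    return wrong / len(oob)

# Made-up data: the label depends only on feature 0.
X = [[i / 20, (i * 7 % 20) / 20] for i in range(20)]
y = [1 if row[0] >= 0.5 else 0 for row in X]

rng = random.Random(0)
in_bag, oob = bootstrap_split(len(X), rng)

def toy_tree(row):
    """Hypothetical stand-in for a tree trained on the in-bag rows."""
    return 1 if row[0] >= 0.5 else 0

err_oob1 = oob_error(toy_tree, X, y, oob)  # errOOB1 for this one tree
```

In a real forest this is repeated for every tree, each with its own bootstrap sample, which gives one errOOB1 per tree.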

Then, iterate over all features and assess the importance of each one. To assess feature X, randomly add noise to the values of X in all OOB samples (this can be understood as randomly changing each sample's value of X; Breiman's description randomly permutes the values among the OOB samples), and compute the out-of-bag error again, recorded as errOOB2. Each decision tree again yields one errOOB2, so ktree decision trees yield ktree values of errOOB2.
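A minimal sketch of this step for a single tree, assuming the noise is added by shuffling the feature's values among the OOB samples. The names `permute_feature`, `error_rate`, and `toy_tree` are hypothetical, and the OOB rows are made up for illustration.

```python
import random

def permute_feature(rows, j, rng):
    """Return a copy of `rows` with column j shuffled across the rows."""
    column = [row[j] for row in rows]
    rng.shuffle(column)
    return [row[:j] + [value] + row[j + 1:] for row, value in zip(rows, column)]

def error_rate(predict, rows, labels):
    """Fraction of rows that `predict` misclassifies."""
    wrong = sum(1 for row, label in zip(rows, labels) if predict(row) != label)
    return wrong / len(rows)

# Made-up OOB rows for one tree; the label depends only on feature 0.
X_oob = [[0.1, 5.0], [0.2, 3.0], [0.3, 9.0], [0.7, 4.0], [0.8, 1.0], [0.9, 7.0]]
y_oob = [0, 0, 0, 1, 1, 1]

def toy_tree(row):
    """Hypothetical stand-in for a trained tree that splits on feature 0."""
    return 1 if row[0] > 0.5 else 0

rng = random.Random(42)
err_oob1 = error_rate(toy_tree, X_oob, y_oob)
# errOOB2 when feature 0 is shuffled, and when feature 1 is shuffled.
err_oob2_f0 = error_rate(toy_tree, permute_feature(X_oob, 0, rng), y_oob)
err_oob2_f1 = error_rate(toy_tree, permute_feature(X_oob, 1, rng), y_oob)
```

Because `toy_tree` never looks at feature 1, shuffling that column cannot change its predictions, so `err_oob2_f1` equals `err_oob1`. Shuffling feature 0 breaks the link between the feature and the label, so `err_oob2_f0` will typically be higher.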

The increase from errOOB1 to errOOB2 can serve as a measure of a feature's importance for the following reason: if randomly adding noise to a feature significantly reduces the out-of-bag accuracy, then that feature has a large influence on how the samples are classified. In other words, it is highly important.

So the importance of feature X = ∑(errOOB2 - errOOB1) / ktree, where the sum runs over all ktree decision trees.
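As a worked example of this formula, the sketch below averages the per-tree error increase for a forest of ktree = 4 trees. The function name `feature_importance` is hypothetical and the error values are invented purely for illustration.

```python
def feature_importance(err_oob1, err_oob2):
    """Mean increase in OOB error over all trees, for one feature."""
    if len(err_oob1) != len(err_oob2) or not err_oob1:
        raise ValueError("need one errOOB1 and one errOOB2 per tree")
    ktree = len(err_oob1)
    return sum(e2 - e1 for e1, e2 in zip(err_oob1, err_oob2)) / ktree

# Invented per-tree errors for a forest of 4 trees.
err_oob1 = [0.10, 0.20, 0.15, 0.15]
err_oob2_informative = [0.40, 0.45, 0.35, 0.50]  # a feature the trees rely on
err_oob2_noise = [0.10, 0.20, 0.15, 0.15]        # a feature the trees ignore

importance_informative = feature_importance(err_oob1, err_oob2_informative)
importance_noise = feature_importance(err_oob1, err_oob2_noise)
```

For the first feature the per-tree increases are 0.30, 0.25, 0.20, and 0.35, so its importance is about 1.10 / 4 = 0.275. For the second feature the error does not change in any tree, so its importance is 0. Ranking features by this value and keeping the highest-scoring ones is one way to use the measure for feature selection.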
