Machine learning: random forest (RandomForest), the principle of the algorithm and its implementation in Python

Source: Internet
Author: User

Reference:

http://blog.csdn.net/nieson2012/article/details/51279332

http://www.cnblogs.com/wentingtu/archive/2011/12/22/2297405.html

http://www.cnblogs.com/pinard/p/6156009.html

Algorithm Description:

1. Load the data (training data and test data). Assume the training set contains n samples in total.

2. Remove the features that are not useful from the data set.

3. Temporarily remove the label (the value to be predicted) from the data set.

4. Set the number of features each tree uses: for example, if each sample has m features, we use only 2 of them each time.

5. Loop to create each tree:

   Randomly select 2 of the m features and add the label back in.

   Draw n samples from the training set with replacement (a bootstrap sample) to form a new data subset. Each of these n samples has only 3 columns: the 2 selected features plus the label.

   Build a tree from this data subset:

      To split a data subset:

         First, compute the initial Gini impurity (also called the Gini index) of the data subset.

         For each of the 2 selected features:

            For each value of that feature:

               Split the data set on that feature value.

               Compute the Gini impurity of the split, i.e. the average of the two parts' Gini impurities weighted by their sizes.

               Compute the Gini reduction: the initial Gini impurity minus the Gini impurity of the split.

               Keep track of the largest reduction and the feature and feature value that produced it.

         With the largest Gini reduction and its feature and feature value in hand, split the data subset on that feature and value if the reduction reaches the threshold (otherwise the subset becomes a leaf).

         Recursively apply this tree-building step to each part produced by the split.

      When the recursion finishes, the tree is complete.

6. Save each tree; together the trees make up the forest.
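The data-preparation steps and the per-tree sampling (drop useless features, set the label aside, choose a random subset of features, draw n rows with replacement and re-attach the label) can be sketched in Python as below. This is a minimal sketch, not the original author's code: it assumes the data is already in memory as a list of rows (file loading is left out), and the function names `prepare` and `sample_subset`, the toy rows, and the treatment of an ID column as the "useless" feature are all hypothetical.

```python
import random

def prepare(rows, label_col, useless_cols):
    """Drop the useless feature columns and set the label column aside."""
    keep = [c for c in range(len(rows[0]))
            if c != label_col and c not in useless_cols]
    features = [[row[c] for c in keep] for row in rows]
    labels = [row[label_col] for row in rows]
    return features, labels

def sample_subset(features, labels, k, rng):
    """Pick k of the m features at random, then draw n rows with replacement
    (n = number of training samples) and append the label to each row."""
    n, m = len(features), len(features[0])
    cols = rng.sample(range(m), k)  # k distinct feature indices
    subset = []
    for _ in range(n):
        i = rng.randrange(n)  # bootstrap draw: the same row may appear twice
        subset.append([features[i][c] for c in cols] + [labels[i]])
    return cols, subset

# Usage on toy rows (values are illustrative): column 3 is a hypothetical
# ID column treated as useless, column 4 is the label.
rows = [
    [5.1, 3.5, 1.4, 101, "setosa"],
    [7.0, 3.2, 4.7, 102, "versicolor"],
    [6.3, 3.3, 6.0, 103, "virginica"],
    [4.9, 3.0, 1.4, 104, "setosa"],
]
features, labels = prepare(rows, label_col=4, useless_cols={3})
cols, subset = sample_subset(features, labels, k=2, rng=random.Random(42))
print(cols)
for row in subset:
    print(row)
```

Passing a seeded `random.Random` instance in, rather than using the module-level functions, keeps each tree's sampling reproducible.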
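The Gini quantities used when splitting a data subset can be written out as formulas. The version below assumes the usual CART definitions, in particular that the impurity of a split is the size-weighted average of its two parts. Here $D$ is the data subset, $p_k$ is the fraction of samples in $D$ with label $k$ (out of $K$ labels), and splitting on feature $A$ at value $a$ yields the parts $D_1$ and $D_2$.

```latex
\mathrm{Gini}(D) = 1 - \sum_{k=1}^{K} p_k^2

\mathrm{Gini}(D, A, a) = \frac{|D_1|}{|D|}\,\mathrm{Gini}(D_1) + \frac{|D_2|}{|D|}\,\mathrm{Gini}(D_2)

\Delta\mathrm{Gini}(D, A, a) = \mathrm{Gini}(D) - \mathrm{Gini}(D, A, a)
```

For example, with 4 samples labelled a, a, b, b, $\mathrm{Gini}(D) = 1 - (0.5^2 + 0.5^2) = 0.5$. Splitting into $D_1 = \{a, a, b\}$ and $D_2 = \{b\}$ gives $\mathrm{Gini}(D_1) = 1 - (4/9 + 1/9) = 4/9$ and $\mathrm{Gini}(D_2) = 0$, so the split's impurity is $\tfrac{3}{4} \cdot \tfrac{4}{9} = \tfrac{1}{3}$ and the reduction is $0.5 - \tfrac{1}{3} = \tfrac{1}{6}$. The split $\{a, a\}, \{b, b\}$ has impurity $0$ and reduction $0.5$, the largest possible here, so it is the one the search keeps.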
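The splitting, recursion, and forest-building steps can be combined into a compact, self-contained Python sketch. It rests on several assumptions the description does not fix: each row is a list whose last element is the label; features are numeric, and a split at `value` sends rows with `row[col] < value` to the left; a node whose best Gini reduction is below the threshold (a hypothetical default of 0.01) becomes a leaf holding the majority label; and prediction is a majority vote across trees, the usual random forest rule, which the steps above do not cover. All function names are hypothetical.

```python
import random
from collections import Counter

def gini(rows):
    """Gini impurity of rows whose last column is the label."""
    n = len(rows)
    counts = Counter(r[-1] for r in rows)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def partition(rows, col, value):
    """Assumed numeric split: row[col] < value goes left, the rest go right."""
    left = [r for r in rows if r[col] < value]
    right = [r for r in rows if r[col] >= value]
    return left, right

def best_split(rows):
    """Try every value of every feature column; return (reduction, col, value)
    for the largest Gini reduction, or (0.0, None, None) if nothing helps."""
    base, n = gini(rows), len(rows)
    best = (0.0, None, None)
    for col in range(len(rows[0]) - 1):  # skip the label column
        for value in sorted(set(r[col] for r in rows)):
            left, right = partition(rows, col, value)
            if not left or not right:
                continue  # one side empty: not a real split
            weighted = (len(left) * gini(left) + len(right) * gini(right)) / n
            if base - weighted > best[0]:
                best = (base - weighted, col, value)
    return best

def build_tree(rows, threshold):
    """Split recursively while the best reduction reaches `threshold`;
    otherwise return a leaf that stores the majority label of the rows."""
    reduction, col, value = best_split(rows)
    if col is None or reduction < threshold:
        return Counter(r[-1] for r in rows).most_common(1)[0][0]
    left, right = partition(rows, col, value)
    return {"col": col, "value": value,
            "left": build_tree(left, threshold),
            "right": build_tree(right, threshold)}

def predict_tree(tree, x):
    """Walk from the root to a leaf; x holds only this tree's feature columns."""
    while isinstance(tree, dict):
        tree = tree["left"] if x[tree["col"]] < tree["value"] else tree["right"]
    return tree

def build_forest(features, labels, n_trees, k, threshold=0.01, seed=0):
    """Build n_trees trees, each on k random features and a bootstrap sample."""
    rng = random.Random(seed)
    n, m = len(features), len(features[0])
    forest = []
    for _ in range(n_trees):
        cols = rng.sample(range(m), k)
        idx = [rng.randrange(n) for _ in range(n)]  # n draws with replacement
        subset = [[features[i][c] for c in cols] + [labels[i]] for i in idx]
        forest.append((cols, build_tree(subset, threshold)))
    return forest

def predict_forest(forest, x):
    """Majority vote over the trees (an assumption; the steps stop at building)."""
    votes = Counter(predict_tree(tree, [x[c] for c in cols])
                    for cols, tree in forest)
    return votes.most_common(1)[0][0]

# Usage on toy data: one feature, label "low" below 5 and "high" from 5 up.
rows = [[i, "low" if i < 5 else "high"] for i in range(10)]
tree = build_tree(rows, threshold=0.01)
print(tree)  # {'col': 0, 'value': 5, 'left': 'low', 'right': 'high'}
features = [[r[0]] for r in rows]
labels = [r[1] for r in rows]
forest = build_forest(features, labels, n_trees=5, k=1)
print(predict_forest(forest, [2]), predict_forest(forest, [8]))
```

Because each tree sees only its own k columns, the forest stores every tree together with the column indices it was trained on and projects the input onto those columns before asking that tree for a prediction.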

