Calculating Feature Importance in Tree Ensemble Algorithms
Ensemble learning has attracted wide attention because of its high predictive accuracy, especially ensemble algorithms that use decision trees as the base learner. Well-known tree-based ensemble algorithms include random forests and GBDT. Random forests resist overfitting well; their main parameter (the number of decision trees) has relatively little effect on prediction performance and is easy to set, with a fairly large number usually chosen. GBDT has a solid theoretical foundation and generally performs better. For the principles of the GBDT algorithm, please refer to my earlier post, "GBDT Algorithm Theory In-Depth Analysis."
Tree-based ensemble algorithms also have a nice property: once training is finished, the model can output the relative importance of each feature. This makes it easy to perform feature selection and to understand which factors have a critical impact on the prediction, which is particularly important in some fields (such as bioinformatics and neuroscience). This article mainly introduces how tree-based ensemble algorithms compute the relative importance of each feature.
Advantages of using boosted trees as the learning algorithm:
- Different types of data can be used directly, with no need for feature normalization/standardization;
- It is easy to trade off runtime efficiency against accuracy; for example, when a boosted tree model is used for online prediction, the number of trees participating in the prediction can be reduced when machine resources are tight, sacrificing a little accuracy for higher prediction efficiency;
- The learned model can output the relative importance of the features and can be used for feature selection (a quick sketch is given right after this list);
- The model is easy to interpret;
- The model is insensitive to missing data fields;
- The model automatically captures interactions among multiple features and handles non-linear relationships well.
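As a quick illustration of the feature-importance output, here is a minimal sketch (not from the original post) that trains scikit-learn's GradientBoostingClassifier on a synthetic dataset, prints the relative feature importances, and keeps the most important features; the dataset, parameter values, and number of selected features are illustrative assumptions.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

# Hypothetical toy data: 1000 samples, 10 features, only 4 of them informative.
X, y = make_classification(n_samples=1000, n_features=10, n_informative=4, random_state=0)

clf = GradientBoostingClassifier(n_estimators=100, random_state=0)
clf.fit(X, y)

# Relative importance of each feature (normalized by scikit-learn to sum to 1).
print(np.round(clf.feature_importances_, 3))

# Simple feature selection: keep the 4 features with the largest importance.
top_k = np.argsort(clf.feature_importances_)[::-1][:4]
print(top_k)

The informative features should receive clearly larger importance scores than the noise features, which is exactly what makes this output useful for feature selection.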
Friedman gives the following approach in the GBM paper.
The global importance of feature j is measured by the average of its importance across the individual trees:

\hat{J_{j}^{2}} = \frac{1}{M}\sum_{m=1}^{M}\hat{J_{j}^{2}}(T_{m})

where M is the number of trees. The importance of feature j in a single tree T is:

\hat{J_{j}^{2}}(T) = \sum_{t=1}^{L-1}\hat{i_{t}^{2}}\,\mathbf{1}(v_{t}=j)

where L is the number of leaf nodes of the tree, L-1 is the number of non-leaf nodes (the constructed tree is a binary tree whose internal nodes each have a left and a right child), v_{t} is the feature used to split node t, and \hat{i_{t}^{2}} is the reduction in squared loss obtained by splitting node t.
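To make the formulas concrete, here is a small sketch (not from the original post) that computes Friedman's per-tree and global importances for a toy ensemble; the Node structure holding the splitting feature and the squared-loss reduction at each internal node is a hypothetical stand-in for a real tree representation.

from dataclasses import dataclass
from typing import Optional
import numpy as np

@dataclass
class Node:
    # Hypothetical node: internal nodes carry the splitting feature index and
    # the squared-loss reduction i_t^2 of the split; leaves carry neither.
    feature: Optional[int] = None
    loss_reduction: float = 0.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def tree_importance(root: Node, n_features: int) -> np.ndarray:
    """J_j^2(T): sum the squared-loss reductions over internal nodes that split on feature j."""
    imp = np.zeros(n_features)
    stack = [root]
    while stack:
        node = stack.pop()
        if node.left is not None:            # internal node
            imp[node.feature] += node.loss_reduction
            stack.extend([node.left, node.right])
    return imp

def global_importance(trees, n_features: int) -> np.ndarray:
    """J_j^2: average the per-tree importances over the M trees of the ensemble."""
    return np.mean([tree_importance(t, n_features) for t in trees], axis=0)

# Toy example with two features and two tiny trees.
t1 = Node(feature=0, loss_reduction=4.0, left=Node(),
          right=Node(feature=1, loss_reduction=1.0, left=Node(), right=Node()))
t2 = Node(feature=1, loss_reduction=2.0, left=Node(), right=Node())
print(global_importance([t1, t2], n_features=2))   # -> [2.  1.5]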
Implementation Code Snippets
To better understand how the feature importance is computed, the implementation from the scikit-learn toolkit is given below, with some unrelated parts removed.
The following code comes from the computation of the feature_importances_ property of the GradientBoostingClassifier object:
def feature_importances_(self):
    # Average the feature-importance vectors of all the individual trees.
    total_sum = np.zeros((self.n_features,), dtype=np.float64)
    for tree in self.estimators_:
        total_sum += tree.feature_importances_
    importances = total_sum / len(self.estimators_)
    return importances
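The same averaging can be reproduced from outside the class on a fitted model. A sketch under the assumptions that clf is the GradientBoostingClassifier fitted earlier and that clf.estimators_ is a 2-D array of regression trees (one row per boosting stage); depending on the scikit-learn version, the built-in property may apply an extra normalization, so the two printed vectors can differ slightly.

import numpy as np

# Average the per-tree importance vectors, mirroring the property shown above.
per_tree = [t.feature_importances_ for t in clf.estimators_.ravel()]
manual = np.mean(per_tree, axis=0)

print(np.round(manual, 3))
print(np.round(clf.feature_importances_, 3))   # built-in property, for comparison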
Here, self.estimators_ is the array of decision trees built by the algorithm, and tree.feature_importances_ is the feature-importance vector of a single tree, which is computed as follows:
cpdef compute_feature_importances(self, normalize=True):
    """Computes the importance of each feature (aka variable)."""
    while node != end_node:
        if node.left_child != _TREE_LEAF:
            # ... and node.right_child != _TREE_LEAF:
            left = &nodes[node.left_child]
            right = &nodes[node.right_child]

            # Accumulate the weighted impurity decrease of this split onto its feature.
            importance_data[node.feature] += (
                node.weighted_n_node_samples * node.impurity -
                left.weighted_n_node_samples * left.impurity -
                right.weighted_n_node_samples * right.impurity)
        node += 1

    importances /= nodes[0].weighted_n_node_samples

    return importances
The code above has been simplified to keep only the core idea: for every non-leaf node, the reduction in weighted impurity produced by its split is accumulated onto the splitting feature, and the larger the accumulated reduction, the more important that feature is.
The reduction in impurity is in fact the gain obtained from splitting the node, so we can also read it this way: the larger the gain when a node splits, the more important the feature used at that node. For the definition of the gain, please refer to equation (9) in my earlier post, "GBDT Algorithm Theory In-Depth Analysis."
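To connect the snippet above with a fitted model, here is a pure-Python sketch (not part of the original post) that recomputes the weighted impurity decrease per feature from the public tree_ arrays of a single scikit-learn tree and compares it with the library's own output; it assumes the clf fitted earlier in this post.

import numpy as np

def impurity_decrease_importances(tree):
    """Recompute per-feature weighted impurity decrease for one fitted sklearn tree."""
    t = tree.tree_
    importances = np.zeros_like(tree.feature_importances_)
    for node in range(t.node_count):
        left, right = t.children_left[node], t.children_right[node]
        if left == -1:                        # -1 marks a leaf node (TREE_LEAF)
            continue
        importances[t.feature[node]] += (
            t.weighted_n_node_samples[node] * t.impurity[node]
            - t.weighted_n_node_samples[left] * t.impurity[left]
            - t.weighted_n_node_samples[right] * t.impurity[right])
    importances /= t.weighted_n_node_samples[0]   # divide by the root's weighted sample count
    return importances / importances.sum()        # normalize, as scikit-learn does by default

first_tree = clf.estimators_.ravel()[0]           # first regression tree of the ensemble
print(np.round(impurity_decrease_importances(first_tree), 3))
print(np.round(first_tree.feature_importances_, 3))

If the two vectors agree, the simplified snippets above indeed capture how scikit-learn turns impurity decreases into feature importances.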