Process and analysis of GB and GBDT algorithms


1. Two strategies for optimizing the model

1) Method based on residuals

The residual is simply the difference between the real value and the predicted value. During learning, we first fit a regression tree, then compute the residual as "real value − predicted value", use this residual as the learning target of the next tree, and so on, until the residual falls below a threshold close to 0 or the number of regression trees reaches a limit. The core idea is to reduce the loss function by fitting the residuals in each round.

In general, the first tree is fit to the original targets, and every subsequent tree is determined by the residuals.
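To make this strategy concrete, here is a minimal sketch in Python, assuming the squared loss and scikit-learn's DecisionTreeRegressor as the base learner; the shrinkage factor learning_rate and names such as fit_by_residuals are illustrative choices, not part of the original description.

```python
# Minimal sketch of the residual-fitting strategy (squared loss assumed).
# The base learner is scikit-learn's DecisionTreeRegressor; n_trees,
# learning_rate and max_depth are illustrative parameters.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_by_residuals(X, y, n_trees=100, learning_rate=0.1, max_depth=3):
    base = float(np.mean(y))                     # the "first tree": a constant prediction
    prediction = np.full(len(y), base)
    trees = []
    for _ in range(n_trees):
        residual = y - prediction                # real value - predicted value
        tree = DecisionTreeRegressor(max_depth=max_depth)
        tree.fit(X, residual)                    # the next tree learns the residual
        prediction += learning_rate * tree.predict(X)
        trees.append(tree)
    return base, trees

def predict(base, trees, X, learning_rate=0.1):
    pred = np.full(X.shape[0], base)
    for tree in trees:
        pred += learning_rate * tree.predict(X)  # accumulate all residual trees
    return pred
```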

2) Method based on gradient descent to reduce the loss function

For a general loss function, we minimize it with gradient descent: at each step we move in the direction of the negative gradient of the loss function. (This method requires the loss function to be differentiable.)
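In symbols, a single gradient descent step on a differentiable loss L(\theta) with step size \eta is the standard update

\theta_{t+1} = \theta_t - \eta \, \nabla_\theta L(\theta_t)

In gradient boosting the same idea is applied in function space: the quantity being moved along the negative gradient is the model f(x) itself.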

2. GB (Gradient Boosting): the gradient boosting algorithm

GB is really an algorithm framework: an existing classification or regression algorithm can be plugged into it to obtain a more powerful, better-performing model.

GB runs for M iterations, and each iteration produces a weak model. We want each iteration's model to make the loss function on the training set as small as possible. How do we keep reducing the loss function? We use gradient descent: at each iteration we move the model in the direction of the negative gradient of the loss function, so the overall model becomes more and more accurate.

The gradient boosting (GB) procedure is as follows [1]:

1) Initialize the model with the constant that minimizes the loss function:

f_0(x) = \arg\min_c \sum_{i=1}^{N} L(y_i, c)

2) For m = 1, 2, ..., M, repeat steps a) through d):

a) For i = 1, 2, ..., N, compute the residual r_{mi}:

r_{mi} = -\left[ \frac{\partial L(y_i, f(x_i))}{\partial f(x_i)} \right]_{f = f_{m-1}}

This computes the negative gradient of the loss function at the current model and uses it as an estimate of the residual: for the squared loss it is exactly the residual, while for a general loss function it is an approximation of the residual.
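As a one-line check of the squared-loss case, take L(y, f) = \frac{1}{2}(y - f)^2; the negative gradient then reduces to the ordinary residual:

r_{mi} = -\left[ \frac{\partial L(y_i, f(x_i))}{\partial f(x_i)} \right]_{f = f_{m-1}} = y_i - f_{m-1}(x_i)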

b) Fit a regression tree to the residuals r_{mi}, obtaining the leaf node regions R_{mj} of the m-th tree (j = 1, 2, ..., J).

(That is, estimate the leaf node regions of a regression tree that fits the residual approximation.)

c) For j = 1, 2, ..., J, use a line search to find the leaf value c_{mj} that minimizes the loss function:

c_{mj} = \arg\min_c \sum_{x_i \in R_{mj}} L(y_i, f_{m-1}(x_i) + c)

d) Update the model:

f_m(x) = f_{m-1}(x) + \sum_{j=1}^{J} c_{mj} I(x \in R_{mj})

3) Obtain the final regression tree:

\hat{f}(x) = f_M(x) = \sum_{m=1}^{M} \sum_{j=1}^{J} c_{mj} I(x \in R_{mj})

The following is the GB algorithm as presented in the paper by Friedman, a leading figure in this field [6]; paper download link: http://pan.baidu.com/s/1pJxc1ZH

Figure 2.1 Gradient boost algorithm [6]
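To connect steps 1) through 3) to code, the following is a hedged sketch of the same procedure for the absolute-error loss L(y, f) = |y - f|, for which the negative gradient is sign(y - f) and the per-leaf line search gives the median of the residuals in that leaf; the helper names and the shrinkage factor learning_rate are illustrative, not taken from [1] or [6].

```python
# Sketch of the generic GB procedure for the absolute-error loss L(y, f) = |y - f|.
# Negative gradient: sign(y - f); per-leaf line search: median of (y - f) in the leaf.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_gb_l1(X, y, n_trees=100, learning_rate=0.1, max_depth=3):
    f0 = float(np.median(y))                     # step 1): constant minimizing sum |y - c|
    f = np.full(len(y), f0)
    stages = []
    for _ in range(n_trees):                     # step 2): m = 1..M
        r = np.sign(y - f)                       # a) negative gradient of |y - f|
        tree = DecisionTreeRegressor(max_depth=max_depth)
        tree.fit(X, r)                           # b) fit leaf regions R_mj to the pseudo-residuals
        leaf_id = tree.apply(X)
        gamma = {leaf: np.median(y[leaf_id == leaf] - f[leaf_id == leaf])
                 for leaf in np.unique(leaf_id)} # c) line search: median of residuals per leaf
        f += learning_rate * np.array([gamma[l] for l in leaf_id])  # d) update f(x)
        stages.append((tree, gamma))
    return f0, stages                            # step 3): f_M(x) = f_0 plus all stage updates

def predict_gb_l1(f0, stages, X, learning_rate=0.1):
    pred = np.full(X.shape[0], f0)
    for tree, gamma in stages:
        leaf_id = tree.apply(X)
        pred += learning_rate * np.array([gamma[l] for l in leaf_id])
    return pred
```

Swapping np.sign and np.median for the derivative and the minimizer of another differentiable loss gives the corresponding variant of the procedure.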

3. GBDT (Gradient Boosting Decision Tree): the gradient boosting decision tree algorithm

Here we mainly discuss the multi-class logistic regression problem.

Figure 3.1 Multi-class logistic regression algorithm [6]

Regarding the algorithm above, a netizen has already given an analysis [5] (quoted here):

"1. Build M trees (iterate M times).

2. Apply a logistic transformation to the function estimate F(x).

3. For each of the K classes, do the following (in fact, this for loop can also be understood as a vector operation: each sample point xi corresponds to K possible classes, so yi, F(xi) and p(xi) are all K-dimensional vectors, which may make it a little easier to understand).

4. Compute the gradient direction along which the residual is reduced.

5. Fit a decision tree with J leaf nodes to each sample point's residual-reducing gradient direction.

6. Once the decision tree is built, the gain of each leaf node can be obtained from this formula (the gain is used in prediction).

Each gain is actually a K-dimensional vector, indicating, when a sample point falls into that leaf node during prediction, what value each of the K classes receives.

7. The idea is to merge the decision tree obtained in this round with the previous decision trees to form a new model."
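To make steps 1 through 7 concrete, here is a hedged sketch of the K-class procedure, assuming integer class labels 0..K-1, one-hot targets, and scikit-learn regression trees as base learners; the shrinkage factor learning_rate and all helper names are illustrative additions rather than part of the quoted analysis or of [6].

```python
# Hedged sketch of the K-class GBDT procedure described above;
# function and variable names are illustrative.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_gbdt_multiclass(X, y, K, n_rounds=100, max_depth=3, learning_rate=0.1):
    n = len(y)
    Y = np.eye(K)[y]                              # one-hot targets y_ik
    F = np.zeros((n, K))                          # function estimates F_k(x_i)
    model = []
    for _ in range(n_rounds):                     # step 1: build M rounds of trees
        expF = np.exp(F - F.max(axis=1, keepdims=True))
        P = expF / expF.sum(axis=1, keepdims=True)   # step 2: logistic (softmax) transform
        round_trees = []
        for k in range(K):                        # step 3: loop over the K classes
            r = Y[:, k] - P[:, k]                 # step 4: gradient direction (pseudo-residual)
            tree = DecisionTreeRegressor(max_depth=max_depth)
            tree.fit(X, r)                        # step 5: J-leaf tree fit to the residuals
            leaf_id = tree.apply(X)
            gamma = {}
            for leaf in np.unique(leaf_id):       # step 6: per-leaf gain
                rl = r[leaf_id == leaf]
                denom = np.sum(np.abs(rl) * (1.0 - np.abs(rl)))
                gamma[leaf] = (K - 1) / K * rl.sum() / (denom + 1e-10)
            # step 7: merge this tree into the model
            F[:, k] += learning_rate * np.array([gamma[l] for l in leaf_id])
            round_trees.append((tree, gamma))
        model.append(round_trees)
    return model
```

At prediction time one would accumulate the stored per-leaf gains of every tree into F_k(x) for each class and apply the same softmax transform to obtain class probabilities.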

Finally, my research on GBDT is not yet thorough enough; once I have studied it more clearly I will write a dedicated GBDT article!

Reference documents:

[1] Li Hang, Statistical Learning Methods.

[2] Hsuan-Tien Lin, Machine Learning Techniques.

[3] Programmer's Home, http://www.programerhome.com/?p=3665

[4] Dianacody, http://www.dianacody.com/2014/11/01/GBRT.html

[5] Leftnoteasy, http://www.cnblogs.com/leftnoteasy/archive/2011/03/07/random-forest-and-gbdt.html

[6] Friedman J H. Greedy Function Approximation: A Gradient Boosting Machine. The Annals of Statistics, 2001, 29(5): 1189–1232.
