Understanding the GBDT Algorithm (II): The Residual-Based Version

Source: Internet
Author: User

The GBDT algorithm can be described in two ways: one based on residuals, and one based on gradients. Let's first discuss the residual-based version.

The previous blog post has already covered the general principle of this version; see:
http://blog.csdn.net/puqutogether/article/details/41957089

In this article we summarize a few points of note:

  • The core idea of this version: each regression tree learns the residuals left by the trees before it, and shrinkage turns big learning steps into many small ones through repeated iteration. The cost function is the ordinary mean squared error.
  • The basic procedure: first fit a regression tree; then compute the current residual as "actual value − prediction × shrinkage"; take that residual as the new target and fit the next regression tree; keep computing residuals in this way until the number of trees reaches a set limit or the residual is small enough to tolerate, then stop.
  • We know that the residual is the difference between the target value and the predicted value, so by always learning the residual this version moves directly toward the global optimum.
  • This version is best suited to regression problems, both linear and non-linear; it can also be used for classification by thresholding the regression output.
  • Because it relies on residuals, this version struggles with problems other than pure regression. The gradient-based version (version two) only requires that the chosen cost function be differentiable, which makes it applicable to tasks such as learning to rank with LambdaMART.
  • On the relationship between shrinkage and the learning step alpha in gradient descent: setting shrinkage small only makes learning slower, while setting it large is nearly the same as not using it at all, and it applies to any incremental, iterative solution. The gradient-descent step size, by contrast, is prone to local optima and may fail to converge if set too large, and it only arises when solving with gradient descent. The two are not closely related.
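The iterative "fit a tree, shrink its contribution, recompute the residual" loop above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the canonical implementation: the helper names `fit_stump` and `gbdt_fit` are hypothetical, a depth-1 regression stump on 1-D inputs stands in for a full regression tree, and the stopping rule is simply a fixed tree count.

```python
# Minimal sketch of residual-based gradient boosting with shrinkage.
# Assumptions: 1-D inputs with at least two distinct values, squared-error
# loss, and a one-split regression stump as the weak learner.

def fit_stump(x, y):
    """Fit a one-split regression stump: pick the threshold that minimizes
    squared error, predicting the mean of y on each side of the split."""
    best = None
    for t in sorted(set(x)):
        left = [yi for xi, yi in zip(x, y) if xi <= t]
        right = [yi for xi, yi in zip(x, y) if xi > t]
        if not left or not right:
            continue  # split must leave points on both sides
        ml = sum(left) / len(left)
        mr = sum(right) / len(right)
        err = (sum((yi - ml) ** 2 for yi in left)
               + sum((yi - mr) ** 2 for yi in right))
        if best is None or err < best[0]:
            best = (err, t, ml, mr)
    _, t, ml, mr = best
    return lambda xi: ml if xi <= t else mr

def gbdt_fit(x, y, n_trees=300, shrinkage=0.1):
    """Each stump learns the residual of the current ensemble; its output is
    scaled by `shrinkage`, so the ensemble takes many small steps."""
    trees = []
    residual = list(y)
    for _ in range(n_trees):
        stump = fit_stump(x, residual)
        trees.append(stump)
        # residual = previous residual minus the shrunken new prediction,
        # i.e. target minus the ensemble's current prediction
        residual = [r - shrinkage * stump(xi) for xi, r in zip(x, residual)]
    return lambda xi: shrinkage * sum(tree(xi) for tree in trees)
```

A small shrinkage (e.g. 0.1) only slows the fit down, as the bullet points note: each stump corrects a fraction of the remaining residual, so after enough rounds the ensemble's predictions approach the targets.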
