The GBDT algorithm can be described in two ways: one based on residuals and one based on gradients. Let's start with the residual-based version.
The previous blog post already covered the general principle of this version; please see:
http://blog.csdn.net/puqutogether/article/details/41957089
In this article we summarize a few points worth noting:
- The core idea of this version: each regression tree learns the residuals left by the trees before it, shrinkage breaks each large learning step into small steps, and the process is repeated iteratively. The cost function is the usual mean squared error.
- The basic procedure: first learn a regression tree, then compute the current residual as "true value - predicted value * shrinkage". Use that residual as the target value to learn the next regression tree, compute the residual again, and so on. Learning stops when the number of regression trees reaches a set limit or the residuals are small enough to tolerate.
- The residual is the difference between the predicted value and the target value, and this version treats the residual as the absolute direction toward the global optimum and learns it directly.
- This version is better suited to regression problems, both linear and nonlinear. It can also be used for classification once a threshold is set on the output.
- Because this version uses residuals, it has difficulty with problems other than pure regression. Version two uses gradients instead: as long as the chosen cost function is differentiable, the second version of GBDT can be applied, as in the LambdaMART learning-to-rank algorithm.
- On the relationship between shrinkage and the step size alpha in gradient descent: setting shrinkage small only makes learning slower, and setting it large is the same as not using it; it applies to any incremental, iterative solution method. The gradient descent step size, by contrast, easily gets stuck at a local optimum when set too small and easily fails to converge when set too large, and it is used only when solving by gradient descent. The two have little to do with each other.
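
The residual-based procedure in the list above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not a reference implementation: it assumes a single numeric feature, swaps the full regression tree for a one-split tree (a stump), and the names `fit_stump`, `fit_residual_gbdt`, `predict`, `n_trees`, and `tolerance` are made up for this example.

```python
def fit_stump(xs, targets):
    """Fit a one-split regression tree on a single feature by least squares.

    Assumption for this sketch: a stump stands in for a full regression tree.
    """
    best = None
    values = sorted(set(xs))
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2.0
        left = [t for x, t in zip(xs, targets) if x <= threshold]
        right = [t for x, t in zip(xs, targets) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((t - left_mean) ** 2 for t in left)
               + sum((t - right_mean) ** 2 for t in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    if best is None:
        # Every x is identical, so no split exists: predict the mean target.
        mean = sum(targets) / len(targets)
        return lambda x: mean
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def fit_residual_gbdt(xs, ys, n_trees=50, shrinkage=0.1, tolerance=1e-6):
    """Fit trees one after another, each on the residuals left so far."""
    trees = []
    residuals = list(ys)  # before any tree exists, the residual is the target
    for _ in range(n_trees):
        tree = fit_stump(xs, residuals)
        trees.append(tree)
        # New residual = current target - shrinkage * this tree's prediction.
        residuals = [r - shrinkage * tree(x) for x, r in zip(xs, residuals)]
        # Stop early once the residuals are small enough to tolerate.
        if max(abs(r) for r in residuals) < tolerance:
            break
    return trees


def predict(trees, shrinkage, x):
    """Sum the shrunken predictions of all trees."""
    return sum(shrinkage * tree(x) for tree in trees)


def mean_squared_error(trees, shrinkage, xs, ys):
    errors = [(y - predict(trees, shrinkage, x)) ** 2 for x, y in zip(xs, ys)]
    return sum(errors) / len(errors)


if __name__ == "__main__":
    xs = [1, 2, 3, 4, 5, 6, 7, 8]
    ys = [x * x for x in xs]  # a nonlinear toy target
    for n in (5, 50, 200):
        trees = fit_residual_gbdt(xs, ys, n_trees=n, shrinkage=0.1)
        print(n, mean_squared_error(trees, 0.1, xs, ys))
```

With a shrinkage below 1, each tree removes only part of the current residual, so more trees are needed to reach the same training error. That is the "big step into small steps" trade-off described in the list.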
Understanding the GBDT Algorithm (II): The Residual-Based Version