GBM, GBDT, and XGBoost
Gradient Boosted Decision Trees (GBDT) is currently a very popular supervised machine learning algorithm. This article starts from the origins of GBDT and works up to the currently popular XGBoost. The articles "AdaBoost detailed" and "GLM (generalized linear model) and LR (logistic regression) detailed" provide the background for this one.

0. Hello World
Here is the simplest and most common form of the GBDT algorithm.
For regression problems, GBDT builds an ensemble composed directly of a set of decision trees, $F(x) = \sum_{t=1}^{T} f_t(x)$, to fit the target $y$.
Each iteration $t$ looks for a sub-function $f_t$ to add to $F(x)$, as follows:
$\arg\min_{f_t(x)} \big(f_t(x) - \text{residual}\big)^2, \quad \text{residual} = y - F_{t-1}(x)$
The residual is what remains of the target after the overall model from the previous round, so in each iteration GBDT focuses on the samples that earlier rounds did not fit well, refining the details step by step to achieve a better fit (with the usual risk of overfitting, of course).
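To make this loop concrete, here is a minimal sketch of the "Hello World" GBDT for regression, fitting each new tree to the current residual $y - F_{t-1}(x)$. The tree depth, learning rate, and number of rounds are illustrative assumptions, and scikit-learn's `DecisionTreeRegressor` merely stands in for the base learner:

```python
# Minimal GBDT-for-regression sketch: F(x) = sum_t f_t(x), each f_t fit to the residual.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gbdt_fit(X, y, n_rounds=100, learning_rate=0.1, max_depth=3):
    """Fit an ensemble by repeatedly fitting small trees to the residual."""
    trees = []
    F = np.zeros(len(y))                       # F_0(x) = 0
    for _ in range(n_rounds):
        residual = y - F                       # residual left by the previous round
        tree = DecisionTreeRegressor(max_depth=max_depth)
        tree.fit(X, residual)                  # f_t approximates the residual
        F += learning_rate * tree.predict(X)   # F_t = F_{t-1} + eta * f_t
        trees.append(tree)
    return trees

def gbdt_predict(trees, X, learning_rate=0.1):
    """Sum the (shrunk) predictions of all trees in the ensemble."""
    return learning_rate * np.sum([t.predict(X) for t in trees], axis=0)
```

The shrinkage factor (`learning_rate`) is the usual practical guard against the overfitting mentioned above; a full GBDT would also use a better initial model than $F_0(x)=0$, such as the mean of $y$.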
In fact, GBDT encompasses a very wide range of ideas and applications, which this article elaborates in detail.

1. Some Foundations
GBDT draws on a number of concepts and methods common across machine learning; several important basics are presented here.

1.1 Gradient Descent
Gradient descent is a well-known workhorse of machine learning: given an optimization target (the loss function), we want to take a small step from the current position (the model state) so that the loss decreases as fast as possible. Here is the first-order Taylor expansion of the loss function:
$\min_{\|v\|=1} E(w_t + \eta v) \approx E(w_t) + \eta\, v^\top \nabla E(w_t)$
where $w_t$ is the weight vector at iteration $t$, $v$ is the direction of the step (a vector subject to a length constraint), and $\eta$ is the step size (a scalar, usually a small positive number).
The goal is to find the direction $v$ in which the loss function drops fastest; standard mathematical tools show that the optimal direction is the negative gradient direction, $v = -\frac{\nabla E(w_t)}{\|\nabla E(w_t)\|}$.
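As a tiny illustration of this update rule, here is a NumPy sketch of gradient descent on a toy quadratic loss. The loss $E$, its gradient, the step size, and the iteration count are illustrative assumptions, not from the article:

```python
# Gradient descent sketch: step in the negative gradient direction with step size eta.
import numpy as np

def E(w):
    return 0.5 * np.sum((w - 3.0) ** 2)   # toy loss with minimum at w = [3, 3]

def grad_E(w):
    return w - 3.0                        # gradient of the toy loss

w = np.zeros(2)                           # initial weights w_0
eta = 0.1                                 # step size
for t in range(100):
    w = w - eta * grad_E(w)               # w_{t+1} = w_t - eta * grad E(w_t)

print(w, E(w))                            # w approaches [3, 3], loss approaches 0
```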