Here a binary-classification example is used to explain the most basic principles.
GBDT's prediction is the sum of the predicted output values of multiple trees.
The trees in GBDT are regression trees, not classification trees.
- Regression Tree
When choosing a split, pick the split that makes the error drop the most.
Calculation trick:
The final split gain is computed as follows; note that the circled part is a fixed value (see the reconstruction below).
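A reconstruction of that trick, assuming the split criterion is the squared error of each node:

```latex
\sum_{i \in \text{node}} (y_i - \bar{y})^2
  \;=\; \underbrace{\sum_{i \in \text{node}} y_i^2}_{\text{fixed for the parent}}
  \;-\; n\,\bar{y}^{\,2}
```

Summed over the two children of any candidate split, the first term is the same constant for every split of the parent (the circled part), so minimizing the children's total error is equivalent to maximizing n_L * ȳ_L² + n_R * ȳ_R²; only each child's sample count and mean are needed to score a split.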
- GBDT binary classification
In its implementation, GBDT binary classification can fully reuse the calculation framework above; what differs is the objective function we optimize.
The loss function used here is the exponential loss: whether the prediction is correct or wrong, the loss is nonzero, but a correct prediction produces a smaller loss than a wrong one. Reference:
AdaBoost and the Super Bowl of Classifiers: A Tutorial Introduction to Adaptive Boosting
For common loss functions, see http://www.cnblogs.com/rocketfan/p/4083821.html
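For concreteness, the exponential loss referred to above is the standard AdaBoost loss (with labels y in {-1, +1}):

```latex
L(y, F) = e^{-yF}
```

A correct prediction (yF > 0) gives a loss below 1, while a wrong prediction (yF < 0) gives a loss above 1, so both cases incur a positive loss but correct predictions cost less.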
Reference: Greedy Function Approximation: A Gradient Boosting Machine
Section 4.4 of that paper gives the design of the loss function for two-class classification. It is essentially the same as the exponential loss above, just wrapped in a log, giving log(1 + exp(-2yF)).
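Written out explicitly (Friedman, Section 4.4, with labels y in {-1, +1}), the two-class loss is:

```latex
L(y, F(x)) = \log\bigl(1 + \exp(-2\,y\,F(x))\bigr)
```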
This F value follows the same idea as logistic regression (compare the explanation in the Speech and Language Processing book): a linearly weighted output is used to predict the ratio of P(true) to P(false). The log transform is needed because the regression output ranges over all real values and is not suited to directly predicting a 0-1 probability, so a conversion is made. The closer the sample is to true, the closer F(x) is to +infinity (corresponding to the strongest possible judgment of true); the larger P(false) is, the closer F(x) is to -infinity (corresponding to the strongest possible judgment of false).
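This correspondence is made precise in the Friedman paper: F(x) is half the log-odds, and the probability is recovered from F(x) with a sigmoid:

```latex
F(x) = \frac{1}{2}\,\log\frac{P(y = 1 \mid x)}{P(y = -1 \mid x)}
\qquad\Longleftrightarrow\qquad
P(y = 1 \mid x) = \frac{1}{1 + e^{-2F(x)}}
```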
F(x) is the current regression prediction for the sample with features x, i.e. the accumulated sum of the leaf-node output values output(x) that the sample reaches in each tree by following its decisions. For N samples there are N values of F(x). When there is no split yet, all samples sit in one node and all their F(x) values are the same; after splitting into two leaf nodes, samples can fall into different leaves, so there can be two different values.
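In symbols (a sketch that ignores the initial constant and the learning rate, which is introduced below), the prediction after m trees is just the accumulated leaf outputs:

```latex
F_m(x) = \sum_{k=1}^{m} \mathrm{output}_k(x)
```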
Now compute the gradient of the loss function with respect to F. The loss function is the one given above, log(1 + exp(-2yF)), where the variable is F(x).
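Differentiating that loss with respect to F(x) and negating gives the target value each new tree fits, as in Section 4.4 of the Friedman paper:

```latex
r_{im} \;=\; -\left[\frac{\partial L(y_i, F(x_i))}{\partial F(x_i)}\right]_{F = F_{m-1}}
\;=\; \frac{2\,y_i}{1 + \exp\bigl(2\,y_i\,F_{m-1}(x_i)\bigr)}
```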
After the learning_rate is taken into account, this becomes (@TODO)
F(x) is the current predicted value of a sample, given by the leaf node that its features x fall into.
Reference: the relevant chapters of Machine Learning: A Probabilistic Perspective.
Compared with the basic regression tree algorithm above, our fitting target changes from approximating y to approximating the gradient value r_im.
This means that the current tree is predicting negative gradient values.
F_m(x) = F_{m-1}(x) + learning_rate * (predicted value of the current tree, i.e. the predicted negative gradient)  // @TODO check
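Putting the pieces together, here is a minimal Python sketch of the training loop (not the author's code): it uses scikit-learn's DecisionTreeRegressor as the base learner (an assumption), fits each tree to the negative gradient above, and applies the update F_m(x) = F_{m-1}(x) + learning_rate * tree_m(x) directly, skipping Friedman's per-leaf line search.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gbdt_binary_fit(X, y, n_trees=100, learning_rate=0.1, max_depth=3):
    """Sketch of GBDT binary classification with labels y in {-1, +1},
    using the loss L = log(1 + exp(-2*y*F))."""
    y = np.asarray(y, dtype=float)
    # Initial score F_0: half the log-odds of the positive class.
    p = np.clip(np.mean(y == 1), 1e-6, 1 - 1e-6)
    F0 = 0.5 * np.log(p / (1 - p))
    F = np.full(len(y), F0)
    trees = []
    for _ in range(n_trees):
        # Negative gradient r_im = 2y / (1 + exp(2yF)); this is the regression
        # target that the current tree approximates.
        residual = 2 * y / (1 + np.exp(2 * y * F))
        tree = DecisionTreeRegressor(max_depth=max_depth)
        tree.fit(X, residual)
        # F_m(x) = F_{m-1}(x) + learning_rate * tree_m(x)
        F += learning_rate * tree.predict(X)
        trees.append(tree)
    return F0, trees

def gbdt_binary_predict_proba(X, F0, trees, learning_rate=0.1):
    """Accumulate the tree outputs and convert F(x) back to P(y=1|x)."""
    F = np.full(np.asarray(X).shape[0], F0)
    for tree in trees:
        F += learning_rate * tree.predict(X)
    return 1.0 / (1.0 + np.exp(-2.0 * F))
```

Keeping the learning rate small and the trees shallow is the usual way to trade off fitting speed against overfitting.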
Compare this with the simplest gradient-descent regression example in Andrew Ng's course notes.
The update strategy we use for each tree acts on F(x): F(x) accumulates steps along the negative gradient direction, with the goal of minimizing our loss function.
This is the basic principle of GBDT.