1. Background
Online resources on the principles behind xgboost are scarce; most of them stay at the application level. This article works through Dr. Tianqi Chen's slides (address) and the xgboost guide and practice notes (address), in the hope of reaching a deeper understanding of how xgboost works.
2. xgboost vs GBDT
Any discussion of xgboost has to start with GBDT. To learn about GBDT you can read my earlier article (address). GBDT is quite mature in both theoretical derivation and practical application, but it has one problem: training the n-th tree requires the (approximate) residuals from the first n-1 trees. From this point of view GBDT is hard to make distributed (PS: difficult, but still possible if you look at it from a different angle), whereas xgboost approaches the problem from the following angle:
Note: the red arrow points to L, the loss function; the red box marks the regularization term (covering both L1 and L2); and the red circle marks the constant term.
Using a Taylor expansion up to the second-order term to approximate the loss, we can see clearly that the final objective function depends only on the first and second derivatives of the error function at each data point.
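In the usual notation of the xgboost derivation, with $g_i$ and $h_i$ denoting the first and second derivatives of the loss at the previous round's prediction $\hat{y}_i^{(t-1)}$, the approximated objective at round $t$ is:

$$
\mathrm{Obj}^{(t)} \simeq \sum_{i=1}^{n}\Big[\, l\big(y_i,\hat{y}_i^{(t-1)}\big) + g_i f_t(x_i) + \tfrac{1}{2} h_i f_t^2(x_i) \Big] + \Omega(f_t) + \text{constant},
\qquad
g_i = \partial_{\hat{y}^{(t-1)}}\, l\big(y_i,\hat{y}^{(t-1)}\big), \quad
h_i = \partial^2_{\hat{y}^{(t-1)}}\, l\big(y_i,\hat{y}^{(t-1)}\big)
$$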
3. Principle
(1) Define the complexity of the tree
To refine the definition of f, we split a tree into a structure part q and a leaf-weight part w (the figure gives a concrete example). The structure function q maps an input to the index of a leaf, and w gives the leaf score associated with each index.
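In formula form, a tree with $T$ leaves can be written as:

$$
f_t(x) = w_{q(x)}, \qquad w \in \mathbb{R}^{T}, \qquad q:\ \mathbb{R}^{d} \to \{1,2,\dots,T\}
$$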
The complexity is defined to include the number of leaf nodes in a tree and the squared L2 norm of the output score of each leaf. This is certainly not the only possible definition, but trees learned under this definition generally work quite well. An example of the complexity calculation is also given.
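Concretely, the complexity takes the standard form, where $T$ is the number of leaves and $\gamma$, $\lambda$ are the regularization coefficients:

$$
\Omega(f_t) = \gamma T + \frac{1}{2}\lambda \sum_{j=1}^{T} w_j^{2}
$$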
Note: the boxed coefficients control how large a share this part takes in the final model's objective.
With this new definition, we can rewrite the objective function as follows, where $I_j$ is defined as the set of samples assigned to leaf $j$:
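Grouping samples by the leaf they fall into, the rewritten objective (in standard form) is:

$$
\mathrm{Obj}^{(t)} \simeq \sum_{j=1}^{T}\Big[\Big(\sum_{i\in I_j} g_i\Big) w_j + \frac{1}{2}\Big(\sum_{i\in I_j} h_i + \lambda\Big) w_j^{2}\Big] + \gamma T, \qquad I_j = \{\, i \mid q(x_i)=j \,\}
$$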
This objective is a sum of $T$ mutually independent single-variable quadratic functions of the leaf weights $w_j$. We can define $G_j$ and $H_j$ as the sums of the first and second derivatives over the samples on leaf $j$, and the final formula then simplifies to:
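With the usual shorthand:

$$
G_j = \sum_{i\in I_j} g_i, \qquad H_j = \sum_{i\in I_j} h_i, \qquad \mathrm{Obj}^{(t)} = \sum_{j=1}^{T}\Big[ G_j w_j + \frac{1}{2}\big(H_j+\lambda\big) w_j^{2}\Big] + \gamma T
$$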
Setting the derivative with respect to each $w_j$ equal to zero gives the optimal leaf weight, and substituting it back yields the optimal value of the objective:
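In the same notation:

$$
w_j^{*} = -\frac{G_j}{H_j+\lambda}, \qquad \mathrm{Obj}^{*} = -\frac{1}{2}\sum_{j=1}^{T}\frac{G_j^{2}}{H_j+\lambda} + \gamma T
$$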
(2) Example of scoring function calculation
Obj represents how much we can reduce the objective when we fix the structure of a tree. We can call it the structure score.
(3) The greedy method of enumerating different tree structures
Greedy method: at each step, try to add a split to an existing leaf.
For each expansion we still need to enumerate all possible split candidates. How can all splits be enumerated efficiently? Suppose we want to enumerate all conditions of the form x < a; for a particular split point a we need to compute the sums of the derivatives on the left and right sides of a.
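In standard form, the gain of splitting at $a$ (with $G_L, H_L$ and $G_R, H_R$ the sums of first and second derivatives on the two sides) is:

$$
\mathrm{Gain} = \frac{1}{2}\left[\frac{G_L^{2}}{H_L+\lambda} + \frac{G_R^{2}}{H_R+\lambda} - \frac{(G_L+G_R)^{2}}{H_L+H_R+\lambda}\right] - \gamma
$$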
We can see that, for all values of a, a single left-to-right scan is enough to accumulate the gradient sums $G_L$ and $G_R$ of every split candidate, and then the formula above gives the score of each split.
Looking at this objective function, the second notable thing you will find is that introducing a split does not necessarily make things better, because there is a penalty for introducing a new leaf. Optimizing this objective therefore corresponds to pruning the tree: when the gain of a proposed split is smaller than a threshold, we can simply drop that split. You can see that once we derive the objective formally, strategies such as score calculation and pruning fall out naturally, rather than being heuristic operations.
4. Custom loss function
In real business scenarios we often need to customize the loss function. Here is the official link (address).
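As a minimal sketch with the native xgboost API (the data below is made up; only the shape of the objective function matters), a custom objective just needs to return the gradient and Hessian of the loss for each sample:

```python
# Minimal custom-objective sketch: xgboost only needs grad and hess per sample.
import numpy as np
import xgboost as xgb

def logregobj(preds, dtrain):
    """Return per-sample gradient and Hessian of the logistic loss."""
    labels = dtrain.get_label()
    preds = 1.0 / (1.0 + np.exp(-preds))  # raw margin -> probability
    grad = preds - labels                 # first-order derivative g_i
    hess = preds * (1.0 - preds)          # second-order derivative h_i
    return grad, hess

# Placeholder data; replace with your own DMatrix.
X = np.random.rand(100, 5)
y = (np.random.rand(100) > 0.5).astype(float)
dtrain = xgb.DMatrix(X, label=y)

params = {'max_depth': 3, 'eta': 0.3}
bst = xgb.train(params, dtrain, num_boost_round=10, obj=logregobj)
```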
5. Xgboost Tuning Parameters
Because xgboost has so many parameters, running GridSearch over all of them at once is particularly time-consuming. The following article is worth studying; it teaches you how to tune the parameters step by step (address).
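A minimal sketch of that step-by-step idea (the data and parameter grid here are hypothetical): fix the learning rate and number of trees first, then grid-search a couple of related tree parameters at a time instead of searching everything at once.

```python
# Step-by-step tuning sketch: search max_depth and min_child_weight together,
# keeping learning_rate and n_estimators fixed for now.
import numpy as np
import xgboost as xgb
from sklearn.model_selection import GridSearchCV

X = np.random.rand(200, 10)
y = (np.random.rand(200) > 0.5).astype(int)

param_grid = {'max_depth': [3, 5, 7], 'min_child_weight': [1, 3, 5]}
search = GridSearchCV(
    estimator=xgb.XGBClassifier(learning_rate=0.1, n_estimators=100),
    param_grid=param_grid,
    scoring='roc_auc',
    cv=3,
)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```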
6. Simple use of xgboost in Python and R
Task: binary classification with a class-imbalance problem (the scale_pos_weight parameter can mitigate this to some extent).
"Python"
R
7. Introduction to the more important parameters in xgboost
(1) objective [default=reg:linear] defines the learning task and the corresponding learning objective. The optional objective functions are as follows:
- "Reg:linear" – Linear regression.
- "Reg:logistic" – Logistic regression.
- "Binary:logistic" – The logistic regression problem of the two classification, the output is probability.
- "Binary:logitraw" – The logistic regression problem for the two classification, the output is WTX.
- "Count:poisson" – The Poisson regression of the counting problem, the output is Poisson distribution. In Poisson regression, the default value for Max_delta_step is 0.7. (used to safeguard optimization)
- "Multi:softmax" – allows Xgboost to use Softmax objective function to handle multi-classification problems, and to set parameter Num_class (number of categories)
- "Multi:softprob" – like Softmax, but the output is a vector of ndata * nclass, which can be reshape into a matrix of ndata rows nclass columns. No row of data represents the probability that a sample belongs to each category.
- "Rank:pairwise" –set xgboost to does ranking task by minimizing the pairwise loss
(2) eval_metric: the evaluation metric. The available choices are listed below:
- "rmse": root mean square error
- "logloss": negative log-likelihood
- "error": binary classification error rate, calculated as #(wrong cases) / #(all cases). For the predictions, instances with a prediction value larger than 0.5 are regarded as positive and the rest as negative.
- "merror": multiclass classification error rate, calculated as #(wrong cases) / #(all cases).
- "mlogloss": multiclass log loss
- "auc": area under the curve, used for ranking evaluation.
- "ndcg": normalized discounted cumulative gain
- "map": mean average precision
- "ndcg@n", "map@n": n can be assigned as an integer to cut off the top positions in the list for evaluation.
- "ndcg-", "map-", "ndcg@n-", "map@n-": in xgboost, NDCG and MAP evaluate the score of a list without any positive samples as 1. Adding "-" to the metric name makes xgboost evaluate these scores as 0, which is more consistent under some conditions.
(3) lambda [default=0]: the penalty coefficient for L2 regularization.
(4) alpha [default=0]: the penalty coefficient for L1 regularization.
(5) lambda_bias: the L2 regularization on the bias term. The default value is 0 (there is no L1 regularization on the bias, because the bias is not important when using L1).
(6) eta [default=0.3]
The shrinkage step size used in the update, to prevent overfitting. After each boosting step the weights of the new features can be obtained directly; eta shrinks these weights to make the boosting process more conservative. The default value is 0.3.
The value range is: [0,1]
(7) max_depth [default=6]: the maximum depth of a tree. The default value is 6, and the value range is [1, ∞].
(8) min_child_weight [default=1]
The minimum sum of instance weights in a child node. If the sum of instance weights in a leaf node is less than min_child_weight, the splitting process stops. In a linear regression model, this parameter simply corresponds to the minimum number of samples needed in each node. The larger this value, the more conservative the algorithm.
The value range is: [0,∞]
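To tie the parameters above together, here is a small illustrative parameter dictionary (the values are examples, not recommendations):

```python
# Illustrative xgboost parameter dictionary covering the parameters above.
params = {
    'objective': 'binary:logistic',
    'eval_metric': 'logloss',
    'lambda': 1.0,           # L2 penalty on leaf weights
    'alpha': 0.0,            # L1 penalty on leaf weights
    'eta': 0.1,              # shrinkage step size, range [0, 1]
    'max_depth': 6,          # maximum tree depth
    'min_child_weight': 1,   # minimum sum of instance weight in a child
}
```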
More xgboost learning material: Principle and application of xgboost (address).