Full guide to xgboost parameter tuning (Python code included)

Source: Internet
Author: User
Tags: xgboost

XGBoost Parameter Tuning Complete Guide (Python code included): http://www.2cto.com/kf/201607/528771.html

https://www.zhihu.com/question/41354392

"The following turns to self-knowledge" https://www.zhihu.com/question/45487317 why XGBOOST/GBDT in the parameter when the depth of the tree can be very little to achieve high precision?

When participating in Kaggle competitions, XGBoost/GBDT only needs a maximum tree depth of around 6 during parameter tuning to reach high accuracy. But when using DecisionTree/RandomForest, the tree depth has to be tuned to 15 or higher.
I can understand why RandomForest needs the same tree depth as a single DecisionTree: it is simply a combination of decision trees via the bagging method, equivalent to training many decision trees.
But how can XGBoost/GBDT achieve high predictive accuracy with a tree depth of only 6, using nothing more than gradient boosting?

Author: Uffy
Link: https://www.zhihu.com/question/45487317/answer/99153174
Source: Zhihu
Copyright belongs to the author; please contact the author for authorization before reprinting.

To quote an explanation from Professor Zhou Zhihua's textbook (Machine Learning, Zhou Zhihua): Boosting mainly focuses on reducing bias, so boosting can build a strong ensemble from base learners with rather weak generalization performance; bagging mainly focuses on reducing variance, so it is more effective on learners that are easily disturbed by the training samples, such as unpruned decision trees and neural networks.

Random Forest and GBDT both belong to the category of ensemble learning. Under ensemble learning there are two important strategies: bagging and boosting.

The bagging algorithm works like this: each classifier randomly samples from the original training set, a classifier is trained on each sampled set, and the classifiers are then combined, usually by a simple majority vote. Its representative algorithm is Random Forest. Boosting trains a series of classifiers iteratively, where the sample distribution used in each round depends on the results of the previous round of learning. Its representative algorithms are AdaBoost and GBDT.
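As a rough illustration (this sketch is not from the linked guide; the dataset and parameter values are made up for the example), the two strategies can be contrasted with scikit-learn's RandomForestClassifier (bagging) and AdaBoostClassifier (boosting):

```python
# Illustrative sketch: a bagging-style ensemble (Random Forest) next to a
# boosting-style ensemble (AdaBoost) on a synthetic dataset.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

# Bagging: many independently trained trees, combined by majority vote.
bagging_model = RandomForestClassifier(n_estimators=200, random_state=0)

# Boosting: trees trained sequentially, each round reweighting the samples
# according to the errors of the previous round.
boosting_model = AdaBoostClassifier(n_estimators=200, random_state=0)

for name, model in [("bagging (RandomForest)", bagging_model),
                    ("boosting (AdaBoost)", boosting_model)]:
    scores = cross_val_score(model, X, y, cv=5)
    print(name, scores.mean())
```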

In fact, for a machine learning algorithm, the generalization error can be decomposed into two parts: bias and variance. This can be derived with the help of the probability formula D(X) = E(X^2) - [E(X)]^2. Bias refers to how far the algorithm's expected prediction deviates from the true value, and reflects the fitting ability of the model itself; variance measures how much the learned model changes when the training set (of the same size) changes, and describes the effect of data perturbation. This may sound a bit convoluted, but you have surely heard of overfitting and underfitting.
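For reference (the original answer does not spell out this derivation), the standard decomposition of the expected squared error at a point x, with true function f, learned predictor \hat{f} (whose randomness comes from the training set), and noise variance \sigma^2, is:

```latex
\mathbb{E}\!\left[(y - \hat{f}(x))^2\right]
  = \underbrace{\left(\mathbb{E}[\hat{f}(x)] - f(x)\right)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\!\left[\left(\hat{f}(x) - \mathbb{E}[\hat{f}(x)]\right)^2\right]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible noise}}
```

The variance term is exactly the identity D(X) = E(X^2) - [E(X)]^2 mentioned above, applied to \hat{f}(x).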

As the usual bias-variance tradeoff figure shows, the more complex the model, the better it fits the training data and the smaller the training bias. However, if you switch to a different training set, the learned model will change a lot; that is, the variance of the model is large. So when the model is too complex, it leads to overfitting.
When the model is simpler, even if we switch to a different training set, the newly learned model does not differ much from the previous one, so the variance of the model is small. But because the model is simple, the bias will be large.
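A quick way to see this tradeoff in code (again an illustrative sketch, not from the original answer) is to train single decision trees of increasing depth and compare training and test accuracy; deep trees fit the training set almost perfectly, while the test score stops improving or even drops:

```python
# Illustrative sketch: training vs. test accuracy of a single decision tree as
# the maximum depth grows. Deep trees drive the training error toward zero
# (low bias) but generalize worse once they start to overfit (high variance).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=20, flip_y=0.1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for depth in [2, 4, 6, 10, 15, None]:  # None = grow the tree with no depth limit
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X_train, y_train)
    print(depth, tree.score(X_train, y_train), tree.score(X_test, y_test))
```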

That is to say, when we train a model, both bias and variance have to be taken care of; neither can be neglected.
For the bagging algorithm, since we train many different classifiers in parallel, the purpose is to reduce the variance: once many mutually independent base classifiers are combined, the averaged prediction h naturally converges toward its expectation. So for each individual base classifier, the goal becomes reducing its bias, which is why we use deep, even unpruned, decision trees.
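The following sketch (illustrative only; the BaggingClassifier setup and parameter values are assumptions for the example) shows the point: a single unpruned tree has low bias but high variance, and bagging many such trees averages the variance away:

```python
# Illustrative sketch: a single unpruned (deep) decision tree vs. a bagged
# ensemble of such trees. Exact numbers vary, but the bagged ensemble
# typically scores noticeably better in cross-validation than the single tree.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=20, flip_y=0.1, random_state=0)

deep_tree = DecisionTreeClassifier(max_depth=None, random_state=0)  # unpruned base learner
bagged_trees = BaggingClassifier(deep_tree, n_estimators=200, random_state=0)

print("single deep tree:", cross_val_score(deep_tree, X, y, cv=5).mean())
print("bagged deep trees:", cross_val_score(bagged_trees, X, y, cv=5).mean())
```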

For boosting, at each step we fit the original data more closely on the basis of the previous round, so a low bias can be guaranteed. Therefore, for each base classifier the problem is how to choose a classifier with smaller variance, that is, a simpler classifier, which is why we choose decision trees with a very shallow depth.
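Tying this back to the question, a sketch of the boosting side (illustrative only, not the tuning code from the linked guide; the parameter values are just common starting points) keeps max_depth at 6 and relies on many boosting rounds rather than deeper trees:

```python
# Illustrative sketch: boosting with shallow trees via xgboost's scikit-learn
# API. Each shallow (max_depth=6) tree is a low-variance, high-bias learner;
# the sequential boosting rounds drive the bias down, so deep trees are not needed.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from xgboost import XGBClassifier

X, y = make_classification(n_samples=2000, n_features=20, flip_y=0.1, random_state=0)

model = XGBClassifier(
    max_depth=6,        # shallow trees, as in the question
    n_estimators=300,   # many boosting rounds instead of deeper trees
    learning_rate=0.1,
)

print("xgboost (max_depth=6):", cross_val_score(model, X, y, cv=5).mean())
```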
