Parameter Interpretation of XGBoost

Source: Internet
Author: User
Tags: xgboost

XGBoost Parameters

Before you run an XGBoost program, you must set three types of parameters: general parameters, booster parameters, and learning task parameters.
General parameters: determine which booster is used during boosting. The common boosters are tree models and linear models.
Booster parameters: their settings depend on which booster model you choose.
Learning task parameters: determine the learning scenario; for example, a regression task is controlled by different parameters than a ranking task.
Command-line arguments: apply only to the command-line (CLI) version of XGBoost.
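To make the three groups concrete, here is a minimal sketch using the xgboost Python API; the synthetic data and every parameter value shown are illustrative assumptions, not recommendations:

```python
import numpy as np
import xgboost as xgb

# Illustrative synthetic data: 100 rows, 5 features, binary labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = (X[:, 0] > 0).astype(int)
dtrain = xgb.DMatrix(X, label=y)

params = {
    # General parameter: which booster to use.
    "booster": "gbtree",
    # Booster parameters: control each individual tree.
    "eta": 0.1,
    "max_depth": 6,
    # Learning task parameters: define the objective and the metric.
    "objective": "binary:logistic",
    "eval_metric": "logloss",
}
model = xgb.train(params, dtrain, num_boost_round=10)
```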

Booster Parameters:
1. eta [default is 0.3] Similar to the learning rate in GBM. Reducing the weight of each step improves the robustness of the model. Typical values: 0.01-0.2.
2. min_child_weight [default is 1] Determines the minimum sum of sample weights in a leaf node. When its value is large, the model avoids learning overly specific local samples; however, if the value is too high, it leads to underfitting. This parameter should be tuned with CV (see the xgb.cv sketch after this list).
3. max_depth [default is 6] The maximum depth of a tree, also used to avoid overfitting. Typical values: 3-10.
4. max_leaf_nodes The maximum number of nodes or leaves on a tree. It can replace max_depth: since the trees generated are binary, a tree of depth n can produce at most 2^n leaves. If this parameter is defined, max_depth is ignored.
5. gamma [default is 0] A node is split only when the split reduces the value of the loss function; gamma specifies the minimum loss reduction required for a split. The larger the value, the more conservative the algorithm.
6. max_delta_step [default is 0] Limits the maximum step size of each tree's weight change. A value of 0 means no constraint; a positive value makes the algorithm more conservative. It usually does not need to be set.
7. subsample [default is 1] Controls the fraction of rows randomly sampled for each tree. Reducing this value makes the algorithm more conservative and avoids overfitting, but setting it too small may cause underfitting. Typical values: 0.5-1.
8. colsample_bytree [default is 1] Controls the fraction of randomly sampled columns for each tree (each column is a feature). Typical values: 0.5-1.
9. colsample_bylevel [default is 1] Controls the fraction of columns sampled at each split, for each level of the tree.
10. lambda [default is 1] The L2 regularization term on the weights.
11. alpha [default is 0] The L1 regularization term on the weights.
12. scale_pos_weight [default is 1] When the classes are very unbalanced, setting this parameter to a positive value can make the algorithm converge faster.
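As noted for min_child_weight above, parameters such as min_child_weight and max_depth are usually tuned with cross-validation. Below is a minimal grid-search sketch built on xgb.cv; the data, grid values, and round counts are illustrative assumptions:

```python
import numpy as np
import xgboost as xgb

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
dtrain = xgb.DMatrix(X, label=y)

# Illustrative grid over two of the booster parameters described above.
best = None
for max_depth in (3, 6, 10):
    for min_child_weight in (1, 5):
        params = {
            "objective": "binary:logistic",
            "eta": 0.1,
            "max_depth": max_depth,
            "min_child_weight": min_child_weight,
            "subsample": 0.8,
            "colsample_bytree": 0.8,
        }
        cv = xgb.cv(params, dtrain, num_boost_round=50, nfold=5,
                    metrics="logloss", seed=0)
        score = cv["test-logloss-mean"].iloc[-1]
        if best is None or score < best[0]:
            best = (score, max_depth, min_child_weight)

print("best logloss %.4f at max_depth=%d, min_child_weight=%d" % best)
```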

General parameters:
1. booster [default is gbtree]
Selects the model used for each iteration. There are two options: gbtree, a tree-based model, and gblinear, a linear model.
2. silent [default is 0]
When this parameter is set to 1, silent mode is turned on and no messages are printed. Generally keep the default of 0, which helps us understand the model better.
3. nthread [default is the maximum number of available threads]
This parameter controls multithreading and should be set to the number of system cores. If you want to use all the cores of the CPU, do not set this parameter; the algorithm will detect them automatically.
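A short sketch of the general parameters in use; the values are assumptions for illustration (and note that silent has been replaced by verbosity in recent xgboost releases):

```python
import numpy as np
import xgboost as xgb

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = rng.normal(size=50)
dtrain = xgb.DMatrix(X, label=y)

params = {
    "booster": "gbtree",  # "gblinear" would select the linear booster instead
    "silent": 0,          # print messages; newer xgboost releases use "verbosity"
    "nthread": 4,         # omit this key to let xgboost use all available cores
}
model = xgb.train(params, dtrain, num_boost_round=5)
```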

Learning Task Parameters:
1. objective [default is reg:linear]
This parameter defines the loss function to be minimized. The most commonly used values are: binary:logistic, logistic regression for binary classification, which returns the predicted probability rather than the class; and multi:softmax, a softmax multiclass classifier, which returns the predicted class. In the multi:softmax case, you must set one more parameter: num_class, the number of classes. (A sketch contrasting the two objectives appears after this list.)
2. eval_metric [default depends on the objective parameter]
The metric used for validation data. For regression problems the default is rmse, and for classification problems the default is error. Typical values: rmse (root mean square error), mae (mean absolute error), logloss (negative log-likelihood), error (binary classification error rate), merror (multiclass error rate), mlogloss (multiclass log loss), and auc (area under the ROC curve).
3. seed [default is 0]
The random number seed. Setting it makes results on random data reproducible; it is also useful for parameter tuning.
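A sketch contrasting the two objectives described above; the data and parameter values are illustrative assumptions:

```python
import numpy as np
import xgboost as xgb

rng = np.random.default_rng(0)
X = rng.normal(size=(150, 4))

# Binary objective: predictions are probabilities, not class labels.
y_bin = (X[:, 0] > 0).astype(int)
d_bin = xgb.DMatrix(X, label=y_bin)
bst = xgb.train({"objective": "binary:logistic",
                 "eval_metric": "logloss",
                 "seed": 0}, d_bin, num_boost_round=10)
print(bst.predict(d_bin)[:3])  # probabilities in [0, 1]

# Multiclass softmax: predictions are class indices; num_class is required.
y_multi = rng.integers(0, 3, size=150)
d_multi = xgb.DMatrix(X, label=y_multi)
bst = xgb.train({"objective": "multi:softmax",
                 "num_class": 3,
                 "eval_metric": "mlogloss",
                 "seed": 0}, d_multi, num_boost_round=10)
print(bst.predict(d_multi)[:3])  # class indices 0, 1, or 2
```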
