XGBoost parameters
Before running XGBoost, you must set three types of parameters: general parameters, booster parameters, and learning task parameters.
General parameters: determine which booster is used during boosting; the common boosters are tree models and linear models.
Booster parameters: their settings depend on which booster model is chosen.
Learning task parameters: determine the learning scenario; for example, a regression task and a ranking task are controlled by different parameters.
In addition, command-line parameters relate only to the command-line version of XGBoost.
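As a rough sketch of how the first three categories fit together in the Python package's native xgb.train API (the dataset and values below are illustrative assumptions, not recommendations):

```python
import xgboost as xgb
from sklearn.datasets import make_classification

# Illustrative synthetic data; replace with your own.
X, y = make_classification(n_samples=500, random_state=0)
dtrain = xgb.DMatrix(X, label=y)

params = {
    "booster": "gbtree",            # general parameter: which booster to use
    "eta": 0.1,                     # booster parameter: depends on the chosen booster
    "max_depth": 4,                 # booster parameter
    "objective": "binary:logistic", # learning task parameter: the learning scenario
}
model = xgb.train(params, dtrain, num_boost_round=50)
```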
Booster parameters:
1. eta [default is 0.3]: analogous to the learning rate in GBM. The model's robustness can be improved by reducing the weight of each step. Typical values: 0.01-0.2.
2. min_child_weight [default is 1]: determines the minimum sum of instance weights in a leaf node. A larger value prevents the model from learning locally special samples, but too high a value causes underfitting. This parameter needs to be tuned with CV (see the sketch after this list).
3. max_depth [default is 6]: the maximum depth of a tree, also used to avoid overfitting. Typical values: 3-10.
4. max_leaf_nodes: the maximum number of nodes or leaves in a tree. It can replace max_depth: since a binary tree of depth n produces at most 2^n leaves, max_depth is ignored if this parameter is defined.
5. gamma [default is 0]: a node is split only when the split produces a drop in the loss function; gamma specifies the minimum loss reduction required to split a node. The larger its value, the more conservative the algorithm.
6. max_delta_step [default is 0]: limits the maximum step size of each tree's weight change. A value of 0 means no constraint; a positive value makes the algorithm more conservative. This parameter usually does not need to be set.
7. subsample [default is 1]: controls the fraction of rows randomly sampled for each tree. Reducing this value makes the algorithm more conservative and avoids overfitting, but setting it too small may cause underfitting. Typical values: 0.5-1.
8. colsample_bytree [default is 1]: controls the fraction of columns (each column is a feature) randomly sampled for each tree. Typical values: 0.5-1.
9. colsample_bylevel [default is 1]: controls the fraction of columns sampled for the splits at each level of the tree.
10. lambda [default is 1]: the L2 regularization term on the weights.
11. alpha [default is 0]: the L1 regularization term on the weights.
12. scale_pos_weight [default is 1]: when the classes are very unbalanced, setting this parameter to a positive value helps the algorithm converge faster.
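A minimal sketch of putting the booster parameters above into practice and tuning with cross-validation via xgb.cv (the synthetic dataset and the specific values are illustrative assumptions):

```python
import xgboost as xgb
from sklearn.datasets import make_classification

# Illustrative synthetic data; replace with your own.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
dtrain = xgb.DMatrix(X, label=y)

params = {
    "eta": 0.1,
    "min_child_weight": 1,
    "max_depth": 5,
    "gamma": 0,
    "subsample": 0.8,
    "colsample_bytree": 0.8,
    "lambda": 1,
    "alpha": 0,
    "objective": "binary:logistic",
}

# Cross-validation is the usual way to settle min_child_weight and max_depth.
cv_results = xgb.cv(params, dtrain, num_boost_round=100, nfold=5,
                    metrics="error", early_stopping_rounds=10, seed=0)
print(cv_results.tail(1))  # error of the last (best) boosting round
```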
General parameters:
1. booster [default is gbtree]
Selects the type of model for each iteration; there are two options: gbtree, a tree-based model, and gblinear, a linear model (see the sketch after this list).
2. silent [default is 0]
When this parameter is set to 1, silent mode is enabled and no messages are printed. It is usually kept at the default of 0, since the messages help in understanding the model.
3. nthread [default is the maximum number of available threads]
Controls multithreading and should be set to the number of CPU cores on the system. If you want to use all of the CPU's cores, do not set this parameter; the algorithm will detect the count automatically.
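A minimal sketch of the general parameters in the Python package (assuming a recent XGBoost release, where silent has been replaced by verbosity and reg:linear by reg:squarederror):

```python
import xgboost as xgb
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=500, random_state=0)
dtrain = xgb.DMatrix(X, label=y)

params = {
    "booster": "gblinear",           # linear model; the default is gbtree
    "nthread": 4,                    # omit to let XGBoost use all cores
    "verbosity": 1,                  # newer replacement for silent
    "objective": "reg:squarederror", # newer name for reg:linear
}
model = xgb.train(params, dtrain, num_boost_round=20)
```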
Learning task parameters:
1. objective [default is reg:linear]
This parameter defines the loss function to be minimized. The most commonly used values are: binary:logistic, logistic regression for binary classification, which returns the predicted probability rather than the class; and multi:softmax, multiclass classification using softmax, which returns the predicted class. In that case you must set one additional parameter, num_class, the number of classes (see the sketch after this list).
2. eval_metric [default depends on the objective parameter]
The evaluation metric for validation data. For regression problems the default is rmse; for classification problems the default is error. Typical values: rmse (root mean square error), mae (mean absolute error), logloss (negative log-likelihood), error (binary classification error rate), merror (multiclass error rate), mlogloss (multiclass log loss), and auc (area under the ROC curve).
3. seed [default is 0]
The random number seed. Setting it makes results involving randomness reproducible, and it is also useful when tuning parameters.
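A minimal sketch tying the learning task parameters together on a multiclass problem (the dataset is illustrative; multi:softmax requires num_class, and seed makes the run reproducible):

```python
import xgboost as xgb
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
dtrain = xgb.DMatrix(X_tr, label=y_tr)
dtest = xgb.DMatrix(X_te, label=y_te)

params = {
    "objective": "multi:softmax",  # returns the predicted class directly
    "num_class": 3,                # required together with multi:softmax
    "eval_metric": "mlogloss",     # multiclass log loss
    "seed": 0,                     # reproducible results
}
# evals prints the evaluation metric on the held-out data each round
model = xgb.train(params, dtrain, num_boost_round=30,
                  evals=[(dtest, "test")])
print(model.predict(dtest)[:5])  # predicted class labels
```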