Tuning Machine Learning Algorithms


Machine learning algorithms are numerous, and many of them involve quite a few parameters. This article briefly introduces tuning experience and steps for BP, LR, SVM, RF, GBDT and other algorithms.

1. BP

Tuning notes
1. BP is sensitive to feature scaling, so scale the data first.
2. Experience shows that L-BFGS converges faster on small datasets, Adam works well on large datasets, and SGD can also do well if its learning rate is tuned carefully (a minimal sketch follows this list).
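A minimal sketch of both points with scikit-learn's MLPClassifier; the dataset, hidden layer size and other values are illustrative assumptions, not from the original text:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Scale the features first: BP/MLP is sensitive to feature scaling.
# 'lbfgs' tends to converge faster on small datasets; 'adam' scales better
# to large ones; 'sgd' needs a carefully tuned learning rate.
clf = make_pipeline(
    StandardScaler(),
    MLPClassifier(solver="lbfgs", hidden_layer_sizes=(50,), max_iter=1000,
                  random_state=0),
)
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))
```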

Parameter adjustment
1. First decide the number of hidden layers. In general, if the problem is linearly separable, no hidden layer is needed; for most problems one hidden layer works well, and multiple hidden layers are harder to train (they may require layer-wise pre-training with autoencoders or RBMs, and the number of parameters grows quickly).
2. Next determine the number of neurons in the hidden layer; in general a value between the number of input-layer neurons and the number of output-layer neurons works well. The exact number can be chosen by CV. It is also possible to start with more neurons and then prune: neurons that turn out to be useless after training are those whose weights W are very small, so inspecting the weight matrix W tells us which neurons can be removed.
3. Draw the learning curve.
4. If the learning curve is not stable, then reduce the learning rate; if the learning curve changes very slowly, then increase the learning rate.
5. If the learning curve shows overfitting, stop training early, increase the regularization coefficient, or reduce model complexity; if it shows underfitting, run more iterations, decrease the regularization coefficient, or increase model complexity. (A learning-curve sketch follows this list.)
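One simple way to draw and read such a curve (an illustrative sketch, not the author's code; the dataset, layer size and learning rates are assumptions) is to train with solver='sgd' and plot the loss_curve_ recorded by MLPClassifier for a few learning rates:

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_digits
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

X, y = load_digits(return_X_y=True)
X = StandardScaler().fit_transform(X)

# An unstable (oscillating) curve suggests lowering the learning rate;
# a curve that barely moves suggests raising it.
for lr in (0.001, 0.01, 0.1):
    mlp = MLPClassifier(solver="sgd", learning_rate_init=lr,
                        hidden_layer_sizes=(50,), max_iter=200, random_state=0)
    mlp.fit(X, y)
    plt.plot(mlp.loss_curve_, label=f"learning_rate_init={lr}")

plt.xlabel("iteration")
plt.ylabel("training loss")
plt.legend()
plt.show()
```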

2. LR

Parameters

1. C: the inverse of the regularization strength λ; the smaller C is, the simpler the model.
2. learning_rate

Tuning notes
1. LogisticRegression in sklearn does not use SGD; it uses dedicated solvers, so there is no learning_rate to set.
2. sklearn's SGDClassifier fits linear models (linear SVM, logistic regression) with SGD; the parameters to tune are learning_rate and alpha, and SGD is faster on large data.
3. With LogisticRegressionCV in sklearn, cross-validation is faster; it is well suited to tuning the regularization penalty weight (a brief sketch follows this list).
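A brief sketch contrasting the two (the dataset and parameter values are illustrative assumptions):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegressionCV, SGDClassifier
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X = StandardScaler().fit_transform(X)

# LogisticRegressionCV: solver-based LR with built-in, fast CV over the
# regularization strengths C (no learning_rate to tune).
lr_cv = LogisticRegressionCV(Cs=10, cv=5, max_iter=1000).fit(X, y)
print("best C:", lr_cv.C_)

# SGDClassifier: SGD-optimized linear models; loss="log_loss" gives logistic
# regression ("log" in older scikit-learn), loss="hinge" gives a linear SVM.
# Here the learning_rate schedule and alpha are the parameters that matter.
sgd = SGDClassifier(loss="log_loss", alpha=1e-4, learning_rate="optimal",
                    random_state=0).fit(X, y)
print("train accuracy:", sgd.score(X, y))
```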

Parameter adjustment
Grid search with cross-validation.

3. SVM

Parameters
The larger γ is, the smaller the region a single training sample (support vector, SV) influences (only points close to the SV are affected), so more SVs may be needed and model complexity increases.
The larger C is, the harder the model tries to avoid mistakes on the training data, so the more complex it is allowed to become.

1. γ: controls how far the influence of a single training sample (SV) reaches. Since the Gaussian (RBF) kernel is exp(−γ‖x−x_n‖²), the smaller γ is, the larger the influence of a single training sample (SV). γ can be seen as the inverse of the SV's radius of influence (a small numerical check follows this list).
2. C: trades off model complexity against accuracy on the training set. If C is large, the model may become complex in order to minimize classification errors and may overfit; if C is small, the model may be so simple that it does not even do well on the training set.
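A tiny numerical check of the kernel formula (the points and γ values are arbitrary assumptions): the larger γ is, the faster a support vector's influence falls off with distance.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

x = np.array([[0.0, 0.0]])
xn = np.array([[1.0, 1.0]])   # a training sample / "SV" at squared distance 2

for gamma in (0.1, 1.0, 10.0):
    manual = np.exp(-gamma * np.sum((x - xn) ** 2))  # exp(-gamma * ||x - xn||^2)
    print(gamma, manual, rbf_kernel(x, xn, gamma=gamma)[0, 0])
```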

Tuning notes
1. A log grid from 10^-3 to 10^3 gives a sufficient tuning range for both C and γ.
2. The model is strongly affected by γ. When γ is large, the influence radius of each SV is very small and barely reaches beyond the SV itself, so the model overfits easily; in that case even changing C does not help.
3. When γ is small, the model is too constrained to capture complex shapes: the influence radius of each SV grows to cover the whole training set, so the model stays simple, for the same reason that a very large K keeps KNN simple.
4. The optimal (γ, C) combinations generally lie along a diagonal: good results can be achieved by choosing a smoother model (low γ) and compensating with a larger number of SVs (high C).
5. Do not make γ too large, otherwise changing C is useless.

Parameter adjustment
Grid search with cross-validation.
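A minimal grid-search sketch over the log grid mentioned above (the dataset is an illustrative assumption):

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X = StandardScaler().fit_transform(X)

# Log-spaced grid from 1e-3 to 1e3 for both C and gamma.
param_grid = {"C": np.logspace(-3, 3, 7), "gamma": np.logspace(-3, 3, 7)}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```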

4. RF

Parameters
The RF parameters mainly involve two parts:

1. Tree parameters: max_depth, min_samples_split, min_samples_leaf, max_features
2. Bagging parameter: n_estimators

Tuning notes
1. The sub-models (trees) of RF all have low bias (each tree is grown to full depth), and the overall model aims to reduce variance, so the number of trees should be increased. Reducing the correlation between sub-models also helps reduce variance, which is why max_features is generally set to the square root of the number of features.
2. max_features defaults to sqrt and is generally set to a small value. Reducing max_features helps decorrelate the trees and reduces the variance of the final model, but increases the bias of each single tree.
3. The choice of max_features is also related to data quality: if the data quality is poor, a small max_features may cause the wrong feature to be chosen at a split, which can hurt the final model.

Parameter adjustment
1. max_features is the primary parameter to tune: fix the number of trees at about 300 and find the best max_features by CV (see the sketch after this list).
2. Adjust max_depth.
3. Adjust n_estimators; generally the larger the better, and stop increasing it once the accuracy on the validation set barely improves.
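A sketch of this stepwise procedure (the dataset and the grids are illustrative assumptions):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = load_breast_cancer(return_X_y=True)

# Step 1: fix ~300 trees and search max_features by CV.
step1 = GridSearchCV(RandomForestClassifier(n_estimators=300, random_state=0),
                     {"max_features": ["sqrt", "log2", 0.3, 0.5]}, cv=5).fit(X, y)
best_mf = step1.best_params_["max_features"]

# Step 2: tune max_depth with the chosen max_features.
step2 = GridSearchCV(RandomForestClassifier(n_estimators=300, max_features=best_mf,
                                            random_state=0),
                     {"max_depth": [None, 5, 10, 20]}, cv=5).fit(X, y)

# Step 3: grow n_estimators until the validation score stops improving noticeably.
print(step2.best_params_, step2.best_score_)
```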

5. GBDT

Parameters
GBDT mainly involves three groups of parameters:

1. Tree parameters: max_depth (max_leaf_nodes), min_samples_split (min_weight_fraction_leaf), min_samples_leaf, max_features
2. Boosting parameters: learning_rate, n_estimators, subsample
3. Other parameters: loss, init, random_state, verbose, warm_start, presort

The parameters that are usually adjusted are:

1. Tree parameters: max_depth, min_samples_split, min_samples_leaf, max_features
2. Boosting parameters: learning_rate, n_estimators

Tuning notes
1. Compared to RF, GBDT reduces not only variance (by combining multiple trees) but also bias (each new tree fits the errors of the previous ones). Its main purpose is to reduce bias through the combination of trees: each sub-model has large bias but very small variance.
2. For the tree parameters, RF mainly tunes max_features, which increases the randomness between sub-models and thereby reduces the variance of the overall model. For GBDT the primary goal is to reduce bias, so we pay more attention to parameters such as max_depth, reducing the bias of the whole model by adjusting the complexity of the sub-models.
3. If boosting keeps going, GBDT will certainly overfit eventually, but because the base classifiers are very weak, GBDT's resistance to overfitting is strong.
4. There is no single best learning rate: the lower the learning rate the better, as long as there are enough trees.
5. However, if the learning rate is very low and there are many trees, the time cost becomes very high.

Tuning strategy
Before tuning, initialize some values first (collected in the sketch after this list):

1. max_depth: generally choose a small value, 5-8, so that each base classifier is not too strong.
2. min_samples_split: choose according to the data size, about 1% of the number of samples.
3. min_samples_leaf: choose based on the data and intuition.
4. max_features: generally choose sqrt.
5. subsample: generally choose 0.8.
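Those starting values might look like the following (a sketch; the dataset size and min_samples_leaf are assumptions used only for illustration):

```python
from sklearn.ensemble import GradientBoostingClassifier

n_samples = 10_000  # assumed dataset size, only used to derive min_samples_split

gbdt = GradientBoostingClassifier(
    max_depth=6,                              # small-ish trees, 5-8
    min_samples_split=int(0.01 * n_samples),  # ~1% of the data
    min_samples_leaf=20,                      # by intuition / data dependent
    max_features="sqrt",
    subsample=0.8,
)
```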

Each tuning step is a grid search, and the criterion for judging a parameter setting is its CV score.

1. Tune the boosting parameters first, i.e. learning_rate and n_estimators. Generally learning_rate works well around 0.05-0.2 and n_estimators around 30-80. If the best n_estimators is too large, increase learning_rate; if it is too small, decrease learning_rate.
2. The tree parameters are then tuned around their initial values, in the order max_depth, min_samples_split, min_samples_leaf, max_features; depending on the available compute, tune them one at a time or search over combinations.
3. Once the best parameter combination is found, reduce learning_rate and increase n_estimators by the same factor, until compute becomes the limit or the improvement on the validation set is very small (a sketch of all three stages follows this list).
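A sketch of the three stages with GridSearchCV (the dataset and the grids are illustrative assumptions):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV, cross_val_score

X, y = load_breast_cancer(return_X_y=True)
base = dict(max_features="sqrt", subsample=0.8, random_state=0)

# Stage 1: boosting parameters.
s1 = GridSearchCV(GradientBoostingClassifier(**base),
                  {"learning_rate": [0.05, 0.1, 0.2],
                   "n_estimators": [30, 50, 80]}, cv=5).fit(X, y)

# Stage 2: tree parameters around the initial values (max_depth shown; repeat
# for min_samples_split, min_samples_leaf, max_features as compute allows).
s2 = GridSearchCV(GradientBoostingClassifier(**base, **s1.best_params_),
                  {"max_depth": [3, 5, 8]}, cv=5).fit(X, y)

# Stage 3: halve learning_rate and double n_estimators while the CV score
# still improves.
params = {**base, **s1.best_params_, **s2.best_params_}
params["learning_rate"] /= 2
params["n_estimators"] *= 2
print(cross_val_score(GradientBoostingClassifier(**params), X, y, cv=5).mean())
```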

Original link: http://blog.younggy.com/2017/02/24/machine learning algorithm tuning/
