How to Tune Parameters in Machine Learning

Source: Internet
Author: User

Tuning parameters is a job that really wears me down.

You tune for half a day, train for a whole day, and when the results are still no good it is truly demoralizing ...

This article sums up my own thinking about tuning and some commonly adjusted parameters. I hope it helps.

If any statement or understanding in this article is wrong, experts are welcome to point it out.

Before we actually tune anything, let's settle two questions:

1. What is the purpose of tuning?
2. What exactly is being tuned?

First question:
The ultimate goal of tuning is to make the trained model more accurate. In terms closer to the code, the goal is to make the loss function (for example, the loss in SSD) as small as possible on the validation set, because during training the quality of a model fitted on the training set can only be checked with a validation set.
Tuning can therefore be treated as a multivariate function optimization problem.

Second question:
Before answering the second question, we need to introduce a concept: the hyperparameter.
"Hyperparameter"
A hyperparameter is a parameter whose value is set by hand before the model starts learning, unlike parameters in the usual sense (such as the weights w and the bias b), whose values are obtained through training.
Hyperparameters define higher-level properties of the model (model complexity, learning capacity, and so on).
They cannot be learned directly from the data in the standard training process, so they have to be defined in advance.
Their values can be chosen by trying different settings, training a model for each, and keeping the setting that tests best.
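
The "set different values, train different models, keep the best" procedure can be sketched as a small search loop. This is only an illustration under assumptions that are not in the original article: the synthetic data, the linear model, the candidate learning rates, and the helper names `make_data`, `mse`, and `train` are all made up for the example.

```python
import random

def make_data(n, seed):
    """Noisy samples of y = 3x + 1 (a synthetic toy dataset)."""
    rng = random.Random(seed)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    return [(x, 3.0 * x + 1.0 + rng.gauss(0.0, 0.1)) for x in xs]

def mse(w, b, data):
    """Mean squared error of the line y = w*x + b on the given data."""
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)

def train(data, learning_rate, epochs=50):
    """Learn the ordinary parameters w and b by full-batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

train_set = make_data(200, seed=0)
val_set = make_data(100, seed=1)

# The hyperparameter is fixed before each run; w and b are learned during it.
candidates = [0.001, 0.01, 0.1, 0.5]
results = {}
for lr in candidates:
    w, b = train(train_set, lr)
    results[lr] = mse(w, b, val_set)  # judge each run on the validation set

best_lr = min(results, key=results.get)
print("validation loss per learning rate:", results)
print("best learning rate:", best_lr)
```

The same loop works for any hyperparameter: only the candidate list and the argument passed to the training function change.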

Examples include the number or depth of trees, the learning rate, the number of hidden layers in a deep neural network, the number of clusters in k-means clustering ...

So by now you probably have a rough idea: what we want to tune is mainly these hyperparameters.

"Parameters commonly tuned in deep learning"
1. Learning rate
Adjusting the learning rate is a very common operation. Typically, as the number of iterations grows and the loss stops decreasing, you pause training, reduce the learning rate to 1/10 of its previous value, and then continue training.
The reason is that in gradient descent the learning rate can be seen as the step length of the descent. If your step is very large, you can stride straight across the valley and land on the opposite slope, which makes it hard to reach a local optimum. Reducing the step size at that point improves your chances of reaching the valley floor.
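
Both points can be shown with a short sketch. The function f(x) = x**2, the class name `PlateauDecay`, the `patience` and `min_delta` settings, and the loss values are assumptions chosen for illustration, not something the original article specifies.

```python
def descend(learning_rate, steps=20, x=1.0):
    """Gradient descent on f(x) = x**2, whose gradient is 2*x."""
    for _ in range(steps):
        x -= learning_rate * 2 * x
    return x

# A step that is too large jumps across the valley and ends up farther away;
# a smaller step moves steadily toward the minimum at x = 0.
print("learning rate 1.1 ends at", descend(1.1))
print("learning rate 0.1 ends at", descend(0.1))

class PlateauDecay:
    """Multiply the learning rate by `factor` when the loss stops improving."""

    def __init__(self, learning_rate, factor=0.1, patience=3, min_delta=1e-4):
        self.learning_rate = learning_rate
        self.factor = factor
        self.patience = patience      # stalled epochs tolerated before decaying
        self.min_delta = min_delta    # smallest decrease that counts as progress
        self.best_loss = float("inf")
        self.stalled_epochs = 0

    def step(self, loss):
        """Record one epoch's loss and return the learning rate to use next."""
        if loss < self.best_loss - self.min_delta:
            self.best_loss = loss
            self.stalled_epochs = 0
        else:
            self.stalled_epochs += 1
            if self.stalled_epochs >= self.patience:
                self.learning_rate *= self.factor
                self.stalled_epochs = 0
        return self.learning_rate

schedule = PlateauDecay(learning_rate=0.1)
losses = [1.0, 0.6, 0.4, 0.4, 0.4, 0.4, 0.3]  # made-up loss curve
history = [schedule.step(loss) for loss in losses]
print(history)
```

Deep learning frameworks usually ship a scheduler of this kind, so in practice the hand-written class is rarely needed; it is shown here only to make the rule explicit.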

2. About overfitting
Methods such as dropout, batch normalization, and data augmentation can be used to improve the model's generalization ability. With dropout, for example, generalization can be tuned by changing the probability that each neuron is dropped.
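
As a minimal sketch of the dropout probability mentioned above, the function below implements the common "inverted dropout" variant on a plain list of activations. The function name, the list-based representation, and the choice of the inverted variant are assumptions for illustration; the original article does not say which form it uses.

```python
import random

def dropout(activations, drop_prob, training=True, rng=random):
    """Inverted dropout: zero each value with probability drop_prob in training."""
    if not 0.0 <= drop_prob < 1.0:
        raise ValueError("drop_prob must be in [0, 1)")
    if not training or drop_prob == 0.0:
        # At inference time every neuron is kept and nothing is rescaled.
        return list(activations)
    keep_prob = 1.0 - drop_prob
    # Survivors are scaled by 1/keep_prob so the expected activation is unchanged.
    return [a / keep_prob if rng.random() < keep_prob else 0.0
            for a in activations]

rng = random.Random(0)
layer_output = [1.0] * 10000
dropped = dropout(layer_output, drop_prob=0.5, rng=rng)
zero_fraction = dropped.count(0.0) / len(dropped)
print("fraction of neurons dropped:", zero_fraction)
```

Here `drop_prob` is the hyperparameter being tuned: raising it regularizes more strongly, lowering it less.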

3. Number of network layers
In general, the more layers a network has, the better the model's performance (including sensitivity, convergence, and so on), which is one of the reasons deep neural networks have become so popular lately. Correspondingly, the demand for computing power is higher.

Moreover, the more layers and the more neurons there are, the higher the probability of overfitting.

What can be done about this?

Use the methods listed in item 2 to prevent overfitting.


4. Batch_size
Batch_size can be increased moderately, but doing so may affect the computer's performance (on my machine, doubling batch_size from about 24 to 48 was already enough to make the machine freeze ...).

When batch_size is increased to a certain point, training time reaches its optimum;
when batch_size is increased to some point, the final convergence accuracy reaches its optimum.
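
To make the trade-off concrete, the sketch below counts how many mini-batches one pass over the data takes for the two batch sizes mentioned above. The dataset size of 4800 and the helper names are hypothetical, and the code measures neither memory use nor training time.

```python
import math

def updates_per_epoch(num_samples, batch_size):
    """Number of mini-batches needed to see every sample once."""
    return math.ceil(num_samples / batch_size)

def iterate_batches(samples, batch_size):
    """Yield consecutive mini-batches; the last one may be smaller."""
    for start in range(0, len(samples), batch_size):
        yield samples[start:start + batch_size]

num_samples = 4800  # hypothetical dataset size
for batch_size in (24, 48):
    print("batch_size", batch_size, "->",
          updates_per_epoch(num_samples, batch_size), "updates per epoch")

# Each batch holds at most batch_size samples at once.
print(list(iterate_batches(list(range(10)), 4)))
```

Doubling batch_size halves the number of updates per epoch, but each update then has to hold twice as many samples at once, which is consistent with the machine struggling at 48.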

Reference: Tuning parameters commonly used in deep learning
Reference: What are some tricks for tuning in deep learning?
