Deep Learning Notes: Improving Deep Neural Networks (Week 3: Hyperparameter Tuning, Regularization)


Hyperparameter tuning and processing

1 - Searching over hyperparameter values

In earlier machine-learning practice, one could lay hyperparameters out on a grid and traverse the grid values to find the best setting. In deep learning, however, we generally prefer to sample hyperparameter values at random.

The grid in the figure above fixes each hyperparameter to only 5 values, which is unwise when we do not yet know which hyperparameters matter most. If instead we sample at random as in the right-hand figure, then with 25 samples we get 25 distinct values of parameter 1 and 25 distinct values of parameter 2. For example, if one hyperparameter is the learning rate α and the other is ε, the left figure tries only 5 values of α, while the right figure tries 25 values of α and is therefore more likely to find the most suitable α.
With many more hyperparameters, the search space becomes high-dimensional, and the same random-sampling method applies, improving search efficiency.

Another technique is coarse-to-fine search: after the random sampling described above, we will find that some regions of the space give better results; we then zoom into such a region and sample more densely there.
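As a rough sketch of this coarse-to-fine idea (the `score` function and the search ranges below are illustrative placeholders, not part of the original notes; in practice `score` would be validation performance after training):

```python
import random

def score(alpha, eps):
    # Placeholder objective with a known optimum near alpha = 0.01;
    # in practice this would be validation accuracy after training.
    return -(alpha - 0.01) ** 2 - (eps - 1e-8) ** 2

random.seed(0)

# Coarse stage: 25 random samples over the full ranges of both hyperparameters.
coarse = [(random.uniform(0.0001, 1), random.uniform(1e-9, 1e-7)) for _ in range(25)]
best_alpha, best_eps = max(coarse, key=lambda p: score(*p))

# Fine stage: sample more densely in a small region around the best coarse point.
fine = [(best_alpha * random.uniform(0.5, 2), best_eps * random.uniform(0.5, 2))
        for _ in range(25)]
best_alpha, best_eps = max(fine + [(best_alpha, best_eps)], key=lambda p: score(*p))
```

The fine stage keeps the coarse winner in the candidate set, so refinement can only improve the result.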
2 - Selecting an appropriate range for hyperparameters

The random sampling just described is not uniform sampling over the raw range of valid values; rather, we first choose an appropriate scale and sample uniformly on that scale.
For the number of neurons in a layer we can search uniformly within a range such as 20 to 40, and for the number of layers we can search uniformly within a range such as 2 to 5. For some hyperparameters, however, uniform sampling is not appropriate.
Take the learning rate α: suppose its minimum value is 0.0001 and its maximum is 1, so the search range is (0.0001, 1). If we sample uniformly along this axis, roughly 90% of the samples fall in (0.1, 1), and only 10% of the search budget is spent on (0.0001, 0.1). It is more reasonable to search on a logarithmic scale: mark the points 0.0001, 0.001, 0.01, 0.1, and 1 on the axis, and sample uniformly on this logarithmic axis.

Python implementation:

import numpy as np

r = -4 * np.random.rand()  # r is uniform over [-4, 0]
alpha = np.power(10, r)    # alpha = 10^r, so alpha ranges over [10^-4, 10^0]

In general, to sample between 10^a and 10^b: for the example above, a = log10(0.0001) = -4 and b = log10(1) = 0. We then sample r uniformly at random from [a, b] and set α = 10^r. In other words, sampling a value in the interval [10^a, 10^b] reduces to sampling r uniformly between a and b on the logarithmic axis.
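The formula above can be wrapped in a small helper; this is a sketch, and the function name and interface are my own rather than from the original notes:

```python
import numpy as np

def sample_log_uniform(low, high, size=1, seed=None):
    """Sample uniformly on a log10 scale between `low` and `high`."""
    rng = np.random.default_rng(seed)
    a, b = np.log10(low), np.log10(high)  # e.g. log10(0.0001) = -4, log10(1) = 0
    r = rng.uniform(a, b, size=size)      # r uniform in [a, b]
    return 10.0 ** r                      # alpha = 10^r lies in [10^a, 10^b]

alphas = sample_log_uniform(0.0001, 1, size=10_000, seed=0)
```

With this sampler about half the draws land below 0.01 (the geometric midpoint of the range), instead of 99% landing above it as with linear uniform sampling.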

For the hyperparameter β used when computing exponentially weighted averages, suppose β lies in [0.9, 0.999]. We can transform via 1−β, whose range is then [0.001, 0.1]; by the method above this reduces to sampling r uniformly in [-3, -1]. The β value is then recovered from 1−β = 10^r, i.e. β = 1 − 10^r.
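The same recipe applied to β, as a minimal sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
r = rng.uniform(-3, -1, size=10_000)  # r uniform in [-3, -1]
one_minus_beta = 10.0 ** r            # 1 - beta log-uniform over [0.001, 0.1]
beta = 1 - one_minus_beta             # beta ranges over [0.9, 0.999]
```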

As β approaches 1, the result becomes very sensitive to even tiny changes in β. For example, changing β from 0.9000 to 0.9005 makes essentially no difference, but changing β from 0.9990 to 0.9995 has a huge impact on the algorithm. In terms of the exponentially weighted average, the former averages over roughly 10 values either way, while the latter goes from averaging over about 1000 values (for 0.999) to about 2000 values (for 0.9995); the underlying formula is 1/(1−β). Therefore, when β is close to 1 the result changes very sensitively, and β needs to be sampled more densely there; sampling 1−β on a logarithmic scale achieves exactly this.
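The 1/(1−β) rule makes the asymmetry concrete; a 0.0005 shift barely moves the effective window near 0.9 but doubles it near 0.999:

```python
def effective_window(beta):
    # An exponentially weighted average with parameter beta behaves roughly
    # like an average over the last 1 / (1 - beta) values.
    return 1 / (1 - beta)

print(effective_window(0.9000))  # ~10 values
print(effective_window(0.9005))  # ~10.05 values: almost no change
print(effective_window(0.9990))  # ~1000 values
print(effective_window(0.9995))  # ~2000 values: the window doubles
```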
