Today, while training a network, I wanted to try a different learning rate policy, so I went back and studied the various policies that Caffe offers. Here I would like to share some lessons from using them.
Let's first take a look at the parameters related to the learning rate policy; the following excerpt is from caffe.proto:
// The learning rate decay policy. The currently implemented learning rate
// policies are as follows:
//    - fixed: always return base_lr.
//    - step: return base_lr * gamma ^ (floor(iter / step))
//    - exp: return base_lr * gamma ^ iter
//    - inv: return base_lr * (1 + gamma * iter) ^ (- power)
//    - multistep: similar to step but it allows non uniform steps defined by
//      stepvalue
//    - poly: the effective learning rate follows a polynomial decay, to be
//      zero by the max_iter. return base_lr (1 - iter/max_iter) ^ (power)
//    - sigmoid: the effective learning rate follows a sigmoid decay
//      return base_lr ( 1/(1 + exp(-gamma * (iter - stepsize))))
//
// where base_lr, max_iter, gamma, step, stepvalue and power are defined
// in the solver parameter protocol buffer, and iter is the current iteration.
optional float base_lr = 5;   // The base learning rate
optional string lr_policy = 8;
optional float gamma = 9;     // The parameter to compute the learning rate.
optional float power = 10;    // The parameter to compute the learning rate.
// the stepsize for learning rate policy "step"
optional int32 stepsize = 13;
// the stepvalue for learning rate policy "multistep"
repeated int32 stepvalue = 34;
In fact, the comments above already explain things very clearly, in particular exactly how the learning rate is computed under each policy. I just want to add some of my own experience.
From what I have seen, the most commonly used policies are fixed, step, inv, and multistep. They are described below.
1. fixed: fixed learning rate
As the name implies, the learning rate stays at one fixed value. If you are a novice, or you are about to train a brand-new network, I recommend this policy. At that point you know nothing about the distribution of the data or of the network parameters, and the fixed policy is the easiest to adjust: you can change the learning rate at any time according to how training is going, including the values of loss and accuracy. For example, I like to start with a fairly large learning rate (between 0.1 and 0.01), but not so large that the loss explodes. Then, watching loss and accuracy, I reduce the learning rate whenever the two fall slowly, stop falling, or start to oscillate. In other words, it is a strategy of gradual manual reduction.
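To make this concrete, here is a minimal solver.prototxt sketch for the fixed policy. The net path, max_iter, snapshot settings and the other non-learning-rate fields are placeholders of my own, not values from any particular experiment.

net: "train_val.prototxt"          # placeholder network definition
base_lr: 0.01                      # starting rate, tuned by watching loss and accuracy
lr_policy: "fixed"                 # the rate stays at base_lr the whole time
max_iter: 100000
snapshot: 10000                    # snapshot often, so you can stop, lower base_lr, and resume
snapshot_prefix: "snapshots/mynet"
solver_mode: GPU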
2. step: uniform step decay
This policy is used together with the parameter stepsize. Each time the iteration count crosses a multiple of stepsize, the learning rate is recomputed as lr = base_lr * gamma ^ floor(iter / stepsize). Here is a mistake I made recently with this policy. I resumed training my network from iteration 250,000 with base_lr set to 0.001, stepsize set to 100000 and gamma set to 0.1, intending the learning rate to drop to 0.0001 at 300,000 iterations. You can probably guess the result: the learning rate became 1e-6, because the formula uses the absolute iteration count. So, a special reminder: if you switch to the step policy in the middle of training, work out carefully what value of stepsize you actually need.
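Spelling the arithmetic out as a solver.prototxt sketch (the learning rate fields are the values from my run above; max_iter is just a placeholder):

base_lr: 0.001
lr_policy: "step"
gamma: 0.1
stepsize: 100000
# lr = base_lr * gamma ^ floor(iter / stepsize), with iter the absolute iteration count
# at iter = 250000: floor(250000 / 100000) = 2  ->  lr = 0.001 * 0.1^2 = 1e-5
# at iter = 300000: floor(300000 / 100000) = 3  ->  lr = 0.001 * 0.1^3 = 1e-6 (not the intended 1e-4)
max_iter: 350000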
3. multistep: multi-step, non-uniform decay
I also came across this policy recently; it is very similar to step. It is used together with the parameter stepvalue, which can be listed several times in the solver file, for example stepvalue: 10000, stepvalue: 20000, and so on. Each time the iteration count reaches the next stepvalue we specified, the learning rate is recomputed by the formula (it is multiplied by another factor of gamma). I have not tried this policy much, but I have found one place where it is very useful: when we start training a network we usually set the learning rate fairly high so that loss and accuracy fall quickly; typically they fall fastest during roughly the first 200,000 iterations, and after that we may need a smaller learning rate. The step policy spaces its drops too evenly, while the decrease of loss and accuracy is an uneven process over the whole of training, so step is sometimes not a good fit. Adjusting a fixed rate by hand is also tedious, and that is where multistep comes in handy.
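A sketch of a multistep solver. The stepvalue positions are made-up numbers that follow the intuition above: keep the high rate through the fast early phase, then drop it wherever training actually slows down.

base_lr: 0.01
lr_policy: "multistep"
gamma: 0.1
# the rate is multiplied by gamma each time iter reaches the next stepvalue
stepvalue: 200000      # hold the high rate through the fast early phase
stepvalue: 300000
stepvalue: 350000      # later drops can be spaced however training demands
max_iter: 400000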
4. inv (I admit I do not know which word this abbreviates)
From the formula lr = base_lr * (1 + gamma * iter) ^ (-power) we can see the advantage of this policy: the learning rate decreases at every single iteration, but each decrease is tiny, which saves us the trouble of adjusting it by hand. This policy is also widely used.
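A sketch of an inv solver; the gamma and power values here are just illustrative ones of the kind often seen in Caffe example solvers, not a recommendation for any particular network.

base_lr: 0.01
lr_policy: "inv"
gamma: 0.0001
power: 0.75
# lr = base_lr * (1 + gamma * iter) ^ (-power): a small, smooth decrease at every iteration
max_iter: 100000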
The four policies above are the ones I see used most often. I originally wanted to post some typical parameter settings as well, but decided to leave that to you. One suggestion I find helpful: go and look at how the experts chose these parameters in the solver.prototxt files that ship with Caffe.