The Solver is the core of Caffe: it coordinates the operation of the whole model. One of the arguments required to run Caffe is the Solver configuration file. A typical training command is:
# caffe train --solver=*_solver.prototxt
In deep learning the loss function is usually non-convex and has no analytic solution, so it must be minimized numerically. The main job of the Solver is to alternately invoke the forward pass and the backward pass and use the results to update the parameters so as to reduce the loss; in other words, the Solver is an iterative optimization algorithm.
As of the current version, Caffe provides six optimization algorithms for finding the optimal parameters. The algorithm is selected in the Solver configuration file by setting the type field (a short example follows the list below):
- Stochastic Gradient Descent (type: "SGD")
- AdaDelta (type: "AdaDelta")
- Adaptive Gradient (type: "AdaGrad")
- Adam (type: "Adam")
- Nesterov's Accelerated Gradient (type: "Nesterov")
- RMSprop (type: "RMSProp")
For an introduction to each of these methods, see the next article in this series; this article focuses on how to write the Solver configuration file.
The Solver workflow:
1. Design the objective to be optimized, along with a training network for learning and a test network for evaluation (both defined in separate prototxt files referenced from the Solver configuration).
2. Optimize by iteratively running forward and backward passes and updating the parameters.
3. Periodically evaluate the test network (you can set how many training iterations pass between tests).
4. Display the state of the model and the Solver throughout the optimization process.
In each iteration, the Solver performs the following steps:
1. Call the forward pass to compute the output and the corresponding loss.
2. Call the backward pass to compute the gradient of each layer.
3. Update the parameters from the gradients, according to the selected Solver method.
4. Record and save the learning rate, snapshots, and the corresponding solver state for that iteration.
Next, let's take a look at the example:
1Net"Examples/mnist/lenet_train_test.prototxt" 2test_iter:1003test_interval:5004base_lr:0.015momentum:0.96 TYPE:SGD7weight_decay:0.00058Lr_policy:"INV" 9gamma:0.0001Tenpower:0.75 Onedisplay:100 Amax_iter:20000 -snapshot:5000 -Snapshot_prefix:"examples/mnist/lenet" theSolver_mode:cpu
Next, we explain each line in detail:
1 Net: "examples/mnist/lenet_train_test.prototxt"
Sets the deep network model. Each model is a net, configured in its own prototxt file and composed of a number of layers. For the detailed configuration of each layer, refer to the previous post in this series.
Note: the file path starts at the Caffe root directory; the same applies to all other paths in the configuration.
The training network and the test network can also be set separately, using train_net and test_net. For example:
1 " Examples/hdf5_classification/logreg_auto_train.prototxt " 2 " Examples/hdf5_classification/logreg_auto_test.prototxt "
Then the second line:
test_iter: 100
This must be understood together with the batch_size of the test layer. The MNIST test set contains 10,000 samples; pushing all of them through the network at once would be inefficient, so the test data are processed in batches, each containing batch_size samples. Assuming batch_size is set to 100, it takes 100 iterations to cover all 10,000 samples, so test_iter is set to 100. One complete pass over all the data is called an epoch.
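In other words, test_iter is chosen so that one full test pass covers the whole test set:
test_iter × batch_size = 100 × 100 = 10,000 = the size of the MNIST test set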
test_interval: 500
The test interval: the network is tested once after every 500 training iterations.
1 base_lr:0.01 2"inv" 3 gamma:0.0001 4 power:0.75
These four lines belong together: they configure the learning rate. Any solver based on gradient descent needs a learning rate, also called the step size. base_lr sets the base learning rate, which can then be adjusted during the iterations; how it is adjusted is determined by the policy set with lr_policy.
lr_policy can take the following values, with the corresponding learning rate computed as follows:
- fixed: keep base_lr unchanged.
- step: requires a stepsize; returns base_lr * gamma ^ (floor(iter / stepsize)), where iter is the current iteration number.
- exp: returns base_lr * gamma ^ iter, where iter is the current iteration number.
- inv: requires a power; returns base_lr * (1 + gamma * iter) ^ (-power). (A worked example follows this list.)
- multistep: requires one or more stepvalue settings. Similar to step, but where step changes the rate at uniform intervals, multistep changes it at the specified stepvalue iterations.
- poly: polynomial decay; returns base_lr * (1 - iter / max_iter) ^ power.
- sigmoid: sigmoid decay; returns base_lr * (1 / (1 + exp(-gamma * (iter - stepsize)))).
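As referenced in the inv entry above, here is a quick worked example using the values from the LeNet solver (base_lr = 0.01, gamma = 0.0001, power = 0.75). At iteration 10000 the learning rate becomes:
lr = 0.01 * (1 + 0.0001 * 10000) ^ (-0.75) = 0.01 * 2 ^ (-0.75) ≈ 0.0059
so the rate decays smoothly from the base value of 0.01 as training progresses.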
A multistep example:
base_lr: 0.01
momentum: 0.9
weight_decay: 0.0005
# The learning rate policy
lr_policy: "multistep"
gamma: 0.9
stepvalue: 5000
stepvalue: 7000
stepvalue: 8000
stepvalue: 9000
The next parameter:
momentum: 0.9
momentum is the weight given to the previous gradient update; it will be discussed in detail in the next post.
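For reference, with the SGD solver the momentum term enters the update roughly as follows (standard momentum form; the details are left to the next post):
V(t+1) = momentum * V(t) - lr * gradient
W(t+1) = W(t) + V(t+1)
so each update keeps a fraction (here 0.9) of the previous update's direction.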
type: "SGD"
Selects the optimization algorithm. This line can be omitted, because the default value is SGD. There are six methods to choose from, as described at the beginning of this article.
weight_decay: 0.0005
The weight-decay term: a regularization parameter that helps prevent over-fitting.
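With the default L2 regularization, weight_decay effectively adds a penalty on large weights to the objective, roughly:
E(W) = loss(W) + (weight_decay / 2) * sum(w^2)
so with weight_decay = 0.0005 the weights are gently pulled toward zero during training.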
display: 100
Progress is printed to the screen every 100 training iterations; if set to 0, nothing is displayed.
max_iter: 20000
The maximum number of iterations. If this is set too low, training stops before convergence and accuracy is poor; if set too high, training can oscillate and waste time.
1 snapshot:5000 2"examples/mnist/lenet"
Snapshots save the trained model and the Solver state. snapshot sets how many training iterations pass between saves; the default is 0, meaning no snapshots are saved. snapshot_prefix sets the save path. You can also set snapshot_diff to save the gradient values (default false, not saved), and snapshot_format to choose the saved format; the two options are HDF5 and BINARYPROTO, with BINARYPROTO as the default.
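Put together, a sketch of the snapshot-related fields might look like this (the values are only illustrative):
snapshot: 5000                          # save every 5000 iterations (0 = never save)
snapshot_prefix: "examples/mnist/lenet" # path/prefix for the saved files
snapshot_diff: false                    # also save gradients if true (default false)
snapshot_format: BINARYPROTO            # or HDF5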
solver_mode: CPU
Sets the run mode. The default is GPU; if you do not have a GPU, you need to change this to CPU, otherwise an error occurs.
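For GPU training you would instead write something like the sketch below; device_id (not mentioned above) is the solver field that selects which GPU to use, and it defaults to 0:
solver_mode: GPU
device_id: 0   # index of the GPU to run on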
Note: all of the above parameters are optional and have default values. Depending on the Solver method (type), there are additional parameters that are not listed here.
"Turn" Caffe preliminary Examination (ix) Solver and its setting