The solver is the core of Caffe: it coordinates the operation of the entire model. One of the required arguments when running Caffe is the solver configuration file. A typical training command is:
# caffe train --solver=*_solver.prototxt
In deep learning the loss function is usually non-convex and has no analytic solution, so it has to be minimized with an optimization method. The main job of the solver is to update the parameters by alternately calling the forward pass and the backward pass so as to minimize the loss; it is essentially an iterative optimization algorithm.
As of the current version, Caffe provides six optimization algorithms for solving for the optimal parameters; the algorithm is selected in the solver configuration file by setting type: Stochastic Gradient Descent (type: "SGD"), AdaDelta (type: "AdaDelta"), Adaptive Gradient (type: "AdaGrad"), Adam (type: "Adam"), Nesterov's Accelerated Gradient (type: "Nesterov"), and RMSprop (type: "RMSProp").
For an introduction to each of these methods, see the next article in this series; this article focuses on writing the solver configuration file.
The solver's overall process:
1. Design the optimization objective, along with the training network used for learning and the test network(s) used for evaluation (by referencing other prototxt configuration files).
2. Iteratively optimize by running forward and backward passes and updating the parameters.
3. Periodically evaluate the test network (you can configure how many training iterations pass between tests).
4. Display the state of the model and the solver throughout the optimization.
In each iteration, the solver performs the following steps:
1. Call the forward pass to compute the output and the corresponding loss.
2. Call the backward pass to compute the gradient of each layer.
3. Update the parameters from the gradients according to the chosen solver method.
4. Record the learning rate and save snapshots and the solver state as configured.
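A rough Python-style sketch of this loop (only an illustration of the four steps above, not Caffe's real internal API; the Net stub and its methods are made up):

class Net:
    def forward(self):  return 0.0       # would compute outputs and return the loss
    def backward(self): return {}        # would return the per-layer gradients
    def apply_update(self, grads): pass  # would apply the chosen solver rule (SGD, Adam, ...)

def solve(net, max_iter=20000, snapshot_interval=5000):
    for it in range(max_iter):
        loss = net.forward()             # 1. forward pass
        grads = net.backward()           # 2. backward pass
        net.apply_update(grads)          # 3. parameter update
        if snapshot_interval and it % snapshot_interval == 0:
            pass                         # 4. record learning rate / save a snapshot here

solve(Net())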
Next, let's take a look at an example:
net: "examples/mnist/lenet_train_test.prototxt"
test_iter: 100
test_interval: 500
base_lr: 0.01
momentum: 0.9
type: "SGD"
weight_decay: 0.0005
lr_policy: "inv"
gamma: 0.0001
power: 0.75
display: 100
max_iter: 20000
snapshot: 5000
snapshot_prefix: "examples/mnist/lenet"
solver_mode: CPU
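As a side note, the same solver file can also be driven from Python through pycaffe. A minimal sketch, assuming pycaffe is installed, the script is run from the Caffe root directory (so the relative paths resolve), and the solver above is saved as examples/mnist/lenet_solver.prototxt:

import caffe

caffe.set_mode_cpu()   # matches solver_mode: CPU
solver = caffe.get_solver('examples/mnist/lenet_solver.prototxt')
solver.solve()         # run the full optimization (max_iter iterations)
# or step through training manually:
# solver.step(100)     # run 100 forward/backward/update iterations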
Next, we interpret each line in detail:
net: "examples/mnist/lenet_train_test.prototxt"
Sets the deep network model. Each model is a net, which is configured in its own configuration file, and each net is made up of many layers. The specific configuration of each layer can be found in articles (2)-(5) of this series. Note that the file path starts from the root directory of Caffe; all other paths in the configuration follow the same rule.
You can also use train_net and test_net to set the training model and the test model separately. For example:
train_net: "examples/hdf5_classification/logreg_auto_train.prototxt"
test_net: "examples/hdf5_classification/logreg_auto_test.prototxt"
Then the second line:
test_iter: 100
This needs to be understood together with batch_size in the test data layer. The MNIST test set contains 10,000 samples in total; running them all in one pass would be inefficient, so the test data is divided into batches, and the number of samples in each batch is batch_size. Suppose batch_size is set to 100; then 100 iterations are needed to cover all 10,000 samples, so test_iter is set to 100. Processing all of the data once is called an epoch.
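The arithmetic is simple; a small sketch (the numbers 10,000 and 100 come from the MNIST example above):

num_test_samples = 10000                          # size of the MNIST test set
test_batch_size = 100                             # batch_size in the test-phase data layer
test_iter = num_test_samples // test_batch_size   # 100 iterations = one full pass over the test data
print(test_iter)                                  # 100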
test_interval: 500
The test interval: the network is tested once every 500 training iterations.
base_lr: 0.01
lr_policy: "inv"
gamma: 0.0001
power: 0.75
These four lines should be understood together; they configure the learning rate. Whenever gradient descent is used for optimization there is a learning rate, also called the step size. base_lr sets the base learning rate, and during the iterations the base learning rate can be adjusted. How it is adjusted is determined by the adjustment policy, which is set with lr_policy.
lr_policy can be set to the following values, with the corresponding learning rate computed as:
- fixed: keep base_lr unchanged.
- step: if set to step, you also need to set a stepsize; returns base_lr * gamma ^ (floor(iter / stepsize)), where iter is the current iteration number.
- exp: returns base_lr * gamma ^ iter, where iter is the current iteration number.
- inv: if set to inv, you also need to set a power; returns base_lr * (1 + gamma * iter) ^ (-power).
- multistep: if set to multistep, you also need to set stepvalue. This policy is similar to step, but while step changes the rate at uniform, equal intervals, multistep changes it at the iterations given by the stepvalue entries.
- poly: polynomial decay of the learning rate; returns base_lr * (1 - iter / max_iter) ^ power.
- sigmoid: sigmoid decay of the learning rate; returns base_lr * (1 / (1 + exp(-gamma * (iter - stepsize)))).
A multistep example:
base_lr: 0.01
momentum: 0.9
weight_decay: 0.0005
# The learning rate policy
lr_policy: "multistep"
gamma: 0.9
stepvalue: 5000
stepvalue: 7000
stepvalue: 8000
stepvalue: 9000
stepvalue: 9500
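To make the formulas above concrete, here is a small Python sketch (not Caffe code) that computes the learning rate under a few of these policies as a function of the iteration number, using the parameter values from the examples above where available (stepsize is not set in those examples, so 5000 is just an assumed value):

base_lr = 0.01

def lr_inv(it, gamma=0.0001, power=0.75):
    # "inv" policy from the LeNet example: base_lr * (1 + gamma * iter) ^ (-power)
    return base_lr * (1 + gamma * it) ** (-power)

def lr_step(it, gamma=0.9, stepsize=5000):   # stepsize assumed for illustration
    # "step" policy: base_lr * gamma ^ floor(iter / stepsize)
    return base_lr * gamma ** (it // stepsize)

def lr_multistep(it, gamma=0.9, stepvalues=(5000, 7000, 8000, 9000, 9500)):
    # "multistep" policy: like "step", but the drops happen at the stepvalue iterations
    return base_lr * gamma ** sum(1 for sv in stepvalues if it >= sv)

for it in (0, 5000, 10000, 20000):
    print(it, lr_inv(it), lr_step(it), lr_multistep(it))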
The next parameter:
momentum: 0.9
This is the weight given to the previous gradient update; momentum is discussed in more detail in the next article.
type: "SGD"
Optimization algorithm selection. This line can be omitted because the default value is SGD. There are six different ways to choose from, as described at the beginning of this article.
weight_decay: 0.0005
The weight decay term, a regularization term that helps prevent overfitting.
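To see where momentum and weight_decay enter the parameter update, here is a minimal numpy sketch of one plain SGD-with-momentum step, assuming the standard formulation (L2 weight decay added to the gradient; all names here are made up for illustration):

import numpy as np

def sgd_update(w, grad, v, lr=0.01, momentum=0.9, weight_decay=0.0005):
    grad = grad + weight_decay * w   # weight decay: add the regularization gradient
    v = momentum * v - lr * grad     # momentum: blend in the previous update direction
    return w + v, v                  # apply the update

w = 0.01 * np.random.randn(10)       # some parameters
v = np.zeros_like(w)                 # running update ("velocity"), initially zero
g = np.random.randn(10)              # gradient from a (pretend) backward pass
w, v = sgd_update(w, g, v)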
display: 100
Training information is printed to the screen every 100 iterations. If set to 0, nothing is displayed.
max_iter: 20000
The maximum number of iterations. If this is set too small, training stops before convergence and accuracy is low; if it is set too large, training oscillates and wastes time.
snapshot: 5000
snapshot_prefix: "examples/mnist/lenet"
Snapshots save the trained model and the solver state during training. snapshot sets how many training iterations pass between saves; the default is 0, meaning no snapshots are saved. snapshot_prefix sets the save path and filename prefix.
You can also set snapshot_diff to save the gradient values as well; it defaults to false (not saved).
You can also set snapshot_format to choose the saved file format. There are two options, HDF5 and BINARYPROTO; the default is BINARYPROTO.
solver_mode: CPU
Sets the run mode. The default is GPU; if you do not have a GPU, you need to change this to CPU, otherwise an error will occur.
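As an aside, when driving Caffe from Python the same choice is made through the caffe module (a minimal sketch; the GPU branch assumes a CUDA-enabled build):

import caffe

use_gpu = False            # mirrors solver_mode: CPU in the example above
if use_gpu:
    caffe.set_device(0)    # select GPU 0
    caffe.set_mode_gpu()
else:
    caffe.set_mode_cpu()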
Note: All of the above parameters are optional and have default values. Depending on the Solver method (type), there are some other parameters that are not listed here.
Reposted from: http://www.cnblogs.com/denny402/p/5083300.html