Caffe's Solver file configuration
This post explains the meaning of each parameter in the solver.prototxt file.
In deep learning tasks an analytic solution can rarely be found, so training is turned into a mathematical optimization problem. The main job of the Solver is to alternately call the forward and backward passes (forward & backward) to update the connection weights of the neural network so as to minimize the loss; in effect, it holds the parameters of the iterative optimization algorithm.
The Caffe Solver class provides six optimization algorithms, selected with the type keyword in the configuration file:
Stochastic Gradient Descent (type: "SGD")
AdaDelta (type: "AdaDelta")
Adaptive Gradient (type: "AdaGrad")
Adam (type: "Adam")
Nesterov's Accelerated Gradient (type: "Nesterov")
RMSprop (type: "RMSProp")
Simply put, the Solver is a configuration file that tells Caffe how you want the network to be trained.
Solver.prototxt Process
1. First design the objects to be optimized, as well as the prototxt files of the training network and the test network used for learning (usually train.prototxt and test.prototxt).
2. Iteratively optimize with forward and backward passes and update the parameters.
3. Periodically evaluate the test network.
4. Display the model and solver state throughout the optimization.
Solver Parameters
base_lr
This parameter is the initial learning rate of the network (base learning rate), usually a floating-point number. From machine-learning experience, a learning rate that is too large prevents convergence, while one that is too small makes convergence very slow, so setting this parameter well is quite important.
lr_policy
This parameter specifies the rule the learning rate follows during training. It is a string, with the following options:
"step" - requires a stepsize parameter; returns base_lr * gamma ^ (floor(iter / stepsize)), where iter is the current iteration
"multistep" - similar to "step" but requires stepvalue parameters; "step" changes at uniform intervals, while "multistep" changes at the iterations given by the stepvalue values
"fixed" - keeps base_lr unchanged
"exp" - returns base_lr * gamma ^ iter, where iter is the current iteration
"poly" - polynomial decay of the learning rate; returns base_lr * (1 - iter / max_iter) ^ power
"sigmoid" - sigmoid decay of the learning rate; returns base_lr * (1 / (1 + exp(-gamma * (iter - stepsize))))
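To make these formulas concrete, here is a small Python sketch (not Caffe's actual implementation; the function and argument names are purely illustrative) that evaluates each policy at a given iteration:

import math

def lr_at_iter(policy, it, base_lr, gamma=None, power=None,
               stepsize=None, stepvalues=(), max_iter=None):
    # Learning rate at iteration `it` for the policies listed above (sketch).
    if policy == "fixed":
        return base_lr
    if policy == "step":
        return base_lr * gamma ** (it // stepsize)
    if policy == "multistep":
        passed = sum(1 for s in stepvalues if it >= s)  # milestones already passed
        return base_lr * gamma ** passed
    if policy == "exp":
        return base_lr * gamma ** it
    if policy == "poly":
        return base_lr * (1 - it / max_iter) ** power
    if policy == "sigmoid":
        return base_lr * (1.0 / (1.0 + math.exp(-gamma * (it - stepsize))))
    raise ValueError("unknown lr_policy: %s" % policy)

# "step" with base_lr=0.01, gamma=0.1, stepsize=10000 gives 0.01 for
# iterations 0-9999, 0.001 for 10000-19999, 0.0001 afterwards.
print(lr_at_iter("step", 15000, base_lr=0.01, gamma=0.1, stepsize=10000))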
gamma
This parameter is related to the learning rate; it needs to be set whenever the chosen lr_policy uses it, and is generally a real number.
stepsize
This parameter indicates how often (after how many iterations) we should move on to the next "step" of training. It is a positive integer.
stepvalue
This parameter indicates one of potentially many iteration counts at which we should move on to the next "step" of training. It is a positive integer. There is often more than one of these parameters present, each one indicating the iteration of the next step.
max_iter
The maximum number of iterations. This number tells the network when to stop training; if it is too small, the model will not reach convergence, and if it is too large, training oscillates and wastes time. It is a positive integer.
momentum
The weight of the previous gradient update; a real fraction (typically between 0 and 1).
weight_decay
The weight decay (regularization) term, used to prevent overfitting.
solver_mode
Choose CPU training or GPU training.
snapshot
The training snapshot interval, which determines how often (in iterations) the model and solverstate are saved. A positive integer.
snapshot_prefix
The prefix of the snapshots, i.e. the naming prefix of the saved model and solverstate files; it also specifies the path where they are written.
net
The path to the network prototxt (covering both the train and test/val phases).
test_iter
The number of test iterations run every test_interval. Suppose the test set contains 10000 images in total; running them all at once would be inefficient, so the test data are split into several batches, each containing batch_size samples. If batch_size = 100, then 100 iterations are needed to cover all 10000 images, so test_iter is set to 100.
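As a quick arithmetic check (using the same hypothetical numbers), the value is simply the test-set size divided by the TEST-phase batch size:

# test_iter * batch_size should cover the whole test set
test_batch_size = 100       # batch_size of the TEST phase in the net prototxt
num_test_images = 10000     # total number of test samples (hypothetical)
print(num_test_images // test_batch_size)   # 100 -> set test_iter: 100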
test_interval
The testing interval: the network is tested once every test_interval training iterations.
display
How often the results are output
iter_size
This parameter multiplied by the batch_size in train.prototxt gives the batch size actually used: the solver reads batch_size * iter_size images before doing one gradient descent step. This parameter works around the batch-size limit caused by insufficient GPU memory, because several iterations can be accumulated into one large effective batch even when a single batch is limited.
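Conceptually (a toy sketch with made-up numbers, not Caffe's actual code), the solver accumulates gradients over iter_size mini-batches and then applies a single update:

batch_size = 16                  # batch_size in train.prototxt, limited by GPU memory
iter_size = 4                    # iter_size in solver.prototxt
grads = [0.1, 0.3, 0.2, 0.4]     # fake per-mini-batch gradients

accumulated = sum(grads[:iter_size])           # gradients are accumulated ...
effective_gradient = accumulated / iter_size   # ... normalized by iter_size ...
print(effective_gradient)                      # ... and applied once, so each update
                                               # reflects batch_size * iter_size = 64 images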
average_loss
Averages the loss over multiple forward passes for the displayed output.
Basic
NET: "Examples/mnist/lenet_train_test.prototxt"
test_iter:100
test_interval:500
base_lr:0.01
momentum:0.9
weight_decay:0.0005
lr_policy: "INV"
gamma:0.0001
power:0.75
max_iter:10000
snapshot:5000
snapshot_prefix: "examples/mnist/model/"
solver_mode:cpu
Adam
net: "examples/mnist/lenet_train_test.prototxt"
test_iter: 100
test_interval: 500
base_lr: 0.001
momentum: 0.9
momentum2: 0.999
lr_policy: "fixed"
display: 100
max_iter: 10000
snapshot: 5000
snapshot_prefix: "examples/mnist/lenet"
type: "Adam"
solver_mode: CPU
AdaGrad
net: "examples/mnist/mnist_autoencoder.prototxt"
test_state: { stage: 'test-on-train' }
test_iter: 500
test_state: { stage: 'test-on-test' }
test_iter: 100
test_interval: 500
test_compute_loss: true
base_lr: 0.01
lr_policy: "fixed"
display: 100
max_iter: 65000
weight_decay: 0.0005
snapshot: 10000
snapshot_prefix: "examples/mnist/mnist_autoencoder_adagrad_train"
solver_mode: GPU
type: "AdaGrad"
RMSProp
net: "examples/mnist/lenet_train_test.prototxt"
test_iter: 100
test_interval: 500
base_lr: 0.01
momentum: 0.0
weight_decay: 0.0005
lr_policy: "inv"
gamma: 0.0001
power: 0.75
max_iter: 10000
snapshot: 5000
snapshot_prefix: "examples/mnist/lenet_rmsprop"
solver_mode: GPU
type: "RMSProp"
rms_decay: 0.98
AdaDelta
net: "examples/mnist/lenet_train_test.prototxt"
test_iter: 100
lr_policy: "fixed"
momentum: 0.95
weight_decay: 0.0005
display: 100
max_iter: 10000
snapshot: 5000
snapshot_prefix: "examples/mnist/lenet_adadelta"
solver_mode: GPU
type: "AdaDelta"
delta: 1e-6
Nesterov
net: "examples/mnist/mnist_autoencoder.prototxt"
test_state: { stage: 'test-on-train' }
test_iter: 500
test_state: { stage: 'test-on-test' }
test_iter: 100
test_interval: 500
test_compute_loss: true
base_lr: 0.01
lr_policy: "step"
gamma: 0.1
stepsize: 10000
display: 100
max_iter: 65000
weight_decay: 0.0005
snapshot: 10000
snapshot_prefix: "examples/mnist/mnist_autoencoder_nesterov_train"
momentum: 0.95
solver_mode: GPU
type: "Nesterov"
FCN's Solver.prototxt
train_net: "train.prototxt"
test_net: "val.prototxt"
test_iter: 736
# make test net, but don't invoke it from the solver itself
test_interval: 999999999
display: 20
average_loss: 20
lr_policy: "fixed"
# lr for unnormalized softmax
base_lr: 1e-10
# high momentum
momentum: 0.99
# no gradient accumulation
iter_size: 1
max_iter: 100000
weight_decay: 0.0005
snapshot: 4000
snapshot_prefix: "snapshot/train"
test_initialization: false
Reference
http://blog.csdn.net/czp0322/article/details/52161759