Caffe Solver File Configuration Explained


This article explains the meaning of each parameter in the solver.prototxt file.

In deep learning tasks an analytic solution can rarely be found, so training is cast as a mathematical optimization problem. The Solver's main job is to alternately run the forward and backward passes (forward & backward) and update the connection weights of the neural network so as to minimize the loss; in effect, the solver file parameterizes this iterative optimization algorithm.

The Caffe Solver class provides six optimization algorithms, selected with the type keyword in the configuration file:

Stochastic Gradient Descent (type: "SGD")
AdaDelta (type: "AdaDelta")
Adaptive Gradient (type: "AdaGrad")
Adam (type: "Adam")
Nesterov's Accelerated Gradient (type: "Nesterov")
RMSprop (type: "RMSProp")
Simply put, the solver is a configuration file that tells Caffe how you want the network to be trained.

Solver.prototxt Workflow

1. Design the objective to be optimized, along with the prototxt files of the training network and the test network used for learning (usually train.prototxt and test.prototxt).
2. Iteratively optimize with forward and backward passes and update the parameters.
3. Periodically evaluate the network on the test data.
4. Display the model and solver state throughout the optimization.
Solver Parameters

base_lr
This parameter is the initial learning rate of the network (beginning learning rate) and is generally a floating-point number. As machine-learning practice suggests, a learning rate that is too large prevents convergence, while one that is too small makes convergence very slow, so setting this parameter well is important.

lr_policy

This parameter specifies the rule that the learning rate follows as training progresses. It is a string, and the options are as follows (iter denotes the current iteration number):

"fixed" - keep base_lr unchanged.
"step" - requires a stepsize parameter; returns base_lr * gamma ^ (floor(iter / stepsize)).
"multistep" - similar to "step", but requires one or more stepvalue parameters; "step" changes the rate at uniform intervals, whereas "multistep" changes it at the iterations listed in stepvalue.
"exp" - returns base_lr * gamma ^ iter.
"inv" - returns base_lr * (1 + gamma * iter) ^ (-power); this is the policy used in several of the example files below.
"poly" - the learning rate decays polynomially; returns base_lr * (1 - iter / max_iter) ^ power.
"sigmoid" - the learning rate decays along a sigmoid; returns base_lr * (1 / (1 + exp(-gamma * (iter - stepsize)))).
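For reference, these schedules can be reproduced in a few lines of plain Python. This is only a sketch of the formulas above, not Caffe code; the function and variable names are illustrative:

import math

def learning_rate(policy, base_lr, it, gamma=None, stepsize=None,
                  stepvalues=None, power=None, max_iter=None):
    # Reproduces the lr_policy formulas listed above; "it" is the current iteration.
    if policy == "fixed":
        return base_lr
    if policy == "step":
        return base_lr * gamma ** (it // stepsize)
    if policy == "multistep":
        passed = sum(1 for sv in stepvalues if it >= sv)  # stepvalue thresholds crossed so far
        return base_lr * gamma ** passed
    if policy == "exp":
        return base_lr * gamma ** it
    if policy == "inv":
        return base_lr * (1.0 + gamma * it) ** (-power)
    if policy == "poly":
        return base_lr * (1.0 - it / max_iter) ** power
    if policy == "sigmoid":
        return base_lr / (1.0 + math.exp(-gamma * (it - stepsize)))
    raise ValueError("unknown lr_policy: " + policy)

# "step" with base_lr=0.01, gamma=0.1, stepsize=10000 gives 0.01 for iterations
# 0..9999, 0.001 for 10000..19999, 0.0001 afterwards, and so on.
print(learning_rate("step", 0.01, 15000, gamma=0.1, stepsize=10000))  # ~0.001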

gamma
This parameter is related to the learning rate: it must be set whenever the chosen lr_policy uses it, and it is generally a real number.

stepsize

This parameter indicates how often (after how many iterations) we should move on to the next "step" of training. This value is a positive integer.

stepvalue
This parameter indicates one of potentially many iteration counts at which we should move on to the next "step" of training. It is a positive integer and is used with the "multistep" policy. There is often more than one of these parameters, each one giving the iteration at which the next step begins.

max_iter
The maximum number of iterations; this value tells the network when to stop training. Too small and training will not reach convergence; too large and it wastes time oscillating around a solution. It is a positive integer.

momentum
The weight given to the previous gradient update; a real number between 0 and 1 (0.9 is a common choice, as in the examples below).

weight_decay
The weight-decay (regularization) term, used to prevent overfitting.

solver_mode
Chooses whether training runs on the CPU or on the GPU.

snapshot
The training snapshot interval: how often (in iterations) the model and the solver state are saved. A positive integer.

snapshot_prefix
The prefix used to name the saved model and solver-state files; it also determines the path where they are written.

net
Path to the network prototxt (training and validation/test network). Separate files can instead be given with train_net and test_net, as in the FCN example below.

test_iter
The number of test iterations run at each test_interval. Suppose the test set contains 10,000 images; running them all in a single forward pass would be inefficient, so the test data is split into batches of batch_size samples each. If batch_size = 100, then 100 iterations are needed to cover all 10,000 images, so test_iter is set to 100.
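As a quick sanity check, this relationship can be written in a couple of lines of Python (the variable names are only illustrative, not Caffe identifiers):

import math

num_test_samples = 10000  # total number of test images in the example above
test_batch_size = 100     # batch_size of the TEST phase in the net prototxt

# test_iter should cover the whole test set: test_iter * batch_size >= num_test_samples
test_iter = math.ceil(num_test_samples / test_batch_size)
print(test_iter)          # 100, the value used in the example solver files below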

test_interval
The test interval: a full test pass is run after every test_interval training iterations.

display
How often (in iterations) training progress is printed to the log.

iter_size
This parameter multiplied by the batch size in train.prototxt gives the batch size you actually use: it is equivalent to reading batch_size * iter_size images before doing one gradient-descent step. It works around the batch-size limit imposed by insufficient GPU memory, because several iterations can be accumulated into one large effective batch even when a single batch must stay small.
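Conceptually, iter_size amounts to the gradient accumulation sketched below. This is plain Python pseudo-code assuming a simple SGD update; weights, mini_batches and grad_fn are placeholders, not Caffe APIs:

def accumulate_and_update(weights, mini_batches, grad_fn, lr, iter_size):
    # Accumulate gradients over iter_size mini-batches, then apply a single update.
    acc = [0.0] * len(weights)
    for b in range(iter_size):
        grads = grad_fn(weights, mini_batches[b])  # backward pass on one mini-batch
        acc = [a + g for a, g in zip(acc, grads)]
    # One update with the averaged gradient; this behaves like a single batch of
    # iter_size * batch_size samples.
    return [w - lr * a / iter_size for w, a in zip(weights, acc)]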

average_loss
The loss shown in the display output is averaged over this many forward passes.

Basic

NET: "Examples/mnist/lenet_train_test.prototxt"
test_iter:100
test_interval:500
base_lr:0.01
momentum:0.9
weight_decay:0.0005
lr_policy: "INV"
gamma:0.0001
power:0.75
max_iter:10000
snapshot:5000
snapshot_prefix: "examples/mnist/model/"
solver_mode:cpu
Adam
net: "examples/mnist/lenet_train_test.prototxt"
test_iter: 100
test_interval: 500
base_lr: 0.001
momentum: 0.9
momentum2: 0.999
lr_policy: "fixed"
display: 100
max_iter: 10000
snapshot: 5000
snapshot_prefix: "examples/mnist/lenet"
type: "Adam"
solver_mode: CPU
AdaGrad
net: "examples/mnist/mnist_autoencoder.prototxt"
test_state: { stage: 'test-on-train' }
test_iter: 500
test_state: { stage: 'test-on-test' }
test_iter: 100
test_interval: 500
test_compute_loss: true
base_lr: 0.01
lr_policy: "fixed"
display: 100
max_iter: 65000
weight_decay: 0.0005
snapshot: 10000
snapshot_prefix: "examples/mnist/mnist_autoencoder_adagrad_train"
solver_mode: GPU
type: "AdaGrad"
RMSProp
net: "examples/mnist/lenet_train_test.prototxt"
test_iter: 100
test_interval: 500
base_lr: 0.01
momentum: 0.0
weight_decay: 0.0005
lr_policy: "inv"
gamma: 0.0001
power: 0.75
max_iter: 10000
snapshot: 5000
snapshot_prefix: "examples/mnist/lenet_rmsprop"
solver_mode: GPU
type: "RMSProp"
rms_decay: 0.98
AdaDelta
net: "examples/mnist/lenet_train_test.prototxt"
test_iter: 100
lr_policy: "fixed"
momentum: 0.95
weight_decay: 0.0005
display: 100
max_iter: 10000
snapshot: 5000
snapshot_prefix: "examples/mnist/lenet_adadelta"
solver_mode: GPU
type: "AdaDelta"
delta: 1e-6
Nesterov
net: "examples/mnist/mnist_autoencoder.prototxt"
test_state: { stage: 'test-on-train' }
test_iter: 500
test_state: { stage: 'test-on-test' }
test_iter: 100
test_interval: 500
test_compute_loss: true
base_lr: 0.01
lr_policy: "step"
gamma: 0.1
stepsize: 10000
display: 100
max_iter: 65000
weight_decay: 0.0005
snapshot: 10000
snapshot_prefix: "examples/mnist/mnist_autoencoder_nesterov_train"
momentum: 0.95
solver_mode: GPU
type: "Nesterov"
FCN's Solver.prototxt
train_net: "train.prototxt"
test_net: "val.prototxt"
test_iter: 736
# make test net, but don't invoke it from the solver itself
test_interval: 999999999
display: 20
average_loss: 20
lr_policy: "fixed"
# lr for unnormalized softmax
base_lr: 1e-10
# high momentum
momentum: 0.99
# no gradient accumulation
iter_size: 1
max_iter: 100000
weight_decay: 0.0005
snapshot: 4000
snapshot_prefix: "snapshot/train"
test_initialization: false
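Once a solver file like the ones above is in place, training is usually launched either with the caffe command-line tool (caffe train --solver=solver.prototxt) or from Python. A minimal pycaffe sketch, assuming Caffe and its Python bindings are installed and the paths inside the solver file are valid:

import caffe

caffe.set_mode_cpu()      # or caffe.set_mode_gpu() together with caffe.set_device(0)

# get_solver reads the "type" field of the file and builds the matching solver.
solver = caffe.get_solver("solver.prototxt")  # path to one of the files above

solver.step(100)          # run 100 training iterations
# solver.solve()          # or run all max_iter iterations, writing snapshots along the way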

