Caffe's Solver file configuration
This post explains the meaning of each parameter in the solver.prototxt file.
In deep learning tasks an analytic solution can rarely be found, so training is turned into a mathematical optimization problem. The main job of the Solver is to alternately call the forward and backward passes (forward & backward) to update the connection weights of the neural network so as to minimize the loss; in effect, it holds the parameters of the iterative optimization algorithm.
The Caffe Solver class provides six optimization algorithms, selected with the type keyword in the configuration file:
Stochastic Gradient Descent (type: "SGD")
AdaDelta (type: "AdaDelta")
Adaptive Gradient (type: "AdaGrad")
Adam (type: "Adam")
Nesterov's Accelerated Gradient (type: "Nesterov")
RMSprop (type: "RMSProp")
Simply put, the Solver is a configuration file that tells Caffe how you want the network to be trained.
Solver.prototxt Process
1. First design the objects to be optimized, as well as the prototxt files of the training network and the test network used for learning (usually train.prototxt and test.prototxt).
2. Iteratively optimize with forward and backward passes and update the parameters.
3. Periodically evaluate the test network.
4. Display the model and solver state throughout the optimization.
Solver Parameters
base_lr
This parameter is the initial learning rate of the network (base learning rate), usually a floating-point number. From machine-learning experience, a learning rate that is too large prevents convergence, while one that is too small makes convergence very slow, so setting this parameter well is quite important.
lr_policy
This parameter specifies the rule the learning rate follows during training. It is a string, with the following options:
"step" - requires a stepsize parameter; returns base_lr * gamma ^ (floor(iter / stepsize)), where iter is the current iteration
"multistep" - similar to "step" but requires stepvalue parameters; "step" changes at uniform intervals, while "multistep" changes at the iterations given by the stepvalue values
"fixed" - keeps base_lr unchanged
"exp" - returns base_lr * gamma ^ iter, where iter is the current iteration
"poly" - polynomial decay of the learning rate; returns base_lr * (1 - iter / max_iter) ^ power
"sigmoid" - sigmoid decay of the learning rate; returns base_lr * (1 / (1 + exp(-gamma * (iter - stepsize))))
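To make these formulas concrete, here is a small Python sketch (not Caffe's actual implementation; the function and argument names are purely illustrative) that evaluates each policy at a given iteration:

import math

def lr_at_iter(policy, it, base_lr, gamma=None, power=None,
               stepsize=None, stepvalues=(), max_iter=None):
    # Learning rate at iteration `it` for the policies listed above (sketch).
    if policy == "fixed":
        return base_lr
    if policy == "step":
        return base_lr * gamma ** (it // stepsize)
    if policy == "multistep":
        passed = sum(1 for s in stepvalues if it >= s)  # milestones already passed
        return base_lr * gamma ** passed
    if policy == "exp":
        return base_lr * gamma ** it
    if policy == "poly":
        return base_lr * (1 - it / max_iter) ** power
    if policy == "sigmoid":
        return base_lr * (1.0 / (1.0 + math.exp(-gamma * (it - stepsize))))
    raise ValueError("unknown lr_policy: %s" % policy)

# "step" with base_lr=0.01, gamma=0.1, stepsize=10000 gives 0.01 for
# iterations 0-9999, 0.001 for 10000-19999, 0.0001 afterwards.
print(lr_at_iter("step", 15000, base_lr=0.01, gamma=0.1, stepsize=10000))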
gamma
This parameter is related to the learning rate; it needs to be set whenever the chosen lr_policy uses it, and is generally a real number.
stepsize
This parameter indicates how often (after how many iterations) we should move on to the next "step" of training. It is a positive integer.
stepvalue
This parameter indicates one of potentially many iteration counts at which we should move on to the next "step" of training. It is a positive integer. There is often more than one of these parameters present, each one indicating the iteration of the next step.
max_iter
The maximum number of iterations. This number tells the network when to stop training; if it is too small, the model will not reach convergence, and if it is too large, training oscillates and wastes time. It is a positive integer.
momentum
The weight of the previous gradient update; a real fraction (typically between 0 and 1).
weight_decay
The weight decay (regularization) term, used to prevent overfitting.
solver_mode
Choose CPU training or GPU training.
snapshot
The training snapshot interval, which determines how often (in iterations) the model and solverstate are saved. A positive integer.
snapshot_prefix
The prefix of the snapshots, i.e. the naming prefix of the saved model and solverstate files; it also specifies the path where they are written.
net
The path to the network prototxt (covering both the train and test/val phases).
test_iter
The number of test iterations run every test_interval. Suppose the test set contains 10000 images in total; running them all at once would be inefficient, so the test data are split into several batches, each containing batch_size samples. If batch_size = 100, then 100 iterations are needed to cover all 10000 images, so test_iter is set to 100.
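As a quick arithmetic check (using the same hypothetical numbers), the value is simply the test-set size divided by the TEST-phase batch size:

# test_iter * batch_size should cover the whole test set
test_batch_size = 100       # batch_size of the TEST phase in the net prototxt
num_test_images = 10000     # total number of test samples (hypothetical)
print(num_test_images // test_batch_size)   # 100 -> set test_iter: 100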
test_interval
The testing interval: the network is tested once every test_interval training iterations.
display
How often the results are output
iter_size
This parameter multiplied by the batch_size in train.prototxt gives the batch size actually used: the solver reads batch_size * iter_size images before doing one gradient descent step. This parameter works around the batch-size limit caused by insufficient GPU memory, because several iterations can be accumulated into one large effective batch even when a single batch is limited.
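Conceptually (a toy sketch with made-up numbers, not Caffe's actual code), the solver accumulates gradients over iter_size mini-batches and then applies a single update:

batch_size = 16                  # batch_size in train.prototxt, limited by GPU memory
iter_size = 4                    # iter_size in solver.prototxt
grads = [0.1, 0.3, 0.2, 0.4]     # fake per-mini-batch gradients

accumulated = sum(grads[:iter_size])           # gradients are accumulated ...
effective_gradient = accumulated / iter_size   # ... normalized by iter_size ...
print(effective_gradient)                      # ... and applied once, so each update
                                               # reflects batch_size * iter_size = 64 images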
average_loss
Averages the loss over multiple forward passes for the displayed output.
Basic
NET: "Examples/mnist/lenet_train_test.prototxt"
test_iter:100
test_interval:500
base_lr:0.01
momentum:0.9
weight_decay:0.0005
lr_policy: "INV"
gamma:0.0001
power:0.75
max_iter:10000
snapshot:5000
snapshot_prefix: "examples/mnist/model/"
solver_mode:cpu
Adam
net: "examples/mnist/lenet_train_test.prototxt"
test_iter: 100
test_interval: 500
base_lr: 0.001
momentum: 0.9
momentum2: 0.999
lr_policy: "fixed"
display: 100
max_iter: 10000
snapshot: 5000
snapshot_prefix: "examples/mnist/lenet"
type: "Adam"
solver_mode: CPU
AdaGrad
net: "examples/mnist/mnist_autoencoder.prototxt"
test_state: { stage: 'test-on-train' }
test_iter: 500
test_state: { stage: 'test-on-test' }
test_iter: 100
test_interval: 500
test_compute_loss: true
base_lr: 0.01
lr_policy: "fixed"
display: 100
max_iter: 65000
weight_decay: 0.0005
snapshot: 10000
snapshot_prefix: "examples/mnist/mnist_autoencoder_adagrad_train"
solver_mode: GPU
type: "AdaGrad"
RMSProp
net: "examples/mnist/lenet_train_test.prototxt"
test_iter: 100
test_interval: 500
base_lr: 0.01
momentum: 0.0
weight_decay: 0.0005
lr_policy: "inv"
gamma: 0.0001
power: 0.75
max_iter: 10000
snapshot: 5000
snapshot_prefix: "examples/mnist/lenet_rmsprop"
solver_mode: GPU
type: "RMSProp"
rms_decay: 0.98
AdaDelta
net: "examples/mnist/lenet_train_test.prototxt"
test_iter: 100
lr_policy: "fixed"
momentum: 0.95
weight_decay: 0.0005
display: 100
max_iter: 10000
snapshot: 5000
snapshot_prefix: "examples/mnist/lenet_adadelta"
solver_mode: GPU
type: "AdaDelta"
delta: 1e-6
Nesterov
net: "examples/mnist/mnist_autoencoder.prototxt"
test_state: { stage: 'test-on-train' }
test_iter: 500
test_state: { stage: 'test-on-test' }
test_iter: 100
test_interval: 500
test_compute_loss: true
base_lr: 0.01
lr_policy: "step"
gamma: 0.1
stepsize: 10000
display: 100
max_iter: 65000
weight_decay: 0.0005
snapshot: 10000
snapshot_prefix: "examples/mnist/mnist_autoencoder_nesterov_train"
momentum: 0.95
solver_mode: GPU
type: "Nesterov"
FCN's Solver.prototxt
train_net: "train.prototxt"
test_net: "val.prototxt"
test_iter: 736
# make test net, but don't invoke it from the solver itself
test_interval: 999999999
display: 20
average_loss: 20
lr_policy: "fixed"
# lr for unnormalized softmax
base_lr: 1e-10
# high momentum
momentum: 0.99
# no gradient accumulation
iter_size: 1
max_iter: 100000
weight_decay: 0.0005
snapshot: 4000
snapshot_prefix: "snapshot/train"
test_initialization: false
Reference
http://blog.csdn.net/czp0322/article/details/52161759