The Caffe Solver and its configuration parameters in detail

Source: Internet
Author: User

Original URL:

http://www.cnblogs.com/denny402/p/5074049.html

The Solver is the core of Caffe: it coordinates the operation of the whole model. A Solver configuration file is one of the arguments required to run the Caffe program. The command is typically:

# caffe train --solver=*_solver.prototxt

In deep learning, the loss function is often non-convex and has no analytical solution, so it has to be solved with an optimization method. The main job of the Solver is to alternately call the forward pass and the backward pass to update the parameters so as to minimize the loss; it is essentially an iterative optimization algorithm.

As of the current version, Caffe provides six optimization algorithms for solving for the optimal parameters; the one to use is selected in the Solver configuration file by setting type: Stochastic Gradient Descent (type: "SGD"), AdaDelta (type: "AdaDelta"), Adaptive Gradient (type: "AdaGrad"), Adam (type: "Adam"), Nesterov's Accelerated Gradient (type: "Nesterov"), and RMSprop (type: "RMSProp").

For an introduction to each of these methods, see the next article in this series; this article focuses on how to write the Solver configuration file.

The Solver process:

1. Design the objects to be optimized, as well as the training network used for learning and the test network used for evaluation (these are defined in another configuration file, a prototxt).

2. Iteratively optimize by running forward and backward passes and updating the parameters.

3. Periodically evaluate the test network (you can set how many training iterations are run between tests).

4. Display the state of the model and the solver throughout the optimization process.

During each iteration, the Solver performs the following steps:

1. Call the forward pass to compute the final output and the corresponding loss.

2. Call the backward pass to compute the gradient of each layer.

3. Update the parameters from the gradients according to the selected solver method.

4. Record and save the learning rate, snapshots, and the corresponding state at the configured iterations.
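To make steps 1 to 3 concrete, here is a minimal, illustrative sketch in Python (a toy one-parameter model, not Caffe code; all names here are hypothetical) that runs a forward pass, a backward pass, and a plain SGD update in each iteration:

import numpy as np

def solver_step(w, x, t, lr=0.01):
    # 1. forward pass: compute the output and the loss
    y = w * x
    loss = 0.5 * np.mean((y - t) ** 2)
    # 2. backward pass: compute the gradient of the loss w.r.t. w
    grad = np.mean((y - t) * x)
    # 3. update the parameter with the chosen rule (plain SGD here)
    w = w - lr * grad
    return w, loss

w = 0.0
x = np.array([1.0, 2.0, 3.0])
t = 2.0 * x                      # ground truth: w should converge to 2
for it in range(200):
    w, loss = solver_step(w, x, t)
print(w)                         # close to 2.0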

Next, let's look at an example:

NET: "Examples/mnist/lenet_train_test.prototxt"
test_iter:100
test_interval:500
base_lr:0.01
momentum:0.9
type:sgd
weight_decay:0.0005
lr_policy: "INV"
gamma:0.0001
power:0.75
display:100
max_iter:20000
snapshot:5000
snapshot_prefix: "Examples/mnist/lenet"
solver_ Mode:cpu

Next, we interpret each line in detail:

NET: "Examples/mnist/lenet_train_test.prototxt"

Sets the deep network model. Each model is a net that must be configured in its own configuration file, and each net is composed of many layers. The specific configuration of each layer is covered in parts (2) to (5) of this series. Note: the file path starts from the root of the Caffe directory, and the same holds for all other paths in the configuration.

The training model and the test model can also be set separately with train_net and test_net. For example:

Train_net: "Examples/hdf5_classification/logreg_auto_train.prototxt"
test_net: "Examples/hdf5_classification /logreg_auto_test.prototxt "

Then the second line:

test_iter: 100

This has to be understood together with the batch_size in the test data layer. The MNIST test set contains 10,000 samples in total, and pushing all of them through at once is inefficient, so the test data are split into batches of batch_size samples each. If we set batch_size to 100, then 100 iterations are needed to cover all 10,000 samples, so test_iter is set to 100. One complete pass over the data is called an epoch.
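As a quick check of the arithmetic (assuming the test-phase batch_size of 100 used in the LeNet example):

num_test_samples = 10000     # size of the MNIST test set
batch_size = 100             # batch_size in the TEST-phase data layer
test_iter = num_test_samples // batch_size
print(test_iter)             # 100: one full pass (one epoch) over the test set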

test_interval: 500

The test interval: a test is run after every 500 training iterations.

base_lr: 0.01
lr_policy: "inv"
gamma: 0.0001
power: 0.75

These four lines should be read together; they configure the learning rate. Whenever gradient descent is used for optimization there is a learning rate, also called the step size. base_lr sets the base learning rate, which can be adjusted as the iterations proceed. How it is adjusted is the adjustment policy, which is set by lr_policy.

lr_policy can be set to the following values; the corresponding learning rate is computed as follows:

- fixed: keep base_lr unchanged.
- step: you also need to set a stepsize; returns base_lr * gamma ^ (floor(iter / stepsize)), where iter is the current iteration.
- exp: returns base_lr * gamma ^ iter, where iter is the current iteration.
- inv: you also need to set a power; returns base_lr * (1 + gamma * iter) ^ (-power).
- multistep: you also need to set one or more stepvalue parameters. This policy is similar to step, but step changes the rate at uniform intervals while multistep changes it at the specified stepvalue iterations.
- poly: polynomial decay; returns base_lr * (1 - iter / max_iter) ^ power.
- sigmoid: sigmoid-shaped decay; returns base_lr * (1 / (1 + exp(-gamma * (iter - stepsize)))).
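The formulas above can be checked with a short Python sketch (an illustration of the formulas as listed, not Caffe's source code; the function name is hypothetical):

import math

def learning_rate(policy, iter, base_lr, gamma=None, power=None,
                  stepsize=None, stepvalues=(), max_iter=None):
    # mirror of the lr_policy formulas listed above
    if policy == "fixed":
        return base_lr
    if policy == "step":
        return base_lr * gamma ** (iter // stepsize)
    if policy == "exp":
        return base_lr * gamma ** iter
    if policy == "inv":
        return base_lr * (1 + gamma * iter) ** (-power)
    if policy == "multistep":
        n = sum(1 for s in stepvalues if iter >= s)   # stepvalues already passed
        return base_lr * gamma ** n
    if policy == "poly":
        return base_lr * (1 - iter / max_iter) ** power
    if policy == "sigmoid":
        return base_lr * (1.0 / (1.0 + math.exp(-gamma * (iter - stepsize))))
    raise ValueError("unknown lr_policy: %s" % policy)

# the "inv" schedule from the example solver above
for it in (0, 5000, 10000, 20000):
    print(it, learning_rate("inv", it, base_lr=0.01, gamma=0.0001, power=0.75))
# prints roughly 0.01, 0.0074, 0.0059, 0.0044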

A multistep example:

base_lr: 0.01
momentum: 0.9
weight_decay: 0.0005
# The learning rate policy
lr_policy: "multistep"
gamma: 0.9
stepvalue: 5000
stepvalue: 7000
stepvalue: 8000
stepvalue: 9000
stepvalue: 9500
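With base_lr: 0.01 and gamma: 0.9, this schedule keeps the learning rate at 0.01 until iteration 5000 and then multiplies it by 0.9 each time a stepvalue is passed: 0.009 from iteration 5000, 0.0081 from 7000, 0.00729 from 8000, and so on (the multistep branch of the sketch above reproduces these values).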

The following parameters:

momentum: 0.9

The weight given to the previous gradient update; it is covered in more detail in the next article.
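In the SGD solver, the previous update is blended into the current one. A simplified sketch of the update rule (following the formulas in the Caffe SGD solver documentation, not the actual implementation; the function name is hypothetical):

def sgd_momentum_update(w, v, grad, lr=0.01, momentum=0.9):
    # V_{t+1} = momentum * V_t - lr * gradient
    # W_{t+1} = W_t + V_{t+1}
    v = momentum * v - lr * grad    # carry over a fraction of the previous update
    w = w + v
    return w, v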

type: "SGD"

Selects the optimization algorithm. This line can be omitted, because the default value is SGD. There are six methods to choose from, as listed at the beginning of this article.

weight_decay: 0.0005

A weight decay term used to regularize the parameters and prevent over-fitting.
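Conceptually, weight_decay adds an L2 penalty that pulls the weights toward zero. A simplified sketch of how the decay term enters the gradient before the update (ignoring per-parameter decay_mult settings; the function name is hypothetical):

def regularized_gradient(w, grad, weight_decay=0.0005):
    # the decay term weight_decay * w is added to the loss gradient
    return grad + weight_decay * w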

display: 100

Results are printed to the screen once every 100 training iterations. If set to 0, nothing is displayed.

max_iter: 20000

The maximum number of iterations. If this is set too small, training stops before convergence and the accuracy is low; if it is set too large, it can lead to oscillation and wasted time.
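To relate max_iter to epochs, a rough calculation (assuming the training-phase batch_size of 64 used in the LeNet example and the 60,000 images of the MNIST training set):

max_iter = 20000
batch_size = 64              # training-phase batch_size in lenet_train_test.prototxt
num_train_samples = 60000    # MNIST training set size
epochs = max_iter * batch_size / num_train_samples
print(epochs)                # about 21 passes over the training data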

snapshot: 5000
snapshot_prefix: "examples/mnist/lenet"

Snapshots save the trained model and the solver state. snapshot sets how many training iterations pass between saves; the default is 0, meaning no snapshots are saved. snapshot_prefix sets the save path.

You can also set snapshot_diff to save the gradient values; the default is false, i.e. they are not saved.

You can also set snapshot_format, the format of the saved files. There are two options, HDF5 and BINARYPROTO, and the default is BINARYPROTO.

solver_mode: CPU

Sets the run mode. The default is GPU; if you do not have a GPU you need to change this to CPU, otherwise you will get an error.

Note: all of the parameters above are optional and have default values. Depending on the solver method (type), there are additional parameters that are not listed here.
