Caffe: Resume Training from a Snapshot


Original address: https://yunmingzhang.wordpress.com/2015/02/04/caffe-notes-using-snap-shot-in-convolutional-neural-network-training/

This post summarizes how to resume training in Caffe from snapshots.

First, you need to generate snapshot files. You enable this in the solver prototxt file. The solver file's name differs between models; for the CIFAR-10 quick example it is cifar10_quick_solver.prototxt.

# snapshot intermediate results
snapshot: 500
snapshot_prefix: "examples/cifar10/cifar10_quick"

This means that a snapshot is taken every 500 iterations, not only once at the 500th iteration.
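To make the "every 500 iterations" point concrete, the bash sketch below lists the iterations at which snapshots would be written for a given `snapshot` interval and `max_iter`. It assumes `snapshot_after_train` is left at its default (`true`), in which case Caffe also writes a final snapshot when training ends at a `max_iter` that is not a multiple of the interval. `snapshot_iters` is a hypothetical helper for illustration, not part of Caffe.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Caffe): list the iterations at which a
# snapshot would be written for a given `snapshot` interval and `max_iter`.
# Assumes snapshot_after_train keeps its default value (true).
snapshot_iters() {
  local interval=$1 max_iter=$2
  local -a iters=()
  local i
  if ((interval <= 0)); then
    # snapshot: 0 disables periodic snapshots; only the final one remains.
    echo "$max_iter"
    return
  fi
  # Periodic snapshots at every multiple of the interval.
  for ((i = interval; i <= max_iter; i += interval)); do
    iters+=("$i")
  done
  # Final snapshot at the end of training, unless it already coincides
  # with a periodic one.
  if ((max_iter % interval != 0)); then
    iters+=("$max_iter")
  fi
  echo "${iters[*]}"
}

snapshot_iters 500 2000   # 500 1000 1500 2000
snapshot_iters 500 1200   # 500 1000 1200
```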

Once snapshots have been taken, you will see two files per snapshot, model_iter_xxx.caffemodel and model_iter_xxx.solverstate (for example, cifar10_quick_iter_3000.solverstate). The filename prefix can be customized in the solver prototxt file via snapshot_prefix.
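Because snapshot files follow the `<prefix>_iter_<N>` pattern, picking the most recent one to resume from can be scripted. The bash sketch below is one way to do it, assuming Caffe's default naming and the default binary (non-HDF5) snapshot format; `latest_solverstate` is a hypothetical helper, not part of Caffe. It compares iteration numbers numerically, because a plain lexical sort would rank `_iter_500` after `_iter_3000`.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Caffe): print the <prefix>_iter_<N>.solverstate
# file with the largest iteration number N. Assumes Caffe's default naming.
latest_solverstate() {
  local prefix=$1 f n best="" best_n=-1
  for f in "${prefix}"_iter_*.solverstate; do
    [ -e "$f" ] || continue           # glob matched nothing
    n=${f##*_iter_}                   # drop everything up to "_iter_"
    n=${n%.solverstate}               # drop the extension, leaving N
    [[ $n =~ ^[0-9]+$ ]] || continue  # skip names that don't fit the pattern
    if ((n > best_n)); then           # numeric, not lexical, comparison
      best_n=$n
      best=$f
    fi
  done
  [ -n "$best" ] && echo "$best"
}
```

For example, `latest_solverstate examples/cifar10/cifar10_quick` prints the newest CIFAR-10 quick solver state in that directory, if there is one, and prints nothing (with a non-zero status) otherwise.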

Once you have a snapshot, you can tell the training script to use it. For CIFAR-10, add the option --snapshot=cifar10_quick_iter_3000.solverstate in train_quick.sh.

This starts training at the 3000th iteration. A note on this for ImageNet can be found at http://caffe.berkeleyvision.org/gathered/examples/imagenet.html.

Although only the cifar10_quick_iter_3000.solverstate file is specified, to actually get it running you also need the cifar10_quick_iter_3000.caffemodel file in the directory.
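The reason is that the `.solverstate` does not contain the weights. When restoring, Caffe reads them from the `.caffemodel` path recorded in the solver state at snapshot time (its `learned_net` field), which is normally the file written alongside it with the same prefix and iteration, so that file needs to still be at that path. A small pre-flight check, sketched below under that default-naming assumption, catches a missing weights file before launching a run; `check_snapshot_pair` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Caffe): given a .solverstate path, derive
# the matching .caffemodel path (default naming assumed) and check that both
# files exist. Prints the .caffemodel path on success.
check_snapshot_pair() {
  local state=$1
  local model=${state%.solverstate}.caffemodel
  if [ ! -f "$state" ]; then
    echo "missing solver state: $state" >&2
    return 1
  fi
  if [ ! -f "$model" ]; then
    echo "missing weights: $model" >&2
    return 1
  fi
  echo "$model"
}
```

Usage: `check_snapshot_pair examples/cifar10/cifar10_quick_iter_3000.solverstate` either prints the expected `examples/cifar10/cifar10_quick_iter_3000.caffemodel` path or reports which file is missing.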

There is a trick here: the --solver and --snapshot options have to be part of the same command, so don't miss the '\' after the --solver option:

$TOOLS/caffe train \
  --solver=examples/cifar10/cifar10_quick_solver.prototxt \
  --snapshot=examples/cifar10/cifar10_quick_iter_3000.solverstate

Otherwise, it won't start from the snapshot, and it won't tell you what the problem is.
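One way to sidestep the continuation-line pitfall is to collect the arguments in a bash array, so `--solver` and `--snapshot` can never end up in separate commands. The sketch below only prints the command (a dry run). `resume_cmd` is a hypothetical helper, and `./build/tools` as the location of the `caffe` binary is an assumption based on a typical source build.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Caffe): assemble the resume command in a
# bash array, so --solver and --snapshot always belong to one command and
# no trailing backslash can be forgotten.
resume_cmd() {
  local tools=$1 solver=$2 state=$3
  local -a cmd
  cmd=("$tools/caffe" train
       --solver="$solver"
       --snapshot="$state")
  # Dry run: print the command. To actually launch training, replace the
  # line below with:  "${cmd[@]}"
  echo "${cmd[*]}"
}

solver=examples/cifar10/cifar10_quick_solver.prototxt
state=examples/cifar10/cifar10_quick_iter_3000.solverstate
resume_cmd ./build/tools "$solver" "$state"
```

Inside the array's parentheses, newlines are allowed without backslashes, which is what removes the failure mode described above.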

Here is my personal summary (verified with LeNet training on MNIST):

Regarding the ways of resuming training described above:

1) Number of iterations: resuming works (a) when you specify the number of training iterations for the initial run, interrupt training manually partway through, and then continue training, and (b) when the initial run has finished its specified number of iterations and you set a new iteration count (max_iter) in the solver before continuing training.

2) Learning rate: if you do not change the learning rate policy or the learning rate in the solver used for the initial training, the resumed training continues the learning rate schedule from where it left off; if you change the learning rate policy or the learning rate value, the resumed training's learning rate follows the change.
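Item 1) follows from how Caffe treats the iteration count: the `.solverstate` restores the iteration counter, and the solver then keeps stepping until it reaches `max_iter`. `max_iter` is therefore a total, not a number of additional iterations. The bash sketch below (hypothetical helper `remaining_iters`) makes both cases from item 1) explicit; the numbers are assumptions in the style of the LeNet-on-MNIST example.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Caffe): how many iterations a resumed
# run will execute. Caffe restores the iteration counter from the
# .solverstate and then steps until it reaches max_iter, so max_iter is a
# total count, not a number of extra iterations.
remaining_iters() {
  local resumed_iter=$1 max_iter=$2
  local left=$((max_iter - resumed_iter))
  if ((left < 0)); then
    left=0
  fi
  echo "$left"
}

# (a) interrupted at iteration 5000 of max_iter 10000
remaining_iters 5000 10000    # 5000
# (b) run already finished: nothing left unless max_iter is raised
remaining_iters 10000 10000   # 0
remaining_iters 10000 15000   # 5000 after raising max_iter to 15000
```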

In short, when resuming, training continues from the saved state, and any solver parameter you change (such as the number of iterations or the learning rate settings) is picked up from the modified solver file.
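The learning-rate behaviour in item 2) and the summary above is consistent with what a `.solverstate` contains: the iteration counter, the path to the weights, and the solver's update history, but not the learning rate itself. Caffe recomputes the rate from the restored iteration and the `base_lr`, `lr_policy`, `gamma`, `stepsize` and `power` values in the solver file currently in use, so edits to those fields take effect on resume. The sketch below reproduces three of Caffe's policies (`fixed`, `step`, `inv`) in bash and awk, so you can check what rate a resumed run should start with. `lr_at_iter` is a hypothetical helper, and the example values are assumptions modelled on the CIFAR-10 quick and LeNet example solvers.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Caffe): learning rate at iteration ITER
# for three of Caffe's lr_policy values, using their formulas:
#   fixed: base_lr
#   step:  base_lr * gamma ^ floor(iter / stepsize)
#   inv:   base_lr * (1 + gamma * iter) ^ (-power)
# Usage: lr_at_iter POLICY ITER BASE_LR [GAMMA] [STEPSIZE] [POWER]
lr_at_iter() {
  local policy=$1 iter=$2 base_lr=$3
  local gamma=${4:-0} stepsize=${5:-1} power=${6:-0}
  case $policy in
    fixed | step | inv) ;;
    *) echo "unsupported lr_policy: $policy" >&2; return 1 ;;
  esac
  awk -v p="$policy" -v it="$iter" -v lr="$base_lr" \
      -v g="$gamma" -v s="$stepsize" -v pw="$power" 'BEGIN {
    if (p == "fixed")     r = lr
    else if (p == "step") r = lr * g ^ int(it / s)
    else                  r = lr * (1 + g * it) ^ (-pw)
    printf "%.6g\n", r
  }'
}

lr_at_iter fixed 3000 0.001              # 0.001 (cifar10_quick-style solver)
lr_at_iter step 3000 0.01 0.1 1000       # 1e-05
lr_at_iter inv 5000 0.01 0.0001 1 0.75   # lenet-style solver at iteration 5000
```

Running it with the values from your edited solver file shows the rate the resumed run should use at the restored iteration.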


