Brewing ImageNet
This guide is meant to get you ready to train your own models on your own data. If you just want an ImageNet-trained network, then note that since training takes a lot of energy and we hate global warming, we provide the CaffeNet model, trained as described below, in the model zoo.
Data preparation
The guide specifies all paths and assumes all commands are executed from the root Caffe directory (i.e., ~/caffe).
Through "ImageNet" we mean ILSVRC12 challenges here, but you can also easily train the entire ImageNet, just need more disk space, and a longer training time.
We assume that you have already downloaded the ImageNet training data and validation data, and that they are stored on your disk like:
/path/to/imagenet/train/n01440764/n01440764_10026.JPEG
/path/to/imagenet/val/ILSVRC2012_val_00000001.JPEG
You will first need to prepare some auxiliary data for training. This data can be downloaded by running (directly from the Caffe root directory):
./data/ilsvrc12/get_ilsvrc_aux.sh
The training and validation inputs are described in the text files train.txt and val.txt, which list all the image files and their labels. Note that we use a label indexing different from that of the ILSVRC devkit: we sort the synset names in ASCII order and then label them from 0 to 999. See synset_words.txt for the synset/name mapping.
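For reference, each line in these files pairs an image path (relative to the corresponding data root) with an integer class label. The line for the training image mentioned above would look something like this (the label value shown is illustrative; the actual value is determined by the ASCII ordering):

n01440764/n01440764_10026.JPEG 0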
You may want to resize the images to 256x256 in advance. By default we do not do this explicitly, because in a cluster environment one benefits from resizing the images in parallel, MapReduce style. For example, Yangqing used his lightweight mincepie package. If you prefer to keep things simple, you can also use shell commands, something like:
for name in /path/to/imagenet/val/*.JPEG; do
    convert -resize 256x256\! $name $name
done
Take a look at examples/imagenet/create_imagenet.sh. Set the paths to the training and validation data folders as needed, and set RESIZE=true to resize all images to 256x256 if you did not resize them in advance. Then simply create the leveldbs with examples/imagenet/create_imagenet.sh. Note that examples/imagenet/ilsvrc12_train_leveldb and examples/imagenet/ilsvrc12_val_leveldb should not exist before this execution; they will be created by the script. GLOG_logtostderr=1 simply dumps more information for you to inspect, and you can safely ignore it.
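For orientation, the variables you typically edit near the top of create_imagenet.sh look roughly like this (a sketch; names and defaults may vary slightly across Caffe versions):

EXAMPLE=examples/imagenet
DATA=data/ilsvrc12
TRAIN_DATA_ROOT=/path/to/imagenet/train/
VAL_DATA_ROOT=/path/to/imagenet/val/
RESIZE=true   # set true only if the images were not resized to 256x256 in advance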
Compute the image mean
The model requires us to subtract the image mean from each image, so we have to compute the mean. tools/compute_image_mean.cpp implements this; it is also a good example of how to manipulate the various components, such as protocol buffers, leveldbs, and logging, if you are not familiar with them. In any case, the mean computation can be carried out directly with:
./examples/imagenet/make_imagenet_mean.sh
This creates data/ilsvrc12/imagenet_mean.binaryproto.
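This mean file is then referenced from the data layers' transform_param in the network definition. A minimal sketch of the relevant fields (the 227x227 crop follows the reference CaffeNet model; check your own .prototxt):

transform_param {
  mirror: true    # only in the TRAIN phase; see the next section
  crop_size: 227
  mean_file: "data/ilsvrc12/imagenet_mean.binaryproto"
}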
Model definition
We will describe a reference implementation of the approach first proposed by Krizhevsky, Sutskever, and Hinton in their NIPS 2012 paper.
The network definition (models/bvlc_reference_caffenet/train_val.prototxt) follows the one in Krizhevsky et al. Note that if you deviate from the file paths suggested in this guide, you will need to adjust the relevant paths in the .prototxt files.
If you look carefully at models/bvlc_reference_caffenet/train_val.prototxt, you will notice several include sections specifying either phase: TRAIN or phase: TEST. These sections allow us to define two closely related networks in one file: the network used for training and the network used for testing. The two networks are almost identical, sharing all layers except those marked with include { phase: TRAIN } or include { phase: TEST }. In this case, only the input layers and one output layer differ.
Input layer differences: the training network's data input layer draws its data from examples/imagenet/ilsvrc12_train_leveldb and randomly mirrors the input image. The testing network's data layer takes its data from examples/imagenet/ilsvrc12_val_leveldb and does not perform random mirroring.
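As a minimal sketch (simplified from the reference file; older Caffe releases write the layer syntax slightly differently), the training network's data layer might look like:

layer {
  name: "data"
  type: "Data"
  top: "data"
  top: "label"
  include { phase: TRAIN }
  transform_param {
    mirror: true
    crop_size: 227
    mean_file: "data/ilsvrc12/imagenet_mean.binaryproto"
  }
  data_param {
    source: "examples/imagenet/ilsvrc12_train_leveldb"
    batch_size: 256
    backend: LEVELDB
  }
}

The TEST-phase counterpart points its source at examples/imagenet/ilsvrc12_val_leveldb and sets mirror: false.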
Output layer differences: both networks output the softmax_loss layer, which in training is used to compute the loss function and to initialize backpropagation, while in validation this loss is simply reported. The testing network also has a second output layer, accuracy, which reports the accuracy on the test set. During training, the test network is occasionally instantiated and run on the test set, producing lines like Test score #0: xxx and Test score #1: xxx. Here score 0 is the accuracy (which starts around 1/1000 = 0.001 for an untrained network) and score 1 is the loss (which starts around 7 for an untrained network).
We will also lay out a protocol buffer for running the solver. Let's make a few plans:
We will run in batches of 256 and train for a total of 450,000 iterations (about 90 epochs).
Every 1,000 iterations, we test the learned network on the validation data.
We set the initial learning rate to 0.01 and decrease it every 100,000 iterations (about 20 epochs).
Information is displayed every 20 iterations.
The network will be trained with momentum 0.9 and a weight decay of 0.0005.
Every 10,000 iterations, we take a snapshot of the current state.
Sound good? This is implemented in models/bvlc_reference_caffenet/solver.prototxt, sketched below.
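Put together, a solver implementing this plan looks roughly like the following (field names follow Caffe's SolverParameter; test_iter, gamma, and snapshot_prefix are taken from the reference file rather than the list above, and the batch size of 256 lives in train_val.prototxt, not here):

net: "models/bvlc_reference_caffenet/train_val.prototxt"
test_iter: 1000         # how many validation batches per test pass
test_interval: 1000     # test every 1,000 iterations
base_lr: 0.01
lr_policy: "step"
gamma: 0.1              # drop the learning rate by 10x ...
stepsize: 100000        # ... every 100,000 iterations
display: 20
max_iter: 450000
momentum: 0.9
weight_decay: 0.0005
snapshot: 10000
snapshot_prefix: "models/bvlc_reference_caffenet/caffenet_train"
solver_mode: GPU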
Training ImageNet
Ready? Let's start training:
./build/tools/caffe train --solver=models/bvlc_reference_caffenet/solver.prototxt
Sit back and enjoy!
On a K40 machine, every 20 iterations take about 26.5 seconds to run (while on a K20 this takes 36 seconds), so effectively about 5.2 ms per image for the full forward-backward pass. About 2 ms of this is on the forward pass, and the rest is on the backward pass. If you are interested in dissecting the computation time, you can run:
./build/tools/caffe time --model=models/bvlc_reference_caffenet/train_val.prototxt
Resume training?
We have all experienced times when the power goes out, or when we reward ourselves a little by playing Battlefield (does anyone still remember Quake?). Since we snapshot intermediate results during training, we can resume from a snapshot. This is as easy as:
./build/tools/caffe train --solver=models/bvlc_reference_caffenet/solver.prototxt --snapshot=models/bvlc_reference_caffenet/caffenet_train_iter_10000.solverstate
Here caffenet_train_iter_10000.solverstate is the solver state snapshot, which stores all the necessary information to recover the exact solver state (including parameters, momentum history, and so on).
Parting words
Hope you liked this recipe! Since the ILSVRC 2012 challenge, many researchers have gone further, changing the network architecture or fine-tuning the various parameters in the network to address new data and tasks. Caffe lets you explore different network designs more easily by simply writing different prototxt files - isn't that exciting?
And now that you have a trained network, check out how to use it with the Python interface to classify ImageNet images.