The ImageNet example lives under the examples/imagenet folder (train_caffenet.sh). Its configuration file is models/bvlc_reference_caffenet/solver.prototxt, and inside that solver.prototxt you find the corresponding model definition, models/bvlc_reference_caffenet/train_val.prototxt. So this blog post mainly describes models/bvlc_reference_caffenet/train_val.prototxt. You can also find AlexNet in the models folder. The model described here differs somewhat from the model in the paper ImageNet Classification with Deep Convolutional Neural Networks, but it is still very helpful for understanding the overall idea of the paper.
This article's main reference: ImageNet Classification with Deep Convolutional Neural Networks
[Caffe] AlexNet interpretation of the deep learning image classification model
Study notes: AlexNet & ImageNet study notes
What follows is mainly my understanding of the model in bvlc_reference_caffenet/train_val.prototxt. As explained at the beginning, this is CaffeNet, which is what Caffe calls this net.
Problems:
1. Because of my weak graphics card, I have not actually run this model yet. Writing this post without having run it always feels a bit hollow. For the batch_size part and the data sizes output by each layer, I am relying on [Caffe] AlexNet interpretation of the deep learning image classification model; once I can run the code later, I will come back and check these points.
2. The data layer again contains mean_file: "data/ilsvrc12/imagenet_mean.binaryproto". Why is this mean value subtracted?
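As a side note, here is a minimal pycaffe sketch to peek inside that mean file (assuming a compiled Caffe with the Python interface and the standard data/ilsvrc12 download). It is just the per-pixel average of the resized training images; the usual explanation is that subtracting it roughly zero-centers the inputs:

```python
import caffe
from caffe.proto import caffe_pb2

# Parse the binaryproto into a BlobProto message, then convert it to a numpy array.
blob = caffe_pb2.BlobProto()
with open('data/ilsvrc12/imagenet_mean.binaryproto', 'rb') as f:
    blob.ParseFromString(f.read())
mean = caffe.io.blobproto_to_array(blob)[0]  # drop the leading num axis

print(mean.shape)              # (3, 256, 256), BGR channel order
print(mean.mean(axis=(1, 2)))  # per-channel averages, roughly 104 / 117 / 123 for ImageNet
```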
Training phase:
In the diagram of this model, the convolution and pooling layers are merged together, whereas in the earlier MNIST example they were drawn separately, so this picture is not as easy to read at a glance.
Input data: the data blob is 256 x 3 x 227 x 227. The batch_size is set to 256, and the input is a 227x227 RGB crop, so the input is 256 x 3 x 227 x 227. Note that this 227x227 differs from the 224x224 used in the AlexNet paper.
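To check these blob shapes without doing the arithmetic by hand, here is a small pycaffe sketch (again assuming a working Python build of Caffe; instantiating the TRAIN net also requires the ImageNet LMDB referenced by the prototxt to exist):

```python
import caffe

caffe.set_mode_cpu()
# Build the training net from the prototxt; the Data layer will try to open
# the LMDB source given in train_val.prototxt, so that path has to be valid.
net = caffe.Net('models/bvlc_reference_caffenet/train_val.prototxt', caffe.TRAIN)

# Print the shape of every intermediate blob, top to bottom.
for name, blob in net.blobs.items():
    print(name, blob.data.shape)

# Among the printed blobs you should see:
#   data  (256, 3, 227, 227)
#   conv1 (256, 96, 55, 55)
```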
First convolution layer: the data becomes 256 x 96 x 55 x 55. As in the paper, there are 96 kernels (although, according to Jia Yangqing, this is a single-GPU model, not the paper's dual-GPU one). Each kernel is still 11x11, with a stride of 4. A convenient way to compute the output size of the convolution: (227 - 11)/4 + 1 = 55. That is, subtract the kernel size, see how many strides fit (this counts the intervals between kernel positions), and then add 1 to get the final spatial size of the convolved feature map. This also shows why the input size becomes 227: only then does the convolution produce a 55x55 feature map without the padding that the paper needs. My paper notes also mention padding 3 pixels; filling in those 3 pixels gives 224 + 3 = 227.
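The same output-size formula, written out as a tiny helper (a hypothetical, framework-independent function, just to make the 227 vs. 224 point easy to check):

```python
def conv_output_size(input_size, kernel_size, stride, pad=0):
    # Caffe computes the spatial output size of a convolution as
    # floor((input + 2*pad - kernel) / stride) + 1.
    return (input_size + 2 * pad - kernel_size) // stride + 1

print(conv_output_size(227, 11, 4))  # 55 -> the 55x55 feature map of conv1
print((224 - 11) / 4 + 1)            # 54.25 -> with the paper's 224x224 input the
                                     # division does not come out even without padding
```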
ReLU: a ReLU follows immediately after the convolution. The CIFAR-10 model also has ReLU, but there it only follows right after the second convolution, whereas here ReLU is applied directly after the first one. I am not sure what the consideration is; perhaps the data volume is too large and needs to be thinned out immediately to extract local features.
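To make that "thinning" intuition concrete, here is a toy sketch with made-up numbers (not the real conv1 activations): on roughly zero-mean outputs, ReLU zeroes about half of the values, which is the sparsification referred to above.

```python
import numpy as np

# Fake conv1-shaped activations (small batch of 8), standard normal just for illustration.
x = np.random.randn(8, 96, 55, 55).astype(np.float32)

# ReLU as Caffe applies it in place right after the convolution.
y = np.maximum(x, 0)

print((y == 0).mean())  # about 0.5 -> roughly half the activations are zeroed out
```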