Reproducing ResNet on CIFAR-10 with Caffe
ResNet reached a very high recognition accuracy in the 2015 ImageNet competition. Here I use Caffe on CIFAR-10 to reproduce the CIFAR experiments from Section 4.2 of the paper. This post covers: the basic module of ResNet; the Caffe implementation; and the experimental results and explanations on CIFAR-10.
The Basic Module of ResNet
The original CIFAR-10 experiments were run in Torch7; here we reproduce them with Caffe. The basic module of ResNet can be generated with the following Python code:
from __future__ import print_function
from caffe import layers as L, params as P, to_proto
from caffe.proto import caffe_pb2
import caffe

# Helper functions for building ResNet block structures.
# The function below does: bottom ---> conv ---> batchnorm ---> scale
def conv_factory(bottom, ks, n_out, stride=1, pad=0):
    conv = L.Convolution(bottom, kernel_size=ks, stride=stride, num_output=n_out, pad=pad,
                         param=[dict(lr_mult=1, decay_mult=1), dict(lr_mult=2, decay_mult=0)],
                         bias_filler=dict(type='constant', value=0),
                         weight_filler=dict(type='gaussian', std=0.01))
    batch_norm = L.BatchNorm(conv, in_place=True,
                             param=[dict(lr_mult=0, decay_mult=0),
                                    dict(lr_mult=0, decay_mult=0),
                                    dict(lr_mult=0, decay_mult=0)])
    scale = L.Scale(batch_norm, bias_term=True, in_place=True)
    return scale

# bottom ---> conv ---> batchnorm ---> scale ---> relu
def conv_factory_relu(bottom, ks, n_out, stride=1, pad=0):
    conv = L.Convolution(bottom, kernel_size=ks, stride=stride, num_output=n_out, pad=pad,
                         param=[dict(lr_mult=1, decay_mult=1), dict(lr_mult=2, decay_mult=0)],
                         bias_filler=dict(type='constant', value=0),
                         weight_filler=dict(type='gaussian', std=0.01))
    batch_norm = L.BatchNorm(conv, in_place=True,
                             param=[dict(lr_mult=0, decay_mult=0),
                                    dict(lr_mult=0, decay_mult=0),
                                    dict(lr_mult=0, decay_mult=0)])
    scale = L.Scale(batch_norm, bias_term=True, in_place=True)
    relu = L.ReLU(scale, in_place=True)
    return relu

# Residual building block! Implements option (A) from Section 3.3 of the paper.
# The input is passed through two 3x3 convolution layers. Currently this block
# only supports stride == 1 or stride == 2. When stride is 2, the shortcut branch
# actually does pooling. Instead of simply pooling, which may cause a
# representational bottleneck as described in Inception v3, we use two parallel
# branches (pooling and convolution) and add them together. Note that the pooling
# branch may have fewer channels than the convolution branch, so we need to
# zero-pad it along the channel dimension. To the best of our knowledge there is
# no existing Caffe layer that supports this, so an implementation in C++ and
# CUDA (PadChannel) is given later in this post.
def residual_block(bottom, num_filters, stride=1):
    if stride == 1:
        conv1 = conv_factory_relu(bottom, 3, num_filters, 1, 1)
        conv2 = conv_factory(conv1, 3, num_filters, 1, 1)
        add = L.Eltwise(bottom, conv2, operation=P.Eltwise.SUM)
        return add
    elif stride == 2:
        conv1 = conv_factory_relu(bottom, 3, num_filters, 2, 1)
        conv2 = conv_factory(conv1, 3, num_filters, 1, 1)
        pool = L.Pooling(bottom, pool=P.Pooling.AVE, kernel_size=2, stride=2)
        pad = L.PadChannel(pool, num_channels_to_pad=num_filters / 2)
        add = L.Eltwise(conv2, pad, operation=P.Eltwise.SUM)
        return add
    else:
        raise Exception('Currently, stride must be either 1 or 2.')

# Generate the ResNet CIFAR-10 train && test prototxt. n_size controls the depth:
# the total number of layers is 6 * n_size + 2.
# ========================== NOTE HERE ==============================
# I don't know of a way to emit the TRAIN and TEST data layers simultaneously
# with NetSpec, so you have to add the TRAIN data layer yourself after the
# script has generated the prototxt!
def resnet_cifar(train_lmdb, test_lmdb, mean_file, batch_size=100, n_size=3):
    data, label = L.Data(source=test_lmdb, backend=P.Data.LMDB, batch_size=batch_size, ntop=2,
                         transform_param=dict(mean_file=mean_file, crop_size=28),
                         include=dict(phase=getattr(caffe_pb2, 'TEST')))
    residual = conv_factory_relu(data, 3, 16, 1, 1)

    # --------------> 16 filters, 1st group
    for i in xrange(n_size):
        residual = residual_block(residual, 16)

    # --------------> 32 filters, 2nd group
    residual = residual_block(residual, 32, 2)
    for i in xrange(n_size - 1):
        residual = residual_block(residual, 32)

    # --------------> 64 filters, 3rd group
    residual = residual_block(residual, 64, 2)
    for i in xrange(n_size - 1):
        residual = residual_block(residual, 64)

    # --------------> end of residual blocks
    global_pool = L.Pooling(residual, pool=P.Pooling.AVE, global_pooling=True)
    fc = L.InnerProduct(global_pool,
                        param=[dict(lr_mult=1, decay_mult=1), dict(lr_mult=2, decay_mult=1)],
                        num_output=10,
                        bias_filler=dict(type='constant', value=0),
                        weight_filler=dict(type='gaussian', std=0.01))
    loss = L.SoftmaxWithLoss(fc, label)
    acc = L.Accuracy(fc, label, include=dict(phase=getattr(caffe_pb2, 'TEST')))
    return to_proto(loss, acc)

def make_net(tgt_file):
    with open(tgt_file, 'w') as f:
        print('name: "resnet_cifar10"', file=f)
        print(resnet_cifar('dataset/cifar10_train_lmdb', 'dataset/cifar10_test_lmdb',
                           'dataset/mean.proto', n_size=9), file=f)

if __name__ == '__main__':
    tgt_file = 'D:/vsprojects/caffe/models/ucas_resnet_cifar10/res56_cifar_train_test.prototxt'
    make_net(tgt_file)
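The script above hard-codes n_size=9, which gives a 6 * 9 + 2 = 56-layer network. The experiments below use the 20-layer variant, which corresponds to n_size=3 (6 * 3 + 2 = 20). As a small illustration (a hypothetical addition to the end of the script above, so that resnet_cifar and the print_function import are in scope; the output filename matches the one referenced by the solver later in this post):

# Hypothetical: generate the 20-layer prototxt used in the experiments below.
with open('res20_cifar_train_test.prototxt', 'w') as f:
    print('name: "resnet_cifar10"', file=f)
    print(resnet_cifar('dataset/cifar10_train_lmdb', 'dataset/cifar10_test_lmdb',
                       'dataset/mean.proto', n_size=3), file=f)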
Caffe Implementation
First of all, every convolution kernel is 3x3. When stride=1 inside a ResNet block, the padding keeps the feature map dimensions unchanged, so the input can be added directly to the branch that passes through the two convolutions. When stride=2, the feature map size is halved along the two-convolution branch, so the input must be halved by pooling to match that branch; the pooling branch, however, may end up with fewer channels than the convolution branch. The original paper describes two ways to solve this: (1) a 1x1 convolution linear projection with no activation function, whose number of kernels is set equal to the number of output channels of the convolution branch; (2) padding zeros along the channel dimension, i.e. zero-padding. Here I take the latter approach. The Caffe implementation of it (the C++ code behind L.PadChannel in the Python script above) is given below, starting with pad_channel_layer.hpp:
#ifndef CAFFE_PAD_CHANNEL_LAYER_HPP_
#define CAFFE_PAD_CHANNEL_LAYER_HPP_

#include "caffe/blob.hpp"
#include "caffe/layer.hpp"
#include "caffe/proto/caffe.pb.h"

namespace caffe {

/*
 * @brief Zero-padding channel to extend the number of channels.
 *
 * Note: back-propagation just drops the derivatives of the padded channels.
 */
template <typename Dtype>
class PadChannelLayer : public Layer<Dtype> {
 public:
  explicit PadChannelLayer(const LayerParameter& param)
      : Layer<Dtype>(param) {}
  virtual void LayerSetUp(const vector<Blob<Dtype>*>& bottom,
      const vector<Blob<Dtype>*>& top);
  virtual void Reshape(const vector<Blob<Dtype>*>& bottom,
      const vector<Blob<Dtype>*>& top);

  virtual inline const char* type() const { return "PadChannel"; }
  virtual inline int ExactNumBottomBlobs() const { return 1; }
  virtual inline int ExactNumTopBlobs() const { return 1; }

 protected:
  virtual void Forward_cpu(const vector<Blob<Dtype>*>& bottom,
      const vector<Blob<Dtype>*>& top);
  virtual void Backward_cpu(const vector<Blob<Dtype>*>& top,
      const vector<bool>& propagate_down, const vector<Blob<Dtype>*>& bottom);
  virtual void Forward_gpu(const vector<Blob<Dtype>*>& bottom,
      const vector<Blob<Dtype>*>& top);
  virtual void Backward_gpu(const vector<Blob<Dtype>*>& top,
      const vector<bool>& propagate_down, const vector<Blob<Dtype>*>& bottom);

  int num_channels_to_pad_;
};

}  // namespace caffe

#endif  // CAFFE_PAD_CHANNEL_LAYER_HPP_
pad_channel_layer.cpp
#include "caffe/layers/pad_channel_layer.hpp" namespace Caffe {Template <typename dtype> void Padchannell Ayer<dtype>::layersetup (const vector<blob<dtype>*>& Bottom, const VECTOR<BLOB<DTYPE> ;*>& top) {check_ne (top[0], bottom[0]) << this->type () << "Layer does not" "a
Llow in-place computation. ";
Num_channels_to_pad_ = This->layer_param_.pad_channel_param (). Num_channels_to_pad ();
CHECK_GT (num_channels_to_pad_, 0) << "num channels to pad must greater than 0!"; } template <typename dtype> void Padchannellayer<dtype>::reshape (const VECTOR<BLOB<DTYPE>*&G t;& Bottom, const vector<blob<dtype>*>& top) {vector<int> Top_shape = bottom[0
]->shape ();
TOP_SHAPE[1] + = Num_channels_to_pad_;
Top[0]->reshape (Top_shape); } template <typename dtype> void PadchannellayeR<DTYPE>::FORWARD_CPU (const vector<blob<dtype>*>& Bottom, const vector<blob<dtype>*
>& top) {Const dtype* bottom_data = Bottom[0]->cpu_data ();
dtype* top_data = Top[0]->mutable_cpu_data ();
int num = Bottom[0]->num ();
int channels = Bottom[0]->channels ();
int Dim = Bottom[0]->height () * Bottom[0]->width ();
int Channel_by_dim = channels * Dim;
for (int n = 0; n < num; n++) {caffe_copy (Channel_by_dim, Bottom_data, Top_data);
Bottom_data + = Channel_by_dim;
Top_data + = Channel_by_dim;
Caffe_set (Num_channels_to_pad_ * Dim, Dtype (0), top_data);
Top_data + = Num_channels_to_pad_ * Dim; } template <typename dtype> void padchannellayer<dtype>::backward_cpu (const VECTOR<BLOB<D type>*>& Bottom, const vector<bool>& propagate_down, const vector<blob<dtype>*>& top) {Const dtype* Top_diff = Top[0]->cpu_diff ();
dtype* Bottom_diff = Bottom[0]->mutable_cpu_diff ();
int num = Bottom[0]->num ();
int channels = Bottom[0]->channels ();
int Dim = Bottom[0]->height () * Bottom[0]->width ();
int Channel_by_dim = channels * Dim;
for (int n = 0; n < num, n++) {//Just drop the padding derivatives part.
Caffe_copy (Channel_by_dim, Top_diff, Bottom_diff);
Top_diff + = (channels + num_channels_to_pad_) * Dim;
Bottom_diff + = Channel_by_dim;
} instantiate_class (Padchannellayer);
Register_layer_class (Padchannel);
}//Namespace Caffe
pad_channel_layer.cu
#include "caffe/layers/pad_channel_layer.hpp" namespace Caffe {//Copy (one line each thread) from one array to Ano
ther, with arbitrary//strides in the last two dimensions. Template <typename dtype> __global__ void pad_forward_kernel (const int dst_count, const int src_channels, const
int dst_channels, const int Dim, const dtype* SRC, dtype* DST) {cuda_kernel_loop (index, Dst_count)
{int num = index/(Dim * dst_channels);
int dst_c = Index/dim% Dst_channels;
int Pixel_pos = index% Dim;
if (Dst_c < src_channels) Dst[index] = Src[num * Src_channels * Dim + Dst_c * Dim + Pixel_pos];
else Dst[index] = dtype (0); } template <typename dtype> void Padchannellayer<dtype>::forward_gpu (const VECTOR<BLOB<D type>*>& Bottom, const vector<blob<dtype>*>& top) {const dtype* bottom_data = Bottom[0]->gpu_data ();
dtype* top_data = Top[0]->mutable_gpu_data ();
int src_channels = Bottom[0]->channels ();
int Dim = Bottom[0]->height () * Bottom[0]->width ();
int dst_channels = src_channels + num_channels_to_pad_;
const int dst_count = Top[0]->count ();
Pad_forward_kernel<dtype> << <caffe_get_blocks (dst_count), caffe_cuda_num_threads >> > (
Dst_count, Src_channels, Dst_channels, Dim, Bottom_data, Top_data);
Cuda_post_kernel_check; } template <typename dtype> __global__ void pad_backward_kernel (const int bottom_count, const int BOTTOM_CH annels, const int top_channels, const int Dim, const dtype* Top, dtype* bottom) {cuda_kernel_loop (Ind
Ex, bottom_count) {int num = index/(Dim * bottom_channels);
int bottom_c = Index/dim% Bottom_channels; int Pixel_pos = index% Dim;
Bottom[index] = Top[num * Top_channels * Dim + Bottom_c * Dim + Pixel_pos]; } template <typename dtype> void Padchannellayer<dtype>::backward_gpu (const VECTOR<BLOB<D type>*>& Top, const vector<bool>& Propagate_down, const vector<blob<dtype>*>&
Bottom) {Const dtype* Top_diff = Top[0]->gpu_diff ();
dtype* Bottom_diff = Bottom[0]->mutable_gpu_diff ();
int bottom_count = Bottom[0]->count ();
int bottom_channels = Bottom[0]->channels ();
int Dim = Bottom[0]->height () * Bottom[0]->width ();
int top_channels = bottom_channels + num_channels_to_pad_;
Pad_backward_kernel<dtype> << <caffe_get_blocks (bottom_count), caffe_cuda_num_threads >> > (
Bottom_count, Bottom_channels, Top_channels, Dim, Top_diff, Bottom_diff);
Cuda_post_kernel_check; } Instantiate_layer_gpu_funcs (PadchannellAyer);
}//Namespace Caffe
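For comparison, option (B) of the paper replaces the zero-padding shortcut with a 1x1 convolution projection (no nonlinearity) whose number of outputs matches the convolution branch, which avoids the custom PadChannel layer entirely. Below is a minimal sketch of such a block; it reuses the conv_factory and conv_factory_relu helpers from the Python script above, and the function name residual_block_projection is hypothetical (this variant is shown only for illustration, it is not the one trained in this post).

from caffe import layers as L, params as P

# Hypothetical option (B) residual block with a projection shortcut.
# Assumes conv_factory / conv_factory_relu from the script above are in scope.
def residual_block_projection(bottom, num_filters, stride=1):
    conv1 = conv_factory_relu(bottom, 3, num_filters, stride, 1)
    conv2 = conv_factory(conv1, 3, num_filters, 1, 1)
    if stride == 1:
        shortcut = bottom  # identity shortcut, shapes already match
    else:
        # 1x1 convolution (with BatchNorm/Scale inside conv_factory, no ReLU):
        # downsamples spatially and raises the channel count at the same time
        shortcut = conv_factory(bottom, 1, num_filters, stride, 0)
    return L.Eltwise(conv2, shortcut, operation=P.Eltwise.SUM)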
Experimental Results
1. Using the above script, generate a 20-layer ResNet for CIFAR-10. The full network definition is too long to reproduce here; just note that because the original images are 32x32, I randomly crop 28x28 patches and mirror them. This differs from the original paper, which pads 4 pixels on each side of the image and then randomly crops 32x32 (a numpy sketch of that offline padding is given after the data layer definitions below). The input part of the network definition and the solver.prototxt are as follows:
name: "resnet_cifar10"
layer {
  name: "Input"
  type: "Data"
  top: "Data1"
  top: "label"
  include {
    phase: TRAIN
  }
  transform_param {
    mean_file: "dataset/mean.proto"
    crop_size: 28
    mirror: true
  }
  data_param {
    source: "dataset/cifar10_train_lmdb"
    batch_size: 100
    backend: LMDB
  }
}
layer {
  name: "Input"
  type: "Data"
  top: "Data1"
  top: "label"
  include {
    phase: TEST
  }
  transform_param {
    mean_file: "dataset/mean.proto"
    crop_size: 28
  }
  data_param {
    source: "dataset/cifar10_test_lmdb"
    batch_size: 100
    backend: LMDB
  }
}
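As an aside: Caffe's transform_param only supports mean subtraction, cropping and mirroring, so to match the paper's augmentation exactly (pad 4 pixels on each side, then randomly crop 32x32) the padding would have to be done offline when building the LMDB. A rough numpy sketch, assuming images are stored as H x W x C uint8 arrays (pad_image is a hypothetical helper, not part of my pipeline):

import numpy as np

def pad_image(img, pad=4):
    # img: H x W x C uint8 array; zero-pad `pad` pixels on each spatial side,
    # so that a later random 32x32 crop reproduces the paper's augmentation.
    return np.pad(img, ((pad, pad), (pad, pad), (0, 0)), mode='constant')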
Solver.prototxt is as follows
net: "Res20_cifar_train_test.prototxt" test_iter:100 # Conver the whole test set. 100 * 100 = 10
Images. TEST_INTERVAL:500 # Each are one epoch, test after each epoch base_lr:0.1 momentum:0.9 weight_decay:0.0001 Average_ loss:100 lr_policy: "multistep" stepvalue:40000 stepvalue:80000 gamma:0.1 display:100