Wide Residual Networks (WRN)
Notes based on Sergey Zagoruyko's paper "Wide Residual Networks".

Residual Networks (ResNet)
In recent years, residual networks (ResNet) have achieved good results on many benchmark datasets; the network structure is shown in the following diagram.
The network is built by stacking residual modules (residual blocks).
The skip connection lets the signal and gradient pass through directly, which helps avoid the vanishing-gradient problem.
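As a rough illustration (not the exact block used in the paper), a residual block in the Keras functional API simply adds the block input back onto the output of its convolutions; the layer sizes below are placeholders:

from keras.layers import Input, Conv2D, Activation, Add
from keras.models import Model

def simple_residual_block(x, filters):
    # main path: two 3x3 convolutions
    y = Conv2D(filters, (3, 3), padding='same', activation='relu')(x)
    y = Conv2D(filters, (3, 3), padding='same')(y)
    # skip connection: the input is added back unchanged, so the gradient
    # has a direct path around the convolution weights
    out = Add()([x, y])
    return Activation('relu')(out)

# the input must already have `filters` channels for the addition to be valid
inp = Input((32, 32, 16))
model = Model(inp, simple_residual_block(inp, 16))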
However, residual networks focus too much on pursuing depth and ignore problems within the residual module itself. As more modules are stacked, the performance of the model does not improve significantly, which suggests that some of the modules do not actually play their intended role. As the paper puts it:
"As gradient flows through the network there is nothing to force it to go through residual block weights and it can avoid learning anything during training, so it is possible that there is either only a few blocks that learn useful representations."
Therefore, the authors of the paper set out to propose a more effective way to improve the residual module:
"Our goal is to explore a much richer set of network architectures of ResNet blocks and thoroughly examine how several other different aspects besides the order of activations affect performance."

Wide Residual Network (WRN)
WRN adds a widening coefficient k to the original residual module, multiplying the number of convolution kernels in each layer. As the article explains, this allows the number of layers to be reduced without reducing the number of model parameters, and it speeds up computation.
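To make the bookkeeping concrete: a network named WRN-n-k has total depth n and widening factor k. The small helper below is an illustrative sketch (not code from the paper) relating the group widths and the depth to N, the number of residual blocks per group:

def wrn_group_widths(k):
    # the three groups of a WRN use 16*k, 32*k and 64*k convolution kernels
    return [16 * k, 32 * k, 64 * k]

def wrn_depth(N):
    # with N residual blocks per group the total depth is n = 6*N + 4,
    # the inverse of N = (n - 4) / 6 used in the code further below
    return 6 * N + 4

print(wrn_group_widths(10))  # [160, 320, 640]
print(wrn_depth(4))          # 28, i.e. WRN-28-10 when k = 10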
"In particular, we present wider deep residual networks that significantly improved, having 50 times less layers and being more than 2 times faster."
The model structure is shown in the following table.
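For reference, the structure described in the paper is roughly the following (output sizes are for 32x32 inputs; each block contains two 3x3 convolutions, and N is the number of blocks per group):

group      output size   block
conv1      32 x 32       [3x3, 16]
conv2      32 x 32       [3x3, 16*k; 3x3, 16*k] x N
conv3      16 x 16       [3x3, 32*k; 3x3, 32*k] x N
conv4      8 x 8         [3x3, 64*k; 3x3, 64*k] x N
avg-pool   1 x 1         [8 x 8]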
Experiment
The experimental results given in the paper show that wide networks with far fewer layers match or outperform very deep, thin ResNets on CIFAR-10, CIFAR-100 and SVHN.
Code
# -*- coding: utf-8 -*-
"""
Created on Tue Nov 20:43:10

@author: Sky_gao
"""
from keras.models import Model
from keras.layers import Input, Add, Activation, Dropout, Flatten, Dense
from keras.layers.convolutional import Convolution2D, MaxPooling2D, AveragePooling2D
from keras.layers.normalization import BatchNormalization
from keras import backend as K


def initial_conv(input):
    # stem: a single 3x3 convolution with 16 output channels
    x = Convolution2D(16, (3, 3), padding='same', kernel_initializer='he_normal',
                      use_bias=False)(input)
    channel_axis = 1 if K.image_data_format() == "channels_first" else -1
    x = BatchNormalization(axis=channel_axis, momentum=0.1, epsilon=1e-5,
                           gamma_initializer='uniform')(x)
    x = Activation('relu')(x)
    return x


def expand_conv(init, base, k, strides=(1, 1)):
    # widens the feature maps to base * k channels (optionally downsampling);
    # a 1x1 convolution on the skip path matches the new shape
    x = Convolution2D(base * k, (3, 3), padding='same', strides=strides,
                      kernel_initializer='he_normal', use_bias=False)(init)
    channel_axis = 1 if K.image_data_format() == "channels_first" else -1
    x = BatchNormalization(axis=channel_axis, momentum=0.1, epsilon=1e-5,
                           gamma_initializer='uniform')(x)
    x = Activation('relu')(x)
    x = Convolution2D(base * k, (3, 3), padding='same',
                      kernel_initializer='he_normal', use_bias=False)(x)
    skip = Convolution2D(base * k, (1, 1), padding='same', strides=strides,
                         kernel_initializer='he_normal', use_bias=False)(init)
    m = Add()([x, skip])
    return m


def conv1_block(input, k=1, dropout=0.0):
    # first-group residual block: two 3x3 convolutions with 16 * k channels (pre-activation)
    init = input
    channel_axis = 1 if K.image_data_format() == "channels_first" else -1
    x = BatchNormalization(axis=channel_axis, momentum=0.1, epsilon=1e-5,
                           gamma_initializer='uniform')(input)
    x = Activation('relu')(x)
    x = Convolution2D(16 * k, (3, 3), padding='same', kernel_initializer='he_normal',
                      use_bias=False)(x)
    if dropout > 0.0:
        x = Dropout(dropout)(x)
    x = BatchNormalization(axis=channel_axis, momentum=0.1, epsilon=1e-5,
                           gamma_initializer='uniform')(x)
    x = Activation('relu')(x)
    x = Convolution2D(16 * k, (3, 3), padding='same', kernel_initializer='he_normal',
                      use_bias=False)(x)
    m = Add()([init, x])
    return m


def conv2_block(input, k=1, dropout=0.0):
    # second-group residual block: two 3x3 convolutions with 32 * k channels
    init = input
    channel_axis = 1 if K.image_data_format() == "channels_first" else -1
    x = BatchNormalization(axis=channel_axis, momentum=0.1, epsilon=1e-5,
                           gamma_initializer='uniform')(input)
    x = Activation('relu')(x)
    x = Convolution2D(32 * k, (3, 3), padding='same', kernel_initializer='he_normal',
                      use_bias=False)(x)
    if dropout > 0.0:
        x = Dropout(dropout)(x)
    x = BatchNormalization(axis=channel_axis, momentum=0.1, epsilon=1e-5,
                           gamma_initializer='uniform')(x)
    x = Activation('relu')(x)
    x = Convolution2D(32 * k, (3, 3), padding='same', kernel_initializer='he_normal',
                      use_bias=False)(x)
    m = Add()([init, x])
    return m


def conv3_block(input, k=1, dropout=0.0):
    # third-group residual block: two 3x3 convolutions with 64 * k channels
    init = input
    channel_axis = 1 if K.image_data_format() == "channels_first" else -1
    x = BatchNormalization(axis=channel_axis, momentum=0.1, epsilon=1e-5,
                           gamma_initializer='uniform')(input)
    x = Activation('relu')(x)
    x = Convolution2D(64 * k, (3, 3), padding='same', kernel_initializer='he_normal',
                      use_bias=False)(x)
    if dropout > 0.0:
        x = Dropout(dropout)(x)
    x = BatchNormalization(axis=channel_axis, momentum=0.1, epsilon=1e-5,
                           gamma_initializer='uniform')(x)
    x = Activation('relu')(x)
    x = Convolution2D(64 * k, (3, 3), padding='same', kernel_initializer='he_normal',
                      use_bias=False)(x)
    m = Add()([init, x])
    return m


def create_wide_residual_network(input_dim, nb_classes=100, N=2, k=1, dropout=0.0, verbose=1):
    """
    Creates a Wide Residual Network with the specified parameters.

    :param input_dim: input shape of the network
    :param nb_classes: number of output classes
    :param N: number of residual blocks per group; compute N = (n - 4) / 6
              Example : for a depth of 16, N = (16 - 4) / 6 = 2
              Example2: for a depth of 28, N = (28 - 4) / 6 = 4
              Example3: for a depth of 40, N = (40 - 4) / 6 = 6
    :param k: width (widening factor) of the network
    :param dropout: adds dropout if the value is greater than 0.0
    :param verbose: print debug info describing the created WRN
    :return: a Keras Model
    """
    channel_axis = 1 if K.image_data_format() == "channels_first" else -1

    ip = Input(shape=input_dim)

    x = initial_conv(ip)
    nb_conv = 4

    # group 1: 16 * k channels
    x = expand_conv(x, 16, k)
    for i in range(N - 1):
        x = conv1_block(x, k, dropout)
        nb_conv += 2

    x = BatchNormalization(axis=channel_axis, momentum=0.1, epsilon=1e-5,
                           gamma_initializer='uniform')(x)
    x = Activation('relu')(x)

    # group 2: 32 * k channels, spatial downsampling by 2
    x = expand_conv(x, 32, k, strides=(2, 2))
    for i in range(N - 1):
        x = conv2_block(x, k, dropout)
        nb_conv += 2

    x = BatchNormalization(axis=channel_axis, momentum=0.1, epsilon=1e-5,
                           gamma_initializer='uniform')(x)
    x = Activation('relu')(x)

    # group 3: 64 * k channels, spatial downsampling by 2
    x = expand_conv(x, 64, k, strides=(2, 2))
    for i in range(N - 1):
        x = conv3_block(x, k, dropout)
        nb_conv += 2

    x = BatchNormalization(axis=channel_axis, momentum=0.1, epsilon=1e-5,
                           gamma_initializer='uniform')(x)
    x = Activation('relu')(x)

    x = AveragePooling2D((8, 8))(x)
    x = Flatten()(x)
    x = Dense(nb_classes, activation='softmax')(x)

    model = Model(ip, x)

    if verbose:
        print("Wide Residual Network-%d-%d created." % (nb_conv, k))
    return model


if __name__ == "__main__":
    from keras.utils import plot_model
    from keras.layers import Input
    from keras.models import Model

    init = (32, 32, 3)

    # N=2, k=2 corresponds to a WRN-16-2 (cf. the output filename below)
    wrn_28_10 = create_wide_residual_network(init, nb_classes=10, N=2, k=2, dropout=0.0)

    wrn_28_10.summary()

    plot_model(wrn_28_10, "WRN-16-2.png", show_shapes=True, show_layer_names=True)
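For completeness, a minimal training sketch (assuming the standard Keras CIFAR-10 loader; the optimizer settings and epoch count are placeholders, not the training schedule used in the paper):

from keras.datasets import cifar10
from keras.optimizers import SGD
from keras.utils import to_categorical

# load and normalize CIFAR-10, one-hot encode the labels
(x_train, y_train), (x_test, y_test) = cifar10.load_data()
x_train = x_train.astype('float32') / 255.0
x_test = x_test.astype('float32') / 255.0
y_train = to_categorical(y_train, 10)
y_test = to_categorical(y_test, 10)

model = create_wide_residual_network((32, 32, 3), nb_classes=10, N=2, k=2, dropout=0.0)
model.compile(optimizer=SGD(lr=0.1, momentum=0.9, nesterov=True),
              loss='categorical_crossentropy',
              metrics=['accuracy'])
model.fit(x_train, y_train, batch_size=128, epochs=1,
          validation_data=(x_test, y_test))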