ResNet v1: Deep Residual Learning for Image Recognition
Conv → BN → ReLU
The bottleneck structure is as follows (the authors switched to the bottleneck design out of concern for the training time they could afford):
The two structures have similar computational complexity, but the left one operates on 64 dimensions while the right one operates on 256, a 4x difference.
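The "similar complexity" claim can be sanity-checked by counting weights. A quick sketch (biases and BN parameters ignored; the channel counts are the ones stated above):

```python
def conv_params(k, c_in, c_out):
    """Number of weights in a k x k convolution."""
    return k * k * c_in * c_out

# Plain block: two 3x3 convs on 64 channels.
plain = 2 * conv_params(3, 64, 64)

# Bottleneck: 1x1 reduce 256->64, 3x3 on 64, 1x1 expand 64->256.
bottleneck = (conv_params(1, 256, 64)
              + conv_params(3, 64, 64)
              + conv_params(1, 64, 256))

print(plain, bottleneck)  # 73728 69632 -- roughly the same cost
```

Despite touching 4x wider features, the bottleneck has slightly fewer weights than the plain block.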
ResNet v2: Identity Mappings in Deep Residual Networks
(a) is the structure adopted in the first paper.
This paper focuses on creating a direct path for propagating information, not just within a single residual unit but through the entire network.
The paper finds that when h(x) and f(y) are both identity mappings, the signal can propagate directly from one unit to the next, in both the forward and the backward pass, so the structure in (b) is designed.
Experiments find that making h(x) a 1x1 convolution or a gate gives worse results than a direct identity mapping.
Changing from post-activation to pre-activation: previously ReLU came after Conv; now ReLU comes before Conv.
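A toy scalar sketch of the two orderings ("conv" here is just a scalar multiply and BN is omitted; only the operation order is the point, this is not the paper's implementation):

```python
def relu(x):
    return max(0.0, x)

def post_activation_unit(x, w1, w2):
    # v1: (conv -> ReLU -> conv), add, then ReLU.
    residual = w2 * relu(w1 * x)
    return relu(x + residual)   # a ReLU sits on the shortcut path

def pre_activation_unit(x, w1, w2):
    # v2: (ReLU -> conv) twice, then add -- nothing after the sum.
    residual = w2 * relu(w1 * relu(x))
    return x + residual         # identity path is untouched
```

With a negative input, the post-activation unit clips the shortcut signal (`post_activation_unit(-1.0, 1.0, 1.0)` is `0.0`), while the pre-activation unit passes the input straight through (`pre_activation_unit(-1.0, 1.0, 1.0)` is `-1.0`).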
(1) The output of any unit can be written as the previous unit's output plus a residual: x_{l+1} = x_l + F(x_l, W_l)
(2) The output of any deeper unit L can be written as the output of a shallower unit l plus the sum of all residuals in between: x_L = x_l + sum_{i=l}^{L-1} F(x_i, W_i). A plain network, in contrast, expresses x_L as a product of layer-by-layer transformations.
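Equation (2) can be checked numerically by unrolling the recursion x_{l+1} = x_l + F(x_l). A sketch with a made-up scalar residual function F:

```python
def F(x, w):
    # arbitrary toy residual function, stand-in for a conv branch
    return w * x * 0.5

ws = [0.3, -0.2, 0.7, 0.1]
x0 = 2.0

# forward recursion: x_{l+1} = x_l + F(x_l, w_l)
xs = [x0]
for w in ws:
    xs.append(xs[-1] + F(xs[-1], w))

# closed form: x_L = x_0 + sum of all intermediate residuals
closed = x0 + sum(F(xs[i], ws[i]) for i in range(len(ws)))
assert abs(xs[-1] - closed) < 1e-12
```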
(3) Differentiating the loss: d(loss)/dx_l = d(loss)/dx_L * (1 + d/dx_l sum_{i=l}^{L-1} F(x_i, W_i)). The summation term will not always be -1 across a mini-batch, so the gradient will not vanish, even when the weights are arbitrarily small.
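The "gradient does not vanish" point can be illustrated with a scalar chain: a residual net's gradient is a product of (1 + w_i) terms, while a plain net's is a product of w_i terms. A sketch with tiny weights:

```python
# 50 layers with arbitrarily small weights
w = [1e-3] * 50

plain_grad = 1.0
residual_grad = 1.0
for wi in w:
    plain_grad *= wi          # plain net: x_{l+1} = w_l * x_l
    residual_grad *= 1 + wi   # residual net: x_{l+1} = x_l + w_l * x_l

print(plain_grad)     # 1e-150, effectively vanished
print(residual_grad)  # ~1.05, still O(1)
```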
Comparing different shortcut designs, a simple identity works best; a 1x1 conv performs worse, though it can be used to match mismatched dimensions.
Comparing different orderings of ReLU, BN, and Conv, the "full pre-activation" arrangement below performs best.
Appendix:
For the input of the first residual unit: because it is preceded by a standalone conv layer, we need to apply an activation to that conv layer's output.
For the last residual unit, an extra activation is applied after the addition.
That concludes the paper walkthrough; next is the code implementation.
Code implementation
def resnet_v2_50(inputs):
    blocks = ...
    return resnet_v2(inputs, blocks)

def resnet_v2(inputs, blocks):
    net = conv2d(inputs, 64, 7, stride=2, scope='conv1')  # 7x7/2 stem conv
    net = max_pool2d(net, [3, 3], stride=2, scope='pool1')
    net = stack_blocks_dense(net, blocks)
    # extra BN + ReLU after the last residual unit (the appendix note above)
    net = batch_norm(net, activation_fn=tf.nn.relu, scope='postnorm')
    return net
def stack_blocks_dense(net, blocks):
    for block in blocks:
        for i, unit in enumerate(block.args):
            if i < len(block.args) - 1:
                net = block.unit_fn(net, stride=1, ...)  # inner units keep stride 1
            else:
                net = block.unit_fn(net, stride=2, ...)  # last unit of the block downsamples
    return net
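The stacking logic can be exercised end to end with stub layers. A runnable miniature (the tuple layout and names here are made up for illustration; this is not TF-Slim's actual `Block` API):

```python
def make_unit(name):
    # stub "residual unit": records its name and stride instead of computing
    def unit(net, stride):
        return net + [(name, stride)]
    return unit

def stack_blocks_dense(net, blocks):
    # blocks: list of (unit_fn, num_units, block_stride)
    for unit_fn, num_units, block_stride in blocks:
        for i in range(num_units):
            # stride 1 everywhere except the last unit of the block
            stride = block_stride if i == num_units - 1 else 1
            net = unit_fn(net, stride)
    return net

blocks = [(make_unit('block1'), 3, 2), (make_unit('block2'), 4, 2)]
net = stack_blocks_dense([], blocks)
print(net[2])  # ('block1', 2): the last unit of block1 downsamples
```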
TensorFlow-Slim implementation of resnet_v2