TensorFlow Implements Classic Deep Learning Networks (4): TensorFlow Implements ResNet
ResNet (Residual Neural Network) was proposed by Kaiming He's team at Microsoft Research. Using residual units, they successfully trained a 152-layer neural network that shone at ILSVRC 2015, taking first place with a 3.57% top-5 error rate, an outstanding result. The ResNet structure can greatly accelerate the training of ultra-deep networks while substantially improving model accuracy. The paper, "Deep Residual Learning for Image Recognition", also won the CVPR 2016 Best Paper Award, a well-deserved honor. This article introduces the fundamentals of ResNet and how to implement it in TensorFlow.
ResNet's initial inspiration came from the following question: network depth has a great impact on final classification and recognition accuracy, so the natural idea is to design ever deeper networks. In practice, however, this fails. When a regular stacked network (a "plain" network) becomes very deep, its performance degrades: accuracy rises at first, then saturates, and further increases in depth actually cause accuracy to drop.
ResNet residual network:
• Core component: the skip/shortcut connection
• Plain net: fits an arbitrary target mapping H(x)
• Residual net:
• Fits a residual mapping F(x), with H(x) = F(x) + x
• F(x) is the residual mapping, defined relative to the identity
• When the optimal mapping H(x) is close to the identity, small perturbations are easy to capture
This degradation is not an overfitting problem, because the error increases on the training set, not only on the test set. To solve it, the authors propose the residual structure:
The idea is to pass a layer's input directly to a later layer, that is, to add an identity mapping (identity shortcut); this was the source of ResNet's inspiration. Suppose a neural network's input is x and its desired output is H(x). If we feed the input x straight through to the output as an initial result, the function to be learned changes from H(x) to the residual F(x) = H(x) - x, and the output becomes F(x) + x. The figure above shows the ResNet learning unit, which effectively changes the learning target. The idea also originates from residual vector coding in image processing: through this reformulation, the problem is decomposed into residual problems at multiple scales, which works very well for optimizing training. A minimal code sketch of such a unit follows.
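To make the structure concrete, here is a minimal sketch of a two-layer residual unit in TensorFlow 1.x. It is for illustration only and is not the implementation used later in this article; the helper name simple_residual_unit and the use of tf.layers are my own choices, and the identity shortcut assumes the input already has the target number of channels.

import tensorflow as tf

def simple_residual_unit(x, channels):
    # F(x): two 3x3 convolutions; the unit only needs to learn the residual
    f = tf.layers.conv2d(x, channels, 3, padding='same',
                         activation=tf.nn.relu)
    f = tf.layers.conv2d(f, channels, 3, padding='same', activation=None)
    # H(x) = F(x) + x: add the identity shortcut, then apply the nonlinearity
    return tf.nn.relu(f + x)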
• Other designs:
• All convolution kernels are 3x3
• Stride-2 convolutions replace pooling for downsampling (see the sketch after this list)
• Batch Normalization is used
• Removed: max pooling, fully connected layers, and dropout
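To illustrate the second point, a stride-2 convolution halves the spatial resolution just as pooling does, while keeping the downsampling learnable. A small hypothetical comparison in TensorFlow 1.x (shapes in NHWC; the input tensor is made up for illustration):

import tensorflow as tf

x = tf.random_uniform((8, 56, 56, 64))  # a hypothetical feature map

# Downsampling by max pooling: fixed and parameter-free
pooled = tf.layers.max_pooling2d(x, pool_size=2, strides=2)

# Downsampling by a stride-2 convolution: learnable, as ResNet prefers
convolved = tf.layers.conv2d(x, filters=64, kernel_size=3,
                             strides=2, padding='same')

print(pooled.shape)     # (8, 28, 28, 64)
print(convolved.shape)  # (8, 28, 28, 64)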
The figure above compares VGGNet-19, a 34-layer plain convolutional network, and a 34-layer ResNet. The biggest visible difference is that ResNet has many bypasses that connect the input directly to later layers, allowing those layers to learn the residuals directly; this structure is called a shortcut or skip connection. Although shortcuts are inserted into the plain network, the two networks have the same number of parameters, yet ResNet performs much better and converges much faster than the plain network.
• Deeper networks: optimize the residual mapping with a bottleneck design
• Original: 3x3x256x256 followed by 3x3x256x256
• Optimized: 1x1x256x64, then 3x3x64x64, then 1x1x64x256
Compared with the two-layer residual learning unit, the three-layer bottleneck unit greatly reduces the number of parameters at the same depth, so the network can be extended into a much deeper model; the short calculation below shows the saving.
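The saving is easy to verify by counting convolution weights (biases ignored):

# Two stacked 3x3x256x256 convolutions (plain two-layer unit)
plain = 2 * (3 * 3 * 256 * 256)                       # 1,179,648 weights

# 1x1x256x64 -> 3x3x64x64 -> 1x1x64x256 (bottleneck unit)
bottleneck = 1 * 1 * 256 * 64 + 3 * 3 * 64 * 64 + 1 * 1 * 64 * 256
                                                      # 69,632 weights

print(plain / float(bottleneck))                      # roughly 17x fewer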
Two-layer and three-layer ResNet residual modules
ResNet comes in 50-, 101-, and 152-layer variants, among others. Their infrastructure is very similar: all are stacks of the two- and three-layer residual units described above. Not only do these networks avoid the degradation problem, they also greatly reduce the error rate, eliminating the phenomenon where ever-increasing depth raises training-set error, while keeping computational complexity very low.
ResNet network configurations with different numbers of layers
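For reference, the bottleneck unit counts per block used by the implementations below (they match the standard slim reference configurations) are:

Network        block1  block2  block3  block4
ResNet-50 V2      3       4       6       3
ResNet-101 V2     3       4      23       3
ResNet-152 V2     3       8      36       3
ResNet-200 V2     3      24      36       3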
Training on the ImageNet dataset is time-consuming, so this article only benchmarks the full ResNet V2 network for speed, evaluating forward and backward pass times. Interested readers can download the ImageNet dataset for training and testing.
Once the preparations are ready, we can build the network. ResNet V2 is relatively complex, so to reduce the amount of code this article builds it with the help of an auxiliary library, tf.contrib.slim (available in TensorFlow 1.x). The following code is organized from my understanding of the ResNet architecture and existing resources ("TensorFlow in Action" and so on), with comments added according to my own understanding. Please correct me if there are any errors in the code or comments.
# -*- coding: utf-8 -*-
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'

# ResNet V2
# Load modules
import collections
import tensorflow as tf
slim = tf.contrib.slim


# Define the Block class
class Block(collections.namedtuple('Block', ['scope', 'unit_fn', 'args'])):
    'A named tuple describing a ResNet block.'


# Define the downsampling method subsample
def subsample(inputs, factor, scope=None):
    if factor == 1:
        return inputs
    else:
        return slim.max_pool2d(inputs, [1, 1], stride=factor, scope=scope)


# Define conv2d_same to create a convolution layer with SAME-style padding
def conv2d_same(inputs, num_outputs, kernel_size, stride, scope=None):
    if stride == 1:
        return slim.conv2d(inputs, num_outputs, kernel_size, stride=1,
                           padding='SAME', scope=scope)
    else:
        # kernel_size_effective = kernel_size + (kernel_size - 1) * (rate - 1)
        pad_total = kernel_size - 1
        pad_beg = pad_total // 2
        pad_end = pad_total - pad_beg
        inputs = tf.pad(inputs, [[0, 0], [pad_beg, pad_end],
                                 [pad_beg, pad_end], [0, 0]])
        return slim.conv2d(inputs, num_outputs, kernel_size, stride=stride,
                           padding='VALID', scope=scope)


# Define the block-stacking function: two nested loops
@slim.add_arg_scope
def stack_blocks_dense(net, blocks, outputs_collections=None):
    for block in blocks:
        with tf.variable_scope(block.scope, 'block', [net]) as sc:
            for i, unit in enumerate(block.args):
                with tf.variable_scope('unit_%d' % (i + 1), values=[net]):
                    unit_depth, unit_depth_bottleneck, unit_stride = unit
                    net = block.unit_fn(net,
                                        depth=unit_depth,
                                        depth_bottleneck=unit_depth_bottleneck,
                                        stride=unit_stride)
            net = slim.utils.collect_named_outputs(outputs_collections,
                                                   sc.name, net)
    return net


# Create the universal ResNet arg_scope, defining default parameter values
def resnet_arg_scope(is_training=True,
                     weight_decay=0.0001,
                     batch_norm_decay=0.997,
                     batch_norm_epsilon=1e-5,
                     batch_norm_scale=True):
    batch_norm_params = {
        'is_training': is_training,
        'decay': batch_norm_decay,
        'epsilon': batch_norm_epsilon,
        'scale': batch_norm_scale,
        'updates_collections': tf.GraphKeys.UPDATE_OPS,
    }
    with slim.arg_scope(
            [slim.conv2d],
            weights_regularizer=slim.l2_regularizer(weight_decay),
            weights_initializer=slim.variance_scaling_initializer(),
            activation_fn=tf.nn.relu,
            normalizer_fn=slim.batch_norm,
            normalizer_params=batch_norm_params):
        with slim.arg_scope([slim.batch_norm], **batch_norm_params):
            with slim.arg_scope([slim.max_pool2d], padding='SAME') as arg_sc:
                return arg_sc


# Define the core bottleneck residual learning unit
@slim.add_arg_scope
def bottleneck(inputs, depth, depth_bottleneck, stride,
               outputs_collections=None, scope=None):
    with tf.variable_scope(scope, 'bottleneck_v2', [inputs]) as sc:
        depth_in = slim.utils.last_dimension(inputs.get_shape(), min_rank=4)
        # Pre-activation: batch norm + ReLU before the convolutions (V2 style)
        preact = slim.batch_norm(inputs, activation_fn=tf.nn.relu,
                                 scope='preact')
        if depth == depth_in:
            shortcut = subsample(inputs, stride, 'shortcut')
        else:
            shortcut = slim.conv2d(preact, depth, [1, 1], stride=stride,
                                   normalizer_fn=None, activation_fn=None,
                                   scope='shortcut')
        residual = slim.conv2d(preact, depth_bottleneck, [1, 1], stride=1,
                               scope='conv1')
        residual = conv2d_same(residual, depth_bottleneck, 3, stride,
                               scope='conv2')
        residual = slim.conv2d(residual, depth, [1, 1], stride=1,
                               normalizer_fn=None, activation_fn=None,
                               scope='conv3')
        output = shortcut + residual
        return slim.utils.collect_named_outputs(outputs_collections,
                                                sc.name, output)


# Define the main function resnet_v2 for generating ResNet V2
def resnet_v2(inputs, blocks, num_classes=None, global_pool=True,
              include_root_block=True, reuse=None, scope=None):
    with tf.variable_scope(scope, 'resnet_v2', [inputs], reuse=reuse) as sc:
        end_points_collection = sc.original_name_scope + '_end_points'
        with slim.arg_scope([slim.conv2d, bottleneck, stack_blocks_dense],
                            outputs_collections=end_points_collection):
            net = inputs
            if include_root_block:
                with slim.arg_scope([slim.conv2d], activation_fn=None,
                                    normalizer_fn=None):
                    net = conv2d_same(net, 64, 7, stride=2, scope='conv1')
                net = slim.max_pool2d(net, [3, 3], stride=2, scope='pool1')
            net = stack_blocks_dense(net, blocks)
            net = slim.batch_norm(net, activation_fn=tf.nn.relu,
                                  scope='postnorm')
            if global_pool:
                # Global average pooling
                net = tf.reduce_mean(net, [1, 2], name='pool5',
                                     keep_dims=True)
            if num_classes is not None:
                net = slim.conv2d(net, num_classes, [1, 1],
                                  activation_fn=None, normalizer_fn=None,
                                  scope='logits')
            # Convert end_points_collection into a dictionary of end_points
            end_points = slim.utils.convert_collection_to_dict(
                end_points_collection)
            if num_classes is not None:
                end_points['predictions'] = slim.softmax(
                    net, scope='predictions')
            return net, end_points


# Design the 50-layer ResNet V2
def resnet_v2_50(inputs, num_classes=None, global_pool=True, reuse=None,
                 scope='resnet_v2_50'):
    blocks = [
        Block('block1', bottleneck, [(256, 64, 1)] * 2 + [(256, 64, 2)]),
        Block('block2', bottleneck, [(512, 128, 1)] * 3 + [(512, 128, 2)]),
        Block('block3', bottleneck, [(1024, 256, 1)] * 5 + [(1024, 256, 2)]),
        Block('block4', bottleneck, [(2048, 512, 1)] * 3)]
    return resnet_v2(inputs, blocks, num_classes, global_pool,
                     include_root_block=True, reuse=reuse, scope=scope)


# Design the 101-layer ResNet V2
def resnet_v2_101(inputs, num_classes=None, global_pool=True, reuse=None,
                  scope='resnet_v2_101'):
    blocks = [
        Block('block1', bottleneck, [(256, 64, 1)] * 2 + [(256, 64, 2)]),
        Block('block2', bottleneck, [(512, 128, 1)] * 3 + [(512, 128, 2)]),
        Block('block3', bottleneck, [(1024, 256, 1)] * 22 + [(1024, 256, 2)]),
        Block('block4', bottleneck, [(2048, 512, 1)] * 3)]
    return resnet_v2(inputs, blocks, num_classes, global_pool,
                     include_root_block=True, reuse=reuse, scope=scope)


# Design the 152-layer ResNet V2
def resnet_v2_152(inputs, num_classes=None, global_pool=True, reuse=None,
                  scope='resnet_v2_152'):
    blocks = [
        Block('block1', bottleneck, [(256, 64, 1)] * 2 + [(256, 64, 2)]),
        Block('block2', bottleneck, [(512, 128, 1)] * 7 + [(512, 128, 2)]),
        Block('block3', bottleneck, [(1024, 256, 1)] * 35 + [(1024, 256, 2)]),
        Block('block4', bottleneck, [(2048, 512, 1)] * 3)]
    return resnet_v2(inputs, blocks, num_classes, global_pool,
                     include_root_block=True, reuse=reuse, scope=scope)


# Design the 200-layer ResNet V2
def resnet_v2_200(inputs, num_classes=None, global_pool=True, reuse=None,
                  scope='resnet_v2_200'):
    blocks = [
        Block('block1', bottleneck, [(256, 64, 1)] * 2 + [(256, 64, 2)]),
        Block('block2', bottleneck, [(512, 128, 1)] * 23 + [(512, 128, 2)]),
        Block('block3', bottleneck, [(1024, 256, 1)] * 35 + [(1024, 256, 2)]),
        Block('block4', bottleneck, [(2048, 512, 1)] * 3)]
    return resnet_v2(inputs, blocks, num_classes, global_pool,
                     include_root_block=True, reuse=reuse, scope=scope)


from datetime import datetime
import math
import time


# Evaluation function: times num_batches runs of the target op
def time_tensorflow_run(session, target, info_string):
    num_steps_burn_in = 10
    total_duration = 0.0
    total_duration_squared = 0.0
    for i in range(num_batches + num_steps_burn_in):
        start_time = time.time()
        _ = session.run(target)
        duration = time.time() - start_time
        if i >= num_steps_burn_in:
            if not i % 10:
                print('%s: step %d, duration = %.3f' %
                      (datetime.now(), i - num_steps_burn_in, duration))
            total_duration += duration
            total_duration_squared += duration * duration
    mn = total_duration / num_batches
    vr = total_duration_squared / num_batches - mn * mn
    sd = math.sqrt(vr)
    print('%s: %s across %d steps, %.3f +/- %.3f sec / batch' %
          (datetime.now(), info_string, num_batches, mn, sd))


batch_size = 32   # batch size and step count follow the reference benchmark
num_batches = 100
height, width = 224, 224
inputs = tf.random_uniform((batch_size, height, width, 3))
with slim.arg_scope(resnet_arg_scope(is_training=False)):
    net, end_points = resnet_v2_152(inputs, 1000)  # evaluate the 152-layer net

init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)
time_tensorflow_run(sess, net, "Forward")
Run the program, and we will see output like the following (forward performance test):
2017-10-15 10:59:00.831156: step 0, duration = 8.954
2017-10-15 11:00:30.933252: step 10, duration = 9.048
2017-10-15 11:02:01.370461: step 20, duration = 8.999
2017-10-15 11:03:31.873238: step 30, duration = 8.953
2017-10-15 11:05:03.045593: step 40, duration = 9.360
2017-10-15 11:06:33.642941: step 50, duration = 9.037
2017-10-15 11:08:03.993324: step 60, duration = 8.998
2017-10-15 11:09:34.304207: step 70, duration = 9.170
2017-10-15 11:11:05.943414: step 80, duration = 9.068
2017-10-15 11:12:38.635693: step 90, duration = 9.285
2017-10-15 11:14:03.069851: Forward across 100 steps, 9.112 +/- 0.153 sec / batch
The above is the forward pass time of ResNet V2 displayed while the program runs; readers can add the backward timing themselves, and a minimal sketch follows.
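For readers who want it, here is a minimal sketch of the backward timing, appended after the forward test above. The L2 loss over the network output is a stand-in objective of my own choosing, since any scalar loss suffices for timing gradient computation:

# Build a scalar stand-in loss and time the gradient computation for all
# trainable variables (forward + backward pass combined)
cross_entropy = tf.nn.l2_loss(net)
grad = tf.gradients(cross_entropy, tf.trainable_variables())
time_tensorflow_run(sess, grad, "Forward-backward")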
At this point, the basic principles of ResNet and its TensorFlow implementation are complete. The code includes ResNet designs of different depths, and readers can modify it to explore how networks of different depths perform. ResNet's exquisite design and construction are a milestone: it truly makes training of extremely deep networks feasible, provides many CNN design ideas and tricks worth borrowing, and achieves excellent results.
In follow-up work, I will continue to show the endless fun that TensorFlow and deep learning networks bring, and discuss the mysteries of deep learning with you. Of course, if you are interested, my Weibo also shares cutting-edge technologies in artificial intelligence, machine learning, deep learning, and computer vision.