Learning Note TF033: Implementing ResNet


ResNet (Residual Neural Network) was proposed by Kaiming He and three colleagues at Microsoft Research. By training a 152-layer deep neural network built from residual units, it won the ILSVRC 2015 championship with a top-5 error rate of 3.57%, while using fewer parameters than VGGNet. The ResNet structure dramatically speeds up the training of ultra-deep neural networks and greatly improves model accuracy. Inception V4 later combined the Inception module with ResNet. ResNet also generalizes very well.

Swiss professor Schmidhuber (inventor of the LSTM network, 1997) proposed Highway Networks to solve the problem that extremely deep neural networks are hard to train. The activation of each layer is modified: previously a layer only applied a nonlinear transformation to its input, y = H(x, W_H); a Highway Network instead preserves a certain proportion of the original input x, y = H(x, W_H) · T(x, W_T) + x · C(x, W_C), where T is the transform gate, C is the carry gate, and C = 1 - T. A certain proportion of the previous layer's information is passed directly to the next layer without matrix multiplication and nonlinear transformation. The Highway Network uses gating units to learn how to control the information flow through the network, that is, what proportion of the original information should be retained. The gating mechanism comes from Professor Schmidhuber's early work on LSTM recurrent networks. With gating, Highway Networks with hundreds or even thousands of layers can be trained directly with gradient descent and can use a variety of nonlinear activation functions to learn extremely deep networks. The Highway Network allows training of arbitrarily deep networks, and the optimization method is independent of the network depth.

ResNet allows the raw input information to be passed directly to later layers. It addresses the degradation problem: as the depth of a neural network increases, accuracy first rises, reaches saturation, and then declines. ResNet's idea is to pass the output of an earlier layer directly to a later layer through an identity mapping. If the input to a block is x and the expected output is H(x), the input x is passed directly to the output as the initial result, so the block only needs to learn the target F(x) = H(x) - x. A ResNet residual learning unit therefore no longer learns the full output H(x); it only learns the difference between output and input, H(x) - x, the residual.

ResNet contains many bypass branches (shortcuts, or skip connections) that feed the input directly to later layers, so those layers learn only the residual. Passing the input directly to the output protects the integrity of the information; the whole network only has to learn the difference between input and output, which simplifies the learning target and reduces the difficulty.
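
As a minimal illustration of this idea (an illustrative two-layer unit, not the book's bottleneck implementation; the function name and shapes here are assumptions):

```python
import tensorflow as tf

slim = tf.contrib.slim

def simple_residual_unit(inputs, channels, scope):
    """Minimal residual unit: output = F(x) + x, where F is two 3x3 convolutions.

    Assumes `inputs` already has `channels` feature maps, so the identity
    shortcut can be added without any transformation."""
    with tf.variable_scope(scope):
        # F(x) only has to model the residual H(x) - x, not the full mapping H(x).
        residual = slim.conv2d(inputs, channels, [3, 3],
                               activation_fn=tf.nn.relu, scope='conv1')
        residual = slim.conv2d(residual, channels, [3, 3],
                               activation_fn=None, scope='conv2')
        # The shortcut carries the input through unchanged (identity mapping).
        return tf.nn.relu(inputs + residual)
```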

The two-layer residual learning unit consists of two 3x3 convolutions with the same number of output channels. The three-layer residual unit uses 1x1 convolutions, as in Network in Network and Inception Net: a 1x1 convolution before and after the middle 3x3 convolution, first reducing and then restoring the dimension. If the input and output dimensions differ, a linear mapping transforms the dimension of the input x before it is added to the later layer.

layer name  output size  18-layer        34-layer        50-layer          101-layer         152-layer
conv1       112x112      7x7, 64, stride 2
conv2_x     56x56        3x3 max pool, stride 2
                         [3x3, 64 ] x2   [3x3, 64 ] x3   [1x1, 64  ] x3    [1x1, 64  ] x3    [1x1, 64  ] x3
                         [3x3, 64 ]      [3x3, 64 ]      [3x3, 64  ]       [3x3, 64  ]       [3x3, 64  ]
                                                         [1x1, 256 ]       [1x1, 256 ]       [1x1, 256 ]
conv3_x     28x28        [3x3, 128] x2   [3x3, 128] x4   [1x1, 128 ] x4    [1x1, 128 ] x4    [1x1, 128 ] x8
                         [3x3, 128]      [3x3, 128]      [3x3, 128 ]       [3x3, 128 ]       [3x3, 128 ]
                                                         [1x1, 512 ]       [1x1, 512 ]       [1x1, 512 ]
conv4_x     14x14        [3x3, 256] x2   [3x3, 256] x6   [1x1, 256 ] x6    [1x1, 256 ] x23   [1x1, 256 ] x36
                         [3x3, 256]      [3x3, 256]      [3x3, 256 ]       [3x3, 256 ]       [3x3, 256 ]
                                                         [1x1, 1024]       [1x1, 1024]       [1x1, 1024]
conv5_x     7x7          [3x3, 512] x2   [3x3, 512] x3   [1x1, 512 ] x3    [1x1, 512 ] x3    [1x1, 512 ] x3
                         [3x3, 512]      [3x3, 512]      [3x3, 512 ]       [3x3, 512 ]       [3x3, 512 ]
                                                         [1x1, 2048]       [1x1, 2048]       [1x1, 2048]
            1x1          average pool, 1000-d fc, softmax
FLOPs                    1.8x10^9        3.6x10^9        3.8x10^9          7.6x10^9          11.3x10^9

The ResNet structure eliminates the phenomenon where training-set error grows as more layers are added. With ResNet, training error keeps decreasing as the number of layers increases, and test-set performance also improves. Google drew on ResNet and proposed Inception V4 and Inception-ResNet-V2, reaching an ILSVRC error rate of 3.08%. "Identity Mappings in Deep Residual Networks" proposed ResNet V2. It works out the propagation formula of the ResNet residual learning unit, showing that the feedforward information and the backpropagated signal can be transmitted directly. The nonlinear activation function on the skip connection is replaced with an identity mapping (y = x), and batch normalization is used in every layer.

Professor Schmidhuber considers ResNet to be similar to an LSTM network without gates: the input x is continuously passed on to later layers. Other work argues that ResNet is essentially equivalent to an RNN, and that its effect resembles an ensemble method across networks of different depths.

The paper "The Power of Depth for Feedforward Neural Networks" proves theoretically that deepening a network is more effective than widening it.

TensorFlow implements ResNet using the contrib.slim library and Python's native collections module. Use collections.namedtuple to design the named tuple for a ResNet basic block group, creating a Block class that is only a data structure with no concrete methods. A typical block has three parts: scope, unit_fn, and args.
Block('block1', bottleneck, [(256, 64, 1)] * 2 + [(256, 64, 2)]): block1 is the block name (scope), bottleneck is the ResNet V2 residual learning unit function, and the last parameter is the block's args. args is a list in which each element corresponds to one bottleneck residual learning unit; here the first two elements are (256, 64, 1) and the third is (256, 64, 2). Each element is a triple (depth, depth_bottleneck, stride). For example, (256, 64, 3) represents a bottleneck residual learning unit (three convolutional layers) whose third layer has 256 output channels (depth), whose first two layers have 64 output channels (depth_bottleneck), and whose middle layer has stride 3; the structure of this residual learning unit is [(1x1/s1, 64), (3x3/s3, 64), (1x1/s1, 256)].
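
A sketch of this data structure, following the description above (the bottleneck unit function referenced in the example comment is defined later in this note):

```python
import collections

import tensorflow as tf

slim = tf.contrib.slim


class Block(collections.namedtuple('Block', ['scope', 'unit_fn', 'args'])):
    """A named tuple describing a ResNet block: a scope name, a unit
    function (the residual learning unit), and a list of args with one
    (depth, depth_bottleneck, stride) tuple per unit."""


# Example from the text: two units of (256, 64, 1) followed by one unit
# of (256, 64, 2); `bottleneck` is defined further below.
# block1 = Block('block1', bottleneck, [(256, 64, 1)] * 2 + [(256, 64, 2)])
```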

Define a subsample method with parameters inputs (the input), factor (the sampling factor), and scope. If factor is 1, return inputs directly without modification; if it is not 1, implement the downsampling with slim.max_pool2d, using a 1x1 pool size and stride equal to factor.
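
A sketch of subsample as described, reusing the slim alias from the previous snippet:

```python
def subsample(inputs, factor, scope=None):
    """Downsample `inputs` spatially by `factor`.

    With factor == 1 the input is returned unchanged; otherwise a 1x1
    max pool with stride == factor keeps every factor-th activation."""
    if factor == 1:
        return inputs
    return slim.max_pool2d(inputs, [1, 1], stride=factor, scope=scope)
```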

Define a conv2d_same function to create convolution layers. If stride is 1, use slim.conv2d directly with padding mode SAME. If stride is not 1, explicitly pad with zeros: the total padding is kernel_size - 1, pad_beg is pad // 2, and pad_end is the remainder. Use tf.pad to zero-pad the input. Since the input has already been zero-padded, the convolution layer is then created with slim.conv2d using padding mode VALID.
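
A sketch of conv2d_same along these lines (NHWC layout is assumed):

```python
def conv2d_same(inputs, num_outputs, kernel_size, stride, scope=None):
    """2-D convolution with 'SAME'-style padding that is independent of input size.

    For stride 1 ordinary SAME padding is enough; for stride > 1 the input is
    explicitly zero-padded first and the convolution then uses VALID padding."""
    if stride == 1:
        return slim.conv2d(inputs, num_outputs, kernel_size, stride=1,
                           padding='SAME', scope=scope)
    pad_total = kernel_size - 1
    pad_beg = pad_total // 2
    pad_end = pad_total - pad_beg
    # Pad only the two spatial dimensions (NHWC layout).
    inputs = tf.pad(inputs, [[0, 0], [pad_beg, pad_end],
                             [pad_beg, pad_end], [0, 0]])
    return slim.conv2d(inputs, num_outputs, kernel_size, stride=stride,
                       padding='VALID', scope=scope)
```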

Define a stack_blocks_dense function with parameters net (the input), blocks (a list of Block instances), and outputs_collections (the collection that gathers the end_points of each layer). Two nested loops iterate over the blocks and over the residual units within each block. Two levels of tf.variable_scope name the residual learning units in the form block/unit_1. In the inner loop, each residual unit's args are unpacked into depth, depth_bottleneck, and stride, which are passed to unit_fn, the residual learning unit generating function, so that all residual learning units are created and connected in order. The slim.utils.collect_named_outputs function adds the output net to the collection. After all residual units of all blocks have been stacked, the final net is returned as the result of stack_blocks_dense.
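
A sketch of stack_blocks_dense as described (the @slim.add_arg_scope decorator is assumed here so that outputs_collections can later be set through an arg_scope):

```python
@slim.add_arg_scope
def stack_blocks_dense(net, blocks, outputs_collections=None):
    """Stack all residual units of all blocks and collect the block outputs."""
    for block in blocks:
        with tf.variable_scope(block.scope, 'block', [net]) as sc:
            for i, unit in enumerate(block.args):
                # Each unit gets its own scope, e.g. block1/unit_1.
                with tf.variable_scope('unit_%d' % (i + 1), values=[net]):
                    unit_depth, unit_depth_bottleneck, unit_stride = unit
                    net = block.unit_fn(net,
                                        depth=unit_depth,
                                        depth_bottleneck=unit_depth_bottleneck,
                                        stride=unit_stride)
            # Register the block output in the end_points collection.
            net = slim.utils.collect_named_outputs(outputs_collections,
                                                   sc.name, net)
    return net
```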

Create the universal arg_scope for ResNet, which defines default values for function parameters. Define a training flag is_training that defaults to True, a weight decay weight_decay that defaults to 0.0001, a BN decay rate batch_norm_decay that defaults to 0.997, a BN epsilon of 1e-5, and batch_norm_scale defaulting to True. First set the BN parameters, then set the default parameters of slim.conv2d through slim.arg_scope: the weights regularizer is set to L2 regularization, the weights initializer to slim.variance_scaling_initializer(), the activation function to ReLU, and the normalizer to BN. The default padding mode of max pooling is set to SAME (the paper uses VALID), which makes feature alignment simpler. The nested arg_scopes are returned as the result.
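
A sketch of this arg_scope under the defaults listed above (the exact BN parameter dictionary follows the common slim convention and should be treated as an assumption):

```python
def resnet_arg_scope(is_training=True,
                     weight_decay=0.0001,
                     batch_norm_decay=0.997,
                     batch_norm_epsilon=1e-5,
                     batch_norm_scale=True):
    """Default arg_scope shared by all ResNet layers."""
    batch_norm_params = {
        'is_training': is_training,
        'decay': batch_norm_decay,
        'epsilon': batch_norm_epsilon,
        'scale': batch_norm_scale,
        'updates_collections': tf.GraphKeys.UPDATE_OPS,
    }
    with slim.arg_scope(
            [slim.conv2d],
            weights_regularizer=slim.l2_regularizer(weight_decay),
            weights_initializer=slim.variance_scaling_initializer(),
            activation_fn=tf.nn.relu,
            normalizer_fn=slim.batch_norm,
            normalizer_params=batch_norm_params):
        with slim.arg_scope([slim.batch_norm], **batch_norm_params):
            # SAME padding for max pooling keeps feature maps aligned;
            # the paper effectively uses VALID here.
            with slim.arg_scope([slim.max_pool2d], padding='SAME') as arg_sc:
                return arg_sc
```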

Define the core bottleneck residual learning unit, the full preactivation residual unit variant from the ResNet V2 paper. It applies batch normalization in front of every layer and preactivates the input, instead of applying the activation function after the convolution. The parameters are inputs (the input), depth, depth_bottleneck, and stride, plus outputs_collections (the collection that gathers end_points) and scope (the name of the unit). The number of channels in the last dimension of the input is obtained with the slim.utils.last_dimension function, where min_rank=4 requires at least 4 dimensions. slim.batch_norm applies batch normalization to the input and the ReLU function performs the preactivation (preact).

Define the shortcut, the direct connection of x. If the input channel count depth_in equals the output channel count depth, spatially downsample inputs with subsample using stride stride, so that its spatial size matches the residual branch (whose middle convolution also uses stride). If the channel counts differ, use a 1x1 convolution with stride stride to change the number of channels so that input and output are consistent.

Define residual, the residual branch, with 3 layers: a 1x1 convolution with stride 1 and depth_bottleneck output channels; a 3x3 convolution with stride stride and depth_bottleneck output channels; and a 1x1 convolution with stride 1 and depth output channels, which produces the final residual. The last layer has neither normalization nor an activation function. Add residual and shortcut to obtain the final output of the unit, register the result in the collection with slim.utils.collect_named_outputs, and return the output as the function result.
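
Putting the three preceding paragraphs together, a sketch of the bottleneck unit (it reuses the subsample and conv2d_same sketches from above):

```python
@slim.add_arg_scope
def bottleneck(inputs, depth, depth_bottleneck, stride,
               outputs_collections=None, scope=None):
    """ResNet V2 full preactivation bottleneck residual unit."""
    with tf.variable_scope(scope, 'bottleneck_v2', [inputs]) as sc:
        depth_in = slim.utils.last_dimension(inputs.get_shape(), min_rank=4)
        # Preactivation: BN + ReLU before any convolution.
        preact = slim.batch_norm(inputs, activation_fn=tf.nn.relu, scope='preact')

        if depth == depth_in:
            # Same channel count: only match the spatial size of the residual branch.
            shortcut = subsample(inputs, stride, 'shortcut')
        else:
            # Different channel count: 1x1 convolution (no BN, no activation).
            shortcut = slim.conv2d(preact, depth, [1, 1], stride=stride,
                                   normalizer_fn=None, activation_fn=None,
                                   scope='shortcut')

        # Residual branch: 1x1 reduce, 3x3 (carries the stride), 1x1 restore.
        residual = slim.conv2d(preact, depth_bottleneck, [1, 1], stride=1,
                               scope='conv1')
        residual = conv2d_same(residual, depth_bottleneck, 3, stride,
                               scope='conv2')
        residual = slim.conv2d(residual, depth, [1, 1], stride=1,
                               normalizer_fn=None, activation_fn=None,
                               scope='conv3')

        output = shortcut + residual
        return slim.utils.collect_named_outputs(outputs_collections,
                                                sc.name, output)
```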

Define the main function that generates ResNet V2. Its parameters are inputs (the input), blocks (a list of Block instances), num_classes (the number of output classes), global_pool (whether to add a global average pooling layer at the end), include_root_block (whether to add the 7x7 convolution and max pooling at the front of the ResNet), reuse (whether to reuse variables), and scope (the name of the whole network). Define a variable_scope and an end_points_collection, and use slim.arg_scope to set the outputs_collections default of slim.conv2d, bottleneck, and stack_blocks_dense to end_points_collection. According to the include_root_block flag, create the leading ResNet layers: a 7x7, 64-output-channel convolution with stride 2 followed by a 3x3 max pool with stride 2; these two stride-2 layers reduce the picture size to 1/4. Use stack_blocks_dense to generate the residual learning block groups. Depending on the flag, add the global average pooling layer, implemented with tf.reduce_mean, which is more efficient than using avg_pool directly. If a class count is given, add a 1x1 convolution with num_classes output channels (no activation function, no normalizer) and then a softmax layer to output the network's classification result. Convert the collection to a Python dict with slim.utils.convert_collection_to_dict. Finally, return net and end_points.
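
A condensed sketch of this generator function (the final postnorm batch-normalization layer follows the reference slim implementation and is an assumption relative to the text above):

```python
def resnet_v2(inputs, blocks, num_classes=None, global_pool=True,
              include_root_block=True, reuse=None, scope=None):
    """Generator for ResNet V2 models built from a list of Blocks."""
    with tf.variable_scope(scope, 'resnet_v2', [inputs], reuse=reuse) as sc:
        end_points_collection = sc.original_name_scope + '_end_points'
        with slim.arg_scope([slim.conv2d, bottleneck, stack_blocks_dense],
                            outputs_collections=end_points_collection):
            net = inputs
            if include_root_block:
                # Root block: 7x7/2 conv + 3x3/2 max pool shrink the image to 1/4.
                with slim.arg_scope([slim.conv2d], activation_fn=None,
                                    normalizer_fn=None):
                    net = conv2d_same(net, 64, 7, stride=2, scope='conv1')
                net = slim.max_pool2d(net, [3, 3], stride=2, scope='pool1')
            net = stack_blocks_dense(net, blocks)
            net = slim.batch_norm(net, activation_fn=tf.nn.relu, scope='postnorm')
            if global_pool:
                # Global average pooling; tf.reduce_mean is cheaper than avg_pool here.
                net = tf.reduce_mean(net, [1, 2], name='pool5', keep_dims=True)
            if num_classes is not None:
                net = slim.conv2d(net, num_classes, [1, 1], activation_fn=None,
                                  normalizer_fn=None, scope='logits')
            end_points = slim.utils.convert_collection_to_dict(
                end_points_collection)
            if num_classes is not None:
                end_points['predictions'] = slim.softmax(net, scope='predictions')
            return net, end_points
```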

The 50-layer ResNet has 4 residual learning blocks with 3, 4, 6, and 3 units respectively, for a total of (3 + 4 + 6 + 3) x 3 + 2 = 50 layers. Before the residual learning blocks, the leading convolution and pooling have already reduced the size by a factor of 4; the first 3 blocks each contain a stride-2 layer, so the total size reduction is 4 x 8 = 32 times, and a 224x224 input image ends up as 224 / 32 = 7. ResNet keeps shrinking the size with stride-2 layers while the number of output channels keeps increasing, up to 2048.

In the 152-layer ResNet, the second block has 8 units and the third block has 36 units.

In the 200-layer ResNet, the second block has 23 units and the third block has 36 units.
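
A sketch of the corresponding block configurations, shown for the 50-layer and 152-layer variants described above (the 101- and 200-layer networks only change the unit counts):

```python
def resnet_v2_50(inputs, num_classes=None, global_pool=True,
                 reuse=None, scope='resnet_v2_50'):
    """50-layer ResNet V2: 3 + 4 + 6 + 3 bottleneck units."""
    blocks = [
        Block('block1', bottleneck, [(256, 64, 1)] * 2 + [(256, 64, 2)]),
        Block('block2', bottleneck, [(512, 128, 1)] * 3 + [(512, 128, 2)]),
        Block('block3', bottleneck, [(1024, 256, 1)] * 5 + [(1024, 256, 2)]),
        Block('block4', bottleneck, [(2048, 512, 1)] * 3)]
    return resnet_v2(inputs, blocks, num_classes, global_pool,
                     include_root_block=True, reuse=reuse, scope=scope)


def resnet_v2_152(inputs, num_classes=None, global_pool=True,
                  reuse=None, scope='resnet_v2_152'):
    """152-layer ResNet V2: 3 + 8 + 36 + 3 bottleneck units."""
    blocks = [
        Block('block1', bottleneck, [(256, 64, 1)] * 2 + [(256, 64, 2)]),
        Block('block2', bottleneck, [(512, 128, 1)] * 7 + [(512, 128, 2)]),
        Block('block3', bottleneck, [(1024, 256, 1)] * 35 + [(1024, 256, 2)]),
        Block('block4', bottleneck, [(2048, 512, 1)] * 3)]
    return resnet_v2(inputs, blocks, num_classes, global_pool,
                     include_root_block=True, reuse=reuse, scope=scope)
```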

The evaluation function time_tensorflow_run tests the forward performance of the 152-layer ResNet. The picture size is 224x224 and the batch size is 32, with the is_training flag set to False. Create the network with resnet_v2_152 and evaluate forward performance with time_tensorflow_run. Even at this depth the time consumed only increases by about 50%, so ResNet is a very practical convolutional neural network structure: it supports training ultra-deep networks, and its forward performance remains acceptable for real industrial applications.
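
A sketch of this evaluation, with a simplified time_tensorflow_run included so the snippet is self-contained (in the book this helper is defined in an earlier note; num_batches = 100 is an assumption):

```python
from datetime import datetime
import math
import time

batch_size = 32
height, width = 224, 224
num_batches = 100


def time_tensorflow_run(session, target, info_string):
    """Time `num_batches` forward passes of `target`, skipping warm-up runs."""
    num_steps_burn_in = 10
    total_duration = 0.0
    total_duration_squared = 0.0
    for i in range(num_batches + num_steps_burn_in):
        start_time = time.time()
        session.run(target)
        duration = time.time() - start_time
        if i >= num_steps_burn_in:
            if not i % 10:
                print('%s: step %d, duration = %.3f' %
                      (datetime.now(), i - num_steps_burn_in, duration))
            total_duration += duration
            total_duration_squared += duration * duration
    mn = total_duration / num_batches
    sd = math.sqrt(total_duration_squared / num_batches - mn * mn)
    print('%s: %s across %d steps, %.3f +/- %.3f sec / batch' %
          (datetime.now(), info_string, num_batches, mn, sd))


inputs = tf.random_uniform((batch_size, height, width, 3))
with slim.arg_scope(resnet_arg_scope(is_training=False)):
    net, end_points = resnet_v2_152(inputs, 1000)

init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)
time_tensorflow_run(sess, net, "Forward")
```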

Resources:
"TensorFlow Practice"

Paid consultation is welcome (150 yuan per hour). Contact me: qingxingfengzi
