Learning Notes TF033: Implementing ResNet
ResNet (Residual Neural Network) was proposed by Kaiming He and three colleagues at Microsoft Research. Using Residual Units, they trained a 152-layer deep neural network that won the ILSVRC 2015 classification competition with a 3.57% top-5 error rate. ResNet has fewer parameters than VGGNet yet performs markedly better: its structure both accelerates the training of deep neural networks and greatly improves model accuracy. Google's Inception V4 later combined the Inception Module with ResNet, showing how well the ResNet idea generalizes.
Schmidhuber, co-inventor of the LSTM network (1997), later proposed the Highway Network, which addresses the difficulty of training very deep neural networks by modifying the activation of each layer. Previously a layer applied only a non-linear transform to its input, y = H(x, W_H); a Highway Network layer additionally preserves a certain proportion of the original input x: y = H(x, W_H) · T(x, W_T) + x · C(x, W_C), where T is the transform gate, C is the carry gate, and C = 1 - T. Part of the previous layer's information is thus transmitted directly to the next layer without matrix multiplication or non-linear transformation. The gating units learn to control how much information flows through the network, i.e. what proportion of the original input to carry along, a mechanism borrowed from the gates of Schmidhuber's earlier LSTM recurrent networks. With this gating, Highway Networks hundreds or even thousands of layers deep can be trained directly with gradient descent, using a variety of non-linear activation functions. In principle a Highway Network of any depth can be trained, and the difficulty of optimization becomes independent of the depth of the network.
ResNet likewise allows the original input information to be transmitted directly to later layers. It targets the degradation problem: as a plain network keeps getting deeper, accuracy first rises to saturation and then drops. ResNet's insight is to pass the output of an earlier layer directly to a later layer via an identity mapping. Given an input x to a stack of layers and a desired output H(x), feeding x directly to the output as the initial result changes the learning target to F(x) = H(x) - x. A ResNet Residual Learning Unit (Residual Unit) therefore no longer learns the complete output H(x), but only the difference between output and input, H(x) - x, the residual.
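The idea can be sketched in a few lines of numpy (names here are illustrative, not from the book's code): the unit only models F(x) = H(x) - x, and the identity shortcut adds the input back.

```python
import numpy as np

def residual_unit(x, residual_fn):
    """Residual learning: the layers inside the unit model only
    F(x) = H(x) - x; the identity shortcut adds the input back,
    so the unit outputs H(x) = F(x) + x."""
    return residual_fn(x) + x

# Toy residual function standing in for the unit's convolution layers.
F = lambda x: 0.1 * x

x = np.ones(4)
h = residual_unit(x, F)          # H(x) = 0.1*x + x = 1.1*x

# If F(x) is driven to zero, the unit degenerates to the identity
# mapping, so stacking more units can never raise the training error.
identity_out = residual_unit(x, lambda v: np.zeros_like(v))
```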
ResNet contains many such bypass branches (shortcuts, or skip connections) that feed the input directly to later layers, so each later layer learns only the residual. Passing the input straight through to the output protects the integrity of the information; the network as a whole only has to learn the difference between input and output, which simplifies the learning target and reduces the difficulty.
A two-layer Residual Learning Unit consists of two 3x3 convolutions with the same number of output channels. The three-layer variant borrows the 1x1 convolution from Network In Network and Inception Net: 1x1 convolutions are placed before and after a 3x3 convolution in the middle, first reducing and then restoring the channel dimension. If the input and output dimensions differ, a linear mapping transforms x to the new dimension before it is connected to the later layer.
ResNet architecture configurations (reconstructed; each bracketed group is one residual unit, repeated xN):

layer name | output size | 18-layer | 34-layer | 50-layer | 101-layer | 152-layer
conv1 | 112x112 | 7x7, 64, stride 2 (all variants)
conv2_x | 56x56 | 3x3 max pool, stride 2 (all variants); then: [3x3,64; 3x3,64] x2 | [3x3,64; 3x3,64] x3 | [1x1,64; 3x3,64; 1x1,256] x3 | [1x1,64; 3x3,64; 1x1,256] x3 | [1x1,64; 3x3,64; 1x1,256] x3
conv3_x | 28x28 | [3x3,128; 3x3,128] x2 | [3x3,128; 3x3,128] x4 | [1x1,128; 3x3,128; 1x1,512] x4 | [1x1,128; 3x3,128; 1x1,512] x4 | [1x1,128; 3x3,128; 1x1,512] x8
conv4_x | 14x14 | [3x3,256; 3x3,256] x2 | [3x3,256; 3x3,256] x6 | [1x1,256; 3x3,256; 1x1,1024] x6 | [1x1,256; 3x3,256; 1x1,1024] x23 | [1x1,256; 3x3,256; 1x1,1024] x36
conv5_x | 7x7 | [3x3,512; 3x3,512] x2 | [3x3,512; 3x3,512] x3 | [1x1,512; 3x3,512; 1x1,2048] x3 | [1x1,512; 3x3,512; 1x1,2048] x3 | [1x1,512; 3x3,512; 1x1,2048] x3
(final) | 1x1 | global average pool, 1000-d fc, softmax (all variants)
FLOPs | | 1.8x10^9 | 3.6x10^9 | 3.8x10^9 | 7.6x10^9 | 11.3x10^9
The ResNet structure eliminates the phenomenon where training-set error keeps growing as layers are added: the training error of a ResNet decreases steadily as depth increases, and test-set performance improves as well. Google drew on ResNet to propose Inception V4 and Inception-ResNet-v2, reaching a 3.08% ILSVRC error rate. The follow-up paper "Identity Mappings in Deep Residual Networks" proposes ResNet V2, analyzing the propagation formula of the Residual Learning Unit: feed-forward information and back-propagated gradients can be transmitted directly when the skip connection is a pure identity mapping (y = x) rather than passing through a non-linear activation function. ResNet V2 also applies Batch Normalization before every layer.
Schmidhuber has noted that ResNet can be seen as a special case of an LSTM network without gates: the input x is always carried through to later layers, making ResNet analogous to an RNN unrolled in depth. Others have argued that ResNet behaves like an ensemble of many shallower networks.
"The Power of Depth for Feedforward Neural Networks" proves that deepening a network is more effective than widening it.
Now implement ResNet with TensorFlow, using the contrib.slim library together with Python's native collections.namedtuple. Design a Block class as a named tuple describing one ResNet module group of Basic Blocks: it is a pure data structure with no methods. A typical Block has three fields: scope, unit_fn, and args.
In Block('block1', bottleneck, [(256, 64, 1)] * 2 + [(256, 64, 2)]), block1 is the Block's name (scope) and bottleneck is the ResNet V2 Residual Learning Unit function. The final parameter, args, is a list with one element per bottleneck residual unit: here two elements (256, 64, 1) followed by a third element (256, 64, 2). Each element is a three-value tuple (depth, depth_bottleneck, stride). For example, (256, 64, 3) describes a bottleneck unit of three convolution layers whose third layer has depth 256 output channels, whose first two layers have depth_bottleneck 64 output channels, and whose middle layer has stride 3; its structure is [(1x1/s1, 64), (3x3/s3, 64), (1x1/s1, 256)].
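A minimal, runnable sketch of this data structure (the bottleneck stand-in here is a placeholder, not the book's real unit function):

```python
import collections

class Block(collections.namedtuple('Block', ['scope', 'unit_fn', 'args'])):
    """Named tuple describing a ResNet block: a scope name, the
    residual-unit generating function, and one
    (depth, depth_bottleneck, stride) tuple per unit."""

def bottleneck(*args, **kwargs):
    pass  # placeholder for the real residual learning unit function

# Three bottleneck units: two with stride 1, the last with stride 2.
block1 = Block('block1', bottleneck, [(256, 64, 1)] * 2 + [(256, 64, 2)])
```

Because Block is only data, the same blueprint list can later be handed to a generic stacking function to build networks of any depth.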
Define a subsample method for downsampling. Its parameters are inputs, factor (the sampling factor), and scope. If factor is 1, inputs is returned unmodified; otherwise downsampling is implemented with slim.max_pool2d, using a 1x1 pooling window and a stride of factor.
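Since a 1x1 max pool with stride `factor` is just strided subsampling, the logic can be sketched in numpy (the book's version calls slim.max_pool2d instead):

```python
import numpy as np

def subsample(inputs, factor):
    """Spatially downsample a batch of NHWC feature maps.
    factor == 1 returns the input unchanged; otherwise a 1x1 max
    pool with stride `factor` reduces to strided slicing."""
    if factor == 1:
        return inputs
    return inputs[:, ::factor, ::factor, :]

x = np.arange(2 * 8 * 8 * 3).reshape(2, 8, 8, 3)
y = subsample(x, 2)   # spatial size 8x8 -> 4x4, channels unchanged
```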
Define a conv2d_same function that creates a convolution layer. If stride is 1, use slim.conv2d with padding mode SAME. If stride is not 1, pad with zeros explicitly: the total padding is kernel_size - 1, with pad_beg = pad_total // 2 and pad_end the remainder, applied to the input with tf.pad. Since the input is then already zero-padded, the convolution layer only needs slim.conv2d with padding mode VALID.
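The padding arithmetic can be checked in isolation with a numpy sketch (np.pad stands in for tf.pad; the convolution itself is omitted):

```python
import numpy as np

def explicit_pad(inputs, kernel_size):
    """Zero-pad NHWC inputs the way conv2d_same does before a strided
    VALID convolution: total padding kernel_size - 1, split into
    pad_beg = pad_total // 2 and pad_end = the remainder."""
    pad_total = kernel_size - 1
    pad_beg = pad_total // 2
    pad_end = pad_total - pad_beg
    return np.pad(inputs,
                  [(0, 0), (pad_beg, pad_end), (pad_beg, pad_end), (0, 0)])

x = np.ones((1, 224, 224, 3))
padded = explicit_pad(x, 7)   # 7x7 kernel -> 3 pixels on each side
```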
Define the stack_blocks_dense function. Its inputs are net (the input tensor), blocks (a list of Block instances), and outputs_collections (the collection gathering the end_points). Two nested loops iterate over each Block and, within it, over each Residual Unit. Two levels of tf.variable_scope name each residual unit in the block/unit_1 form. In the inner loop, each Residual Unit's args are unpacked into depth, depth_bottleneck, and stride and passed to unit_fn, the residual-unit generating function, which creates and connects all the residual units in order. slim.utils.collect_named_outputs adds each output net to the collection. After all Blocks' Residual Units are stacked, the final net is returned as the result of stack_blocks_dense.
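A simplified pure-Python sketch of the two nested loops (the real version additionally wraps each level in tf.variable_scope and collects each output via slim.utils.collect_named_outputs):

```python
import collections

Block = collections.namedtuple('Block', ['scope', 'unit_fn', 'args'])

def stack_blocks_dense(net, blocks):
    """Outer loop over Blocks, inner loop over each Block's residual
    units; every unit's args tuple is unpacked and fed to unit_fn."""
    for block in blocks:
        for depth, depth_bottleneck, stride in block.args:
            net = block.unit_fn(net, depth, depth_bottleneck, stride)
    return net

# Dummy unit_fn that records the args it was called with.
calls = []
def unit_fn(net, depth, depth_bottleneck, stride):
    calls.append((depth, depth_bottleneck, stride))
    return net

blocks = [Block('block1', unit_fn, [(256, 64, 1)] * 2 + [(256, 64, 2)])]
stack_blocks_dense('input', blocks)
```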
Create the generic resnet_arg_scope, which defines default parameter values for the functions used. The training flag is_training defaults to True, the weight decay weight_decay defaults to 0.0001, the BN decay rate defaults to 0.997, the BN epsilon defaults to 1e-5, and BN scale defaults to True. First the BN parameters are collected; then slim.arg_scope sets the defaults for slim.conv2d: an L2 weights regularizer, slim.variance_scaling_initializer() for weight initialization, ReLU as the activation function, and BN as the normalizer. The max-pool padding mode defaults to SAME (the paper uses VALID); SAME makes feature alignment simpler. The nested arg_scope is returned as the result.
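A sketch of the nested arg_scope, assuming the TF 1.x contrib.slim API the book uses (parameter names follow slim's conventions; treat this as illustrative, not the book's exact code):

```python
import tensorflow as tf  # assumes TensorFlow 1.x with tf.contrib.slim
slim = tf.contrib.slim

def resnet_arg_scope(is_training=True,
                     weight_decay=0.0001,
                     batch_norm_decay=0.997,
                     batch_norm_epsilon=1e-5,
                     batch_norm_scale=True):
    batch_norm_params = {
        'is_training': is_training,
        'decay': batch_norm_decay,
        'epsilon': batch_norm_epsilon,
        'scale': batch_norm_scale,
        'updates_collections': tf.GraphKeys.UPDATE_OPS,
    }
    # Defaults for every slim.conv2d: L2 regularization,
    # variance-scaling init, ReLU activation, BN normalization.
    with slim.arg_scope(
            [slim.conv2d],
            weights_regularizer=slim.l2_regularizer(weight_decay),
            weights_initializer=slim.variance_scaling_initializer(),
            activation_fn=tf.nn.relu,
            normalizer_fn=slim.batch_norm,
            normalizer_params=batch_norm_params):
        with slim.arg_scope([slim.batch_norm], **batch_norm_params):
            # SAME padding for max pool keeps feature maps aligned
            # (the original paper uses VALID).
            with slim.arg_scope([slim.max_pool2d], padding='SAME') as arg_sc:
                return arg_sc
```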
Define the core bottleneck Residual Learning Unit, the ResNet V2 "full preactivation" variant: Batch Normalization is applied before every layer and the input is pre-activated, rather than activating after the convolution inside the function. Its parameters are inputs, depth, depth_bottleneck, stride, outputs_collections (the collection for end_points), and scope (the unit's name). Use slim.utils.last_dimension to obtain the channel count of the input's last dimension, with min_rank=4 requiring at least four dimensions. slim.batch_norm applies Batch Normalization to inputs, followed by ReLU for the pre-activation (preact).
Define the shortcut, the direct connection for x. If the input channel count depth_in equals the output channel count depth, use subsample with stride `stride` to downsample inputs so that the shortcut's spatial size matches the residual's (the residual's middle convolution also uses stride `stride`). If they differ, a 1x1 convolution with stride `stride` changes the channel count so the two match.
Define the residual branch: three layers, a 1x1 convolution with stride 1 and depth_bottleneck output channels, a 3x3 convolution with stride `stride` and depth_bottleneck output channels, and a 1x1 convolution with stride 1 and depth output channels; the last layer has no normalizer and no activation function. Add residual and shortcut to obtain the final output, register it with slim.utils.collect_named_outputs, and return it as the function's result.
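The shortcut-versus-projection decision and the final addition can be sketched with numpy stand-ins (a channel-axis matmul stands in for the 1x1 convolution, and BN/preactivation is omitted; names are illustrative):

```python
import numpy as np

def subsample(x, factor):
    return x if factor == 1 else x[:, ::factor, ::factor, :]

def bottleneck_sketch(inputs, depth, depth_bottleneck, stride, residual_fn):
    """Data flow of the bottleneck unit: if the input channel count
    already equals depth, the shortcut is a spatial subsample of the
    input; otherwise a 1x1 'convolution' (here a matmul over the
    channel axis) projects it to depth channels. Output is the sum."""
    depth_in = inputs.shape[-1]
    if depth == depth_in:
        shortcut = subsample(inputs, stride)
    else:
        proj = np.ones((depth_in, depth)) / depth_in  # toy projection
        shortcut = subsample(inputs, stride) @ proj
    residual = residual_fn(inputs, depth, depth_bottleneck, stride)
    return shortcut + residual

def residual_branch(x, depth, depth_bottleneck, stride):
    # Dummy residual branch producing the correct output shape.
    n, h, w, _ = x.shape
    return np.zeros((n, h // stride, w // stride, depth))

out = bottleneck_sketch(np.ones((1, 8, 8, 64)), 256, 64, 2, residual_branch)
```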
Define the main generator function of ResNet V2. Its parameters are inputs, blocks (the list of Block instances), num_classes (the number of output classes), global_pool (whether to add a final global average pooling layer), include_root_block (whether to add the initial 7x7 convolution and max pooling at the front of the network), reuse (whether to reuse variables), and scope (the name of the whole network). Define a variable_scope and an end_points_collection, and use slim.arg_scope to set the outputs_collections parameter of slim.conv2d, bottleneck, and stack_blocks_dense to end_points_collection by default. If include_root_block is set, create at the top of the network a 7x7 convolution with 64 output channels and stride 2, followed by a 3x3 max pool with stride 2; these two stride-2 layers together reduce the image size to 1/4. Use stack_blocks_dense to generate the residual module groups. If global_pool is set, add a global average pooling layer implemented with tf.reduce_mean, which is more efficient than avg_pool. If num_classes is given, add a 1x1 convolution with num_classes output channels (no activation function, no normalizer), then a Softmax layer for the classification output. Use slim.utils.convert_collection_to_dict to convert the collection to a Python dict. Finally, return net and end_points.
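The global-average-pooling step is simple enough to verify with numpy: collapsing the two spatial axes of an NHWC tensor to 1x1 is exactly what tf.reduce_mean over axes [1, 2] does, without the overhead of a full avg_pool op.

```python
import numpy as np

# Global average pooling: reduce an NHWC feature map over its spatial
# axes while keeping them as size-1 dimensions, mirroring
# tf.reduce_mean(net, [1, 2], keep_dims=True).
net = np.arange(2 * 7 * 7 * 4, dtype=float).reshape(2, 7, 7, 4)
pooled = net.mean(axis=(1, 2), keepdims=True)   # shape (2, 1, 1, 4)
```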
The 50-layer ResNet has 4 residual-learning Blocks, with 3, 4, 6, and 3 units respectively; the total number of layers is (3 + 4 + 6 + 3) x 3 + 2 = 50. Before the residual module groups, the root convolution and pooling reduce the size by a factor of 4; the first three Blocks each contain a stride-2 layer, so the total reduction is 4 x 8 = 32 and a 224-pixel input image shrinks to 224 / 32 = 7. ResNet keeps shrinking the spatial size with stride-2 layers while the number of output channels grows steadily to 2048.
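The layer-count and size arithmetic above can be checked in two lines of Python (3 convolution layers per bottleneck unit, plus the stem conv1 and the final fc layer):

```python
def total_layers(units_per_block):
    """Bottleneck ResNet depth: 3 conv layers per unit, plus conv1
    and the final fully connected layer."""
    return sum(units_per_block) * 3 + 2

assert total_layers([3, 4, 6, 3]) == 50     # ResNet-50
assert total_layers([3, 4, 23, 3]) == 101   # ResNet-101
assert total_layers([3, 8, 36, 3]) == 152   # ResNet-152

# Stem (conv stride 2 + max pool stride 2 = 4x) plus one stride-2 unit
# in each of the first three Blocks (2**3 = 8x): 32x total.
assert 224 // (4 * 2 ** 3) == 7
```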
The 101-layer ResNet increases the third Block from 6 to 23 units. The 152-layer ResNet increases the second Block to 8 units and the third Block to 36 units.
Finally, evaluate the forward performance of the 152-layer ResNet with the time_tensorflow_run function, using 224x224 images and a batch size of 32, with the is_training flag set to False. Create the network with resnet_v2_152 and let time_tensorflow_run measure the forward pass. The per-batch time rises by only about 50% despite the extreme depth: this convolutional structure supports ultra-deep network training while keeping forward performance quite acceptable for industrial applications.
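In the spirit of the book's time_tensorflow_run (which runs a session op), a framework-agnostic sketch of the benchmark helper looks like this; `fn` stands in for one forward-pass step (names are illustrative):

```python
import math
import time

def time_run(fn, num_batches=100, num_warmup=10, info='op'):
    """Discard warm-up iterations, then report mean and standard
    deviation of the per-iteration wall-clock time."""
    durations = []
    for i in range(num_batches + num_warmup):
        start = time.time()
        fn()
        if i >= num_warmup:
            durations.append(time.time() - start)
    mean = sum(durations) / num_batches
    var = sum((d - mean) ** 2 for d in durations) / num_batches
    print('%s: %.4f +/- %.4f sec / batch' % (info, mean, math.sqrt(var)))
    return mean

# Usage with a dummy workload standing in for a forward pass:
mean_time = time_run(lambda: sum(range(10000)), num_batches=5,
                     num_warmup=1, info='dummy forward')
```

Warm-up iterations matter because the first runs include one-off costs (graph setup, cache warming) that would skew the average.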
References:
TensorFlow in Practice (《TensorFlow实战》)
Paid consulting is welcome (150 RMB per hour); contact: qingxingfengzi.