Non-local Algorithm Code Parsing


Paper: Non-local Neural Networks
Paper Link: https://arxiv.org/abs/1711.07971
Code Link: https://github.com/facebookresearch/video-nonlocal-net

The official code is implemented in Caffe2. This post walks through the project's main code, using the code to deepen understanding of the algorithm.
Suppose ~video-nonlocal-net is the project directory cloned from https://github.com/facebookresearch/video-nonlocal-net. Because the code takes video classification as its example task, the network differs in tensor dimensions from the ResNet used for image classification (there is an extra temporal dimension), but the overall structure is the same as ResNet: conv1 through conv5_x plus some pooling and fully connected layers. For ResNet-50, the numbers of blocks in conv2_x, conv3_x, conv4_x, and conv5_x are 3, 4, 6, and 3 respectively; for ResNet-101 they are 3, 4, 23, and 3. These two depths are the main examples in this codebase.
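These block counts are convenient to keep in a lookup table keyed by network depth, which is essentially how the repository organizes them. A minimal sketch of that idea (illustrative; variable names may differ from the repository's):

# Blocks per stage (conv2_x..conv5_x), keyed by ResNet depth.
BLOCK_CONFIG = {
    50: (3, 4, 6, 3),    # ResNet-50
    101: (3, 4, 23, 3),  # ResNet-101
}

# e.g., the per-stage block counts referred to below as n1..n4:
n1, n2, n3, n4 = BLOCK_CONFIG[50]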

The training launch scripts all live under ~video-nonlocal-net/scripts/. From the training scripts you can see that non-local operations are introduced only in the conv3_x and conv4_x stages; conv2_x and conv5_x contain no non-local operations. Within the sequence of blocks that makes up a stage, blocks with a non-local operation and blocks without one are stacked alternately; the details are described later.
Configuration file: ~video-nonlocal-net/lib/core/config.py

The paper's main contribution is a modification of the network structure, and the network is constructed by the create_model function in the script ~video-nonlocal-net/lib/models/resnet_video.py, so we start with that function.

def create_model():
    ...
    # The first convolutional layer of the ResNet, i.e., conv1.
    conv_blob = model.ConvNd(
        data, 'conv1', 3, 64, [1 + use_temp_convs_set[0][0] * 2, 7, 7],
        strides=[temp_strides_set[0][0], 2, 2],
        pads=[use_temp_convs_set[0][0], 3, 3] * 2,
        weight_init=('MSRAFill', {}),
        bias_init=('ConstantFill', {'value': 0.}), no_bias=1)

    # BN layer, ReLU layer, and pooling layer.
    bn_blob = model.SpatialBN(
        conv_blob, 'res_conv1_bn', 64, epsilon=cfg.MODEL.BN_EPSILON,
        momentum=cfg.MODEL.BN_MOMENTUM, is_test=test_mode)
    relu_blob = model.Relu(bn_blob, bn_blob)
    max_pool = model.MaxPool(
        relu_blob, 'pool1', kernels=[1, 3, 3], strides=[1, 2, 2],
        pads=[0, 0, 0] * 2)

    # conv2_x. No block in this stage introduces the non-local operation.
    # The stage is built by the res_stage_nonlocal function in
    # resnet_helper.py; group defaults to 1, i.e., ordinary convolution.
    blob_in, dim_in = resnet_helper.res_stage_nonlocal(
        model, res_block, max_pool, 64, 256, stride=1, num_blocks=n1,
        prefix='res2', dim_inner=dim_inner, group=group,
        use_temp_convs=use_temp_convs_set[1],
        temp_strides=temp_strides_set[1])
    blob_in = model.MaxPool(
        blob_in, 'pool2', kernels=[2, 1, 1], strides=[2, 1, 1],
        pads=[0, 0, 0] * 2)

    # conv3_x. This stage contains non-local operations. The important
    # parameter nonlocal_mod (passed in as layer_mod) is 2 for both
    # ResNet-50 and ResNet-101; it determines which blocks in the stage
    # introduce non-local operations.
    blob_in, dim_in = resnet_helper.res_stage_nonlocal(
        model, res_block, blob_in, dim_in, 512, stride=2, num_blocks=n2,
        prefix='res3', dim_inner=dim_inner * 2, group=group,
        use_temp_convs=use_temp_convs_set[2],
        temp_strides=temp_strides_set[2], batch_size=batch_size,
        nonlocal_name='nonlocal_conv3', nonlocal_mod=layer_mod)

    # conv4_x. This stage also contains non-local operations. Here
    # nonlocal_mod is 2 for ResNet-50 but 7 for ResNet-101, mainly because
    # conv4_x has 6 blocks in ResNet-50 and 23 blocks in ResNet-101.
    blob_in, dim_in = resnet_helper.res_stage_nonlocal(
        model, res_block, blob_in, dim_in, 1024, stride=2, num_blocks=n3,
        prefix='res4', dim_inner=dim_inner * 4, group=group,
        use_temp_convs=use_temp_convs_set[3],
        temp_strides=temp_strides_set[3], batch_size=batch_size,
        nonlocal_name='nonlocal_conv4', nonlocal_mod=layer_mod)

    # conv5_x. No block in this stage introduces the non-local operation.
    blob_in, dim_in = resnet_helper.res_stage_nonlocal(
        model, res_block, blob_in, dim_in, 2048, stride=2, num_blocks=n4,
        prefix='res5', dim_inner=dim_inner * 8, group=group,
        use_temp_convs=use_temp_convs_set[4],
        temp_strides=temp_strides_set[4])

    # The last pooling layer.
    blob_out = model.AveragePool(
        blob_in, 'pool5', kernels=[pool_stride, 7, 7], strides=[1, 1, 1],
        pads=[0, 0, 0] * 2)

    # The final fully connected layer.
    blob_out = model.FC(
        blob_out, 'pred', dim_in, cfg.MODEL.NUM_CLASSES,
        weight_init=('GaussianFill', {'std': cfg.MODEL.FC_INIT_STD}),
        bias_init=('ConstantFill', {'value': 0.}))

    # The final softmax loss layer.
    softmax, loss = model.SoftmaxWithLoss(
        [blob_out, labels], ['softmax', 'loss'], scale=scale)
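To make the dimensions concrete, here is a shape trace for a hypothetical input clip of batch size 8, 3 channels, 8 frames, and 224x224 spatial resolution (the actual input size depends on the configuration; the shapes below simply follow the kernel and stride settings above):

# Hypothetical shape trace, written as (batch, channels, frames, H, W).
# input:         (8, 3,    8, 224, 224)
# conv1 + pool1: (8, 64,   8, 56, 56)   # spatial /4
# res2:          (8, 256,  8, 56, 56)
# pool2:         (8, 256,  4, 56, 56)   # temporal /2
# res3:          (8, 512,  4, 28, 28)
# res4:          (8, 1024, 4, 14, 14)   # the non-local examples later use this shape
# res5:          (8, 2048, 4, 7, 7)
# pool5 + FC:    (8, NUM_CLASSES)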

The above is the macro structure of the whole network; the focus, of course, is the construction of the four stages. As can be seen, all four stages are built by the res_stage_nonlocal function in resnet_helper.py; the only difference is that the non-local-related parameters are passed in when non-local operations are to be introduced. So res_stage_nonlocal is where the details of the non-local placement live; we look at it next.

Script location: ~video-nonlocal-net/lib/models/resnet_helper.py

The main body of res_stage_nonlocal is the for loop below; each iteration builds one block, in two parts. The first part is the _generic_residual_block_3d function, which constructs a regular residual block (three convolutional layers plus a residual connection; not detailed here). The second part, the non-local operation, is added by the if branch. Note the condition if idx % nonlocal_mod == nonlocal_mod - 1: for the conv3_x and conv4_x stages of ResNet-50, nonlocal_mod is 2, so when num_blocks == 4 (e.g., conv3_x of ResNet-50), the non-local operation runs at idx == 1 and idx == 3, and its output overwrites the blob_in produced by the regular block. The effect is that every other block has a non-local operation appended to it. Also note that the last argument of add_nonlocal is int(dim_in / 2), half the number of input channels, which matches the channel relationship in Figure 2 of the paper.

def res_stage_nonlocal():
    ...
    for idx in range(num_blocks):
        block_prefix = "{}_{}".format(prefix, idx)
        block_stride = 2 if (idx == 0 and stride == 2) else 1
        # Regular residual block: three convolutional layers plus a
        # residual connection.
        blob_in = _generic_residual_block_3d(
            model, blob_in, dim_in, dim_out, block_stride, block_prefix,
            dim_inner, group, use_temp_convs[idx], temp_strides[idx])
        dim_in = dim_out

        # Every nonlocal_mod-th block is followed by a non-local operation
        # whose output replaces blob_in.
        if idx % nonlocal_mod == nonlocal_mod - 1:
            blob_in = nonlocal_helper.add_nonlocal(
                model, blob_in, dim_in, dim_in, batch_size,
                nonlocal_name + '_{}'.format(idx), int(dim_in / 2))

    return blob_in, dim_in
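A quick way to see which blocks end up with a non-local operation is to evaluate the if condition directly. The following standalone snippet (not part of the repository) prints the block indices for the typical configurations:

# Which block indices satisfy idx % nonlocal_mod == nonlocal_mod - 1.
def nonlocal_indices(num_blocks, nonlocal_mod):
    return [idx for idx in range(num_blocks)
            if idx % nonlocal_mod == nonlocal_mod - 1]

print(nonlocal_indices(4, 2))   # ResNet-50 conv3_x:  [1, 3]
print(nonlocal_indices(6, 2))   # ResNet-50 conv4_x:  [1, 3, 5]
print(nonlocal_indices(23, 7))  # ResNet-101 conv4_x: [6, 13, 20]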

Now we know which blocks of which stages introduce the non-local operation, but the non-local operation itself has not been covered yet; next we uncover it. The code above shows that the non-local operation is added through the add_nonlocal function in the nonlocal_helper.py script.
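add_nonlocal itself is essentially a thin wrapper: it calls spacetime_nonlocal (shown below) and then sums the result with the block input, which is the residual connection z = W*y + x of Equation 6 in the paper. A simplified sketch of that wrapper (argument names are approximate; see the repository for the exact code):

def add_nonlocal(model, blob_in, dim_in, dim_out, batch_size, prefix,
                 dim_inner):
    # Simplified sketch, not the verbatim repository code.
    is_test = model.split in ['test', 'val']
    blob_out = spacetime_nonlocal(
        model, blob_in, dim_in, dim_out, batch_size, prefix, dim_inner,
        is_test)
    # Residual connection: z = W * y + x (Equation 6 in the paper).
    blob_out = model.net.Sum([blob_in, blob_out], prefix + '_sum')
    return blob_out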

Script location: ~video-nonlocal-net/lib/models/nonlocal_helper.py
The code of the spacetime_nonlocal function is as follows; this part is the core of the paper.

def spacetime_nonlocal():
    ...
    # cur is the input blob (setup elided). The following corresponds to
    # the theta operation in Figure 2 of the paper, implemented as a 1*1*1
    # convolution. The number of filters, dim_inner, is half the number of
    # input feature-map channels dim_in.
    theta = model.ConvNd(
        cur, prefix + '_theta', dim_in, dim_inner, [1, 1, 1],
        strides=[1, 1, 1], pads=[0, 0, 0] * 2,
        weight_init=('GaussianFill', {'std': cfg.NONLOCAL.CONV_INIT_STD}),
        bias_init=('ConstantFill', {'value': 0.}),
        no_bias=cfg.NONLOCAL.NO_BIAS)

    # If pooling is enabled in the configuration, the effect is, e.g.,
    # (8, 1024, 4, 14, 14) -> (8, 1024, 4, 7, 7). This is the subsampling
    # trick described in the paper, which reduces the amount of computation.
    if cfg.NONLOCAL.USE_MAXPOOL is True:
        max_pool = model.MaxPool(
            cur, prefix + '_pool',
            kernels=[1, max_pool_stride, max_pool_stride],
            strides=[1, max_pool_stride, max_pool_stride],
            pads=[0, 0, 0] * 2)
    else:
        max_pool = cur

    # Corresponds to the phi operation in Figure 2 of the paper,
    # implemented as a 1*1*1 convolution.
    phi = model.ConvNd(
        max_pool, prefix + '_phi', dim_in, dim_inner, [1, 1, 1],
        strides=[1, 1, 1], pads=[0, 0, 0] * 2,
        weight_init=('GaussianFill', {'std': cfg.NONLOCAL.CONV_INIT_STD}),
        bias_init=('ConstantFill', {'value': 0.}),
        no_bias=cfg.NONLOCAL.NO_BIAS)

    # Corresponds to the g operation in Figure 2 of the paper, implemented
    # as a 1*1*1 convolution.
    g = model.ConvNd(
        max_pool, prefix + '_g', dim_in, dim_inner, [1, 1, 1],
        strides=[1, 1, 1], pads=[0, 0, 0] * 2,
        weight_init=('GaussianFill', {'std': cfg.NONLOCAL.CONV_INIT_STD}),
        bias_init=('ConstantFill', {'value': 0.}),
        no_bias=cfg.NONLOCAL.NO_BIAS)

    # The reshape operations merge the three dimensions t*h*w into a single
    # thw dimension, because matrix multiplication comes next.
    theta, theta_shape_5d = model.Reshape(
        theta,
        [theta + '_re' if not cfg.MODEL.ALLOW_INPLACE_RESHAPE else theta,
         theta + '_shape5d'],
        shape=(batch_size, dim_inner, -1))
    phi, phi_shape_5d = model.Reshape(
        phi,
        [phi + '_re' if not cfg.MODEL.ALLOW_INPLACE_RESHAPE else phi,
         phi + '_shape5d'],
        shape=(batch_size, dim_inner, -1))
    g, g_shape_5d = model.Reshape(
        g,
        [g + '_re' if not cfg.MODEL.ALLOW_INPLACE_RESHAPE else g,
         g + '_shape5d'],
        shape=(batch_size, dim_inner, -1))

    # Matrix multiplication of the theta and phi outputs. Example dimension
    # change: (8, 1024, 784) * (8, 1024, 784) -> (8, 784, 784).
    theta_phi = model.net.BatchMatMul(
        [theta, phi], prefix + '_affinity', trans_a=1)

    # Whether to use softmax, corresponding to the Gaussian and embedded
    # Gaussian versions of the pairwise function f() in the paper.
    if cfg.NONLOCAL.USE_SOFTMAX is True:
        if cfg.NONLOCAL.USE_SCALE is True:
            theta_phi_sc = model.Scale(
                theta_phi, theta_phi, scale=dim_inner**-.5)
        else:
            theta_phi_sc = theta_phi
        # Softmax: sum(p[i, j, :]) == 1 for any i, j.
        p = model.Softmax(
            theta_phi_sc, theta_phi + '_prob', engine='CUDNN', axis=2)
    else:
        # Otherwise normalize by a constant (the number of positions).
        ones = model.net.ConstantFill(
            [theta_phi], [theta_phi + '_ones'], value=1.)
        ones = model.net.ReduceBackSum([ones], [theta_phi + '_const'])
        zeros = model.net.ConstantFill(
            [theta_phi], [theta_phi + '_zeros'], value=0.)
        denom = model.net.Add(
            [zeros, ones], [theta_phi + '_denom'], broadcast=1, axis=0)
        model.StopGradient(denom, denom)
        p = model.net.Div([theta_phi, denom], [theta_phi + '_sc'])

    # Matrix multiplication; this step computes the y of Equation 1 in the
    # paper.
    t = model.net.BatchMatMul([g, p], prefix + '_y', trans_b=1)

    # Reshape the thw dimension back to t*h*w; otherwise the subsequent
    # convolution cannot be performed.
    t_re, t_shape = model.Reshape(
        [t, theta_shape_5d],
        [t + '_re' if not cfg.MODEL.ALLOW_INPLACE_RESHAPE else t,
         t + '_shape3d'])

    # Corresponds to the 1*1*1 convolution in the upper right of Figure 2,
    # i.e., the W*y of Equation 6 in the paper.
    blob_out = t_re
    blob_out = model.ConvNd(
        blob_out, prefix + '_out', dim_inner, dim_out, [1, 1, 1],
        strides=[1, 1, 1], pads=[0, 0, 0] * 2,
        weight_init=('GaussianFill', {'std': cfg.NONLOCAL.CONV_INIT_STD})
        if not cfg.NONLOCAL.USE_ZERO_INIT_CONV
        else ('ConstantFill', {'value': 0.}),
        bias_init=('ConstantFill', {'value': 0.}),
        no_bias=cfg.NONLOCAL.NO_BIAS)

    # Finally, if a BN layer is enabled in the configuration, add a BN layer.
    if cfg.NONLOCAL.USE_BN is True:
        blob_out = model.SpatialBN(
            blob_out, prefix + '_bn', dim_out,
            epsilon=cfg.NONLOCAL.BN_EPSILON,
            momentum=cfg.NONLOCAL.BN_MOMENTUM,
            is_test=is_test)
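To see the tensor algebra without the Caffe2 plumbing, here is a minimal NumPy sketch of the embedded-Gaussian non-local operation (Equations 1 and 6 of the paper) on a single sample. Random matrices stand in for the learned 1*1*1 convolutions, and the division by sqrt(dim_inner) mirrors the USE_SCALE option above; this is an illustration, not the repository's code:

import numpy as np

def nonlocal_block_numpy(x, dim_inner, seed=0):
    """Minimal embedded-Gaussian non-local block on one sample.

    x: array of shape (C, T, H, W). Random matrices stand in for the
    learned 1x1x1 convolutions theta, phi, g and the output conv W.
    Illustration only; the real code lives in nonlocal_helper.py.
    """
    rng = np.random.default_rng(seed)
    C, T, H, W = x.shape
    n = T * H * W
    x_flat = x.reshape(C, n)                  # (C, THW)

    # 1x1x1 convolutions are just per-position linear maps over channels.
    w_theta = rng.standard_normal((dim_inner, C)) * 0.01
    w_phi = rng.standard_normal((dim_inner, C)) * 0.01
    w_g = rng.standard_normal((dim_inner, C)) * 0.01
    w_out = rng.standard_normal((C, dim_inner)) * 0.01

    theta = w_theta @ x_flat                  # (C/2, THW)
    phi = w_phi @ x_flat                      # (C/2, THW)
    g = w_g @ x_flat                          # (C/2, THW)

    # f = softmax(theta^T phi): row-normalized affinity (Equation 1).
    aff = theta.T @ phi / np.sqrt(dim_inner)  # (THW, THW)
    aff = np.exp(aff - aff.max(axis=1, keepdims=True))
    p = aff / aff.sum(axis=1, keepdims=True)

    y = g @ p.T                               # (C/2, THW)
    z = w_out @ y + x_flat                    # residual: z = W*y + x (Eq. 6)
    return z.reshape(C, T, H, W)

# Example: a conv4-sized feature map, with channels halved inside the block.
out = nonlocal_block_numpy(np.random.rand(1024, 4, 14, 14), 512)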

At this point, the network structure construction code is finished.
