Deformable Convolutional Networks: an interpretation


This paper is another excellent piece of work from Jifeng Dai's group: the idea follows the same consistent line of thinking, and the implementation is very elegant (arXiv link).

Motivation

Objects in real-world images vary a great deal in scale, pose, viewpoint, and part deformation. A network can only "memorize" these variations through data augmentation, but augmentation relies on prior knowledge (for example, that flipping an image leaves the object category unchanged); some variations are unknown in advance, and hand-designed augmentation is inflexible and does not generalize or transfer well.

This paper starts from the basic building blocks of the CNN model: convolution samples the input at fixed positions, pooling samples at fixed positions, and RoI pooling divides an RoI into fixed spatial bins. None of these can handle geometric variation, which causes problems. High-level neurons that encode semantics or spatial information should not all share the same receptive field within a layer, and detection extracts features from a bounding box, which is a poor fit for objects that are not box-shaped. The paper therefore proposes the deformable convolutional network.

Example: for a 3x3 convolution or pooling window, a normal CNN samples a fixed set of 9 points, whereas after the improvement these 9 sampling points can be deformed; special cases include (c) scaling and (d) rotation.

Standard CNN convolution

Take a 3x3 convolution as an example.
For each output y(p0), 9 locations are sampled from x. These 9 positions spread out from the center position p0 on a regular grid: (-1, -1) is the upper-left neighbor of p0, (1, 1) is the lower-right neighbor, and the rest follow the same pattern.
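As a minimal sketch (plain NumPy; the function and variable names are mine, not from the paper's code), the standard sampling grid and the per-location sum can be written as:

```python
import numpy as np

# Regular 3x3 sampling grid R: the 9 offsets p_n around the center p0,
# from (-1, -1) at the upper left to (1, 1) at the lower right.
GRID = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]

def conv_at(x, w, p0):
    """y(p0) = sum_n w(p_n) * x(p0 + p_n) for a single output location p0."""
    i0, j0 = p0
    return sum(w[dy + 1, dx + 1] * x[i0 + dy, j0 + dx] for dy, dx in GRID)

x = np.arange(25, dtype=float).reshape(5, 5)
w = np.ones((3, 3)) / 9.0          # averaging kernel, for illustration
y_center = conv_at(x, w, (2, 2))   # averages the 3x3 patch around (2, 2)
```

With the averaging kernel, the output at the center of the 5x5 ramp is just the mean of its 3x3 neighborhood.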

Deformable convolution

Again, for each output y(p0), 9 locations are sampled from x and spread around the center position p0, but now an extra learned parameter Δpn lets the sampling points spread into a non-grid shape.

Note that Δpn is generally fractional, while the feature map x is only defined at integer positions, so bilinear interpolation is required.

In the backward pass we need not only the gradient with respect to w(pn) · x(p0 + pn + Δpn), but also the gradient with respect to Δpn itself, which calls for a careful look at bilinear interpolation.

Linear interpolation

Given two data points (x0, y0) and (x1, y1), we want the y value at some x on the line between them (x lies in the interval [x0, x1]; finding x for a given y is analogous).
The distances from x to x1 and to x0 serve as the weights for y0 and y1, respectively.
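In code (a minimal sketch; the function name is mine):

```python
def lerp(x0, y0, x1, y1, x):
    """Linearly interpolate y at x: y0 is weighted by x's distance to x1,
    y1 by x's distance to x0."""
    t = (x - x0) / (x1 - x0)       # fractional position of x within [x0, x1]
    return (1 - t) * y0 + t * y1

y = lerp(0.0, 10.0, 1.0, 20.0, 0.25)   # a quarter of the way: 12.5
```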
bilinear interpolation

Bilinear interpolation is essentially a linear interpolation in two directions.
Write the floating-point coordinates of x(p) as (i+u, j+v), where i, j are the integer parts and u, v are the fractional parts, floating-point values in the interval [0, 1). The pixel value x(p) at (i+u, j+v) is then determined by the values of the four surrounding pixels x(q1) = x(i, j), x(q2) = x(i+1, j), x(q3) = x(i, j+1), x(q4) = x(i+1, j+1).
1. First interpolate linearly in the x direction to get the two intermediate pixel values t1 and t2.
2. Then interpolate linearly in the y direction to obtain the pixel value of x(p).
Final formula: f(i+u, j+v) = (1-u)(1-v) f(i,j) + (1-u)v f(i,j+1) + u(1-v) f(i+1,j) + uv f(i+1,j+1)   (1)
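Formula (1) translates directly into code (a sketch with my own names, not taken from the official repository; it assumes the sample point lies strictly inside the image so that i+1 and j+1 are valid indices):

```python
import numpy as np

def bilinear(img, y, x):
    """Sample img at float position (y, x) using formula (1): f(i+u, j+v)."""
    i, j = int(np.floor(y)), int(np.floor(x))   # integer parts
    u, v = y - i, x - j                          # fractional parts in [0, 1)
    return ((1 - u) * (1 - v) * img[i, j]
            + (1 - u) * v * img[i, j + 1]
            + u * (1 - v) * img[i + 1, j]
            + u * v * img[i + 1, j + 1])

img = np.array([[0.0, 1.0],
                [2.0, 3.0]])
val = bilinear(img, 0.5, 0.5)   # midpoint of the four pixels: 1.5
```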

In deformable convolution, the value x(p) at the fractional position p = p0 + pn + Δpn is computed by bilinear interpolation over the feature map: x(p) = Σq G(q, p) · x(q), where:


g(a, b) = max(0, 1 − |a − b|) and G(q, p) = g(qx, px) · g(qy, py). Here q ranges over the 4 integer points adjacent to p; p0, pn, and Δpn are all two-dimensional coordinates, so each axis can be substituted into formula (1).
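A sketch of the kernel G with a finite-difference check of the offset gradient (helper names are mine; since the kernel is piecewise linear, the numerical gradient matches the analytic derivative of formula (1) away from the kinks):

```python
import numpy as np

def g(a, b):
    # 1-D bilinear kernel: max(0, 1 - |a - b|)
    return max(0.0, 1.0 - abs(a - b))

def sample(img, p):
    """x(p) = sum_q G(q, p) * x(q), q over the 4 integer neighbors of p."""
    py, px = p
    i, j = int(np.floor(py)), int(np.floor(px))
    total = 0.0
    for qy in (i, i + 1):
        for qx in (j, j + 1):
            total += g(qy, py) * g(qx, px) * img[qy, qx]
    return total

img = np.array([[0.0, 1.0],
                [2.0, 3.0]])
p = (0.25, 0.25)
eps = 1e-6
# Central finite differences w.r.t. each component of the offset:
dy = (sample(img, (p[0] + eps, p[1])) - sample(img, (p[0] - eps, p[1]))) / (2 * eps)
dx = (sample(img, (p[0], p[1] + eps)) - sample(img, (p[0], p[1] - eps))) / (2 * eps)
```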

Then differentiate to obtain the gradient:

∂G(q, p0 + pn + Δpn)/∂Δpn can be derived from formula (1).

Implementing deformable convolution

The offsets are obtained by applying an extra convolution to the same input feature map; this offset convolution has the same kernel size and dilation as the deformable convolution itself.

The paper says: "The output offset fields have the same spatial resolution with the input feature map."
I have an objection to this sentence: the offset field should instead have the same size as the output feature map, with 2*3*3 offset values at each position of the output feature map. Moreover, since the offset convolution shares its kernel size and dilation with the deformable convolution itself, its output naturally matches the output feature map's size; this is also reflected in the code:

res5a_branch2a_relu = mx.symbol.Activation(name='res5a_branch2a_relu', data=scale5a_branch2a, act_type='relu')
# kernel, stride, and dilation are kept consistent with the DeformableConvolution below
# num_filter = num_deformable_group * 2 * kernel_height * kernel_width
# num_deformable_group works like grouped convolution and can be ignored here: 72 / 4 = 18 = 2*3*3
res5a_branch2b_offset = mx.symbol.Convolution(name='res5a_branch2b_offset', data=res5a_branch2a_relu, num_filter=72, pad=(2, 2), kernel=(3, 3), stride=(1, 1), dilate=(2, 2), cudnn_off=True)

res5a_branch2b = mx.contrib.symbol.DeformableConvolution(name='res5a_branch2b', data=res5a_branch2a_relu, offset=res5a_branch2b_offset, num_filter=512, pad=(2, 2), kernel=(3, 3), num_deformable_group=4, stride=(1, 1), dilate=(2, 2), no_bias=True)

Deformable RoI Pooling

RoI Pooling

First, RoI pooling (equation (5)) generates the pooled feature maps. From these, an fc layer produces normalized offsets Δp̂ij, which are then transformed into the offsets Δpij of equation (6) by element-wise multiplication with the RoI's width and height: Δpij = γ · Δp̂ij ∘ (w, h). Here γ is a predefined scalar that modulates the magnitude of the offsets; it is empirically set to γ = 0.1. The offset normalization is necessary to make offset learning invariant to RoI size. I do not fully understand this part yet.
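A numeric sketch of this offset transformation (the values and variable names are illustrative, not from the paper):

```python
import numpy as np

gamma = 0.1                           # predefined scalar from the paper
roi_w, roi_h = 100.0, 60.0            # RoI width and height (illustrative)

# Normalized offsets as an fc layer might produce them, one (dx, dy) row per bin:
norm_offsets = np.array([[0.2, -0.5],
                         [-0.1, 0.3]])

# Equation (6): element-wise product with (w, h), scaled by gamma.
offsets = gamma * norm_offsets * np.array([roi_w, roi_h])
```

Because the fc output is multiplied by the RoI's own width and height, the same normalized offset moves the bins proportionally further in a large RoI than in a small one, which is what makes the learned offsets RoI-size invariant.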
Position-sensitive (PS) RoI Pooling

# A 1x1 convolution produces the offsets: 2 * k * k * (C+1) channels
rfcn_cls_offset_t = mx.sym.Convolution(data=relu_new_1, kernel=(1, 1), num_filter=2 * 7 * 7 * num_classes, name="rfcn_cls_offset_t")

rfcn_bbox_offset_t = mx.sym.Convolution(data=relu_new_1, kernel=(1, 1), num_filter=7 * 7 * 2, name="rfcn_bbox_offset_t")

Reference:
Deformable Convolutional Networks (oral presentation slides)
Image scaling: the bilinear interpolation algorithm (http://blog.csdn.net/xiaqunfeng123/article/details/17362881)
Understand in 30 minutes: linear interpolation and the bilinear interpolation algorithm
Deformable Convolutional Networks paper translation (Chinese-English comparison)
Code: msracver/Deformable-ConvNets
