Convolution and deconvolution in TensorFlow in detail, plus the interview question of implementing convolution in Python

Source: Internet
Author: User
Convolution operation
tf.nn.conv2d(input, filter, strides, padding, use_cudnn_on_gpu=None, name=None)

Apart from name, which only specifies the name of the operation, the method has five parameters:
input: the image to be convolved. It must be a 4-D tensor of shape [batch, in_height, in_width, in_channels], i.e. [number of images in a training batch, image height, image width, number of image channels], with type float32 or float64.
filter: the convolution kernel of the CNN. It must be a tensor of shape [filter_height, filter_width, in_channels, out_channels], i.e. [kernel height, kernel width, number of image channels, number of kernels], with the same type as input. Note that its third dimension, in_channels, must equal the fourth dimension of input.
strides: the stride of the sliding window along each dimension of the input; a 1-D vector of length 4.
padding: a string, either "SAME" or "VALID"; it determines the convolution mode.
use_cudnn_on_gpu: bool, whether to use cuDNN acceleration; defaults to True.

The result is a tensor, the output, which is what we usually call the feature map.
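The spatial size of that feature map follows from the input size, kernel size, stride and padding mode. The sketch below assumes TensorFlow's documented rules for 2-D convolution: with "VALID", out = floor((in - k) / stride) + 1; with "SAME", out = ceil(in / stride). The helper name conv2d_output_shape is hypothetical and not part of TensorFlow; it is applied to the eight cases discussed next.

```python
def conv2d_output_shape(input_shape, filter_shape, strides, padding):
    """Hypothetical helper: predict the NHWC output shape of tf.nn.conv2d.

    input_shape:  [batch, in_height, in_width, in_channels]
    filter_shape: [filter_height, filter_width, in_channels, out_channels]
    strides:      [1, stride_h, stride_w, 1]
    """
    batch, in_h, in_w, in_c = input_shape
    f_h, f_w, f_in_c, out_c = filter_shape
    if f_in_c != in_c:
        raise ValueError("filter in_channels must equal the input's in_channels")
    _, s_h, s_w, _ = strides
    if padding == "VALID":
        # The kernel must lie entirely inside the image.
        out_h = (in_h - f_h) // s_h + 1
        out_w = (in_w - f_w) // s_w + 1
    elif padding == "SAME":
        # Zero padding lets the kernel be centred on border pixels: ceil(in / stride).
        out_h = (in_h + s_h - 1) // s_h
        out_w = (in_w + s_w - 1) // s_w
    else:
        raise ValueError('padding must be "SAME" or "VALID"')
    # One output channel per kernel.
    return [batch, out_h, out_w, out_c]


cases = [
    ([1, 3, 3, 1], [1, 1, 1, 1], [1, 1, 1, 1], "VALID"),   # case 1
    ([1, 3, 3, 5], [1, 1, 5, 1], [1, 1, 1, 1], "VALID"),   # case 2
    ([1, 3, 3, 5], [3, 3, 5, 1], [1, 1, 1, 1], "VALID"),   # case 3
    ([1, 5, 5, 5], [3, 3, 5, 1], [1, 1, 1, 1], "VALID"),   # case 4
    ([1, 5, 5, 5], [3, 3, 5, 1], [1, 1, 1, 1], "SAME"),    # case 5
    ([1, 5, 5, 5], [3, 3, 5, 7], [1, 1, 1, 1], "SAME"),    # case 6
    ([1, 5, 5, 5], [3, 3, 5, 7], [1, 2, 2, 1], "SAME"),    # case 7
    ([10, 5, 5, 5], [3, 3, 5, 7], [1, 2, 2, 1], "SAME"),   # case 8
]
for case in cases:
    print(conv2d_output_shape(*case))
```

The printed shapes are [1,3,3,1], [1,3,3,1], [1,1,1,1], [1,3,3,1], [1,5,5,1], [1,5,5,7], [1,3,3,7] and [10,3,3,7].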

So how does convolution work in TensorFlow? The following examples explain it:

1. Consider the simplest case: convolving a 3x3 single-channel image (shape [1,3,3,1]) with a 1x1 kernel (shape [1,1,1,1]) gives a 3x3 feature map.

2. Increase the number of image channels: convolving a 3x3 five-channel image (shape [1,3,3,5]) with a 1x1 kernel (shape [1,1,5,1]) still gives a 3x3 feature map. This is equivalent to taking, at every pixel, the dot product of the kernel with that pixel's values across all channels.

import tensorflow as tf  # TensorFlow 1.x API

input = tf.Variable(tf.random_normal([1,3,3,5]))
filter = tf.Variable(tf.random_normal([1,1,5,1]))

op = tf.nn.conv2d(input, filter, strides=[1, 1, 1, 1], padding='VALID')
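Case 2 can be checked without TensorFlow. The following plain-Python sketch (nested lists in height, width, channel order, batch axis omitted; the name conv1x1 is hypothetical) computes a 1x1 convolution with a single kernel: each output pixel is the dot product of that pixel's channel vector with the kernel's channel weights.

```python
def conv1x1(image, weights):
    """image: [height][width][channels]; weights: [channels] (one 1x1 kernel).
    Returns a [height][width] feature map."""
    return [
        [sum(v * w for v, w in zip(pixel, weights)) for pixel in row]
        for row in image
    ]


# A 3x3 image with 5 channels; pixel (i, j) holds [i, j, 1, 0, 2].
image = [[[i, j, 1, 0, 2] for j in range(3)] for i in range(3)]
weights = [1.0, 10.0, 100.0, 0.0, 0.5]
fmap = conv1x1(image, weights)
print(fmap[2][1])  # 2*1 + 1*10 + 1*100 + 0*0 + 2*0.5 = 113.0
```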

3. Enlarge the kernel: with a 3x3 kernel the output is a single value (a 1x1 feature map). At each of the 9 positions, the pixel's channel values are dot-multiplied with the kernel weights for that position, as in case 2, and the 9 results are summed.

input = tf.Variable(tf.random_normal([1,3,3,5]))
filter = tf.Variable(tf.random_normal([3,3,5,1]))

op = tf.nn.conv2d(input, filter, strides=[1, 1, 1, 1], padding='VALID')
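A minimal plain-Python sketch of case 3, assuming the kernel has the same spatial size as the image (the function name conv_valid_same_size is hypothetical). With "VALID" padding the kernel fits in exactly one place, so the result is the sum, over all positions, of the case-2 style channel dot products.

```python
def conv_valid_same_size(image, kernel):
    """Hypothetical helper: "VALID" convolution of an H x W x C image with a
    single H x W x C kernel. The kernel fits in exactly one place, so the
    output is one number."""
    total = 0.0
    for img_row, ker_row in zip(image, kernel):
        for pixel, weights in zip(img_row, ker_row):
            # Dot product over the channels at this position (as in case 2) ...
            total += sum(v * w for v, w in zip(pixel, weights))
    # ... summed over all H x W positions.
    return total


image = [[[1.0] * 5 for _ in range(3)] for _ in range(3)]        # 3x3, 5 channels
kernel = [[[float(i)] * 5 for _ in range(3)] for i in range(3)]  # weight = row index
print(conv_valid_same_size(image, kernel))  # (0 + 1 + 2) * 3 * 5 = 45.0
```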

4. Use a larger image: enlarge the image of case 2 to 5x5 and keep the 3x3 kernel with stride 1; the output is a 3x3 feature map, one value for each kernel centre marked x below:

.....
. xxx.
. xxx.
. xxx.
.....
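The diagram can be reproduced by enumerating every position where the kernel fits. This sketch assumes an odd kernel size and "VALID" padding; the names valid_window_centers and draw are hypothetical.

```python
def valid_window_centers(size, k, stride=1):
    """Centres (row, col) of every position a k x k kernel (k odd) can occupy
    inside a size x size image with "VALID" padding."""
    r = k // 2
    coords = range(r, size - r, stride)
    return [(i, j) for i in coords for j in coords]


def draw(size, centers):
    """Render the image as text, marking kernel centres with 'x'."""
    marked = set(centers)
    return "\n".join(
        "".join("x" if (i, j) in marked else "." for j in range(size))
        for i in range(size)
    )


centers = valid_window_centers(5, 3)
print(len(centers))  # 9 -> a 3x3 feature map
print(draw(5, centers))
```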

5. So far padding has always been 'VALID' (no padding). When it is 'SAME' (the two modes are explained in the reference link), the kernel may also be centred on the edge pixels of the image, with zeros filling in beyond the border, so the output here is a 5x5 feature map:

input = tf.Variable(tf.random_normal([1,5,5,5]))
filter = tf.Variable(tf.random_normal([3,3,5,1]))

op = tf.nn.conv2d(input, filter, strides=[1, 1, 1, 1], padding='SAME')
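How much zero padding does "SAME" add? The sketch below follows the rule described in TensorFlow's convolution documentation: out = ceil(in / stride), total padding = max((out - 1) * stride + k - in, 0), with the smaller half placed before (top/left) and any extra row or column after (bottom/right). same_padding is a hypothetical helper name.

```python
def same_padding(in_size, k, stride):
    """Return (out_size, pad_before, pad_after) for "SAME" padding along one axis."""
    out_size = (in_size + stride - 1) // stride  # ceil(in / stride)
    pad_total = max((out_size - 1) * stride + k - in_size, 0)
    pad_before = pad_total // 2
    pad_after = pad_total - pad_before
    return out_size, pad_before, pad_after


print(same_padding(5, 3, 1))  # (5, 1, 1): case 5, a 5x5 feature map
print(same_padding(5, 3, 2))  # (3, 1, 1)
print(same_padding(6, 3, 2))  # (3, 0, 1): the extra zero row/column goes at the bottom/right
```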

6. With multiple kernels:

input = tf.Variable(tf.random_normal([1,5,5,5]))
filter = tf.Variable(tf.random_normal([3,3,5,7]))

op = tf.nn.conv2d(input, filter, strides=[1, 1, 1, 1], padding='SAME')

The output is now seven 5x5 feature maps (shape [1,5,5,7]).
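To see how batch, multiple kernels, stride and padding fit together, here is a naive plain-Python sketch of an NHWC convolution. It is an illustration under stated assumptions, not TensorFlow's implementation: it uses one stride for height and width, computes cross-correlation (no kernel flip, as tf.nn.conv2d does), and places "SAME" padding with the rule given in TensorFlow's documentation. The names conv2d_nhwc, ones and shape_of are hypothetical.

```python
def conv2d_nhwc(inputs, filters, stride, padding):
    """Naive 2-D convolution (cross-correlation) on nested lists.

    inputs:  [batch][height][width][in_channels]
    filters: [k_h][k_w][in_channels][out_channels]
    stride:  int, used for both height and width
    padding: "SAME" or "VALID"
    Returns [batch][out_h][out_w][out_channels].
    """
    in_h, in_w = len(inputs[0]), len(inputs[0][0])
    k_h, k_w = len(filters), len(filters[0])
    in_c, out_c = len(filters[0][0]), len(filters[0][0][0])

    def axis(in_size, k):
        # (output size, padding before) along one spatial axis
        if padding == "VALID":
            return (in_size - k) // stride + 1, 0
        out = (in_size + stride - 1) // stride
        return out, max((out - 1) * stride + k - in_size, 0) // 2

    out_h, pad_top = axis(in_h, k_h)
    out_w, pad_left = axis(in_w, k_w)

    result = []
    for image in inputs:  # each image in the batch is convolved independently
        fmap = [[[0.0] * out_c for _ in range(out_w)] for _ in range(out_h)]
        for oy in range(out_h):
            for ox in range(out_w):
                for dy in range(k_h):
                    for dx in range(k_w):
                        y = oy * stride + dy - pad_top
                        x = ox * stride + dx - pad_left
                        if not (0 <= y < in_h and 0 <= x < in_w):
                            continue  # zero padding contributes nothing
                        for c in range(in_c):
                            v = image[y][x][c]
                            for o in range(out_c):  # one output channel per kernel
                                fmap[oy][ox][o] += v * filters[dy][dx][c][o]
        result.append(fmap)
    return result


def ones(shape):
    """Nested list of 1.0 with the given shape."""
    return [ones(shape[1:]) for _ in range(shape[0])] if shape else 1.0


def shape_of(t):
    """Shape of a nested list."""
    dims = []
    while isinstance(t, list):
        dims.append(len(t))
        t = t[0]
    return dims


fmaps = conv2d_nhwc(ones([1, 5, 5, 5]), ones([3, 3, 5, 7]), 1, "SAME")  # case 6
print(shape_of(fmaps))                       # [1, 5, 5, 7]
print(fmaps[0][2][2][0], fmaps[0][0][0][0])  # 45.0 20.0 (centre vs. padded corner)
```

With all-ones data, an interior pixel sees 3 x 3 x 5 = 45 ones, while the corner sees only the 2 x 2 x 5 = 20 in-bounds ones.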

7. Stride other than 1. According to the documentation, for images, which have only two spatial dimensions, strides is usually [1,stride,stride,1]:

input = tf.Variable(tf.random_normal([1,5,5,5]))
filter = tf.Variable(tf.random_normal([3,3,5,7]))

op = tf.nn.conv2d(input, filter, strides=[1, 2, 2, 1], padding='SAME')

8. If the batch size is not 1, for example when 10 images are fed at once:

input = tf.Variable(tf.random_normal([10,5,5,5]))
filter = tf.Variable(tf.random_normal([3,3,5,7]))

op = tf.nn.conv2d(input, filter, strides=[1, 2, 2, 1], padding='SAME')

Each image yields seven 3x3 feature maps, so the output shape is [10,3,3,7].

Deconvolution operation

tf.nn.conv2d_transpose(value, filter, output_shape, strides, padding="SAME", data_format="NHWC", name=None)

Apart from name, which only specifies the name of the operation, the method has six parameters:
value: the input image to be deconvolved; a 4-D tensor.
filter: the convolution kernel; a tensor of shape [filter_height, filter_width, out_channels, in_channels], i.e. [kernel height, kernel width, number of output channels, number of input channels]. in_channels must match the channel dimension of value.
output_shape: the shape of the deconvolution output. Observant readers will notice that the convolution operation has no such parameter; why it is needed here is explained below.
strides: the stride of the deconvolution along each dimension of the image; a 1-D vector of length 4.
padding: a string, either "SAME" or "VALID"; it determines the convolution mode.
data_format: a string, either 'NHWC' or 'NCHW', added in newer versions of TensorFlow, which describes the data format of value. 'NHWC' is TensorFlow's standard format [batch, height, width, in_channels]; 'NCHW' is the format used by Theano, [batch, in_channels, height, width]. The default is 'NHWC'.

First, define a single-channel 3x3 image x1 and a 3x3 kernel of shape [3,3,3,1]:

x1 = tf.constant(1.0, shape=[1,3,3,1])
kernel = tf.constant(1.0, shape=[3,3,3,1])

Define a few more images:

x2 = tf.constant(1.0, shape=[1,6,6,3])
x3 = tf.constant(1.0, shape=[1,5,5,3])

x2 is a 6x6 3-channel image and x3 is a 5x5 3-channel image.
Now let's apply a convolution to x3:

y2 = tf.nn.conv2d(x3, kernel, strides=[1,2,2,1], padding="SAME")

The returned y2 is a single-channel image. If you understand the convolution process, it is easy to work out that y2 is a [1,3,3,1] tensor with the following values:

[[[[12.]
   [18.]
   [12.]]

  [[18.]
   [27.]
   [18.]]

  [[12.]
   [18.]
   [12.]]]]
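Because every input value and every kernel weight is 1, each output value is simply channels x (kernel rows that land inside the image) x (kernel columns that land inside the image). This sketch counts those in-bounds taps under "SAME" padding, assuming TensorFlow's documented padding placement; ones_conv_same is a hypothetical name. For x3 (5x5) it gives y2, and for a 6x6 input it gives the other [1,3,3,1] result discussed later.

```python
def ones_conv_same(n, k, stride, channels):
    """Value map of a "SAME" convolution of an all-ones n x n x channels image
    with an all-ones k x k kernel: channels * (rows hit) * (cols hit)."""
    out = (n + stride - 1) // stride
    pad_before = max((out - 1) * stride + k - n, 0) // 2
    # Number of in-bounds kernel rows (equivalently columns) for each output index
    hits = []
    for o in range(out):
        start = o * stride - pad_before
        hits.append(sum(1 for t in range(start, start + k) if 0 <= t < n))
    return [[channels * hr * hc for hc in hits] for hr in hits]


print(ones_conv_same(5, 3, 2, 3))  # [[12, 18, 12], [18, 27, 18], [12, 18, 12]]
print(ones_conv_same(6, 3, 2, 3))  # [[27, 27, 18], [27, 27, 18], [18, 18, 12]]
```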

Another important point: the filter argument of tf.nn.conv2d has the form [filter_height, filter_width, in_channels, out_channels], whereas the filter argument of tf.nn.conv2d_transpose has the form [filter_height, filter_width, out_channels, in_channels]; note that in_channels and out_channels are swapped. They are swapped because deconvolution exchanges the roles of input and output, which is why the same kernel tensor can be passed to both operations. Written as a matrix multiplication, deconvolution multiplies by the transpose of the matrix that represents the original convolution, which is why deconvolution is also called transposed convolution.
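The "transpose" can be made concrete in one dimension. This plain-Python sketch builds the matrix A of a 1-D convolution (length 5, all-ones kernel of size 3, stride 2, one zero of padding before the input, matching "SAME" for these sizes) and applies its transpose. The helper names are hypothetical, and the 1-D reduction is a simplification of the 2-D case.

```python
def conv_matrix_1d(n, kernel, stride, pad_before):
    """Matrix A such that A @ x is the 1-D cross-correlation of x (length n)
    with `kernel`, using the given stride and zero padding before the input.
    The output length is ceil(n / stride), as with "SAME" padding."""
    k = len(kernel)
    out = (n + stride - 1) // stride
    A = [[0.0] * n for _ in range(out)]
    for o in range(out):
        for t in range(k):
            i = o * stride + t - pad_before
            if 0 <= i < n:  # taps on padding zeros are left out of the matrix
                A[o][i] = kernel[t]
    return A


def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def transpose(M):
    return [list(col) for col in zip(*M)]


A = conv_matrix_1d(5, [1.0, 1.0, 1.0], stride=2, pad_before=1)  # 3 x 5
y = matvec(A, [1.0] * 5)          # convolution: length 5 -> 3
x_back = matvec(transpose(A), y)  # transposed convolution: length 3 -> 5
print(y)       # [2.0, 3.0, 2.0]
print(x_back)  # [2.0, 5.0, 3.0, 5.0, 2.0]
```

x_back has the original length but not the original values (all ones): a transposed convolution restores the shape, not the input.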

Since y2 is the result of a convolution, we can of course deconvolve it, and the tensor returned by the deconvolution should have the same shape as x3. This is not hard to understand, since deconvolution reverses the shape change made by the convolution (it does not recover the original values).

y3 = tf.nn.conv2d_transpose(y2, kernel, output_shape=[1,5,5,3],
    strides=[1,2,2,1], padding="SAME")

Sure enough, the returned y3 is a [1,5,5,3] tensor, with the following values:

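[[[[12. 12. 12.]
   [30. 30. 30.]
   [18. 18. 18.]
   [30. 30. 30.]
   [12. 12. 12.]]

  [[30. 30. 30.]
   [75. 75. 75.]
   [45. 45. 45.]
   [75. 75. 75.]
   [30. 30. 30.]]

  [[18. 18. 18.]
   [45. 45. 45.]
   [27. 27. 27.]
   [45. 45. 45.]
   [18. 18. 18.]]

  [[30. 30. 30.]
   [75. 75. 75.]
   [45. 45. 45.]
   [75. 75. 75.]
   [30. 30. 30.]]

  [[12. 12. 12.]
   [30. 30. 30.]
   [18. 18. 18.]
   [30. 30. 30.]
   [12. 12. 12.]]]]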

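How does this result come about? Each value of y2 is multiplied by the kernel and added into a 3x3 window of the output; neighbouring windows are 2 pixels apart (the stride), and the grid of windows is shifted by the one pixel of "SAME" padding. Where windows overlap, their contributions are summed.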
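The scatter-add view can be checked in plain Python. Because the kernel is all ones, the three output channels of y3 are identical, so this sketch handles a single channel only. The name conv2d_transpose_single is hypothetical, and pad_before=1 is the "SAME" padding for a 5x5 output with a 3x3 kernel and stride 2.

```python
def conv2d_transpose_single(y, kernel, stride, pad_before, out_size):
    """Single-channel transposed convolution by scatter-add.

    Each y[i][j] is multiplied by the kernel and added into the output window
    whose top-left corner is (i*stride - pad_before, j*stride - pad_before);
    contributions falling outside the out_size x out_size output are dropped.
    """
    k = len(kernel)
    out = [[0.0] * out_size for _ in range(out_size)]
    for i, row in enumerate(y):
        for j, v in enumerate(row):
            for dy in range(k):
                for dx in range(k):
                    r = i * stride + dy - pad_before
                    c = j * stride + dx - pad_before
                    if 0 <= r < out_size and 0 <= c < out_size:
                        out[r][c] += v * kernel[dy][dx]  # overlapping windows add up
    return out


y2 = [[12.0, 18.0, 12.0], [18.0, 27.0, 18.0], [12.0, 18.0, 12.0]]
ones3 = [[1.0] * 3 for _ in range(3)]
y3 = conv2d_transpose_single(y2, ones3, stride=2, pad_before=1, out_size=5)
for row in y3:
    print(row)
```

The printed rows are [12.0, 30.0, 18.0, 30.0, 12.0], [30.0, 75.0, 45.0, 75.0, 30.0], [18.0, 45.0, 27.0, 45.0, 18.0], then the second and first rows again, which matches every channel of y3.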

It might seem that output_shape in tf.nn.conv2d_transpose is superfluous, because the output size can apparently be derived from the original image, the kernel and the stride. So why must output_shape be specified?
Consider the following case:

y4 = tf.nn.conv2d(x2, kernel, strides=[1,2,2,1], padding="SAME")

Applying the same convolution to x2, defined above, gives y4 with shape [1,3,3,1]:

[[[[27.]
   [27.]
   [18.]]

  [[27.]
   [27.]
   [18.]]

  [[18.]
   [18.]
   [12.]]]]

The [1,6,6,3] and [1,5,5,3] images both yield the same size, [1,3,3,1], after convolution.
Looking at it the other way, deconvolving a [1,3,3,1] image could produce either of two sizes, so specifying output_shape here is meaningful. Of course, output_shape cannot be chosen arbitrarily; the following will raise an error:

y5 = tf.nn.conv2d_transpose(x1, kernel, output_shape=[1,10,10,3],
    strides=[1,2,2,1], padding="SAME")
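Which output sizes are acceptable? A valid output_shape is one that the forward convolution, with the same kernel size, stride and padding, would map back to the deconvolution's input size. This sketch searches for such sizes along one axis, assuming the "SAME" and "VALID" size rules from TensorFlow's documentation; transpose_output_sizes and its max_size cut-off are hypothetical. For a 3x3 input, stride 2 and "SAME", only 5 and 6 qualify, which is why 10 is rejected.

```python
def transpose_output_sizes(in_size, k, stride, padding, max_size=64):
    """Spatial sizes n (up to max_size) for which a forward convolution with
    this kernel size, stride and padding maps n -> in_size; only these are
    consistent output_shape values for a transposed convolution along that axis."""
    sizes = []
    for n in range(1, max_size + 1):
        if padding == "SAME":
            out = (n + stride - 1) // stride  # ceil(n / stride)
        else:  # "VALID"
            if n < k:
                continue  # the kernel does not fit at all
            out = (n - k) // stride + 1
        if out == in_size:
            sizes.append(n)
    return sizes


print(transpose_output_sizes(3, 3, 2, "SAME"))        # [5, 6] -> x3 (5x5) or x2 (6x6)
print(10 in transpose_output_sizes(3, 3, 2, "SAME"))  # False -> output_shape [1,10,10,3] fails
```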

Full program listing:

import tensorflow as tf  # TensorFlow 1.x API (tf.Session)

x1 = tf.constant(1.0, shape=[1,3,3,1])
x2 = tf.constant(1.0, shape=[1,6,6,3])
x3 = tf.constant(1.0, shape=[1,5,5,3])
kernel = tf.constant(1.0, shape=[3,3,3,1])

# Deconvolve the 3x3 single-channel image to 6x6x3
y1 = tf.nn.conv2d_transpose(x1, kernel, output_shape=[1,6,6,3],
    strides=[1,2,2,1], padding="SAME")
# Convolve 5x5x3 -> 3x3x1
y2 = tf.nn.conv2d(x3, kernel, strides=[1,2,2,1], padding="SAME")
# Deconvolve y2 back to the shape of x3
y3 = tf.nn.conv2d_transpose(y2, kernel, output_shape=[1,5,5,3],
    strides=[1,2,2,1], padding="SAME")
# Convolve 6x6x3 -> 3x3x1 (same output size as y2)
y4 = tf.nn.conv2d(x2, kernel, strides=[1,2,2,1], padding="SAME")

'''
Wrong!! This is impossible:
y5 = tf.nn.conv2d_transpose(x1, kernel, output_shape=[1,10,10,3],
    strides=[1,2,2,1], padding="SAME")
'''
sess = tf.Session()
tf.global_variables_initializer().run(session=sess)
x1_decov, x3_cov, y2_decov, x2_cov = sess.run([y1, y2, y3, y4])
print(x1_decov.shape)
print(x3_cov.shape)
print(y2_decov.shape)
print(x2_cov.shape)
Python implementation of the convolution operation
import numpy as np

input_data = [
    [[1, 0, 1, 2, 1],
     [0, 2, 1, 0, 1],
     [1, 1, 0, 2, 0],
     [2, 2, 1, 1, 0],
     [2, 0, 1, 2, 0]],
    [[2, 0, 2, 1, 1],
     [0, 1, 0, 0, 2],
     [1, 0, 0, 2, 1],
     [1, 1, 2, 1, 0],
     [1, 0, 1, 1, 1]],
]
weights_data = [
    [[1, 0, 1],
     [-1, 1, 0],
     [0, -1, 0]],
    [[-1, 0, 1],
     [0, 0, 1],
     [1, 1, 1]],
]

# fm: [h, w]
# kernel: [k, k]
# return rs: [h, w]
def compute_conv(fm, kernel):
    h, w = fm.shape
    k, _ = kernel.shape
    r = k // 2
    # Map with an r-wide border of zeros (zero padding)
    padding_fm = np.zeros([h + 2 * r, w + 2 * r], np.float32)
    # Buffer for the result
    rs = np.zeros([h, w], np.float32)
    # Copy the input into the interior, i.e. everything except the border
    padding_fm[r:h + r, r:w + r] = fm
    # Visit the region centred on each point
    for i in range(r, h + r):
        for j in range(r, w + r):
            # Take the k*k region centred on the current point
            roi = padding_fm[i - r:i + r + 1, j - r:j + r + 1]
            # Convolution at the current point: multiply the k*k values element-wise and sum
            rs[i - r][j - r] = np.sum(roi * kernel)
    return rs

def my_conv2d(input, weights):
    c, h, w = input.shape
    _, k, _ = weights.shape
    outputs = np.zeros([h, w], np.float32)
    # Convolve each feature map with its own kernel slice and accumulate
    for i in range(c):
        # feature map ==> [h, w]
        f_map = input[i]
        # kernel ==> [k, k]
        kern = weights[i]
        rs = compute_conv(f_map, kern)
        outputs = outputs + rs
    return outputs

def main():
    # shape = [c, h, w]
    input = np.asarray(input_data, np.float32)
    # shape = [in_c, k, k]
    weights = np.asarray(weights_data, np.float32)
    rs = my_conv2d(input, weights)
    print(rs)

if __name__ == '__main__':
    main()