First, the Python definition of the tf.nn.conv2d() function:
tf.nn.conv2d(input, filter, strides, padding, use_cudnn_on_gpu=None, name=None)
1. Parameter input:
The input image to be convolved; a tensor of shape [batch, in_height, in_width, in_channels], meaning [number of images in a training batch, image height, image width, number of image channels]. Note that this is a 4-D tensor whose data type is float32 or float64.
2. Parameter filter:
The convolution kernel of a CNN; a tensor of shape [filter_height, filter_width, in_channels, out_channels], meaning [kernel height, kernel width, number of image channels, number of kernels]. Its type must be the same as that of the first parameter, input. Note that filter's third dimension, in_channels, must equal the fourth dimension of input.
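The shape agreement between the two parameters can be checked directly. A small sketch (the batch size, image size, and kernel count below are made up for illustration):

```python
import numpy as np

# A hypothetical batch of 8 RGB images of size 32x32:
# shape [batch, in_height, in_width, in_channels]
inp = np.zeros((8, 32, 32, 3), dtype=np.float32)

# 16 hypothetical 5x5 convolution kernels:
# shape [filter_height, filter_width, in_channels, out_channels]
filt = np.zeros((5, 5, 3, 16), dtype=np.float32)

# filter's third dimension must equal input's fourth dimension
assert filt.shape[2] == inp.shape[3]
```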
3. Parameter strides:
The stride of the convolution along each dimension of the input; a 1-D vector of length 4. Note: be sure that strides[0] = strides[3] = 1, because a stride makes no sense in the batch and in_channels dimensions. Also, in most cases the horizontal and vertical strides are the same, i.e. strides = [1, stride, stride, 1].
4. Parameter padding:
A string that can only be "SAME" or "VALID"; this value selects the convolution mode.
When padding = "SAME" (and the stride is 1), the size (width and height) of the output image is the same as the size of the input image.
Example: take an input image of shape input = [1, 3, 3, 1] (a single 3×3 one-channel image) and a convolution kernel of shape filter = [2, 2, 1, 1] (a single 2×2 one-channel kernel).
When padding = "SAME", the function first pads the input image with zeros, then performs the convolution (a weighted sum at each position).
As a result, both the input image and the output image are 3×3.
When padding = "VALID", the function does not zero-pad the image, i.e. there is no padding at all. In this case the output image is smaller than the input image.
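The two modes can be summarized by the standard output-size formulas, sketched here as a small helper (the function name is ours, not TensorFlow's):

```python
import math

def conv2d_output_size(in_size, filter_size, stride, padding):
    """Spatial output size of a 2-D convolution along one dimension.

    These are the standard TensorFlow formulas:
      SAME:  ceil(in_size / stride)                      (zero-padding added as needed)
      VALID: ceil((in_size - filter_size + 1) / stride)  (no padding)
    """
    if padding == 'SAME':
        return math.ceil(in_size / stride)
    if padding == 'VALID':
        return math.ceil((in_size - filter_size + 1) / stride)
    raise ValueError(padding)

# The 3x3 input / 2x2 kernel example from the text, stride 1:
print(conv2d_output_size(3, 2, 1, 'SAME'))   # 3: same size as the input
print(conv2d_output_size(3, 2, 1, 'VALID'))  # 2: smaller than the input
```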
5. Parameter use_cudnn_on_gpu:
A bool indicating whether to use cuDNN acceleration; True by default.
6. Return value:
The function returns a tensor, which is what we usually call a feature map.
A further note on strides:
It is not true that padding='SAME' alone makes the output size after convolution equal to the input size; the stride matters as well. When the stride is 1, the stride has no effect on the result, but when the stride is not 1, the output size is no longer the same as the input.
import tensorflow as tf

data = tf.Variable(tf.random_normal([64, 48, 48, 3]), dtype=tf.float32)
weight = tf.Variable(tf.random_normal([5, 5, 3, 64]), dtype=tf.float32)
sess = tf.InteractiveSession()
tf.global_variables_initializer().run()
conv1 = tf.nn.conv2d(data, weight, strides=[1, 1, 1, 1], padding='SAME')
conv2 = tf.nn.conv2d(data, weight, strides=[1, 2, 2, 1], padding='SAME')
conv3 = tf.nn.conv2d(data, weight, strides=[1, 4, 4, 1], padding='SAME')
print(conv1)
print(conv2)
print(conv3)
The result is:
Tensor("Conv2D_6:0", shape=(64, 48, 48, 64), dtype=float32)
Tensor("Conv2D_7:0", shape=(64, 24, 24, 64), dtype=float32)
Tensor("Conv2D_8:0", shape=(64, 12, 12, 64), dtype=float32)
As can be seen, with SAME padding the output height and width are the input size divided by the stride (rounded up), so output size and stride are related by a simple multiple.
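That multiple relationship for the 48×48 input above can be checked directly with the SAME-padding size formula ceil(in / stride):

```python
import math

# SAME padding: out = ceil(in / stride); input height/width is 48 here
for stride in (1, 2, 4):
    print(stride, math.ceil(48 / stride))  # 1 -> 48, 2 -> 24, 4 -> 12
```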
Second, how the function is implemented
Given a tensor input of shape [batch, in_height, in_width, in_channels] and a convolution kernel filter of shape [filter_height, filter_width, in_channels, out_channels], the function tensorflow::ops::Conv2D (the C++ definition, which corresponds to the Python tf.nn.conv2d) roughly performs the following steps:
Step 1: Reshape the convolution kernel into a tensor of shape [filter_height * filter_width * in_channels, out_channels].
Step 2: Extract patches from the input into a tensor of shape [batch, out_height, out_width, filter_height * filter_width * in_channels].
Step 3: For each patch, compute:
output[b, i, j, k] =
    sum_{di, dj, q} input[b, strides[1] * i + di, strides[2] * j + dj, q] *
                    filter[di, dj, q, k]
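The three steps above can be sketched in NumPy (rather than TensorFlow's actual C++ kernels), restricted to VALID padding to keep the indexing simple; the function name is ours:

```python
import numpy as np

def conv2d_valid(inp, filt, strides=(1, 1, 1, 1)):
    """Steps 1-3 above, sketched in NumPy (VALID padding, no zero-fill)."""
    b, ih, iw, ic = inp.shape    # [batch, in_height, in_width, in_channels]
    fh, fw, _, oc = filt.shape   # [filter_height, filter_width, in_channels, out_channels]
    sh, sw = strides[1], strides[2]
    oh = (ih - fh) // sh + 1
    ow = (iw - fw) // sw + 1

    # Step 1: flatten the kernel to [filter_height * filter_width * in_channels, out_channels]
    w = filt.reshape(fh * fw * ic, oc)

    # Step 2: gather input patches into [batch, out_height, out_width, fh * fw * ic]
    patches = np.empty((b, oh, ow, fh * fw * ic), dtype=inp.dtype)
    for i in range(oh):
        for j in range(ow):
            window = inp[:, i * sh:i * sh + fh, j * sw:j * sw + fw, :]
            patches[:, i, j, :] = window.reshape(b, -1)

    # Step 3: one matrix product per patch implements the sum over di, dj, q
    return patches @ w           # shape [batch, out_height, out_width, out_channels]

# The 3x3 single-channel image and a 2x2 all-ones kernel, as in the example above:
img = np.arange(9, dtype=np.float64).reshape(1, 3, 3, 1)
ker = np.ones((2, 2, 1, 1))
print(conv2d_valid(img, ker)[0, :, :, 0])  # the four 2x2 window sums: 8, 12, 20, 24
```

Each output element is exactly the weighted sum in the Step 3 formula; the im2col-style patch matrix turns the whole convolution into one large matrix multiplication, which is why the kernel is flattened in Step 1.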