Convolution and deconvolution

Source: Internet
Author: User
1. Foreword
A traditional CNN can only output a label for the whole image. In many cases, however, the recognized objects also need to be segmented, end to end. FCN was introduced for this and provides a very important solution to object segmentation. Its core is convolution and deconvolution, so both are explained in detail here.
For 1-D convolution, the formula (discrete) and the computation process (continuous) are as follows (Fig. 1). Note that one of the functions (the original function or the convolution function) is flipped 180 degrees before the convolution.
Fig. 1. For discrete convolution, if f has size n1 and g has size n2, the convolution has size n1+n2-1.
A simple introduction to one-dimensional convolution
This part looks at convolution from a mathematical point of view, that is, at how a convolution is calculated. Convolution comes in two forms, continuous and discrete. To keep things easy to follow, the example here is discrete, and continuous convolution is not discussed separately (the only difference between the two is the difference between continuous and discrete). Start with the definition of the one-dimensional convolution:
y(k) = sum over i of h(k-i) * u(i)
The formula involves three sequences (y, h, u). Here h has length lh=3 and u has length lu=6, so y has length ly=lh+lu-1=8 (the reason is given later). How is this convolution calculated? First lay out the u sequence in increasing order (u0,u1,u2,u3,u4,u5) along a one-dimensional line, because the index i of u in the formula increases. Lay out the h sequence in decreasing order (h2,h1,h0), because the index of h contains -i. Then align the first term of each sequence, h0 under u0, which is the position k=0 (figure).
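The definition can be evaluated directly in a few lines. The following is a minimal sketch, not a reference implementation: the function name conv1d_full and the values of h and u are assumptions chosen for illustration (the article's figures give no numbers), with lh=3 and lu=6 as in the text.

```python
def conv1d_full(h, u):
    """Discrete 1-D convolution: y(k) = sum over i of h(k - i) * u(i)."""
    ly = len(h) + len(u) - 1           # output length: lh + lu - 1
    y = [0] * ly
    for k in range(ly):                # displacement k = 0 .. ly - 1
        for i in range(len(u)):
            j = k - i                  # index into h; as i grows, h is read backwards
            if 0 <= j < len(h):        # only terms where both sequences exist
                y[k] += h[j] * u[i]
    return y


# Illustrative values (assumed, not taken from the article's figures).
h = [1, 2, 3]                # lh = 3
u = [1, 0, 2, 1, 0, 1]       # lu = 6
y = conv1d_full(h, u)
print(len(y), y)             # 8 [1, 2, 5, 5, 8, 4, 2, 3]
```

The inner index j = k - i is what produces the "flipped" h: for a fixed k, increasing i walks through h from high index to low.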
The last position is k=7 (figure). With the sequences arranged, the convolution can begin. At displacement k=0 the h sequence has not moved; the terms present in both the upper and the lower sequence are multiplied and summed, i.e. y(0)=h0*u0. At displacement k=1 the h sequence moves by 1 position, and the corresponding terms are again multiplied and summed: y(1)=h1*u0+h0*u1. Continue in this way until k reaches its maximum value. One-dimensional convolution therefore amounts to sliding the convolution kernel h along the one-dimensional line of the signal u and, at each position, multiplying the corresponding terms and summing.
What is the range of k? The figure shows that at k=8 the h sequence and the u sequence no longer have any corresponding terms, so k=8 is meaningless. The two sequences must overlap in at least 1 term, so the output length is the sum of the two sequence lengths minus 1 (the one overlapping term).
2. Image convolution
Fig. 2. As in the 1-D case, the convolution kernel must be rotated by 180 degrees. The center of the kernel is aligned with the image pixel being computed, and the output is the new value of that center-aligned pixel. The example computes the value of the pixel in the upper-left corner (first row, first column).
A more intuitive example: from left to right, the original pixels pass through convolution steps 1 to 8 (Figure 4).
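The single-pixel computation for an image (rotate the kernel by 180 degrees, center it on the pixel, multiply and sum) can be sketched as follows. The helper names rotate180 and conv_pixel and the image and kernel values are assumptions for illustration; positions that fall outside the image are treated as 0, which is one common choice.

```python
def rotate180(kernel):
    """Rotate a 2-D kernel by 180 degrees (reverse the rows, then each row)."""
    return [row[::-1] for row in kernel[::-1]]


def conv_pixel(image, kernel, r, c):
    """Convolution value at pixel (r, c), kernel center aligned with (r, c)."""
    flipped = rotate180(kernel)
    kh, kw = len(kernel), len(kernel[0])
    total = 0
    for a in range(kh):
        for b in range(kw):
            rr = r + a - kh // 2       # image row under kernel cell (a, b)
            cc = c + b - kw // 2       # image column under kernel cell (a, b)
            if 0 <= rr < len(image) and 0 <= cc < len(image[0]):
                total += flipped[a][b] * image[rr][cc]
            # cells outside the image contribute 0 (zero padding assumed)
    return total


# Illustrative values (assumed).
image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
kernel = [[1, 0, 0],
          [0, 2, 0],
          [0, 0, 3]]

print(conv_pixel(image, kernel, 0, 0))   # upper-left pixel: 2*1 + 1*5 = 7
print(conv_pixel(image, kernel, 1, 1))   # center pixel: 3*1 + 2*5 + 1*9 = 22
```

An asymmetric kernel is used on purpose: with a symmetric kernel the 180-degree rotation would make no visible difference.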
By sliding the convolution kernel over the image, the convolution result for the entire picture is obtained (Figure 5).
At this point image convolution should be broadly clear. Notice, though, that the image after convolution is either the same size or smaller. In Figure 2 the image size is unchanged; in Figure 5 the image becomes smaller because the border pixels are not computed. Yet 1-D convolution made the result longer, not shorter. This is explained below.
In MATLAB, 2-D convolution comes in 3 kinds: 1. full 2. same 3. valid
The convolution in Figure 2 is called same, and the one in Figure 5 is valid. So what is full? See the following figure.
Figure 6: full convolution
In Figure 6, blue is the original image, white is the padding added for the convolution (usually all 0), and green is the image after convolution. The convolution starts where the lower-right corner of the kernel overlaps the upper-left corner of the picture, the sliding step is 1, and the center element of the kernel corresponds to a pixel of the output image. The image after convolution is 4x4, larger than the original 2x2. Recall that the 1-D convolution has size n1+n2-1: here the original is 2x2 and the kernel is 3x3, so each side of the result is 2+3-1=4, which matches the one-dimensional case exactly. This is in fact the complete convolution; the smaller results (same and valid) simply omit the convolution of some pixels.
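The relation between the three kinds can be checked with a small sketch that computes the full result and then crops it to obtain same and valid. This is an illustrative implementation with a sliding step of 1, not MATLAB's own code; the function names and the test images are assumptions, and the same crop assumes an odd kernel size.

```python
def conv2d(image, kernel, mode="full"):
    """2-D convolution with step 1; mode is 'full', 'same' or 'valid'."""
    ir, ic = len(image), len(image[0])
    kr, kc = len(kernel), len(kernel[0])
    # Full result: every shift where kernel and image overlap in >= 1 pixel.
    full = [[0] * (ic + kc - 1) for _ in range(ir + kr - 1)]
    for r in range(ir):
        for c in range(ic):
            for a in range(kr):
                for b in range(kc):
                    full[r + a][c + b] += image[r][c] * kernel[a][b]
    if mode == "full":
        return full
    if mode == "same":       # central part, same size as the image
        top, left, rows, cols = (kr - 1) // 2, (kc - 1) // 2, ir, ic
    elif mode == "valid":    # only shifts where the kernel lies inside the image
        top, left, rows, cols = kr - 1, kc - 1, ir - kr + 1, ic - kc + 1
    else:
        raise ValueError("unknown mode: " + mode)
    return [row[left:left + cols] for row in full[top:top + rows]]


def shape(m):
    return (len(m), len(m[0]))


# Illustrative inputs (assumed): a 2x2 and a 5x5 image, a 3x3 kernel of ones.
small = [[1, 2], [3, 4]]
big = [[r * 5 + c for c in range(5)] for r in range(5)]
k3 = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]

print(shape(conv2d(small, k3, "full")))    # (4, 4)
print(shape(conv2d(big, k3, "same")))      # (5, 5)
print(shape(conv2d(big, k3, "valid")))     # (3, 3)
```

Cropping the full result makes the point of the text explicit: same and valid are the full convolution with some border pixels left out.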
The image sizes after full, same and valid convolution can now be summarized:
1. full: the sliding step is 1, the picture is n1xn1, the kernel is n2xn2; image size after convolution: (n1+n2-1) x (n1+n2-1). In Figure 6 the sliding step is 1, the picture is 2x2, the kernel is 3x3, and the image after convolution is 4x4.
2. same: the sliding step is 1, the picture is n1xn1, the kernel is n2xn2; image size after convolution: n1xn1.
3. valid: the sliding step is s, the picture is n1xn1, the kernel is n2xn2; image size after convolution: {(n1-n2)/s+1} x {(n1-n2)/s+1}, with the division rounded down. In Figure 5 the sliding step is 1, the picture is 5x5, the kernel is 3x3, and the image after convolution is 3x3.
3. Deconvolution (backwards convolution, transposed convolution)
The deconvolution discussed here is very different from the deconvolution of 1-D signal processing. The FCN authors call it backwards convolution, and others say that "deconvolution layer is a very unfortunate name and should rather be called a transposed convolutional layer".
A CNN contains convolution layers and pooling layers: the convolution layers extract features by image convolution, and the pooling layers shrink the image by half while selecting the important features. For a classic image-recognition CNN, such as those trained on ImageNet, the final output is 1x1x1000, where 1000 is the number of categories and 1x1 is the remaining spatial size.
The FCN authors, and later researchers working on end-to-end methods, apply deconvolution to this final 1x1 result (in fact the FCN authors' final output is not 1x1 but 1/32 of the input picture size; this does not affect the use of deconvolution).
The deconvolution of the image here works on the same principle as the full convolution of Fig. 6.
This deconvolution makes the image larger. The FCN authors use a variant of the method described here, so that a value is obtained for each corresponding pixel and the network can work on images end to end.
Figure 7. Here is another deconvolution approach. Assume the original image is 3x3. First upsample it to 7x7; many blank pixels appear. Then apply a valid convolution with a 3x3 kernel and a sliding step of 1 to obtain a 5x5 image. The upsampling enlarges the picture and the convolution fills in the image content, so that the content becomes richer. This is another way for a CNN to produce end-to-end output.
The South Korean author Hyeonwoo Noh combines the 16-layer VGG16 CNN with a symmetric 16-layer deconvolution and upsampling network to obtain end-to-end output. The changes across the different upsampling and deconvolution layers are as follows (figure).
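The 3x3 to 7x7 to 5x5 example described above can be sketched as follows. How the 7x7 image is built is an assumption here: one zero is inserted between neighbouring pixels and one row or column of zeros is added on each side, which is one arrangement that gives 7x7 from 3x3. The function names and the pixel and kernel values are also illustrative.

```python
def upsample_with_zeros(image, stride=2, border=1):
    """Place pixels `stride` apart on a zero grid with `border` zeros around.

    Assumes a square image; this layout is an assumption, not the article's.
    """
    n = len(image)
    size = (n - 1) * stride + 1 + 2 * border
    out = [[0] * size for _ in range(size)]
    for r in range(n):
        for c in range(n):
            out[border + r * stride][border + c * stride] = image[r][c]
    return out


def conv2d_valid(image, kernel):
    """'valid' convolution with step 1 (kernel rotated 180 degrees)."""
    k = len(kernel)
    flipped = [row[::-1] for row in kernel[::-1]]
    n = len(image) - k + 1
    return [[sum(flipped[a][b] * image[r + a][c + b]
                 for a in range(k) for b in range(k))
             for c in range(n)]
            for r in range(n)]


# Illustrative values (assumed).
image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
kernel = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]

up = upsample_with_zeros(image)       # 7x7, mostly blank (zero) pixels
out = conv2d_valid(up, kernel)        # 5x5, blanks filled in by the kernel
print(len(up), len(out))              # 7 5
for row in out:
    print(row)
```

With a kernel of ones, each output pixel is the sum of the original pixels within its 3x3 window, so the blank positions between original pixels receive values from their neighbours.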
Upsampling and downsampling
Reducing an image (also called subsampling or downsampling) has two main purposes: 1. to make the image fit the size of the display area; 2. to generate a thumbnail of the image. Enlarging an image (also called upsampling or image interpolation) mainly serves to magnify the original so that it can be shown on a higher-resolution display device. Scaling an image does not bring in more information about it, so the image quality is inevitably affected. There are, however, some scaling methods that can add information to the image, so that the scaled image exceeds the original in quality.
------------ New: the deconvolution process explained ------------
After the explanation and derivation above, convolution should be basically understood, but what exactly happens in deconvolution on an image may still not be clear, so the process is explained here.
Two deconvolution methods are currently used most, as described above.
Method 1: full convolution; a complete convolution makes the output larger than the original domain.
Method 2: record the pooling indices, enlarge the space accordingly, and then use convolution to fill in the image.
The deconvolution process is as follows:
Input: 2x2, convolution kernel: 4x4, sliding step: 3, output: 7x7
That is, a 2x2 input picture goes through a deconvolution with a 4x4 kernel and a step of 3:
1. Perform a full convolution on each pixel of the input picture. By the full-convolution size formula, each pixel yields a result of size 1+4-1=4, i.e. a 4x4 feature map. The input has 4 pixels, so there are 4 feature maps of size 4x4.
2. Fuse (i.e. add) the 4 feature maps with a step of 3. For example, the red feature map stays at its original input position (upper-left corner) and the green one at its original position (upper-right corner). A step of 3 means the maps are placed 3 pixels apart, and the overlapping parts are added. The output at row 1, column 4 is the sum of row 1, column 4 of the red feature map and row 1, column 1 of the green feature map, and so on.
The size after deconvolution is thus determined by the kernel size and the sliding step. With in the input size, k the kernel size, s the sliding step and out the output size:
out = (in-1) * s + k
For the process above: (2-1) * 3 + 4 = 7
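The two steps above (one full convolution per input pixel, then adding the feature maps at offsets of the step) can be sketched in a few lines. The function name transposed_conv2d and the input and kernel values are assumptions for illustration; a square input and a square kernel are assumed.

```python
def transposed_conv2d(image, kernel, stride):
    """Scale the kernel by each input pixel and add the copies `stride` apart."""
    n, k = len(image), len(kernel)
    out_size = (n - 1) * stride + k            # out = (in - 1) * s + k
    out = [[0] * out_size for _ in range(out_size)]
    for r in range(n):
        for c in range(n):
            # Full convolution of one pixel with the kernel is a k x k map:
            # the kernel scaled by that pixel's value.
            for a in range(k):
                for b in range(k):
                    # Overlapping parts of neighbouring maps are added.
                    out[r * stride + a][c * stride + b] += image[r][c] * kernel[a][b]
    return out


# Illustrative values (assumed): 2x2 input, 4x4 kernel of ones, step 3.
image = [[1, 2],
         [3, 4]]
kernel = [[1] * 4 for _ in range(4)]

out = transposed_conv2d(image, kernel, stride=3)
print(len(out), len(out[0]))    # 7 7
print(out[0][3])                # row 1, column 4: 1 + 2 = 3
print(out[3][3])                # all four maps overlap here: 1 + 2 + 3 + 4 = 10
```

Because the kernel is 4 wide and the step is 3, neighbouring feature maps overlap in exactly one row or column, which is where the sums occur.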
