Understanding Convolution for Semantic Segmentation
https://arxiv.org/abs/1702.08502v1
Model: https://goo.gl/DQMeun
For semantic segmentation, the paper improves on two fronts: dense upsampling convolution (DUC) replaces bilinear upsampling, and hybrid dilated convolution (HDC) replaces the traditional dilated convolution scheme.
3.1. Dense upsampling convolution (DUC)
A CNN backbone extracts features from the input image, downsampling it several times along the way. Because semantic segmentation must output a result at the original image size, the usual approach is to enlarge the CNN's output feature map back to the input size with bilinear upsampling or deconvolution.
This enlargement step has problems: bilinear upsampling is not learnable and may lose fine details, while deconvolution requires zeros to be padded in the unpooling step before the convolution operation.
The paper presents a small convolutional module, DUC, whose purpose is to enlarge the feature map to the label-map size.
The input to DUC is the ResNet feature map of size h×w×c, where h = H/r and w = W/r for an H×W input image. DUC applies a convolution to produce a feature map of size h×w×(r²×L), which is then reshaped to the full-size label map H×W×L, completing the enlargement.
Here L is the total number of semantic classes and r is the downsampling factor of the ResNet backbone.
The core idea of DUC is to divide the full label map into r² sub-maps of equal size, each the same size as the input feature map. In other words, the whole label map is mapped to a smaller label map with multiple channels. This mapping lets us obtain the output label map directly from the input feature map with convolution operations.
Since DUC is learnable, it can capture and recover the fine details that are generally lost in the bilinear interpolation operation.
Finally, DUC is easy to embed into any FCN framework.
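The reshape step can be sketched in plain Python. This is a minimal sketch, and it assumes channel c encodes the sub-pixel position and class as c = (dy·r + dx)·L + l; the exact channel ordering is an implementation detail not fixed by the paper (frameworks typically do this with a pixel-shuffle / depth-to-space op):

```python
def duc_reshape(feat, r, L):
    """Reshape an h x w x (r*r*L) feature map (nested lists) into an
    (r*h) x (r*w) x L label map, as DUC's final step does.

    Assumes channel c = (dy * r + dx) * L + l, i.e. each group of L
    channels holds one of the r*r sub-pixel positions (this ordering
    is an assumption, not specified by the paper)."""
    h, w = len(feat), len(feat[0])
    out = [[[0.0] * L for _ in range(r * w)] for _ in range(r * h)]
    for i in range(h):
        for j in range(w):
            for c, val in enumerate(feat[i][j]):
                block, l = divmod(c, L)    # which sub-pixel cell, which class
                dy, dx = divmod(block, r)  # row/col offset inside the r x r cell
                out[i * r + dy][j * r + dx][l] = val
    return out
```

For example, a 1×1 feature map with r = 2 and L = 1 has 4 channels, and each channel becomes one pixel of the 2×2 output label map.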
3.2. Hybrid dilated convolution (HDC)
In the FCN, dilated convolution is mainly used to maintain high-resolution feature maps: it replaces max-pooling and strided-convolution layers while preserving the receptive field of the corresponding layer.
However, if all layers use the same dilation rate r, a "gridding" problem arises, as shown in the following figure:
With equal rates the convolution samples the input very sparsely, so that 1) local information is completely missing, and 2) the sampled information can be irrelevant across large distances. Another outcome of the gridding effect is that pixels in nearby r×r regions at layer l receive information from completely different sets of "grids", which may impair the consistency of local information.
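The gridding effect is easy to see in one dimension by tracking which input offsets can reach a single output position through a stack of kernel-size-3 dilated convolutions (a toy sketch, not code from the paper):

```python
def sampled_offsets(rates):
    """Input offsets (1-D) that influence a single output position after
    a stack of kernel-size-3 convolutions with the given dilation rates."""
    offsets = {0}
    for r in rates:
        # each 3-tap dilated conv reads positions o - r, o, o + r
        offsets = {o + k * r for o in offsets for k in (-1, 0, 1)}
    return sorted(offsets)

# Equal rates [2, 2, 2]: only even offsets in [-6, 6] are ever sampled,
# so every odd-offset input pixel is invisible -- the gridding effect.
equal = sampled_offsets([2, 2, 2])

# Mixed rates [1, 2, 3]: every offset in [-6, 6] is sampled,
# covering the receptive field with no holes.
mixed = sampled_offsets([1, 2, 3])
```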
To address this, the paper presents hybrid dilated convolution (HDC): within a group of layers, each layer uses a different dilation rate, chosen so that the union of their samples covers a full square region with no holes.
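The paper's design rule can be checked programmatically: for kernel size K and rates r_1..r_n, define M_n = r_n and M_i = max[M_{i+1} − 2r_i, 2r_i − M_{i+1}, r_i]; the goal is M_2 ≤ K, and the paper additionally suggests that the rates in a group should not share a common factor greater than 1 (e.g. [2, 4, 8] still exhibits gridding). A small checker, with function names of my own choosing:

```python
from functools import reduce
from math import gcd


def max_distance(rates):
    """M_2 from the paper's design rule: M_n = r_n and
    M_i = max(M_{i+1} - 2*r_i, 2*r_i - M_{i+1}, r_i),
    the maximum distance between two nonzero taps seen at layer 2."""
    M = rates[-1]
    for r in reversed(rates[1:-1]):
        M = max(M - 2 * r, 2 * r - M, r)
    return M


def gridding_free(rates, kernel=3):
    """True if a group of dilated layers satisfies the HDC design goal
    M_2 <= kernel and the rates have no common factor greater than 1."""
    return max_distance(rates) <= kernel and reduce(gcd, rates) == 1
```

For kernel size 3, rates [1, 2, 5] give M_2 = 2 and pass, while [2, 4, 8] give M_2 = 4 and share the factor 2, so they fail.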
4 Experiments and Results