Semantic Segmentation: Understanding Convolution for Semantic Segmentation


Understanding Convolution for Semantic Segmentation
Paper: https://arxiv.org/abs/1702.08502v1
Model: https://goo.gl/DQMeun

For semantic segmentation, we improve on two fronts: dense upsampling convolution (DUC) replaces bilinear upsampling, and hybrid dilated convolution (HDC) replaces the traditional dilated convolution scheme.

3.1. Dense upsampling convolution (DUC)
The input image is passed through a CNN backbone, which reduces the spatial dimensions several times. Because semantic segmentation must output a result map at the original image size, the usual approach is to apply bilinear upsampling or deconvolution to the CNN's output feature map to enlarge it back to the input size.
Both enlargement methods have problems:
Bilinear upsampling is not learnable and may lose fine details.
Deconvolution requires zeros to be padded in the unpooling step before the convolution operation.

Here we present DUC, a small convolutional network module whose purpose is to enlarge the feature-map dimensions.

The input to DUC is the ResNet feature map of size h × w × c (with h = H/r and w = W/r, where H × W is the input image size). DUC outputs a feature map of size h × w × (r² × L), which is then reshaped to H × W × L, completing the enlargement.
Here L is the total number of semantic segmentation classes and r is the downsampling factor of ResNet.

The core idea of DUC is to divide the full-resolution label map into r² equal parts, each part having the same size as the input feature map. In other words, we map the whole label map onto a smaller, multi-channel label map. This mapping lets us obtain the output label map directly from the input feature map with a single convolution operation.
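As an illustration of this rearrangement, here is a minimal NumPy sketch that reshapes a DUC output of shape h × w × (r² · L) into the full-resolution (h·r) × (w·r) × L label map. The sizes below (r = 4, L = 19, a 2 × 3 feature map) are hypothetical, chosen only for demonstration.

```python
import numpy as np

r, L = 4, 19            # hypothetical downsampling factor and class count
h, w = 2, 3             # feature-map height/width (H/r x W/r in the text's notation)

# DUC output: a feature map of shape h x w x (r*r*L)
duc_out = np.random.rand(h, w, r * r * L)

# Rearrange into the full-resolution label map of shape (h*r) x (w*r) x L.
# Each r x r spatial block of the label map comes from the r*r*L channels
# of a single feature-map position ("pixel shuffle").
label_map = (duc_out
             .reshape(h, w, r, r, L)
             .transpose(0, 2, 1, 3, 4)
             .reshape(h * r, w * r, L))

print(label_map.shape)  # (8, 12, 19)
```

Because the enlargement is just a reshape of convolution outputs, every pixel of the full-resolution map is predicted by learned weights, which is exactly what bilinear upsampling cannot do.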

Since DUC is learnable, it is capable of capturing and recovering the fine-detail information that is generally lost in the bilinear interpolation operation.

Finally, DUC is easy to embed into an FCN.

3.2. Hybrid dilated convolution (HDC)
In the FCN, dilated convolution is mainly used to maintain high-resolution feature maps by replacing max-pooling and strided convolution layers, while preserving the receptive field of the corresponding layers.

If all network layers use the same dilation rate r, a problem arises, as shown in the following figure:

The convolution then samples the input very sparsely, which creates a "gridding" effect: 1) local information is completely missed; 2) the sampled information can be irrelevant across large distances. Another consequence of the gridding effect is that pixels in nearby r × r regions at layer l receive information from completely different sets of "grids", which can impair the consistency of local information.

To address this, we present hybrid dilated convolution (HDC): a different dilation rate is used for each layer, so that successive layers jointly cover the receptive field without holes.
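The gridding effect, and how varying the rates fixes it, can be checked with a small 1-D sketch: track which input offsets can reach a single output position through a stack of kernel-size-3 dilated convolutions. The rate choices below are illustrative, not the exact design from the paper.

```python
def contributing(rates):
    """1-D input offsets that reach one output position after a stack of
    kernel-size-3 dilated convolutions with the given dilation rates."""
    pos = {0}
    for r in rates:
        # a kernel of size 3 with dilation r reaches offsets -r, 0, +r
        pos = {p + k * r for p in pos for k in (-1, 0, 1)}
    return sorted(pos)

# Equal rates (the gridding problem): only every other input pixel is seen,
# so the receptive field is full of holes.
print(contributing([2, 2, 2]))  # [-6, -4, -2, 0, 2, 4, 6]

# HDC-style mixed rates: the same receptive field, covered without gaps.
print(contributing([1, 2, 3]))  # [-6, -5, -4, ..., 4, 5, 6]
```

Both stacks span offsets -6 to +6, but the equal-rate stack samples only 7 of the 13 positions, while the mixed-rate stack samples all of them.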

4. Experiments and Results
