RefineNet: Multi-Path Refinement Networks for High-Resolution Semantic Segmentation (paper interpretation)

Code (open source): https://github.com/guosheng/refinenet

Reference blogs:

http://blog.csdn.net/melpancake/article/details/54143319

http://blog.csdn.net/bea_tree/article/details/58208386

http://blog.csdn.net/zhangjunhit/article/details/72844862

The core innovation of this paper is a pyramid-like design: features at different scales are extracted from multiple scales of the original image, and a purpose-built multi-path refinement structure then fuses these multi-scale feature maps, so that coarse high-level semantic features are combined with fine-grained low-level features. The implementation also adopts the shortcut connections of residual networks, which helps gradients propagate through the network and allows it to be trained effectively.


The core of the paper, in the authors' own words, is to "exploit multi-level features for high-resolution prediction with long-range residual connections."

1 Introduction
The problem with using CNN models such as VGG and ResNet for semantic segmentation is that after the stacked convolution and pooling layers the feature map is downsampled by a factor of 32, losing much of the image detail, which is far too coarse for a segmentation task.
One solution is to upsample the feature map by learning deconvolution (transposed convolution) filters, but segmentation accuracy remains limited because the low-level visual detail lost during downsampling cannot be recovered.
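To make this concrete, here is a minimal PyTorch sketch of the 32x bottleneck and a learned transposed-convolution upsampler (PyTorch is used here only for illustration; the official code linked above is MATLAB-based, and the channel counts, class count, and layer shape below are assumptions, not the paper's decoder):

```python
import torch
import torch.nn as nn

# A 512x512 input downsampled 32x (as in VGG/ResNet backbones) leaves only
# a 16x16 feature map: 512 / 32 = 16. Most spatial detail is already gone.
coarse = torch.randn(1, 2048, 16, 16)   # e.g. final-stage ResNet-101 features

# A learned "deconvolution" (transposed convolution) can enlarge it again,
# but it must reconstruct the lost detail from coarse features alone.
upsample = nn.ConvTranspose2d(2048, 21, kernel_size=64, stride=32, padding=16)
logits = upsample(coarse)
print(logits.shape)   # torch.Size([1, 21, 512, 512]) -- 21 classes assumed
```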
Reference [6] avoids downsampling by introducing atrous (dilated) convolutions, which enlarge the receptive field without reducing resolution. This approach has proven successful, but it still has two drawbacks: 1) because the dilated kernels cover a larger area, the convolutions are computationally expensive, and the large number of high-dimensional, high-resolution feature maps demands considerable GPU memory, especially during training; in practice the output is still limited to 1/8 of the input size. 2) Dilated convolution amounts to a coarse sub-sampling of the features, which risks losing important information.
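For reference, a dilated (atrous) convolution is just an ordinary convolution with a dilation parameter; a minimal sketch of the trade-off described above (all shapes are illustrative):

```python
import torch
import torch.nn as nn

x = torch.randn(1, 256, 64, 64)   # feature map kept at 1/8 resolution

# A 3x3 kernel with dilation=4 covers a 9x9 window with no extra parameters,
# enlarging the receptive field without any further downsampling.
dilated = nn.Conv2d(256, 256, kernel_size=3, dilation=4, padding=4)
y = dilated(x)
print(y.shape)   # torch.Size([1, 256, 64, 64]) -- resolution preserved

# The price: every such layer keeps high-dimensional activations alive at
# this high resolution, which is what inflates GPU memory during training.
```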
References [36] and [22] use intermediate layers to generate high-resolution segmentation results. The view taken here is that features from all levels are helpful for semantic segmentation, so the paper proposes a framework that fuses features from all of them.

Network Structure

The function of a RefineNet block is to fuse feature maps of different resolution levels. The network structure is as follows:

The leftmost column is the encoder part of an FCN (ResNet in the paper); the pretrained ResNet is divided into four blocks according to the resolution of their feature maps. These four blocks then feed four paths on the right, which are fused and refined through RefineNet blocks, finally yielding a single refined feature map (followed by softmax and bilinear interpolation for the output).
Note that except for RefineNet-4, every RefineNet block takes two inputs and fuses features of different levels; the single-input RefineNet-4 can be seen as first adapting the ResNet features to the segmentation task.
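A small sketch of this four-stage split, using torchvision's ResNet-101 as the pretrained backbone (the grouping by feature-map resolution mirrors the description above; the layer names are torchvision's, not the paper's):

```python
import torch
import torch.nn as nn
from torchvision.models import resnet101

net = resnet101()
stem = nn.Sequential(net.conv1, net.bn1, net.relu, net.maxpool)

img = torch.randn(1, 3, 512, 512)
x = stem(img)
x1 = net.layer1(x)    # 1/4  resolution, 256 channels
x2 = net.layer2(x1)   # 1/8  resolution, 512 channels
x3 = net.layer3(x2)   # 1/16 resolution, 1024 channels
x4 = net.layer4(x3)   # 1/32 resolution, 2048 channels
print([tuple(t.shape) for t in (x1, x2, x3, x4)])
# [(1, 256, 128, 128), (1, 512, 64, 64), (1, 1024, 32, 32), (1, 2048, 16, 16)]
```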

RefineNet Block

Taking a closer look at the RefineNet block, its main components are the residual convolution unit (RCU), multi-resolution fusion, chained residual pooling, and output convolutions. Remember that the function of this block is to fuse feature maps from multiple levels into a single-level output feature map, and the implementation should be independent of the number and shapes of its inputs.
The residual convolution unit (RCU) is the ordinary residual unit with batch normalization removed. Multi-resolution fusion first passes each input through a convolution layer (adapting all inputs to the smallest feature dimension among them), then upsamples the smaller maps and fuses everything by element-wise addition; a single-input block like RefineNet-4 simply passes its input straight through. Chained residual pooling starts with a ReLU (which the authors found important for performance), then runs a chain of {max-pooling, convolution} blocks, adding each block's output back to the running sum through residual connections.
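A minimal PyTorch sketch of these three components plus the block that ties them together (hedged: the fixed fused width of 256, the padded 5x5 pooling, and the two-block pooling chain are my simplifications of the paper's description; the official MATLAB code is the authoritative reference):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RCU(nn.Module):
    """Residual conv unit: an ordinary residual unit with BN removed."""
    def __init__(self, ch):
        super().__init__()
        self.conv1 = nn.Conv2d(ch, ch, 3, padding=1)
        self.conv2 = nn.Conv2d(ch, ch, 3, padding=1)

    def forward(self, x):
        return x + self.conv2(F.relu(self.conv1(F.relu(x))))

class ChainedResidualPooling(nn.Module):
    """ReLU, then a chain of {5x5 stride-1 max-pool -> 3x3 conv} blocks;
    each block's output is added back to the running sum residually."""
    def __init__(self, ch, n_blocks=2):
        super().__init__()
        self.blocks = nn.ModuleList(
            nn.Sequential(nn.MaxPool2d(5, stride=1, padding=2),
                          nn.Conv2d(ch, ch, 3, padding=1))
            for _ in range(n_blocks))

    def forward(self, x):
        x = F.relu(x)          # the ReLU the text flags as important
        out = path = x
        for block in self.blocks:
            path = block(path)
            out = out + path
        return out

class RefineNetBlock(nn.Module):
    """Fuses any number of inputs: 2 RCUs per path, adaptation convs,
    upsample-and-sum fusion, chained residual pooling, one output RCU."""
    def __init__(self, channels, width=256):
        super().__init__()
        self.rcus = nn.ModuleList(
            nn.Sequential(RCU(ch), RCU(ch)) for ch in channels)
        self.adapt = nn.ModuleList(
            nn.Conv2d(ch, width, 3, padding=1) for ch in channels)
        self.crp = ChainedResidualPooling(width)
        self.out_rcu = RCU(width)

    def forward(self, *inputs):
        feats = [a(r(x)) for x, r, a in zip(inputs, self.rcus, self.adapt)]
        # fuse at the largest input resolution; a single input passes through
        target = max((f.shape[-2:] for f in feats), key=lambda s: s[0] * s[1])
        fused = sum(F.interpolate(f, size=target, mode='bilinear',
                                  align_corners=False) for f in feats)
        return self.out_rcu(self.crp(fused))
```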

In this blogger's understanding, this resembles the traditional encoder-decoder pattern, except that the decoder part directly takes each layer's feature map from the encoder and fuses it; strictly speaking this form should perhaps not be called a decoder, but the results are genuinely good, and the extra computation introduced is not particularly large. Semantic segmentation schemes built on the traditional encoder-decoder framework generally produce poor object contours and need an additional dense CRF; judging from the many recent high-resolution papers that no longer need a CRF, dropping it looks like a trend. The paper's analysis of residual structures is well worth studying. The output convolutions simply add one more RCU before the final output.
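Putting the pieces together, a toy version of the four-path cascade, reusing the RefineNetBlock and the ResNet stage features sketched above (the 1x1 classifier, softmax, and bilinear upsampling follow the earlier description; all widths and the class count are illustrative):

```python
import torch.nn as nn
import torch.nn.functional as F

class RefineNetCascade(nn.Module):
    """Toy 4-cascade wiring; x1..x4 are the ResNet stage outputs at
    1/4, 1/8, 1/16 and 1/32 of the input resolution (see above)."""
    def __init__(self, num_classes=21):
        super().__init__()
        self.refine4 = RefineNetBlock(channels=(2048,))      # single input
        self.refine3 = RefineNetBlock(channels=(1024, 256))
        self.refine2 = RefineNetBlock(channels=(512, 256))
        self.refine1 = RefineNetBlock(channels=(256, 256))
        self.classifier = nn.Conv2d(256, num_classes, 1)

    def forward(self, x1, x2, x3, x4):
        r4 = self.refine4(x4)         # RefineNet-4: task adaptation only
        r3 = self.refine3(x3, r4)     # fuse 1/16 features with refined 1/32
        r2 = self.refine2(x2, r3)
        r1 = self.refine1(x1, r2)     # refined map at 1/4 input resolution
        logits = self.classifier(r1)
        return F.interpolate(logits, scale_factor=4, mode='bilinear',
                             align_corners=False)   # back to input size
```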
