Paper Notes "Fully convolutional Networks for Semantic Segmentation"

"Fully convolutional Networks for Semantic segmentation", CVPR best paper,pixel level, Fully supervised.

The main idea is to turn a CNN into an FCN: the network takes an image as input and directly outputs a dense prediction, i.e., the class of every pixel, giving an end-to-end method for semantic image segmentation.

Suppose we already have a trained CNN model. The first step is to reinterpret its fully connected layers as convolution layers whose kernel size equals the size of the incoming feature map, i.e., each fully connected layer is treated as a convolution over the entire input map. The fully connected layers then become 4096 convolution kernels of size 6*6, 4096 kernels of size 1*1, and 1000 kernels of size 1*1, as illustrated below.
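As a minimal sketch of this "convolutionalization" step (assuming an AlexNet-like trunk whose final pooled feature map is 256 x 6 x 6; the channel counts and variable names here are illustrative, not taken from the note):

```python
import torch
import torch.nn as nn

# Assume a backbone whose final pooled feature map is 256 x 6 x 6 (AlexNet-like).
# The three fully connected layers fc6 (4096), fc7 (4096), fc8 (1000) are
# reinterpreted as convolutions: the first kernel covers the whole 6x6 map,
# the others are 1x1 convolutions.
classifier_as_conv = nn.Sequential(
    nn.Conv2d(256, 4096, kernel_size=6),   # fc6 -> 4096 kernels of size 6*6
    nn.ReLU(inplace=True),
    nn.Conv2d(4096, 4096, kernel_size=1),  # fc7 -> 4096 kernels of size 1*1
    nn.ReLU(inplace=True),
    nn.Conv2d(4096, 1000, kernel_size=1),  # fc8 -> 1000 kernels of size 1*1
)

# Weights of an existing fully connected layer can simply be reshaped, e.g.:
# conv6.weight.data = fc6.weight.data.view(4096, 256, 6, 6)
features = torch.randn(1, 256, 6, 6)   # output of the convolutional trunk
scores = classifier_as_conv(features)  # shape: (1, 1000, 1, 1)
print(scores.shape)
```

On a larger input, the same layers produce a coarse spatial grid of class scores instead of a single score vector, which is what makes dense prediction possible.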


Next, the outputs of these 1000 1*1 kernels are upsampled, giving 1000 outputs at the original image size (e.g., 32*32); merging these outputs produces the heatmap shown.

Dense prediction is obtained through upsampling; the authors considered three schemes:

1. shift-and-stitch: let f be the downsampling factor between the original image and the FCN output. For each (non-overlapping) f*f region of the original image, shift the input x pixels to the right and y pixels down, for every (x, y) with 0 <= x, y < f. Under each shift, the single output produced for this f*f region corresponds to a different pixel of the region, so the f^2 shifted passes give f^2 outputs, one per pixel, which yields a dense prediction.

2. filter rarefaction: enlarge ("rarefy") the filter that follows a subsampling layer of the CNN, obtaining a new filter

f'(i, j) = f(i/s, j/s) if s divides both i and j, and f'(i, j) = 0 otherwise,

where s is the stride of the subsampling layer. The stride is then set to 1, so the subsampling no longer shrinks the feature map and a dense prediction can be obtained.

Neither of these two methods was used by the authors, mainly because each one is a trade-off:

For the second method, the subsampling is weakened, so the filters can see finer information, but their receptive fields become relatively small and global information may be lost; it also adds computation to the convolution layers.

For the first method, although the receptive fields do not shrink, because the original image is fed to the network as f*f regions, the filters cannot perceive finer-scale information.

3. deconvolution: the upsampling here can be viewed as a deconvolution (transposed convolution); its parameters, like the other CNN parameters, are learned by backpropagation while training the FCN model.
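A minimal sketch of this third scheme, upsampling by a learnable transposed convolution initialized as bilinear interpolation (the initialization helper and the factor-32 setting below are illustrative assumptions, not code from the paper):

```python
import torch
import torch.nn as nn

def bilinear_kernel(channels, kernel_size):
    """Build a bilinear-interpolation kernel, a common initialization for
    upsampling (deconvolution) layers; the weights are then refined by
    backpropagation together with the rest of the FCN."""
    factor = (kernel_size + 1) // 2
    center = factor - 1 if kernel_size % 2 == 1 else factor - 0.5
    og = torch.arange(kernel_size, dtype=torch.float32)
    filt = 1 - (og - center).abs() / factor
    kernel = filt[:, None] * filt[None, :]
    weight = torch.zeros(channels, channels, kernel_size, kernel_size)
    for c in range(channels):
        weight[c, c] = kernel
    return weight

num_classes = 21  # e.g. PASCAL VOC classes + background (illustrative)
# Upsample the coarse class scores by a factor of 32 in one learnable layer.
upsample = nn.ConvTranspose2d(num_classes, num_classes,
                              kernel_size=64, stride=32, padding=16, bias=False)
upsample.weight.data.copy_(bilinear_kernel(num_classes, 64))

coarse = torch.randn(1, num_classes, 10, 10)  # coarse score map
dense = upsample(coarse)                      # (1, 21, 320, 320)
print(dense.shape)
```

Because the kernel is learnable, training can refine it beyond plain bilinear interpolation.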


The above describes how the dense prediction is obtained from the CNN. The authors found in experiments that the resulting segmentation is rather coarse, so they considered bringing in more detail from earlier layers, i.e., fusing the output of the next-to-last layer with the final output, which is in fact an element-wise addition:
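A minimal sketch of such a fusion step (FCN-16s style: the score computed from an intermediate pooling layer is added to a 2x-upsampled final score; the layer names and tensor shapes below are my own illustrative assumptions):

```python
import torch
import torch.nn as nn

num_classes = 21

# Hypothetical tensors from a VGG-like trunk on a 512x512 input:
pool4 = torch.randn(1, 512, 32, 32)                 # stride-16 feature map
final_score = torch.randn(1, num_classes, 16, 16)   # stride-32 score map

# 1x1 convolution turns pool4 into per-class scores at stride 16.
score_pool4 = nn.Conv2d(512, num_classes, kernel_size=1)(pool4)

# Upsample the coarser final score by 2x so the two maps align spatially.
up2 = nn.ConvTranspose2d(num_classes, num_classes,
                         kernel_size=4, stride=2, padding=1, bias=False)
fused = score_pool4 + up2(final_score)   # element-wise addition (the "fusion")

# One more upsampling (by 16) would bring `fused` back to input resolution.
print(fused.shape)   # (1, 21, 32, 32)
```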


This fusion gives the results in the second and third rows, and experiments show they are finer and more accurate. Continuing the layer-by-layer fusion beyond the third row makes the results worse, so the authors stopped there. The corresponding results can be seen in the first three rows of the figure:


The advantages of this approach are:

1. It trains an end-to-end FCN model, using the strong learning ability of convolutional neural networks to obtain more accurate results; previous CNN-based approaches had to do extra processing on the input or output to get the final result.

2. It directly reuses existing CNN networks such as AlexNet, VGG16 or GoogLeNet, only adding upsampling at the end; parameter learning still relies on the usual backpropagation of the CNN itself: "Whole image training is effective and efficient."

3. The input image size is not restricted, and the images in the dataset do not all have to be the same size; the final upsampling simply scales back by the subsampling factor, so the output is a dense prediction map of the same size as the original image (a small sketch of this follows).
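A quick illustration of this point, using a toy fully convolutional network (a made-up stand-in, not the paper's architecture): because every layer is convolutional, the same weights run on inputs of different sizes, and only the spatial size of the output changes.

```python
import torch
import torch.nn as nn

num_classes = 21

# A toy fully convolutional net: conv trunk with overall stride 4,
# a 1x1 scoring layer, then a 4x transposed-convolution upsampling.
net = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1), nn.ReLU(inplace=True),
    nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(inplace=True),
    nn.Conv2d(32, num_classes, kernel_size=1),
    nn.ConvTranspose2d(num_classes, num_classes,
                       kernel_size=8, stride=4, padding=2, bias=False),
)

for size in [(128, 128), (200, 320)]:
    x = torch.randn(1, 3, *size)
    y = net(x)
    print(size, "->", tuple(y.shape[-2:]))  # output spatial size matches the input
```

(For simplicity the example input sizes are multiples of the network's total stride of 4, so the output size exactly matches the input.)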

Sample outputs from the experiments shown at the end of the paper are as follows:


Intuitively, compared with the ground truth, this method tends to lose smaller targets and local details, such as the car in the first picture and the crowd of spectators in the second; this is probably where there is still room for improvement.

Copyright notice: this is an original article by the blogger and may not be reproduced without the blogger's permission.

Paper Notes "Fully convolutional Networks for Semantic Segmentation"

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.