Scene Parsing: Pyramid Scene Parsing Network

Last Update: 2018-07-24 Source: Internet

Author: User


Pyramid Scene Parsing Network
CVPR 2017
Semantic segmentation
https://github.com/hszhao/PSPNet

To address the lack of context information in FCN, the proposed PSPNet embeds global context information that is more effective than global average pooling, which improves segmentation quality.

2 Related Work

For scene parsing and semantic segmentation, deep convolutional networks are the current mainstream approach. The baseline network here is an FCN with dilated convolutions.

At present there are two main research directions: (1) combining multi-scale features, and (2) using a CRF as a post-processing step for segmentation.
For global context information, reference [24] uses global average pooling, but on the complex ADE20K dataset the results are not very good. This work uses a different form of global context information.

3 Pyramid Scene Parsing Network
3.1. Important observations
On the ADE20K dataset, we observed several failure patterns:
1) Mismatched relationship: objects in a scene are correlated with one another, and ignoring this correlation leads to implausible predictions.
2) Confusion categories: some labels are easily confused, so a single object may be labeled with two categories, for example field and earth; mountain and hill; wall, house, building and skyscraper.
3) Inconspicuous classes: very large and very small objects are both hard to segment correctly.
In summary, the main problems relate to contextual relationships and to global information across different receptive fields.

3.2. Pyramid Pooling Module

In a deep network, the size of the receptive field determines how much context information can be used. In theory, the receptive field of ResNet is already larger than the input image, but reference [42] points out that the empirical receptive field of a CNN is much smaller than the theoretical one. The global average pooling proposed in reference [24] is too simple for the complex ADE20K dataset. Building on the spatial pyramid pooling of reference [12], this work proposes the pyramid pooling module to obtain a global prior.
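The theoretical receptive field mentioned above can be computed layer by layer. The following is only a rough sketch: the function name receptive_field and the (kernel, stride, dilation) tuples are hypothetical, and the two example stacks are toy configurations, not the actual ResNet layers.

```python
def receptive_field(layers):
    """Theoretical receptive field of a stack of conv/pool layers.

    layers: iterable of (kernel_size, stride, dilation) tuples,
            ordered from the input towards the output.
    """
    rf = 1    # one input pixel sees itself
    jump = 1  # spacing, in input pixels, between adjacent output positions
    for kernel, stride, dilation in layers:
        effective_kernel = dilation * (kernel - 1) + 1
        rf += (effective_kernel - 1) * jump
        jump *= stride
    return rf


# Toy stacks (assumed for illustration only).
plain = [(3, 1, 1)] * 3                      # three ordinary 3x3 convolutions
dilated = [(3, 1, 1), (3, 1, 2), (3, 1, 4)]  # same depth, growing dilation

print(receptive_field(plain))    # 7
print(receptive_field(dilated))  # 15
```

Dilation enlarges the theoretical field without reducing resolution, which is presumably why the baseline is an FCN with dilated convolutions. The number computed here is only the theoretical bound; the empirical field discussed in reference [42] is smaller.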

The first row of the pyramid pooling module (the middle block of the architecture figure) is a single bin output produced by global pooling.
In the second row, the feature map is divided into 4 sub-regions (2x2), and each sub-region is pooled to produce one bin output. The four rows in the figure correspond to bin sizes of
1x1, 2x2, 3x3 and 6x6, respectively.
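The four pooling levels can be illustrated on a single-channel feature map. This is a minimal sketch in plain Python, assuming average pooling over evenly split sub-regions; the helper adaptive_avg_pool and the 6x6 toy feature map are hypothetical and are not taken from the authors' code.

```python
def adaptive_avg_pool(fmap, bins):
    """Average-pool a 2D grid (list of rows) into a bins x bins grid."""
    h, w = len(fmap), len(fmap[0])
    pooled = []
    for i in range(bins):
        # Row range of sub-region i: floor for the start, ceil for the end.
        r0 = (i * h) // bins
        r1 = ((i + 1) * h + bins - 1) // bins
        row = []
        for j in range(bins):
            c0 = (j * w) // bins
            c1 = ((j + 1) * w + bins - 1) // bins
            cells = [fmap[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            row.append(sum(cells) / len(cells))
        pooled.append(row)
    return pooled


PYRAMID_BINS = (1, 2, 3, 6)  # bin sizes listed in the text

# Toy 6x6 feature map holding the values 0..35 (illustrative only).
fmap = [[float(r * 6 + c) for c in range(6)] for r in range(6)]
pyramid = {bins: adaptive_avg_pool(fmap, bins) for bins in PYRAMID_BINS}

print(pyramid[1])  # [[17.5]]
print(pyramid[2])  # [[7.0, 10.0], [25.0, 28.0]]
```

The 1x1 level reduces to ordinary global average pooling, while the finer levels keep coarse location information about the sub-regions.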

To maintain the weight of the global feature, a 1x1 convolution layer follows each row to reduce the dimension of the context representation. Each reduced feature map is then upsampled by bilinear interpolation to the size of the original feature map. Finally, the upsampled maps are concatenated with the original feature map.
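The upsampling and concatenation steps can be sketched as follows. Several things here are assumptions made for illustration: the helper names, the corner-aligned variant of bilinear interpolation, the 2048-channel input, and the reduction of each level to 1/N of the input channels for N pyramid levels. The 1x1 convolution itself is not implemented; only its effect on the channel count is modelled.

```python
def bilinear_resize(grid, out_h, out_w):
    """Bilinearly resize a 2D grid (list of rows), aligning the corners."""
    in_h, in_w = len(grid), len(grid[0])

    def source_coord(i, n_out, n_in):
        # Map an output index to a (fractional) input coordinate.
        if n_out == 1 or n_in == 1:
            return 0.0
        return i * (n_in - 1) / (n_out - 1)

    out = []
    for i in range(out_h):
        y = source_coord(i, out_h, in_h)
        y0 = int(y)
        y1 = min(y0 + 1, in_h - 1)
        wy = y - y0
        row = []
        for j in range(out_w):
            x = source_coord(j, out_w, in_w)
            x0 = int(x)
            x1 = min(x0 + 1, in_w - 1)
            wx = x - x0
            top = grid[y0][x0] * (1 - wx) + grid[y0][x1] * wx
            bottom = grid[y1][x0] * (1 - wx) + grid[y1][x1] * wx
            row.append(top * (1 - wy) + bottom * wy)
        out.append(row)
    return out


def concat_channels(in_channels, num_levels):
    """Channel count after concatenating the input with all reduced levels."""
    reduced = in_channels // num_levels  # assumed 1/N reduction per level
    return in_channels + num_levels * reduced


# A 1x1 bin (global pooling) is simply broadcast over the whole map.
print(bilinear_resize([[5.0]], 2, 2))  # [[5.0, 5.0], [5.0, 5.0]]

# A 2x2 bin grid is interpolated smoothly; the centre is the mean of the bins.
print(bilinear_resize([[0.0, 2.0], [4.0, 6.0]], 3, 3)[1][1])  # 3.0

print(concat_channels(2048, 4))  # 4096
```

Under these assumptions the concatenated tensor has twice as many channels as the input, so the global prior and the original local features carry comparable weight in the final prediction layer.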

4 Deep Supervision for ResNet-based FCN
To make a very deep network easier to train, an additional (auxiliary) loss is introduced: a second classifier is applied after the fourth stage.

The auxiliary loss helps optimize the learning process, while the master branch loss takes the most responsibility. A weight is added to balance the auxiliary loss.
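The weighted combination of the two losses can be written out for a single pixel. This is a minimal sketch, not the authors' training code: the function names are hypothetical, the per-pixel softmax cross-entropy is an assumed choice of loss, and the default aux_weight of 0.4 is an illustrative value that does not come from the text above.

```python
import math


def cross_entropy(logits, target):
    """Softmax cross-entropy for one pixel, computed in a numerically stable way."""
    peak = max(logits)
    log_sum_exp = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum_exp - logits[target]


def total_loss(main_logits, aux_logits, target, aux_weight=0.4):
    """Master branch loss plus the down-weighted auxiliary loss."""
    main = cross_entropy(main_logits, target)
    aux = cross_entropy(aux_logits, target)
    return main + aux_weight * aux


# Toy logits over three classes for one pixel whose true class is 0.
main_logits = [2.0, 0.5, -1.0]  # from the final classifier
aux_logits = [1.0, 0.8, 0.1]    # from the classifier after the fourth stage

print(total_loss(main_logits, aux_logits, target=0))
```

Keeping the auxiliary weight below 1 lets the master branch dominate the gradient, while the auxiliary classifier still provides a training signal to the earlier layers.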

Deep supervision was used earlier in DeepID2, a face recognition algorithm.

5 Experiments


