Scene parsing: Pyramid Scene Parsing Network

Source: Internet
Author: User

Pyramid Scene Parsing Network
CVPR2017
Semantic segmentation
https://github.com/hszhao/PSPNet

FCN-based methods make little use of context information. The proposed PSPNet addresses this by embedding global context information that is richer than what plain global average pooling provides, which improves segmentation quality.

2 Related Work

For scene parsing and semantic segmentation tasks, deep convolutional networks are the current mainstream approach. The baseline network here is an FCN with dilated convolutions.

At present there are two main research directions: (1) combining multi-scale features, and (2) using a CRF as a post-processing step for segmentation.
For global context information, reference [24] uses global average pooling, but on a complex dataset such as ADE20K the results are not very good. PSPNet uses a different way of capturing global context information.

3 Pyramid Scene Parsing Network
3.1. Important observations
On the ADE20K dataset, the authors observe several kinds of failure:
1) Mismatched relationship: the objects in a scene are correlated with one another, and a prediction should respect that correlation.
2) Confusion categories: parts of the same object in an image can end up labeled as two different categories. Easily confused classes include field and earth; mountain and hill; wall, house, building and skyscraper.
3) Inconspicuous classes: very large and very small objects are both hard to segment.
To sum up, the main problems concern contextual relationships and global information across different receptive fields.

3.2. Pyramid Pooling Module

In a deep network, the size of the receptive field determines how much context information can be used. In theory, the receptive field of ResNet is already larger than the input image. However, reference [42] points out that the actual receptive field of a CNN is much smaller than the theoretical one. The global average pooling proposed in reference [24] is too simple for a complex dataset such as ADE20K. Building on the spatial pyramid pooling of reference [12], the authors propose the pyramid pooling module to obtain a global prior.
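To make the notion of a theoretical receptive field concrete, here is a minimal sketch of the standard layer-by-layer recurrence. The function name and the example layer stack are hypothetical (the stack is a toy, not the actual ResNet configuration), and the result is only the theoretical size, which, as noted above, overestimates what the network effectively uses.

```python
def receptive_field(layers):
    """Theoretical receptive field of a stack of conv/pool layers.

    layers: list of (kernel, stride, dilation) tuples, input to output.
    """
    rf, jump = 1, 1  # jump = distance in input pixels between adjacent outputs
    for kernel, stride, dilation in layers:
        k_eff = dilation * (kernel - 1) + 1  # effective kernel size with dilation
        rf += (k_eff - 1) * jump
        jump *= stride
    return rf

# Hypothetical toy stack: 7x7 stride-2 conv, 3x3 stride-2 pool, three 3x3 convs.
toy = [(7, 2, 1), (3, 2, 1), (3, 1, 1), (3, 1, 1), (3, 1, 1)]
print(receptive_field(toy))  # 35
```

Replacing a stride-1 3x3 layer with a dilated one (dilation 2) widens the field without reducing resolution, which is why the dilated FCN baseline keeps large feature maps.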

In the architecture figure, the pyramid pooling module is the middle block. Its first row is a single bin output produced by global pooling.
The second row divides the feature map into 4 sub-regions (2x2) and pools each one to get its bin output. The four rows of the module correspond to bin sizes of
1x1, 2x2, 3x3 and 6x6
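The four bin sizes can be illustrated on a single-channel feature map. This is a minimal sketch in plain Python: the function name is hypothetical, and the floor/ceil rule for bin boundaries is an assumption borrowed from common adaptive-pooling implementations, not something the text specifies.

```python
import math

def adaptive_avg_pool(fmap, bins):
    """Average-pool a 2D feature map (list of rows) into a bins x bins grid."""
    h, w = len(fmap), len(fmap[0])
    out = []
    for i in range(bins):
        # Assumed boundary rule: floor for the start, ceil for the end.
        r0, r1 = (i * h) // bins, math.ceil((i + 1) * h / bins)
        row = []
        for j in range(bins):
            c0, c1 = (j * w) // bins, math.ceil((j + 1) * w / bins)
            cells = [fmap[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            row.append(sum(cells) / len(cells))
        out.append(row)
    return out

# A 6x6 map whose entries are the flat indices 0..35.
fmap = [[float(r * 6 + c) for c in range(6)] for r in range(6)]
pyramid = {b: adaptive_avg_pool(fmap, b) for b in (1, 2, 3, 6)}

print(pyramid[1])  # [[17.5]]  (global average)
print(pyramid[2])  # [[7.0, 10.0], [25.0, 28.0]]
```

The 1x1 level is exactly global average pooling, while the 6x6 level on this 6x6 input returns the map unchanged; the levels in between summarize progressively smaller sub-regions.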

To maintain the weight of the global feature, a 1x1 convolution layer is applied in each row to reduce the dimension of the context representation. Each reduced map is then upsampled by bilinear interpolation to the size of the original feature map. Finally, the upsampled maps are concatenated with the original feature map.
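The steps above (pool into bins, reduce channels with a 1x1 convolution, upsample bilinearly, concatenate) can be sketched with NumPy as follows. This is an illustration under stated assumptions, not the authors' implementation: all function names are hypothetical, the weights are random, each level is reduced to 1/4 of the input channels (one quarter per level for four levels), the bilinear resize uses a corner-aligned coordinate mapping, and the batch normalization and activation that a real network would place after the 1x1 convolution are omitted.

```python
import numpy as np

def avg_pool_bins(x, bins):
    """(C, H, W) -> (C, bins, bins) by averaging over a bins x bins grid."""
    c, h, w = x.shape
    out = np.empty((c, bins, bins))
    for i in range(bins):
        r0, r1 = (i * h) // bins, -(-(i + 1) * h // bins)  # floor / ceil
        for j in range(bins):
            c0, c1 = (j * w) // bins, -(-(j + 1) * w // bins)
            out[:, i, j] = x[:, r0:r1, c0:c1].mean(axis=(1, 2))
    return out

def bilinear_resize(x, out_h, out_w):
    """Bilinear interpolation of (C, h, w) to (C, out_h, out_w), corners aligned."""
    c, h, w = x.shape
    ys, xs = np.linspace(0, h - 1, out_h), np.linspace(0, w - 1, out_w)
    y0, x0 = np.floor(ys).astype(int), np.floor(xs).astype(int)
    y1, x1 = np.minimum(y0 + 1, h - 1), np.minimum(x0 + 1, w - 1)
    wy, wx = (ys - y0)[None, :, None], (xs - x0)[None, None, :]
    top = x[:, y0][:, :, x0] * (1 - wx) + x[:, y0][:, :, x1] * wx
    bot = x[:, y1][:, :, x0] * (1 - wx) + x[:, y1][:, :, x1] * wx
    return top * (1 - wy) + bot * wy

def pyramid_pooling(x, weights, bins=(1, 2, 3, 6)):
    """Concatenate x with one reduced, upsampled context map per pyramid level."""
    c, h, w = x.shape
    branches = [x]
    for b, wt in zip(bins, weights):
        pooled = avg_pool_bins(x, b)
        reduced = np.einsum("oc,chw->ohw", wt, pooled)  # 1x1 conv = per-pixel matmul
        branches.append(bilinear_resize(reduced, h, w))
    return np.concatenate(branches, axis=0)

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 12, 12))                       # toy feature map
weights = [rng.standard_normal((2, 8)) for _ in range(4)]  # 8 -> 2 channels per level
out = pyramid_pooling(x, weights)
print(out.shape)  # (16, 12, 12)
```

With four levels each reduced to a quarter of the input channels, the context branches together contribute as many channels as the original feature map, so the concatenated output has twice the input channel count.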

4 Deep Supervision for ResNet-based FCN
To better train a deeper network, an additional (auxiliary) loss is introduced: another classifier is applied after the fourth stage.

The auxiliary loss helps optimize the learning process, while the master branch loss takes the most responsibility. A weight is added to balance the auxiliary loss.
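The weighted combination of the two losses can be sketched for a single pixel as below. The function names are hypothetical, and the default weight of 0.4 is an illustrative assumption, since the text above does not state the value.

```python
import math

def cross_entropy(logits, target):
    """Softmax cross-entropy for one pixel; logits is a list of class scores."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[target]

def total_loss(main_logits, aux_logits, target, aux_weight=0.4):
    """Master-branch loss plus the down-weighted auxiliary loss."""
    main = cross_entropy(main_logits, target)
    aux = cross_entropy(aux_logits, target)
    return main + aux_weight * aux

# Toy example: 3 classes, ground-truth class 0.
loss = total_loss([2.0, 0.5, 0.1], [1.0, 0.8, 0.3], target=0)
```

The auxiliary classifier is only a training aid under this scheme: at inference time only the master branch output would be used.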

Deep supervision was already used in DeepID2, a face recognition algorithm.

5 Experiments
