Reposted from: http://www.sohu.com/a/215073729_297710
Original source: arXiv
Authors: Alexander Kirillov, Kaiming He, Ross Girshick, Carsten Rother, Piotr Dollár
Compiled by "Lake World": Yes, Astro, Kabuda.
We propose and study a new task, panoptic segmentation (PS). Panoptic segmentation unifies two traditionally separate tasks: instance segmentation (detecting and segmenting each object instance) and semantic segmentation (assigning a class label to each pixel). The unification is natural, and it poses new algorithmic challenges that arise in neither instance segmentation nor semantic segmentation when each is studied in isolation. To measure performance on the task, we introduce a panoptic quality (PQ) metric and show that it is simple and interpretable. Using PQ, we study human performance on three existing datasets that have the annotations PS requires, which helps us better understand both the task and the metric. We also propose a basic algorithm that combines the outputs of instance and semantic segmentation into a panoptic output, and we compare it with human performance. PS can serve as a foundation for future challenges in segmentation and visual recognition. Our goal is to drive research in new directions by inviting the community to explore the proposed panoptic segmentation task.
Figure 1: For a given (a) image, we show the ground truth for: (b) semantic segmentation (each pixel has a class label), (c) instance segmentation (each object has a mask and a class label), and (d) the proposed panoptic segmentation (PS) task (each pixel has a class label and an instance label). Panoptic segmentation generalizes semantic and instance segmentation, and requires recognizing and delineating every visible object and region in the image. We hope that this unified segmentation task will present new challenges and enable new approaches.
In the early days of computer vision, things (countable objects such as people, animals, and tools) received the dominant share of attention. Questioning the wisdom of this trend, Adelson stressed the importance of studying systems that recognize stuff (amorphous regions of similar texture or material, such as grass, sky, and road). This dichotomy between things and stuff persists to this day, both in the division of visual recognition tasks and in the specialized algorithms developed for thing tasks and stuff tasks.
The study of stuff is usually formulated as a task called semantic segmentation, as shown in Figure 1b. Because stuff is amorphous and uncountable, this task is defined as simply assigning a category label to each pixel in the image (note that semantic segmentation treats thing categories as stuff). In contrast, the study of things is usually formulated as object detection or instance segmentation, where the goal is to detect each object and delineate it with a bounding box or a segmentation mask, respectively; see Figure 1c. Although these two visual recognition tasks seem related, their datasets, details, and metrics differ substantially.
Segmentation flaws. Images are zoomed and cropped. Top row (Vistas image): both annotators identify the object as a car, but one of them splits the car into two cars. Bottom row (Cityscapes image): the segmentation is genuinely ambiguous.
The split between semantic and instance segmentation has led to a parallel split in the methods for these tasks. Stuff classifiers are usually built on fully convolutional networks with dilated convolutions, whereas object detectors typically use object proposals and are region-based. Overall algorithmic progress on these tasks over the past decade has been remarkable, but studying them in isolation may cause something important to be overlooked.
In this work we ask: can things and stuff be reconciled? Is there a simple problem formulation that elegantly encompasses both tasks? What would a unified visual recognition system look like?
Classification flaws. Images are zoomed and cropped. Top row (ADE20k image): a simple misclassification. Bottom row (Cityscapes image): the scene is extremely difficult to classify; tram is the correct class. Many of these errors are difficult to resolve.
Given these questions, we propose a new task that encompasses both things and stuff. We call the resulting task panoptic segmentation (PS). The definition of panoptic is "including everything visible in one view"; in our context, panoptic refers to a unified, global view of segmentation. The task formulation for PS is deceptively simple: each pixel of an image must be assigned a semantic label and an instance ID. Pixels with the same label and ID belong to the same object; for stuff labels, the instance ID is ignored. Both the ground truth and the machine predictions must have this form. See Figure 1d for a visualization.
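The per-pixel format described above can be illustrated with a minimal Python sketch. The helper `extract_segments`, the class names, and the choice of which classes count as stuff are all assumptions made for illustration; they do not come from any dataset or from the original paper.

```python
from collections import defaultdict

# Hypothetical label set; which classes are "stuff" is an assumption.
STUFF_CLASSES = {"sky", "road"}

def extract_segments(semantic, instance):
    """Group pixels into segments from per-pixel (label, instance id) maps.

    semantic and instance are equally sized 2D lists. For stuff labels the
    instance id is ignored, so all pixels of a stuff class form one segment.
    """
    segments = defaultdict(set)
    for y, (label_row, id_row) in enumerate(zip(semantic, instance)):
        for x, (label, inst_id) in enumerate(zip(label_row, id_row)):
            key = (label, None) if label in STUFF_CLASSES else (label, inst_id)
            segments[key].add((y, x))
    return dict(segments)

semantic = [
    ["sky", "sky", "sky", "sky"],
    ["car", "car", "car", "car"],
    ["road", "road", "road", "road"],
]
instance = [
    [0, 0, 7, 7],  # ids on stuff pixels are ignored
    [1, 1, 2, 2],  # two distinct cars
    [0, 0, 0, 0],
]
segments = extract_segments(semantic, instance)
```

Because every pixel carries exactly one (label, ID) pair, the resulting segments cannot overlap. In this toy image the sky forms one segment despite its two different IDs, while the two cars stay separate.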
Panoptic segmentation generalizes both semantic segmentation and instance segmentation, but it introduces new algorithmic challenges. Unlike semantic segmentation, panoptic segmentation must distinguish individual object instances, which poses a challenge for fully convolutional networks. Unlike instance segmentation, object segments in panoptic segmentation must not overlap, which poses a challenge for region-based approaches that process each object independently. Moreover, the task requires recognizing things and stuff at the same time. Designing a clean, end-to-end system for panoptic segmentation is an open problem that calls for innovative algorithmic ideas.
Panoptic segmentation results on Cityscapes (left two) and ADE20k (right three). Predictions are based on the merged outputs of state-of-the-art instance and semantic segmentation algorithms. Matched segments (IoU > 0.5) share the same color (a cross-hatch pattern indicates unmatched regions, and black indicates unlabeled regions). Best viewed in color and with zoom.
Our new panoptic segmentation task requires a new metric. We aim to make the metric complete, interpretable, and simple. Perhaps surprisingly, for our seemingly complex task there is a natural metric that satisfies these properties. We define the panoptic quality (PQ) metric and show that it decomposes into two interpretable terms, segmentation quality (SQ) and detection quality (DQ), which allows a more fine-grained analysis of accuracy.
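The text here does not spell out the formula, so the following Python sketch assumes the definition commonly given for PQ: a predicted segment and a ground-truth segment match when their IoU exceeds 0.5, SQ is the average IoU over matched pairs, DQ is TP / (TP + ½FP + ½FN), and PQ = SQ × DQ. The function names and the pixel-set representation are illustrative choices. The sketch covers a single class and ignores unlabeled pixels; in practice the metric would be computed per class and then averaged.

```python
def iou(a, b):
    """Intersection over union of two pixel sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def panoptic_quality(pred, gt):
    """Return (PQ, SQ, DQ) for one class.

    pred and gt are lists of segments, each a set of (y, x) pixels.
    A predicted and a ground-truth segment match when IoU > 0.5.
    """
    matched_ious = []
    matched_pred = set()
    for g in gt:
        for i, p in enumerate(pred):
            if i not in matched_pred and iou(p, g) > 0.5:
                matched_ious.append(iou(p, g))
                matched_pred.add(i)
                break
    tp = len(matched_ious)   # matched pairs
    fp = len(pred) - tp      # unmatched predicted segments
    fn = len(gt) - tp        # unmatched ground-truth segments
    denom = tp + 0.5 * fp + 0.5 * fn
    if denom == 0:
        return 0.0, 0.0, 0.0
    sq = sum(matched_ious) / tp if tp else 0.0
    dq = tp / denom
    return sq * dq, sq, dq

gt = [{(0, 0), (0, 1), (0, 2), (0, 3)}, {(1, 0), (1, 1)}]
pred = [{(0, 0), (0, 1), (0, 2)}, {(2, 0)}]
pq, sq, dq = panoptic_quality(pred, gt)
```

In this toy case one pair matches with IoU 0.75, one prediction is spurious, and one ground-truth segment is missed, giving SQ = 0.75, DQ = 0.5, and PQ = 0.375. Since segments within one image do not overlap, at most one prediction can exceed IoU 0.5 with a given ground-truth segment, which is why a simple greedy loop suffices.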
Because the ground truth and the algorithm output for panoptic segmentation must take the same form, we can conduct a detailed study of human performance on the task. This lets us understand the PQ metric in more depth, including detailed breakdowns of detection versus segmentation and of stuff versus things. Measuring human PQ also helps ground our understanding of machine performance. This is important because it will allow us to monitor performance saturation on the various panoptic segmentation datasets.
Finally, we carry out a preliminary study of machine performance on panoptic segmentation. To do this, we define a simple and probably suboptimal heuristic that combines the outputs of two independent systems, one for semantic segmentation and one for instance segmentation, through a series of post-processing steps; in essence, this is a sophisticated form of non-maximum suppression. The heuristic establishes a baseline for panoptic segmentation and gives us insight into the main algorithmic challenges the task presents.
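The exact post-processing steps are not given here, so the following Python sketch shows only one plausible way to realize such an NMS-like merge, under stated assumptions: instances are processed in order of decreasing confidence, each keeps only the pixels not already claimed, an instance is dropped when too little of it remains, and things take precedence over stuff. The function `merge_outputs`, its input format, and the `min_keep` threshold are hypothetical.

```python
def merge_outputs(instances, semantic, stuff_classes, min_keep=0.5):
    """Greedy merge into a non-overlapping panoptic output.

    instances: list of (score, label, pixel_set) from an instance segmenter.
    semantic: dict mapping (y, x) -> label from a semantic segmenter.
    min_keep is a hypothetical threshold: an instance is dropped when less
    than this fraction of its pixels is still unclaimed.
    """
    taken = set()
    segments = []
    # Higher-scoring instances claim pixels first, as in NMS.
    for score, label, pixels in sorted(instances, key=lambda t: t[0], reverse=True):
        free = pixels - taken
        if not pixels or len(free) / len(pixels) < min_keep:
            continue
        segments.append((label, free))
        taken |= free
    # Unclaimed pixels labeled as stuff form one segment per stuff class.
    # Unclaimed pixels with a thing label stay unlabeled in this sketch.
    stuff = {}
    for pixel, label in semantic.items():
        if pixel not in taken and label in stuff_classes:
            stuff.setdefault(label, set()).add(pixel)
    segments.extend(stuff.items())
    return segments

instances = [
    (0.9, "car", {(0, 0), (0, 1)}),
    (0.6, "car", {(0, 1), (0, 2), (0, 3)}),  # loses (0, 1), keeps 2 of 3 pixels
    (0.3, "car", {(0, 0), (0, 1), (0, 2)}),  # fully covered, dropped
]
semantic = {
    (0, 0): "car", (0, 1): "car", (0, 2): "car", (0, 3): "road",
    (1, 0): "road", (1, 1): "road", (1, 2): "road", (1, 3): "road",
}
merged = merge_outputs(instances, semantic, stuff_classes={"road"})
```

In this example the pixel (0, 3) is labeled road by the semantic system but is claimed by the second car, illustrating the things-over-stuff precedence assumed in the sketch. Other precedence rules or thresholds would be equally valid starting points.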
We study human and machine performance on three popular segmentation datasets, all of which contain both stuff and thing annotations: Cityscapes, ADE20k, and Mapillary Vistas. For each dataset, we obtained the results of state-of-the-art methods directly from the challenge organizers. In the future, we will extend the analysis to COCO, where stuff is being annotated. Together, our results on these datasets provide a solid foundation for studying human and machine performance on panoptic segmentation.
Our goal is to drive research in new directions by inviting the community to explore the new panoptic segmentation task. We believe the proposed task can lead to both expected and unexpected innovations. We conclude by discussing some of these possibilities and our plans for the future.
For the sake of simplicity, the PS "algorithm" presented in this paper is a heuristic combination of the outputs of top-performing instance and semantic segmentation systems. This approach is a basic first step, and we expect more interesting algorithms to follow. Specifically, we hope to see panoptic segmentation drive innovation in at least two areas: (1) Deeply integrated end-to-end models that simultaneously address the dual stuff-and-thing nature of panoptic segmentation. A number of instance segmentation methods are designed to produce non-overlapping instance predictions and could serve as the basis for such a system. (2) Since a panoptic segmentation cannot contain overlapping segments, some form of higher-level "reasoning" may be useful, for example by extending learnable NMS to panoptic segmentation. We hope that the panoptic segmentation task will stimulate research in these areas and lead to new breakthroughs.