What makes for effective detection proposals? Paper analysis

Source: Internet
Author: User
Tags random seed

1 Introduction

This paper surveys and evaluates recent detection proposal methods. The main methods are described below.

2 Detection Proposal Methods

The authors divide detection proposal methods into two categories: grouping methods (which segment the image into fragments and then aggregate them) and window scoring methods (which score a large number of candidate windows).

2.1 Grouping Proposal Methods

Grouping proposal methods attempt to generate multiple (possibly overlapping) regions that correspond to objects. Depending on how they generate proposals, they fall into three classes: superpixels (SP), graph cut (GC), and edge contours (EC).

SelectiveSearch (SP) [15], [29]: Proposals are generated by greedily merging superpixels. The method has no learned parameters; the features and similarity functions used to merge superpixels are set manually. It is the proposal method chosen by the R-CNN and Fast R-CNN detectors [8], [16] and by other recent object detection methods.
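The greedy merging idea can be sketched as follows. This is a minimal illustration, not the SelectiveSearch implementation: the single scalar "colour" per superpixel, the `COLOURS` table, and the `colour_similarity` function are hypothetical stand-ins for the hand-designed features and similarity functions the method actually uses.

```python
def greedy_merge(regions, neighbours, similarity):
    """Repeatedly merge the most similar pair of adjacent regions.

    regions:    dict region_id -> frozenset of superpixel ids
    neighbours: set of (id_a, id_b) tuples with id_a < id_b that share a border
    similarity: hand-designed function (region_a, region_b) -> float
    Returns every region seen along the way; each one is a proposal.
    """
    regions = dict(regions)
    neighbours = set(neighbours)
    proposals = list(regions.values())
    next_id = max(regions) + 1
    while neighbours:
        # sorted() only makes tie-breaking deterministic.
        a, b = max(sorted(neighbours),
                   key=lambda p: similarity(regions[p[0]], regions[p[1]]))
        regions[next_id] = regions.pop(a) | regions.pop(b)
        proposals.append(regions[next_id])
        # Rewire the adjacency so that old neighbours point to the merged region.
        rewired = set()
        for p, q in neighbours:
            p = next_id if p in (a, b) else p
            q = next_id if q in (a, b) else q
            if p != q:
                rewired.add((min(p, q), max(p, q)))
        neighbours = rewired
        next_id += 1
    return proposals


# Hypothetical toy input: four superpixels in a row, one scalar colour each.
COLOURS = {0: 0.10, 1: 0.15, 2: 0.80, 3: 0.90}


def colour_similarity(region_a, region_b):
    """Assumed similarity: negative difference of mean colour."""
    mean = lambda r: sum(COLOURS[s] for s in r) / len(r)
    return -abs(mean(region_a) - mean(region_b))


proposals = greedy_merge(
    {i: frozenset({i}) for i in range(4)},
    {(0, 1), (1, 2), (2, 3)},
    colour_similarity,
)
```

In this toy run the two dark superpixels merge first, then the two bright ones, and finally the two halves, so the proposal set contains regions at several scales.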

RandomizedPrim's (SP) [26]: Uses features similar to those of SelectiveSearch, but replaces the greedy merging with a randomized superpixel merging process in which all probabilities are learned. It is also considerably faster.

Rantalankila (SP) [27]: Uses a merging strategy similar to that of SelectiveSearch, but with different features. In a subsequent stage, the generated regions are used as seeds for solving graph cuts (similar to CPMC).

Chang (SP) [38]: Combines saliency and objectness in a graphical model to merge superpixels into figure/background segmentations.

CPMC (GC) [13], [19]: Avoids an initial segmentation and computes graph cuts directly on the pixels, using several different seeds and unaries. The resulting regions are ranked using a large pool of features.

Endres (GC) [14], [21]: Builds a hierarchical segmentation from occlusion boundaries and generates regions by cutting graphs with different seeds and parameters. The proposals are ranked using a wide range of cues, in a way that encourages diversity.

Rigor (GC) [28]: An improvement on CPMC that speeds up computation by using multiple graph cuts and fast edge detection.

Geodesic (EC) [22]: Starts with an over-segmentation of the image based on [36]. Classifiers are then used to place seeds for a geodesic distance transform. The level sets of each distance transform define the figure/ground segmentations.

MCG (EC) [23]: Based on [36], proposes a fast algorithm for computing multi-scale hierarchical segmentations. Regions are merged according to edge strength, and the resulting object hypotheses are ranked using cues such as scale, position, shape, and edge strength.

2.2 Window Scoring Proposal Methods

Window scoring proposal methods generate proposals by scoring each candidate window according to how likely it is to contain an object. Compared to grouping approaches, these methods only return bounding boxes and are therefore usually faster. However, unless the window sampling is very dense, these methods have low localization accuracy.
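The link between sampling density and localization accuracy can be illustrated with a small sketch. The image size, window size, strides, and target box below are made-up values chosen for illustration, and the helper names are hypothetical; no scoring is performed, only the best achievable overlap is measured.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x0, y0, x1, y1)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def sliding_windows(img_w, img_h, size, stride):
    """All square windows of one size on a regular grid with the given stride."""
    return [(x, y, x + size, y + size)
            for y in range(0, img_h - size + 1, stride)
            for x in range(0, img_w - size + 1, stride)]


def best_iou(target, windows):
    """Upper bound on localization quality: best overlap any window can reach."""
    return max(iou(target, w) for w in windows)


target = (13, 17, 77, 81)  # a 64x64 object that is not aligned with the grid
coarse = best_iou(target, sliding_windows(128, 128, 64, 32))  # 9 windows
fine = best_iou(target, sliding_windows(128, 128, 64, 8))     # 81 windows
```

With the coarse grid no window reaches an IoU of 0.5 with the target, while the dense grid gets close to 0.9, at the cost of nine times as many windows to score.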

Objectness [12], [24]: One of the earliest and most widely known proposal methods. It selects salient locations in an image as initial proposals and then scores them using multiple cues such as color, edges, position, size, and superpixel straddling.

Rahtu [25]: Starts with a large pool of proposals that contains sampled regions (single superpixels, pairs, and triplets of superpixels) and multiple randomly sampled boxes. The scoring strategy is similar to that of Objectness, with some improvements ([40] adds additional low-level features and emphasizes the importance of properly tuned non-maximum suppression).

Bing [18]: Trains a simple linear classifier on edge features and applies it in a sliding window manner. With adequate approximations, this yields a very fast class-agnostic detector (1 ms per image on a CPU). CrackingBing [41] shows that the classifier has minimal influence and that similar performance can be obtained without looking at the image (the performance comes from geometry rather than from learning).

EdgeBoxes (EC) [20]: Starts from a coarse sliding window pattern, builds on object boundary estimates (obtained with structured decision forests [36], [42]), and adds a subsequent refinement step to improve localization accuracy. No parameters are learned. The authors propose tuning the method for different overlap thresholds by adjusting the density of the sliding window pattern and the non-maximum suppression threshold.
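Non-maximum suppression, whose threshold is one of the tuning knobs mentioned above, can be sketched as the usual greedy procedure. This is a generic version under assumed box and score inputs, not the EdgeBoxes code.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x0, y0, x1, y1)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, threshold):
    """Greedy non-maximum suppression.

    Visits boxes from highest to lowest score and keeps a box only if its IoU
    with every box kept so far is at most `threshold`.
    Returns the indices of the kept boxes, best first.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= threshold for j in keep):
            keep.append(i)
    return keep


# Hypothetical example: two near-duplicate windows and one separate window.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
strict = nms(boxes, scores, 0.5)   # the near-duplicate is suppressed
lenient = nms(boxes, scores, 0.7)  # IoU of about 0.68 is tolerated
```

A lower threshold spreads the surviving proposals out, while a higher one keeps tightly clustered windows around each high-scoring location.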

Feng [43]: Finds proposals by searching for salient image content and introduces a new saliency measure based on how well a potential object can be composed from the rest of the image. It uses a sliding window pattern and scores each position with saliency cues.

Zhang [44]: Proposes training a cascade of ranking SVMs on simple gradient features. The first stage trains separate classifiers for different scales and aspect ratios, and the second stage ranks all the proposals obtained. All SVMs are trained with structured output learning so that windows with more object overlap score higher. Because the cascade is trained and tested on the same categories, its ability to generalize is unclear.

RandomizedSeeds [45]: Scores each candidate window using multiple randomized SEEDS superpixel maps. The scoring strategy is similar to the superpixel straddling cue of Objectness (no additional cues are used). The authors show that using multiple superpixel maps significantly improves recall.
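A superpixel straddling score can be sketched as below. The formula follows the cue as commonly stated for Objectness: one minus the sum over superpixels of min(pixels inside the window, pixels outside the window), divided by the window area. Averaging the score over several maps is an assumption made here for illustration; the source does not say how RandomizedSeeds combines its maps. The tiny label maps are made up.

```python
from collections import Counter


def superpixel_straddling(labels, window):
    """Straddling score of a window on one superpixel label map.

    labels: 2D list of superpixel ids, indexed as labels[y][x]
    window: (x0, y0, x1, y1) with exclusive upper bounds
    Returns 1.0 when no superpixel crosses the window boundary.
    """
    x0, y0, x1, y1 = window
    total = Counter(label for row in labels for label in row)
    inside = Counter(label for row in labels[y0:y1] for label in row[x0:x1])
    area = (x1 - x0) * (y1 - y0)
    # A superpixel is penalized by the smaller of its inside and outside parts.
    penalty = sum(min(n_in, total[label] - n_in) for label, n_in in inside.items())
    return 1.0 - penalty / area


def mean_straddling(label_maps, window):
    """Assumed combination rule: average the score over several maps."""
    return sum(superpixel_straddling(m, window) for m in label_maps) / len(label_maps)


# Two hypothetical 4x4 superpixel maps of the same image.
map_a = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [2, 2, 3, 3],
         [2, 2, 3, 3]]
map_b = [[0, 0, 0, 1],
         [0, 0, 0, 1],
         [2, 2, 2, 3],
         [2, 2, 2, 3]]

aligned = superpixel_straddling(map_a, (0, 0, 2, 2))     # covers superpixel 0 exactly
straddling = superpixel_straddling(map_a, (1, 1, 3, 3))  # cuts through all four
combined = mean_straddling([map_a, map_b], (0, 0, 2, 2))
```

A window that follows superpixel boundaries scores high, and a window that cuts through superpixels scores low, which is why the cue needs no learned parameters.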

2.3 Alternative Proposal Methods

ShapeSharing [47]: A non-parametric, data-driven method that transfers object shapes from exemplars to test images by matching edges. The resulting regions are merged and refined using graph cuts.

Multibox [9], [48]: Trains a neural network to directly regress a fixed number of proposals (without sliding the network over the image). Each proposal has its own location bias. The method shows top results on ImageNet.

2.4 Proposals versus Cascades

Proposals: create candidate windows using image features.
Cascades: use a fast but less accurate classifier to discard a large number of unpromising windows.
The main difference between the two is that proposal methods, unlike cascades, must generalize to object classes not seen during training.
Why proposals can generalize across object categories: 1) A key assumption is that training a classifier on a large enough number of categories is sufficient for it to generalize to unseen categories (for example, after training on cats and dogs, it may generalize to other animals). 2) The discriminative power of the classifier is often limited, which prevents it from overfitting to the training classes and pushes it to learn properties shared by all objects.

2.5 Controlling the Number of Proposals

The number of proposals ranges from just a few (~10^2) to a large number (~10^5).

3 Proposal Repeatability

Training a detector on detection proposals instead of on all sliding windows changes the appearance distribution of both positive and negative windows. This section mainly analyzes the distribution of negative windows: if the proposal method does not consistently propose windows on similar image content that contains no objects or only partial objects, the classifier may have difficulty scoring negative windows on the test set. As an extreme example, suppose the proposals on the training set contain only objects, while the proposals on the test set contain both objects and negative windows. A classifier trained this way cannot tell objects from background, so it assigns useless scores to negative windows at test time. It is therefore desirable for detectors that proposals are consistent in the appearance distribution of background windows.
The property of placing proposals on similar image content is called the repeatability of a proposal method. Intuitively, proposals should be repeatable on slightly different images with the same content.

3.1 Evaluation Protocol for Repeatability

For matching we use the intersection over union (IoU) criterion.
Given the matching, we plot the recall for every IoU threshold and define the repeatability to be the area under this "recall versus IoU threshold" curve between IoU 0 and 1.
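The definition above can be sketched in a few lines. The greedy one-to-one matching and the midpoint integration are assumptions made for this illustration, and the sketch presumes that the proposals from the perturbed image have already been mapped back into the coordinates of the reference image; the example boxes are made up.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x0, y0, x1, y1)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def greedy_match(reference, perturbed):
    """Assumed matching: pair boxes one-to-one, highest IoU first.

    Returns the matched IoU for every reference proposal (0.0 if unmatched).
    """
    pairs = sorted(((iou(r, p), i, j)
                    for i, r in enumerate(reference)
                    for j, p in enumerate(perturbed)),
                   reverse=True)
    matched = [0.0] * len(reference)
    used_ref, used_pert = set(), set()
    for value, i, j in pairs:
        if i not in used_ref and j not in used_pert:
            matched[i] = value
            used_ref.add(i)
            used_pert.add(j)
    return matched


def repeatability(reference, perturbed, bins=1000):
    """Area under the recall versus IoU threshold curve between 0 and 1."""
    matched = greedy_match(reference, perturbed)
    if not matched:
        return 0.0
    area = 0.0
    for k in range(bins):
        threshold = (k + 0.5) / bins  # midpoint rule
        recall = sum(m >= threshold for m in matched) / len(matched)
        area += recall / bins
    return area


# Hypothetical proposals on an image and on a slightly perturbed copy.
reference = [(0, 0, 10, 10), (20, 20, 30, 30)]
perturbed = [(0, 0, 10, 10), (22, 20, 32, 30)]
score = repeatability(reference, perturbed)
```

Because recall at threshold t is the fraction of matched IoUs that are at least t, the area under the curve equals the mean matched IoU, here (1 + 2/3) / 2, up to the integration error.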

3.2 Repeatability Experiments and Results



Small changes to an image cause noticeable differences in the set of detection proposals for all methods except Bing. The higher repeatability of Bing is explained by its sliding window pattern, which has been designed to cover almost all possible annotations with IoU = 0.5 (see also CrackingBing [41]).

4 Proposal Recall

When detection proposals are used, the objects of interest in the test image need to be well covered, because an object missed by the proposals cannot be recovered in the subsequent classification stage. Recall is therefore commonly used to evaluate the quality of proposals.
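Recall in this sense can be sketched as the fraction of annotated objects covered by at least one proposal at a given IoU threshold. The function name, the `top_k` parameter (to study recall against the number of proposals), and the example boxes are assumptions made for illustration.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x0, y0, x1, y1)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def recall_at(ground_truth, proposals, threshold=0.5, top_k=None):
    """Fraction of ground-truth boxes covered by some proposal.

    proposals are assumed to be sorted from best to worst score, so that
    top_k keeps only the highest-ranked ones.
    """
    if top_k is not None:
        proposals = proposals[:top_k]
    if not ground_truth:
        return 0.0
    covered = sum(any(iou(g, p) >= threshold for p in proposals)
                  for g in ground_truth)
    return covered / len(ground_truth)


# Hypothetical annotations and ranked proposals.
ground_truth = [(0, 0, 10, 10), (50, 50, 60, 60)]
ranked = [(100, 100, 110, 110), (1, 1, 11, 11), (50, 50, 60, 60)]

recall_by_k = [recall_at(ground_truth, ranked, 0.5, k) for k in (1, 2, 3)]
strict_recall = recall_at(ground_truth, ranked, threshold=0.7)
```

In this toy case recall rises as more proposals are allowed, and it drops when the IoU threshold is raised because the slightly shifted proposal no longer counts as a hit.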

4.1 Evaluation Protocol for Recall

4.2 Recall Results




References and further reading
[9] C. Szegedy, S. Reed, D. Erhan, and D. Anguelov, "Scalable, high-quality object detection," arXiv:1412.1441, 2014.
[12] B. Alexe, T. Deselaers, and V. Ferrari, "What is an object?" in CVPR, 2010.
[13] J. Carreira and C. Sminchisescu, "Constrained parametric min-cuts for automatic object segmentation," in CVPR, 2010.
[14] I. Endres and D. Hoiem, "Category independent object proposals," in ECCV, 2010.
[15] K. van de Sande, J. Uijlings, T. Gevers, and A. Smeulders, "Segmentation as selective search for object recognition," in ICCV, 2011.
[18] M.-M. Cheng, Z. Zhang, W.-Y. Lin, and P. H. Torr, "BING: Binarized normed gradients for objectness estimation at 300fps," in CVPR, 2014.
[19] J. Carreira and C. Sminchisescu, "CPMC: Automatic object segmentation using constrained parametric min-cuts," PAMI, 2012.
[20] C. Zitnick and P. Dollár, "Edge boxes: Locating object proposals from edges," in ECCV, 2014.
[21] I. Endres and D. Hoiem, "Category-independent object proposals with diverse ranking," PAMI, 2014.
[22] P. Krähenbühl and V. Koltun, "Geodesic object proposals," in ECCV, 2014.
[23] P. Arbelaez, J. Pont-Tuset, J. Barron, F. Marqués, and J. Malik, "Multiscale combinatorial grouping," in CVPR, 2014.
[24] B. Alexe, T. Deselaers, and V. Ferrari, "Measuring the objectness of image windows," PAMI, 2012.
[25] E. Rahtu, J. Kannala, and M. Blaschko, "Learning a category independent object detection cascade," in ICCV, 2011.
[26] S. Manén, M. Guillaumin, and L. Van Gool, "Prime object proposals with randomized Prim's algorithm," in ICCV, 2013.
[27] P. Rantalankila, J. Kannala, and E. Rahtu, "Generating object segmentation proposals using global and local search," in CVPR, 2014.
[28] A. Humayun, F. Li, and J. M. Rehg, "RIGOR: Recycling inference in graph cuts for generating object regions," in CVPR, 2014.
[29] J. Uijlings, K. van de Sande, T. Gevers, and A. Smeulders, "Selective search for object recognition," IJCV, 2013.
[36] P. Dollár and C. L. Zitnick, "Fast edge detection using structured forests," PAMI, 2015.
[38] K.-Y. Chang, T.-L. Liu, H.-T. Chen, and S.-H. Lai, "Fusing generic objectness and visual saliency for salient object detection," in ICCV, 2011.
[39] J. Lim, C. L. Zitnick, and P. Dollár, "Sketch tokens: A learned mid-level representation for contour and object detection," in CVPR, 2013.
[40] M. Blaschko, J. Kannala, and E. Rahtu, "Non maximal suppression in cascaded ranking models," Scandinavian Conference on Image Analysis, 2013.
[41] Q. Zhao, Z. Liu, and B. Yin, "Cracking BING and beyond," in BMVC, 2014.
[42] P. Dollár and C. L. Zitnick, "Structured forests for fast edge detection," in ICCV, 2013.
[43] J. Feng, Y. Wei, L. Tao, C. Zhang, and J. Sun, "Salient object detection by composition," in ICCV, 2011.
[44] Z. Zhang, J. Warrell, and P. H. S. Torr, "Proposal generation for object detection using cascaded ranking SVMs," in CVPR, 2011.
[45] M. Van den Bergh, G. Roig, X. Boix, S. Manen, and L. Van Gool, "Online video SEEDS for temporal window objectness," in ICCV, 2013.
[46] M. Van den Bergh, X. Boix, G. Roig, and L. Van Gool, "SEEDS: Superpixels extracted via energy-driven sampling," IJCV, 2014.
[47] J. Kim and K. Grauman, "Shape sharing for object segmentation," in ECCV, 2012.
[48] D. Erhan, C. Szegedy, A. Toshev, and D. Anguelov, "Scalable object detection using deep neural networks," in CVPR, 2014.
