This note refers to the latest (2017) version of the following paper:
Speed/accuracy trade-offs for modern convolutional object detectors
Background
The purpose of this paper is to help practitioners select a suitable detection framework, one that balances speed, accuracy and memory for a given device and platform.
The Google authors implemented three common detection frameworks: Faster R-CNN, R-FCN and SSD.
In recent years object detection has advanced by leaps and bounds, and detectors such as Faster R-CNN, R-FCN, Multibox, SSD and YOLO are now good enough for practical applications.
However, it is difficult for practitioners to decide which framework is appropriate for which situation.
mAP alone cannot tell the whole story.
Meta-architectures
Detection networks were inspired by classification networks and evolved from R-CNN to Fast R-CNN and later to Faster R-CNN. R-CNN and Fast R-CNN take candidate regions that are extracted outside the neural network, whereas Faster R-CNN extracts the candidate regions inside the neural network.
Typical of this line of work is a set of boxes laid over the image at different spatial positions, scales, and aspect ratios, which serve as "anchors" (also called "default boxes" or "priors").
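The tiling of boxes described above can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name `generate_anchors`, the square image, the stride, and the particular scales and aspect ratios are all hypothetical and are not taken from the paper.

```python
from itertools import product
from math import sqrt


def generate_anchors(image_size, stride, scales, aspect_ratios):
    """Tile anchors (cx, cy, w, h) over a square image.

    One anchor is created per (grid cell, scale, aspect ratio).
    The aspect ratio is width / height, and every anchor of a given
    scale keeps the same area, scale ** 2.
    """
    anchors = []
    # Cell centres sit at stride/2, 3*stride/2, ... across the image.
    centres = [stride / 2 + i * stride for i in range(image_size // stride)]
    for cy, cx in product(centres, centres):
        for scale, ratio in product(scales, aspect_ratios):
            w = scale * sqrt(ratio)
            h = scale / sqrt(ratio)
            anchors.append((cx, cy, w, h))
    return anchors


# Hypothetical settings, chosen only to keep the numbers small.
anchors = generate_anchors(
    image_size=64, stride=16, scales=[16, 32], aspect_ratios=[0.5, 1.0, 2.0]
)
print(len(anchors))  # 4 * 4 cells * 2 scales * 3 ratios = 96
```

Each grid cell contributes one anchor per scale and aspect ratio, so the total grows multiplicatively with each of those settings.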
A trained model therefore needs to predict two kinds of information for each anchor:
(1) a category for the anchor;
(2) the x- and y-direction offsets for the anchor, which determine the bounding box.
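The two per-anchor outputs can be illustrated with a small sketch. As an assumption beyond the text above, the offsets follow the Faster R-CNN style parameterization: centre shifts are scaled by the anchor size and width/height changes are predicted in log space. The helper names and the sample scores are hypothetical.

```python
from math import exp


def decode_box(anchor, offsets):
    """Apply predicted offsets (tx, ty, tw, th) to an anchor (cx, cy, w, h)."""
    cx, cy, w, h = anchor
    tx, ty, tw, th = offsets
    # Centre shifts are expressed in units of the anchor's size;
    # size changes are predicted in log space.
    return (cx + tx * w, cy + ty * h, w * exp(tw), h * exp(th))


def predict_for_anchor(anchor, class_scores, offsets):
    """Combine the two per-anchor outputs: a class label and a refined box."""
    label = max(range(len(class_scores)), key=class_scores.__getitem__)
    return label, decode_box(anchor, offsets)


label, box = predict_for_anchor(
    anchor=(50.0, 50.0, 20.0, 40.0),
    class_scores=[0.1, 0.7, 0.2],    # hypothetical scores for 3 classes
    offsets=(0.5, -0.25, 0.0, 0.0),  # hypothetical regression output
)
print(label, box)  # 1 (60.0, 40.0, 20.0, 40.0)
```

With zero size offsets the anchor keeps its shape and is only shifted: half an anchor width to the right and a quarter of an anchor height upward.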
This anchor strategy greatly improves both the accuracy and the computational cost of object detection.
For example, in Multibox, these anchors are obtained by clustering the ground-truth bounding boxes.
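As a rough illustration of deriving anchors by clustering, the sketch below runs plain k-means over the (width, height) pairs of ground-truth boxes. The exact clustering procedure and distance measure used by Multibox are not given here, so Euclidean k-means with a deterministic initialization is an assumed stand-in, and the sample boxes are made up.

```python
def kmeans_boxes(boxes, k, iterations=20):
    """Cluster (w, h) pairs with plain k-means; returns k centroid shapes."""
    centroids = list(boxes[:k])  # deterministic init: the first k boxes
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for w, h in boxes:
            # Assign each box to the centroid with the nearest (w, h).
            nearest = min(
                range(k),
                key=lambda i: (w - centroids[i][0]) ** 2 + (h - centroids[i][1]) ** 2,
            )
            clusters[nearest].append((w, h))
        new_centroids = []
        for i, members in enumerate(clusters):
            if members:
                mean_w = sum(w for w, _ in members) / len(members)
                mean_h = sum(h for _, h in members) / len(members)
                new_centroids.append((mean_w, mean_h))
            else:
                new_centroids.append(centroids[i])  # keep an empty cluster's centroid
        centroids = new_centroids
    return centroids


# Made-up ground-truth box sizes: three tall-narrow and three wide boxes.
ground_truth = [(10, 20), (100, 50), (12, 22), (98, 54), (11, 18), (102, 46)]
priors = kmeans_boxes(ground_truth, k=2)
print(priors)  # [(11.0, 20.0), (100.0, 50.0)]
```

The resulting centroids act as the prior box shapes, so the anchors reflect the box sizes that actually occur in the training data.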
This paper mainly studies three meta-architectures: Faster R-CNN, R-FCN and SSD.
To make the comparison fairer, the meta-architecture is decoupled from the feature extractor.
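One way to picture this decoupling is as two interchangeable functions that are composed into a detector. The sketch below is a toy under assumptions: every name in it is hypothetical, the "extractors" and "heads" are trivial stand-ins rather than real networks, and it is not the paper's implementation.

```python
from itertools import product


# Hypothetical stand-ins: a real extractor would be a convolutional network.
def small_extractor(image):
    return [sum(row) / len(row) for row in image]  # one feature per row


def large_extractor(image):
    return [value for row in image for value in row]  # one feature per pixel


# Hypothetical stand-ins for detection heads (the meta-architecture part).
def single_stage_head(features):
    return {"stages": 1, "num_features": len(features)}


def two_stage_head(features):
    return {"stages": 2, "num_features": len(features)}


def build_detector(extractor, head):
    """Compose any feature extractor with any meta-architecture head."""
    def detect(image):
        return head(extractor(image))
    return detect


extractors = {"small": small_extractor, "large": large_extractor}
heads = {"single_stage": single_stage_head, "two_stage": two_stage_head}

# Every extractor can be paired with every head, giving a comparison grid.
detectors = {
    (e, h): build_detector(extractors[e], heads[h])
    for e, h in product(extractors, heads)
}

image = [[0.0, 1.0], [2.0, 3.0]]
for name, detect in detectors.items():
    print(name, detect(image))
```

Because the two parts only meet at the feature map, swapping one while holding the other fixed isolates its effect on speed and accuracy.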