Paper reading notes: Scalable Object Detection using Deep Neural Networks


Scalable Object Detection using Deep Neural Networks

Authors: Dumitru Erhan, Christian Szegedy, Alexander Toshev, and Dragomir Anguelov

Reference: Erhan, Dumitru, et al. "Scalable object detection using deep neural networks." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2014.

Citations: 181 (Google Scholar, as of 2016/11/23).

Project address: https://github.com/google/multibox

1 Introduction

This is a CVPR 2014 conference paper; the authors are from Google, and the detection algorithm is named "DeepMultiBox". First, an overview of the model's idea: detection in this paper follows a two-step strategy.

The first step: generate candidate regions on the image. The traditional way to generate candidate regions was exhaustive search, enumerating every location and scale in the image. This is far too inefficient and has been largely abandoned. Several alternatives now exist, such as selective search (see my notes "Paper reading notes: Selective Search for Object Recognition"), which uses hierarchical grouping to generate a specified number of candidate regions that are most likely to contain an object. This paper works on the same problem: it proposes using a CNN to generate candidate regions and names the method "DeepMultiBox".

The second step: classify the candidate regions with a CNN. Once the candidate regions are generated, features are extracted and a classifier labels each region. This is the standard approach and needs little comment; the focus of the paper is the first step.

2 The Model

2.1 Regression Model: DeepMultiBox

How can a CNN generate candidate regions on an image? The paper borrows the AlexNet architecture to model this problem. The goal is for the CNN to output a fixed number of bounding boxes. Each box is represented by 4 parameters: the x and y coordinates of its upper-left corner and the x and y coordinates of its lower-right corner, each normalized by the image width or height. For each box the network also outputs a confidence score in [0, 1] that the box contains an object. So if we want the CNN to output K = 100 bounding boxes, the output layer has K × (4 + 1) = 5K = 500 nodes.
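To make the output arithmetic concrete, here is a minimal Python sketch that splits a flat 5K-dimensional output into K boxes and K confidences and maps a normalized box back to pixels. The layout of the output vector (all coordinates first, then all confidences) and the function names `decode_multibox_output` and `to_pixels` are my own assumptions for illustration; the paper does not prescribe them.

```python
from typing import List, Tuple

# A box is (xmin, ymin, xmax, ymax), each coordinate normalized to [0, 1].
Box = Tuple[float, float, float, float]


def decode_multibox_output(output: List[float], k: int) -> Tuple[List[Box], List[float]]:
    """Split a flat vector of 5*k network outputs into k boxes and k confidences.

    Assumed (hypothetical) layout: the first 4*k values are box coordinates,
    four per box, and the last k values are the per-box confidences.
    """
    if len(output) != 5 * k:
        raise ValueError(f"expected {5 * k} values, got {len(output)}")
    coords, confs = output[:4 * k], output[4 * k:]
    boxes = [
        (coords[4 * i], coords[4 * i + 1], coords[4 * i + 2], coords[4 * i + 3])
        for i in range(k)
    ]
    return boxes, list(confs)


def to_pixels(box: Box, width: int, height: int) -> Box:
    """Undo the normalization: scale x by the image width and y by the image height."""
    xmin, ymin, xmax, ymax = box
    return (xmin * width, ymin * height, xmax * width, ymax * height)


K = 100
print(K * (4 + 1))  # 500 output nodes for K = 100 boxes
```

Because the coordinates are normalized, the same network output can be mapped onto an image of any size with `to_pixels`.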

2.1.1 Training Set Construction for DeepMultiBox

How is the training set constructed? The input is clearly the "maximum center square crop" of each training image: find the center of the image, then cut out the largest square centered there. To fit the AlexNet input, each crop is also resized to 220 × 220 (described in Section 4.2.2 of the paper). The key question is the output, which the paper describes rather obscurely. The original text reads: "For each image, we generate the same number of square samples such that the total number of samples is about ten million. For each image, the samples are bucketed such that for each of the ratios in the ranges of 0-5%, 5-15%, 15-50%, 50-100%, there is an equal number of samples in which the ratio covered by the bounding boxes is in the given range."

My understanding is that for each training image, a fixed number (say N) of square regions is generated as training samples. (Question 1: why squares? Are these regions all the same size? If so, how is the multi-scale nature of objects handled? If not, how are the region sizes chosen?) The N regions are chosen with care: they fall into four groups of equal size, and the fraction of each region covered by the GT boxes in the image lies in 0-5%, 5-15%, 15-50%, and 50-100% for the four groups respectively. It is also unclear how the confidence target of each region is determined; I think it should be the degree of overlap between the region and the GT boxes.

(Question 2: is this really how the training samples are constructed? Corrections are welcome!)
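The sample construction as I understand it can be sketched as follows: compute the maximum center square crop, express GT boxes in normalized crop coordinates, and assign each sample to one of the four coverage buckets quoted above. The helper names (`max_center_square`, `box_in_crop`, `coverage_bucket`) are hypothetical, and the sketch takes the coverage ratio as given rather than guessing how the paper computes it when several GT boxes overlap a sample.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]

# Coverage buckets quoted from the paper: 0-5%, 5-15%, 15-50%, 50-100%.
BUCKET_EDGES = (0.0, 0.05, 0.15, 0.50, 1.0)


def max_center_square(width: int, height: int) -> Tuple[int, int, int]:
    """Return (left, top, side) of the largest square centered in a width x height image."""
    side = min(width, height)
    return (width - side) // 2, (height - side) // 2, side


def box_in_crop(box: Box, left: int, top: int, side: int) -> Box:
    """Map a pixel box into crop coordinates normalized to [0, 1], clipping at the crop edges.

    Normalized coordinates do not change when the crop is later resized to 220 x 220.
    """
    def norm(value: float, offset: int) -> float:
        return min(max((value - offset) / side, 0.0), 1.0)

    xmin, ymin, xmax, ymax = box
    return (norm(xmin, left), norm(ymin, top), norm(xmax, left), norm(ymax, top))


def coverage_bucket(ratio: float) -> int:
    """Return the bucket index (0-3) for the fraction of a sample covered by GT boxes."""
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must lie in [0, 1]")
    for i in range(len(BUCKET_EDGES) - 2):
        if ratio < BUCKET_EDGES[i + 1]:
            return i
    return len(BUCKET_EDGES) - 2  # last bucket, 50-100% inclusive
```

For example, a 640 × 480 image gives a 480 × 480 crop starting 80 pixels from the left; drawing the same number of samples per bucket would then balance the four coverage ranges.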

2.1.2 Training DeepMultiBox

Section 2.1.1 described how the training set is constructed (I may have misread it, since the paper is vague on this point); next, the AlexNet model is trained. If the number K of regressed bounding boxes is set to 100, 500 values must be regressed, so the output layer of this AlexNet has 500 nodes (the paper does not discuss this explicitly). Because this is a regression task, a softmax cannot simply be placed after the CNN; the authors define their own objective function, which is described in detail in the original paper.
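As far as I can tell, the objective in the paper combines a localization term and a confidence term under a matching between predicted and GT boxes. With x_ij = 1 when prediction i is matched to GT box j, l_i the predicted box, g_j the GT box and c_i the confidence: F_match = 1/2 Σ x_ij ||l_i - g_j||², F_conf = -Σ x_ij log c_i - Σ_i (1 - Σ_j x_ij) log(1 - c_i), and the loss is α F_match + F_conf, minimized over the matching. The Python sketch below is my own reading of that objective, not the authors' code: it finds the best one-to-one matching by brute force, which only works for tiny examples, and the default α = 1.0 is an arbitrary placeholder.

```python
import math
from itertools import permutations
from typing import Sequence, Tuple

Box = Tuple[float, float, float, float]


def pair_cost(pred: Box, gt: Box, conf: float, alpha: float) -> float:
    """Change in loss when prediction `pred` is matched to `gt` instead of left unmatched."""
    sq_dist = sum((p - g) ** 2 for p, g in zip(pred, gt))
    # Matched: alpha/2 * ||l_i - g_j||^2 - log(c_i); unmatched: -log(1 - c_i).
    return 0.5 * alpha * sq_dist - math.log(conf) + math.log(1.0 - conf)


def multibox_loss(preds: Sequence[Box], confs: Sequence[float], gts: Sequence[Box],
                  alpha: float = 1.0) -> Tuple[float, Tuple[int, ...]]:
    """Return (loss, assignment), where assignment[j] is the prediction matched to gts[j].

    Brute force over all one-to-one assignments: fine for a toy example, far too slow
    for K = 100, where a bipartite matching solver would be needed instead.
    Confidences must lie strictly between 0 and 1.
    """
    if len(gts) > len(preds):
        raise ValueError("need at least as many predictions as ground-truth boxes")
    # Start from the loss with every prediction unmatched.
    base = -sum(math.log(1.0 - c) for c in confs)
    best_cost, best_assign = math.inf, ()
    for assign in permutations(range(len(preds)), len(gts)):
        cost = sum(pair_cost(preds[i], gts[j], confs[i], alpha) for j, i in enumerate(assign))
        if cost < best_cost:
            best_cost, best_assign = cost, assign
    return base + best_cost, best_assign
```

Writing the loss as "everything unmatched" plus a per-pair correction makes it clear why the matching matters: a prediction pays -log c_i only if it is matched, and -log(1 - c_i) otherwise.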

2.2 CNN Classification Model

2.2.1 Training Set Construction for the Classification Model

DeepMultiBox produces K candidate regions on each image, but it does not determine which class each region belongs to, so a CNN must be trained to classify these regions.

Section 4.2.1 of the paper briefly describes how the training samples for the CNN classifier are constructed (for the VOC dataset there are 20 classes in total):

Positive samples: for each class, a candidate region whose Jaccard similarity with a GT box exceeds 0.5 is labeled as a positive sample. This produces a total of 10 million positive samples covering the 20 classes.

Negative samples: constructed the same way as the positive samples, except that a region counts as negative only if its Jaccard similarity is below 0.2. This produces a total of 20 million negative samples.
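The labeling rule above can be sketched in Python as follows. `jaccard` is the standard intersection over union of two boxes; `label_region` is a hypothetical helper. Two details are my assumptions, not from the paper: a region is compared against its best-overlapping GT box, and regions whose overlap falls between 0.2 and 0.5 are left unlabeled (returned as `None`).

```python
from typing import Optional, Sequence, Tuple

Box = Tuple[float, float, float, float]


def area(b: Box) -> float:
    """Area of an (xmin, ymin, xmax, ymax) box; degenerate boxes have area 0."""
    return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])


def jaccard(a: Box, b: Box) -> float:
    """Intersection over union of two (xmin, ymin, xmax, ymax) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0


def label_region(region: Box, gts: Sequence[Box], classes: Sequence[str],
                 pos_thresh: float = 0.5, neg_thresh: float = 0.2) -> Optional[str]:
    """Return the GT class (positive), "background" (negative), or None (unused)."""
    if not gts:
        return "background"
    overlaps = [jaccard(region, g) for g in gts]
    best = max(range(len(gts)), key=lambda j: overlaps[j])
    if overlaps[best] > pos_thresh:
        return classes[best]
    if overlaps[best] < neg_thresh:
        return "background"
    return None  # between the thresholds: the notes do not say how these are used
```

The gap between the two thresholds keeps ambiguous, partially overlapping regions out of both the positive and the negative set.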

2.2.2 Structure of the Classification Model

The paper does not seem to describe the structure of the classification model in detail. We only know that it uses AlexNet, and that the output layer must be changed to 21 nodes for the VOC dataset (20 classes plus background). The size of the sample regions is not stated, nor is it said whether each region is resized to the AlexNet input size; I could not find either detail!

2.3 Test Process

The test process is described in Section 4.2.2 of the paper (assuming there are n object classes). Given a test image, crop out its largest center square, resize it to 220 × 220, and feed it into the DeepMultiBox network, which regresses K boxes together with a confidence score for each. Next, apply non-maximum suppression with a Jaccard threshold of 0.5, which discards any box that overlaps a higher-scoring box by more than 0.5, and keep the 10 regions with the highest confidence scores. These regions are sent to the classification CNN, which outputs a probability for each class, giving a 10 × (n + 1) probability matrix. Each region's confidence score is multiplied by its class probabilities to give the final scores, and these scores are used to compute the precision-recall curves.

(Question 3: I think the test process needs a final decision step based on these scores, but I do not know how the final detections are determined!)
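The scoring part of the test pipeline (NMS at a Jaccard threshold of 0.5, keep the top 10, multiply confidence by class probability) can be sketched as below. This is a minimal greedy NMS under my own assumptions; `final_scores` and `classify` are hypothetical names, with `classify` standing in for the classification CNN, and the code does not attempt to answer Question 3.

```python
from typing import Callable, List, Sequence, Tuple

Box = Tuple[float, float, float, float]


def jaccard(a: Box, b: Box) -> float:
    """Intersection over union of two (xmin, ymin, xmax, ymax) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: Sequence[Box], scores: Sequence[float], thresh: float = 0.5) -> List[int]:
    """Greedy NMS: visit boxes by descending score and drop any box whose Jaccard
    overlap with an already kept box exceeds `thresh`. Returns kept indices."""
    keep: List[int] = []
    for i in sorted(range(len(boxes)), key=lambda k: scores[k], reverse=True):
        if all(jaccard(boxes[i], boxes[j]) <= thresh for j in keep):
            keep.append(i)
    return keep


def final_scores(boxes: Sequence[Box], confs: Sequence[float],
                 classify: Callable[[Box], Sequence[float]],
                 top_n: int = 10, thresh: float = 0.5) -> Tuple[List[int], List[List[float]]]:
    """Keep the top_n boxes after NMS and score each as confidence * class probability.

    `classify` must return n + 1 probabilities per box, so the score matrix has
    at most top_n rows and n + 1 columns.
    """
    kept = nms(boxes, confs, thresh)[:top_n]
    return kept, [[confs[i] * p for p in classify(boxes[i])] for i in kept]
```

With a real classifier, the rows of the returned matrix correspond to the 10 × (n + 1) matrix described above; thresholding those scores at different levels is what traces out a precision-recall curve.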
