Paper reading notes: SSD: Single Shot MultiBox Detector

Source: Internet
Author: User

This article mainly includes the following content:

Paper Address
Code address
Reference Blog

Contents: main ideas, network structure, multi-scale feature maps, matching strategy, loss function, experimental results.

This paper presents a new approach to object detection; the proposed network is called SSD.

Main Ideas

SSD is a new object detection network. It does not generate region proposals; instead it uses a fixed set of default boxes and directly predicts bounding box offsets and per-class object scores for each of them.

To detect objects of different sizes, traditional approaches resize the image to several scales and feed each one to the network, so that objects are seen through different receptive fields. SSD instead combines predictions from feature maps of different resolutions to handle multiple scales, and still achieves good results.

For training, SSD needs only an input image and the ground truth boxes of each object, which allows end-to-end training.

For 300*300 input, SSD reaches 74.3% mAP on the VOC2007 test set at 59 FPS (Nvidia Titan X); for 512*512 input it reaches 76.9% mAP. In contrast, Faster R-CNN reaches 73.2% mAP at 7 FPS, and YOLO reaches 63.4% mAP at 45 FPS. SSD achieves higher accuracy even with lower-resolution input.

Network Structure

The base network is VGG16, pretrained on the ImageNet dataset. The fully connected layers fc6 and fc7 are replaced by two new convolutional layers, pool5 is slightly modified, and 4 more convolutional layers are added on top to form the network used in the paper.

Multi-scale Feature Maps

SSD runs detection separately on several feature maps of different resolutions.
  
The paper's illustration uses two feature maps, of size 8*8 and 4*4. A feature map cell is one grid location in such a map, and each cell has a set of fixed-size boxes, the default boxes.
Suppose each feature map cell has k default boxes. For each default box the network predicts c class scores and 4 offsets, so an m*n feature map (m*n cells) yields (c+4)*k*m*n outputs in total. On the VOC dataset, for example, each default box produces 21 confidences (20 object categories plus background) and 4 coordinate values (x, y, w, h). The experiments also show that using more default box shapes gives better results.
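
As a quick check of the (c+4)*k*m*n count, the following sketch computes the number of outputs for one feature map. The function name and the example values (6 default boxes per cell on an 8*8 map) are illustrative assumptions, not the paper's exact configuration.

```python
def num_outputs(m, n, k, c):
    """Number of predicted values for an m*n feature map.

    Each of the k default boxes per cell gets c class scores
    (background included) and 4 box offsets.
    """
    return (c + 4) * k * m * n


# Assumed example: 8*8 feature map, 6 default boxes per cell,
# 21 scores per box (20 VOC classes + background).
print(num_outputs(8, 8, 6, 21))  # 9600
```

With the same assumed settings, a 4*4 map contributes 2400 outputs, so the coarser maps add comparatively little to the total.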

In the training phase, the algorithm first matches the default boxes to the ground truth boxes; one ground truth box may be matched to several default boxes. In the prediction phase, the network directly predicts the offsets of each default box and its score for every category, and NMS (non-maximum suppression) produces the final results.

Matching Strategy

First, the scale (size) and aspect ratio (width-to-height ratio) of the default boxes must be set. With m feature maps used for prediction, the scale of the default boxes on the k-th feature map is:

s_k = s_min + (s_max - s_min) * (k - 1) / (m - 1),  k = 1, ..., m
Here s_min is 0.2 and s_max is 0.9. The paper sets the aspect ratios to {1, 2, 3, 1/2, 1/3} and adds one extra default box for aspect ratio 1, so each feature map cell has 6 kinds of default box. Default boxes therefore have different scales on different feature layers and different aspect ratios within the same layer, so together they cover most object shapes and sizes in the input image.
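
The scale and shape rules can be sketched in a few lines. This is a minimal illustration, assuming m = 6 feature maps, box sizes expressed relative to the image size, widths and heights of s*sqrt(a) and s/sqrt(a) for aspect ratio a, and an extra box of scale sqrt(s_k * s_(k+1)) as in the paper. The function names are hypothetical, and the sketch does not handle the last layer, which has no s_(k+1).

```python
import math


def default_box_scales(m, s_min=0.2, s_max=0.9):
    """Scale s_k for each of the m feature maps, k = 1..m (requires m > 1)."""
    return [s_min + (s_max - s_min) * (k - 1) / (m - 1) for k in range(1, m + 1)]


def default_box_shapes(s_k, s_next, aspect_ratios=(1, 2, 3, 1 / 2, 1 / 3)):
    """(width, height) of the default boxes at one cell, relative to image size."""
    shapes = [(s_k * math.sqrt(a), s_k / math.sqrt(a)) for a in aspect_ratios]
    # Extra box for aspect ratio 1, with scale sqrt(s_k * s_{k+1}).
    s_extra = math.sqrt(s_k * s_next)
    shapes.append((s_extra, s_extra))
    return shapes


scales = default_box_scales(6)          # assumed m = 6
print([round(s, 2) for s in scales])    # [0.2, 0.34, 0.48, 0.62, 0.76, 0.9]

shapes = default_box_shapes(scales[0], scales[1])
print(len(shapes))                      # 6 default boxes per cell
```

Every box for a given layer has area close to s_k squared; only the aspect ratio changes, which is why one layer can cover wide, tall, and square objects of similar size.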

When matching default boxes to ground truth boxes, a default box whose IoU with a ground truth box is greater than 0.5 is treated as a positive sample. Most of the remaining boxes are negatives, so hard negative mining is applied: the negative boxes are sorted by their confidence loss and only the hardest ones are kept, so that the ratio of positive to negative samples is about 1:3. This makes training the network more effective.
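
The threshold rule and the 1:3 selection can be sketched as follows. This is a simplified illustration of only what is described above: the box format (xmin, ymin, xmax, ymax), the function names, and the example boxes and loss values are all assumptions made for the example.

```python
def iou(a, b):
    """IoU of two boxes given as (xmin, ymin, xmax, ymax)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def match(default_boxes, gt_boxes, threshold=0.5):
    """For each default box, the index of its best ground truth box
    if their IoU exceeds the threshold, else None (a negative)."""
    matches = []
    for d in default_boxes:
        best_idx, best_iou = None, threshold
        for j, g in enumerate(gt_boxes):
            v = iou(d, g)
            if v > best_iou:
                best_idx, best_iou = j, v
        matches.append(best_idx)
    return matches


def hard_negatives(matches, conf_losses, neg_pos_ratio=3):
    """Indices of the negatives with the highest confidence loss,
    keeping at most neg_pos_ratio negatives per positive."""
    num_pos = sum(m is not None for m in matches)
    negatives = [i for i, m in enumerate(matches) if m is None]
    negatives.sort(key=lambda i: conf_losses[i], reverse=True)
    return negatives[: neg_pos_ratio * num_pos]


# Made-up example: one ground truth box, six default boxes.
gt = [(0, 0, 10, 10)]
defaults = [(0, 0, 10, 10), (5, 5, 15, 15), (20, 20, 30, 30),
            (40, 40, 50, 50), (60, 60, 70, 70), (80, 80, 90, 90)]
losses = [0.1, 2.0, 0.3, 1.5, 0.2, 0.9]  # hypothetical per-box confidence losses

matches = match(defaults, gt)
print(matches)                          # [0, None, None, None, None, None]
print(hard_negatives(matches, losses))  # [1, 3, 5]
```

With one positive, only the three negatives with the largest loss are kept; the two easy negatives are dropped from the loss.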
Loss Function

Experimental Results

  
The experiments show that training on more data clearly improves mAP. Compared with the Fast R-CNN and Faster R-CNN detection networks, SSD also achieves better results.
  
The comparison of design choices shows that each of them contributes to a higher mAP.
  
The experiments show that combining predictions from different layers is important; it is the main way SSD handles objects of different sizes.
  
The comparison with YOLO and Faster R-CNN shows that SSD is both faster and more accurate.

The authors note that SSD performs worse on small objects than on large ones. They attribute this to small objects retaining too little information in the top layers of the network, so increasing the input image size helps small object detection. Data augmentation also helps: a randomly cropped patch, once resized to the input size, is effectively a zoomed-in view of the original image, so cropping both increases the number of training images and enlarges the objects in them.
