SSD (Single Shot MultiBox Detector) is an end-to-end object detection and recognition model. A bit of gossip first: it comes from the Google camp, and its author is also one of the authors of GoogLeNet. The model is designed for fast recognition with high precision: it reaches considerable accuracy without a separate step for computing bounding-box proposals, and it is significantly faster, with a claimed 58 FPS and 72.1% mAP.
Let's look at the overall structure of the model. Its base is a classic VGG16 network (which can also be replaced by ResNet). The convolution layer conv4_3 and the fully connected layer fc7, as well as three further convolution layers conv6, conv7, and conv8, each branch into three kinds of nodes: mbox_conf, mbox_loc, and mbox_priorbox (call them X nodes). The X nodes from the different layers are then merged by the corresponding concat layers, and the concatenated results are classified and output together.
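The branch-and-concat structure can be sketched with a little arithmetic. This is only an illustration: the feature-map sizes, the number of boxes per location, and the class count (21, i.e. 20 classes plus background) are assumptions for a 300x300 input and are not stated in the text, and the helper name `branch_sizes` is hypothetical.

```python
# Sketch: how many values each source layer contributes to the concatenated
# mbox_loc and mbox_conf outputs. All sizes below are ASSUMED for illustration.

NUM_CLASSES = 21  # assumption: 20 object classes + background

# (source layer, feature map height, width, boxes per location), all assumed
SOURCE_LAYERS = [
    ("conv4_3", 38, 38, 3),
    ("fc7", 19, 19, 6),
    ("conv6_2", 10, 10, 6),
    ("conv7_2", 5, 5, 6),
    ("conv8_2", 3, 3, 6),
    ("pool6", 1, 1, 6),
]


def branch_sizes(height, width, boxes_per_loc, num_classes=NUM_CLASSES):
    """Return (boxes, flattened loc length, flattened conf length) of a layer."""
    num_boxes = height * width * boxes_per_loc
    return num_boxes, num_boxes * 4, num_boxes * num_classes


total_boxes = total_loc = total_conf = 0
for name, h, w, k in SOURCE_LAYERS:
    n, loc_len, conf_len = branch_sizes(h, w, k)
    total_boxes += n
    total_loc += loc_len
    total_conf += conf_len
    print(f"{name:8s} boxes={n:5d} loc={loc_len:6d} conf={conf_len:7d}")

# After concat, one vector holds the outputs of every source layer.
print("concat:", total_boxes, total_loc, total_conf)
```

Each layer's loc branch carries 4 values per box and its conf branch one score per class per box, which is why the flattened per-layer outputs can simply be concatenated end to end.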
For more detail, see the following log output from a forward pass.
[INFO 2016-08-30 21:58:39.619143 21429 net.cpp:540] Forwarding data
[INFO 2016-08-30 21:58:39.622481 21429 net.cpp:540] Forwarding data_data_0_split
[INFO 2016-08-30 21:58:39.622514 21429 net.cpp:540] Forwarding conv1_1
[INFO 2016-08-30 21:58:39.627096 21429 net.cpp:540] Forwarding relu1_1
[INFO 2016-08-30 21:58:39.627473 21429 net.cpp:540] Forwarding conv1_2
[INFO 2016-08-30 21:58:39.631721 21429 net.cpp:540] Forwarding relu1_2
[INFO 2016-08-30 21:58:39.631757 21429 net.cpp:540] Forwarding pool1
[INFO 2016-08-30 21:58:39.632096 21429 net.cpp:540] Forwarding conv2_1
[INFO 2016-08-30 21:58:39.634774 21429 net.cpp:540] Forwarding relu2_1
[INFO 2016-08-30 21:58:39.634809 21429 net.cpp:540] Forwarding conv2_2
[INFO 2016-08-30 21:58:39.639045 21429 net.cpp:540] Forwarding relu2_2
[INFO 2016-08-30 21:58:39.639080 21429 net.cpp:540] Forwarding pool2
[INFO 2016-08-30 21:58:39.639394 21429 net.cpp:540] Forwarding conv3_1
[INFO 2016-08-30 21:58:39.642501 21429 net.cpp:540] Forwarding relu3_1
[INFO 2016-08-30 21:58:39.642535 21429 net.cpp:540] Forwarding conv3_2
[INFO 2016-08-30 21:58:39.647202 21429 net.cpp:540] Forwarding relu3_2
[INFO 2016-08-30 21:58:39.647235 21429 net.cpp:540] Forwarding conv3_3
[INFO 2016-08-30 21:58:39.650738 21429 net.cpp:540] Forwarding relu3_3
[INFO 2016-08-30 21:58:39.650770 21429 net.cpp:540] Forwarding pool3
[INFO 2016-08-30 21:58:39.651074 21429 net.cpp:540] Forwarding conv4_1
[INFO 2016-08-30 21:58:39.655285 21429 net.cpp:540] Forwarding relu4_1
[INFO 2016-08-30 21:58:39.655323 21429 net.cpp:540] Forwarding conv4_2
[INFO 2016-08-30 21:58:39.660395 21429 net.cpp:540] Forwarding relu4_2
[INFO 2016-08-30 21:58:39.660429 21429 net.cpp:540] Forwarding conv4_3
[INFO 2016-08-30 21:58:39.665523 21429 net.cpp:540] Forwarding relu4_3
[INFO 2016-08-30 21:58:39.665555 21429 net.cpp:540] Forwarding conv4_3_relu4_3_0_split
[INFO 2016-08-30 21:58:39.665570 21429 net.cpp:540] Forwarding pool4
[INFO 2016-08-30 21:58:39.665881 21429 net.cpp:540] Forwarding conv5_1
[INFO 2016-08-30 21:58:39.668714 21429 net.cpp:540] Forwarding relu5_1
[INFO 2016-08-30 21:58:39.668748 21429 net.cpp:540] Forwarding conv5_2
[INFO 2016-08-30 21:58:39.671761 21429 net.cpp:540] Forwarding relu5_2
[INFO 2016-08-30 21:58:39.671807 21429 net.cpp:540] Forwarding conv5_3
[INFO 2016-08-30 21:58:39.675269 21429 net.cpp:540] Forwarding relu5_3
[INFO 2016-08-30 21:58:39.675302 21429 net.cpp:540] Forwarding pool5
[INFO 2016-08-30 21:58:39.675624 21429 net.cpp:540] Forwarding fc6
[INFO 2016-08-30 21:58:39.685935 21429 net.cpp:540] Forwarding relu6
[INFO 2016-08-30 21:58:39.685971 21429 net.cpp:540] Forwarding fc7
[INFO 2016-08-30 21:58:39.688531 21429 net.cpp:540] Forwarding relu7
[INFO 2016-08-30 21:58:39.688565 21429 net.cpp:540] Forwarding fc7_relu7_0_split
[INFO 2016-08-30 21:58:39.688580 21429 net.cpp:540] Forwarding conv6_1
[INFO 2016-08-30 21:58:39.691439 21429 net.cpp:540] Forwarding conv6_1_relu
[INFO 2016-08-30 21:58:39.691473 21429 net.cpp:540] Forwarding conv6_2
[INFO 2016-08-30 21:58:39.695135 21429 net.cpp:540] Forwarding conv6_2_relu
[INFO 2016-08-30 21:58:39.695169 21429 net.cpp:540] Forwarding conv6_2_conv6_2_relu_0_split
[INFO 2016-08-30 21:58:39.695183 21429 net.cpp:540] Forwarding conv7_1
[INFO 2016-08-30 21:58:39.698765 21429 net.cpp:540] Forwarding conv7_1_relu
[INFO 2016-08-30 21:58:39.698796 21429 net.cpp:540] Forwarding conv7_2
[INFO 2016-08-30 21:58:39.701938 21429 net.cpp:540] Forwarding conv7_2_relu
[INFO 2016-08-30 21:58:39.702193 21429 net.cpp:540] Forwarding conv7_2_conv7_2_relu_0_split
[INFO 2016-08-30 21:58:39.702220 21429 net.cpp:540] Forwarding conv8_1
[INFO 2016-08-30 21:58:39.704677 21429 net.cpp:540] Forwarding conv8_1_relu
[INFO 2016-08-30 21:58:39.704716 21429 net.cpp:540] Forwarding conv8_2
[INFO 2016-08-30 21:58:39.707798 21429 net.cpp:540] Forwarding conv8_2_relu
[INFO 2016-08-30 21:58:39.707839 21429 net.cpp:540] Forwarding conv8_2_conv8_2_relu_0_split
[INFO 2016-08-30 21:58:39.707859 21429 net.cpp:540] Forwarding pool6
[INFO 2016-08-30 21:58:39.707926 21429 net.cpp:540] Forwarding pool6_pool6_0_split
[INFO 2016-08-30 21:58:39.707947 21429 net.cpp:540] Forwarding conv4_3_norm
[INFO 2016-08-30 21:58:39.711788 21429 net.cpp:540] Forwarding conv4_3_norm_conv4_3_norm_0_split
[INFO 2016-08-30 21:58:39.711818 21429 net.cpp:540] Forwarding conv4_3_norm_mbox_loc
[INFO 2016-08-30 21:58:39.714972 21429 net.cpp:540] Forwarding conv4_3_norm_mbox_loc_perm
[INFO 2016-08-30 21:58:39.717313 21429 net.cpp:540] Forwarding conv4_3_norm_mbox_loc_flat
[INFO 2016-08-30 21:58:39.717339 21429 net.cpp:540] Forwarding conv4_3_norm_mbox_conf
[INFO 2016-08-30 21:58:39.724395 21429 net.cpp:540] Forwarding conv4_3_norm_mbox_conf_perm
[INFO 2016-08-30 21:58:39.731096 21429 net.cpp:540] Forwarding conv4_3_norm_mbox_conf_flat
[INFO 2016-08-30 21:58:39.731127 21429 net.cpp:540] Forwarding conv4_3_norm_mbox_priorbox
[INFO 2016-08-30 21:58:39.731290 21429 net.cpp:540] Forwarding fc7_mbox_loc
[INFO 2016-08-30 21:58:39.733963 21429 net.cpp:540] Forwarding fc7_mbox_loc_perm
[INFO 2016-08-30 21:58:39.737503 21429 net.cpp:540] Forwarding fc7_mbox_loc_flat
[INFO 2016-08-30 21:58:39.737527 21429 net.cpp:540] Forwarding fc7_mbox_conf
[INFO 2016-08-30 21:58:39.746902 21429 net.cpp:540] Forwarding fc7_mbox_conf_perm
[INFO 2016-08-30 21:58:39.750918 21429 net.cpp:540] Forwarding fc7_mbox_conf_flat
[INFO 2016-08-30 21:58:39.750946 21429 net.cpp:540] Forwarding fc7_mbox_priorbox
[INFO 2016-08-30 21:58:39.751056 21429 net.cpp:540] Forwarding conv6_2_mbox_loc
[INFO 2016-08-30 21:58:39.753976 21429 net.cpp:540] Forwarding conv6_2_mbox_loc_perm
[INFO 2016-08-30 21:58:39.756206 21429 net.cpp:540] Forwarding conv6_2_mbox_loc_flat
[INFO 2016-08-30 21:58:39.756239 21429 net.cpp:540] Forwarding conv6_2_mbox_conf
[INFO 2016-08-30 21:58:39.763130 21429 net.cpp:540] Forwarding conv6_2_mbox_conf_perm
[INFO 2016-08-30 21:58:39.764664 21429 net.cpp:540] Forwarding conv6_2_mbox_conf_flat
[INFO 2016-08-30 21:58:39.764689 21429 net.cpp:540] Forwarding conv6_2_mbox_priorbox
[INFO 2016-08-30 21:58:39.764760 21429 net.cpp:540] Forwarding conv7_2_mbox_loc
[INFO 2016-08-30 21:58:39.768630 21429 net.cpp:540] Forwarding conv7_2_mbox_loc_perm
[INFO 2016-08-30 21:58:39.772903 21429 net.cpp:540] Forwarding conv7_2_mbox_loc_flat
[INFO 2016-08-30 21:58:39.772927 21429 net.cpp:540] Forwarding conv7_2_mbox_conf
[INFO 2016-08-30 21:58:39.777669 21429 net.cpp:540] Forwarding conv7_2_mbox_conf_perm
[INFO 2016-08-30 21:58:39.781180 21429 net.cpp:540] Forwarding conv7_2_mbox_conf_flat
[INFO 2016-08-30 21:58:39.781205 21429 net.cpp:540] Forwarding conv7_2_mbox_priorbox
[INFO 2016-08-30 21:58:39.781263 21429 net.cpp:540] Forwarding conv8_2_mbox_loc
[INFO 2016-08-30 21:58:39.783634 21429 net.cpp:540] Forwarding conv8_2_mbox_loc_perm
[INFO 2016-08-30 21:58:39.788920 21429 net.cpp:540] Forwarding conv8_2_mbox_loc_flat
[INFO 2016-08-30 21:58:39.788944 21429 net.cpp:540] Forwarding conv8_2_mbox_conf
[INFO 2016-08-30 21:58:39.793294 21429 net.cpp:540] Forwarding conv8_2_mbox_conf_perm
[INFO 2016-08-30 21:58:39.797371 21429 net.cpp:540] Forwarding conv8_2_mbox_conf_flat
[INFO 2016-08-30 21:58:39.797397 21429 net.cpp:540] Forwarding conv8_2_mbox_priorbox
[INFO 2016-08-30 21:58:39.797449 21429 net.cpp:540] Forwarding pool6_mbox_loc
[INFO 2016-08-30 21:58:39.800542 21429 net.cpp:540] Forwarding pool6_mbox_loc_perm
[INFO 2016-08-30 21:58:39.804468 21429 net.cpp:540] Forwarding pool6_mbox_loc_flat
[INFO 2016-08-30 21:58:39.804493 21429 net.cpp:540] Forwarding pool6_mbox_conf
[INFO 2016-08-30 21:58:39.808717 21429 net.cpp:540] Forwarding pool6_mbox_conf_perm
[INFO 2016-08-30 21:58:39.812292 21429 net.cpp:540] Forwarding pool6_mbox_conf_flat
[INFO 2016-08-30 21:58:39.812317 21429 net.cpp:540] Forwarding pool6_mbox_priorbox
[INFO 2016-08-30 21:58:39.812382 21429 net.cpp:540] Forwarding mbox_loc
[INFO 2016-08-30 21:58:39.812604 21429 net.cpp:540] Forwarding mbox_conf
[INFO 2016-08-30 21:58:39.812834 21429 net.cpp:540] Forwarding mbox_priorbox
[INFO 2016-08-30 21:58:39.819844 21429 net.cpp:540] Forwarding mbox_conf_reshape
[INFO 2016-08-30 21:58:39.819871 21429 net.cpp:540] Forwarding mbox_conf_softmax
[INFO 2016-08-30 21:58:39.820596 21429 net.cpp:540] Forwarding mbox_conf_flatten
[INFO 2016-08-30 21:58:39.820647 21429 net.cpp:540] Forwarding detection_out
[INFO 2016-08-30 21:58:39.832866 21429 net.cpp:540] Forwarding detection_eval
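A log like the one above can be turned into a layer order and rough timings with a short script. This is a minimal sketch, assuming each entry has the form `[INFO <date> <time> <pid> net.cpp:540] Forwarding <layer>`. The log does not say whether a message is printed before or after its layer runs, so reading the gap between two entries as the time spent in the earlier layer is also an assumption. The function names are hypothetical.

```python
import re
from datetime import datetime

# A few entries copied from the forward log, one per line.
SAMPLE_LOG = """\
[INFO 2016-08-30 21:58:39.675624 21429 net.cpp:540] Forwarding fc6
[INFO 2016-08-30 21:58:39.685935 21429 net.cpp:540] Forwarding relu6
[INFO 2016-08-30 21:58:39.685971 21429 net.cpp:540] Forwarding fc7
[INFO 2016-08-30 21:58:39.688531 21429 net.cpp:540] Forwarding relu7
"""

LINE_RE = re.compile(
    r"\[INFO (\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) \d+ net\.cpp:\d+\] "
    r"Forwarding (\S+)"
)


def parse_forward_log(text):
    """Return a list of (layer_name, timestamp) in forward order."""
    entries = []
    for match in LINE_RE.finditer(text):
        stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S.%f")
        entries.append((match.group(2), stamp))
    return entries


def layer_gaps_ms(entries):
    """Milliseconds between each log entry and the one that follows it."""
    gaps = []
    for (name, start), (_, end) in zip(entries, entries[1:]):
        gaps.append((name, (end - start).total_seconds() * 1000.0))
    return gaps


entries = parse_forward_log(SAMPLE_LOG)
for name, gap in layer_gaps_ms(entries):
    print(f"{name:8s} {gap:7.3f} ms")
```

The last entry has no successor, so it gets no gap. On a GPU the gaps may also reflect asynchronous execution rather than true per-layer cost, so they are best read as a rough guide.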
SSD uses a large number of small convolution kernels (1x1, 3x3), not only for classification but also for regressing bounding-box positions. Separate filters handle targets of different aspect ratios, and these filters are applied to several successive feature maps so that detection is multi-scale.
SSD defines a set of bounding boxes of four kinds: tall, wide, large square, and small square. They are placed at every location of feature maps of different sizes (4x4, 8x8); that is, they tile the m*n locations of an m*n*p feature map convolutionally. In training, these boxes are matched against the ground-truth boxes: for each box the network computes its offsets to the ground truth and its class probabilities, giving 4 offset values and c class probability values, and TP and FP are determined from the ground-truth category. Finally, the total loss of the model is a weighted sum of the localization loss and the classification confidence loss, and the final detection result is obtained by non-maximum suppression.
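The weighted total loss can be sketched as follows. This is a minimal illustration, assuming smooth L1 for the localization term, softmax cross-entropy for the confidence term, class 0 as background, and normalization by the number of matched boxes N; the text above only says that the two losses are weighted and summed. The function names and the weight `alpha` are hypothetical.

```python
import math


def smooth_l1(x):
    """Smooth L1 penalty on a single offset error (assumed loc loss)."""
    ax = abs(x)
    return 0.5 * x * x if ax < 1.0 else ax - 0.5


def softmax_cross_entropy(scores, label):
    """Cross-entropy of the softmax over raw class scores."""
    peak = max(scores)  # subtract the max for numerical stability
    log_sum = peak + math.log(sum(math.exp(s - peak) for s in scores))
    return log_sum - scores[label]


def ssd_loss(positives, negatives, alpha=1.0):
    """Total loss (L_conf + alpha * L_loc) / N, N = number of positives.

    positives: list of (pred_offsets, target_offsets, class_scores, label)
    negatives: list of class_scores whose target is background (class 0)
    """
    n = len(positives)
    if n == 0:
        return 0.0  # no matched box, so nothing to normalize by
    loc = sum(
        smooth_l1(p - t)
        for pred, target, _, _ in positives
        for p, t in zip(pred, target)
    )
    conf = sum(softmax_cross_entropy(s, y) for _, _, s, y in positives)
    conf += sum(softmax_cross_entropy(s, 0) for s in negatives)
    return (conf + alpha * loc) / n


# One matched box (4 offsets, 3 class scores) and one background box.
positives = [([0.1, 0.0, 0.2, -0.1], [0.0, 0.0, 0.0, 0.0], [0.0, 2.0, 0.0], 1)]
negatives = [[3.0, 0.0, 0.0]]
print(ssd_loss(positives, negatives))
```

Only matched boxes contribute to the localization term, while both matched and background boxes contribute to the confidence term.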
Using boxes of different shapes on feature maps of multiple resolutions discretizes the parameter space of the boxes and improves computational efficiency. The ground-truth information, both categories and locations, must be explicitly assigned to specific network outputs so that the loss function and backpropagation work end to end. In training, the ground-truth boxes and the candidate boxes therefore have to be matched: a box whose Jaccard overlap with a ground-truth box is greater than 0.5 is assigned to that ground truth, and every ground truth must have at least one box assigned to it. In addition, when there are many candidate boxes there are also many FPs, which leaves the numbers of TP and FP imbalanced. The candidate boxes are therefore sorted by classification confidence, and only the top candidates are kept so that the ratio of FP to TP is 3:1.
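The matching rule and the 3:1 selection can be sketched as below. This is a simplified illustration, not the author's implementation: boxes are assumed to be `(xmin, ymin, xmax, ymax)` tuples, ranking the unmatched boxes by their confidence loss is one assumed reading of "sorted by classification confidence", and all function names are hypothetical.

```python
def jaccard(a, b):
    """Jaccard overlap (IoU) of two boxes given as (xmin, ymin, xmax, ymax)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def match_boxes(priors, truths, threshold=0.5):
    """Return {prior index: truth index} for the matched (positive) boxes."""
    matches = {}
    if not truths:
        return matches
    # Step 1: a box whose overlap with a ground truth exceeds the threshold.
    for i, prior in enumerate(priors):
        best_j = max(range(len(truths)), key=lambda j: jaccard(prior, truths[j]))
        if jaccard(prior, truths[best_j]) > threshold:
            matches[i] = best_j
    # Step 2: every ground truth gets its best box, whatever the overlap.
    # (Simplification: two truths sharing one best box overwrite each other.)
    for j, truth in enumerate(truths):
        best_i = max(range(len(priors)), key=lambda i: jaccard(priors[i], truth))
        matches[best_i] = j
    return matches


def mine_hard_negatives(conf_losses, matches, ratio=3):
    """Keep the highest-loss unmatched boxes, at most `ratio` per positive."""
    negatives = [i for i in range(len(conf_losses)) if i not in matches]
    negatives.sort(key=lambda i: conf_losses[i], reverse=True)
    return negatives[: ratio * len(matches)]


priors = [
    (0.0, 0.0, 0.5, 0.5),
    (0.1, 0.1, 0.6, 0.6),
    (0.5, 0.5, 1.0, 1.0),
    (0.0, 0.5, 0.5, 1.0),
    (0.5, 0.0, 1.0, 0.5),
]
truths = [(0.0, 0.0, 0.5, 0.5)]
matches = match_boxes(priors, truths)
hard = mine_hard_negatives([0.1, 2.0, 0.3, 1.5, 0.9], matches)
print(matches, hard)
```

Step 2 is what guarantees that each ground truth has at least one box, even when no box clears the 0.5 threshold.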
On how to detect targets at multiple scales: we know that low-level feature maps preserve more image detail and can improve the quality of semantic segmentation, while high-level feature maps smooth the segmentation result. SSD therefore combines lower-layer and upper-layer feature maps for detection. Feature maps from different layers have different receptive field sizes, which is very important; see Zhou, B., Khosla, A., Lapedriza, A., Oliva, A., Torralba, A.: Object detectors emerge in deep scene CNNs. In: ICLR. (2015). However, rather than building boxes of different sizes on one particular feature map, each layer's feature map only learns to detect objects of a certain scale, so each layer has boxes of only one scale. For example, the boxes in the 8x8 feature map cannot detect a larger dog (pictured below). From the lowest layer to the highest, the box scales are evenly spaced between 0.2 and 0.95. To handle the aspect ratio problem as well, each layer generates 6 boxes with the aspect ratios {1, 1+, 2, 3, 1/2, 1/3}, where 1+ denotes the extra, larger square box.
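The scale and aspect-ratio rules can be worked through numerically. This is a sketch under stated assumptions: six detection layers, width and height derived as scale * sqrt(ratio) and scale / sqrt(ratio), the extra "1+" box sized by the geometric mean of the current and next scale, and 1.0 used as the "next" scale after the last layer. None of these details are spelled out in the text, and the function names are hypothetical.

```python
import math


def layer_scales(num_layers, s_min=0.2, s_max=0.95):
    """Scales spaced evenly from s_min (lowest layer) to s_max (highest)."""
    if num_layers == 1:
        return [s_min]
    step = (s_max - s_min) / (num_layers - 1)
    return [s_min + step * k for k in range(num_layers)]


def box_shapes(scale, next_scale, ratios=(1.0, 2.0, 3.0, 1.0 / 2.0, 1.0 / 3.0)):
    """(width, height) pairs, relative to the image size, for one layer."""
    shapes = [(scale * math.sqrt(r), scale / math.sqrt(r)) for r in ratios]
    # Assumed form of the extra "1+" box: a square at the geometric mean
    # of this layer's scale and the next layer's scale.
    extra = math.sqrt(scale * next_scale)
    shapes.insert(1, (extra, extra))
    return shapes


scales = layer_scales(6)  # assumption: six detection layers
for k, s in enumerate(scales):
    # Assumption: after the last layer, use 1.0 as the "next" scale.
    nxt = scales[k + 1] if k + 1 < len(scales) else 1.0
    shapes = box_shapes(s, nxt)
    print(round(s, 3), [(round(w, 3), round(h, 3)) for w, h in shapes])
```

With this parameterization every box of a given ratio keeps the same area, scale squared, so changing the aspect ratio reshapes the box without changing the object size it targets.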
SSD, in a sense, combines the ideas of RPN and YOLO. That is:
(1) The anchor idea of RPN: 256 3x3 filters are applied to the feature map, so that at each location of the feature map the features of the 9 anchor boxes are in fact expressed by a 256-dimensional vector. The position of the sliding window provides location information relative to the original image, and the regressed box provides finer location information relative to the sliding window. RPN reduces the computation by a factor of 256 (i.e., by operating on the feature map instead of the original image).
(2) The regression idea of YOLO: the features are used to regress the target position and category directly, without ROI pooling to extract features and classify.
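The anchor idea in (1) can be made concrete with a small sketch. The stride of 16 and the anchor sizes and ratios (3 sizes x 3 ratios = 9 anchors) are assumptions chosen for illustration, and the function name is hypothetical. Under that assumed stride, one reading of the factor of 256 is that each feature-map location stands for a 16 x 16 patch of the original image.

```python
def rpn_anchors(feat_h, feat_w, stride=16,
                sizes=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return anchors as (cx, cy, w, h) in image pixels, 9 per location.

    stride, sizes and ratios are assumed values for illustration.
    """
    anchors = []
    for y in range(feat_h):
        for x in range(feat_w):
            # The sliding-window position gives the coarse image location.
            cx = (x + 0.5) * stride
            cy = (y + 0.5) * stride
            for size in sizes:
                for ratio in ratios:  # ratio = height / width
                    w = size / ratio ** 0.5
                    h = size * ratio ** 0.5
                    anchors.append((cx, cy, w, h))
    return anchors


anchors = rpn_anchors(2, 3)
print(len(anchors))   # 2 * 3 locations, 9 anchors each
print(16 * 16)        # image pixels covered by one feature-map location
```

The regression output then only has to refine each anchor relative to its sliding-window position, which is the part SSD shares with RPN, while the direct class prediction per box follows the YOLO idea in (2).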