SSD predicts bounding boxes in essentially the same way as RPN, except that it predicts on multiple layers to better detect objects at multiple scales.

Predicting bbox on multiple layers

The original SSD300 predicts on the following layers:
conv4 ==> 38 x 38
conv7 ==> 19 x 19
conv8 ==> 10 x 10
conv9 ==> 5 x 5
conv10 ==> 3 x 3
conv11 ==> 1 x 1
The number after each layer is the size of that layer's output feature map.
SSD512:
conv4 ==> 64 x 64
conv7 ==> 32 x 32
conv8 ==> 16 x 16
conv9 ==> 8 x 8
conv10 ==> 4 x 4
conv11 ==> 2 x 2
conv12 ==> 1 x 1
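As a rough sanity check, the feature-map sizes listed above follow from repeated halving. This sketch assumes a 512x512 input and that conv4 has an effective stride of 8 (standard for the VGG-based SSD backbone); both are assumptions, not stated in the text:

```python
# Assumes a 512x512 input and that conv4 has an effective stride of 8;
# each subsequent prediction layer halves the spatial size again.
input_size = 512
feat_sizes = [input_size // 2**k for k in range(3, 10)]
print(feat_sizes)  # [64, 32, 16, 8, 4, 2, 1]
```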
You can also decide which layers to predict on according to the actual situation.

Size of the reference box
Take SSD512 as an example.
Like RPN, each layer has a reference box of a specific size and computes its own default boxes (anchors) from it. The reference box is a square whose size is determined by a scale parameter, which the author computes as:

s_k = s_{min} + \frac{s_{max} - s_{min}}{m - 1}(k - 1), \quad k \in [1, m]

where m is the number of layers responsible for prediction, s_{min} = 0.2, and s_{max} = 0.95. This means the lowest layer (conv4) has a reference box with w_{bbox} = s_{min}, and similarly the top layer has w_{bbox} = s_{max}.
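The scale schedule described above can be sketched in a few lines of Python. `layer_scales` is a hypothetical helper written for illustration, not a function from any SSD codebase:

```python
def layer_scales(m, s_min=0.2, s_max=0.95):
    """Evenly spaced scales s_1..s_m between s_min and s_max."""
    step = (s_max - s_min) / (m - 1)
    return [s_min + step * (k - 1) for k in range(1, m + 1)]

# SSD512 predicts on 7 layers (conv4 through conv12)
scales = layer_scales(7)
# first layer uses s_min, last layer uses s_max
print(scales)
```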
During training, input images are resized to 512*512, so the reference box on conv4 is a square with an actual side length of 512 * 0.2 = 102.4.

Aspect ratios for default boxes
After obtaining a layer's reference box, you can derive a number of default boxes (also known as anchors) with different aspect ratios according to that layer's aspect-ratio configuration.
Take the following parameters as an example of how an anchor is calculated:

size = 0.2 * 512        # size of the reference box: 102.4
aspect_ratio = [2]
feat_shape = [8, 8]
img_shape = [512, 512]  # [h, w]
The i-th anchor's w and h are:

s = aspect_ratio[i]
w = size / img_shape[1] * np.sqrt(s)
h = size / img_shape[0] / np.sqrt(s)
w and h are now relative values between 0 and 1. In the implementation, this step is sufficient. The following analysis derives the anchor's true aspect ratio.
First calculate the absolute lengths of w and h:
w = w * img_shape[1]
h = h * img_shape[0]
Their true aspect ratio is:

\frac{W}{H} = \frac{\frac{size}{w_{img}} \cdot \sqrt{s} \cdot w_{img}}{\frac{size}{h_{img}} \cdot \frac{1}{\sqrt{s}} \cdot h_{img}} = \frac{size \cdot \sqrt{s}}{size / \sqrt{s}} = s
So, whether in training, where h_{img} == w_{img}, or in testing, where h_{img} is generally not equal to w_{img}, the anchor's true aspect ratio always equals the configured aspect ratio s.
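The steps above can be put together in a self-contained sketch. `anchor_wh` is a hypothetical helper that follows the formulas derived above (using stdlib `math.sqrt` in place of `np.sqrt`), and it shows that the ratio stays at 2 for both a square and a non-square input:

```python
import math

def anchor_wh(size, s, img_shape):
    """Relative (w, h) of one default box; img_shape = [h, w]."""
    w = size / img_shape[1] * math.sqrt(s)
    h = size / img_shape[0] / math.sqrt(s)
    return w, h

size = 0.2 * 512                              # conv4 reference box: 102.4
for img_shape in ([512, 512], [384, 512]):    # square and non-square inputs
    w, h = anchor_wh(size, 2, img_shape)
    W, H = w * img_shape[1], h * img_shape[0] # absolute side lengths
    # The true aspect ratio equals the configured one regardless of img_shape
    print(round(W / H, 6))  # 2.0
```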