Bounding-box regression
I have recently been reading papers on object detection, from R-CNN, Fast R-CNN, and Faster R-CNN to YOLO, R-FCN, SSD, and this year's newest CVPR paper, YOLO9000. The loss functions in all of these papers include a bounding-box regression term. Apart from R-CNN, which describes it in detail, the other papers either mention it only in passing or simply cite R-CNN and write the loss down. Plenty of explanations of the first three questions below can be found online; for the last two, I reached these conclusions only after reading many papers.
1. Why do we need bounding-box regression?
2. What is bounding-box regression?
3. How is bounding-box regression done?
4. Why are the width, height, and coordinate targets of bounding-box regression designed in this form?
5. Why can bounding-box regression only fine-tune a box, working only when the proposal is already close to the ground truth?

Why do we need bounding-box regression?
Here I borrow the explanation of my senior, Wang Bin, illustrated in the figure below:
In the figure above, the green box is the ground truth and the red box is a region proposal extracted by Selective Search. Even if the classifier recognizes the red box as an airplane, the box is poorly localized (IoU < 0.5), so the image effectively counts as one in which the airplane was not detected correctly. If we could fine-tune the red box so that the adjusted window lies closer to the ground truth, wouldn't the localization be more accurate? That is exactly what bounding-box regression does: it fine-tunes the window.

What is bounding-box regression?
Continuing with Wang Bin's explanation: a window is usually represented by a four-dimensional vector $(x, y, w, h)$, giving the window's center coordinates, width, and height. In Figure 2, the red box $P$ is the original proposal and the green box $G$ is the ground truth of the target. Our goal is to find a relationship that maps the input window $P$ to a regressed window $\hat{G}$ that is closer to the real window $G$.
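To make the $(x, y, w, h)$ representation and the IoU < 0.5 test from the airplane example concrete, here is a minimal Python sketch. It assumes continuous coordinates (no +1 pixel convention) and boxes stored as center x, center y, width, height; the helper names `to_corners` and `iou` are hypothetical, not taken from any paper's code.

```python
from typing import Tuple

# A box as (center_x, center_y, width, height), matching the (x, y, w, h) notation.
# Assumption: continuous coordinates, no +1 pixel convention.
Box = Tuple[float, float, float, float]


def to_corners(box: Box) -> Tuple[float, float, float, float]:
    """Convert (x, y, w, h) center format to (x1, y1, x2, y2) corner format."""
    x, y, w, h = box
    return x - w / 2, y - h / 2, x + w / 2, y + h / 2


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two center-format boxes."""
    ax1, ay1, ax2, ay2 = to_corners(a)
    bx1, by1, bx2, by2 = to_corners(b)
    # Overlap along each axis, clamped at 0 when the boxes do not intersect.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


# A proposal shifted by half its width relative to the ground truth.
ground_truth = (0.0, 0.0, 2.0, 2.0)
proposal = (1.0, 0.0, 2.0, 2.0)
print(iou(proposal, ground_truth))  # 0.333..., below the 0.5 threshold
```

Even though this proposal covers half of the object, its IoU of 1/3 means it would be counted as a miss, which is the situation bounding-box regression is meant to repair.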
In other words, the goal of bounding-box regression is: given $(P_x, P_y, P_w, P_h)$, find a mapping $f$ such that $f(P_x, P_y, P_w, P_h) = (\hat{G}_x, \hat{G}_y, \hat{G}_w, \hat{G}_h)$ and $(\hat{G}_x, \hat{G}_y, \hat{G}_w, \hat{G}_h) \approx (G_x, G_y, G_w, G_h)$.

How is bounding-box regression done?
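As a preview of where this section is heading, here is a minimal Python sketch of one concrete choice of $f$, assuming the parameterization used in the R-CNN paper: the center is translated by an offset proportional to the proposal's width and height, and the width and height are scaled by the exponential of a predicted value. The four numbers `d = (d_x, d_y, d_w, d_h)` stand in for what a trained regressor would output for $P$; the helper names `encode` and `decode` and the example boxes are hypothetical.

```python
import math
from typing import Tuple

Box = Tuple[float, float, float, float]     # (x, y, w, h): center, width, height
Deltas = Tuple[float, float, float, float]  # (d_x, d_y, d_w, d_h)


def decode(p: Box, d: Deltas) -> Box:
    """Apply f: map proposal P and predicted deltas d to the regressed box G-hat."""
    px, py, pw, ph = p
    dx, dy, dw, dh = d
    gx = pw * dx + px       # translate center x, scaled by proposal width
    gy = ph * dy + py       # translate center y, scaled by proposal height
    gw = pw * math.exp(dw)  # scale width; exp keeps it positive
    gh = ph * math.exp(dh)  # scale height; exp keeps it positive
    return gx, gy, gw, gh


def encode(p: Box, g: Box) -> Deltas:
    """Regression targets: the deltas for which decode(p, deltas) reproduces g."""
    px, py, pw, ph = p
    gx, gy, gw, gh = g
    return (gx - px) / pw, (gy - py) / ph, math.log(gw / pw), math.log(gh / ph)


proposal = (50.0, 40.0, 20.0, 10.0)
ground_truth = (54.0, 38.0, 30.0, 10.0)
targets = encode(proposal, ground_truth)
print(targets)                     # approximately (0.2, -0.2, 0.405, 0.0)
print(decode(proposal, targets))   # ground_truth, up to floating-point rounding
```

Dividing the center offsets by $P_w$ and $P_h$ and using log ratios for the size means that scaling both boxes by the same factor leaves the targets unchanged.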
So what transformation turns the window P in Figure 2 into a window