Ren, Shaoqing, et al. "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks." Advances in Neural Information Processing Systems. 2015.
Following R-CNN [1] and Fast R-CNN [2], this paper is another masterpiece from Ross Girshick's team, a leader of the object detection community, published in 2015. With the simple network, detection runs at 17 fps with 59.9% mAP on PASCAL VOC; with the complex network, it reaches 5 fps with 78.8% mAP.
The authors have released source code in both MATLAB and Python on GitHub.
In the previous section we introduced the Fast R-CNN network. Fast R-CNN seems almost perfect, but one problem remains: it still relies on Selective Search to extract candidate boxes, and this method is slow. When detecting an image, most of the time is often spent not on the neural network's classification computation, but on Selective Search box extraction! In the upgrade from Fast R-CNN to Faster R-CNN, Selective Search is replaced by a Region Proposal Network (RPN), which greatly improves speed and also yields more accurate results.
I. The Idea of Faster R-CNN
From R-CNN to Fast R-CNN, and now to Faster R-CNN in this paper, the four basic steps of object detection (candidate region generation, feature extraction, classification, and location refinement) are finally unified into a single deep network framework. No computation is duplicated, everything runs on the GPU, and the running speed is greatly improved.
Faster R-CNN can simply be regarded as a system of "Region Proposal Network (RPN) + Fast R-CNN", with the region proposal network replacing the Selective Search method used in Fast R-CNN; the network structure is shown in the figure. This paper focuses on solving three problems in this system:
- How to design a region proposal network
- How to train a region proposal network
- How to make the region proposal network and the Fast R-CNN network share the feature extraction network
The steps are as follows:
- First, feed an image of arbitrary size into the CNN network (ZF or VGG-16);
- Propagate forward through the CNN to the last shared convolution layer: on one branch, the resulting feature map is fed into the RPN network; on the other branch, propagation continues through further convolution layers to obtain a higher-dimensional feature map;
- The RPN network takes the shared feature map as input and produces region proposals and region scores; non-maximum suppression (with an IoU threshold of 0.7) is applied to the scores, and the top-N (300) scoring region proposals are passed to the RoI pooling layer;
- The high-dimensional feature map from step 2 and the region proposals from step 3 are fed together into the RoI pooling layer, which extracts the features of each proposed region;
- Finally, the proposal features from step 4 pass through fully connected layers, and the network outputs each region's classification score and its bounding box after regression.
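The non-maximum suppression mentioned in the steps above can be sketched in a few lines of NumPy. This is a minimal greedy NMS; the 0.7 IoU threshold and the top-N cutoff are the values quoted in this article, while the `[x1, y1, x2, y2]` box format and function name are assumptions for illustration:

```python
import numpy as np

def nms(boxes, scores, iou_threshold=0.7, top_n=300):
    """Greedy non-maximum suppression.

    boxes:  (N, 4) array of [x1, y1, x2, y2]
    scores: (N,) objectness scores
    Returns indices of the kept boxes, highest score first.
    """
    x1, y1, x2, y2 = boxes[:, 0], boxes[:, 1], boxes[:, 2], boxes[:, 3]
    areas = (x2 - x1) * (y2 - y1)
    order = scores.argsort()[::-1]          # indices sorted by descending score
    keep = []
    while order.size > 0 and len(keep) < top_n:
        i = order[0]                        # highest-scoring remaining box
        keep.append(int(i))
        # IoU of box i with every other remaining box
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])
        inter = np.maximum(0.0, xx2 - xx1) * np.maximum(0.0, yy2 - yy1)
        iou = inter / (areas[i] + areas[order[1:]] - inter)
        # drop the remaining boxes that overlap box i too much
        order = order[1:][iou <= iou_threshold]
    return keep
```

For example, two boxes that overlap almost entirely collapse to the higher-scoring one, while a far-away box survives.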
II. RPN in Detail
The basic idea is to judge all possible candidate boxes directly on the extracted feature map. Because location refinement follows in a later step, the candidate boxes can in fact be sparse.
1. Feature Extraction
RPN still needs a CNN network to extract features from the original image. For the reader's convenience, assume this backbone CNN extracts a feature map of size 51x39x256, that is, height 51, width 39, and 256 channels. A further convolution is computed on this feature map, keeping the width, height, and number of channels unchanged, so a 51x39x256 feature map is obtained again.
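Keeping width and height unchanged through a convolution follows from the standard output-size formula; for a 3x3 kernel this means stride 1 and padding 1. A quick check with the 51x39 figures used here:

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Standard convolution output-size formula: floor((size + 2p - k) / s) + 1."""
    return (size + 2 * padding - kernel) // stride + 1

# A 3x3 convolution with stride 1 and padding 1 preserves spatial size,
# so a 51x39x256 feature map stays 51x39 after it.
print(conv_out(51, 3, stride=1, padding=1))  # 51
print(conv_out(39, 3, stride=1, padding=1))  # 39
```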
To ease the narrative, first define the concept of a "position": for a 51x39x256 convolutional feature, there are a total of 51x39 "positions". Each "position" of the new convolutional feature is made "responsible" for detecting 9 boxes of different sizes at the corresponding location in the original image; the goal of detection is to determine whether each box contains an object, so there are 51x39x9 "boxes" in total. In the original Faster R-CNN paper, these boxes are collectively called "anchors".
2. Candidate Regions (Anchors)
The feature map can be seen as a 256-channel image of size 51x39. For each position of this image, consider 9 possible candidate windows: three areas, 128x128, 256x256, and 512x512, each at 3 aspect ratios, 2:1, 1:2, and 1:1. These candidate windows are called anchors. The figure shows the 51x39 anchor centers, as well as examples of the 9 anchors.
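The 3 scales x 3 ratios = 9 anchors per position can be generated with a short sketch. This is a simplified version that keeps each anchor's area at roughly scale x scale while varying the aspect ratio (the paper derives its anchors by rescaling a base box, so exact side lengths may differ slightly; the function name is an assumption):

```python
import numpy as np

def make_anchors(scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Generate the 9 anchors (one per scale/ratio pair) for a single
    position, centered at the origin, in [x1, y1, x2, y2] format.

    Each anchor keeps an area of roughly scale*scale while its
    height/width ratio takes the values 2:1, 1:1, and 1:2.
    """
    anchors = []
    for s in scales:
        for r in ratios:
            w = s / np.sqrt(r)   # ratio r = height / width
            h = s * np.sqrt(r)
            anchors.append([-w / 2, -h / 2, w / 2, h / 2])
    return np.array(anchors)

anchors = make_anchors()
print(anchors.shape)   # (9, 4) -- 3 scales x 3 ratios
# Replicated over a 51x39 feature map this gives 51 * 39 * 9 = 17901 boxes.
print(51 * 39 * 9)     # 17901
```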
For these 51x39 positions and 51x39x9 anchors, the figure shows the computation performed at each position. Let k be the number of anchors per position; here k = 9. The region proposal function is implemented by adding a 3x3 sliding-window operation followed by two convolution layers:
- The first convolution layer encodes each sliding-window position of the feature map into a feature vector.
- The second convolution layer outputs, for each sliding-window position, k region scores indicating the probability that the anchors at that position contain an object; since each anchor gets two outputs (probability of object / probability of no object), the total output length of this part is 2k. It also outputs k box regressions, with the same meaning as in Fast R-CNN; each anchor has 4 box-regression parameters, so the total output length of the box-regression part is 4k.
After non-maximum suppression on the region scores, the top-N (300) regions are passed on, telling the detection network which areas to pay attention to. In essence, this realizes the function of Selective Search, EdgeBoxes, and similar methods.
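The box regression mentioned above follows the same parameterization as Fast R-CNN: each anchor's 4 predicted offsets (dx, dy, dw, dh) shift the anchor center relative to its size and rescale its width and height through an exponential. A minimal decode step (box format and names are assumptions for illustration):

```python
import numpy as np

def decode_boxes(anchors, deltas):
    """Apply box-regression offsets (dx, dy, dw, dh) to anchors.

    anchors: (N, 4) boxes as [x1, y1, x2, y2]
    deltas:  (N, 4) predicted offsets, one 4-vector per anchor
    Returns the refined boxes, (N, 4) as [x1, y1, x2, y2].
    """
    w = anchors[:, 2] - anchors[:, 0]
    h = anchors[:, 3] - anchors[:, 1]
    cx = anchors[:, 0] + 0.5 * w
    cy = anchors[:, 1] + 0.5 * h

    dx, dy, dw, dh = deltas[:, 0], deltas[:, 1], deltas[:, 2], deltas[:, 3]
    # center offsets are relative to the anchor size; sizes scale by exp()
    pred_cx = cx + dx * w
    pred_cy = cy + dy * h
    pred_w = w * np.exp(dw)
    pred_h = h * np.exp(dh)

    return np.stack([pred_cx - 0.5 * pred_w, pred_cy - 0.5 * pred_h,
                     pred_cx + 0.5 * pred_w, pred_cy + 0.5 * pred_h], axis=1)

# With all-zero offsets, the anchor is returned unchanged.
a = np.array([[0.0, 0.0, 10.0, 10.0]])
print(decode_boxes(a, np.zeros((1, 4))))
```

For instance, a delta of dw = ln(2) doubles the box width around the same center, which is how the network can stretch an anchor to fit an object.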
Faster R-CNN uses the RPN to generate candidate boxes; the rest of the network structure is identical to Fast R-CNN. During training, two networks must be trained: the RPN network, and the classification network used after the boxes are obtained. The usual practice is alternating training: within a batch, train the RPN network once, then train the classification network once.
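The per-batch alternation described above can be sketched as a control-flow skeleton. Here `rpn_step` and `detector_step` are stand-ins for one optimizer update on the RPN head and on the classification head respectively; both names and the structure are illustrative assumptions, not the paper's exact 4-step alternating optimization:

```python
def rpn_step(batch):
    # ... forward shared backbone + RPN head, backprop RPN cls/reg loss ...
    return "rpn"

def detector_step(batch):
    # ... forward shared backbone + RoI pooling + detector head, backprop ...
    return "detector"

def train_alternating(batches):
    """For each batch, update the proposal network once, then the classifier once."""
    log = []
    for batch in batches:
        log.append(rpn_step(batch))       # train the RPN network once
        log.append(detector_step(batch))  # then train the classification network once
    return log

print(train_alternating(range(2)))  # ['rpn', 'detector', 'rpn', 'detector']
```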
Reference articles:
- Faster R-CNN paper explained
- Faster R-CNN analysis
- "Object Detection": the Faster R-CNN algorithm
- Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
- Object detection algorithms, part 30: the Faster R-CNN algorithm