Is Faster R-CNN Doing Well for Pedestrian Detection?
ECCV, Liliang Zhang & Kaiming He
Original link: http://arxiv.org/pdf/1607.07032v2.pdf
Abstract: Pedestrian detection is arguably regarded as a specific topic rather than a special case of general object detection. Although recent deep object detectors such as Fast/Faster R-CNN have shown strong performance on general object detection, they have had limited success on pedestrian detection. This paper investigates the problems of Faster R-CNN for pedestrian detection and finds that the Region Proposal Network (RPN) performs well as a stand-alone pedestrian detector, but the downstream classifier degrades the results. We suspect two possible reasons for this:
1. The resolution of the convolutional feature maps is too low for handling small objects;
2. There is no bootstrapping strategy for mining hard negative examples.
Based on these observations, we present a very simple but effective baseline that uses an RPN followed by boosted forests on shared, high-resolution convolutional feature maps. Experiments on four benchmark datasets show competitive accuracy and good speed.
Motivation: With the rise of autonomous driving and intelligent surveillance, pedestrian detection has received more and more attention. However, the current best-performing pedestrian detectors are generally hybrid methods that combine traditional hand-crafted features with deep convolutional features. Faster R-CNN, on the other hand, performs very well on general object detection using only deep convolutional features and no hand-crafted features, yet it is not effective on pedestrian detection datasets.
There may be two reasons for this:
First, convolutional feature maps have too low a resolution for detecting small objects. In typical pedestrian detection scenarios, such as autonomous driving and intelligent surveillance, pedestrians are usually small. For small objects, the RoI pooling layer applied to a low-resolution feature map can produce "plain" features that carry little information. Such features are not discriminative for small regions, so they degrade the downstream classifier. By comparison, hand-crafted features usually have a finer resolution. We address this by pooling features from shallower but higher-resolution layers and by using the hole ("à trous") algorithm to increase the feature map size.
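To make the resolution argument concrete, here is a small arithmetic check plus a 1-D illustration of the hole (à trous) trick. It is a sketch under stated assumptions, not the paper's code: a backbone whose last convolutional layer has stride 16 (as in VGG-16 conv5), an illustrative 50-pixel-tall pedestrian, and a 7×7 RoI pooling grid; the 1-D signal stands in for a 2-D feature map, and `cells_covered` / `atrous_conv1d` are hypothetical helper names.

```python
import numpy as np

def cells_covered(box_height_px: float, stride: int) -> float:
    """Feature-map cells spanned by a box of the given height at a given stride."""
    return box_height_px / stride

def atrous_conv1d(x: np.ndarray, kernel: np.ndarray, dilation: int = 1) -> np.ndarray:
    """'Valid' 1-D cross-correlation with (dilation - 1) holes between kernel taps."""
    span = (len(kernel) - 1) * dilation + 1  # input samples seen by one output
    out = np.empty(len(x) - span + 1)
    for i in range(len(out)):
        out[i] = np.dot(x[i:i + span:dilation], kernel)
    return out

# A 50-px pedestrian spans ~3 cells at stride 16, so a 7x7 RoI pool must repeat cells;
# halving the stride (e.g. via the hole trick) doubles the cells it spans.
print(cells_covered(50, 16), cells_covered(50, 8))  # 3.125 6.25

# A dilation-2 filter at full resolution reproduces the response of the same filter
# on the 2x-downsampled signal at every other position, and also fills the gaps.
x = np.sin(np.arange(16.0))
k = np.array([1.0, -2.0, 1.0])
full_res = atrous_conv1d(x, k, dilation=2)
low_res = atrous_conv1d(x[::2], k, dilation=1)
assert np.allclose(full_res[::2], low_res)
print(len(full_res), len(low_res))  # 12 6
```

The point of the hole trick is that the receptive field of the filter stays the same while the output keeps twice as many positions, which is what gives small pedestrians more distinct cells to pool from.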
Second, in pedestrian detection, false predictions are mainly caused by confusion with hard background instances. This differs from general object detection, where confusion mostly comes from multiple object categories. To handle these hard negatives, we use a cascaded Boosted Forest (BF), which performs effective hard negative mining (bootstrapping) and sample re-weighting, to classify the RPN proposals. Unlike previous methods that train forests on hand-crafted features, we train the BF on the RPN's convolutional features. This strategy not only reduces the computational cost of the classifier by sharing features, but also exploits deeply learned features.
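The bootstrapping and re-weighting idea can be sketched as follows. This is a simplified stand-in, not the authors' implementation: discrete AdaBoost over decision stumps replaces the paper's boosted forest, each stage simply retrains a larger ensemble on the accumulated negatives, and `n_stages`, `rounds_per_stage` and `n_neg` are hypothetical parameters. Each feature row plays the role of the pooled RPN feature of one proposal.

```python
import numpy as np

def fit_stump(X, y, w):
    """Weighted best decision stump: predict `sign` if X[:, f] > thr else -sign."""
    best = (np.inf, 0, 0.0, 1)
    for f in range(X.shape[1]):
        for thr in np.unique(X[:, f]):
            for sign in (1, -1):
                pred = np.where(X[:, f] > thr, sign, -sign)
                err = w[pred != y].sum()
                if err < best[0]:
                    best = (err, f, thr, sign)
    return best

def stump_predict(X, f, thr, sign):
    return np.where(X[:, f] > thr, sign, -sign)

def adaboost(X, y, n_rounds):
    """Discrete AdaBoost; labels y are +1 (pedestrian) / -1 (background)."""
    w = np.full(len(y), 1.0 / len(y))
    model = []
    for _ in range(n_rounds):
        err, f, thr, sign = fit_stump(X, y, w)
        err = float(np.clip(err, 1e-10, 1 - 1e-10))
        alpha = 0.5 * np.log((1 - err) / err)
        pred = stump_predict(X, f, thr, sign)
        w = w * np.exp(-alpha * y * pred)  # re-weighting: misclassified samples gain weight
        w /= w.sum()
        model.append((alpha, f, thr, sign))
    return model

def score(model, X):
    """Real-valued confidence; > 0 means 'pedestrian'."""
    return sum(a * stump_predict(X, f, thr, s) for a, f, thr, s in model)

def train_with_bootstrapping(X_pos, X_neg_pool, n_stages, rounds_per_stage, n_neg, rng):
    """Hypothetical schedule: each stage adds the n_neg hardest negatives from the pool."""
    neg_idx = rng.choice(len(X_neg_pool), size=n_neg, replace=False)  # stage 0: random
    model = []
    for stage in range(n_stages):
        X = np.vstack([X_pos, X_neg_pool[neg_idx]])
        y = np.concatenate([np.ones(len(X_pos)), -np.ones(len(neg_idx))])
        model = adaboost(X, y, rounds_per_stage * (stage + 1))
        # Hard negative mining: background samples the current model scores highest.
        hard = np.argsort(-score(model, X_neg_pool))[:n_neg]
        neg_idx = np.union1d(neg_idx, hard)
    return model
```

Mining negatives from the whole pool, instead of sampling them at random, is what concentrates later stages on the background patches that the current ensemble confuses with pedestrians.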
Based on these observations, this paper presents a very simple but effective baseline for pedestrian detection built on an RPN and a BF. The approach overcomes both of the difficulties above and does not rely on traditional hand-crafted features.
Methods: The method consists of two components: an RPN, which generates candidate boxes (proposals) together with convolutional feature maps, and a Boosted Forest, which classifies the proposals using these convolutional features.
1. RPN for Pedestrian Detection
This is essentially the same as the RPN in Faster R-CNN, so I will not explain it here.
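For readers unfamiliar with the RPN, a minimal sketch of the anchor layout it scores may still help. Every number here is an illustrative assumption rather than a value taken from this note: a single pedestrian-like width/height ratio of 0.41, nine heights growing by 1.3× from 40 px, and a feature stride of 16. `pedestrian_anchors` is a hypothetical helper name.

```python
import numpy as np

def pedestrian_anchors(feat_h, feat_w, stride=16, aspect=0.41,
                       base_h=40.0, scale_step=1.3, n_scales=9):
    """Return (feat_h * feat_w * n_scales, 4) anchors as (x1, y1, x2, y2) in image pixels.

    One width/height ratio (`aspect`) and `n_scales` heights growing geometrically
    from `base_h`; all defaults are illustrative assumptions.
    """
    heights = base_h * scale_step ** np.arange(n_scales)  # (K,)
    widths = aspect * heights                              # (K,)
    ys, xs = np.meshgrid(np.arange(feat_h), np.arange(feat_w), indexing="ij")
    cx = ((xs.ravel() + 0.5) * stride)[:, None]  # (N, 1) cell centres in pixels
    cy = ((ys.ravel() + 0.5) * stride)[:, None]
    x1 = cx - widths / 2   # broadcasts to (N, K)
    y1 = cy - heights / 2
    x2 = cx + widths / 2
    y2 = cy + heights / 2
    # (N, K, 4) -> (N * K, 4): all K anchors of cell 0 first, then cell 1, ...
    return np.stack([x1, y1, x2, y2], axis=-1).reshape(-1, 4)
```

Using one tall, narrow aspect ratio instead of the three ratios of generic Faster R-CNN reflects the assumption that upright pedestrians have a fairly stable shape, so anchor capacity is spent on scales instead.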
2. Feature Extraction
Given the proposals generated by the RPN, we use RoI pooling to extract fixed-length features from each region. These features are then used to train the BF.
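A minimal RoI max-pooling sketch shows why the extracted feature has a fixed length regardless of proposal size, and why a small proposal on a stride-16 map yields repetitive ("plain") bins. Assumptions: a single (C, H, W) feature map, a 7×7 output grid, and simple floor/ceil rounding in the spirit of Fast R-CNN's RoI pooling; the paper's exact choice of layers and rounding is not reproduced, and `roi_max_pool` is a hypothetical name.

```python
import numpy as np

def roi_max_pool(feat, box, stride, out_size=7):
    """Max-pool feat (C, H, W) inside box (x1, y1, x2, y2, image pixels) to (C, out, out)."""
    C, H, W = feat.shape
    # Project the box onto the feature map, keeping at least one cell per axis.
    x1 = min(max(int(np.floor(box[0] / stride)), 0), W - 1)
    y1 = min(max(int(np.floor(box[1] / stride)), 0), H - 1)
    x2 = min(max(int(np.ceil(box[2] / stride)), x1 + 1), W)
    y2 = min(max(int(np.ceil(box[3] / stride)), y1 + 1), H)
    ys = np.linspace(y1, y2, out_size + 1)  # bin edges (may be fractional)
    xs = np.linspace(x1, x2, out_size + 1)
    out = np.empty((C, out_size, out_size), dtype=feat.dtype)
    for i in range(out_size):
        r0 = int(np.floor(ys[i]))
        r1 = max(int(np.ceil(ys[i + 1])), r0 + 1)  # every bin covers >= 1 cell
        for j in range(out_size):
            c0 = int(np.floor(xs[j]))
            c1 = max(int(np.ceil(xs[j + 1])), c0 + 1)
            out[:, i, j] = feat[:, r0:r1, c0:c1].max(axis=(1, 2))
    return out

feat = np.arange(2 * 20 * 20, dtype=float).reshape(2, 20, 20)
small = roi_max_pool(feat, (0, 0, 48, 48), stride=16)   # spans only 3x3 cells
large = roi_max_pool(feat, (0, 0, 300, 300), stride=16)
# Both flatten to the same fixed length (2 * 7 * 7 = 98) for the classifier.
print(small.ravel().shape, large.ravel().shape)  # (98,) (98,)
```

For the small box, the 49 bins can take at most 9 distinct values per channel, which is exactly the redundancy that the higher-resolution feature maps are meant to reduce.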