Traditional methods:
Traditional object detection uses a sliding-window framework: the image is decomposed into millions of sub-windows at different positions and scales, and a classifier decides, for each sub-window, whether it contains the target object. Traditional methods generally design different features and classification algorithms for different categories of objects. For example:
>> Face detection: the classic algorithm is Haar features + AdaBoost classifier
>> Pedestrian detection: the classic algorithm is HOG (histogram of oriented gradients) + SVM (support vector machine)
>> General object detection: HOG + DPM (deformable part model)
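The sliding-window idea described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function names (`sliding_windows`, `detect`), the square window shape, the base size, the scale set, and the stride are all hypothetical choices, and `classify` stands in for any per-window classifier (for example Haar + AdaBoost or HOG + SVM).

```python
from typing import Callable, Iterator, List, Sequence, Tuple

Window = Tuple[int, int, int, int]  # (x, y, width, height)


def sliding_windows(
    img_w: int,
    img_h: int,
    base: int = 64,                       # assumed base window size in pixels
    scales: Sequence[float] = (1.0, 1.5, 2.0),
    stride_frac: float = 0.25,            # assumed stride as a fraction of window size
) -> Iterator[Window]:
    """Yield square sub-windows at several scales, scanning left-to-right, top-to-bottom."""
    for s in scales:
        size = int(round(base * s))
        if size > img_w or size > img_h:
            continue  # this scale does not fit inside the image
        stride = max(1, int(size * stride_frac))
        for y in range(0, img_h - size + 1, stride):
            for x in range(0, img_w - size + 1, stride):
                yield (x, y, size, size)


def detect(img_w: int, img_h: int, classify: Callable[[Window], bool]) -> List[Window]:
    """Keep every sub-window that the (hypothetical) classifier accepts."""
    return [w for w in sliding_windows(img_w, img_h) if classify(w)]
```

Even with this coarse stride the number of windows grows quickly with image size and the number of scales, which is why the classifier applied to each window has to be cheap.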
Deep Learning-based approach:
A set of candidate regions that are likely to contain objects is extracted first, and features are then extracted (with a CNN) only from these candidate regions. This family of algorithms is mainly the R-CNN series, including R-CNN, Fast R-CNN, Faster R-CNN, and so on. The approximate flow of the algorithm is:
1. Candidate regions: From the image to be detected, extract N regions of interest (RoIs), where N is much larger than the number of real objects in the image. Specific methods include Selective Search, EdgeBoxes, and the recently popular RPN (region proposal network).
2. Feature extraction: For each RoI found in step 1, extract features with a CNN.
3. Classification: Classify the features obtained in step 2. On the PASCAL VOC data, for example, this is a 21-class problem (20 object classes + background).
4. Location refinement: Bounding-box regression.
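Steps 3 and 4 can be illustrated with a small sketch. It assumes the box parameterization commonly used in the R-CNN series, where the regressor predicts offsets (tx, ty, tw, th) relative to the RoI's center and size; the function names and the `(x1, y1, x2, y2)` box layout are hypothetical choices made for this example, and the scores and offsets would come from a trained network in practice.

```python
import math
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def softmax(scores: Sequence[float]) -> List[float]:
    """Turn raw class scores (e.g. 21 values for PASCAL VOC) into probabilities."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def apply_deltas(roi: Box, deltas: Sequence[float]) -> Box:
    """Refine an RoI with predicted offsets (tx, ty, tw, th)."""
    x1, y1, x2, y2 = roi
    w, h = x2 - x1, y2 - y1
    cx, cy = x1 + 0.5 * w, y1 + 0.5 * h
    tx, ty, tw, th = deltas
    # Center shifts are scaled by the RoI size; size changes are in log space.
    new_cx = cx + tx * w
    new_cy = cy + ty * h
    new_w = w * math.exp(tw)
    new_h = h * math.exp(th)
    return (
        new_cx - 0.5 * new_w,
        new_cy - 0.5 * new_h,
        new_cx + 0.5 * new_w,
        new_cy + 0.5 * new_h,
    )
```

With all offsets equal to zero the RoI is returned unchanged, so the regressor only has to learn a small correction on top of each proposal.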
Other methods:
End-to-end object detection, such as YOLO (You Only Look Once) and SSD (Single Shot MultiBox Detector). These algorithms claim accuracy similar to Faster R-CNN at higher speed.
In the second half of 2016, Google published a paper that compares the performance of Faster R-CNN, R-FCN, and SSD in detail and is well worth reading: "Speed/accuracy trade-offs for modern convolutional object detectors".
The image above is from Google's paper. A rough conclusion is that the first class of frameworks (Faster R-CNN) is more accurate but slower, while the second class of frameworks (SSD) is slightly less accurate but faster.