With the rise of artificial intelligence, object detection algorithms play an increasingly important role across industries, and how to deploy them in practice is a serious question. Today I came across an expert's write-up on the subject and am studying it here.
This post sorts out the algorithms and history of the field to make follow-up study easier.
Chronologically, the algorithms fall into two kinds: traditional algorithms and CNN algorithms.
Traditional algorithms:
- Cascade classifier framework: Haar / LBP / integral HOG / ACF features + AdaBoost
The cascade classifier was first proposed by Paul Viola and Michael J. Jones at CVPR 2001.
It is essentially boosting: the process of assembling simple weak classifiers into a strong classifier. It looks crude today, but this algorithm made object detection practical for the first time!
OpenCV has a classic implementation of the cascade classifier: https://docs.opencv.org/2.4.11/modules/objdetect/doc/cascade_classification.html?highlight=haar
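To make the "weak classifiers assembled into a strong classifier" idea concrete, here is a minimal AdaBoost sketch in plain Python. It is not the Viola-Jones detector or the OpenCV implementation: the weak learners are simple threshold stumps on a single number rather than Haar features, there is no cascade of stages, and the names `train_adaboost` and `predict` are made up for illustration.

```python
import math

def stump(x, threshold, polarity):
    # Weak classifier: +1 on one side of the threshold, -1 on the other.
    return 1 if polarity * (x - threshold) >= 0 else -1

def train_adaboost(xs, ys, rounds):
    """xs: one numeric feature per sample; ys: labels in {-1, +1}."""
    n = len(xs)
    weights = [1.0 / n] * n
    ensemble = []  # list of (alpha, threshold, polarity)
    for _ in range(rounds):
        best = None
        # Pick the stump with the lowest weighted error.
        for threshold in sorted(set(xs)):
            for polarity in (1, -1):
                preds = [stump(x, threshold, polarity) for x in xs]
                err = sum(w for w, p, y in zip(weights, preds, ys) if p != y)
                if best is None or err < best[0]:
                    best = (err, threshold, polarity, preds)
        err, threshold, polarity, preds = best
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep the log finite
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, threshold, polarity))
        # Misclassified samples gain weight, so the next stump focuses on them.
        weights = [w * math.exp(-alpha * y * p)
                   for w, p, y in zip(weights, preds, ys)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def predict(ensemble, x):
    # Strong classifier: sign of the alpha-weighted vote of all stumps.
    score = sum(alpha * stump(x, t, p) for alpha, t, p in ensemble)
    return 1 if score >= 0 else -1
```

On the toy data `xs = [1, 2, 3, 4, 5, 6]`, `ys = [1, 1, -1, -1, 1, 1]`, no single stump can separate the classes, but three boosting rounds classify every training sample correctly.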
As for the features used, Haar is simple enough, and there is no need to dig into LBP ...
As for HOG/ACF, a few words are in order.
Histograms of Oriented Gradients for Human Detection, CVPR 2005
Because the original Haar features are too simple, they suit only rigid objects and cannot detect non-rigid targets such as pedestrians, so the HOG + SVM structure was proposed.
It is also implemented in OpenCV: https://docs.opencv.org/2.4.11/modules/gpu/doc/object_detection.html?highlight=hog
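As a rough illustration of what a HOG feature measures, the sketch below builds the orientation histogram for a single cell in plain Python. It is a simplification and not the paper's or OpenCV's implementation: it assigns each pixel to one bin instead of interpolating between bins, skips block normalization, and ignores the border pixels of the cell. The function name `cell_histogram` and the default of 9 bins over 0 to 180 degrees are choices made for this example.

```python
import math

def cell_histogram(cell, num_bins=9):
    """cell: 2-D list of grayscale values.
    Returns a histogram of unsigned gradient orientations (0-180 degrees),
    where each pixel votes with its gradient magnitude."""
    h, w = len(cell), len(cell[0])
    hist = [0.0] * num_bins
    bin_width = 180.0 / num_bins
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Centered differences approximate the image gradient.
            gx = cell[y][x + 1] - cell[y][x - 1]
            gy = cell[y + 1][x] - cell[y - 1][x]
            magnitude = math.hypot(gx, gy)
            # Unsigned orientation: opposite directions share a bin.
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[int(angle // bin_width) % num_bins] += magnitude
    return hist
```

A cell whose brightness increases only from left to right puts all of its votes in the first bin; a cell whose brightness increases only from top to bottom puts them in the bin that contains 90 degrees. A full descriptor concatenates such histograms over many cells and feeds them to the SVM.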
Later came a series of tweaked features such as LoG/DoG/RoG; there is no point in saying more about them.
It is worth mentioning that some people replaced the HOG used with the SVM by an integral HOG for cascade classifiers. This is the prototype of the integral HOG in the current OpenCV cascade classifier:
Integral Histogram: A Fast Way to Extract Histograms in Cartesian Spaces
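The idea behind an integral histogram can be sketched in a few lines of plain Python: precompute, for every pixel position, the histogram of everything above and to the left of it, after which the histogram of any rectangle costs four lookups per bin. This is an illustrative sketch rather than the paper's or OpenCV's code; the names `integral_histogram` and `region_histogram` are hypothetical, and each pixel simply counts as one vote for a pre-quantized bin (a real integral HOG would vote with gradient magnitude).

```python
def integral_histogram(bins, num_bins):
    """bins: 2-D list of ints in [0, num_bins), e.g. a quantized
    gradient orientation per pixel.
    Returns ih where ih[y][x][b] counts bin b over rows < y and cols < x."""
    h, w = len(bins), len(bins[0])
    ih = [[[0] * num_bins for _ in range(w + 1)] for _ in range(h + 1)]
    for y in range(h):
        for x in range(w):
            for b in range(num_bins):
                ih[y + 1][x + 1][b] = (ih[y][x + 1][b] + ih[y + 1][x][b]
                                       - ih[y][x][b])
            ih[y + 1][x + 1][bins[y][x]] += 1
    return ih

def region_histogram(ih, top, left, bottom, right):
    """Histogram of rows [top, bottom) and cols [left, right),
    using four table lookups per bin regardless of region size."""
    num_bins = len(ih[0][0])
    return [ih[bottom][right][b] - ih[top][right][b]
            - ih[bottom][left][b] + ih[top][left][b]
            for b in range(num_bins)]
```

This is why the integral form suits a cascade classifier: a sliding window evaluates many overlapping rectangles, and the per-rectangle cost no longer depends on the rectangle's area.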
Later, Aggregate Channel Features (ACF) and related features were developed, mainly in the following two papers:
Aggregate Channel Features for Multi-view Face Detection, IJCB 2014
Fast Feature Pyramids for Object Detection, PAMI 2014
The highlight is the "fast": they speed up the computation of the integral HOG, give good results at high speed, and are still active in the embedded field.
- Discriminatively trained deformable part models (DPM)
Project homepage: http://www.rossgirshick.info/latent/
DPM uses a spring model for object detection: in effect multi-scale, multi-part detection, with FHOG as the underlying image feature. In any case, it caused a sensation at the time.
The follow-ups DPM+/DPM++ are not significant enough to discuss.
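The spring model mentioned above can be illustrated with a toy scoring function: each part may move away from its ideal (anchor) position to find a stronger filter response, but pays a deformation cost that grows with the displacement, like stretching a spring. The sketch below is a simplification under stated assumptions, not the released DPM code: the deformation cost is a single weight times the squared distance, the search is brute force rather than a distance transform, and the names `best_part_placement` and `dpm_score` are hypothetical.

```python
def best_part_placement(response, anchor, deform_weight):
    """response: 2-D list of part-filter scores at each location.
    anchor: (row, col) where the part ideally sits.
    Returns (score, (row, col)) maximizing response minus spring cost."""
    best_score, best_pos = float("-inf"), None
    for r, row in enumerate(response):
        for c, value in enumerate(row):
            # Spring cost: quadratic in the displacement from the anchor.
            cost = deform_weight * ((r - anchor[0]) ** 2 + (c - anchor[1]) ** 2)
            if value - cost > best_score:
                best_score, best_pos = value - cost, (r, c)
    return best_score, best_pos

def dpm_score(root_score, parts):
    """parts: list of (response, anchor, deform_weight) tuples.
    The detection score is the root score plus each part's best placement."""
    total, placements = root_score, []
    for response, anchor, deform_weight in parts:
        score, pos = best_part_placement(response, anchor, deform_weight)
        total += score
        placements.append(pos)
    return total, placements
```

With a soft spring (small `deform_weight`) a part drifts to wherever its response is highest; with a stiff spring it stays near its anchor even if a stronger response exists nearby.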
- Template matching: a technique that finds the part of an image that best matches (i.e. is most similar to) a given template image. For a reference implementation, see: https://www.cnblogs.com/skyfsm/p/6884253.html
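A minimal sketch of template matching in plain Python: slide the template over every position of the image and keep the position with the smallest sum of squared differences (SSD). The function name `match_template` is made up for this example, and SSD is only one of several possible similarity measures; this is not the implementation from the linked article.

```python
def match_template(image, template):
    """image, template: 2-D lists of grayscale values.
    Returns ((row, col), ssd) for the top-left corner of the best match."""
    ih, iw = len(image), len(image[0])
    th, tw = len(template), len(template[0])
    best_ssd, best_pos = float("inf"), None
    for r in range(ih - th + 1):
        for c in range(iw - tw + 1):
            # Sum of squared differences between the template and this window.
            ssd = sum((image[r + i][c + j] - template[i][j]) ** 2
                      for i in range(th) for j in range(tw))
            if ssd < best_ssd:
                best_ssd, best_pos = ssd, (r, c)
    return best_pos, best_ssd
```

In practice one would normally call a library routine instead; OpenCV's `cv2.matchTemplate` with the `cv2.TM_SQDIFF` method computes the same kind of score far more efficiently.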
Author: Zhihu user
Link: https://www.zhihu.com/question/53438706/answer/148973444
Source: Zhihu
Copyright belongs to the author. For commercial reprints, please contact the author for authorization; for non-commercial reprints, please cite the source.
CNN methods:
- Based on region proposals (two-stage): the R-CNN family, including Faster R-CNN / Mask R-CNN / R-FCN
DPM had been hot for less than two years when the R-CNN family appeared, and at last there was no more need to detect with endlessly tweaked HOG features!
The most representative member of the R-CNN family is Faster R-CNN. It first generates region proposals with an RPN and then runs classification + regression on those proposals, hence "two-stage".
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
Mask R-CNN
In fact, for R-CNN-style detection it is enough to follow one person: Kaiming He (FAIR).
- Based on regression (one-shot): YOLO / YOLOv2 / SSD / DSSD
YOLO and SSD perform classification + regression at the same time as they generate proposals, completing everything in one pass: the so-called one-shot. Compared with two-stage methods they have a speed advantage, but precision/recall is slightly lower.
SSD: Single Shot MultiBox Detector
As for YOLO, there are currently YOLOv1, YOLO9000 (v2), and YOLOv3.
YOLO project homepage (including papers)
In addition, I think the later DSSD and YOLOv2/v3 really do not differ much; they feel about the same.
This also suggests that detection is approaching a bottleneck: without an algorithmic breakthrough, it is hard to gain tens of points at a stroke as before.
- Specialized text-sequence detection: CTPN (LSTM + R-CNN) / SegLink (a modified SSD)
Besides detection in the general sense, there is a class of text detection used to locate text before OCR. It differs a little from general detection. Two methods currently work well: CTPN and SegLink.
Faster R-CNN lineage: CTPN, for horizontal or vertical text detection
Detecting Text in Natural Image with Connectionist Text Proposal Network, ECCV 2016
Code: tianzhi0549/CTPN
SSD lineage: SegLink, for tilted (oriented) text detection
Detecting Oriented Text in Natural Images by Linking Segments, CVPR 2017
Code: https://github.com/dengdan/seglink
Of course, text detection also has traditional algorithms, such as this one that ships with OpenCV:
Real-Time Scene Text Localization and Recognition, CVPR 2012
But tinkering with it is not recommended; there is no need.
Summary:
The advantage of the traditional methods is speed: even on embedded platforms they can run in real time at high frame rates. The disadvantage is that precision/recall are far from ideal; put simply, the results are poor.
The advantage of the CNN methods is much better precision/recall; the corresponding disadvantage is lower speed.
At present, traditional algorithms still have some room on embedded devices, though they are being squeezed by MobileNet and the like; on the server side, it is entirely the world of deep networks.
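Since the comparison above is stated in terms of precision and recall, here is a small sketch of how those two numbers are typically computed for a detector. The matching rule is an assumption made for this example, not something the sources above specify: a detection counts as correct when its intersection-over-union (IoU) with a still-unmatched ground-truth box is at least 0.5, and matching is greedy in the order the detections are given. The function names are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def precision_recall(detections, ground_truth, iou_threshold=0.5):
    """precision = TP / (TP + FP), recall = TP / (TP + FN)."""
    unmatched = list(ground_truth)
    tp = 0
    for det in detections:
        # Greedy matching: take the best-overlapping unmatched ground truth.
        best = max(unmatched, key=lambda gt: iou(det, gt), default=None)
        if best is not None and iou(det, best) >= iou_threshold:
            unmatched.remove(best)
            tp += 1
    fp = len(detections) - tp  # detections that matched nothing
    fn = len(unmatched)        # ground-truth boxes that were missed
    precision = tp / (tp + fp) if detections else 0.0
    recall = tp / (tp + fn) if ground_truth else 0.0
    return precision, recall
```

For example, with two ground-truth objects and two detections of which only one overlaps an object well, both precision and recall come out to 0.5: half of the detections are right, and half of the objects are found.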
Reference Documentation:
1 https://www.zhihu.com/question/53438706
2 https://zhuanlan.zhihu.com/ML-Algorithm
The history and classification of object detection algorithms