Object Detection, Part 1 (Traditional Algorithms and Deep Learning)
This series of posts is about object detection and will cover both traditional algorithms and deep learning methods. The focus is on experiments rather than theory; for the theory, please refer to the papers. The implementations rely mainly on OpenCV.
1. What is object detection? A brief history
I have recently been working on object detection, which is one of the most important problems in computer vision. Detection and recognition matter in many applications; autonomous driving, for example, is very popular right now and depends heavily on object detection and recognition, demanding very high detection and localization accuracy.
Research on object detection started quite early.
Typical representatives of traditional algorithms are:
Haar features + AdaBoost
HOG features + SVM
DPM
Typical representatives of deep-learning-based object detection are:
The R-CNN series: R-CNN, SPP-Net, Fast R-CNN, Faster R-CNN
The YOLO series: YOLO and YOLO9000
SSD
Later came the deep residual network ResNet, followed by R-FCN and, more recently, Mask R-CNN and others; the detection results keep getting better and more accurate.
2. Detection with Haar features + AdaBoost
As the first installment of this series, let's start with something simple: Haar features + the AdaBoost algorithm. The principle is straightforward and there are plenty of tutorials on the web, so I will only summarize it briefly.
Haar features include the following types (edge, line, and center-surround rectangle templates):
AdaBoost is an ensemble learning algorithm from machine learning. Put simply, it combines weak classifiers (classifiers with poor accuracy, but still better than chance, i.e. above 0.5) into a strong classifier (with strong classification ability) through a weighted combination. During training it concentrates on the samples misclassified in earlier rounds; concretely, it does this by increasing the weights of those samples.
The detection target in this experiment is vehicles.
Object detection with the Haar + AdaBoost algorithm consists of three steps:
1. Creating and labeling the samples
2. Training the classifier
3. Using the trained classifier for detection
1. Creating and labeling the samples
Collecting samples yourself is painful and tedious, so it is best to look for public datasets online. After all, with competitions like ImageNet and autonomous driving being so popular, there are plenty of open datasets.
Here are a few links to vehicle-detection-related datasets:
http://www.gti.ssr.upm.es/data/Vehicle_database.html
http://www.cvlibs.net/datasets/kitti/raw_data.php?type=city
Positive samples are pictures containing the target object. Use an image labeling tool; a web search turns up many. The annotation format is: image name + number of targets + a rectangle for each target (top-left corner coordinates plus width and height).
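As an illustration (the file names and coordinates below are made up), an annotation file in this format might look like this, with each line holding the image path, the number of targets, and then x y width height for each target:

```
positive/car_0001.jpg 1 140 100 45 45
positive/car_0002.jpg 2 100 200 50 50 60 30 25 25
```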
Next, a .vec description file is created from the positive samples.
With the target annotations recorded in the TXT file, open a cmd window and run createsamples.exe -info positive/info.txt -vec data/vector.vec -num 500 -w 24 -h 24. Of course, you can also run this from a .bat file. Here -num 500 is the number of positive sample images, and -w and -h give the size the images are resized to; set them according to your actual situation. Running this generates vector.vec, the vector description file. You do not need to open it to inspect its contents; in fact, opening it is useless, because it is binary gibberish that requires specialized software to read. We'll use this file later.
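The .bat file mentioned above could simply hold the same command (the paths assume the folder layout used in this post):

```
createsamples.exe -info positive/info.txt -vec data/vector.vec -num 500 -w 24 -h 24
pause
```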
Once this is done you are actually halfway there; making the samples is the troublesome part. Now let's see how to produce the negative samples. Very simple: prepare about 1500 pictures that do not contain cars (more is fine too).
Production of negative samples
Then, in a cmd window at that path, run dir /b *.jpg > neg_name.txt. This generates a neg_name.txt file containing the names of all JPG files in the current path.
Well, the positive and negative samples are finished and training can start. We train using OpenCV's opencv_haartraining.exe (found under the bin directory of the OpenCV installation directory).
The parameters look numerous and a little complicated. Don't worry about it: they are easy to look up online, and many have default values. The command I used for training was:
opencv_haartraining.exe -data data/cascade -vec data/vector.vec -bg negative/neg_name.txt -npos 500 -nneg 1500 -nstages 20 -mem 4000 -w 24 -h 24
In order, these are: the executable name; where to save the trained XML classifier file; the positive sample description .vec file; the negative sample file list; the numbers of positive and negative samples; -nstages, the number of training stages; -mem, the memory to allocate in MB; and the image resize dimensions.
Training screenshot
This training process is very slow; it may take 10-20 hours or more, depending on the computer configuration and the amount of data. Mine trained for more than 20 hours. You can leave it running until it finishes, but you can also interrupt training at any time and resume at any time: it will continue where it left off rather than start all over again.
After the long wait, training completes and produces the XML classifier file, which can then be used for vehicle detection through the OpenCV interface. I use the detectMultiScale function for detection, the same as in face detection, and then output the rectangles. I'll post the detection part of the source code directly; the other parts use OpenCV as-is.
#include <opencv2/opencv.hpp>
#include <opencv2/core/core.hpp>
#include <opencv2/highgui/highgui.hpp>
#include <iostream>
using namespace std;
using namespace cv;

int main()
{
    string xmlpath = "Car_model.xml";  // the trained classifier XML file
    CascadeClassifier ccf;             // create a classifier object
    Mat img;
    if (!ccf.load(xmlpath))            // load the trained classifier
    {
        cout << "cannot load specified XML file" << endl;
        return 0;
    }
    namedWindow("Car");
    bool stop = false;

    // To grab camera images instead: VideoCapture cap(0);
    // Read the picture sequence from a folder
    string img_path = "data";          // put the pictures in the "data" folder under the current directory; any file names will do
    vector<String> vec_img;
    glob(img_path, vec_img);
    if (vec_img.empty())
    {
        cout << "there are no pictures." << endl;
        return -1;
    }

    // Write a video file: I process a picture sequence, so I save the detection results as a video for easy review
    VideoWriter writer;
    string video_name = "Car1.avi";
    Mat temp = imread(vec_img[0]);
    int frame_fps = 15;                // frame rate
    writer = VideoWriter(video_name, CV_FOURCC('X', 'V', 'I', 'D'), frame_fps, Size(temp.cols, temp.rows));

    int64 start = getTickCount();
    for (size_t i = 0; i < vec_img.size() && !stop; ++i)
    {
        img = imread(vec_img[i]);
        // For a camera: if (!cap.read(img)) break;
        vector<Rect> cars;             // container to hold the detected vehicles
        Mat gray;
        cvtColor(img, gray, CV_BGR2GRAY);  // convert to grayscale, since Haar features are extracted from the grayscale image
        // equalizeHist(gray, gray);       // optional histogram equalization
        ccf.detectMultiScale(gray, cars, 1.1, 3, 0, Size(10, 10), Size(100, 100));  // detect vehicles
        // cout << cars.size() << endl;
        for (vector<Rect>::const_iterator iter = cars.begin(); iter != cars.end(); ++iter)
            rectangle(img, *iter, Scalar(0, 0, 255), 2, 8);  // draw the rectangles
        imshow("Car", img);
        writer.write(img);
        if (waitKey(2) == 'q')         // press q to exit
            stop = true;
    }
    cout << (getTickCount() - start) / getTickFrequency() << endl;  // compute the run time
    return 0;
}
Test effect screenshots: for simple scenes the detection works well and is very fast (dozens of frames per second), with almost every vehicle detected correctly. For more complex scenes, however, the detection is very poor. That is when it's time to look at deep learning.
Test scenario 1: a highway with few targets; 1700 sequential images (320x240) take only about 25 s.
This is a highway surveillance-camera scene. As the two images above show, distant vehicles are almost never detected; once a vehicle comes close to the camera, it is detected accurately.
In this simple scene the detection works well, and almost every vehicle in the scene is detected accurately.
Test scenario 2: city streets. Because of the complex lighting and scene, the detection is very poor; hardly anything is detected.
By comparison, the deep learning method detects almost everything correctly. I currently use YOLO for vehicle detection; both the speed and accuracy are fairly good, making it suitable for real-time video detection. The same 1700-frame sequence (320x240) takes around 600 s, roughly 3 frames per second, which is a bit slow.
Note: because there are many vehicles, I removed the target labels and probabilities, keeping only the detection boxes.
As the pictures above show, the overall vehicle detection is good, but in the third picture there is a false detection: the road is recognized as a train. In the fourth image, the targets are so dense that the boxes are not localized accurately.
In more complex scenarios the deep learning approach shows a clear advantage: even targets that the lighting makes hard to detect are handled almost perfectly by YOLO.
For object detection with YOLO, see the second article in this series. Questions are welcome in the comment section or by mail. I am also a beginner, so if anything is wrong, please do not hesitate to correct me.
References
[1] Viola P, Jones M. Rapid object detection using a boosted cascade of simple features. CVPR 2001: I-511-I-518.
[2] Lienhart R, Maydt J. An extended set of Haar-like features for rapid object detection. ICIP 2002: I-900-I-903.
[3] http://blog.csdn.net/zhuangxiaobin/article/details/25476833
[4] Redmon J, Farhadi A. YOLO9000: Better, Faster, Stronger. 2016.