Object detection using HOG+SVM (histogram of oriented gradients and support vector machine)

Source: Internet
Author: User

I recently used HOG+SVM to write a small object detection program. Look at the experimental result first: in the photo, the doll is detected wherever it is placed and whatever pose it is in. (The plan was actually to detect only the big red doll, but the small doll was detected as well. Why this happens and how to fix it is discussed below.)


There are already plenty of online tutorials and books on HOG and SVM, but few articles or blog posts that a beginner, or someone who doesn't know the subject well, can actually follow. I read a lot of blogs and articles and then worked through a program before I roughly understood how the whole thing runs. So this post focuses on filling the gaps: I pick out the details that other articles tend to skip, which are exactly the parts beginners find hard, and walk through them from start to finish.

First, HOG. The name stands for histogram of oriented gradients (the Chinese name translates literally as "gradient direction histogram"). There are many tutorials online; here is a good one to share: HOG tutorial

The tutorial is very detailed and contains some mathematical formulas, so at first it will probably leave you bewildered. That doesn't matter: at the start you only need to understand the key points, and the remaining details can be worked out later.

In short, HOG is used with a sliding window that moves over the image from left to right and from top to bottom. At every position, the HOG information of the window is computed. The key point to understand is that this HOG information is just a 1*n matrix, in other words a vector, and this vector represents the content of the window.
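To make the "1*n vector" concrete, here is a minimal sketch of how n follows from the window layout. The 64*64 window is the one used in this article; the 16*16 block, 8*8 block stride, 8*8 cell and 9 bins are the values that appear in the comments of the code at the end. The helper name hogDescriptorLength is hypothetical, and the sketch assumes square windows, blocks and cells whose sizes divide evenly.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: length n of the HOG vector for one square detection window.
// Assumes square window/block/cell and sizes that divide evenly.
std::size_t hogDescriptorLength(int window, int block, int blockStride, int cell, int bins) {
    assert(window >= block && blockStride > 0 && cell > 0 && bins > 0);
    assert((window - block) % blockStride == 0 && block % cell == 0);
    const int blocksPerSide = (window - block) / blockStride + 1; // block positions along one side
    const int cellsPerSide = block / cell;                        // cells along one side of a block
    // Every block contributes one histogram of `bins` values per cell.
    return static_cast<std::size_t>(blocksPerSide) * blocksPerSide
         * cellsPerSide * cellsPerSide * bins;
}
```

With these values, hogDescriptorLength(64, 16, 8, 8, 9) counts 7 * 7 block positions, 4 cells per block and 9 bins per cell, so each window is described by a vector of 1764 numbers.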

For object detection, the HOG vector of a window that contains the object I want to detect should differ from that of a window that does not. But how do you tell the difference when the vector may well have hundreds or even thousands of dimensions? This is where an SVM is a good choice.

Next, SVM, short for support vector machine. A specific tutorial can be found here: SVM tutorial

Again there are a lot of mathematical formulas, and again it doesn't matter if you don't understand them. An SVM is a binary classifier: it sorts vectors into one of two classes.

Put simply, you give it a bunch of vectors and tell it they are all type A, give it another bunch and tell it those are type B, and then let the SVM train itself. Training produces a decision rule: give it a new vector and it tells you whether that vector is type A or type B.
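As an illustration of what "telling A from B" looks like once training is done, here is a minimal sketch of the prediction step only, assuming a linear SVM (the article does not say which kernel is used). The weights w and offset b would come out of training, which is not shown; the function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Decision value of a linear classifier: w . x + b.
double decisionValue(const std::vector<double>& w, double b, const std::vector<double>& x) {
    assert(w.size() == x.size());
    double sum = b;
    for (std::size_t i = 0; i < w.size(); ++i) sum += w[i] * x[i];
    return sum;
}

// Returns +1 (type A) when the vector lies on the non-negative side of the
// separating plane, -1 (type B) otherwise.
int classify(const std::vector<double>& w, double b, const std::vector<double>& x) {
    return decisionValue(w, b, x) >= 0.0 ? 1 : -1;
}
```

However many dimensions the HOG vector has, the answer is still a single number compared against zero, which is why the approach copes with vectors of hundreds or thousands of dimensions.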

With that, the principle of detecting objects with HOG+SVM is basically clear. The detailed process is described below, using doll detection as the example:

Before anything else, fix the size of the HOG detection window. Here a 64*64-pixel box is used as the detection window.

1. To train the SVM well, prepare a large number of positive and negative samples first. I collected pictures of the doll in all kinds of poses, more than 3,000 in total. These are the positive samples, and each is 64*64 pixels, as shown in the figure:



2. Negative samples are needed as well: photos taken at random on the site without the doll in them. They must also be 64*64, and none of them may contain the doll. There are about 150,000 negative sample pictures in total. (In fact I took a shortcut: I shot a lot of photos with a high-resolution camera and cropped 64*64-pixel blocks out of them as negative samples. An ordinary photo yields hundreds of negative samples this way, which makes collecting data much more convenient, as shown:)


3. Then the positive and negative samples are fed into the SVM for training, after which it can tell whether a window is a positive or a negative sample. (The amount of data looks large, well over a hundred thousand pictures, but the speed is actually fine: on an ordinary computer, training took about 10 minutes. This is only an example, though; for real use this amount of training data is not enough.)

4. Next, detect the objects in a picture. The 64*64 window slides over the image; at every position its HOG feature is computed and passed to the trained SVM. If the window contains the object we are looking for, the SVM should, in theory, classify its HOG feature as a positive sample, and the object is detected.

One detail here concerns the size of objects in the picture: how can a fixed-size window detect objects of different sizes? The answer is that the sliding window never changes size, but the picture itself is scaled up and down. In practice the window is always 64*64 while the image is resized to several resolutions and scanned at each one, so objects at different positions and of different sizes can all be found.

5. Because the sliding window moves in small steps, one object is likely to be detected many times, leaving a dense cluster of boxes around it. The boxes that belong to the same object therefore have to be merged into one, mainly according to the distance between the boxes and their sizes. After the final merge, each object is left with a single box (see the result image).
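Steps 4 and 5 can be sketched roughly as follows. This is only an illustration under stated assumptions, not the article's implementation: the pyramid step, the stride, the tolerance eps and all function names are hypothetical, and the merging rule (boxes are grouped when their position and size differ by less than a fraction of their side length) is a simple stand-in for "merge by distance and size".

```cpp
#include <algorithm>
#include <cassert>
#include <cstdlib>
#include <vector>

struct Box { int x, y, w, h; };

// Scale factors at which the image is scanned: the image shrinks by `step`
// per level until the fixed-size window no longer fits.
std::vector<double> pyramidScales(int imgW, int imgH, int window, double step) {
    assert(window > 0 && step > 1.0);
    std::vector<double> scales;
    for (double s = 1.0; imgW * s >= window && imgH * s >= window; s /= step)
        scales.push_back(s);
    return scales;
}

// Number of window positions visited in an image of the given size.
long windowCount(int imgW, int imgH, int window, int stride) {
    assert(stride > 0);
    if (imgW < window || imgH < window) return 0;
    return static_cast<long>((imgW - window) / stride + 1) * ((imgH - window) / stride + 1);
}

// A hit at (x, y) in the image resized by `scale` covers this box in the original image.
Box toOriginal(int x, int y, int window, double scale) {
    return { static_cast<int>(x / scale), static_cast<int>(y / scale),
             static_cast<int>(window / scale), static_cast<int>(window / scale) };
}

// Two boxes count as the same object when position and size differ by at most
// `eps` times their (smaller) side length.
bool similar(const Box& a, const Box& b, double eps) {
    const double delta = eps * 0.5 * (std::min(a.w, b.w) + std::min(a.h, b.h));
    return std::abs(a.x - b.x) <= delta && std::abs(a.y - b.y) <= delta &&
           std::abs(a.w - b.w) <= delta && std::abs(a.h - b.h) <= delta;
}

// Greedy merge: each box joins the first cluster whose first member is similar
// to it; every cluster is then replaced by the average of its members.
std::vector<Box> mergeBoxes(const std::vector<Box>& boxes, double eps) {
    struct Cluster { Box first; long sx, sy, sw, sh; int n; };
    std::vector<Cluster> clusters;
    for (const Box& b : boxes) {
        auto it = std::find_if(clusters.begin(), clusters.end(),
                               [&](const Cluster& c) { return similar(c.first, b, eps); });
        if (it == clusters.end()) {
            clusters.push_back({b, b.x, b.y, b.w, b.h, 1});
        } else {
            it->sx += b.x; it->sy += b.y; it->sw += b.w; it->sh += b.h; ++it->n;
        }
    }
    std::vector<Box> merged;
    for (const Cluster& c : clusters)
        merged.push_back({ static_cast<int>(c.sx / c.n), static_cast<int>(c.sy / c.n),
                           static_cast<int>(c.sw / c.n), static_cast<int>(c.sh / c.n) });
    return merged;
}
```

For a 256*256 image and a step of 2, pyramidScales(256, 256, 64, 2.0) yields three levels (scales 1, 0.5 and 0.25), and a hit at (8, 16) on the half-size level maps back to a 128*128 box at (16, 32) in the original picture. This is how a fixed 64*64 window ends up reporting boxes of different sizes.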


Now to analyze the results:

During training, the big red doll was the positive sample and random scene pictures of the site without any doll were the negative samples. Ideally, only the designated red doll should be detected. In practice, both the big red doll and the small blue doll were detected (and sometimes a doll is "detected" at a spot where there is none). This means the HOG features of the big red doll are somewhat similar to those of the blue doll. To avoid this, train again: add pictures of the blue doll to the training set as negative samples, do the same for any other object that was wrongly detected, and manually mark the places where the doll was present but missed and add them as positive samples. This process can be repeated several times. Repeated training with corrected positive and negative samples can greatly improve detection accuracy.
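The bookkeeping behind one such retraining round can be sketched as below. This is an illustration under assumptions: the boxes marked by hand are given as `labelled`, a detection counts as correct when it overlaps a labelled box by at least `minOverlap` (intersection over union, a criterion chosen here for the sketch), and all names are hypothetical. Cropping the returned regions to 64*64 samples and retraining the SVM are not shown.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct Box { int x, y, w, h; };

// Intersection-over-union of two boxes, in [0, 1].
double overlapRatio(const Box& a, const Box& b) {
    const int iw = std::min(a.x + a.w, b.x + b.w) - std::max(a.x, b.x);
    const int ih = std::min(a.y + a.h, b.y + b.h) - std::max(a.y, b.y);
    if (iw <= 0 || ih <= 0) return 0.0;
    const double inter = static_cast<double>(iw) * ih;
    const double uni = static_cast<double>(a.w) * a.h + static_cast<double>(b.w) * b.h - inter;
    return inter / uni;
}

struct Corrections {
    std::vector<Box> newNegatives; // wrong detections (e.g. the blue doll): add as negative samples
    std::vector<Box> newPositives; // labelled dolls that were missed: add as positive samples
};

// Compares the detector's output with the hand-labelled boxes of one image.
Corrections collectCorrections(const std::vector<Box>& detections,
                               const std::vector<Box>& labelled, double minOverlap) {
    assert(minOverlap > 0.0 && minOverlap <= 1.0);
    auto matches = [minOverlap](const Box& a, const std::vector<Box>& others) {
        return std::any_of(others.begin(), others.end(),
                           [&](const Box& o) { return overlapRatio(a, o) >= minOverlap; });
    };
    Corrections c;
    for (const Box& d : detections)
        if (!matches(d, labelled)) c.newNegatives.push_back(d);
    for (const Box& t : labelled)
        if (!matches(t, detections)) c.newPositives.push_back(t);
    return c;
}
```

Running this over a set of test pictures after each training round gives exactly the two lists described above: false alarms to feed back as negatives and misses to feed back as positives.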

Finally, the code is attached. (The commented-out part at the top is the code that trains the SVM; the part below it uses the trained SVM model for object detection.)








//#include <iostream>
//using namespace std;
//int main()
//{
//    cout<<"Hello World"<<endl;
//    return 0;
//}
//#include "cui_hog_slow.h"
//#include "cui_hog.h"
#include <ctime>
#include <iostream>
#include <fstream>
#include <sstream>
#include <opencv2/core/core.hpp>
#include <opencv2/highgui/highgui.hpp>
#include <opencv2/imgproc/imgproc.hpp>
#include <opencv2/objdetect/objdetect.hpp>
#include <opencv2/ml/ml.hpp>

using namespace std;
using namespace cv;

#define SQU 64            // box side length
#define POSSAMNO 3513     // number of positive samples
#define NEGSAMNO 157160   // number of negative samples
#define TRAIN true        // whether to train: true retrains, false reads the SVM model from an XML file
#define CENTRAL_CROP true // true: when training, crop the 96*160 INRIA positive samples to the central 64*128 body region
// Number of hard-example negative samples. If HARDEXAMPLENO is greater than 0,
// the hard-example negative set is processed after the initial negative set.
// Without hard examples it must be 0, because it is used to size the feature
// matrix and the label matrix when they are initialized.
#define HARDEXAMPLENO 0

// Inherits from CvSVM: building the detector passed to setSVMDetector() needs
// the decision_func member of the trained SVM, but in the CvSVM source
// decision_func is protected, so it can only be reached through member
// functions of a derived class.
class MySVM : public CvSVM
{
public:
    // get the alpha array of the SVM decision function
    double * get_alpha_vector()
    {
        return this->decision_func->alpha;
    }

    // get the rho parameter of the SVM decision function, i.e. the offset
    float get_rho()
    {
        return this->decision_func->rho;
    }
};

int main()
{
    // detection window (SQU,SQU), block size (16,16), block stride (8,8), cell size (8,8), 9 histogram bins
    //HOGDescriptor hog(Size(SQU,SQU),Size(16,16),Size(8,8),Size(8,8),9); // HOG detector, used to compute the HOG descriptor
    //int DescriptorDim; // dimension of the HOG descriptor, determined by the image size, detection window size, block size and number of histogram bins per cell
    MySVM svm; // SVM classifier

    // if TRAIN is true, retrain the classifier
    //if(TRAIN)
    //{
    //    //string ImgName; // image name (absolute path)
    //    ifstream finPos("INRIAPerson96X160PosList.txt");
    //    //ifstream finPos("PersonFromVOC2012List.txt");
    //    ifstream finNeg("NoPersonFromINRIAList.txt"); // list of negative sample image file names
    //    Mat sampleFeatureMat; // feature vectors of all training samples: rows = number of samples, columns = HOG descriptor dimension
    //    Mat sampleLabelMat;   // labels of the training samples: rows = number of samples, 1 column; 1 means person, -1 means no person

    //    // read the positive sample images one by one and generate their HOG descriptors
    //    //for(int num=1; num<(POSSAMNO+1) && getline(finPos,ImgName); num++)
    //    for(int num=0; num<POSSAMNO; num++)
    //    {
    //        string ImgName; // image name (absolute path)
    //        stringstream stream;
    //        stream<<(num+1);
    //        ImgName=stream.str();
    //        cout<<"Processing: "<<...
    //        ... descriptors; // HOG descriptor vector
    //        hog.compute(src,descriptors,Size(8,8)); // compute the HOG descriptor, detection window stride (8,8)
    //        cout<<"Descriptor dimension: "<<descriptors.size()<<endl;

    //        // When the first sample is processed, initialize the feature matrix and the label matrix,
    //        // because the feature matrix can only be sized once the descriptor dimension is known
    //        if( 0 == num )
    //        {
    //            DescriptorDim = descriptors.size(); // dimension of the HOG descriptor
    //            // feature matrix of all training samples: rows = number of samples, columns = HOG descriptor dimension
    //            sampleFeatureMat = Mat::zeros(POSSAMNO+NEGSAMNO+HARDEXAMPLENO, DescriptorDim, CV_32FC1);
    //            // label vector of the training samples: rows = number of samples, 1 column; 1 means person, -1 means no person
    //            sampleLabelMat = Mat::zeros(POSSAMNO+NEGSAMNO+HARDEXAMPLENO, 1, CV_32FC1);
    //        }

    //        // copy the computed HOG descriptor into the sample feature matrix sampleFeatureMat
    //        for(int i=0; i<DescriptorDim; i++)
    //            sampleFeatureMat.at<float>(num,i) = descriptors[i]; // i-th element of the feature vector of sample num
    //        sampleLabelMat.at<float>(num,0) = 1;
    //    ...
    //    for(int num=0; num<NEGSAMNO; num++)
    //    {
    //        string ImgName; // image name (absolute path)
    //        stringstream stream;
    //        stream<<(num+1);
    //        ImgName=stream.str();
    //        cout<<"Processing: "<<...
    //        ... descriptors; // HOG descriptor vector
    //        hog.compute(src,descriptors,Size(8,8)); // compute the HOG descriptor, detection window stride (8,8)
    //        cout<<"Descriptor dimension: "<<descriptors.size()<<endl;

    //        // copy the computed HOG descriptor into the sample feature matrix sampleFeatureMat
    //        for(int i=0; i<DescriptorDim; i++)
    //            sampleFeatureMat.at<float>(num+POSSAMNO,i) = descriptors[i]; // i-th element of the feature vector of sample POSSAMNO+num
    //        sampleLabelMat.at<float>(num+POSSAMNO,0) = -1; // negative sample label is -1, no person
    //    }

    //    // process the hard-example negative samples
    //    if(HARDEXAMPLENO > 0)
    //    {
    //        ifstream finHardExample("Hardexample_2400posinria_12000neglist.txt"); // list of hard-example negative sample file names
    //        // read the hard-example negative images one by one and generate their HOG descriptors
    //        for(int num=0; num
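The comments in the listing say that building the detector for setSVMDetector() needs the alpha array and rho from the trained SVM's decision function. The sketch below illustrates why that is enough when the kernel is linear (an assumption; the article does not name the kernel): the weighted sum over all support vectors collapses into a single weight vector, so each window costs one dot product. The function names are hypothetical, and the rho offset and the sign convention of any particular library are deliberately left out.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;

double dot(const Vec& a, const Vec& b) {
    assert(a.size() == b.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) sum += a[i] * b[i];
    return sum;
}

// Kernel form of the decision sum: sum_i alpha_i * (sv_i . x).
double kernelSum(const std::vector<Vec>& supportVectors, const Vec& alpha, const Vec& x) {
    assert(supportVectors.size() == alpha.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < supportVectors.size(); ++i)
        sum += alpha[i] * dot(supportVectors[i], x);
    return sum;
}

// Collapsed form: w = sum_i alpha_i * sv_i, so that w . x equals kernelSum(...).
Vec collapse(const std::vector<Vec>& supportVectors, const Vec& alpha) {
    assert(!supportVectors.empty() && supportVectors.size() == alpha.size());
    Vec w(supportVectors[0].size(), 0.0);
    for (std::size_t i = 0; i < supportVectors.size(); ++i) {
        assert(supportVectors[i].size() == w.size());
        for (std::size_t j = 0; j < w.size(); ++j)
            w[j] += alpha[i] * supportVectors[i][j];
    }
    return w;
}
```

Both forms give the same value for any input vector, but the collapsed one no longer depends on the number of support vectors, which matters when every window position at every scale has to be classified.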

