The framework and process of target detection in computer vision

Last Update:2018-07-25 Source: Internet

Author: User

Tags svm

Transferred from: http://blog.csdn.net/zouxy09/article/details/7928771

General Framework for detection of targets:

Target detection is divided into the following steps:
1. Create the training samples needed to train the classifier:
The training samples include positive samples and negative samples. A positive sample is an image of the target to be detected (e.g., a face or a car); a negative sample is any other image that does not contain the target (such as background). All sample images are normalized to the same size (for example, 20x20).
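
The sample preparation described above can be sketched roughly as follows. This is only a minimal illustration in plain Python: the helper names `resize_nearest` and `build_training_set`, the nearest-neighbour resizing, and the toy images are my own assumptions, not part of any particular library.

```python
def resize_nearest(image, size=20):
    """Resize a 2-D grayscale image (a list of rows) to size x size
    using nearest-neighbour sampling."""
    src_h, src_w = len(image), len(image[0])
    return [
        [image[r * src_h // size][c * src_w // size] for c in range(size)]
        for r in range(size)
    ]


def build_training_set(positives, negatives, size=20):
    """Return (samples, labels): label 1 for target images, 0 for all others."""
    samples, labels = [], []
    for label, group in ((1, positives), (0, negatives)):
        for image in group:
            samples.append(resize_nearest(image, size))
            labels.append(label)
    return samples, labels


# Hypothetical toy data: one 40x40 "target" image and one 10x30 background image.
face = [[255] * 40 for _ in range(40)]
background = [[0] * 30 for _ in range(10)]
samples, labels = build_training_set([face], [background])
```

After this step every sample is 20x20 regardless of its original size, and `labels` records which samples are positive (1) and which are negative (0).
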
2. Feature Extraction:
The amount of raw data in an image or waveform is quite large. For example, a text image can contain thousands of data points, and so can an ECG waveform. To classify and recognize effectively, we must transform the raw data into features that best reflect the essential differences between classes. This is the process of feature selection and extraction. In general, the space of the raw data is called the measurement space, and the space in which classification is carried out is called the feature space. Through the transformation, a pattern represented in the higher-dimensional measurement space becomes a pattern represented in the lower-dimensional feature space.
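
To make the move from measurement space to feature space concrete, here is a small sketch that reduces a 20x20 image (400 values) to 16 block averages. The block-mean feature and the name `block_mean_features` are assumptions chosen only for illustration; real detectors use richer features.

```python
def block_mean_features(image, grid=4):
    """Map a square grayscale image to grid*grid block averages."""
    size = len(image)
    step = size // grid
    features = []
    for br in range(grid):
        for bc in range(grid):
            block = [
                image[r][c]
                for r in range(br * step, (br + 1) * step)
                for c in range(bc * step, (bc + 1) * step)
            ]
            features.append(sum(block) / len(block))
    return features


# Hypothetical 20x20 image: left half dark (0), right half bright (200).
image = [[0] * 10 + [200] * 10 for _ in range(20)]
features = block_mean_features(image)
print(len(image) * len(image[0]), "->", len(features))  # prints 400 -> 16
```

The 16 numbers still tell the dark left half from the bright right half, which is the kind of essential information a feature should keep.
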

3. Use training samples to train the classifier:

To do this, we first need to understand what a classifier is. Baidu Encyclopedia explains it as: "A device or mathematical model that assigns the objects to be classified to a certain category." My own understanding goes like this: the human brain is itself a classifier (one more powerful than we can imagine), and recognizing things is a process of classifying them. While growing up and learning, people observe many concrete examples of a kind of thing and come to know its nature and characteristics; then, on encountering a new object, the brain classifies it as class A or not class A according to those characteristics. (This explanation uses a simple two-class problem.) Training a classifier can then be understood as the classifier (the brain) observing positive and negative samples (learning) so that it acquires the ability to detect the target (and can recognize it when it meets it in the future).

Mathematically, a classifier is a function y = f(x), where x is the feature of a thing and y is its category. Put plainly: if you input Zhang San's features x1, the classifier tells you this is Zhang San, y1; if you input John Doe's features x2, it tells you this is John Doe, y2. So if the classifier is a function, what is its mathematical model? A linear function y = kx + b? A higher-order function? Or something more complicated? We need to decide on the model first. Once the model is decided, it will have a number of parameters, for example k and b in y = kx + b above, or the mean and variance of a Gaussian function. These can be determined by criteria such as minimizing the classification error or minimizing a penalty. In effect, training a classifier amounts to searching for the parameter values that give the best classification performance. (I am not sure my understanding is entirely right.)
In addition, to improve detection accuracy, the training set generally contains thousands of samples, and many features are extracted from each sample. This produces a large amount of training data, so training is generally very time-consuming.
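
As a rough sketch of what "searching for k and b" can look like, the following trains the linear model y = kx + b on a one-dimensional feature. The perceptron update rule used here is one possible choice that I am assuming for illustration; the data and the names `predict` and `train` are hypothetical.

```python
def predict(k, b, x):
    """Classifier y = f(x): class 1 if k*x + b > 0, else class 0."""
    return 1 if k * x + b > 0 else 0


def train(xs, ys, learning_rate=1.0, max_epochs=1000):
    """Search for k and b with the perceptron rule; stop once an
    epoch produces no classification errors."""
    k, b = 0.0, 0.0
    for _ in range(max_epochs):
        errors = 0
        for x, y in zip(xs, ys):
            error = y - predict(k, b, x)  # -1, 0, or +1
            if error != 0:
                k += learning_rate * error * x
                b += learning_rate * error
                errors += 1
        if errors == 0:
            break
    return k, b


# Hypothetical 1-D features: small values are class 0, large values are class 1.
xs = [1.0, 2.0, 3.0, 6.0, 7.0, 8.0]
ys = [0, 0, 0, 1, 1, 1]
k, b = train(xs, ys)
print([predict(k, b, x) for x in xs])  # prints [0, 0, 0, 1, 1, 1]
```

Each pass over the samples nudges k and b whenever a sample is misclassified, so the loop is literally minimizing the classification error on the training set. With thousands of samples and many features per sample, the same idea simply takes much longer.
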

4. Use the trained classifier for target detection:

The trained classifier can then classify the images you input, that is, detect whether the target is present in an image. The usual detection process is as follows: a scanning sub-window slides across the image to be examined; at each position, the features of the region under the sub-window are computed, and the trained classifier evaluates those features to decide whether the region is the target. Because the target in the image may not be the same size as the sample images used for training, the sub-window must then be made larger or smaller (or the image scaled down), and the sliding and matching repeated.
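
The scanning procedure can be sketched as below. This is a simplified illustration under my own assumptions: the function names, the stride of a quarter window, the scale factor of 1.5, and the toy "all bright" classifier are all hypothetical. A real pipeline would resize each patch to the training size and run feature extraction before calling the trained classifier.

```python
def sliding_windows(width, height, base=20, scale=1.5, stride_ratio=0.25):
    """Yield (x, y, size) for square sub-windows at growing scales."""
    size = base
    while size <= min(width, height):
        stride = max(1, int(size * stride_ratio))
        for y in range(0, height - size + 1, stride):
            for x in range(0, width - size + 1, stride):
                yield x, y, size
        size = int(size * scale)  # enlarge the sub-window and scan again


def detect(image, classify, **kwargs):
    """Return the windows that classify(patch) marks as the target."""
    height, width = len(image), len(image[0])
    hits = []
    for x, y, size in sliding_windows(width, height, **kwargs):
        patch = [row[x:x + size] for row in image[y:y + size]]
        if classify(patch):
            hits.append((x, y, size))
    return hits


def is_all_bright(patch):
    """Toy stand-in for a trained classifier."""
    return all(pixel == 255 for row in patch for pixel in row)


# Hypothetical 60x60 image with a bright 20x20 target at (20, 20).
image = [[0] * 60 for _ in range(60)]
for r in range(20, 40):
    for c in range(20, 40):
        image[r][c] = 255
print(detect(image, is_all_bright))  # prints [(20, 20, 20)]
```
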

5. Learning and improving classifiers

If the number of samples is large and the feature selection and classifier algorithm are good, the detection accuracy of the classifier can be very high. But it will still make mistakes at times. A more advanced approach is therefore to add learning or self-adaptation: when the classifier gets an image wrong, take that image, label it with its correct category, and put it into the sample library to retrain the classifier, so that the classifier is updated and does not repeat the mistake next time. How do you know it made a mistake? As I understand it, this is mostly determined by prior knowledge (such as structural or other constraints on the target itself) or in combination with tracking (the target generally does not move very fast).
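
A toy sketch of this update loop is shown below. The threshold classifier, the sample values, and the names `train_threshold` and `predict_label` are assumptions made purely for illustration; the correct label is assumed to come from prior knowledge or tracking, as described above.

```python
def train_threshold(samples):
    """Fit a 1-D threshold halfway between the two class means.
    samples is a list of (feature, label) pairs with label 0 or 1."""
    pos = [x for x, y in samples if y == 1]
    neg = [x for x, y in samples if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def predict_label(threshold, x):
    return 1 if x > threshold else 0


# Hypothetical sample library and the classifier trained from it.
library = [(1.0, 0), (2.0, 0), (8.0, 1), (9.0, 1)]
threshold = train_threshold(library)  # 5.0

# A new sample that is known to be a target but is rejected by the classifier.
new_x, true_label = 4.5, 1
if predict_label(threshold, new_x) != true_label:
    library.append((new_x, true_label))   # add it with the correct label
    threshold = train_threshold(library)  # retrain on the enlarged library

print(predict_label(threshold, new_x))  # prints 1
```
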

In fact, the pattern classification process above applies to many fields, such as image recognition and speech recognition. So what are the key points of the whole process?

(1) Feature selection:

The features that seem most popular at present are Haar features, LBP features, HOG features, and SIFT features. Each has its own strengths, and the choice depends on the target you want to detect, for example:
Fist: texture is the prominent feature: Haar, LBP (currently sometimes combined with HOG);
Palm: contour is the prominent feature: HOG (pedestrian detection generally uses this);
(In this blog I will draw on the blogs of various experts and other material to write up Haar, LBP, HOG, and SIFT features; see the blog updates.)
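
As one small example of a texture feature, here is a sketch of the basic 3x3 LBP code: each of the 8 neighbours is compared with the centre pixel and the results are packed into one byte. The neighbour order (clockwise from the top-left, first neighbour as the most significant bit) is an assumption, since conventions differ between implementations, and the function names are hypothetical.

```python
# Neighbour offsets (row, column), clockwise starting at the top-left.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(image, r, c):
    """Basic 8-neighbour LBP code of the pixel at (r, c)."""
    center = image[r][c]
    code = 0
    for dr, dc in OFFSETS:
        bit = 1 if image[r + dr][c + dc] >= center else 0
        code = (code << 1) | bit
    return code


def lbp_histogram(image):
    """256-bin histogram of LBP codes over all interior pixels."""
    hist = [0] * 256
    for r in range(1, len(image) - 1):
        for c in range(1, len(image[0]) - 1):
            hist[lbp_code(image, r, c)] += 1
    return hist


patch = [
    [6, 5, 2],
    [7, 6, 1],
    [9, 8, 7],
]
print(lbp_code(patch, 1, 1))  # bits 10001111, prints 143
```

The histogram of these codes over a region is what would be passed to the classifier as the feature vector.
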
(2) Classifier algorithm:
The classifiers that seem most popular at present are the SVM (support vector machine), the AdaBoost algorithm, and so on. Pedestrian detection generally uses HOG features + SVM; face detection in OpenCV generally uses Haar + AdaBoost; fist detection in OpenCV generally uses LBP + AdaBoost.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
