The framework and process of target detection in computer vision

Source: Internet
Author: User
Tags svm

Transferred from: http://blog.csdn.net/zouxy09/article/details/7928771

General framework for target detection:



Target detection is divided into the following steps:
1. Creating the training samples needed to train the classifier:
The training samples include positive samples and negative samples. A positive sample is an image of the target to be detected (e.g., a face or a car); a negative sample is any other image that does not contain the target (such as background). All sample images are normalized to the same size (for example, 20x20).
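As a rough illustration of this step, here is a minimal sketch in plain Python that labels positive and negative images and normalizes them all to 20x20. The function names (`resize_nearest`, `build_training_set`), the nearest-neighbour resizing, and the toy images are my own assumptions for illustration; a real pipeline would normally load image files and use an image library for resizing.

```python
def resize_nearest(image, out_h, out_w):
    """Resize a 2-D list of gray values using nearest-neighbour sampling."""
    in_h, in_w = len(image), len(image[0])
    return [
        [image[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


def build_training_set(positives, negatives, size=(20, 20)):
    """Return (samples, labels); label 1 = target, label 0 = non-target."""
    samples, labels = [], []
    for label, group in ((1, positives), (0, negatives)):
        for img in group:
            samples.append(resize_nearest(img, size[0], size[1]))
            labels.append(label)
    return samples, labels


# Toy stand-ins for real images (hypothetical data):
# a 40x40 "target" image and a 48x64 "background" image.
target_img = [[200] * 40 for _ in range(40)]
background_img = [[30] * 64 for _ in range(48)]

samples, labels = build_training_set([target_img], [background_img])
print(labels, len(samples[0]), len(samples[0][0]))
```

After this step every sample has the same shape, so the same feature extractor can be applied to all of them.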
2. Feature extraction:
The amount of raw data in an image or a waveform is quite large. For example, a text image can contain thousands of data points, and an ECG waveform may also contain thousands of data points. To classify and recognize effectively, we must transform the original data into features that best reflect the essential differences between classes. This is the process of feature selection and extraction. In general, the space of the original data is called the measurement space, and the space in which classification is carried out is called the feature space. Through the transformation, a pattern represented in the high-dimensional measurement space becomes a pattern represented in the lower-dimensional feature space.
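To make the idea of going from measurement space to feature space concrete, the sketch below reduces a 20x20 image (400 raw values) to 16 numbers by averaging over a 4x4 grid of blocks. This block-mean feature is only a simple stand-in that I chose for illustration, not Haar, LBP, or HOG, and the function name `block_mean_features` is hypothetical.

```python
def block_mean_features(image, grid=4):
    """Map an image to grid*grid features: the mean gray value of each block."""
    h, w = len(image), len(image[0])
    bh, bw = h // grid, w // grid  # block height and width
    features = []
    for gr in range(grid):
        for gc in range(grid):
            total = 0
            for r in range(gr * bh, (gr + 1) * bh):
                for c in range(gc * bw, (gc + 1) * bw):
                    total += image[r][c]
            features.append(total / (bh * bw))
    return features


# Toy 20x20 image with a diagonal gradient (hypothetical data).
image = [[r + c for c in range(20)] for r in range(20)]

features = block_mean_features(image)
print(len(image) * len(image[0]), "raw values ->", len(features), "features")
```

The classifier then works on the 16-dimensional vector instead of the 400 raw pixels.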

3. Using the training samples to train the classifier:

First we need to understand what a classifier is. Baidu Encyclopedia explains it as "a device or mathematical model used to assign the objects being classified to a certain category." I think it can be understood like this: the human brain is itself a classifier (just one so powerful that it is beyond imagination), and recognizing things is a process of classifying them. While growing up or learning, people come to know the nature and characteristics of a type of thing by observing many concrete examples of it. Then, when they meet a new object, the brain classifies it as class A or not class A according to its nature and characteristics. (This explanation uses a simple two-class problem.) Training a classifier can therefore be understood as the classifier (the brain) observing positive and negative samples (learning) so that it gains the ability to detect the target (and can recognize it when it meets the target in the future).

Mathematically, a classifier is a function y = f(x), where x is the feature of a thing and y is its category. Put simply: if you input Zhang San's features x1, the classifier will tell you this is Zhang San, y1; if you input Li Si's features x2, it will tell you this is Li Si, y2. So the classifier is a function, but what is its mathematical model? A linear function y = kx + b? A higher-order function? Something more complicated still? We need to decide on the model first. Once the model is decided, it will usually have a number of parameters: for example, k and b in the function y = kx + b above, or the mean and variance of a Gaussian function. These can be determined by, for example, minimizing the classification error or minimizing a penalty. So training a classifier essentially amounts to finding the parameters that give the best classification result. At least, that is my understanding; I am not sure it is entirely right.
In addition, to improve detection accuracy, the training set generally contains thousands of samples, and many features are extracted from each sample. This produces a large amount of training data, so the training process is generally very time-consuming.
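As a small worked example of "finding the parameters", the sketch below trains the linear model y = kx + b on a one-dimensional feature with the classic perceptron update rule, which adjusts k and b whenever a training sample is misclassified. The perceptron is just one simple choice I picked for illustration (the text above does not name a training method), and the data and function names are hypothetical.

```python
def train_perceptron(xs, ys, lr=0.1, max_epochs=1000):
    """Find k and b so that the sign of k*x + b matches the labels (+1 / -1)."""
    k, b = 0.0, 0.0
    for _ in range(max_epochs):
        errors = 0
        for x, y in zip(xs, ys):
            if y * (k * x + b) <= 0:  # sample is misclassified (or on the boundary)
                k += lr * y * x       # move the boundary toward the correct side
                b += lr * y
                errors += 1
        if errors == 0:               # every training sample is classified correctly
            break
    return k, b


def predict(k, b, x):
    """Return +1 (target) or -1 (non-target)."""
    return 1 if k * x + b > 0 else -1


# Toy 1-D features: small values are negatives, large values are positives.
xs = [1.0, 2.0, 3.0, 6.0, 7.0, 8.0]
ys = [-1, -1, -1, 1, 1, 1]

k, b = train_perceptron(xs, ys)
predictions = [predict(k, b, x) for x in xs]
print("k =", k, "b =", b, "predictions =", predictions)
```

With thousands of samples and many features per sample, the same kind of loop runs over far more data, which is one reason training takes so long.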

4. Using the trained classifier for target detection:

Once you have the classifier, you can use it to classify an input image, that is, to detect whether the target you are looking for is present in the image. The general detection process is this: a scanning sub-window slides across the image to be examined, shifting a little each time. At each position, the features of the region under the sub-window are computed, and the trained classifier is applied to those features to decide whether the region is the target. Because the target in the image may not be the same size as the sample images used to train the classifier, the scanning sub-window also needs to be made larger or smaller (or the image rescaled instead), and the image is then scanned and matched again.
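The scanning process described above can be sketched as follows. This is a minimal sketch under my own assumptions: the "feature" is just the mean brightness of the window and the "classifier" is a threshold passed in as a function, standing in for a real feature extractor and trained classifier. The names `window_mean` and `detect`, the scale list, and the step size are all hypothetical.

```python
def window_mean(image, top, left, size):
    """Stand-in feature: mean gray value of a size x size window."""
    total = sum(
        image[r][c]
        for r in range(top, top + size)
        for c in range(left, left + size)
    )
    return total / (size * size)


def detect(image, classify, base=20, scales=(1.0, 1.5, 2.0), step=4):
    """Slide a square window over the image at several sizes.

    Returns a list of (top, left, size) for windows the classifier accepts.
    """
    h, w = len(image), len(image[0])
    hits = []
    for scale in scales:
        size = int(base * scale)  # enlarge the sub-window for each scale
        for top in range(0, h - size + 1, step):
            for left in range(0, w - size + 1, step):
                feature = window_mean(image, top, left, size)
                if classify(feature):
                    hits.append((top, left, size))
    return hits


# Toy 60x60 image: black background with a bright 20x20 square at (20, 20).
image = [
    [255 if 20 <= r < 40 and 20 <= c < 40 else 0 for c in range(60)]
    for r in range(60)
]

hits = detect(image, classify=lambda mean: mean > 250)
print(hits)
```

In a real detector each window would usually be resized to the training size (for example, 20x20) before feature extraction, and overlapping hits would be merged afterwards.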

5. Learning and improving the classifier

If there are enough samples and the feature selection and classifier algorithm are good, the detection accuracy of the classifier can be very high. But it will still get some detections wrong. A more advanced approach is therefore to add learning or self-adaptation: when the classifier misclassifies an image, take that image out, label it with its correct category, and put it into the sample library to retrain the classifier. The classifier is updated and "wakes up", so that next time it does not make the same mistake. How do you know it was mistaken? As I understand it, this is mostly determined from prior knowledge (such as structural or other constraints on the target itself) or in combination with tracking (since the target generally does not move too fast).
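The retraining loop can be sketched like this. To keep the example short I use a nearest-centroid classifier, which is my own stand-in and not a method named above, and I simply assume the correct label of the misclassified sample is already known (in practice it would come from prior knowledge or tracking, as described). All names and data are hypothetical.

```python
def fit_centroids(samples):
    """samples: list of (feature_vector, label). Return {label: mean vector}."""
    sums, counts = {}, {}
    for x, y in samples:
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}


def predict(centroids, x):
    """Return the label whose centroid is closest to x."""
    def dist2(center):
        return sum((a - b) ** 2 for a, b in zip(x, center))
    return min(centroids, key=lambda y: dist2(centroids[y]))


# Sample library: two negatives (label 0) and one positive (label 1).
library = [([0.0, 0.0], 0), ([1.0, 0.0], 0), ([10.0, 10.0], 1)]
model = fit_centroids(library)

# A new sample whose correct label is assumed to be known from elsewhere.
hard_sample, true_label = [4.0, 4.0], 1

before = predict(model, hard_sample)
if before != true_label:
    # Put the corrected sample into the library and retrain.
    library.append((hard_sample, true_label))
    model = fit_centroids(library)
after = predict(model, hard_sample)

print("before:", before, "after:", after)
```

The same pattern applies to any classifier: collect the mistakes with their correct labels, add them to the training set, and train again.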

In fact, the pattern classification process above applies to many fields, such as image recognition, speech recognition, and so on. So what are the key points of this whole process?

(1) Feature selection:

The more popular features at present seem to be Haar features, LBP features, HOG features, SIFT features, and so on. Each has its own merits, and the choice depends on the target you want to detect, for example:
Fist: texture is the obvious feature: Haar, LBP (some work now combines LBP with HOG);
Palm: contour is the obvious feature: HOG features (pedestrian detection generally uses these);
(On this blog I will draw on the blogs of various experts and other material to write up Haar features, LBP features, HOG features, SIFT features, and so on; see future blog updates.)
(2) Classifier algorithm:
The more popular algorithms at present seem to be SVM (support vector machine), AdaBoost, and so on. Pedestrian detection generally uses HOG features + SVM; face detection in OpenCV generally uses Haar + AdaBoost; fist detection in OpenCV generally uses LBP + AdaBoost.



