Pattern Recognition Development Project: Framework and Process of Target Detection in Computer Vision

Last Update: 2014-08-31 Source: Internet

Author: User



I have not been working with machine vision for long, so I only have a preliminary understanding of the general framework and process of machine learning for target detection. Corrections are welcome.

General framework of target detection:

Target detection generally proceeds in the following steps:

1. Create the training samples needed to train the classifier:

The training set includes both positive and negative samples. Positive samples are images of the target to be detected (such as a face or a car). Negative samples are any other images that do not contain the target (such as background). All sample images are normalized to the same size (for example, 20x20 pixels).
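Below is a minimal Python sketch of step 1. It assumes grayscale images stored as lists of pixel rows. The helper names `resize_nearest` and `build_training_set` are hypothetical. Nearest-neighbour resizing is just one simple way to normalize every sample to the same size (20x20 here, matching the example above):

```python
from typing import List, Tuple

Image = List[List[int]]  # grayscale image as a list of pixel rows (assumption)

def resize_nearest(img: Image, out_h: int, out_w: int) -> Image:
    """Nearest-neighbour resize so that every sample ends up the same size."""
    in_h, in_w = len(img), len(img[0])
    return [
        [img[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]

def build_training_set(positives: List[Image], negatives: List[Image],
                       size: int = 20) -> List[Tuple[Image, int]]:
    """Normalize all samples to size x size and label them
    +1 (contains the target) or -1 (background / anything else)."""
    samples = [(resize_nearest(img, size, size), +1) for img in positives]
    samples += [(resize_nearest(img, size, size), -1) for img in negatives]
    return samples

if __name__ == "__main__":
    face = [[200] * 40 for _ in range(40)]        # stand-in positive image
    background = [[30] * 64 for _ in range(48)]   # stand-in negative image
    data = build_training_set([face], [background])
    print(len(data), [label for _, label in data])  # 2 [1, -1]
    print(len(data[0][0]), len(data[0][0][0]))      # 20 20
```

Labeling targets +1 and background -1 is simply the convention used in the later sketches. Any two distinct labels would work.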

2. Feature extraction:

The amount of data obtained from an image or a waveform is quite large. A single character image can contain thousands of values, and so can an ECG waveform. To classify and recognize patterns effectively, we transform the raw data to obtain the features that best reflect the essential differences between classes. This is the process of feature selection and extraction. The space formed by the raw data is usually called the measurement space, and the space in which classification is carried out is called the feature space. Through this transformation, a pattern represented in a high-dimensional measurement space becomes a pattern represented in a lower-dimensional feature space.
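The sketch below illustrates mapping the measurement space to a lower-dimensional feature space. It uses a hypothetical helper, not one of the features discussed later, that averages each 5x5 block of a 20x20 sample. This turns 400 raw pixel values into 16 features. Real detectors use more discriminative features such as Haar or HOG, but the idea of reducing dimensionality is the same:

```python
from typing import List

def block_mean_features(img: List[List[float]], block: int = 5) -> List[float]:
    """Map an H x W image (H*W raw values, the 'measurement space') to
    (H/block) * (W/block) block averages (a lower-dimensional 'feature space')."""
    h, w = len(img), len(img[0])
    if h % block or w % block:
        raise ValueError("image size must be a multiple of the block size")
    features = []
    for top in range(0, h, block):
        for left in range(0, w, block):
            cells = [img[r][c] for r in range(top, top + block)
                     for c in range(left, left + block)]
            features.append(sum(cells) / len(cells))
    return features

if __name__ == "__main__":
    sample = [[(r + c) % 256 for c in range(20)] for r in range(20)]
    feats = block_mean_features(sample)
    print(20 * 20, "->", len(feats))  # 400 -> 16
```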

3. Train the classifier on the training samples:

First, what is a classifier? Baidu Encyclopedia defines it as "a device or mathematical model used to assign objects to a certain category." This is easy to understand. The human brain itself is a classifier (one more powerful than we can imagine), and recognizing things is a process of classification. While growing up and learning, people observe many examples of Class A things and come to understand the nature and characteristics of Class A. Later, when they encounter a new object, the brain assigns it to Class A or non-Class A according to whether it matches those characteristics. (A simple binary classification problem is used here for illustration.) Training a classifier can therefore be understood as letting the classifier (the brain) observe, or learn from, positive and negative samples, so that it can detect (recognize) the target in the future.

Mathematically, a classifier is a function y = f(x), where x is the feature of an object and y is its class. For example, if you input the feature x1 of Zhang San, the classifier recognizes it as Zhang San (y1). If you input the feature x2 of Li Si, it recognizes it as Li Si (y2). So what is the mathematical model of this function? A linear function y = kx + b? A higher-order function? Something else? We need to determine the model first. Once the model is chosen, it has parameters, for example k and b in the linear function y = kx + b, or the mean and variance of a Gaussian function. These parameters can be determined by methods such as minimizing the classification error or minimizing a penalty. In fact, training a classifier amounts to searching for the parameters that give the best classification performance. Ha, I'm not sure whether I have this right.
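For the linear model above, training means finding k and b (k becomes a weight vector when x has several features). The sketch below assumes the classifier is f(x) = sign(k·x + b) with labels +1/-1. It uses the classic perceptron rule, which adjusts k and b only when a training sample is misclassified. This is one simple way to drive the classification error down, not necessarily what the author had in mind. It converges only when the two classes are linearly separable. The function names are hypothetical:

```python
from typing import List, Tuple

def predict(k: List[float], b: float, x: List[float]) -> int:
    """Linear classifier f(x) = sign(k . x + b): +1 = target, -1 = non-target."""
    return 1 if sum(ki * xi for ki, xi in zip(k, x)) + b >= 0 else -1

def train_perceptron(data: List[Tuple[List[float], int]], epochs: int = 100,
                     lr: float = 1.0) -> Tuple[List[float], float]:
    """Search for parameters (k, b) that classify the training samples correctly.
    Each mistake moves the decision boundary toward the misclassified sample."""
    k = [0.0] * len(data[0][0])
    b = 0.0
    for _ in range(epochs):
        errors = 0
        for x, y in data:
            if predict(k, b, x) != y:
                k = [ki + lr * y * xi for ki, xi in zip(k, x)]
                b += lr * y
                errors += 1
        if errors == 0:  # every training sample is now classified correctly
            break
    return k, b

if __name__ == "__main__":
    samples = [([2.0, 2.0], 1), ([3.0, 1.5], 1), ([-1.0, -2.0], -1), ([-2.0, -0.5], -1)]
    k, b = train_perceptron(samples)
    print(k, b)                                                     # [1.0, 2.0] -1.0
    print(predict(k, b, [1.0, 1.0]), predict(k, b, [-3.0, -3.0]))  # 1 -1
```

Only the third sample is ever misclassified in this example, so training stops after the second pass with k = [1, 2] and b = -1.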

In addition, good detection accuracy usually requires tens of thousands of training samples, with many features extracted from each one. This produces a large amount of training data, so training is generally time-consuming.
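A rough, hypothetical calculation shows the scale involved. Suppose there are N = 10,000 training windows. Each is described by a d = 3,780-dimensional HOG vector (the descriptor length for the standard 64x128 pedestrian window) stored as 4-byte floats:

```latex
\begin{aligned}
N \cdot d &= 10\,000 \times 3\,780 = 37\,800\,000 \ \text{feature values},\\
37\,800\,000 \times 4\ \text{bytes} &= 151\,200\,000\ \text{bytes} \approx 151\ \text{MB}.
\end{aligned}
```

Iterative training algorithms typically pass over this data many times. Training cost therefore grows quickly with both the number of samples and the feature dimension.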

4. Use the trained classifier for target detection:

The trained classifier can classify an input image, that is, detect whether the target you are looking for is present in it. The general detection process is as follows. A scanning sub-window is shifted step by step across the image to be detected. At each position, the features of the sub-window are computed, and the trained classifier evaluates them to decide whether that region is the target. The target in the image may be larger or smaller than the sample images used to train the classifier. To handle this, the sub-window is enlarged or shrunk (or, equivalently, the image is rescaled), and the scan over the image is repeated.
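The scan described above can be sketched as follows. The sketch assumes a hypothetical `is_target` callback that wraps feature extraction plus the trained classifier. It takes the image-scaling route: each pyramid level shrinks the image by a factor of 1.25 rather than enlarging the window. The window size, step and scale factor are illustrative values, not ones given in the original:

```python
from typing import Callable, List, Tuple

Image = List[List[float]]

def downscale(img: Image, factor: float) -> Image:
    """Nearest-neighbour shrink of the whole image by `factor` (>= 1)."""
    h, w = int(len(img) / factor), int(len(img[0]) / factor)
    return [[img[int(r * factor)][int(c * factor)] for c in range(w)]
            for r in range(h)]

def detect(img: Image, is_target: Callable[[Image], bool], win: int = 20,
           step: int = 4, scale: float = 1.25) -> List[Tuple[int, int, int]]:
    """Slide a win x win sub-window over an image pyramid and return
    (top, left, size) boxes in original-image coordinates."""
    boxes = []
    factor, level = 1.0, img
    while len(level) >= win and len(level[0]) >= win:
        for top in range(0, len(level) - win + 1, step):
            for left in range(0, len(level[0]) - win + 1, step):
                window = [row[left:left + win] for row in level[top:top + win]]
                if is_target(window):
                    # scale the hit back up to original-image coordinates
                    boxes.append((int(top * factor), int(left * factor),
                                  int(win * factor)))
        factor *= scale
        level = downscale(img, factor)  # resample from the original each time
    return boxes
```

Real detectors usually also merge overlapping hits (for example with non-maximum suppression). This sketch leaves that step out.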

5. Learn and improve the classifier:

If there are many samples and both the feature selection and the classifier algorithm are good, the classifier's detection accuracy will be high. Even so, missed or false detections can still occur. The more advanced part is therefore learning, or adaptation. If the classifier misclassifies an image, that image is taken out, labeled with the correct class, and added to the sample library to retrain the classifier. The classifier is thus updated and does not make the same mistake next time. How do we know it made a mistake? As I understand it, this is mostly determined from prior knowledge (for example, the target has a known structure or satisfies certain constraints) or from tracking (the target generally does not move very fast).
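Here is a minimal sketch of this learn-from-mistakes loop. A toy nearest-centroid classifier stands in for the real one. A hypothetical `verify` callback plays the role of prior knowledge or tracking by supplying the correct label. Both are assumptions made for illustration:

```python
from typing import Callable, List, Tuple

Sample = Tuple[List[float], int]  # (feature vector, label): +1 target, -1 non-target
Classifier = Callable[[List[float]], int]

def train_centroids(library: List[Sample]) -> Classifier:
    """Toy stand-in classifier: assign x to the class whose mean vector is nearer."""
    def mean(vectors: List[List[float]]) -> List[float]:
        return [sum(col) / len(vectors) for col in zip(*vectors)]

    pos = mean([x for x, y in library if y == 1])
    neg = mean([x for x, y in library if y == -1])

    def classify(x: List[float]) -> int:
        d_pos = sum((a - b) ** 2 for a, b in zip(x, pos))
        d_neg = sum((a - b) ** 2 for a, b in zip(x, neg))
        return 1 if d_pos <= d_neg else -1

    return classify

def learn_from_mistakes(library: List[Sample], new_data: List[List[float]],
                        verify: Callable[[List[float]], int]) -> Tuple[Classifier, int]:
    """Classify new data. Every sample that `verify` (hypothetical stand-in for
    prior knowledge or tracking) says was misclassified is relabeled and added
    to the sample library (modified in place), then the classifier is retrained."""
    classify = train_centroids(library)
    mistakes = []
    for x in new_data:
        true_label = verify(x)
        if classify(x) != true_label:
            mistakes.append((x, true_label))
    library.extend(mistakes)
    return train_centroids(library), len(mistakes)
```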

In fact, this pattern classification process applies to many fields, such as speech recognition. So what are the key points of the whole process?

(1) Feature selection:

Commonly used features include Haar features, HSV features, HOG features, and SIFT features. Each has its own strengths, and the choice depends on the target you want to detect. For example:

Fist: obvious texture features, so Haar and HSV features are suitable (they are currently also being combined with HOG);

Palm: obvious contour features, so HOG features are suitable (HOG is also what pedestrian detection usually uses);
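Haar-like features are cheap to compute thanks to the integral image (summed-area table) used in the Viola-Jones detector. After one pass over the image, the sum of any rectangle takes only four lookups. Below is a minimal sketch of a two-rectangle feature: the left half minus the right half, which responds to vertical edges. The function names and this particular rectangle layout are illustrative choices:

```python
from typing import List

def integral_image(img: List[List[int]]) -> List[List[int]]:
    """ii[r][c] = sum of img over rows < r and columns < c (zero-padded by one)."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for r in range(h):
        row_sum = 0
        for c in range(w):
            row_sum += img[r][c]
            ii[r + 1][c + 1] = ii[r][c + 1] + row_sum
    return ii

def rect_sum(ii: List[List[int]], top: int, left: int, height: int, width: int) -> int:
    """Sum of any rectangle in O(1) using four integral-image lookups."""
    return (ii[top + height][left + width] - ii[top][left + width]
            - ii[top + height][left] + ii[top][left])

def haar_left_right(ii: List[List[int]], top: int, left: int,
                    height: int, width: int) -> int:
    """Two-rectangle Haar-like feature: left half minus right half
    (large magnitude on vertical edges). `width` should be even."""
    half = width // 2
    return (rect_sum(ii, top, left, height, half)
            - rect_sum(ii, top, left + half, height, half))
```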

(On my blog I will draw on posts and materials from various experts to summarize Haar, HSV, HOG, and SIFT features. See future blog updates for details.)

(2) Classifier algorithm:

Commonly used classifier algorithms include SVM, AdaBoost, and so on. Pedestrian detection generally uses HOG features + SVM. Face detection in OpenCV generally uses Haar + AdaBoost. Fist detection in OpenCV usually uses a combination of such features and classifiers as well.
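To show what AdaBoost does, here is a minimal sketch of discrete AdaBoost. It uses one-feature threshold classifiers (decision stumps) as weak learners on generic feature vectors with labels +1/-1. In Haar + AdaBoost face detection, each weak learner looks at one Haar feature. OpenCV's cascade classifiers further arrange boosted stages into a cascade, which this sketch omits. All names here are hypothetical:

```python
import math
from typing import List, Tuple

Stump = Tuple[int, float, int]  # (feature index, threshold, polarity)

def stump_predict(stump: Stump, x: List[float]) -> int:
    """Weak learner: +polarity if feature j >= threshold, else -polarity."""
    j, thr, pol = stump
    return pol if x[j] >= thr else -pol

def train_adaboost(X: List[List[float]], y: List[int],
                   rounds: int = 10) -> List[Tuple[float, Stump]]:
    """Discrete AdaBoost: each round picks the stump with the lowest weighted
    error, gives it a vote alpha, and up-weights the samples it got wrong."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        best, best_err = None, float("inf")
        for j in range(len(X[0])):
            for thr in sorted({x[j] for x in X}):
                for pol in (1, -1):
                    err = sum(wi for wi, xi, yi in zip(w, X, y)
                              if stump_predict((j, thr, pol), xi) != yi)
                    if err < best_err:
                        best, best_err = (j, thr, pol), err
        if best_err >= 0.5:
            break  # no stump beats chance on the current weights
        eps = max(best_err, 1e-10)  # avoid division by zero for a perfect stump
        alpha = 0.5 * math.log((1 - eps) / eps)
        ensemble.append((alpha, best))
        if best_err == 0:
            break  # training set already separated perfectly
        # misclassified samples gain weight, correctly classified ones lose it
        w = [wi * math.exp(-alpha * yi * stump_predict(best, xi))
             for wi, xi, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def adaboost_predict(ensemble: List[Tuple[float, Stump]], x: List[float]) -> int:
    """Strong classifier: sign of the alpha-weighted vote of all stumps."""
    score = sum(alpha * stump_predict(stump, x) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1

if __name__ == "__main__":
    # The middle point is positive, the outer ones negative: no single stump fits.
    X, y = [[1.0], [2.0], [3.0]], [-1, 1, -1]
    ens = train_adaboost(X, y, rounds=3)
    print([adaboost_predict(ens, x) for x in X])  # [-1, 1, -1]
```

In the usage example, no single threshold can separate the middle point from the two outer ones. The weighted vote of three boosted stumps classifies all three correctly.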

Computer vision has many more features and algorithms than these. Some experts keep proposing new ones (simple philosophy + complex mathematics), while others keep improving earlier ones. As the years go by, the technology keeps moving forward!

Http://blog.csdn.net/liulina603/article/details/8291143


This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only.
