Image Object Detection and Recognition
1 Introduction
Previously, we talked about Haar features for face detection. This article focuses on the LBP (Local Binary Pattern) feature, which is also applicable to face detection. In fact, it can detect other objects as well: you only need to change the training dataset. The subject of this article is therefore object detection and recognition, for example checking whether a car has a license plate.
In OpenCV's Haar-feature-based face detection framework, the LBP feature is also supported.
Link to the Haar feature blog: http://blog.csdn.net/stdcoutzyx/article/details/34842233.
2. History
In 1996, Ojala proposed the LBP feature, i.e., reference 1. At the time it did not make much of a splash. By 2002, he had summarized and extended the LBP feature, producing reference 2. That paper has been cited more than 4600 times so far, which shows how influential it is.
By 2004, Ahonen had applied the LBP feature to face recognition for the first time, i.e., reference 3. The LBP feature is easy to compute; although its overall accuracy is not as good as that of the Haar feature, it is faster to compute, so it is also widely used.
In 2007, a group from the Chinese Academy of Sciences borrowed the integral-image technique from Haar feature computation to produce the multi-scale block LBP (MB-LBP) feature, i.e., reference 4, which greatly improved the detection rate.
The main content of this article is the LBP feature itself, its dimensionality reduction, the multi-scale mechanism, and the application of the feature in face recognition.
3. LBP Features
Let's get down to business. What is the LBP feature? LBP is the abbreviation of Local Binary Pattern. It is defined as follows:
Take the center pixel of a 3x3 neighborhood as the threshold and compare the gray values of the eight adjacent pixels with it. If a surrounding pixel value is greater than the center pixel value, that position is marked as 1; otherwise it is 0. In this way, the 8 points in the 3x3 neighborhood generate an 8-bit binary number (usually converted to a decimal number, giving 256 possible codes), which is taken as the LBP value of the center pixel and reflects the texture information of that region.
To compute the feature value of a specific pixel, it is worth noting that the eight comparison bits are read off in clockwise order to form the binary number.
The formula is as follows:
Here the center element of the 3x3 neighborhood has pixel value i_c, and i_p denotes the values of the other pixels in the neighborhood. s(x) is the sign function, defined as 1 when x >= 0 and 0 otherwise.
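Written out explicitly, the definition takes the standard form from references 1 and 2 (i_c is the center value, i_p the value of the p-th of the P neighbors, P = 8 for the basic operator):

```latex
\mathrm{LBP}(x_c, y_c) = \sum_{p=0}^{P-1} s(i_p - i_c)\, 2^{p},
\qquad
s(x) =
\begin{cases}
1, & x \ge 0 \\
0, & x < 0
\end{cases}
```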
4. The Circular LBP Improvement
In the original paper [1], after defining the basic LBP, an improved version is also defined, namely the circular LBP.
The biggest drawback of the basic LBP operator is that it only covers a small area within a fixed radius, which obviously cannot meet the needs of textures of different sizes and frequencies. In order to adapt to texture features of different scales, and to achieve grayscale and rotation invariance, Ojala et al. improved the LBP operator by extending the 3x3 neighborhood to an arbitrary neighborhood and replacing the square neighborhood with a circular one. The improved operator allows any number of sampling points in a circular neighborhood of radius R; this yields the LBP operator with P sampling points on a circle of radius R.
For example, take a 5x5 neighborhood with eight sampling points on the circle. The coordinates of each sampling point are computed from the center of the neighborhood, the radius, and the sampling index. The coordinates obtained this way are not necessarily integers, so the pixel value at a sampling point is obtained by bilinear interpolation.
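As a sketch of this step in Python (assuming the common convention that sampling point p lies at angle 2*pi*p/P on the circle, and that the comparison uses >=; both are assumptions, not taken from the original paper):

```python
import math

def sample_point(img, xc, yc, r, p, P):
    """Value of the p-th of P sampling points on a circle of radius r
    around (xc, yc), obtained by bilinear interpolation.
    img is a 2D list indexed as img[y][x]."""
    # Coordinates of the sampling point (generally non-integer).
    x = xc + r * math.cos(2.0 * math.pi * p / P)
    y = yc - r * math.sin(2.0 * math.pi * p / P)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    dx, dy = x - x0, y - y0
    # Bilinear interpolation between the four surrounding pixels.
    return (img[y0][x0]         * (1 - dx) * (1 - dy) +
            img[y0][x0 + 1]     * dx       * (1 - dy) +
            img[y0 + 1][x0]     * (1 - dx) * dy +
            img[y0 + 1][x0 + 1] * dx       * dy)

def circular_lbp(img, xc, yc, r=2, P=8):
    """Circular LBP code of the pixel (xc, yc)."""
    center = img[yc][xc]
    code = 0
    for p in range(P):
        if sample_point(img, xc, yc, r, p, P) >= center:
            code |= 1 << p
    return code
```

With P = 8 and r = 2 this operator fits inside a 5x5 neighborhood, matching the example above.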
The original paper illustrates examples of LBP operators with different radii and different numbers of sampling points.
5. Representation of the LBP Feature
After the LBP operator is applied, each pixel of the image gets an LBP value. If eight sampling points are used, the feature value ranges from 0 to 255, so the result can itself be displayed as an image, called the LBP map.
Comparing source images with their LBP maps, we can see one advantage of the LBP feature: it is robust to illumination changes.
However, in practical applications, the LBP map itself is not used directly as the feature. So what is used instead?
The feature value ranges from 0 to 255. We count the occurrences of each value: how many pixels have LBP value 1, how many have value 245, and so on. This forms a histogram with 256 bins, i.e., 256 components; you can also regard the histogram as a vector of length 256.
If this vector were used directly, a whole image would be reduced to a single vector of length 256 (for the eight-sampling-point LBP operator), and all location information would be lost, causing serious precision problems. In practice, another technique is used: divide the image into several regions, compute a histogram vector for each region, and then concatenate these vectors into one large vector. For face images, a typical choice is to divide the image into 7x7 sub-regions.
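The whole pipeline can be sketched in Python as follows (the clockwise bit order and the even region-splitting scheme here are illustrative assumptions):

```python
def lbp_map(img):
    """Basic 3x3 LBP code for every interior pixel of a grayscale
    image given as a 2D list img[y][x]; border pixels are skipped."""
    h, w = len(img), len(img[0])
    # Clockwise neighbor offsets starting at the top-left corner.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    out = [[0] * (w - 2) for _ in range(h - 2)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            c = img[y][x]
            code = 0
            for bit, (dy, dx) in enumerate(offsets):
                if img[y + dy][x + dx] >= c:
                    code |= 1 << bit
            out[y - 1][x - 1] = code
    return out

def region_histograms(codes, ny, nx):
    """Split the LBP map into ny x nx regions and concatenate the
    256-bin histograms of the regions into one long feature vector."""
    h, w = len(codes), len(codes[0])
    feature = []
    for ry in range(ny):
        for rx in range(nx):
            hist = [0] * 256
            for y in range(ry * h // ny, (ry + 1) * h // ny):
                for x in range(rx * w // nx, (rx + 1) * w // nx):
                    hist[codes[y][x]] += 1
            feature.extend(hist)
    return feature
```

For the 7x7 split described above, `region_histograms(codes, 7, 7)` yields a vector of length 49 * 256.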
6. Uses of the LBP Feature
This section describes two uses of the LBP feature. One is image similarity calculation, and the other is specific face detection.
6.1 image similarity calculation
Each image can be represented by an LBP histogram feature vector, and image similarity can then be computed as vector similarity.
There are many similarity measures for vectors, such as cosine similarity and Euclidean distance. Reference 3 gives three similarity measures tailored to histogram vectors, including the chi-square statistic.
These formulas apply to a single histogram. In use, the image is divided into multiple regions and a histogram is computed for each region, so in practice different regions can be given different weights.
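As one concrete instance, a weighted chi-square distance over regional histograms could be sketched like this (the weights here are plain parameters; how they are chosen per region is up to the application, not dictated by this sketch):

```python
def chi_square(h1, h2):
    """Chi-square distance between two histograms of equal length.
    Bins that are empty in both histograms are skipped."""
    return sum((a - b) ** 2 / (a + b) for a, b in zip(h1, h2) if a + b > 0)

def weighted_distance(regions1, regions2, weights):
    """Weighted sum of per-region chi-square distances.
    regions1 and regions2 are lists of per-region histograms."""
    return sum(w * chi_square(h1, h2)
               for w, h1, h2 in zip(weights, regions1, regions2))
```

Giving higher weight to discriminative regions (e.g. the eye area of a face) and lower weight to less informative ones is the point of the per-region weighting described above.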
6.2 Specific Face Detection
In the image-similarity formulas above, a value (a difference, a minimum, etc.) is computed for each component, and these values are then accumulated. Alternatively, instead of accumulating the per-component values, we can collect them into a new vector, called the difference vector. With difference vectors we can do specific face detection.
The specific training method is as follows:
A) First, prepare the training set. A positive example is the difference vector of two face images of the same person; a negative example is the difference vector of two face images of different people.
B) Then, train a classification model on this training set using methods such as AdaBoost, SVM, or Naive Bayes.
At test time, suppose we have image A and need to determine whether the face in image B belongs to the same person as the face in image A. We compute the difference vector of the two images and classify it with the trained model. If the classification is positive, they are the same person. Of course, the premise is that both image A and image B are face images.
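Forming the difference vector is simple; the per-component absolute difference used in this sketch is one reasonable choice, not necessarily the exact operation used in the original method:

```python
def difference_vector(feat_a, feat_b):
    """Per-component difference of two LBP feature vectors.
    The absolute difference is an illustrative choice; other
    per-component operations (minimum, squared difference) also fit
    the scheme described above."""
    return [abs(a - b) for a, b in zip(feat_a, feat_b)]

# The resulting vector is then fed to a trained binary classifier
# (AdaBoost, SVM, Naive Bayes, ...) that answers
# "same person" / "different person".
```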
7. Dimensionality Reduction
A single LBP operator can generate many different binary patterns: a circular region of radius R with P sampling points produces 2^P patterns. Obviously, as the number of sampling points grows, the number of pattern types increases sharply. For example, 20 sampling points in a 5x5 neighborhood yield 2^20 = 1,048,576 binary patterns. Such a large number of patterns is bad for texture extraction, texture recognition, classification, and information access. Too many pattern types also hurt texture representation: when we use the LBP operator for texture classification or face recognition, we usually represent the image by the statistical histogram of LBP patterns, and a huge number of pattern types makes the data volume too large and the histogram too sparse. Therefore, we need to reduce the dimensionality of the original LBP patterns so that the image is best represented with less data.
To solve the problem of too many binary patterns and to improve statistical performance, Ojala proposed "uniform patterns" to reduce the number of pattern types of the LBP operator. Ojala et al. observed that the vast majority of LBP patterns in real images contain at most two transitions from 1 to 0 or from 0 to 1. They therefore defined the uniform pattern as follows: when the circular binary string corresponding to an LBP code contains at most two 0-to-1 or 1-to-0 transitions, that code is called a uniform pattern class. For example, 00000000 (0 transitions), 00000111 (one transition, from 0 to 1), and 10001111 (first 1 to 0, then 0 to 1, two transitions in total) are uniform pattern classes. All patterns other than the uniform classes are lumped into one extra class, the mixed pattern class, for example 10010111 (four transitions in total).
With this improvement, the number of binary patterns is greatly reduced without losing much information. The number of patterns drops from 2^P to P(P-1)+2, where P is the number of sampling points. For 8 sampling points in a 3x3 neighborhood, the number of patterns falls from 256 to 58, which reduces the dimension of the feature vector and lessens the impact of high-frequency noise.
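The transition-counting rule is easy to verify in code. The following sketch counts circular bit transitions and recovers the P(P-1)+2 = 58 figure quoted above for P = 8:

```python
def transitions(code, P=8):
    """Number of 0->1 / 1->0 transitions when the P-bit pattern is
    read circularly (the last bit wraps around to the first)."""
    bits = [(code >> i) & 1 for i in range(P)]
    return sum(bits[i] != bits[(i + 1) % P] for i in range(P))

def is_uniform(code, P=8):
    """A pattern is uniform if it has at most two transitions."""
    return transitions(code, P) <= 2

# For P = 8 this matches the count quoted above: 8*7 + 2 = 58.
uniform_count = sum(is_uniform(c) for c in range(256))
```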
The rationale behind uniform-pattern dimensionality reduction is that patterns with at most two transitions account for the majority of all patterns observed in real images. Experiments show that, in general, they account for about 90% of all patterns.
8 Multi-scale LBP
This section describes the results of reference 4.
The basic LBP is computed from the differences between a single pixel and its adjacent pixels, so it captures microscopic features but not macroscopic ones. Reference 4 improves on this: it enlarges the scale of the LBP operator and computes differences between regions rather than between individual pixels.
Consider the multi-scale block LBP operator with block size 3 (a 3x magnification of the basic operator). The central element becomes the sum of the 9 pixels in the central block, and the feature value is computed by comparing the central block with the surrounding blocks. Note that the LBP feature now describes a region rather than a single pixel. When computing the block sums, the integral-image method mentioned in the Haar feature post can be used for acceleration.
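A Python sketch of this block comparison using an integral image (the clockwise block ordering and the >= comparison here are assumptions carried over from the basic operator):

```python
def integral_image(img):
    """Summed-area table: I[y][x] = sum of img[0..y-1][0..x-1]."""
    h, w = len(img), len(img[0])
    I = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        for x in range(w):
            I[y + 1][x + 1] = (img[y][x] + I[y][x + 1]
                               + I[y + 1][x] - I[y][x])
    return I

def block_sum(I, y, x, s):
    """Sum of the s x s block whose top-left corner is (y, x),
    in O(1) time from the integral image I."""
    return I[y + s][x + s] - I[y][x + s] - I[y + s][x] + I[y][x]

def mb_lbp(img, y, x, s=3):
    """Multi-scale block LBP code for the 3s x 3s patch whose top-left
    corner is (y, x): the central s x s block sum is compared with the
    eight surrounding block sums, clockwise from the top-left block."""
    I = integral_image(img)
    center = block_sum(I, y + s, x + s, s)
    offsets = [(0, 0), (0, s), (0, 2 * s), (s, 2 * s),
               (2 * s, 2 * s), (2 * s, s), (2 * s, 0), (s, 0)]
    code = 0
    for bit, (dy, dx) in enumerate(offsets):
        if block_sum(I, y + dy, x + dx, s) >= center:
            code |= 1 << bit
    return code
```

With s = 1 this degenerates to the basic pixel-level LBP; larger s captures increasingly macroscopic structure.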
In the multi-scale setting, the theoretical basis of uniform-pattern dimensionality reduction no longer holds. So how is the dimensionality reduced?
The method proposed in the paper is purely statistical: count the patterns produced by LBP operators at different scales and keep the patterns that occur with higher frequency, rather than using the transition rule.
9 Summary
Although the LBP and Haar features are computed quite differently, they share a common goal: to represent image information in a form that algorithms can fully exploit. The relationship between raw pixels on the one hand and LBP or Haar features on the other is similar to the relationship between characters and words in text analysis. In text analysis, to obtain semantic understanding you must process sentences, phrases, and words; analyzing individual characters is far from enough. Likewise, in image processing, analyzing individual pixels is not enough to extract the rich information contained in images.
The AdaBoost classifier is used both in this article and in the Haar feature post, but for completely different purposes: with Haar features, AdaBoost separates images containing faces from images without faces, while here it decides whether two faces belong to the same person. This illustrates that many machine learning problems are, in essence, classification problems; the other major category, of course, is regression.
I personally think this post covers LBP comprehensively enough. If you want to study it in depth, please read the references. You are welcome to discuss.
Please cite when reprinting: http://blog.csdn.net/stdcoutzyx/article/details/37317863
References
[1]. Ojala, T., Pietikäinen, M., Harwood, D. A comparative study of texture measures with classification based on feature distributions. Pattern Recognition 29 (1996) 51-59.
[2]. Ojala, T., Pietikäinen, M., Mäenpää, T. Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2002, 24(7): 971-987.
[3]. Ahonen, T., Hadid, A., Pietikäinen, M. Face recognition with local binary patterns. Computer Vision - ECCV 2004. Springer Berlin Heidelberg, 2004: 469-481.
[4]. Liao, S., Zhu, X., Lei, Z., et al. Learning multi-scale block local binary patterns for face recognition. Advances in Biometrics. Springer Berlin Heidelberg, 2007: 828-837.
[5]. http://blog.csdn.net/smartempire/article/details/23249517
http://blog.csdn.net/zouxy09/article/details/7929531