Three magic weapons of image feature extraction: HOG features, LBP features, Haar features


From: http://dataunion.org/20584.html

(i) HOG features

1. HOG features:

The Histogram of Oriented Gradients (HOG) is a feature descriptor used for object detection in computer vision and image processing. It forms features by computing and accumulating histograms of gradient orientations over local regions of an image. HOG features combined with an SVM classifier have been widely used in image recognition, and especially successfully in pedestrian detection. It should be noted that HOG+SVM pedestrian detection was proposed by the French researcher Dalal at CVPR 2005; although many pedestrian detection algorithms have been proposed since, they are basically still based on the HOG+SVM idea.

(1) Main ideas:

In an image, the appearance and shape of a local object can be well described by the distribution of gradients or edge directions. (Essence: statistics of gradients, and gradients exist mainly at edges.)

(2) The concrete implementation:

First, the image is divided into small connected regions, which we call cell units. Then a histogram of the gradient directions (or edge orientations) of the pixels in each cell is collected. Finally, these histograms are combined to form the feature descriptor.

(3) Improve performance:

Contrast-normalize the local histograms over larger regions of the image (which we call intervals or blocks). The method is: first compute the density of each histogram over the block, then normalize each cell in the block according to this density. After this normalization, better robustness to illumination changes and shadows is obtained.

(4) Advantages:

HOG has many advantages over other feature description methods. First, since HOG operates on local cells of the image, it remains largely invariant to both geometric and photometric deformations of the image, as these two kinds of deformation appear only over larger spatial regions. Second, under coarse spatial sampling, fine orientation sampling, and strong local photometric normalization, pedestrians can be allowed some subtle limb movements as long as they generally maintain an upright posture; such subtle movements can be ignored without affecting the detection results. The HOG feature is therefore particularly well suited for human detection in images.

2. Implementation of the HOG feature extraction algorithm:

Rough pipeline:

HOG feature extraction takes an image (the target or scan window to be detected) through the following steps (a code sketch follows the list):

1) Grayscale conversion (viewing the image as a three-dimensional function of x, y, and gray value z);

2) Color-space standardization (normalization) of the input image using gamma correction; the aim is to adjust the image contrast, reduce the effects of local shadows and illumination changes, and suppress noise interference;

3) Computation of the gradient (magnitude and direction) of every pixel of the image, mainly to capture contour information while further weakening illumination interference;

4) Division of the image into small cells (e.g. 6*6 pixels/cell);

5) Accumulation of a gradient histogram for each cell (counting the occurrences of the different gradient directions), which forms each cell's descriptor;

6) Grouping of cells into blocks (e.g. 3*3 cells/block); concatenating the descriptors of all cells within a block gives that block's HOG descriptor;

7) Concatenation of the HOG descriptors of all blocks in the image, giving the HOG descriptor of the image (of the target to be detected). This is the final feature vector available for classification.
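A minimal sketch of this pipeline, assuming scikit-image is available (the cell and block sizes simply mirror the examples above, and the input is a stand-in for a real grayscale window):

```python
import numpy as np
from skimage.feature import hog

window = np.random.rand(128, 64)        # stand-in for a grayscale scan window
features = hog(
    window,
    orientations=9,                     # 9 histogram bins per cell
    pixels_per_cell=(6, 6),             # step 4: cell size
    cells_per_block=(3, 3),             # step 6: block size in cells
    block_norm='L2-Hys',                # per-block contrast normalization
    feature_vector=True,                # step 7: concatenate all block descriptors
)
print(features.shape)                   # the final feature vector for a classifier
```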

The detailed procedure for each step is as follows:

(1) Standardize gamma space and color space

To reduce the influence of illumination factors, the whole image must first be normalized. In the texture intensity of an image, local surface exposure contributes a large share, so this compression can effectively reduce local shadow and illumination changes in the image. Because color information contributes little, the image is usually first converted to grayscale.

Gamma compression formula: I(x, y) = I(x, y)^gamma

For example, one can take gamma = 1/2;
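As a minimal sketch (assuming a grayscale image already scaled to [0, 1]):

```python
import numpy as np

def gamma_correct(img: np.ndarray, gamma: float = 0.5) -> np.ndarray:
    """Gamma compression I(x, y) <- I(x, y) ** gamma; gamma = 1/2 as above."""
    return np.power(img, gamma)
```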

(2) Calculate image gradient

Compute the gradients in the horizontal and vertical directions of the image, and from them the gradient direction value at each pixel position; the derivative operation not only captures contours, silhouettes, and some texture information, but also further weakens the influence of illumination.

The gradient of the pixel (x, y) in the image is:

Gx(x, y) = H(x+1, y) - H(x-1, y)

Gy(x, y) = H(x, y+1) - H(x, y-1)

where Gx(x, y), Gy(x, y), and H(x, y) denote the horizontal gradient, the vertical gradient, and the pixel value at (x, y). The gradient magnitude and gradient direction at (x, y) are then:

G(x, y) = sqrt(Gx(x, y)^2 + Gy(x, y)^2)

alpha(x, y) = arctan(Gy(x, y) / Gx(x, y))

The most common method is: first convolve the original image with the [-1, 0, 1] gradient operator to obtain the gradient component gradscalx in the x direction (horizontal, positive to the right), then convolve it with the [1, 0, -1]T gradient operator to obtain the gradient component gradscaly in the y direction (vertical, positive upward). Then use the formulas above to compute the gradient magnitude and direction of each pixel.
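A sketch of this gradient computation with NumPy (a plain difference implements the [-1, 0, 1] convolution in the interior; border pixels are left at zero for simplicity):

```python
import numpy as np

def image_gradients(img: np.ndarray):
    img = img.astype(float)                       # avoid uint8 wrap-around
    gx = np.zeros_like(img)
    gy = np.zeros_like(img)
    gx[:, 1:-1] = img[:, 2:] - img[:, :-2]        # [-1, 0, 1] in x
    gy[1:-1, :] = img[2:, :] - img[:-2, :]        # [-1, 0, 1]^T in y
    magnitude = np.hypot(gx, gy)                  # G = sqrt(Gx^2 + Gy^2)
    orientation = np.rad2deg(np.arctan2(gy, gx)) % 180.0   # unsigned, [0, 180)
    return magnitude, orientation
```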

(3) Build gradient histogram for each cell unit

The goal of this third step is to provide an encoding of the local image region while keeping only weak sensitivity to the pose and appearance of the human objects in the image.

We divide the image into several cells, e.g. 6*6 pixels per cell. Suppose we use a 9-bin histogram to collect the gradient information of those 6*6 pixels: the range of gradient directions (0-180 degrees when unsigned gradients are used) is divided into 9 direction bins of 20 degrees each. For example, if a pixel's gradient direction lies between 20 and 40 degrees, the count of the 2nd bin is incremented. Carrying out this weighted projection onto the histogram (mapping each pixel to a fixed angle range) for every pixel in the cell yields the cell's gradient-direction histogram, i.e. the cell's 9-dimensional feature vector (since there are 9 bins).

That covers the pixel's gradient direction; what is the gradient magnitude for? The gradient magnitude is the weight of the vote. For example, say this pixel's gradient direction lies between 20 and 40 degrees and its gradient magnitude is 2; then the count of the 2nd bin is incremented not by one but by two.

Cells can be rectangular (rectangular) or radial (star-shaped).
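A sketch of one cell's histogram with magnitude-weighted voting, assuming the unsigned [0, 180) orientations from the sketch above:

```python
import numpy as np

def cell_histogram(mag: np.ndarray, ori: np.ndarray, nbins: int = 9) -> np.ndarray:
    """mag, ori: magnitude/orientation arrays of one cell (e.g. 6*6 pixels)."""
    bin_width = 180.0 / nbins                     # 20 degrees per bin
    bins = (ori // bin_width).astype(int) % nbins
    hist = np.zeros(nbins)
    np.add.at(hist, bins.ravel(), mag.ravel())    # each pixel votes with its magnitude
    return hist                                   # the cell's 9-D feature vector
```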

(4) Combine cell units into large blocks and normalize the gradient histograms within each block

Because of changes in local illumination and in foreground-background contrast, the range of gradient magnitudes is very large, so the gradient magnitudes must be normalized. Normalization further compresses lighting, shadows, and edges.

The author's approach is to group cell units into larger, spatially connected blocks. The HOG feature of a block is then the concatenation of the feature vectors of all cells in the block. These blocks overlap, which means that each cell's features appear multiple times, with different normalization results, in the final feature vector. The normalized block descriptor (vector) is called the HOG descriptor.

Blocks have two main geometries: rectangular blocks (R-HOG) and circular blocks (C-HOG). An R-HOG block is essentially a square grid, characterized by three parameters: the number of cells per block, the number of pixels per cell, and the number of histogram channels per cell.

For example, the best parameter settings for pedestrian detection are: 3*3 cells/block, 6*6 pixels/cell, and 9 histogram channels. The number of features per block is then 3*3*9 = 81;
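A sketch of the per-block normalization (plain L2 here; eps is a small constant to avoid division by zero):

```python
import numpy as np

def normalize_block(cell_hists, eps: float = 1e-5) -> np.ndarray:
    """cell_hists: the 9 per-cell histograms of one 3*3 block."""
    v = np.concatenate(cell_hists)                 # 3*3*9 = 81 dimensions
    return v / np.sqrt(np.sum(v ** 2) + eps ** 2)  # L2 block normalization
```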

(5) Collecting the HOG features

The final step is to collect the HOG features of all overlapping blocks in the detection window and combine them into the final feature vector.

(6) What is the HOG feature dimension of an image?

As an aside, the HOG feature extraction process proposed by Dalal: the sample image is divided into cells of several pixels each, and the range of gradient directions is divided into 9 bins; within each cell, a histogram of the gradient directions of all pixels is accumulated over those bins, yielding a 9-dimensional feature vector; each group of 4 adjacent cells forms a block, and the feature vectors within a block are concatenated into a 36-dimensional feature vector; the block is scanned across the sample image with a stride of one cell; finally, the features of all blocks are concatenated to obtain the feature of the human body. For example, for a 64*128 image: each cell is 8*8 pixels, and each 2*2 group of cells forms a block (16*16 pixels); since each cell has 9 features, each block has 4*9 = 36 features. With a stride of 8 pixels, there are 7 scan positions horizontally and 15 vertically, so the 64*128 image has 36*7*15 = 3780 features in total.
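This arithmetic can be checked with OpenCV, whose default HOGDescriptor uses exactly this geometry (assuming opencv-python is installed):

```python
import cv2
import numpy as np

hog = cv2.HOGDescriptor()          # defaults: 64x128 window, 16x16 block,
                                   # 8x8 block stride, 8x8 cell, 9 bins
print(hog.getDescriptorSize())     # 3780 = 36 * 7 * 15

window = np.zeros((128, 64), dtype=np.uint8)   # one 64*128 detection window
print(hog.compute(window).size)    # 3780
```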

(ii) LBP features

LBP (Local Binary Pattern) is an operator used to describe the local texture features of an image; it has the notable advantages of rotation invariance and grayscale invariance. It was first introduced by T. Ojala, M. Pietikäinen, and D. Harwood in 1994 for texture feature extraction. The feature it extracts is a local texture feature of the image;

1. Description of LBP Features

The original LBP operator is defined in a 3*3 window. Taking the window's center pixel as the threshold, the gray values of the 8 neighboring pixels are compared with it: if a neighboring pixel's value is greater than the center pixel's value, that position is marked 1, otherwise 0. In this way, comparing the 8 points in the 3*3 neighborhood produces an 8-bit binary number (usually converted to a decimal number, the LBP code, 256 values in total); this is the LBP value of the window's center pixel, and it reflects the texture information of that region.
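A sketch of this original operator over a whole grayscale image (the text says "greater than"; the >= convention below is also common and differs only when values are equal, and the bit order is a convention that varies by author):

```python
import numpy as np

def lbp_3x3(img: np.ndarray) -> np.ndarray:
    """Return the 8-bit LBP code (0..255) of every interior pixel."""
    h, w = img.shape
    center = img[1:-1, 1:-1]
    # 8 neighbors, clockwise from the top-left corner
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = np.zeros_like(center, dtype=np.uint8)
    for bit, (dy, dx) in enumerate(offsets):
        neighbor = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        code |= (neighbor >= center).astype(np.uint8) << bit
    return code
```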

Improved versions of LBP:

After the original LBP was proposed, researchers continued to propose various improvements and optimizations.

(1) Circular LBP operator:

The biggest drawback of the basic LBP operator is that it covers only a small area within a fixed radius, which clearly cannot meet the needs of textures of different sizes and frequencies. To adapt to texture features at different scales and to achieve grayscale and rotation invariance, Ojala et al. improved the LBP operator by extending the 3*3 neighborhood to an arbitrary neighborhood and replacing the square neighborhood with a circular one; the improved LBP operator allows any number of pixels within a circular neighborhood of radius R. This yields the LBP operator with P sampling points in a circular region of radius R.

(2) Rotation-invariant LBP patterns

It can be seen from the definition of LBP that the LBP operator is grayscale-invariant but not rotation-invariant: rotating the image yields different LBP values.

Maenpaa et al. therefore extended the LBP operator and proposed an LBP operator with rotation invariance: the circular neighborhood is rotated continuously to obtain a series of initially defined LBP values, and the minimum of these is taken as the LBP value of the neighborhood.

Figure 2.5 of the original article illustrates the process of obtaining a rotation-invariant LBP: the numbers below the operators are the LBP values corresponding to each operator, and the 8 LBP patterns shown in the figure all yield, after rotation-invariant processing, the same rotation-invariant LBP value of 15. In other words, the 8 LBP patterns in the figure all correspond to the rotation-invariant LBP pattern 00001111.
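The minimum-over-rotations step can be sketched in a few lines (0b11110000 is one of the 8 patterns that map to 15):

```python
def rotation_invariant(code: int, p: int = 8) -> int:
    """Smallest value among all circular rotations of a p-bit LBP code."""
    mask = (1 << p) - 1
    return min(((code >> i) | (code << (p - i))) & mask for i in range(p))

print(rotation_invariant(0b11110000))   # 15, i.e. the pattern 00001111
```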

(3) Uniform LBP patterns (equivalence mode)

An LBP operator can produce many different binary patterns: an LBP operator with P sampling points in a circular region of radius R produces 2^P patterns. Clearly, as the number of sampling points in the neighborhood increases, the number of binary pattern types grows sharply. For example, 20 sampling points in a 5*5 neighborhood give 2^20 = 1,048,576 binary patterns. So many binary patterns are bad for texture extraction as well as for texture recognition, classification, and information access. At the same time, too many pattern types are detrimental to the expression of texture. For example, when the LBP operator is used for texture classification or face recognition, the statistical histogram of LBP patterns is commonly used to express the image information, and too many pattern types make the data volume too large and the histogram too sparse. It is therefore necessary to reduce the dimensionality of the original LBP patterns so that the reduced data best represents the image information.

To solve the problem of too many binary patterns and improve statistics, Ojala proposed using "uniform patterns" (equivalence mode) to reduce the dimensionality of the LBP operator's pattern types. Ojala observed that the vast majority of LBP patterns in real images contain at most two transitions from 1 to 0 or from 0 to 1. Ojala therefore defined "uniform pattern" as follows: an LBP whose corresponding cyclic binary number changes from 0 to 1 or from 1 to 0 at most twice is called a uniform pattern class. For example, 00000000 (0 transitions), 00000111 (only one transition, from 0 to 1), and 10001111 (first from 1 to 0, then from 0 to 1, two transitions in total) are uniform pattern classes. All patterns other than the uniform pattern classes are lumped into a single mixed class, such as 10010111 (four transitions in total) (this is my personal understanding; I am not sure it is correct).

With this improvement, the number of binary pattern types is greatly reduced without losing any information: the number of patterns drops from the original 2^P to P(P-1) + 2, where P is the number of sampling points in the neighborhood. For 8 sampling points in a 3*3 neighborhood, the number of binary patterns drops from 256 to 58. This makes the feature vector lower-dimensional and reduces the impact of high-frequency noise.
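The P(P-1) + 2 count is easy to verify by enumerating all 8-bit patterns and counting circular 0/1 transitions:

```python
def transitions(pattern: int, p: int = 8) -> int:
    """Number of 0->1 / 1->0 changes in the circular bit string."""
    bits = [(pattern >> i) & 1 for i in range(p)]
    return sum(bits[i] != bits[(i + 1) % p] for i in range(p))

uniform = [x for x in range(256) if transitions(x) <= 2]
print(len(uniform))   # 58 = 8 * 7 + 2
```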

2. The principle of using LBP features for detection

Clearly, the LBP operator described above produces one LBP "code" for each pixel. So, after applying the original LBP operator to an image (whose every pixel records a gray value), the raw LBP feature obtained is still "an image" (whose every pixel records an LBP value).

In applications of LBP, such as texture classification and face analysis, the LBP map is generally not used directly as the feature vector for classification and recognition; instead, the statistical histogram of the LBP feature spectrum is used as the feature vector.

The reason, as the analysis above shows, is that this "feature" is tightly coupled to position. Directly extracting this "feature" from two images and running a discriminant analysis would produce large errors because of positional misalignment. Later, researchers found that an image can be divided into sub-regions, the LBP feature extracted for each pixel of each sub-region, and a statistical histogram of the LBP features built within each sub-region. Each sub-region can then be described by one statistical histogram, and the whole image is described by several statistical histograms;

For example: a 100*100 pixel image is divided into 10*10 = 100 sub-regions (the division can be done in several ways), each sub-region being 10*10 pixels. Within each sub-region, the LBP feature of every pixel is extracted and a statistical histogram is built. The image thus has 10*10 sub-regions and therefore 10*10 statistical histograms, and these 10*10 histograms together describe the image. Afterwards, various similarity measures can be used to judge the similarity between two images.

3. The process of extracting an LBP feature vector

(1) First, divide the detection window into small cells of 16*16 pixels;

(2) For each pixel in a cell, compare the gray values of its 8 neighboring pixels with its own: if a neighboring pixel's value is greater than the center pixel's value, mark that position as 1, otherwise 0. The 8 points of the 3*3 neighborhood thus produce an 8-bit binary number, i.e. the LBP value of the window's center pixel;

(3) Then compute the histogram of each cell, i.e. the frequency with which each (decimal) LBP value occurs, and normalize the histogram;

(4) Finally, concatenate the statistical histograms of all cells into one feature vector, i.e. the LBP texture feature vector of the whole image;

An SVM or another machine-learning algorithm can then be used for classification.
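A minimal end-to-end sketch of steps (1)-(4), assuming scikit-image is available (the basic 3*3 operator corresponds to P=8, R=1; method='default' keeps all 256 patterns):

```python
import numpy as np
from skimage.feature import local_binary_pattern

def lbp_feature_vector(img: np.ndarray, cell: int = 16) -> np.ndarray:
    lbp = local_binary_pattern(img, 8, 1, method='default')   # per-pixel LBP "image"
    hists = []
    for y in range(0, img.shape[0] - cell + 1, cell):          # step (1): cells
        for x in range(0, img.shape[1] - cell + 1, cell):
            h, _ = np.histogram(lbp[y:y + cell, x:x + cell],
                                bins=256, range=(0, 256))      # step (3)
            hists.append(h / max(h.sum(), 1))                  # normalize
    return np.concatenate(hists)                               # step (4)
```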

(iii) Haar features

1. Haar-like features

Haar-like features were first applied to face representation by Papageorgiou et al., and Viola and Jones built on this work, using 3 types and 4 forms of features.

Haar features are divided into edge features, line features, center features, and diagonal features, which are combined into feature templates. A feature template contains both white and black rectangles, and the template's feature value is defined as the sum of the white-rectangle pixels minus the sum of the black-rectangle pixels. The Haar feature value thus reflects gray-level changes in the image. For example, some features of the face can be described by simple rectangle features: the eyes are darker than the cheeks, the sides of the bridge of the nose are darker than the bridge itself, and the mouth is darker than its surroundings. But rectangle features are only sensitive to simple structures such as edges and line segments, so they can only describe structures with specific orientations (horizontal, vertical, diagonal).

For features A, B, and D in the original figure, the feature value formula is: V = Sum(white) - Sum(black); for feature C, the formula is: V = Sum(white) - 2*Sum(black). The black-area pixel sum is multiplied by 2 to make the pixel counts of the two kinds of rectangular regions consistent.
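A direct (pre-integral-image) sketch of the first formula for a horizontal two-rectangle edge feature; the coordinates and left/right split are illustrative:

```python
import numpy as np

def edge_feature(img: np.ndarray, x: int, y: int, w: int, h: int) -> float:
    """V = Sum(white) - Sum(black): left half white, right half black."""
    white = img[y:y + h, x:x + w // 2].astype(float).sum()
    black = img[y:y + h, x + w // 2:x + w].astype(float).sum()
    return white - black
```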

By varying the size and position of a feature template, a large number of features can be exhaustively generated in an image sub-window. The feature template is called the "feature prototype"; the features obtained by expanding (translating and scaling) the prototype within the image sub-window are called "rectangle features"; and the value of a rectangle feature is called the "feature value".

Rectangle features can be located anywhere in the image, and their size can be changed arbitrarily, so a rectangle feature value is a function of three factors: the rectangle template category, the rectangle position, and the rectangle size. Consequently, the variations of category, size, and position allow even a very small detection window to contain very many rectangle features; for example, a detection window of 24*24 pixels can contain about 160,000 rectangle features. So two problems must be solved: (1) How can so many features be computed quickly? (The integral image comes to the rescue.) (2) Which rectangle features are most effective for classification? (E.g. selected by training with the AdaBoost algorithm; this part is not discussed here, see http://blog.csdn.net/zouxy09/article/details/7922923)
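The "about 160,000" figure can be reproduced by enumerating all positions and integer scales of the five classic Viola-Jones templates in a 24*24 window:

```python
def count_features(win: int = 24) -> int:
    # base template sizes (height, width): two 2-rectangle features,
    # two 3-rectangle features, and one 4-rectangle feature
    shapes = [(1, 2), (2, 1), (1, 3), (3, 1), (2, 2)]
    total = 0
    for sh, sw in shapes:
        for h in range(sh, win + 1, sh):          # integer vertical scales
            for w in range(sw, win + 1, sw):      # integer horizontal scales
                total += (win - w + 1) * (win - h + 1)
    return total

print(count_features())   # 162336, i.e. "about 160,000"
```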

2. Calculation of Haar-like features: the integral image

The integral image is a fast algorithm that, with a single traversal of the image, can compute the pixel sum of any rectangular region in the image, which greatly improves the efficiency of image feature value computation.

The main idea of the integral image is: the sum of the pixels of the rectangular region from the image origin to each point is stored in memory as an element of an array; when the pixel sum of a region is needed, the array elements can be indexed directly, without recomputing the pixel sum of the region, which speeds up the computation (this idea has a name: dynamic programming). The integral image can compute different features at many scales in the same (constant) time, so the detection speed is greatly improved.

Let's see how it's done.

The integral image is a matrix representation that preserves global image information. It is constructed so that the value ii(i, j) at position (i, j) equals the sum of all pixels of the original image above and to the left of (i, j):

ii(i, j) = sum of f(k, l) over all k <= i, l <= j

Integral image construction algorithm:

1) Let s(i, j) denote the running sum in the row direction, initialized as s(i, -1) = 0;

2) Let ii(i, j) denote the integral image, initialized as ii(-1, j) = 0;

3) Scan the image row by row, recursively computing each pixel (i, j)'s row-direction running sum s(i, j) and integral image value ii(i, j):

s(i, j) = s(i, j-1) + f(i, j)

ii(i, j) = ii(i-1, j) + s(i, j)

4) When the scan reaches the pixel at the bottom-right corner of the image, the integral image ii is fully constructed.
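A sketch that mirrors the two recurrences literally (in practice, np.cumsum(np.cumsum(f, axis=0), axis=1) produces the same array in one line):

```python
import numpy as np

def integral_image(f: np.ndarray) -> np.ndarray:
    h, w = f.shape
    s = np.zeros((h, w))     # row-direction running sums s(i, j)
    ii = np.zeros((h, w))    # the integral image ii(i, j)
    for i in range(h):
        for j in range(w):
            s[i, j] = (s[i, j - 1] if j > 0 else 0.0) + f[i, j]    # s(i,j) = s(i,j-1) + f(i,j)
            ii[i, j] = (ii[i - 1, j] if i > 0 else 0.0) + s[i, j]  # ii(i,j) = ii(i-1,j) + s(i,j)
    return ii
```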

Once the integral image is constructed, the pixel sum of any rectangular region in the image can be obtained with a simple calculation.

Let the four corner points of region D be α, β, γ, and δ (α and β being the bottom-right and top-left corners, γ and δ the other two, per the original figure); the pixel sum of D can then be expressed as

Dsum = ii(α) + ii(β) - (ii(γ) + ii(δ));

A Haar-like feature value is nothing more than the difference of two such rectangle pixel sums, so it too can be computed in constant time. The feature value of a rectangle feature thus depends only on the integral image values at the corner points of the feature rectangle; no matter how the feature rectangle is scaled, the time spent computing the feature value is constant. Traversing the image once is therefore enough to evaluate the feature values of all sub-windows.
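A sketch of the constant-time rectangle sum (padding the integral image with one zero row/column makes the corner lookups uniform), and a Haar edge feature built from two such sums:

```python
import numpy as np

def rect_sum(ii: np.ndarray, x: int, y: int, w: int, h: int) -> float:
    """Sum of the w-by-h rectangle whose top-left pixel is (x, y)."""
    p = np.pad(ii, ((1, 0), (1, 0)))   # zero border; in real code, pad once up front
    return p[y + h, x + w] - p[y, x + w] - p[y + h, x] + p[y, x]

def edge_feature_fast(ii: np.ndarray, x: int, y: int, w: int, h: int) -> float:
    """V = Sum(white) - Sum(black) from four-corner lookups only."""
    white = rect_sum(ii, x, y, w // 2, h)
    black = rect_sum(ii, x + w // 2, y, w // 2, h)
    return white - black
```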

3. Extension of the Haar-like rectangle features

Lienhart R. et al. further expanded the Haar-like rectangle feature library, adding rectangle features rotated by 45°. The expanded features fall broadly into 4 types: edge features, line features, center-surround features, and diagonal features.

When computing feature values, the weights of the black regions are negative and the weights of the white regions are positive, and the weights are inversely proportional to the rectangle areas (so that the pixel counts of the two kinds of regions balance out);

Feature value calculation for upright rectangles:

For upright rectangles, the calculation is the same as in Section 2 above.

Feature value calculation for rectangles rotated by 45°:

For a rectangle rotated by 45°, define RSAT(x, y) as the sum of the pixels of the 45°-rotated area whose bottom corner lies at (x, y) and which extends upward to the boundaries of the image:

RSAT(x, y) = sum of I(x', y') over all y' <= y and y' <= y - |x - x'|

To save time and avoid repeated computation, it can be computed in a single pass over the image with the recurrence (as given by Lienhart et al.):

RSAT(x, y) = RSAT(x-1, y-1) + RSAT(x+1, y-1) - RSAT(x, y-2) + I(x, y) + I(x, y-1)

where any term that falls outside the image is taken to be 0. The feature value of a rotated rectangle feature is then obtained as the difference of the RSAT(x, y) values at the rectangle's corner points.
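A sketch of RSAT that follows the recurrence above literally; the columns are padded on both sides so that the recurrence's out-of-image references resolve consistently (OpenCV users can instead take the "tilted" array returned by cv2.integral3):

```python
import numpy as np

def rsat(img: np.ndarray) -> np.ndarray:
    h, w = img.shape
    pad = h + 1                       # references drift one column per row
    W = w + 2 * pad
    r = np.zeros((h, W))

    def pix(x, y):                    # image pixel, zero outside the image
        return float(img[y, x]) if 0 <= x < w and 0 <= y < h else 0.0

    for y in range(h):
        for x in range(W):
            xi = x - pad              # x in image coordinates
            left = r[y - 1, x - 1] if y > 0 and x > 0 else 0.0
            right = r[y - 1, x + 1] if y > 0 and x + 1 < W else 0.0
            above = r[y - 2, x] if y > 1 else 0.0
            r[y, x] = left + right - above + pix(xi, y) + pix(xi, y - 1)
    return r[:, pad:pad + w]          # RSAT at the positions inside the image
```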
