Object Detection: HOG Features

Source: Internet
Author: User
Tags: svm

1. HOG Features

The Histogram of Oriented Gradients (HOG) is a feature descriptor used for object detection in computer vision and image processing. It characterizes an image by computing histograms of gradient orientations over local regions of the image. HOG features combined with an SVM classifier have been widely used in image recognition, especially in pedestrian detection. It is worth noting that the HOG+SVM pedestrian detection method was proposed by the French researchers Dalal and Triggs at CVPR 2005; although many pedestrian detection algorithms have been proposed since, most are still based on the HOG+SVM idea.

Realize:

First, the image is divided into small connected regions, which we call cell units. A histogram of the gradient (or edge) orientations of the pixels in each cell is then collected. Finally, these histograms are combined to form the feature descriptor.

Improving performance:

The local histograms are contrast-normalized over a larger region of the image (which we call an interval or block). The method is: first compute the density of the histograms within this block, then normalize each cell unit in the block according to this density. This normalization gives better robustness to illumination changes and shadows.

Advantages:

HOG has many advantages over other feature description methods. First, since HOG operates on local cells of the image, it remains largely invariant to geometric and photometric deformations, which only appear over larger spatial regions. Second, under coarse spatial sampling, fine orientation sampling, and strong local photometric normalization, pedestrians can be allowed some subtle limb movements as long as they remain generally upright; these subtle motions can be ignored without affecting detection. The HOG feature is therefore particularly well suited to human detection in images.

2. The HOG Feature Extraction Process

Outline of the process:

HOG feature extraction takes an image (the target you want to detect, or a scan window) through the following steps:

1) Convert to grayscale (viewing the image as a three-dimensional function of x, y, and intensity z);

2) Standardize the color space of the input image using gamma correction (normalization); the aim is to adjust image contrast, reduce the effect of local shadows and illumination changes, and suppress noise;

3) Compute the gradient (magnitude and orientation) of every pixel of the image, mainly to capture contour information while further weakening illumination interference;

4) Divide the image into small cells (e.g. 6*6 pixels/cell);

5) Count the gradient orientation histogram of each cell (the number of gradients in each orientation, weighted by magnitude) to form each cell's descriptor;

6) Group cells into blocks (for example, 3*3 cells per block); concatenating the descriptors of all cells within a block yields the block's HOG descriptor;

7) Concatenate the HOG descriptors of all blocks in the image (the target you want to detect) to obtain the image's HOG feature descriptor. This is the final feature vector that can be used for classification.
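The seven steps above can be sketched in plain NumPy. This is a minimal, hypothetical sketch, not the authors' implementation: it uses hard bin assignment and a plain L2 block norm, and omits the Gaussian weighting, trilinear interpolation, and L2-Hys clipping of the original method. The parameters follow the 8*8-pixel cell / 2*2-cell block setting discussed in section (6) below.

```python
import numpy as np

def hog_descriptor(gray, cell=8, block=2, n_bins=9, stride=8):
    """Minimal HOG sketch following the seven steps above.

    Assumes `gray` is a float grayscale image scaled to [0, 1] whose
    sides are multiples of `cell`. Hard bin assignment, plain L2 norm.
    """
    g = np.power(gray.astype(float), 0.5)         # 2) gamma compression
    gx = np.zeros_like(g)
    gy = np.zeros_like(g)
    gx[:, 1:-1] = g[:, 2:] - g[:, :-2]            # 3) gradients
    gy[1:-1, :] = g[2:, :] - g[:-2, :]
    mag = np.hypot(gx, gy)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0  # unsigned orientation
    ch, cw = gray.shape[0] // cell, gray.shape[1] // cell
    hists = np.zeros((ch, cw, n_bins))
    for i in range(ch):                           # 4-5) cell histograms
        for j in range(cw):
            m = mag[i * cell:(i + 1) * cell, j * cell:(j + 1) * cell]
            a = ang[i * cell:(i + 1) * cell, j * cell:(j + 1) * cell]
            idx = np.minimum((a / (180.0 / n_bins)).astype(int), n_bins - 1)
            np.add.at(hists[i, j], idx.ravel(), m.ravel())
    feats = []
    s = stride // cell
    for i in range(0, ch - block + 1, s):         # 6) overlapping blocks
        for j in range(0, cw - block + 1, s):
            v = hists[i:i + block, j:j + block].ravel()
            feats.append(v / np.sqrt(np.sum(v ** 2) + 1e-10))
    return np.concatenate(feats)                  # 7) final feature vector

# On a 64*128 window this yields the classic 3780-dimensional descriptor.
desc = hog_descriptor(np.random.rand(128, 64))
print(desc.shape)  # (3780,)
```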

The detailed procedures for each step are as follows:

(1) Standardize gamma space and color space

To reduce the influence of illumination, the whole image must first be normalized. In the texture intensity of an image, local surface exposure contributes a large proportion, so this compression can effectively reduce local shadow and illumination changes. Because color information contributes little, the image is usually converted to grayscale first.

Gamma compression formula:

I(x, y) = I(x, y)^gamma

For example, one can take gamma = 1/2.
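A sketch of this compression in NumPy, assuming the grayscale image has already been scaled to [0, 1] (gamma = 1/2 is simply a square root):

```python
import numpy as np

def gamma_compress(gray, gamma=0.5):
    """Power-law (gamma) compression of a grayscale image.

    `gray` is assumed to be a float array scaled to [0, 1];
    gamma = 1/2 corresponds to taking the square root.
    """
    return np.power(gray, gamma)

# Toy 2x2 "image": square-root compression brightens dark pixels.
img = np.array([[0.04, 0.25],
                [0.49, 1.00]])
out = gamma_compress(img)
print(out)  # square roots: 0.2, 0.5, 0.7, 1.0
```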

(2) Calculate image gradient

Compute the gradients of the image in the horizontal and vertical directions, and from them the gradient orientation at each pixel position. The derivative operation not only captures contours, silhouettes, and some texture information, but also further weakens the influence of illumination.

The gradient at pixel (x, y) of the image is:

Gx(x, y) = H(x+1, y) - H(x-1, y)
Gy(x, y) = H(x, y+1) - H(x, y-1)

where Gx(x, y), Gy(x, y), and H(x, y) denote the horizontal gradient, vertical gradient, and pixel value at (x, y). The gradient magnitude and orientation at (x, y) are then:

G(x, y) = sqrt(Gx(x, y)^2 + Gy(x, y)^2)
alpha(x, y) = arctan(Gy(x, y) / Gx(x, y))

The most common method is: first convolve the original image with the [-1, 0, 1] gradient operator to obtain the gradient component in the x direction (horizontal, positive to the right), then convolve with the [1, 0, -1]^T gradient operator to obtain the gradient component in the y direction (vertical, positive upward). The gradient magnitude and orientation of each pixel are then computed with the formulas above.
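A minimal NumPy sketch of these central differences (implemented with array slicing rather than an explicit convolution routine; border pixels are left at zero):

```python
import numpy as np

def pixel_gradients(img):
    """Central-difference gradients with the [-1, 0, 1] kernel.

    Returns (magnitude, orientation) per pixel; orientation is in
    degrees, folded into [0, 180) as the HOG histogram expects.
    """
    img = img.astype(float)
    gx = np.zeros_like(img)
    gy = np.zeros_like(img)
    gx[:, 1:-1] = img[:, 2:] - img[:, :-2]   # H(x+1, y) - H(x-1, y)
    gy[1:-1, :] = img[2:, :] - img[:-2, :]   # H(x, y+1) - H(x, y-1)
    mag = np.hypot(gx, gy)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0  # unsigned orientation
    return mag, ang

# A vertical step edge: the gradient points horizontally (0 degrees).
img = np.array([[0, 0, 1, 1]] * 3, dtype=float)
mag, ang = pixel_gradients(img)
print(mag[1, 1], ang[1, 1])  # 1.0 0.0
```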

(3) Build gradient histogram for each cell unit

The goal of the third step is to provide an encoding of local image regions while remaining only weakly sensitive to the pose and appearance of the human objects in the image.

We divide the image into several cells, e.g. 6*6 pixels per cell. Suppose we use a 9-bin histogram to collect the gradient information of these 6*6 pixels: the cell's gradient orientation range of 180 degrees is divided into 9 orientation bins. For example, if a pixel's gradient orientation lies in 20-40 degrees, the count of the 2nd bin is incremented. Projecting every pixel in the cell into the histogram in this way (a weighted mapping into fixed angle ranges) yields the cell's gradient orientation histogram, i.e. the cell's 9-dimensional feature vector (since there are 9 bins).

We have used the pixel's gradient orientation; what about the gradient magnitude? The magnitude serves as the projection weight. For example, if a pixel's gradient orientation is 20-40 degrees and its gradient magnitude is 2 (say), then the 2nd bin's count is increased not by one but by two.
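A sketch of this magnitude-weighted binning for a single cell (hard assignment into bins; the original method also interpolates between neighboring bins):

```python
import numpy as np

def cell_histogram(mag, ang, n_bins=9):
    """9-bin orientation histogram of one cell, weighted by magnitude.

    `ang` is in [0, 180); bin width is 180 / n_bins = 20 degrees, so a
    pixel with orientation in [20, 40) and magnitude 2 adds 2 to bin 1.
    """
    bin_width = 180.0 / n_bins
    idx = np.minimum((ang / bin_width).astype(int), n_bins - 1)
    hist = np.zeros(n_bins)
    np.add.at(hist, idx.ravel(), mag.ravel())  # accumulate weights per bin
    return hist

# One pixel at 30 degrees with magnitude 2 lands in bin 1 with weight 2.
hist = cell_histogram(np.array([[2.0]]), np.array([[30.0]]))
print(hist)  # bin 1 holds 2.0, all other bins 0
```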

Cells can be rectangular or radial (star-shaped).

(4) Combine cell units into larger blocks and normalize the gradient histograms within each block

Because of local illumination changes and variations in foreground-background contrast, gradient magnitudes vary over a very wide range. This requires normalizing the gradient magnitudes. Normalization further compresses the effects of lighting, shadows, and edges.

The authors' approach is to group the cell units into larger, spatially connected blocks. The HOG feature of a block is obtained by concatenating the feature vectors of all the cells in the block. These blocks overlap, which means that each cell's features appear multiple times, with different normalizations, in the final feature vector. We call the normalized block descriptor (vector) the HOG descriptor.

Blocks come in two main geometries: rectangular (R-HOG) and circular (C-HOG). An R-HOG block is essentially a square grid, characterized by three parameters: the number of cells per block, the number of pixels per cell, and the number of histogram channels per cell.

For example, the best parameter setting for pedestrian detection was: 3*3 cells per block, 6*6 pixels per cell, and 9 histogram channels. The number of features per block is then 3*3*9.
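A sketch of building one block descriptor under these parameters (plain L2 normalization; the paper's preferred L2-Hys scheme additionally clips and renormalizes):

```python
import numpy as np

def normalize_block(cell_hists, eps=1e-5):
    """Concatenate the per-cell histograms of one block and
    L2-normalize the result (plain L2, not L2-Hys)."""
    v = np.concatenate(cell_hists)
    return v / np.sqrt(np.sum(v ** 2) + eps ** 2)

# 3*3 cells per block, 9 bins per cell -> 3*3*9 = 81 features per block.
cells = [np.ones(9) for _ in range(9)]
desc = normalize_block(cells)
print(desc.shape)  # (81,)
```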

(5) Collecting HOG features

The final step collects the HOG features of all overlapping blocks in the detection window and combines them into the final feature vector.

(6) What is the HOG feature dimensionality of an image?

Incidentally, Dalal's proposed HOG extraction process is: the sample image is divided into cells of several pixels each, and the gradient orientation range is divided into 9 bins. Within each cell, a histogram of the gradient orientations of all pixels is accumulated, giving a 9-dimensional feature vector. Each group of 2*2 adjacent cells constitutes a block, and combining the feature vectors within a block gives a 36-dimensional feature vector. The block is scanned across the sample image with a stride of one cell, and finally the features of all blocks are concatenated to obtain the feature of the human body. For example, for a 64*128 image, with 8*8 pixels per cell and 2*2 cells per block, each block has 4*9 = 36 features. With an 8-pixel stride, there are 7 scan positions in the horizontal direction and 15 in the vertical direction. That is, a 64*128 image has 36*7*15 = 3780 features in total.
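The arithmetic in this example can be checked directly:

```python
# Feature count for a 64*128 detection window:
# 8*8-pixel cells, 2*2 cells per block (16*16 pixels), 8-pixel block stride.
img_w, img_h = 64, 128
cell, cells_per_block, stride, n_bins = 8, 2, 8, 9
block = cell * cells_per_block                 # blocks are 16 pixels wide
blocks_x = (img_w - block) // stride + 1       # 7 positions horizontally
blocks_y = (img_h - block) // stride + 1       # 15 positions vertically
features = blocks_x * blocks_y * cells_per_block ** 2 * n_bins
print(blocks_x, blocks_y, features)  # 7 15 3780
```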

Transferred from: http://blog.csdn.net/zouxy09/article/details/7929348

Correction: when building the gradient orientation histogram for each cell unit, the orientation range should be 180 degrees, not 360 degrees.
