Histograms of Oriented Gradients (HOG)

The histogram of oriented gradients (HOG) is a feature descriptor used for object detection in computer vision and image processing. The method relies on the gradient orientation information of the image itself. It is similar to edge orientation histograms, SIFT descriptors, and shape contexts, but differs in that it is computed on a dense grid of uniformly spaced cells and uses overlapping local contrast normalization to improve accuracy.

The method was introduced by Navneet Dalal and Bill Triggs, researchers at the French National Institute for Research in Computer Science and Automation (INRIA), in a paper published at CVPR 2005. They mainly applied the method to pedestrian detection in static images, and later also applied it to pedestrian detection in film and video, as well as to vehicles and common animals in static images.

The key idea behind the HOG descriptor is that the appearance and shape of a local object in an image can be described well by the distribution of intensity gradients or edge directions. In practice, the image is first divided into small connected regions called cells. A histogram of gradient directions (or edge orientations) is then compiled over the pixels of each cell. Finally, these histograms are combined to form the feature descriptor. To improve performance, the local histograms can be contrast-normalized over a larger region of the image, called a block: a measure of the intensity across the block is computed first, and all cells in the block are then normalized by this value. This normalization gives better invariance to changes in illumination and shadowing.

Compared with other feature descriptors, HOG has notable advantages. In particular, because it operates on local cells of the image, it remains largely invariant to geometric and photometric transformations of the image; such changes only become apparent over larger spatial regions.

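The image processing flow is: color and gamma normalization, gradient computation, construction of an orientation histogram for each cell, grouping of cells into blocks, and block normalization.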

The relationship between cells and blocks can be illustrated with an example. Assume that the image is 40*40 pixels, each cell is 8*8 pixels, and each block contains 2*2 cells. The image then holds 5*5 cells, and sliding the block across the image one cell at a time gives 4*4 overlapping blocks.
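The block count in this example can be checked with a short calculation. The sketch below is only illustrative: the function names `count_blocks` and `descriptor_length` and the block stride of one cell are assumptions made here, and the 9 orientation bins per cell are the value the text gives for human detection.

```python
def count_blocks(image_size, cell_size, cells_per_block, stride_cells=1):
    """Return (cells_per_side, blocks_per_side) for a square image.

    stride_cells is how many cells the block moves per step (assumed 1).
    """
    cells_per_side = image_size // cell_size
    blocks_per_side = (cells_per_side - cells_per_block) // stride_cells + 1
    return cells_per_side, blocks_per_side


def descriptor_length(blocks_per_side, cells_per_block, num_bins=9):
    """Length of the concatenated descriptor over all blocks."""
    return blocks_per_side ** 2 * cells_per_block ** 2 * num_bins


cells, blocks = count_blocks(image_size=40, cell_size=8, cells_per_block=2)
print(cells, blocks)                  # 5 4
print(descriptor_length(blocks, 2))   # 576
```

With a stride of two cells instead, the blocks would no longer overlap and the same image would yield only 2*2 blocks.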

In addition, the gradient orientations in each cell are quantized into Z orientation bins. Each pixel votes into these bins according to its gradient orientation, weighted by its gradient magnitude, so each cell yields a Z-dimensional feature vector. For human detection, Dalal and Triggs chose Z = 9. With unsigned gradients, opposite directions share a bin, so the full 360 degrees of directions folds onto 9 bins of 20 degrees each over the 0°-180° range.
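The mapping from a gradient vector to one of the Z bins can be sketched as follows. The function name `orientation_bin` and the hard assignment to a single bin are assumptions for illustration; the text does not specify how votes near bin boundaries are handled.

```python
import math


def orientation_bin(gx, gy, num_bins=9, signed=False):
    """Map a gradient vector (gx, gy) to an orientation bin index.

    Unsigned gradients fold opposite directions together (0-180 degrees);
    signed gradients use the full 0-360 degree range.
    """
    span = 360.0 if signed else 180.0
    angle = math.degrees(math.atan2(gy, gx)) % span
    bin_width = span / num_bins
    # The final modulo guards against a rounding result equal to num_bins.
    return int(angle // bin_width) % num_bins


print(orientation_bin(1, 0))                # 0
print(orientation_bin(-1, 0))               # 0  (opposite direction, same bin)
print(orientation_bin(0, 1))                # 4
print(orientation_bin(-1, 0, signed=True))  # 4
```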

The implementation of the algorithm:

(1) Color and gamma normalization

Dalal and Triggs evaluated color and gamma normalization in grayscale, RGB, and LAB color spaces. Their experiments showed that this preprocessing has little effect on the final result, possibly because the normalization performed in later steps achieves the same thing. In practice, this step can therefore be omitted.

(2) Gradient computation

The most common approach is to apply a 1-D centered point discrete derivative mask in the horizontal direction, the vertical direction, or both. More precisely, the color or intensity data of the image is filtered with the following kernels: [-1, 0, 1] and [-1, 0, 1]^T.

The authors also tried more complex masks, such as the 3x3 Sobel mask and diagonal masks, but these performed worse in the pedestrian detection experiments, so the authors concluded that simpler masks work better. They also tried Gaussian smoothing before applying the derivative mask, but this too reduced detection performance, because much of the useful image information comes from sharp edges, which the Gaussian filter blurs before the gradient is computed.
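A minimal sketch of this filtering step on a grayscale image stored as a list of rows is shown below. The function name `gradients` and the border handling (edge pixels are replicated) are assumptions; the text does not say how image borders are treated.

```python
import math


def gradients(image):
    """Filter a 2-D list of intensities with [-1, 0, 1] in x and y.

    Returns per-pixel gradient magnitude and unsigned orientation in
    degrees (0 <= angle < 180). Border pixels reuse the nearest valid
    neighbour, which is an assumption of this sketch.
    """
    h, w = len(image), len(image[0])
    magnitude = [[0.0] * w for _ in range(h)]
    angle = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            gx = image[y][min(x + 1, w - 1)] - image[y][max(x - 1, 0)]
            gy = image[min(y + 1, h - 1)][x] - image[max(y - 1, 0)][x]
            magnitude[y][x] = math.hypot(gx, gy)
            angle[y][x] = math.degrees(math.atan2(gy, gx)) % 180.0
    return magnitude, angle


# A vertical edge: intensity jumps from 0 to 10 between columns 1 and 2.
image = [[0, 0, 10, 10] for _ in range(3)]
magnitude, angle = gradients(image)
print(magnitude[1][1], angle[1][1])  # 10.0 0.0
```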

(3) Building the orientation histograms

The third step is to build an orientation histogram for each cell of the image. Each pixel in the cell casts a vote for an orientation-based histogram channel. The vote is weighted, with the weight computed from the gradient magnitude at that pixel. The weight can be the magnitude itself or a function of it, such as its square root, its square, or a clipped version of the magnitude; in the authors' tests, the magnitude itself gave the best results. Cells can be rectangular or radial in shape. The histogram channels are spread evenly over 0°-180° (unsigned gradient) or 0°-360° (signed gradient). The authors found that unsigned gradients with 9 histogram channels performed best in their pedestrian detection experiments.
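The weighted voting can be sketched as below. The function name `cell_histogram` and its `weight` parameter are hypothetical, and each pixel votes into a single bin here for simplicity; this is an assumption of the sketch rather than a detail given in the text.

```python
import math


def cell_histogram(magnitudes, angles, num_bins=9, weight=lambda m: m):
    """Accumulate magnitude-weighted votes for the pixels of one cell.

    magnitudes, angles: flat lists for the cell's pixels, with unsigned
    angles in degrees (0 <= angle < 180).
    weight: function of the magnitude, e.g. identity, math.sqrt, or squaring.
    """
    bin_width = 180.0 / num_bins
    hist = [0.0] * num_bins
    for m, a in zip(magnitudes, angles):
        hist[int(a // bin_width) % num_bins] += weight(m)
    return hist


hist = cell_histogram([2.0, 3.0, 4.0], [10.0, 95.0, 99.0])
print(hist)  # [2.0, 0.0, 0.0, 0.0, 7.0, 0.0, 0.0, 0.0, 0.0]

# Same interface with the square root of the magnitude as the vote weight.
hist_sqrt = cell_histogram([4.0, 9.0], [0.0, 170.0], weight=math.sqrt)
```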

(4) Grouping the cells into larger blocks

Gradient strengths vary over a wide range owing to local variations in illumination and foreground-background contrast, so they need to be normalized locally. The authors do this by grouping cells into larger, spatially connected blocks. Blocks generally overlap, so each cell contributes to more than one block. Two block geometries are used: rectangular blocks (R-HOG) and circular blocks (C-HOG).
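For rectangular blocks, the grouping amounts to sliding a window over the grid of cell histograms and concatenating the histograms inside it. The sketch below assumes a hypothetical helper named `group_into_blocks` and a default stride of one cell.

```python
def group_into_blocks(cell_hists, cells_per_block=2, stride=1):
    """Group a 2-D grid of per-cell histograms into overlapping blocks.

    cell_hists: list of rows, each row a list of histograms (lists).
    Returns a list of block vectors; each is the concatenation of the
    histograms of cells_per_block x cells_per_block neighbouring cells.
    """
    rows, cols = len(cell_hists), len(cell_hists[0])
    blocks = []
    for r in range(0, rows - cells_per_block + 1, stride):
        for c in range(0, cols - cells_per_block + 1, stride):
            vec = []
            for dr in range(cells_per_block):
                for dc in range(cells_per_block):
                    vec.extend(cell_hists[r + dr][c + dc])
            blocks.append(vec)
    return blocks


# 5x5 cells with 9 bins each, as in a 40*40 image with 8*8 cells.
grid = [[[0.0] * 9 for _ in range(5)] for _ in range(5)]
blocks = group_into_blocks(grid)
print(len(blocks), len(blocks[0]))  # 16 36
```

Each block vector would then be normalized on its own before all blocks are concatenated into the final descriptor.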

R-HOG blocks look very similar to SIFT descriptors. The difference is that R-HOG blocks are computed on dense grids at a single scale without orientation alignment, whereas SIFT descriptors are computed at sparse, scale-invariant key image points and are rotated to align orientation. In addition, R-HOG blocks are used in conjunction to encode spatial form information, while SIFT descriptors are used singly.

C-HOG blocks come in two variants: one with a single, undivided central cell and one with an angularly divided central cell.

The authors found that both C-HOG variants perform equally well. A C-HOG block can be described by four parameters: the number of angular bins, the number of radial bins, the radius of the center bin, and the expansion factor for the radius of additional radial bins. In their experiments, the best parameters for pedestrian detection were 4 angular bins, 2 radial bins, a center bin radius of 4 pixels, and an expansion factor of 2. For R-HOG, applying a Gaussian spatial window over the block is important, but for C-HOG this does not seem necessary. C-HOG resembles shape-context-based methods, but differs in that the cells of a C-HOG block contain multiple orientation channels, whereas shape contexts use only a single edge presence count.
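To make the four parameters concrete, the sketch below assigns a pixel offset from the block centre to a (radial, angular) bin. It is only one possible reading: the function name `chog_bin` is hypothetical, the variant with an angularly divided central cell is used, and the outer radius of each ring is assumed to be the previous radius times the expansion factor.

```python
import math


def chog_bin(dx, dy, angular_bins=4, radial_bins=2,
             center_radius=4.0, expansion=2.0):
    """Return (radial_index, angular_index) for an offset (dx, dy) from
    the block centre, or None if it lies outside the outermost ring.

    Assumes ring k ends at center_radius * expansion**k.
    """
    r = math.hypot(dx, dy)
    outer = center_radius
    for radial_index in range(radial_bins):
        if r < outer:
            theta = math.degrees(math.atan2(dy, dx)) % 360.0
            angular_index = int(theta // (360.0 / angular_bins)) % angular_bins
            return radial_index, angular_index
        outer *= expansion
    return None


print(chog_bin(1, 0))   # (0, 0)
print(chog_bin(0, 5))   # (1, 1)
print(chog_bin(9, 0))   # None
```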

(5) Block normalization

The authors compared four different methods for block normalization. Let $v$ be the unnormalized vector containing all the histograms of a given block, let $\|v\|_k$ be its $k$-norm for $k = 1, 2$, and let $e$ be a small constant. The normalization factor can then take one of the following forms:

L2-norm: $f = \dfrac{v}{\sqrt{\|v\|_2^2 + e^2}}$

L1-norm: $f = \dfrac{v}{\|v\|_1 + e}$

L1-sqrt: $f = \sqrt{\dfrac{v}{\|v\|_1 + e}}$

There is also a fourth method, L2-hys, which is the L2-norm followed by clipping and then renormalization. The authors found that L2-hys, L2-norm, and L1-sqrt give similar results, while L1-norm is slightly less reliable. All four methods, however, significantly improve on unnormalized data.
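The four schemes can be sketched in a few lines. The function names, the default value of the small constant `e`, and the clipping threshold `clip=0.2` are assumptions of this sketch; the text above does not state a clipping value. The L1-sqrt variant assumes non-negative entries, which holds for histograms of weighted votes.

```python
import math


def l2_norm(v, e=1e-6):
    """v / sqrt(||v||_2^2 + e^2)"""
    denom = math.sqrt(sum(x * x for x in v) + e * e)
    return [x / denom for x in v]


def l1_norm(v, e=1e-6):
    """v / (||v||_1 + e)"""
    denom = sum(abs(x) for x in v) + e
    return [x / denom for x in v]


def l1_sqrt(v, e=1e-6):
    """Element-wise square root of the L1-normalized vector."""
    return [math.sqrt(x) for x in l1_norm(v, e)]


def l2_hys(v, e=1e-6, clip=0.2):
    """L2-norm, clip each entry at `clip` (assumed value), renormalize."""
    clipped = [min(x, clip) for x in l2_norm(v, e)]
    return l2_norm(clipped, e)


block = [3.0, 4.0]
print([round(x, 3) for x in l2_norm(block)])  # [0.6, 0.8]
print([round(x, 3) for x in l2_hys(block)])   # [0.707, 0.707]
```

In the example, both entries exceed the clipping threshold after the first L2 normalization, so L2-hys flattens them to equal values; this limits the influence of a few very strong gradients on the block vector.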
