Histogram of Oriented Gradients (HOG)


Based on the papers I have read this week, I will share my understanding of the Histogram of Oriented Gradients (HOG) method:

The HOG descriptor is a feature descriptor used for object detection in computer vision and image processing. The technique counts occurrences of gradient orientations in localized portions of an image. It is similar to edge orientation histograms, scale-invariant feature transform (SIFT) descriptors, and shape contexts, but differs from them in that the HOG descriptor is computed on a dense grid of uniformly spaced cells and, to improve performance, uses overlapping local contrast normalization.

Navneet Dalal and Bill Triggs, the authors of the paper, are researchers at the French National Institute for Research in Computer Science and Control (INRIA). They first proposed the HOG method in this paper, which was published at CVPR in 2005. They mainly applied the method to pedestrian detection in static images, but later extended it to pedestrian detection in film and video, and to the detection of vehicles and common animals in static images.

The central idea behind the HOG descriptor is that the appearance and shape of a local object in an image can be well described by the distribution of gradient or edge directions. The concrete implementation is: first, divide the image into small connected regions, called cells; then, collect a histogram of gradient directions (or edge orientations) over the pixels of each cell; finally, concatenate these histograms to form the feature descriptor. To improve performance, the local histograms can also be contrast-normalized over a larger region of the image, called a block: compute a measure of the histogram energy across the block, then normalize each cell in the block by this value. This normalization gives better invariance to changes in illumination and shadowing.
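The pipeline described above can be sketched in a few lines of NumPy. This is a deliberately simplified illustration (centered gradients, magnitude-weighted 9-bin histograms over 8×8 cells, no block normalization), not the authors' exact implementation:

```python
import numpy as np

def hog_features(img, cell=8, nbins=9):
    """Simplified HOG sketch: centered gradients, then a
    magnitude-weighted orientation histogram for each cell."""
    img = img.astype(float)
    gx = np.zeros_like(img)
    gy = np.zeros_like(img)
    gx[:, 1:-1] = img[:, 2:] - img[:, :-2]       # [-1, 0, 1] horizontally
    gy[1:-1, :] = img[2:, :] - img[:-2, :]       # [-1, 0, 1] vertically
    mag = np.hypot(gx, gy)                       # gradient magnitude
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180   # unsigned orientation

    h_cells, w_cells = img.shape[0] // cell, img.shape[1] // cell
    feats = []
    for i in range(h_cells):
        for j in range(w_cells):
            m = mag[i*cell:(i+1)*cell, j*cell:(j+1)*cell].ravel()
            a = ang[i*cell:(i+1)*cell, j*cell:(j+1)*cell].ravel()
            hist, _ = np.histogram(a, bins=nbins, range=(0, 180), weights=m)
            feats.append(hist)
    return np.concatenate(feats)

img = np.outer(np.arange(16), np.ones(16))  # vertical intensity ramp
f = hog_features(img)                       # 2x2 cells x 9 bins = 36 values
```

For the ramp image, all the gradient energy falls into the 90° orientation bin, as expected for horizontal edges.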

Compared with other feature description methods, the HOG descriptor has several advantages. First, because HOG operates on local cells of the image, it remains largely invariant to geometric and photometric transformations, since such transformations only appear over larger spatial regions. Second, the authors found experimentally that the combination of coarse spatial sampling, fine orientation sampling, and strong local photometric normalization allows a pedestrian's small body movements to be ignored, as long as the pedestrian maintains a roughly upright posture, without affecting detection. In conclusion, the HOG method is particularly well suited to pedestrian detection in images.

The paper illustrates a pedestrian detection test with a figure: (a) shows the average gradient over all training images; (b) and (c) show the maximum positive and negative SVM weights on each block; (d) shows a test image; (e) the computed R-HOG of the test image; (f) and (g) show the R-HOG weighted by the positive and negative SVM weights, respectively.

Algorithm Implementation:

Color and Gamma Normalization

The authors normalized the color and gamma of the images in grayscale, RGB color space, and LAB color space. However, the experimental results show that this preprocessing has no effect on the final results, possibly because the normalization performed in later steps makes it redundant. This step can therefore be omitted in practical applications.

Gradient Computation

The most common method is to apply a 1-D centered discrete derivative mask to the image in the horizontal direction, the vertical direction, or both. Specifically, the color or intensity data of the image is filtered with the kernel [-1, 0, 1] (and its transpose for the vertical direction).

The authors also tried more complex masks, such as the 3×3 Sobel mask and diagonal masks, but these performed worse in the pedestrian detection experiments; their conclusion was that the simpler the mask, the better the results. They also tried applying a Gaussian smoothing filter before the derivative mask, but this too degraded detection performance: much of the useful image information lies in the sharp edges, which Gaussian filtering smooths away before the gradient is computed.
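The gradient step can be sketched with plain NumPy slicing, applying the [-1, 0, 1] mask directly and skipping any Gaussian pre-smoothing, in line with the authors' findings. The toy image below is an assumption for illustration:

```python
import numpy as np

# A tiny test image: intensity increases left to right.
img = np.array([[0, 1, 2, 3, 4],
                [0, 1, 2, 3, 4],
                [0, 1, 2, 3, 4]], dtype=float)

# 1-D centered derivative mask [-1, 0, 1], no smoothing beforehand.
gx = np.zeros_like(img)
gx[:, 1:-1] = img[:, 2:] - img[:, :-2]   # horizontal gradient
gy = np.zeros_like(img)
gy[1:-1, :] = img[2:, :] - img[:-2, :]   # vertical gradient

magnitude = np.hypot(gx, gy)
orientation = np.rad2deg(np.arctan2(gy, gx))
```

For this left-to-right ramp, the interior horizontal gradient is a constant 2 and the vertical gradient is zero, so the orientation is 0° everywhere the magnitude is nonzero.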

Creating the Orientation Histograms

The third step is to build a gradient orientation histogram for each cell of the image. Each pixel in a cell casts a weighted vote for an orientation-based histogram channel, with the weight computed from the gradient magnitude at that pixel: either the magnitude itself or some function of it. Testing showed that the magnitude itself gives the best results, although functions of it can also be used, such as the square root of the magnitude, the square of the magnitude, or a clipped version of the magnitude. The cells themselves can be rectangular or radial (star-shaped). The histogram channels are spread evenly over 0°–180° (unsigned gradients) or 0°–360° (signed gradients). The authors found that unsigned gradients with nine histogram channels gave the best results in their pedestrian detection experiments.
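The weighted voting for one cell can be sketched as a magnitude-weighted histogram over the unsigned orientation range. The random orientations and magnitudes below stand in for the per-pixel gradient values of a hypothetical 8×8 cell:

```python
import numpy as np

nbins = 9  # nine channels over 0-180 degrees (unsigned gradients)

# Hypothetical per-pixel gradients for one 8x8 cell (64 pixels).
rng = np.random.default_rng(1)
ang = rng.uniform(0, 180, size=64)   # gradient orientations, degrees
mag = rng.uniform(0, 5, size=64)     # gradient magnitudes

# Each pixel votes into its orientation bin, weighted by magnitude.
hist, bin_edges = np.histogram(ang, bins=nbins, range=(0, 180), weights=mag)
```

Because every pixel's full magnitude lands in exactly one bin here, the histogram sums to the total gradient magnitude of the cell. (The paper additionally interpolates votes between neighboring bins, which this sketch omits.)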

Grouping the Cells into Larger Blocks

Due to variations in local illumination and in foreground-background contrast, gradient strengths vary over a wide range, so the gradient magnitudes need to be locally normalized. The authors' method is to group the cells into larger, spatially connected blocks. The HOG descriptor then becomes the vector of all cell histogram components from every block. These blocks overlap, so each cell contributes to the final descriptor more than once. Blocks come in two main geometries: rectangular (R-HOG) and circular (C-HOG). An R-HOG block is essentially a square grid, characterized by three parameters: the number of cells per block, the number of pixels per cell, and the number of histogram channels per cell. The experiments show that the best parameters for pedestrian detection are 3×3 cells per block, 6×6 pixels per cell, and 9 histogram channels. The authors also found it helpful to apply a Gaussian spatial window to each block before accumulating the histograms, which reduces the weight of pixels near the block's edges.
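Grouping cells into overlapping R-HOG blocks can be sketched as a sliding window over the grid of cell histograms. The 6×6 cell grid and the L2 normalization here are illustrative assumptions (the Gaussian spatial window is omitted):

```python
import numpy as np

# Hypothetical grid of per-cell histograms: 6x6 cells, 9 bins each.
rng = np.random.default_rng(2)
cell_hists = rng.uniform(0, 1, size=(6, 6, 9))

def rhog_blocks(cells, block=3, eps=1e-5):
    """Slide a block x block window over the cell grid (stride of one
    cell), concatenate the histograms inside, L2-normalize each block."""
    ny, nx, nb = cells.shape
    feats = []
    for i in range(ny - block + 1):
        for j in range(nx - block + 1):
            v = cells[i:i+block, j:j+block].ravel()
            feats.append(v / np.sqrt((v * v).sum() + eps**2))
    return np.array(feats)

blocks = rhog_blocks(cell_hists)  # 4x4 positions, each 3*3*9 = 81 values
```

The stride of one cell is what makes the blocks overlap, so each cell's histogram appears in up to nine different normalized blocks.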

R-HOG and SIFT descriptors look very similar, but they differ: R-HOG descriptors are computed in dense grids at a single scale, without orientation alignment, whereas SIFT descriptors are computed at sparse, scale-invariant key image points and are rotated to align their orientation. In addition, R-HOG blocks are used in conjunction to encode spatial form information, while SIFT descriptors are used singly.

C-HOG blocks come in two variants, differing in whether the central cell is kept whole or divided angularly.

The authors found that the two C-HOG variants perform equally well. A C-HOG block can be characterized by four parameters: the number of angular bins, the number of radial bins, the radius of the central bin, and the expansion factor for the radii. The experiments showed the best parameters for pedestrian detection to be 4 angular bins, 2 radial bins, a central bin radius of 4 pixels, and an expansion factor of 2. As mentioned above, applying a Gaussian spatial window is quite helpful for R-HOG, but for C-HOG it appears unnecessary. C-HOG resembles the shape context approach, but differs in that the cells in a C-HOG block contain multiple orientation channels, whereas the shape context method uses only a single edge-presence count.

Block Normalization

The authors used four different methods to normalize the blocks and compared the results. Let v denote the unnormalized vector containing all the histograms of a given block, let ||v||_k denote its k-norm for k = 1, 2, and let e be a small constant. The normalization schemes can then be expressed as follows:

L2-norm: f = v / sqrt(||v||_2^2 + e^2)

L1-norm: f = v / (||v||_1 + e)

L1-sqrt: f = sqrt(v / (||v||_1 + e))

There is also a fourth normalization method, L2-Hys: first apply the L2-norm, then clip the result (limiting the maximum values to 0.2), and renormalize. The authors found that L2-Hys, L2-norm, and L1-sqrt perform equally well, while L1-norm is slightly less reliable. All four methods, however, show significant improvement over leaving the data unnormalized.
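The four schemes can be written directly from the definitions above; this is a sketch with an illustrative block vector and the 0.2 clipping threshold used for L2-Hys:

```python
import numpy as np

eps = 1e-5  # the small constant e, to avoid division by zero

def l2(v):
    return v / np.sqrt((v * v).sum() + eps**2)

def l1(v):
    return v / (np.abs(v).sum() + eps)

def l1_sqrt(v):
    return np.sqrt(l1(v))

def l2_hys(v, clip=0.2):
    v = np.minimum(l2(v), clip)  # L2-normalize, then clip large values...
    return l2(v)                 # ...and renormalize

v = np.array([3.0, 4.0, 0.0, 12.0])  # a toy unnormalized block vector
```

Clipping in L2-Hys caps the influence of any single dominant gradient before the final renormalization, which is what makes it more robust than the plain L2-norm.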

SVM Classifier

The final step is to feed the extracted HOG features into an SVM classifier, which finds an optimal hyperplane to use as the decision function. The authors used the freely available SVMLight package together with their HOG detector to find pedestrians in test images.
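The classification step can be sketched with a minimal linear SVM trained by subgradient descent on the hinge loss. This is a stand-in for SVMLight, not the authors' setup; the two Gaussian clusters below are assumptions playing the role of HOG features from pedestrian and background windows:

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy stand-ins for HOG feature vectors: two separable clusters.
pos = rng.normal(2.0, 0.5, size=(50, 4))    # "pedestrian" windows
neg = rng.normal(-2.0, 0.5, size=(50, 4))   # "background" windows
X = np.vstack([pos, neg])
y = np.hstack([np.ones(50), -np.ones(50)])

# Linear SVM via subgradient descent on the regularized hinge loss:
#   L(w, b) = lam/2 * ||w||^2 + (1/n) * sum_i max(0, 1 - y_i (x_i.w + b))
w, b = np.zeros(4), 0.0
lam, lr, n = 0.01, 0.1, len(y)
for _ in range(500):
    viol = y * (X @ w + b) < 1  # samples violating the margin
    grad_w = lam * w - (y[viol, None] * X[viol]).sum(axis=0) / n
    grad_b = -y[viol].sum() / n
    w -= lr * grad_w
    b -= lr * grad_b

accuracy = (np.sign(X @ w + b) == y).mean()
```

The learned hyperplane (w, b) is the decision function: a detection window is classified as containing a pedestrian when its feature vector falls on the positive side.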
