The histogram of oriented gradients (HOG) is a dense descriptor computed over local, overlapping regions of an image; it characterizes each region by a histogram of gradient orientations. HOG features combined with an SVM classifier have been widely used in image recognition, especially in pedestrian detection. It is worth recalling that the HOG+SVM pedestrian detection method was proposed by the French researcher Dalal at CVPR 2005, and although many pedestrian detection algorithms have been proposed since, most are still based on the HOG+SVM idea.
HOG is a local region descriptor: it characterizes the human form by computing histograms of gradient orientations over local regions, which capture the edges of the body. It is insensitive to illumination changes and to small offsets.
The gradient of the pixel (x, y) in image H is

Gx(x, y) = H(x+1, y) − H(x−1, y)
Gy(x, y) = H(x, y+1) − H(x, y−1)

where Gx and Gy are the horizontal and vertical gradient components. The gradient magnitude and orientation at (x, y) are then

G(x, y) = sqrt(Gx(x, y)² + Gy(x, y)²)
α(x, y) = arctan(Gy(x, y) / Gx(x, y))
The HOG feature extraction process proposed by Dalal: the sample image is divided into cells of 8×8 pixels, and the gradient orientation range is divided into 9 intervals (bins). Within each cell, a histogram over the 9 orientation bins is accumulated from the gradient orientations of all pixels in that cell, giving a 9-dimensional feature vector. Each group of 2×2 adjacent cells forms a block, and the feature vectors of the cells in a block are concatenated into a 36-dimensional feature vector. The sample image is then scanned with blocks, with a scanning stride of one cell; finally, the features of all blocks are concatenated to give the feature of the whole body. For example, for a 64×128 image, each block consists of 2×2 cells (16×16 pixels) and contains 4×9 = 36 features; with a stride of 8 pixels there are 7 block positions horizontally and 15 vertically, so a 64×128 image yields 36×7×15 = 3780 features in total.
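The dimensionality arithmetic above can be checked with a small sketch (assuming square cells, 2×2-cell blocks, and a one-cell stride, as in Dalal's 64×128 window):

```python
def hog_dim(win_w, win_h, cell=8, block_cells=2, stride=8, bins=9):
    """Number of HOG features for one detection window."""
    block_px = cell * block_cells                       # block size in pixels (16)
    blocks_x = (win_w - block_px) // stride + 1         # block positions horizontally
    blocks_y = (win_h - block_px) // stride + 1         # block positions vertically
    feats_per_block = block_cells * block_cells * bins  # 4 cells x 9 bins = 36
    return blocks_x * blocks_y * feats_per_block

print(hog_dim(64, 128))  # 7 * 15 * 36 = 3780
```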
In pedestrian detection, besides the HOG feature extraction process described above, the pipeline also includes steps such as converting the color image to grayscale and brightness correction. To summarize, the steps for computing HOG features in pedestrian detection are:
(1) Convert the input color image into a grayscale image;
(2) Standardize (normalize) the color space of the input image using gamma correction; the aim is to adjust the contrast of the image, reduce the influence of local shadows and illumination changes, and suppress noise;
(3) Compute the gradients; this mainly captures contour information while further weakening the influence of illumination;
(4) Accumulate each pixel's gradient into the orientation histogram of its cell; this provides an encoding of the local image region;
(5) Normalize the cells within each block. Normalization further compresses the effects of illumination, shadows, and edges. Usually each cell is shared by several different blocks, but it is normalized with respect to each block separately, so the results differ: the features of one cell appear in the final vector multiple times with different values. The normalized block descriptor is called the HOG descriptor;
(6) Collect the HOG features of all blocks in the detection window: gather all overlapping blocks in the detection window and concatenate their descriptors into the final feature vector, which is the HOG feature.
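The steps above can be sketched end-to-end in a minimal numpy implementation. This is illustrative only, not the exact Dalal pipeline: it uses unsigned gradients, hard binning (no vote interpolation), L2 block normalization, and skips the gamma step.

```python
import numpy as np

def hog(gray, cell=8, block_cells=2, bins=9):
    """Minimal HOG sketch: gradients -> per-cell histograms ->
    L2-normalized overlapping blocks -> concatenated descriptor."""
    g = gray.astype(np.float64)
    # step (3): gradients with the [-1, 0, 1] mask
    gx = np.zeros_like(g); gy = np.zeros_like(g)
    gx[:, 1:-1] = g[:, 2:] - g[:, :-2]
    gy[1:-1, :] = g[2:, :] - g[:-2, :]
    mag = np.hypot(gx, gy)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0        # unsigned: 0..180
    # step (4): per-cell orientation histograms
    ch, cw = g.shape[0] // cell, g.shape[1] // cell
    hist = np.zeros((ch, cw, bins))
    bin_idx = np.minimum((ang // (180.0 / bins)).astype(int), bins - 1)
    for i in range(ch):
        for j in range(cw):
            b = bin_idx[i*cell:(i+1)*cell, j*cell:(j+1)*cell].ravel()
            m = mag[i*cell:(i+1)*cell, j*cell:(j+1)*cell].ravel()
            np.add.at(hist[i, j], b, m)                 # magnitude-weighted votes
    # steps (5)-(6): L2-normalize each overlapping block, concatenate
    eps = 1e-5
    feats = []
    for i in range(ch - block_cells + 1):
        for j in range(cw - block_cells + 1):
            v = hist[i:i+block_cells, j:j+block_cells].ravel()
            feats.append(v / np.sqrt(np.sum(v**2) + eps**2))
    return np.concatenate(feats)

desc = hog(np.random.rand(128, 64))
print(desc.shape)  # (3780,)
```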
In an earlier post on pedestrian counting, the concept of HOG features was mentioned. Over the last couple of days I read the original paper and worked through the principle of HOG features, and I have written the process down as I understand it; corrections are welcome if anything is wrong.
The basic idea of HOG (histograms of oriented gradients) features: "local object appearance and shape can often be characterized rather well by the distribution of local intensity gradients or edge directions, even without precise knowledge of the corresponding gradient or edge positions." In other words, even if the exact gradient or edge positions are unknown, the appearance and shape of a local target can be characterized by the distribution of local gradients or edge directions. The following is the HOG feature extraction process. Reference [1] uses HOG features for pedestrian detection, but if you read the steps below you will find that HOG can detect not only pedestrians but also dogs, cats, and almost any object; only the training samples differ, so HOG can be called a general object detection method.
1. Color normalization (gamma/color normalization)
The author tested images in RGB, LAB, and grayscale color spaces, and found that results in RGB and LAB space are basically the same, while recognition in grayscale drops by 1.5%. Since the RGB and LAB color spaces perform about equally, there is no need to convert between them, and this normalization step can be omitted; the only requirement is that the test images and the training images are in the same color space.
2. Calculate the gradient values (gradient computation)
There are many ways to compute gradients on a discrete grid, for example by subtracting the value of a pixel's left neighbor from its right neighbor, or by using uncentered differences. In the paper, the mask [-1, 0, 1] is convolved with the image horizontally and vertically, giving the x and y gradient components of the center pixel. With these two components, the gradient magnitude and direction are easy to determine. The paper also mentions other masks, such as the cubic-corrected [1, -8, 0, 8, -1] and the uncentered [-1, 1], but the authors found that [-1, 0, 1] works best.
For color images, the author computes the gradient of each color channel separately and takes, at each pixel, the gradient of the channel with the largest magnitude, instead of crudely converting the color image to grayscale first and then computing the gradient.
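A sketch of this per-channel scheme, assuming a float H×W×3 image:

```python
import numpy as np

def color_gradient(img):
    """[-1, 0, 1] gradients per channel; at each pixel, keep the
    gradient of the channel with the largest magnitude."""
    gx = np.zeros_like(img); gy = np.zeros_like(img)
    gx[:, 1:-1, :] = img[:, 2:, :] - img[:, :-2, :]   # horizontal [-1,0,1]
    gy[1:-1, :, :] = img[2:, :, :] - img[:-2, :, :]   # vertical [-1,0,1]
    mag = np.hypot(gx, gy)                            # per-channel magnitude
    best = mag.argmax(axis=2)                         # dominant channel per pixel
    ii, jj = np.indices(best.shape)
    return gx[ii, jj, best], gy[ii, jj, best], mag[ii, jj, best]
```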
3. Build histograms (spatial/orientation binning)
Once the gradient magnitude and direction of every pixel are computed, the next question is how to use them. The author notes that the gradient is a vector, so its direction can range over 0-360 degrees (signed gradient) or 0-180 degrees (unsigned gradient). Taking 20 degrees per bin (one vertical bar of the histogram), the unsigned case gives a histogram with 9 bins. But wait: if you were going to put all the gradients of the whole image into one histogram, that would be wrong. The author divides the image into small cells of, e.g., 4×4 or 6×6 pixels and builds one statistical histogram per cell, which preserves the local characteristics of the image. Experiments showed that 6×6 cells work best, as in Figure (1) (ignore blocks for now). When a gradient is placed into a bin, its contribution could be weighted in different ways according to the gradient magnitude, for example by squaring it; after testing, the author found that the simplest scheme is the most effective: if a pixel has gradient magnitude 10 and direction 15 degrees, just add 10 to the 0-20 degree bin. However, before accumulating the bins, you need to consider the normalization in step 4.
Figure (1)
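The per-cell binning just described can be sketched as follows (unsigned gradients, 20 degrees per bin, magnitude-weighted votes):

```python
import numpy as np

def cell_histogram(mag, ang, bins=9):
    """Orientation histogram for one cell: each pixel votes with its
    gradient magnitude into the bin containing its direction."""
    hist = np.zeros(bins)
    idx = np.minimum((ang % 180.0 // (180.0 / bins)).astype(int), bins - 1)
    np.add.at(hist, idx.ravel(), mag.ravel())
    return hist

# a pixel with magnitude 10 and direction 15 degrees votes into bin 0 (0-20)
h = cell_histogram(np.array([[10.0]]), np.array([[15.0]]))
```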
4. Block normalization (normalization and descriptor blocks)
Since illumination intensity may vary within the same image, gradients in different regions can differ dramatically; here is how the authors eliminate this effect. Building on the cells described above, the author groups multiple cells into a block. Within a block, the illumination can be considered constant, so the gradient histograms inside the block are normalized together: for example, if the concatenated histogram vector of a block is v = (10, 20, 30), it is normalized as v = v / (||v|| + ε), where ε is a very small constant (the author tests four normalization schemes, which can be found in the paper [1]). Note that, to further remove illumination effects, blocks overlap: a cell may belong both to block1 and to block2. After testing, the author found that blocks of 3×3 cells with 6×6 pixels per cell work best, as in Figure (1).
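Two of the normalization schemes tested in the paper can be sketched like this (the exact choice of ε and scheme names follow the paper's L1-norm and L2-norm variants; the other two schemes are omitted here):

```python
import numpy as np

def normalize_block(v, eps=1e-5, scheme="L1"):
    """Block normalization: "L1" is v / (||v||_1 + eps);
    "L2" is v / sqrt(||v||_2^2 + eps^2)."""
    v = np.asarray(v, dtype=np.float64)
    if scheme == "L1":
        return v / (np.abs(v).sum() + eps)
    return v / np.sqrt(np.sum(v ** 2) + eps ** 2)

v = normalize_block([10, 20, 30])  # -> approximately [0.167, 0.333, 0.5]
```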
5. Detection window (detector and context)
The steps above produce a HOG description of an image, but do not forget the original goal: detecting pedestrians, i.e. drawing boxes around the pedestrians in an image. Computing HOG features for the entire image and comparing them is not pedestrian detection, because a pedestrian's position in the image is not fixed. The author therefore introduces a 64×128 detection window: this fixed-size window slides over the image, and at every position a classification is made; the classification method is step 6.
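The sliding can be sketched by enumerating window positions (single scale only; the full detector also scans a pyramid of image scales, which this sketch omits):

```python
def window_positions(img_w, img_h, win_w=64, win_h=128, stride=8):
    """Top-left corners of every placement of the 64x128 detection
    window when sliding with the given stride."""
    return [(x, y)
            for y in range(0, img_h - win_h + 1, stride)
            for x in range(0, img_w - win_w + 1, stride)]

n = len(window_positions(320, 240))  # 33 x-positions * 15 y-positions = 495
```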
6. Classifier
A linear SVM is used as the classifier: HOG features are computed from pre-labeled pedestrian and non-pedestrian images using the steps above, and the SVM finds the maximum-margin hyperplane separating the two classes. When an image is to be tested, the HOG feature of each detection window from step 5 is computed, and the learned hyperplane is used to classify it.
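Applying the learned hyperplane per window amounts to a dot product. The weights below are made up purely for illustration; in practice w and b come from actually training a linear SVM (e.g. with liblinear or OpenCV's SVM) on labeled HOG features:

```python
import numpy as np

def classify_window(feat, w, b):
    """Linear SVM decision for one window's HOG feature:
    positive decision value w . feat + b means "pedestrian"."""
    return float(np.dot(w, feat) + b) > 0

# toy illustration with a 3780-dim feature and a hypothetical hyperplane
rng = np.random.default_rng(0)
feat = rng.random(3780)
w = np.zeros(3780); b = -1.0
result = classify_window(feat, w, b)  # decision value is -1 -> not a pedestrian
```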
Drawing the steps above as a flowchart, the pipeline is basically: color/gamma normalization → gradient computation → orientation binning per cell → block normalization → sliding detection window → SVM classification.
OpenCV includes an implementation of HOG pedestrian detection; the results are generally like the figure (I could not find my own figure, so this one was found online). The detection rate is actually quite high, and some false detections can be removed by frame differencing, but in dense crowds the results are poor.
HOG (histogram of oriented gradients) is one of the most important features for pedestrian detection and object detection, but it is comparatively slow to compute. The program linked below is an SSE-accelerated implementation of HOG and is quite valuable.
http://blog.csdn.net/wangningbo128/article/details/6426195 (theory derivation)
http://blog.csdn.net/smartempire/article/details/24038355
http://www.csdn.net/tag/%25E8%25A1%258C%25E4%25BA%25BA%25E6%25A3%2580%25E6%25B5%258B (pedestrian detection tag)
http://download.csdn.net/detail/sjtuippr/7222487 (SSE-accelerated HOG implementation)
http://blog.csdn.net/tianmochao13/article/details/26380481 (paper writing)