Histograms of Oriented gradients)

Source: Internet
Author: User
Tags svm

In previous articles, the concept of hog feature was mentioned in pedestrian counting and counting. In the past two days, I read the original paper and learned about the principles of hog feature, I wrote down the process of this method based on my own understanding. If there are any errors, please correct them.

The basic idea of hog (histograms of Oriented gradients) features: the basic idea is that local Object Appearance and shape can often be characterized rather well by the distribution of local intensity gradients or edge ctions, even without precise knowledge of the corresponding gradient or edge positions. even if you do not know (between images) The exact alignment of the gradient or edge location, the appearance and shape information of the local target can be characterized by the local gradient density or edge direction. The following describes the extraction process of hog features. Hog features are used for pedestrian detection in the essay [1]. In fact, if you read the following steps, you will find that hog is not only used for pedestrian detection, it can also detect dogs, cats, and other similar objects. The job is only to train samples. Therefore, hog features can be called object detection methods.


1. Color normalization (gamma/color normalization)

The author tried RGB, lab, and images in gray space, and found that the results of the RGB and lab images were basically the same, however, the image recognition rate in the gray space is reduced by 1.5%. Therefore, the effects of these color spaces are roughly the same. It is not necessary to Convert RGB to lab or vice versa. Therefore, the normalization of this step can be omitted, but the trial image and the training image must have a color space. This is okay.


2. Calculate the gradient value (gradient computation)

There are many ways to calculate the gradient value in discrete sets. For example, if the target pixel is the center, use the right neighbor value of the center to subtract the left neighbor value or the lower neighbor value to subtract the top neighbor value, in this paper, mask [-, 1] is used for convolution of horizontal or vertical pixels to obtain the gradient x component and Y component of the central pixel respectively. With these two components, it is difficult to determine the gradient value and direction. Of course, the paper also mentions other methods, such as cubic-corrected [1,-, 8,-1], uncentred. The author found that-1, 0, 1] has the best effect.

For color images, the author calculates the gradient values of each color channel, and selects the maximum values of the three channels as the pixel values of this point, instead of simply and roughly converting a color image into a gray image and then finding the gradient value.


3. Construct a histogram (or spatial classification, spatial/orientation binning)

After calculating the gradient value and direction of each vertex, the next step is how to use the gradient. The author considers that the gradient is a vector, so the gradient has a value of 0 to degrees (signed gradient) or a value of 0 to degrees (unsigned gradient ). Taking 20 degrees as a bin (a vertical bar in the histogram), a histogram containing 9 bins can be formed. Wait. If you want to put all the gradient values of the image into a histogram, you are wrong. The author divides the image into small cells, each of which contains 4x4, 6x6... pixels, each cell as a unit, so that each cell can obtain a statistical histogram. This is equivalent to retaining the local features of the image. The experiment showed that 6x6 cells have the best effect, for example, to (1) (ignore block first ). When a gradient is placed in the bin, the contribution of the gradient value to the bin should also be different. For example, the gradient value can be square open. The author finds that the simplest method is the most effective. For example, if the gradient value of a pixel is 10 and the direction is 15 degrees, add the bin of 0 to 20 degrees to 10, that's simple. Before calculating the bin, you still need to consider Step 1 normalization.


Figure (1)


4. Normalization and descriptor Blocks)

Considering that the light intensity in the same image may be different, the gradient changes in different regions may be sharply different, so let's see how the author can eliminate the impact. In the previous article, we talked about the cell composed of pixels. The author makes multiple cells into a block. In a block, the light remains unchanged. Therefore, we normalize the gradient values in the block, for example, if the vector composed of the gradient values of pixels in the block is V = (10, 20, 30), after normalization: V = V/(| v | + E) E is a very small constant. The author has tried four normalization methods. For details, see [1]. It should be noted that in order to further remove the effect of illumination, the positions of each block are cross. For example, a cell may belong to block1 or block2. The authors found that a block contains 3x3 cells, and each cell has 6x6 pixels. The best result is (1 ).


5. Check the detail form (detector and context)

The above steps can get a hog description of an image, but do not forget the original intention: to detect pedestrians. Accurately, the pedestrians in an image are circled. If we compare the whole image with the hog feature, it is not a pedestrian inspection. Because the position of the pedestrian in the image is not fixed, the author proposed the 64 x checksum form concept, sliding the fixed size form on the image, do not slide once for a similarity match, the matching method is in Step 6.


6. Classifier

The author uses SVM as the classifier to obtain the hog Feature Based on the preceding steps and uses SVM to obtain the optimal interval between the two types of pedestrian and non-pedestrian images. When there is an image to be checked, obtain the hog feature of the detection form in step 5, and then use the optimal interval to identify the classification.


For example, to draw a flowchart for the above steps, the basic is:



Opencv has the code to implement hog Pedestrian detection. The effect is roughly as follows (my own picture cannot be found, I can find it online). In fact, the false detection rate is quite high, some false checks can be solved by difference, but the effect on crowded groups is only good.



Exam documents:

[1] navneet Dalal and Bill triggs: histograms of Oriented gradients for human detection

[2] http://www.zhizhihu.com/html/y2010/1690.html

[3] http://hi.baidu.com/nokltkmtsfbnsyq/item/f4b73d06f066cd193a53eec3



Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.