Explanations of hog code

Source: Internet
Author: User

From: http://hi.baidu.com/nokltkmtsfbnsyq/item/e1328f39b3abedb0623aff52

When I did a job and research last year, I read the hog paper and did not understand it. So I turned to the opencv code and did not understand the result. After asking for help on the internet, I had to dig my head for two weeks. Finally, I wrote the MATLAB program histograms of Oriented gradients (hog) feature.
MATLAB computing, my code is relatively straightforward, but the work is not powerful, very slow. You have asked some questions, some of which are similar. I have summarized the answer and written it into this document. Although I would like to use a picture to clearly describe the problem, I can only use the picture board and Photoshop three-legged cat to this extent. Also, I am not doing Pedestrian detection... I just happened to touch hog...

 

The process of globalinterpolate in the Code is written without understanding hog, Which is simpler than the original hog idea. Localinterpolate is implemented according to the original hog, which is a method of implementing Na into ve.

 

About calculating the gradient direction angle:

First, perform convolution on the original image using the [-, 1] gradient operator to obtain the gradient component gradscalx in the X direction (horizontal direction, with the right positive direction), and then use, -1] 'gradient operator performs convolution on the original image to obtain the gradient component gradscaly in the Y direction (vertical direction, upward direction. When gradscalx> = 0, gradscaly> = 0, it indicates that the gradient direction is toward the First quadrant. When gradscalx> = 0, gradscaly <0, it indicates that the gradient direction is toward the second quadrant. In this case, combined with quadrant information, the arctangent function atan can be used to find the correct gradient angle under the condition of signed and unsigned.

 

About the scanning loop (layer-4 for loop... Is it faster? Yes! However, I am not competent enough .. If it was not compiled at the time, we had to come to Layer 4 ):

Assume that the detection window is 64 (column) * 128 (ROW), and the block size is 16*16. Each block is divided into four cells, each time the block slides 8 pixels (that is, the width of a cell), and the gradient direction is divided into 9 intervals, ranging from 0 ~ For statistics within the range of 180 degrees, the following descriptions all the preceding assumptions are used as an example.

Btly and btlx represent the coordinates at the point in the upper left corner of the block location, respectively. For the preceding assumptions, there will be 105 blocks in a detection window, so the coordinates in the upper left corner of the first block are (), and the second is )..., The last part of this line is the coordinate of the upper left corner of the block (). Then, the next block needs to slide down eight pixels and return to the far left. At this time, the coordinate of the upper left corner of the block is ), then block and start a new horizontal slide... In this case, the coordinates of the last block in the detection window are (49,113 ).

Each time the block slides to a new position, it needs to stop and calculate the gradient histogram of the four cells in it. (BJ, Bi) is used to store the coordinates in the upper left corner of the Cell (the cell coordinates are based on the upper left corner of the block ).

(J, I) indicates the coordinates of pixels in the cell in the entire detection window (64*128 of the image. in addition, I have jorbj and iorbi in the program, which is in the localinterpolate (that is, the standard original hog situation), that is, BJ and Bi.

 

About hist3dbig:

This is a three-dimensional matrix used to store three-dimensional histograms. The most common one-dimensional histogram is like this,

What about two-dimensional histograms? In this way, one column is a statistical bin, and the column height indicates the size of the statistical value.


What about the three-dimensional histogram? This is how it looks. A small lattice is a three-dimensional one. Each small lattice is a statistical bin, which is used to hold the statistical value. In the preceding example, the histogram of a block is as follows:

 

For linear interpolation, when linear interpolation is used, a statistical value needs to be "allocated in a certain proportion" to the nearest range of the statistical point. The following figure shows a one-dimensional histogram, the statistical point falling within the range marked by the dotted line. The nearest neighbor interval is the two intervals marked with red dots.

If it is a two-dimensional histogram, it falls into the statistical points in the dotted rectangle below. The four statistical intervals around it are the nearest neighbor intervals. The dotted rectangle consists of 1/4 of each of the four statistical intervals.

A three-dimensional histogram has eight nearest neighbor intervals for a statistical point. For example, you can imagine that, only when this statistical point falls within a cube consisting of 1/8 of the following eight statistical intervals, these eight intervals are the nearest neighbor of the statistical points.

How to assign weights during statistics? A one-dimensional histogram is used to briefly describe the meaning of linear interpolation. For the statistical values of the green small square point (x) below, we assume that the center locations of the two bins of the red points are x1, x2, so for X, its distribution weight is left bin: 1-(x-x1)/s, I .e. 1-A/S = B/S, right bin: 1-(x2-x)/s, that is, 1-B/S = A/S.

Similarly, for a three-dimensional histogram, the cumulative formula for Statistics (captured from Dalal's paper) is:

Above, W is the statistical value to be allocated. (X1, Y1, Z1 )... A total of eight points represent the center coordinates of the eight statistical intervals. The above formula uses the Mark H (x1, Y1, Z1) to represent the statistical intervals to be accumulated. This formula is used in programming, but I use the bin label to represent the bin block, just as in the preceding 3D histogram diagram (Binx = 1, biny = 2, bin θ = 9), but in the program, the θ axis is represented by the Z axis.

Binx1 = floor (jorbj-1 + cellpw/2)/cellpw) + 1;

Biny1 = floor (iorbi-1 + cellph/2)/cellph) + 1;

Binz1 = floor (go + (or * PI/nthet)/2)/(or * PI/nthet) + 1;

Binx2 = binx1 + 1;

Biny2 = biny1 + 1;

Binz2 = binz1 + 1;

These statements are used to calculate the coordinates of the center points of the eight statistical intervals.

... The character count in a single article is too high. Why are you using Baidu? Another article titled explanation of hog code (2)

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Next, I will explain the hog code (1).

Before calculating the central coordinates of the statistical interval mentioned above and assigning the weight value, I want to make the program simple when processing the edge, A layer is added to the outside of the stereo histogram of 2*2*9 to form a three-dimensional histogram of 4*4*11 (as shown below ), the original 2*2*9 histogram is the middle part of the package. In this way, in the original histogram, the coordinates are (Binx = 1, biny = 2, binz = 9) bin, and in the new histogram, the coordinates are (Binx = 2, biny = 3, binz = 10 ).

The histogram of 4*4*11 on the above shows a section parallel to the xoy plane:

The rough box is the section of the original 3D histogram, that is, a block. For points falling between the rough box and the rough box, the nearest neighbor range is not 8, I save some effort to write programs ..., this circle bin is expanded, so there are eight intervals in the statistical points between the coarse-box and the coarse-box. When Matlab is used for programming, the four-layer for loop only writes the eight cumulative formulas, and does not need to judge whether it is located in the area between the rough box and the rough dotted box above. In the program, the histogram of 2*2*9 is hist3d, And the histogram of 4*4*11 is hist3dbig. when the calculation in this hist3dbig is complete, I strip the outer circle, that is, hist3d.


With these preparations, I can calculate the Gradient Direction amplitude of the current pixel to which eight bin blocks of hist3dbig are accumulated. Binx1, biny1, and binz1 are the subscript of the bin block closest to the pixels to be counted in the histogram of the eight bin blocks. Binx2, biny2, and binz2 correspond to the subscript of the farthest bin block. X1, Y1, and Z1 are the actual pixel location (x1, Y1) and Gradient Direction (Z1) corresponding to the bin block (binx1, biny1, binz1 ). I still use the upper left corner of the original block (that is, the block before expansion) as the origin of X1 and Y1, because MATLAB uses 1 as the index of image pixels, I think that the origin is (), and the extended part on the left of () is given 0,-1,-2,-3... The coordinates are similar to those above, as shown in. (1, 1) are red points, and the coordinates at the Blue Points are (-3.5, 1 ).


 

The subscript of the expanded green block is (Binx = 1, biny = 1, binz1 = 1), because the pixel coordinate is () at the red point ), the yellow block is the first cell of the block, which corresponds to the subscript () of the bin block ). because of the subscript design, I reduced 1.5 for X1, Y1, and Z1 instead of 0.5.

X1 = (binx1-1.5) * cellpw + 0.5;

Y1 = (biny1-1.5) * cellph + 0.5;

Z1 = (binz1-1.5) * (or * PI/nthet );

In the formula above, X1 and Y1 are added with 0.5, because the pixel coordinates are discrete, and the first coordinates always start from 1, so that the center of the first cell (black spot) it should be 4.5. z1 does not add 0.5 because the angle value starts from 0 and is continuous.

 

In the case of signed (that is, the gradient direction is from 0 degrees to 360 degrees), because in fact the angle of the voting area is the first and end of the ring, if the statistical interval is 40 degrees, then 0-40 degrees and 320-360 degrees are adjacent intervals. In the histogram of 4*4*11, binz = 11 (equivalent to 360-380 degrees) the value of should be returned to binz = 2 (0-40 degrees), and the value for binz = 1 should be returned to binz = 10, as shown in the 4*4*11 histogram

If OR = 2

Hist3dbig (:,:, 2) = hist3dbig (:,:, 2) + hist3dbig (:,:, nthet + 2 );

Hist3dbig (:,:, (nthet + 1) = hist3dbig (:,:, (nthet + 1) + hist3dbig (:,:, 1 );

End

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.