HOG has much in common with edge orientation histograms, the Scale-Invariant Feature Transform (SIFT) and shape contexts, but it differs from them in the following ways: the HOG descriptor is computed on a dense grid of uniformly spaced cells, and it uses overlapping local contrast normalization to improve performance. Because HOG operates on local cells of the image, it remains largely invariant to geometric and photometric deformations of the image.
Step 1: Gamma/colour normalization
The authors tested gamma and colour normalization of the images in grayscale, RGB and LAB colour spaces. The experiments showed that this preprocessing has only a modest effect on the final results, probably because the block normalization in a later step achieves a similar result and makes it redundant. Therefore, this step can be omitted in practice.
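For reference, gamma normalization here means a power-law compression applied to each colour channel. The sketch below assumes intensities already scaled to [0, 1] and uses square-root compression (gamma = 0.5) as an illustrative value; the function name `gamma_normalize` is hypothetical.

```python
import numpy as np

def gamma_normalize(image, gamma=0.5):
    """Per-channel power-law (gamma) compression of an image.

    Assumes intensities already scaled to [0, 1]. gamma = 0.5 is
    square-root compression, an illustrative choice.
    """
    img = np.asarray(image, dtype=np.float64)
    if img.min() < 0.0 or img.max() > 1.0:
        raise ValueError("expected intensities in [0, 1]")
    # The power law is applied element-wise, i.e. to every channel independently
    return np.power(img, gamma)

# Example: an 8-bit RGB image scaled to [0, 1] before compression
rgb_u8 = np.array([[[64, 255, 0]]], dtype=np.uint8)
compressed = gamma_normalize(rgb_u8 / 255.0)
```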
Step 2: Gradient computation
Several Gaussian smoothing scales were tested, including σ = 0 (no smoothing), together with several derivative masks: the 1-D masks uncentred [-1, 1], centred [-1, 0, 1] and cubic-corrected [1, -8, 0, 8, -1], as well as 2 × 2 diagonal masks and 3 × 3 Sobel masks. Simple 1-D [-1, 0, 1] masks at σ = 0 work best. For colour images, we calculate separate gradients for each colour channel and take the one with the largest norm as the pixel's gradient vector.
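The gradient step can be sketched in Python with NumPy as follows. It is a minimal illustration, not the authors' code: the helper name `compute_gradients` and the edge-replication border handling are assumptions. It applies the centred [-1, 0, 1] mask in x and y with no smoothing, keeps the colour channel with the largest gradient norm at each pixel, and returns unsigned orientations in [0°, 180°).

```python
import numpy as np

def compute_gradients(image):
    """Gradient magnitude and unsigned orientation (degrees) per pixel.

    Uses the centred 1-D mask [-1, 0, 1] in x and y with no smoothing.
    For colour input (H, W, C) the channel with the largest gradient norm
    is kept at each pixel. Borders use edge replication (an assumption).
    """
    img = np.asarray(image, dtype=np.float64)
    if img.ndim == 2:
        img = img[:, :, None]  # treat grayscale as a single channel
    padded = np.pad(img, ((1, 1), (1, 1), (0, 0)), mode="edge")
    # Centred differences: I(x+1) - I(x-1) and I(y+1) - I(y-1)
    gx = padded[1:-1, 2:, :] - padded[1:-1, :-2, :]
    gy = padded[2:, 1:-1, :] - padded[:-2, 1:-1, :]
    mag = np.hypot(gx, gy)
    # Per pixel, keep the channel whose gradient has the largest norm
    best = np.argmax(mag, axis=2)[:, :, None]
    gx = np.take_along_axis(gx, best, axis=2)[:, :, 0]
    gy = np.take_along_axis(gy, best, axis=2)[:, :, 0]
    mag = np.take_along_axis(mag, best, axis=2)[:, :, 0]
    # Unsigned orientation: fold (-180, 180] into [0, 180)
    angle = np.degrees(np.arctan2(gy, gx)) % 180.0
    return mag, angle
```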
Step 3: Spatial/orientation binning
The orientation bins are spread evenly over 0°-180° ("unsigned" gradient) or 0°-360° ("signed" gradient). The authors found that unsigned gradients with nine histogram bins gave the best results in their pedestrian detection experiments. For each cell, a histogram of gradient orientations is computed, giving a 9-dimensional vector, and the votes are accumulated with trilinear interpolation. Why trilinear? Each vote is interpolated twice in position (x and y) and once in orientation. For example, with 20° bins, a gradient at 20° lies on the boundary between the 0°-20° and 20°-40° bins, so its vote is split between them. It is also useful to downweight pixels near the edges of the block by applying a Gaussian spatial window to each pixel before accumulating orientation votes into cells (σ = 0.5 × block width).
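A minimal sketch of per-cell voting follows. To keep it short it interpolates only in orientation (the linear-in-angle part of trilinear voting); the spatial x/y interpolation and the Gaussian window are left out. Weighting each vote by the gradient magnitude and the name `cell_histograms` are assumptions of this sketch. With 9 bins of 20°, a pixel at exactly 20° splits its vote equally between the 0°-20° and 20°-40° bins, matching the example above.

```python
import numpy as np

def cell_histograms(mag, angle, cell_size=8, n_bins=9):
    """Per-cell orientation histograms over unsigned angles in [0, 180].

    Each pixel votes with its gradient magnitude, split linearly between
    the two nearest bin centres (orientation interpolation only).
    Pixels beyond the last full cell are ignored.
    Returns an array of shape (cells_y, cells_x, n_bins).
    """
    mag = np.asarray(mag, dtype=np.float64)
    angle = np.asarray(angle, dtype=np.float64)
    bin_width = 180.0 / n_bins
    cells_y, cells_x = mag.shape[0] // cell_size, mag.shape[1] // cell_size
    hist = np.zeros((cells_y, cells_x, n_bins))
    # Bin k is centred at (k + 0.5) * bin_width; pos is the fractional bin index
    pos = angle / bin_width - 0.5
    lo = np.floor(pos).astype(int)
    frac = pos - lo                 # share of the vote for the upper bin
    lo_bin = lo % n_bins            # wrap around: 0 deg and 180 deg coincide
    hi_bin = (lo + 1) % n_bins
    for cy in range(cells_y):
        for cx in range(cells_x):
            ys = slice(cy * cell_size, (cy + 1) * cell_size)
            xs = slice(cx * cell_size, (cx + 1) * cell_size)
            m = mag[ys, xs].ravel()
            f = frac[ys, xs].ravel()
            # np.add.at accumulates correctly when a bin index repeats
            np.add.at(hist[cy, cx], lo_bin[ys, xs].ravel(), m * (1.0 - f))
            np.add.at(hist[cy, cx], hi_bin[ys, xs].ravel(), m * f)
    return hist

# Example: one 8 x 8 cell in which every gradient has magnitude 1 and angle 20 deg
h = cell_histograms(np.ones((8, 8)), np.full((8, 8), 20.0))
```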
Step 4: Grouping the cells together into larger blocks
For example, in the left figure there are 4 × 4 blocks, each containing 2 × 2 = 4 cells, so there are (4 × 4) × (2 × 2) × 9 = 576 features in total.
An R-HOG block is essentially a square grid of cells, characterized by three parameters: the number of cells per block, the number of pixels per cell, and the number of histogram bins per cell. Letting the blocks overlap improves performance by about 5% and avoids abrupt changes between neighbouring blocks.
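Grouping cell histograms into overlapping blocks amounts to sliding a 2 × 2-cell window over the cell grid and concatenating the histograms it covers. The sketch assumes the cell histograms are already stored in a (cells_y, cells_x, bins) array; `group_blocks` is a hypothetical helper name.

```python
import numpy as np

def group_blocks(hist, block_cells=2, stride_cells=1):
    """Concatenate cell histograms into overlapping blocks.

    hist: (cells_y, cells_x, n_bins). With block_cells=2 and stride_cells=1,
    neighbouring blocks share one row or column of cells (50% overlap).
    Returns (blocks_y, blocks_x, block_cells * block_cells * n_bins).
    """
    cells_y, cells_x, n_bins = hist.shape
    blocks_y = (cells_y - block_cells) // stride_cells + 1
    blocks_x = (cells_x - block_cells) // stride_cells + 1
    out = np.empty((blocks_y, blocks_x, block_cells * block_cells * n_bins))
    for by in range(blocks_y):
        for bx in range(blocks_x):
            y0, x0 = by * stride_cells, bx * stride_cells
            # Row-major order: top-left, top-right, bottom-left, bottom-right cell
            out[by, bx] = hist[y0:y0 + block_cells, x0:x0 + block_cells].ravel()
    return out
```

With the 16 × 8 cell grid of a 64 × 128 window and a stride of one cell, this yields 15 × 7 blocks of 36 values each.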
Step 5: Normalization and descriptor blocks
L2-Hys: L2-norm normalization followed by clipping (limiting the maximum values of v to 0.2) and renormalizing.
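L2-Hys can be written in a few lines. The L2 step below is v / sqrt(||v||² + ε²); the value ε = 1e-5, which only guards against empty blocks, and the name `l2_hys` are assumptions.

```python
import numpy as np

def l2_hys(v, clip=0.2, eps=1e-5):
    """L2-Hys block normalization: L2-normalize, clip at `clip`, renormalize."""
    v = np.asarray(v, dtype=np.float64)
    v = v / np.sqrt(np.sum(v * v) + eps * eps)
    # HOG entries are non-negative, so only the upper bound needs clipping
    v = np.minimum(v, clip)
    return v / np.sqrt(np.sum(v * v) + eps * eps)
```

Clipping stops a few very strong gradients from dominating the block; the second normalization restores unit length afterwards.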
Step 6: SVM training
I use libsvm.
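As a dependency-free stand-in (not libsvm, and not the author's setup), the sketch below trains a linear SVM with Pegasos-style stochastic sub-gradient descent on the hinge loss. The names `train_linear_svm` and `predict`, the regularization constant and the epoch count are assumptions; for real HOG descriptors a tuned library solver is the better choice.

```python
import numpy as np

def train_linear_svm(X, y, lam=1e-3, epochs=20, seed=0):
    """Linear SVM trained with Pegasos-style stochastic sub-gradient descent.

    X: (n, d) descriptors, y: labels in {-1, +1}. A constant 1 is appended
    to each row so the bias is learned as an extra weight (and, for
    simplicity, regularized along with the others).
    Returns the weight vector of length d + 1.
    """
    X = np.hstack([np.asarray(X, dtype=np.float64), np.ones((len(X), 1))])
    y = np.asarray(y, dtype=np.float64)
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            t += 1
            eta = 1.0 / (lam * t)          # Pegasos step size
            margin = y[i] * X[i].dot(w)
            w *= 1.0 - eta * lam           # shrink: gradient of the L2 term
            if margin < 1.0:               # hinge loss active: move towards y_i * x_i
                w += eta * y[i] * X[i]
    return w

def predict(w, X):
    """Labels in {-1, +1} from the sign of the linear score."""
    X = np.hstack([np.asarray(X, dtype=np.float64), np.ones((len(X), 1))])
    return np.where(X.dot(w) >= 0.0, 1, -1)
```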
Summary of the algorithm
RGB colour space with no gamma correction; [-1, 0, 1] gradient filter with no smoothing; linear gradient voting into 9 orientation bins in 0°-180°; 16 × 16 pixel blocks of four 8 × 8 pixel cells; Gaussian spatial window with σ = 8 pixels; L2-Hys (Lowe-style clipped L2 norm) block normalization; block spacing stride of 8 pixels (hence 4-fold coverage of each cell); 64 × 128 detection window; linear SVM classifier.
For a 64 × 128 window with 8 × 8 pixel cells and 16 × 16 pixel blocks that overlap with a stride of 8 pixels, there are 7 × 15 = 105 blocks in total. Each block contains four cells and each cell contributes a 9-dimensional histogram, so each block has 36 dimensions.
Counting the features: (64/8 - 1) × (128/8 - 1) × 4 × 9 = 7 × 15 × 36 = 3780.
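The same count can be written as a small function. It uses the more general formula (window - block) / stride + 1 for the number of block positions, which reduces to 64/8 - 1 and 128/8 - 1 when the block is two cells wide and the stride is one cell. The function name and its parameter names are hypothetical.

```python
def hog_descriptor_length(win_w=64, win_h=128, cell=8, block_cells=2,
                          stride=8, n_bins=9):
    """Length of the HOG descriptor for one detection window.

    Defaults follow the summary: 64 x 128 window, 8 x 8 cells,
    16 x 16 blocks (2 x 2 cells), 8-pixel block stride, 9 bins.
    """
    block = cell * block_cells                 # block side in pixels
    blocks_x = (win_w - block) // stride + 1   # block positions across
    blocks_y = (win_h - block) // stride + 1   # block positions down
    return blocks_x * blocks_y * block_cells * block_cells * n_bins
```

With the defaults this gives 7 × 15 × 36 = 3780; doubling the stride to 16 pixels (no overlap) gives 4 × 8 × 36 = 1152.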
The authors also describe circular blocks (C-HOG) as an alternative to the rectangular R-HOG blocks.
Figure (e) shows the computed HOG descriptor.
Figure (f) shows that the SVM weights are relatively large along the person's contour.
INRIA pedestrian dataset: http://pascal.inrialpes.fr/data/human/
HOG + SVM source code (timehandle): http://hi.baidu.com/nokltkmtsfbnsyq/item/f4b73d06f066cd193a53eec3