Histogram of oriented gradients, abbreviated HOG, is one of the most commonly used local texture features in computer vision and pattern recognition. The name is quite literal: compute the gradient values of an image region in different directions, accumulate them into a histogram, and use that histogram to represent the region, that is, as a feature that can be fed to a classifier. Below we introduce the principle and computation of HOG, along with some extensions.
1. Split image
Because HOG is a local feature, extracting it directly from a large picture does not give good results. The reasoning is simple. From an information-theory point of view, a 640*480 picture has about 300,000 pixels, so the raw data has about 300,000 dimensions. If HOG is computed directly on the whole picture, even a histogram with one bin per degree (360 bins) cannot represent such a large image. From a feature-engineering point of view, a histogram is a statistic, so it is generally only expressive when the region is fairly small: over a large region, two completely different images may have very similar HOG features, whereas over a small region this is much less likely. Finally, once the image is split into patches and HOG is computed for each patch, the feature also carries geometric (positional) information. For example, in frontal face images, the HOG feature extracted from the upper-left patch generally matches the HOG feature of an eye.
Next, the strategy for splitting the image. There are generally two kinds: overlap and non-overlap. With overlap, neighboring patches share a common region. With non-overlap, patches share no pixels. Each strategy has its own advantages.
Overlap first. This way of splitting reduces the risk of cutting an object apart. Take the eye again: if the split happens to cut the eye down the middle into two patches, the HOG features extracted from them will hurt the subsequent classification. If the two patches overlap, it is much more likely that at least one of them contains the complete eye. The disadvantage of overlap is the higher computational cost, because pixels in the overlapping regions are processed more than once.
Now non-overlap. Its disadvantage is the one mentioned above: a continuous object is sometimes cut apart, which gives HOG features that are not as "good". Its advantage is the small amount of computation, which is especially noticeable when it is combined with an image pyramid.
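The two strategies can be sketched with a single helper, where the step between patches decides whether they overlap. This is only an illustration: the function name split_into_patches, the 64*64 image size, the 16-pixel patch and the 8-pixel stride are assumptions made for the example, not values from the text above.

```python
def split_into_patches(height, width, patch, stride):
    """Return (top, left, bottom, right) boxes for square patches.

    stride == patch gives non-overlapping patches, stride < patch gives
    overlapping ones. Patches that would cross the image border are dropped
    (an assumption of this sketch; padding is another option).
    """
    boxes = []
    for top in range(0, height - patch + 1, stride):
        for left in range(0, width - patch + 1, stride):
            boxes.append((top, left, top + patch, left + patch))
    return boxes


# Hypothetical 64*64 image with 16*16 patches.
non_overlap = split_into_patches(64, 64, patch=16, stride=16)
overlap = split_into_patches(64, 64, patch=16, stride=8)

# Number of pixels that have to be processed under each strategy.
pixels_non_overlap = len(non_overlap) * 16 * 16
pixels_overlap = len(overlap) * 16 * 16
print(len(non_overlap), len(overlap))      # 16 49
print(pixels_non_overlap, pixels_overlap)  # 4096 12544
```

With a stride of half the patch size, the overlapping split visits about three times as many pixels as the non-overlapping one here, which is the extra cost described above.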
2. Calculate the directional gradient histogram for each block
After splitting the image, the direction gradient histogram for each patch is calculated next. The steps are as follows:
A. Convolve the patch with a first-derivative gradient operator, such as Sobel or Prewitt, and calculate the gradient direction and magnitude at each pixel. The formulas are as follows:
M (x, y) = sqrt(Ix^2 + Iy^2), θ (x, y) = atan2(Iy, Ix)
Where Ix and Iy represent the gradient values in the horizontal and vertical directions, M (x, y) represents the magnitude of the gradient, and θ (x, y) represents the direction of the gradient.
B. Divide 360 degrees (2*PI) into as many bins as needed. For example, with 12 bins each bin covers 30 degrees and the whole histogram has 12 dimensions. Then, according to the gradient direction of each pixel, add its gradient magnitude to the histogram, using interpolation to share the vote between the two nearest bins.
C. (Optional) Group the patches into larger blocks and use each block to normalize the color and brightness of the patches it contains. This step mainly removes the effects of lighting, shadows and the like. For images where lighting matters little, such as small regions containing letters or digits, it can be skipped. The paper also mentions that this step has little effect on the accuracy of the final classification.
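Steps A to C can be sketched in plain Python as follows. This is a minimal illustration under several assumptions that are not from the text: a simple centered difference stands in for the Sobel operator, border pixels reuse their nearest neighbor, votes are split linearly between the two nearest bin centers, and step C is done with a joint L2 norm over the histograms of one block. The function names are made up for the example.

```python
import math


def gradients(patch):
    """Step A: per-pixel gradient magnitude and direction (degrees, 0..360).

    Uses an unscaled centered difference instead of Sobel (assumption).
    """
    h, w = len(patch), len(patch[0])
    mag = [[0.0] * w for _ in range(h)]
    ang = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            ix = patch[y][min(x + 1, w - 1)] - patch[y][max(x - 1, 0)]
            iy = patch[min(y + 1, h - 1)][x] - patch[max(y - 1, 0)][x]
            mag[y][x] = math.hypot(ix, iy)
            ang[y][x] = math.degrees(math.atan2(iy, ix)) % 360.0
    return mag, ang


def orientation_histogram(mag, ang, n_bins=12):
    """Step B: accumulate magnitudes, split between the two nearest bins."""
    bin_width = 360.0 / n_bins
    hist = [0.0] * n_bins
    for mag_row, ang_row in zip(mag, ang):
        for m, a in zip(mag_row, ang_row):
            pos = a / bin_width - 0.5      # position measured in bin centers
            lo = math.floor(pos)
            frac = pos - lo
            hist[lo % n_bins] += m * (1.0 - frac)   # wraps around at 360
            hist[(lo + 1) % n_bins] += m * frac
    return hist


def normalize_block(histograms, eps=1e-6):
    """Step C: divide all histograms of one block by their joint L2 norm."""
    norm = math.sqrt(sum(v * v for h in histograms for v in h) + eps * eps)
    return [[v / norm for v in h] for h in histograms]


# A 4*4 patch whose brightness grows from left to right: every gradient
# points at 0 degrees, which lies on the boundary between bins 11 and 0.
ramp = [[float(x) for x in range(4)] for _ in range(4)]
mag, ang = gradients(ramp)
hist = orientation_histogram(mag, ang)
normalized = normalize_block([hist])[0]
print(hist[0], hist[11])  # 12.0 12.0
```

Because 0 degrees sits exactly between the centers of the last and first bins (345 and 15 degrees), the interpolation gives each of them half of the total magnitude, and all other bins stay empty.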
3. Compose the feature
The "small" HOG features extracted from the patches are concatenated into one long one-dimensional vector, which is the final image feature and can be sent to a classifier for training. For example, with 4*4=16 patches and a 12-dimensional small HOG per patch, the final feature has 16*12=192 dimensions.
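The concatenation and the resulting length can be checked with a few lines of Python. The histograms below are zero-filled placeholders rather than real HOG values, and the name concatenate_features is made up for this sketch.

```python
from itertools import chain


def concatenate_features(patch_histograms):
    """Join the per-patch histograms, in patch order, into one flat vector."""
    return list(chain.from_iterable(patch_histograms))


n_patches, n_bins = 4 * 4, 12
# Placeholder histograms; in practice each one comes from a patch.
patch_histograms = [[0.0] * n_bins for _ in range(n_patches)]
feature = concatenate_features(patch_histograms)
print(len(feature))  # 192
```

Keeping a fixed patch order (for example row by row) matters, because the position of a value inside the vector is what carries the geometric information of the patch it came from.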
4. Some extensions
Combining HOG with a pyramid gives PHOG. PHOG splits the same image at several scales, computes the small HOG of every patch at each scale, and concatenates them all into one long one-dimensional vector as the feature. For example, split a 512*512 picture into 3*3 patches, then 6*6, and finally 12*12. Next, compute the small HOG for every patch. With 12 bins each one is 12-dimensional, so the final feature has 9*12+36*12+144*12=2268 dimensions. Note that the small HOGs from different scales must be normalized before they are combined: because the patches differ in size, a value in any dimension of a HOG at the 3*3 scale is likely to be much larger than one at the 12*12 scale. Compared with plain HOG, PHOG captures characteristics at different scales and is more expressive. The disadvantage is that the amount of data and computation is much larger than for HOG.
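The dimension count and the need for normalization can be illustrated with a short sketch. The text does not say which normalization to use, so normalizing each pyramid level to unit L2 norm is an assumption here, as are the function names and the placeholder histograms (uniform histograms whose total weight is proportional to the patch area).

```python
import math


def phog_length(grids, n_bins):
    """Feature length for a pyramid with g*g patches at each level."""
    return sum(g * g * n_bins for g in grids)


def phog(level_histograms, eps=1e-6):
    """Concatenate all levels, scaling each level to unit L2 norm first.

    level_histograms holds one list of per-patch histograms per level.
    Per-level L2 normalization is an assumption of this sketch.
    """
    feature = []
    for patches in level_histograms:
        flat = [v for hist in patches for v in hist]
        norm = math.sqrt(sum(v * v for v in flat)) + eps
        feature.extend(v / norm for v in flat)
    return feature


grids = (3, 6, 12)
n_bins = 12
# Placeholder histograms for a 512*512 image: larger patches produce
# larger raw values, as described above.
levels = [
    [[(512.0 / g) ** 2 / n_bins] * n_bins for _ in range(g * g)]
    for g in grids
]
raw_ratio = levels[0][0][0] / levels[2][0][0]  # 3*3 value vs 12*12 value
feature = phog(levels)
print(phog_length(grids, n_bins), len(feature))  # 2268 2268
```

In this placeholder data a raw value at the 3*3 scale is 16 times the corresponding value at the 12*12 scale, so without normalization the coarse level would dominate the concatenated vector.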
References:
N. Dalal and B. Triggs, "Histograms of Oriented Gradients for Human Detection", 2005
A. Bosch, A. Zisserman, and X. Munoz, "Representing Shape with a Spatial Pyramid Kernel", 2007