HOG Features for Image Feature Extraction in Object Detection

Last Update:2018-07-20 Source: Internet

Author: User

Tags svm


This article is reposted from "TSQ"

1. HOG Features:

The histogram of oriented gradients (HOG) is a feature descriptor used for object detection in computer vision and image processing. It builds features by computing and accumulating histograms of gradient direction over local regions of an image. HOG features combined with an SVM classifier have been widely used in image recognition, especially in pedestrian detection. Note that the HOG+SVM method for pedestrian detection was proposed by Dalal, a researcher working in France, at CVPR 2005. Many pedestrian detection algorithms have been proposed since then, but most of them are still based on the HOG+SVM idea.

(1) Main ideas:

In an image, the appearance and shape of a local object can be described well by the distribution of gradients or edge directions. (In essence: statistics of gradients, and gradients occur mainly at edges.)

(2) The specific implementation method is:

First, the image is divided into small connected regions, which we call cells. Then a histogram of gradient or edge directions is accumulated over the pixels of each cell. Finally, these histograms are combined to form the feature descriptor.

(3) Improve performance:

The local histograms are contrast-normalized over a larger region of the image (which we call a block). The method is: first compute the density of the histograms within the block, then normalize every cell in the block by that density. After this normalization, the descriptor is more robust to illumination changes and shadows.

(4) Advantages:

HOG has several advantages over other feature description methods. First, because HOG operates on local cells of the image, it stays largely invariant to geometric and photometric deformations of the image, since both kinds of deformation only appear over larger spatial regions. Second, with coarse spatial sampling, fine orientation sampling, and strong local photometric normalization, pedestrians may make small limb movements as long as they stay roughly upright; these small movements can be ignored without affecting detection. HOG features are therefore particularly well suited to detecting people in images.

2. HOG feature extraction: implementation process:

Approximate process:

The HOG feature extraction method takes an image (the target or scan window you want to detect) and does the following:

1) Convert the image to grayscale (treat the image as a three-dimensional surface of x, y, z, where z is the gray level);

2) Normalize the color space of the input image with gamma correction, to adjust the contrast of the image, reduce the effect of local shadows and illumination changes, and suppress noise;

3) Compute the gradient (magnitude and direction) at each pixel of the image, mainly to capture contour information and to further weaken the effect of illumination;

4) Divide the image into small cells (e.g. 6*6 pixels/cell);

5) Accumulate the gradient histogram of each cell (the counts of the different gradient directions) to form the descriptor of each cell;

6) Group every few cells into a block (e.g. 3*3 cells/block); concatenating the descriptors of all cells in a block gives the HOG descriptor of that block;

7) Concatenate the HOG descriptors of all blocks in the image to get the HOG feature descriptor of the image (the target you want to detect). This is the final feature vector used for classification.

The detailed procedure for each step is as follows:

(1) Normalize the gamma and color space

To reduce the influence of illumination, the whole image is first normalized. Local surface exposure contributes a large share of the texture intensity of an image, so this compression effectively reduces local shadow and illumination changes. Because color information contributes little, the image is usually converted to grayscale first.

Gamma compression formula:

$I(x, y) = I(x, y)^{\gamma}$

For example, $\gamma = 1/2$ can be used.
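A minimal sketch of gamma compression with gamma = 1/2 follows. It assumes an 8-bit grayscale image stored as a list of rows and scales pixels to [0, 1] before applying the power; the function name `gamma_compress` and the `max_value` parameter are illustrative choices, not part of the original article.

```python
def gamma_compress(image, gamma=0.5, max_value=255.0):
    """Apply I_out = I_in ** gamma to a grayscale image given as a list of rows.

    Pixels are first scaled to [0, 1] (an assumption of this sketch),
    so the compressed values also lie in [0, 1].
    """
    return [[(pixel / max_value) ** gamma for pixel in row] for row in image]


# Toy 2x2 image: dark pixels are lifted more than bright ones.
image = [[0, 64], [128, 255]]
compressed = gamma_compress(image)
print(compressed)
```

With gamma below 1, dark regions are brightened relative to bright regions, which is why the step reduces the effect of shadows.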

(2) Compute the image gradient

Compute the gradients of the image in the horizontal and vertical directions, and from them the gradient direction at each pixel. The derivative operation captures contours, silhouettes, and some texture information, and it further weakens the influence of illumination.

The gradient at pixel (x, y) of the image is:

$G_x(x, y) = H(x+1, y) - H(x-1, y)$

$G_y(x, y) = H(x, y+1) - H(x, y-1)$

where $G_x(x, y)$, $G_y(x, y)$, and $H(x, y)$ are the horizontal gradient, the vertical gradient, and the pixel value at (x, y). The gradient magnitude and direction at (x, y) are:

$G(x, y) = \sqrt{G_x(x, y)^2 + G_y(x, y)^2}$

$\alpha(x, y) = \arctan\left(G_y(x, y) / G_x(x, y)\right)$

The most common method is: first convolve the original image with the [-1, 0, 1] gradient operator to get the gradient component in the x direction (horizontal, positive to the right), gradscalx; then convolve the original image with the [1, 0, -1]^T gradient operator to get the gradient component in the y direction (vertical, positive upward), gradscaly. Then use the formulas above to compute the gradient magnitude and direction at each pixel.
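The gradient step can be sketched in plain Python as below. This is an illustration under stated assumptions: the image is a list of rows with row 0 at the top, border pixels reuse their nearest valid neighbor (the article does not say how borders are handled), and the name `pixel_gradients` is hypothetical.

```python
import math


def pixel_gradients(image):
    """Return (magnitude, direction_in_degrees) maps for a grayscale image.

    Uses centered differences equivalent to the [-1, 0, 1] operator.
    Border handling (clamping to the nearest pixel) is an assumption.
    """
    height, width = len(image), len(image[0])
    magnitude = [[0.0] * width for _ in range(height)]
    direction = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            left = image[y][max(x - 1, 0)]
            right = image[y][min(x + 1, width - 1)]
            above = image[max(y - 1, 0)][x]
            below = image[min(y + 1, height - 1)][x]
            gx = right - left   # positive to the right
            gy = above - below  # positive upward (row 0 is the top row)
            magnitude[y][x] = math.hypot(gx, gy)
            direction[y][x] = math.degrees(math.atan2(gy, gx)) % 360.0
    return magnitude, direction


# A vertical edge: dark on the left, bright on the right.
edge = [[0, 0, 10, 10] for _ in range(3)]
magnitude, direction = pixel_gradients(edge)
print(magnitude[1][1], direction[1][1])  # 10.0 0.0
```

`atan2` is used instead of a plain arctangent so that a zero horizontal gradient does not cause a division by zero and the direction covers the full 0-360 degree range.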

(3) Build a gradient histogram for each cell

The goal of the third step is to provide an encoding of the local image region while keeping it only weakly sensitive to the pose and appearance of the human body in the image.

We divide the image into cells, for example 6*6 pixels per cell. Suppose we use a 9-bin histogram to accumulate the gradient information of these 6*6 pixels. That is, the 360 degrees of gradient direction in the cell are divided into 9 direction bins; for the example here to hold, each bin spans 20 degrees and opposite directions share a bin. For example, if the gradient direction of a pixel is 20-40 degrees, the count of the 2nd bin is increased by one. Projecting every pixel in the cell onto the histogram by its gradient direction in this way (mapping it to a fixed angle range) gives the gradient direction histogram of the cell, which is the 9-dimensional feature vector of the cell (because there are 9 bins).

The gradient direction has now been used; what about the gradient magnitude? The gradient magnitude is the weight of the projection. For example, if the gradient direction of a pixel is 20-40 degrees and its gradient magnitude is 2, then the 2nd bin is increased by 2 rather than by 1.

Cells can be rectangular or radial (star-shaped).
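The magnitude-weighted voting for one cell can be sketched as follows. The sketch assumes 9 bins of 20 degrees each with opposite directions folded together (angle modulo 180), which is what places a 20-40 degree gradient in the 2nd bin; it does hard assignment to a single bin and does not interpolate between neighboring bins. The name `cell_histogram` is illustrative.

```python
def cell_histogram(magnitudes, directions, num_bins=9, bin_width=20.0):
    """Accumulate a magnitude-weighted direction histogram for one cell.

    magnitudes and directions are lists of rows covering the cell;
    directions are in degrees. Opposite directions share a bin
    (an assumption of this sketch).
    """
    histogram = [0.0] * num_bins
    span = num_bins * bin_width  # 180 degrees with the defaults
    for mag_row, dir_row in zip(magnitudes, directions):
        for magnitude, direction in zip(mag_row, dir_row):
            bin_index = int((direction % span) // bin_width)
            bin_index = min(bin_index, num_bins - 1)  # guard against rounding at the upper edge
            histogram[bin_index] += magnitude  # vote with the gradient magnitude
    return histogram


# Toy 2x2 cell: three pixels fall in the 2nd bin (30 and 210 degrees), one in the 5th (95 degrees).
cell_magnitudes = [[2.0, 1.0], [0.5, 3.0]]
cell_directions = [[30.0, 30.0], [210.0, 95.0]]
histogram = cell_histogram(cell_magnitudes, cell_directions)
print(histogram)  # [0.0, 3.5, 0.0, 0.0, 3.0, 0.0, 0.0, 0.0, 0.0]
```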

(4) Group cells into larger blocks and normalize the gradient histograms within each block

Because of local illumination changes and changes in foreground-background contrast, gradient magnitudes vary over a very wide range. The gradient magnitudes therefore need to be normalized. Normalization further compresses the effects of illumination, shadows, and edges.

The authors' approach is to group cells into larger, spatially connected blocks. The HOG feature of a block is obtained by concatenating the feature vectors of all the cells in the block. The blocks overlap, which means that the features of each cell appear several times in the final feature vector, each time with a different normalization. We call the normalized block descriptor (vector) the HOG descriptor.

Blocks have two main geometries: rectangular blocks (R-HOG) and circular blocks (C-HOG). An R-HOG block is generally a square grid, characterized by three parameters: the number of cells per block, the number of pixels per cell, and the number of histogram channels per cell.

For example, the best parameter setting for pedestrian detection is 3x3 cells per block, 6x6 pixels per cell, and 9 histogram channels. The number of features per block is then 3*3*9 = 81.
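As a rough illustration of block normalization, the sketch below concatenates the cell histograms of one block and divides by the L2 norm. The choice of the L2 norm and the small `epsilon` that avoids division by zero are assumptions of this sketch; the article only says the cells are normalized by the block's "density". The name `normalize_block` is hypothetical.

```python
import math


def normalize_block(cell_histograms, epsilon=1e-6):
    """Concatenate the cell histograms of one block and L2-normalize the result."""
    block = [value for histogram in cell_histograms for value in histogram]
    norm = math.sqrt(sum(value * value for value in block) + epsilon ** 2)
    return [value / norm for value in block]


# A block of 3x3 cells with 9 bins each gives 3*3*9 = 81 features.
cells = [[1.0] * 9 for _ in range(3 * 3)]
block_descriptor = normalize_block(cells)
print(len(block_descriptor))  # 81
```

Because every value is divided by the block norm, multiplying all gradients in the block by a constant (for example, a brighter exposure) leaves the descriptor almost unchanged.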

(5) Collect the HOG features

The final step is to collect the HOG features of all the overlapping blocks in the detection window and combine them into the final feature vector used for classification.

(6) What is the Hog feature dimension of an image?

For reference, the HOG feature extraction process proposed by Dalal is as follows. The sample image is divided into cells of several pixels each, and the gradient direction is divided evenly into 9 intervals (bins). Within each cell, the gradient directions of all pixels are accumulated into a histogram over these bins, which gives a 9-dimensional feature vector. Every 4 adjacent cells form a block, and concatenating the feature vectors within a block gives a 36-dimensional feature vector. The block is scanned over the sample image with a stride of one cell. Finally, the features of all blocks are concatenated to obtain the features of the human body. For example, for a 64*128 image, every 8*8 pixels form a cell and every 2*2 cells form a block. Because each cell has 9 features, each block has 4*9 = 36 features. With a stride of 8 pixels, there are 7 block positions in the horizontal direction and 15 in the vertical direction. A 64*128 image therefore has 36*7*15 = 3780 features in total.
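The arithmetic above can be checked with a short sketch. The function name `hog_dimension` and the (width, height) tuple convention are illustrative assumptions; the sketch also assumes the window, block, and stride sizes divide evenly, as they do in the example.

```python
def hog_dimension(window, block, stride, cell, num_bins=9):
    """Length of the HOG feature vector.

    Each size argument is a (width, height) tuple in pixels.
    """
    blocks_x = (window[0] - block[0]) // stride[0] + 1
    blocks_y = (window[1] - block[1]) // stride[1] + 1
    cells_per_block = (block[0] // cell[0]) * (block[1] // cell[1])
    return blocks_x * blocks_y * cells_per_block * num_bins


# 64x128 window, 16x16 blocks (2x2 cells of 8x8 pixels), stride of 8 pixels.
dimension = hog_dimension((64, 128), (16, 16), (8, 8), (8, 8))
print(dimension)  # 3780
```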

Figure: HOG dimension with 16x16-pixel blocks and 8x8-pixel cells.

Note: Pedestrian detection with HOG+SVM

General idea:
1. Extract the HOG features of the positive and negative samples.
2. Feed them into an SVM classifier for training to obtain a model.
3. Generate the detector from the model.
4. Run the detector on the negative samples to obtain hard examples.
5. Extract the HOG features of the hard examples and train again together with the features from step 1 to obtain the final detector.
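Steps 4 and 5 can be sketched without committing to a particular SVM library. In the sketch below, `score` stands for any trained detector that returns a positive value when it reports a pedestrian; the function names and the toy one-dimensional features are hypothetical and only illustrate how hard examples are collected and added back as negatives.

```python
def mine_hard_examples(score, negative_features):
    """Return the negative samples that the current detector wrongly scores as positive."""
    return [features for features in negative_features if score(features) > 0]


def build_retraining_set(positives, negatives, hard_examples):
    """Combine the original samples with the hard examples (labeled negative)."""
    features = positives + negatives + hard_examples
    labels = [1] * len(positives) + [-1] * (len(negatives) + len(hard_examples))
    return features, labels


# Toy stand-in for a trained detector: one feature, threshold at 0.5.
score = lambda features: features[0] - 0.5
negative_windows = [[0.1], [0.7], [0.4], [0.9]]
hard_examples = mine_hard_examples(score, negative_windows)
features, labels = build_retraining_set([[0.8]], [[0.2]], hard_examples)
print(hard_examples)  # [[0.7], [0.9]]
```

Retraining on `features` and `labels` with the same SVM setup as in step 2 would then give the final detector.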

In-depth study of the HOG algorithm principles:
I. HOG overview
Histograms of Oriented Gradients, as the name implies, is a histogram of gradient directions. It is a way of describing a target, that is, a descriptor.
II. Origin of HOG
HOG was proposed in 2005 by a brilliant PhD researcher. Paper link: http://wenku.baidu.com/view/676f2351f01dc281e53af0b2.html
III. Understanding the algorithm
It is finally October, and I can finally relax and tidy up the HOG algorithm flow. First, get the overall picture: each target corresponds to a one-dimensional feature vector with n dimensions. This n is not a blind guess; it can be derived. For example, why is OpenCV's built-in HOG detector 3781-dimensional? This question was a real headache for me early on, but don't worry. First look at the constructor of the HOGDescriptor structure in OpenCV: HOGDescriptor(Size winSize, Size blockSize, Size blockStride, Size cellSize, ...) (the remaining parameters are not used here). Checking the OpenCV default parameters, we see winSize(64,128), blockSize(16,16), blockStride(8,8), cellSize(8,8).

Clearly, HOG divides a feature window (win) into many blocks, and each block into many cells. The HOG feature vector simply strings together the small feature vectors of all these cells into one high-dimensional feature vector. The dimension n of the feature vector for the window is therefore: n = (number of blocks in the window) x (number of cells per block) x (number of features per cell).

Now let us work out where 3781 comes from. The window size is 64x128, the block size is 16x16, and the block stride is 8x8, so the number of blocks in the window is ((64-16)/8+1) * ((128-16)/8+1) = 7*15 = 105. The block size is 16x16 and the cell size is 8x8, so the number of cells per block is (16/8) * (16/8) = 4. The only thing left is the size of the vector for each cell. Each cell is projected onto 9 bins (how to project is discussed later; I was stuck on this for a long time), so each cell corresponds to a 9-dimensional vector, with each bin giving one number of that vector. All three quantities are now known, so n = 105 x 4 x 9 = 3780, which is the feature dimension of this window.

Some people will ask why getDefaultPeopleDetector() in OpenCV has 3781 dimensions. The extra dimension is a one-dimensional offset (this confused me for a long time). We use HOG+SVM to detect pedestrians, and the final detection method is the most basic linear discriminant function, wx + b = 0. The 3780-dimensional vector is actually w, and adding the one-dimensional b forms OpenCV's default 3781-dimensional detector. Detection is divided into two parts, train and test. During training, we extract the HOG features of the training samples and train an SVM; the ultimate goal is to obtain the w and b used for detection. During testing, we extract the HOG feature x of the target to be detected and substitute it into the equation to make the decision.

At this point you should have a general understanding of how HOG operates. Online you can find many descriptions of the HOG computation: normalization, gradient computation, projection of each cell, all much the same. A beginner reads them and seems to understand, but still does not know how to use HOG or how HOG and SVM work together, so those details are of little use when you are just starting out. It is better to learn to use HOG first and then go back to the principles; that is when you gain the most. That material is all over the web, so I will not repeat it here.

In addition, it is worth mentioning that when computing the cell features, votes must be projected onto the bins, and there is a lot of subtlety in this projection. A senior student's graduation thesis discusses it under the name "three-dimensional linear interpolation" (trilinear interpolation); if you want to learn more about HOG, it is worth careful study.

Next, a few words on using LIBSVM and CvSVM. I find LIBSVM easier to use, but CvSVM is a rewrite based on LIBSVM 2.6 (if I remember correctly). The difference between the two is that LIBSVM training produces a model file, while CvSVM produces an XML file. When computing the w vector in wx + b = 0, with LIBSVM you process the model file directly, while with CvSVM you can skip producing the XML file and use the attributes of the CvSVM object directly (this is a bit vague; pick either one, it does not make much difference). Criticism and discussion are welcome.
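To make the role of the extra dimension concrete, here is a minimal sketch of evaluating a linear detector stored as [w_1, ..., w_n, b]. The function names are hypothetical, and treating a score above zero as "pedestrian" is an assumption of this sketch; it does not reproduce OpenCV's internal conventions.

```python
def split_detector(detector):
    """Split a detector vector [w_1, ..., w_n, b] into the weights w and the offset b."""
    return detector[:-1], detector[-1]


def is_pedestrian(detector, hog_features):
    """Evaluate the linear discriminant w.x + b and compare it with zero."""
    weights, offset = split_detector(detector)
    if len(weights) != len(hog_features):
        raise ValueError("feature length must equal detector length minus one")
    score = sum(w * x for w, x in zip(weights, hog_features)) + offset
    return score > 0


# Toy 2-dimensional example; a real detector would have 3780 weights plus 1 offset.
toy_detector = [1.0, -2.0, 0.5]
print(is_pedestrian(toy_detector, [1.0, 0.5]))  # True  (1.0 - 1.0 + 0.5 = 0.5)
print(is_pedestrian(toy_detector, [0.0, 1.0]))  # False (-2.0 + 0.5 = -1.5)
```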

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
