Image Feature Extraction for Target Detection (1): HOG Features

Source: Internet
Author: User
Tags: svm

1. HOG Features:

Histogram of Oriented Gradients (HOG) is a feature descriptor used for object detection in computer vision and image processing. It forms a feature by computing and accumulating histograms of gradient orientations over local regions of the image. HOG features combined with an SVM classifier have been widely used in image recognition, especially in pedestrian detection. It should be noted that the HOG + SVM method for pedestrian detection was proposed by Dalal and Triggs of INRIA, France, at CVPR 2005. Although many pedestrian detection algorithms have been proposed since, most are still based on the HOG + SVM idea.

(1) Main idea:

In an image, the appearance and shape of a local object can be well described by the distribution of local gradient directions or edge directions. (In essence: gradient statistics, where gradients mainly exist at edges.)

(2) The specific implementation method is as follows:

First, the image is divided into small connected regions, called cell units. Then a histogram of the gradient or edge orientations of the pixels in each cell is collected. Finally, these histograms are combined to form the feature descriptor.

(3) improve performance:

The local histograms can be contrast-normalized over a larger region of the image (called a block or interval): compute a measure of the histogram density over the block, then normalize each cell in the block by this density. After this normalization, the descriptor is more robust to illumination changes and shadows.

(4) Advantages:

Compared with other feature descriptors, HOG has several advantages. First, because HOG operates on local cells of the image, it remains largely invariant to geometric and photometric deformations of the image, since such deformations only appear over larger spatial regions. Second, under coarse spatial sampling, fine orientation sampling, and strong local photometric normalization, pedestrians can be allowed some small body movements as long as they maintain a roughly upright posture; these movements can be ignored without affecting detection. HOG features are therefore particularly well suited to human detection in images.

 

2. Implementation Process of the HOG Feature Extraction Algorithm:

General process:

The HOG feature extraction method takes an image (the target or scan window to be detected) and proceeds as follows:

1) Grayscale conversion (the image is regarded as a three-dimensional image in x, y, and grayscale z);

2) Color-space standardization (normalization) of the input image using gamma correction, to adjust the contrast of the image, reduce the effect of local shadows and illumination changes, and suppress noise interference;

3) Computation of the gradient (magnitude and direction) of every pixel of the image, mainly to capture contour information while further weakening the interference of illumination;

4) Division of the image into small cells (e.g., 6*6 pixels per cell);

5) Computation of a gradient histogram for each cell (counts over orientation bins), forming the descriptor of each cell;

6) Grouping of cells into blocks (e.g., 3*3 cells per block); concatenating the feature descriptors of all cells in a block gives the HOG feature descriptor of that block;

7) Concatenation of the HOG feature descriptors of all blocks in the image (the detection target) to give the HOG feature descriptor of the image. This is the final feature vector used for classification.
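The seven steps above can be sketched end to end in a few dozen lines of NumPy. This is a minimal illustration, not Dalal's exact method: it uses simple nearest-bin voting (the original interpolates votes trilinearly) and plain L2 block normalization (the paper's best variant is L2-Hys), with 8*8-pixel cells and 2*2-cell blocks:

```python
import numpy as np

def hog_features(gray, cell=8, block=2, nbins=9, gamma=0.5):
    """Minimal HOG sketch following steps 1-7 (illustrative, not optimized)."""
    img = gray.astype(np.float64) ** gamma           # step 2: gamma compression
    gx = np.zeros_like(img)
    gy = np.zeros_like(img)
    gx[:, 1:-1] = img[:, 2:] - img[:, :-2]           # step 3: [-1, 0, 1] gradients
    gy[1:-1, :] = img[2:, :] - img[:-2, :]
    mag = np.hypot(gx, gy)                           # gradient magnitude
    ang = np.degrees(np.arctan2(gy, gx)) % 180       # unsigned direction, [0, 180)
    ch, cw = gray.shape[0] // cell, gray.shape[1] // cell
    hist = np.zeros((ch, cw, nbins))
    for i in range(ch):                              # steps 4-5: per-cell histograms
        for j in range(cw):
            m = mag[i*cell:(i+1)*cell, j*cell:(j+1)*cell]
            a = ang[i*cell:(i+1)*cell, j*cell:(j+1)*cell]
            b = (a // (180 / nbins)).astype(int) % nbins
            for k in range(nbins):
                hist[i, j, k] = m[b == k].sum()      # magnitude-weighted votes
    feats = []
    for i in range(ch - block + 1):                  # step 6: overlapping blocks
        for j in range(cw - block + 1):
            v = hist[i:i+block, j:j+block].ravel()
            feats.append(v / (np.linalg.norm(v) + 1e-6))  # L2 normalization
    return np.concatenate(feats)                     # step 7: final feature vector

v = hog_features(np.random.rand(128, 64) * 255)
print(v.shape)  # (3780,) -- 15 * 7 blocks * 4 cells * 9 bins
```

For a 64*128 window this reproduces the 3780-dimensional feature discussed later in the article.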

 

 

The detailed process of each step is as follows:

(1) Gamma and color space standardization

To reduce the influence of illumination, the whole image is first normalized (gamma compression). Local surface exposure contributes a large share of the texture intensity of an image, so this compression effectively reduces local shadow and illumination changes. Because color information contributes little, the image is usually first converted to grayscale.

Gamma compression formula (applied to normalized pixel values):

I(x, y) = I(x, y)^gamma

For example, gamma = 1/2 can be used.
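A sketch of the gamma compression step in NumPy, assuming 8-bit input scaled to [0, 1] before the power is applied (one common convention):

```python
import numpy as np

def gamma_compress(img, gamma=0.5):
    """Gamma (power-law) compression; img is expected in [0, 255]."""
    norm = img.astype(np.float64) / 255.0        # scale to [0, 1]
    return (norm ** gamma * 255.0).astype(np.uint8)

patch = np.array([[0, 64, 255]], dtype=np.uint8)
print(gamma_compress(patch))  # [[  0 127 255]] -- dark values are lifted
```

With gamma < 1 dark regions are brightened, which evens out exposure before gradients are computed.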

 

(2) Calculate the image gradient

The horizontal and vertical gradients of the image are computed, and from them the gradient direction at each pixel position. The derivative operation not only captures contours, silhouettes, and some texture information, but also further weakens the influence of illumination.

The gradient at pixel (x, y) of image H is:

Gx(x, y) = H(x + 1, y) - H(x - 1, y)
Gy(x, y) = H(x, y + 1) - H(x, y - 1)

G(x, y) = sqrt(Gx(x, y)^2 + Gy(x, y)^2),  alpha(x, y) = arctan(Gy(x, y) / Gx(x, y))

The most common method is: first convolve the original image with the [-1, 0, 1] gradient operator to obtain the gradient component gradscalx in the x direction (horizontal, pointing right), then convolve the original image with the [1, 0, -1]^T gradient operator to obtain the gradient component gradscaly in the y direction (vertical, pointing up). The formulas above then give the gradient magnitude and direction of each pixel.
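A tiny worked example of the [-1, 0, 1] operator at a single pixel; the image values here are made up purely for illustration:

```python
import numpy as np

# A 3x3 patch; compute the gradient of the centre pixel with [-1, 0, 1].
H = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 9.0],
              [7.0, 8.0, 9.0]])
x, y = 1, 1                            # centre pixel
gx = H[y, x + 1] - H[y, x - 1]         # horizontal component: 9 - 4 = 5
gy = H[y + 1, x] - H[y - 1, x]         # vertical component:   8 - 2 = 6
mag = np.hypot(gx, gy)                 # gradient magnitude sqrt(gx^2 + gy^2)
ang = np.degrees(np.arctan2(gy, gx))   # gradient direction in degrees
print(round(float(mag), 2), round(float(ang), 1))
```

The same two subtractions, applied at every pixel, are exactly the convolutions with [-1, 0, 1] and its transpose described above.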

 

(3) Build a gradient histogram for each cell unit

The aim of the third step is to provide an encoding of the local image region that remains only weakly sensitive to the posture and appearance of the human objects in the image.

We divide the image into cells, for example 6*6 pixels per cell. Suppose we use a 9-bin histogram to accumulate the gradient information of these 6*6 pixels; that is, the gradient direction range of the cell is divided into nine orientation blocks. For example, if a pixel's gradient direction falls between 20 and 40 degrees, the count of the 2nd bin of the histogram is incremented. Performing this weighted projection (mapping to a fixed angle range) with the gradient direction of every pixel in the cell yields the cell's gradient orientation histogram, i.e., the 9-dimensional feature vector of the cell (because there are 9 bins).

The pixel's gradient direction selects the bin, but what about the gradient magnitude? The magnitude is used as the projection weight. For example, if a pixel's gradient direction is between 20 and 40 degrees and its gradient magnitude is 2 (say), then the count of the 2nd bin of the histogram is incremented not by one, but by two.

The cell unit can be rectangular (rectangular cells) or star-shaped (radial cells).
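The magnitude-weighted voting just described can be sketched as follows. This is nearest-bin voting for simplicity; Dalal's implementation additionally interpolates each vote between neighbouring bins:

```python
import numpy as np

def cell_histogram(mag, ang, nbins=9):
    """9-bin orientation histogram of one cell, votes weighted by magnitude.

    ang is the unsigned gradient direction in degrees, in [0, 180)."""
    hist = np.zeros(nbins)
    bins = (ang // (180 / nbins)).astype(int) % nbins   # 20-degree bins
    for b, m in zip(bins.ravel(), mag.ravel()):
        hist[b] += m                                    # magnitude-weighted vote
    return hist

mag = np.array([2.0, 1.0, 3.0])          # gradient magnitudes of three pixels
ang = np.array([25.0, 25.0, 100.0])      # their gradient directions (degrees)
print(cell_histogram(mag, ang))
# bin 1 (20-40 deg) collects 2 + 1 = 3; bin 5 (100-120 deg) collects 3
```

So a pixel at 25 degrees with magnitude 2 adds 2 to the 2nd bin, exactly as in the example above.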

 

(4) Combine cell units into larger blocks and normalize the gradient histograms within each block

Because of changes in local illumination and in foreground-background contrast, gradient magnitudes vary over a wide range, which makes local normalization of the gradient strength necessary. Normalization further compresses illumination, shadows, and edges.

The author's approach is to group cells into larger, spatially connected intervals (blocks). The feature vectors of all cells in a block are concatenated to give the block's HOG feature. These blocks overlap, which means the features of each cell appear multiple times in the final feature vector, normalized with different results each time. The normalized block descriptor (vector) is called the HOG descriptor.
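A sketch of block-level normalization, assuming 2*2 cells per block and a plain L2 norm (the paper's preferred variant, L2-Hys, additionally clips the normalized values at 0.2 and renormalizes):

```python
import numpy as np

def l2_normalize_block(cell_hists, eps=1e-6):
    """Concatenate the cell histograms of one block and L2-normalize."""
    v = np.concatenate(cell_hists)
    return v / np.sqrt(np.sum(v ** 2) + eps ** 2)   # eps guards empty blocks

# 2*2 cells per block, 9 bins per cell -> a 36-dimensional block descriptor
cells = [np.random.rand(9) for _ in range(4)]
block = l2_normalize_block(cells)
print(block.shape, round(float(np.linalg.norm(block)), 3))  # (36,) 1.0
```

After normalization every block descriptor has (approximately) unit length, which removes the dependence on local gradient strength.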

Blocks come in two main geometries: rectangular blocks (R-HOG) and circular blocks (C-HOG). An R-HOG block is essentially a square lattice, characterized by three parameters: the number of cells per block, the number of pixels per cell, and the number of histogram channels (bins) per cell.

For example, the optimal parameter settings for pedestrian detection are 3 × 3 cells per block, 6 × 6 pixels per cell, and 9 histogram channels, so the number of features per block is 3*3*9 = 81.

 

(5) Collect HOG features

The final step collects the HOG descriptors of all overlapping blocks in the detection window and concatenates them into the final feature vector for classification.

(6) What is the HOG feature dimension of an image?

To summarize the HOG extraction process proposed by Dalal: divide the sample image into cells of several pixels; divide the gradient orientation range evenly into nine bins; accumulate a histogram over the gradient directions of all pixels in each cell to obtain a nine-dimensional feature vector; group each set of adjacent 2*2 cells into a block; concatenate the feature vectors within a block to obtain a 36-dimensional feature vector; and scan the sample image with this block, with a scan step of one cell. Finally, concatenate the features of all blocks to obtain the feature of the human body. For example, in a 64*128 image, every 8*8 pixels form a cell and every 2*2 cells form a block. Each cell has nine features, so each block contains 4*9 = 36 features. With a step of 8 pixels, there are 7 block positions in the horizontal direction and 15 in the vertical direction, so a 64*128 image has 36*7*15 = 3780 features in total.

(Figure omitted: HOG dimension with 16*16-pixel blocks and 8*8-pixel cells.)
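The arithmetic above can be packaged into a small helper (a hypothetical function, with defaults mirroring the 64*128 pedestrian window):

```python
def hog_dim(win=(64, 128), block=(16, 16), stride=(8, 8), cell=(8, 8), nbins=9):
    """Hypothetical helper: HOG feature dimension of a detection window."""
    blocks_x = (win[0] - block[0]) // stride[0] + 1               # 7 across
    blocks_y = (win[1] - block[1]) // stride[1] + 1               # 15 down
    cells_per_block = (block[0] // cell[0]) * (block[1] // cell[1])  # 4
    return blocks_x * blocks_y * cells_per_block * nbins

print(hog_dim())  # 105 blocks * 4 cells * 9 bins = 3780
```

Changing any of the window, block, stride, or cell sizes changes the dimension accordingly.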

 

Note:

Pedestrian detection with HOG + SVM

Overall approach:
1. Extract the HOG features of the positive and negative samples.
2. Feed them to an SVM classifier for training, obtaining a model.
3. Generate a detector from the model.
4. Run the detector over the negative samples to collect hard examples (false positives).
5. Extract the HOG features of the hard examples, add them to the features from step 1, and retrain to obtain the final detector.
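The five steps above can be sketched as follows. A trivial mean-difference linear scorer stands in for the SVM, and the Gaussian "features" are synthetic; a real implementation would train, e.g., a linear SVM on actual HOG vectors:

```python
import numpy as np

rng = np.random.default_rng(0)
pos = rng.normal(+0.1, 1.0, (100, 36))    # step 1: positive-sample features
neg = rng.normal(-0.1, 1.0, (300, 36))    # and initial (easy) negative features

def train(P, N):
    """Stand-in 'SVM': a mean-difference linear discriminant (w, b)."""
    w = P.mean(axis=0) - N.mean(axis=0)
    b = -0.5 * (P.mean(axis=0) + N.mean(axis=0)) @ w
    return w, b

w, b = train(pos, neg)                     # step 2: train the initial model
scores = neg @ w + b                       # steps 3-4: run detector on negatives
hard = neg[scores > 0]                     # false positives = hard examples
w, b = train(pos, np.vstack([neg, hard]))  # step 5: retrain with hard examples
print(hard.shape[0], "hard negatives mined")
```

The key idea is only the loop shape: train, score the negatives, keep the false positives, and retrain with them added to the negative set.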

In-depth study of HOG algorithm principles:
1. HOG Overview

Histograms of Oriented Gradients: as the name suggests, a histogram of oriented gradients, which is a descriptor of a target.

2. Where HOG came from

HOG was proposed by Dalal in 2005. Reference: http://wenku.baidu.com/view/676f2351f01dc281e53af0b2.html

3. Understanding the algorithm

First, get a general picture: each target corresponds to a one-dimensional feature vector with some total dimension N. This N is not arbitrary; it has a rationale. For example, why is OpenCV's HOG detector 3781-dimensional? This question is a real headache early on and can keep you entangled for a long time, but don't worry. Look at the HOGDescriptor constructor, HOGDescriptor(Size winSize, Size blockSize, Size blockStride, Size cellSize, ...) (the remaining parameters are not used here), and check OpenCV's defaults: winSize(64, 128), blockSize(16, 16), blockStride(8, 8), cellSize(8, 8). Clearly, HOG divides a feature window (win) into many blocks, divides each block into many cell units (cells), and obtains the HOG feature vector by concatenating the small feature vectors of all cells into one high-dimensional vector. The dimension N of the window's feature vector is therefore:

N = (number of blocks in the window) × (number of cells per block) × (feature dimension of each cell).

Now work out how 3781 is obtained. The window is 64*128, the block is 16*16, and the block stride is 8*8, so the number of blocks in the window is ((64 - 16)/8 + 1) * ((128 - 16)/8 + 1) = 7 * 15 = 105. The block is 16*16 and the cell is 8*8, so the number of cells in a block is (16/8) * (16/8) = 4. To get N, all that remains is the vector of each cell. Each cell is projected onto nine bins (how exactly this projection works takes a while to explain and is discussed later), so the vector of each cell is nine-dimensional, one number per bin. We now have all three factors for the window dimension: N = 105 × 4 × 9 = 3780. This is the feature dimension of the window. Some will ask: why does getDefaultPeopleDetector() in OpenCV then give 3781 dimensions? Because the extra dimension is a one-dimensional bias term (this tripped me up for a long time).

We use HOG + SVM to detect pedestrians, and the final detection method is the most basic linear discriminant function, w·x + b = 0. The 3780-dimensional vector just computed is in fact w; adding the one-dimensional bias b gives OpenCV's default 3781-dimensional detector. Detection divides into training and testing. During training, we extract the HOG features of a series of training samples and train an SVM; the ultimate goal is to obtain the detector's w and b. During testing, we extract the HOG feature x of the target to be detected and plug it into the equation to decide whether it is a pedestrian.

At this point we have at least a rough grasp of the HOG pipeline. Online you can find many descriptions of HOG computation, covering normalization, gradient computation, and per-cell projection, which seem understandable to newcomers who nevertheless still do not know how HOG and SVM work together; those details are of little use at first, and the advantage of using HOG first and looking back at the principles afterwards is that things then fall into place. It is also worth mentioning that when computing cell features, each vote must be projected into the bins. There is considerable subtlety in this projection, which is covered in my senior's master's thesis, where it is called 'trilinear interpolation'.
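The test-time decision w·x + b described above can be illustrated like this. The detector values here are random placeholders; in real use the 3781 trained values would come from OpenCV's HOGDescriptor_getDefaultPeopleDetector():

```python
import numpy as np

# Illustration only: random stand-ins for a trained 3781-value detector.
rng = np.random.default_rng(42)
detector = rng.random(3781)      # 3780 weights w plus a 1-dimensional bias b
w, b = detector[:3780], detector[3780]

x = rng.random(3780)             # HOG feature vector of one candidate window
score = w @ x + b                # linear discriminant w.x + b
is_pedestrian = score > 0        # positive side of the hyperplane
print(w.shape, is_pedestrian)
```

This makes the 3781 = 3780 + 1 split concrete: the final dimension is not a feature but the bias of the separating hyperplane.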
If you want a deeper understanding of HOG, it is worth thinking these details through carefully.

Finally, a word on the use of libsvm versus CvSVM. I find libsvm better to use, though CvSVM was also rewritten from libsvm 2.6 (if I remember correctly). The difference between the two is that libsvm training produces a model file, while CvSVM produces an XML file. When computing the w vector in the final w·x + b = 0, with libsvm you can process the model file directly; with CvSVM you can skip the generated XML file and use the attributes of the CvSVM object directly (this part is a bit fuzzy; either choice works, and the difference is not large). Criticism, corrections, exchanges, and learning together are all welcome.
