Image Feature Extraction for Target Detection (1): HOG Features

Source: Internet
Author: User
Tags: SVM

1. HOG Features:

Histogram of Oriented Gradients (HOG) is a feature descriptor used for object detection in computer vision and image processing. It builds a feature by computing and accumulating histograms of gradient orientations over local regions of an image. HOG features combined with an SVM classifier have been widely used in image recognition, especially in pedestrian detection. Note that the HOG + SVM approach to pedestrian detection was proposed by the French researcher Dalal at CVPR 2005. Although many pedestrian detection algorithms have been proposed since, most are still based on the HOG + SVM idea.

(1) Main idea:

In an image, the appearance and shape of a local object can be well described by the distribution of gradient or edge directions. (The essence is gradient statistics, and gradients exist mainly at edges.)

(2) Detailed implementation:

First, divide the image into small connected regions, which we call cell units. Then, accumulate a histogram of gradient (or edge) directions over the pixels of each cell. Finally, concatenate these histograms to form the feature descriptor.

(3) Improving performance:

Contrast-normalize the local histograms over a larger region of the image (which we call a block): compute a measure of the histogram energy (density) within the block, then normalize each cell in the block by it. After this normalization, the descriptor is more robust to illumination changes and shadows.

(4) Advantages:

Compared with other feature descriptors, HOG has several advantages. First, because HOG operates on local cells of the image, it is largely invariant to geometric and photometric deformations of the image; such deformations only show up over larger spatial regions. Second, with coarse spatial sampling, fine orientation sampling, and strong local photometric normalization, the small body movements of pedestrians who remain roughly upright can be ignored without affecting detection. HOG features are therefore particularly well suited to human detection in images.

 

2. Implementation Process of the HOG Feature Extraction Algorithm:

General process:

HOG feature extraction proceeds as follows, given an image (the target to be detected, or a scanning/detection window):

1) Grayscale conversion (regarding the image as a three-dimensional image in x, y, and gray value);

2) Standardize (normalize) the color space of the input image using gamma correction. The purpose is to adjust the image's contrast, reduce the influence of local shadows and illumination changes, and at the same time suppress noise;

3) Compute the gradient (magnitude and direction) of each pixel of the image. This mainly captures contour information and further weakens the interference of illumination;

4) Divide the image into small cells (e.g., 6*6 pixels/cell);

5) Compute the gradient histogram of each cell (the counts over the different gradient directions) to form each cell's descriptor;

6) Group cells into blocks (e.g., 3*3 cells/block). Concatenating the feature descriptors of all cells in a block gives the HOG feature descriptor of that block;

7) Concatenate the HOG feature descriptors of all blocks in the image (the detection window) to get the HOG feature descriptor of the image. This is the final feature vector used for classification.

 

 

Each step in detail:

(1) Gamma/Color Space Standardization

To reduce the influence of illumination, the entire image is first normalized. Local surface exposure contributes a large share of an image's texture intensity, so this compression effectively reduces shadow and illumination changes in the image. Because color information is of little help here, the image is usually converted to grayscale first;

Gamma compression formula: I(x, y) = I(x, y)^gamma

For example, gamma = 1/2 can be used;
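As a minimal numpy sketch of the formula above (the function name is illustrative, and the image is assumed to be grayscale and scaled to [0, 1]):

```python
import numpy as np

def gamma_compress(img, gamma=0.5):
    """Element-wise gamma compression: I(x, y) = I(x, y)^gamma.
    img is assumed to be a grayscale image with values in [0, 1]."""
    return np.power(img.astype(float), gamma)
```

With gamma = 1/2 this is simply a per-pixel square root, which compresses bright regions more than dark ones.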

 

(2) Computing the Image Gradients

Compute the horizontal and vertical gradients of the image, and from them the gradient direction at each pixel position. The derivative operation not only captures contours, silhouettes, and some texture information, but also further weakens the influence of illumination.

The gradient at pixel (x, y) of the image is:

Gx(x, y) = H(x+1, y) - H(x-1, y)
Gy(x, y) = H(x, y+1) - H(x, y-1)

where H(x, y) is the pixel value. The gradient magnitude and direction at (x, y) are then:

G(x, y) = sqrt(Gx(x, y)^2 + Gy(x, y)^2)
alpha(x, y) = arctan(Gy(x, y) / Gx(x, y))

The most common method is: first convolve the original image with the [-1, 0, 1] gradient operator to obtain the gradient component in the x direction (horizontal, positive to the right); then convolve with the [-1, 0, 1]^T gradient operator to obtain the gradient component in the y direction (vertical). Finally, use the formulas above to compute the gradient magnitude and direction at each pixel.
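The central-difference gradients described above can be sketched in numpy as follows (the function name is illustrative; border pixels are simply left at zero for brevity):

```python
import numpy as np

def gradients(img):
    """Central differences, equivalent to convolving with [-1, 0, 1]
    horizontally and [-1, 0, 1]^T vertically. Borders are left zero."""
    img = img.astype(float)
    gx = np.zeros_like(img)
    gy = np.zeros_like(img)
    gx[:, 1:-1] = img[:, 2:] - img[:, :-2]   # H(x+1, y) - H(x-1, y)
    gy[1:-1, :] = img[2:, :] - img[:-2, :]   # H(x, y+1) - H(x, y-1)
    magnitude = np.hypot(gx, gy)             # sqrt(gx^2 + gy^2)
    # Unsigned orientation folded into [0, 180) degrees, as commonly
    # used for pedestrian detection
    orientation = np.rad2deg(np.arctan2(gy, gx)) % 180.0
    return magnitude, orientation
```

On a horizontal intensity ramp, every interior pixel gets a magnitude equal to the per-pixel intensity step times two and an orientation of 0 degrees.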

 

(3) Building the Gradient Histogram of Each Cell

This step provides an encoding of the local image region while remaining only weakly sensitive to the pose and appearance of the human body in the image.

We divide the image into several cells, for example 6*6 pixels per cell. Suppose we use a 9-bin histogram to collect the gradient information of these 6*6 pixels; that is, the cell's gradient directions are divided into 9 direction bins. For example, if a pixel's gradient direction lies in 20-40 degrees, add one to the count of the 2nd bin of the histogram. Performing this weighted projection (mapping to a fixed angle range) for every pixel in the cell yields the cell's gradient orientation histogram, i.e. the 9-dimensional feature vector corresponding to the cell (since there are 9 bins).

The pixel's gradient direction is used, but what about the gradient magnitude? The magnitude is used as the projection weight. For example, if this pixel's gradient direction is 20-40 degrees and its gradient magnitude is 2, then the count of the 2nd bin of the histogram is incremented not by one but by two.

Cell units can be either rectangular or radial (circular).
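The magnitude-weighted 9-bin voting described above can be sketched for a single cell (the function name is illustrative; for simplicity each pixel votes into the single bin covering its orientation, without the interpolation between neighboring bins mentioned later):

```python
import numpy as np

def cell_histogram(mag, ori, n_bins=9):
    """9-bin histogram of unsigned orientations (0-180 degrees) for one cell.
    Each pixel votes into the bin covering its orientation, weighted by
    its gradient magnitude (not just a count of 1)."""
    bin_width = 180.0 / n_bins                  # 20 degrees per bin
    hist = np.zeros(n_bins)
    idx = (ori // bin_width).astype(int) % n_bins
    np.add.at(hist, idx, mag)                   # accumulate weighted votes
    return hist
```

A pixel with orientation 30 degrees and magnitude 2 lands in the 2nd bin (index 1) and adds 2 there, matching the example in the text.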

 

(4) Grouping Cells into Larger Blocks and Normalizing the Gradient Histograms Within Each Block

Due to changes in local illumination and in foreground-background contrast, gradient magnitudes vary over a large range, so the gradient strengths need to be normalized. Normalization further compresses illumination, shadows, and edges.

The author's approach is to group cells into larger, spatially connected regions (blocks). Concatenating all the cell feature vectors in a block gives that block's HOG feature. These blocks overlap, which means each cell's features appear multiple times, with different normalizations, in the final feature vector. We call the normalized block descriptor (vector) the HOG descriptor.

Blocks come in two basic geometries: rectangular blocks (R-HOG) and circular blocks (C-HOG). An R-HOG block is essentially a square grid, characterized by three parameters: the number of cells per block, the number of pixels per cell, and the number of histogram channels per cell.

For example, the optimal parameters for pedestrian detection are 3 × 3 cells/block, 6 × 6 pixels/cell, and 9 histogram channels. The number of features per block is then 3*3*9 = 81;
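The block-level concatenation and normalization can be sketched as follows. This uses plain L2 normalization, which is one of the normalization schemes used with R-HOG (the function name and the epsilon value are illustrative):

```python
import numpy as np

def block_descriptor(cell_hists, eps=1e-6):
    """Concatenate the per-cell histograms of one block and L2-normalize.
    cell_hists: sequence of per-cell 9-bin histograms belonging to the block.
    Returns the normalized HOG descriptor of the block."""
    v = np.concatenate([np.asarray(h, dtype=float) for h in cell_hists])
    return v / np.sqrt(np.sum(v ** 2) + eps ** 2)
```

For a 2*2-cell block of 9-bin histograms this yields a 36-dimensional unit-length vector; for the 3*3-cell block above it would be 81-dimensional.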

 

(5) Collecting HOG Features

The last step is to collect the HOG features of all overlapping blocks in the detection window and combine them into the final feature vector for classification.

(6) What Is the HOG Feature Dimension of an Image?

To summarize the HOG extraction process proposed by Dalal: divide the sample image into cells of several pixels, split the gradient direction evenly into 9 bins, accumulate a histogram over the gradient directions of all pixels in each cell to obtain a 9-dimensional feature vector, let every 2*2 adjacent cells form a block, concatenate the feature vectors within a block into a 36-dimensional feature vector, scan the sample image with the block using a step size of one cell, and finally concatenate all block features into the feature of the human body. For example, for a 64*128 image, every 8*8 pixels form a cell, and every 2*2 cells form a block. Since each cell has 9 features, each block contains 4*9 = 36 features. With a step size of 8 pixels, there are 7 block positions in the horizontal direction and 15 in the vertical direction. That is, a 64*128 image has 36*7*15 = 3780 features.
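The dimension arithmetic above can be checked with a short helper (the function name is illustrative; the defaults match the 64*128 window, 16*16 block, 8*8 stride, and 8*8 cell described in the text):

```python
def hog_dimension(win=(64, 128), block=(16, 16), stride=(8, 8),
                  cell=(8, 8), n_bins=9):
    """HOG feature dimension of a detection window:
    blocks_per_window * cells_per_block * bins_per_cell."""
    blocks_x = (win[0] - block[0]) // stride[0] + 1        # 7 for width 64
    blocks_y = (win[1] - block[1]) // stride[1] + 1        # 15 for height 128
    cells_per_block = (block[0] // cell[0]) * (block[1] // cell[1])  # 4
    return blocks_x * blocks_y * cells_per_block * n_bins
```

With the defaults this gives 7 * 15 * 4 * 9 = 3780, the figure derived in the text.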

(Figure: HOG dimensions for 16*16-pixel blocks and 8*8-pixel cells.)

 

Notes:

Pedestrian detection with HOG + SVM

Overall approach:
1. Extract HOG features from positive and negative samples.
2. Feed them to SVM training to obtain a model.
3. Generate a detector from the model.
4. Run the detector over the negative samples to collect hard examples (false positives).
5. Extract the HOG features of the hard examples, add them to the features from step 1, and retrain. The final detector gives the detection results.

An in-depth look at the principles of the HOG algorithm:

1. HOG overview

Histograms of Oriented Gradients is, as the name suggests, a histogram of gradient orientations used to describe a target.

2. Where HOG came from

HOG was proposed by a formidable PhD (Dalal) in 2005. http://wenku.baidu.com/view/676f2351f01dc281e53af0b2.html

3. Understanding the algorithm
By October, I finally breathed a sigh of relief and sorted out the HOG algorithm flow. First, a general picture: each target corresponds to a one-dimensional feature vector of some dimension N. This N is not arbitrary; there is a reason behind it. For example, why is the HOG detector in OpenCV 3781-dimensional? This question is a real headache at first, and I was stuck on it for a long time, but don't worry. Look at the HOGDescriptor constructor: HOGDescriptor(Size winSize, Size blockSize, Size blockStride, Size cellSize, ...) (the later parameters are not used here). OpenCV's defaults are winSize (64, 128), blockSize (16, 16), blockStride (8, 8), and cellSize (8, 8). Clearly, HOG divides a detection window (win) into many blocks, each block is divided into many cell units (cells), and the HOG feature vector concatenates the small feature vectors of all cells into one high-dimensional feature vector. So the feature dimension N of a window equals (number of blocks in the window) x (number of cells per block) x (number of features per cell).

Now let's compute how 3781 is obtained. The window is 64x128, the block is 16x16, and the block stride is 8x8, so the number of blocks in the window is ((64-16)/8 + 1) * ((128-16)/8 + 1) = 7*15 = 105 blocks. With 16x16 blocks and 8x8 cells, the number of cells per block is (16/8) * (16/8) = 4. That leaves the number of features per cell: each cell's gradients are projected into 9 bins (this stumped me for a long time too; more on it below), so each cell's vector is 9-dimensional, one number per bin.

Now we have all three factors needed for the window dimension: N = (blocks per window) x (cells per block) x (features per cell) = 105 x 4 x 9 = 3780. That is the feature dimension of the window. So why does getDefaultPeopleDetector() in OpenCV return 3781 dimensions? Because the extra dimension is a one-dimensional offset (maddening, right? It crashed me for a long time too; here is the explanation). We detect pedestrians with HOG + SVM, and the final detection uses a linear discriminant function w·x + b = 0. The 3780-dimensional vector just computed is essentially w, and appending the one-dimensional b gives OpenCV's default 3781-dimensional detector. Detection splits into train and test phases: during training, we extract HOG features from a set of training samples and train an SVM, the goal being to obtain w and b; during testing, we extract the HOG feature x of the candidate region and plug it into the equation to decide whether it is a pedestrian.

Having written this much, I at least have a rough grasp of the HOG pipeline. Many HOG write-ups online cover normalization, gradient computation, and the per-cell projection in the same way. To a newcomer they look complete, but they don't explain how HOG and SVM actually work together, which is what matters in practice. The benefit of having used HOG is that going back over the principles now yields real results.

There is already a pile of material online, so it won't be repeated here. One thing worth mentioning: when computing cell features, each pixel's vote must be projected into the bins, and there is real subtlety in this projection, which my senior's master's thesis calls "trilinear interpolation". If you want to understand HOG deeply, it is worth thinking through in detail.

To continue: a note on libsvm versus CvSVM. I think libsvm is better, but CvSVM is also a rewrite based on libsvm 2.6 (if I remember correctly). The difference between the two is that libsvm training produces a model file, while CvSVM produces an XML file. When computing the w vector in the final w·x + b = 0, with libsvm you can process the model file directly, while with CvSVM you can skip the generated XML file and use the attributes of the CvSVM object directly (this is a bit vague; you can use either, and the difference is not large). Criticism, corrections, and discussion are all welcome.
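The w·x + b decision described above can be sketched as follows, assuming the detector vector stores the 3780 weights followed by the one-dimensional bias, as in the layout described in the text (the function name is illustrative, and the exact sign convention OpenCV uses for the stored bias is not shown here):

```python
import numpy as np

def decision_value(detector, features):
    """Linear SVM decision value w . x + b for one detection window.
    detector: weight vector w followed by a single bias term b
    (e.g. 3780 weights + 1 bias = 3781 values for the default
    pedestrian window described in the text).
    features:  HOG feature vector x of the window (same length as w)."""
    w, b = detector[:-1], detector[-1]
    return float(np.dot(w, features) + b)
```

A window is classified as a pedestrian when the decision value is on the positive side of the hyperplane w·x + b = 0.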

 

