HOG Features for Image Feature Extraction in Object Detection

Source: Internet
Author: User
Tags: svm

1. HOG Features:

The histogram of oriented gradients (Histogram of Oriented Gradients, HOG) is a feature descriptor used for object detection in computer vision and image processing. It builds features by computing and accumulating histograms of gradient orientations over local regions of an image. HOG features combined with an SVM classifier have been widely used in image recognition, and have been particularly successful in pedestrian detection. It is worth recalling that the HOG+SVM approach to pedestrian detection was proposed by the French researcher Dalal at CVPR 2005, and although pedestrian detection algorithms have kept advancing since then, most are still basically built on the HOG+SVM idea.

(1) Main idea:

In an image, the appearance and shape of a local object can be described very well by the distribution of local gradient directions or edge directions. (Essence: gradient statistics; gradients mainly exist at edges.)

(2) The concrete implementation is:

First, the image is divided into small connected regions, which we call cell units. Then a histogram of the gradient directions (or edge directions) of the pixels in each cell is accumulated. Finally, these histograms are combined to form the feature descriptor.

(3) Improving performance:

Contrast-normalize the local histograms over a larger region of the image (which we call a block, or interval). The method is: first compute the histograms within this block, then normalize each cell in the block by the block's overall density. With this normalization, better invariance to illumination changes and shadows is obtained.

(4) Advantages:

Compared with other feature descriptors, HOG has several advantages. First, because HOG operates on local cells of the image, it remains largely invariant to both geometric and photometric deformations of the image, since such deformations only appear over larger spatial regions. Second, under coarse spatial sampling, fine orientation sampling and strong local photometric normalization, pedestrians can largely be assumed to keep an upright posture, so slight limb movements can be ignored without affecting the detection result. The HOG feature is therefore particularly well suited to human detection in images.

2. The HOG feature extraction algorithm:

Overall process:

The HOG feature extraction method takes an image (the target you want to detect, or a scanning window) and proceeds as follows:

1) Grayscale conversion (treating the image as a three-dimensional image in x, y, z (gray value));

2) Gamma correction is used to normalize the color space of the input image; the aim is to adjust the image contrast, reduce the influence of local shadows and illumination changes, and at the same time suppress noise;

3) Compute the gradient (magnitude and direction) of every pixel of the image, mainly to capture contour information and to further weaken interference from illumination;

4) Divide the image into small cells (e.g. 6*6 pixels per cell);

5) Accumulate the gradient histogram of each cell (counting how many gradients fall into each orientation bin); this forms the descriptor of each cell;

6) Group cells into blocks (for example 3*3 cells per block); concatenating the descriptors of all cells within a block gives the HOG feature descriptor of that block;

7) Concatenating the HOG descriptors of all blocks in the image (the target you want to detect) gives the HOG descriptor of the whole image. This is the final feature vector that can be used for classification. A minimal end-to-end sketch is shown right below.
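Before going through the individual steps, here is a minimal sketch of the whole pipeline using scikit-image's hog function (using scikit-image is an assumption of this example, not something from the original text; the filename and exact parameter values are illustrative only).

# Minimal end-to-end HOG extraction sketch (assumes scikit-image is installed).
import numpy as np
from skimage import color, io
from skimage.feature import hog

image = io.imread("pedestrian.png")        # hypothetical input image
gray = color.rgb2gray(image)               # step 1: grayscale conversion

features = hog(
    gray,
    orientations=9,                        # step 5: 9 histogram bins per cell
    pixels_per_cell=(8, 8),                # step 4: cell size
    cells_per_block=(2, 2),                # step 6: block size in cells
    block_norm="L2-Hys",                   # block normalization (see step (4) below)
    transform_sqrt=True,                   # step 2: power-law (gamma) compression
    feature_vector=True,                   # step 7: one long concatenated vector
)
print(features.shape)                      # 1-D feature vector for the classifier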

The individual steps are detailed below:

(1) Gamma and color space normalization

To reduce the influence of illumination, the whole image must first be normalized. In the texture intensity of an image, local surface exposure contributes a relatively large proportion, so this compression can effectively reduce local shadow and illumination changes in the image. Because color information contributes little, the image is usually converted to grayscale first.

Gamma compression formula: I(x, y) = I(x, y)^gamma.

For example, take gamma = 1/2 (i.e. take the square root of each pixel value).
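A minimal sketch of this step, assuming the grayscale image has already been scaled to the range [0, 1]:

# Gamma (power-law) compression with gamma = 1/2.
import numpy as np

def gamma_compress(gray, gamma=0.5):
    # gamma = 1/2 is equivalent to taking the square root of each pixel value
    return np.power(gray, gamma)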

(2) Calculate image gradient

Compute the gradients of the image in the horizontal and vertical directions, and from them compute the gradient direction value at each pixel position. This derivative operation not only captures contours, silhouettes and some texture information, but also further weakens the influence of illumination.

The gradient at pixel (x, y) of the image is:

Gx(x, y) = H(x+1, y) - H(x-1, y)
Gy(x, y) = H(x, y+1) - H(x, y-1)

where Gx(x, y), Gy(x, y) and H(x, y) denote the horizontal gradient, the vertical gradient and the pixel value at (x, y). The gradient magnitude and direction at (x, y) are then:

G(x, y) = sqrt(Gx(x, y)^2 + Gy(x, y)^2)
alpha(x, y) = arctan(Gy(x, y) / Gx(x, y))

The most common implementation is to convolve the original image with the [-1, 0, 1] gradient operator to obtain the gradient component Gx in the x direction (horizontal, pointing right), and with the [1, 0, -1]^T gradient operator to obtain the gradient component Gy in the y direction (vertical, pointing up). The formulas above then give the gradient magnitude and direction of each pixel.
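A small sketch of this gradient computation with plain NumPy (the variable name gray and the unsigned-orientation convention are assumptions; the original text does not fix them):

# Gradient computation with the [-1, 0, 1] operator and its transpose.
import numpy as np

def hog_gradients(gray):
    gx = np.zeros_like(gray, dtype=float)
    gy = np.zeros_like(gray, dtype=float)
    # central differences: H(x+1, y) - H(x-1, y) and H(x, y+1) - H(x, y-1)
    gx[:, 1:-1] = gray[:, 2:] - gray[:, :-2]
    gy[1:-1, :] = gray[2:, :] - gray[:-2, :]
    magnitude = np.hypot(gx, gy)
    # orientation in degrees, folded into [0, 180) for unsigned gradients
    orientation = np.rad2deg(np.arctan2(gy, gx)) % 180
    return magnitude, orientation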

(3) Build a gradient orientation histogram for each cell

The goal of this third step is to provide an encoding of the local image region, while at the same time remaining only weakly sensitive to the pose and appearance of the human body in the image.

We divide the image into several cells, for example 6*6 pixels per cell. Suppose we use a histogram of 9 bins to accumulate the gradient information of these 6*6 pixels, i.e. we divide the cell's range of gradient directions into 9 orientation blocks. For example, if a pixel's gradient direction lies between 20 and 40 degrees, the count of the histogram's 2nd bin is incremented by one. Performing this weighted projection (mapping into a fixed angle range) in the gradient direction for every pixel in the cell gives the cell's gradient orientation histogram, which is the cell's 9-dimensional feature vector (since there are 9 bins).

The gradient direction selects the bin, so what is the gradient magnitude used for? The gradient magnitude is the weight of the projection. For example: if a pixel's gradient direction lies between 20 and 40 degrees and its gradient magnitude is 2 (say), then the count of the histogram's 2nd bin is increased not by one but by two.

Cell units can be rectangular (rectangular) or star-shaped (radial).
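A minimal sketch of this per-cell voting, using the magnitude and orientation arrays from the previous snippet. It uses hard bin assignment for simplicity; the original method interpolates votes between neighboring bins (the 'trilinear interpolation' mentioned near the end of this article).

# Accumulate one 9-bin orientation histogram for a single cell.
import numpy as np

def cell_histogram(magnitude, orientation, n_bins=9):
    bin_width = 180.0 / n_bins                        # 20 degrees per bin (unsigned)
    bins = (orientation // bin_width).astype(int) % n_bins
    hist = np.zeros(n_bins)
    # each pixel votes into its bin, weighted by its gradient magnitude
    np.add.at(hist, bins.ravel(), magnitude.ravel())
    return hist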

(4) Group cells into larger blocks and normalize the gradient histograms within each block

Because of changes in local illumination and in foreground-background contrast, the range of gradient magnitudes is very wide. This makes it necessary to normalize the gradient strengths; normalization can further compress the effects of illumination, shadows and edges.

The authors' approach is to group the cell units into larger, spatially connected blocks. The HOG feature of a block is then obtained by concatenating the feature vectors of all cells in the block. These blocks overlap, which means that each cell's features appear several times, with different normalization results, in the final feature vector. We call the normalized block descriptor (vector) the HOG descriptor.

Blocks have two main geometries: rectangular blocks (R-HOG) and circular blocks (C-HOG). An R-HOG block is essentially a square grid, which can be characterized by three parameters: the number of cells per block, the number of pixels per cell, and the number of histogram channels (bins) per cell.

For example, the best parameter setting for pedestrian detection is: 3*3 cells per block, 6*6 pixels per cell, 9 histogram channels. The number of features of one block is then 3*3*9 = 81.
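A minimal sketch of the block step under these assumptions: the per-cell histograms are concatenated and L2-normalized (one of several normalization schemes; the exact scheme is not fixed by the text above).

# Concatenate the cell histograms of one block and L2-normalize the result.
import numpy as np

def block_descriptor(cell_hists, eps=1e-6):
    # cell_hists: the per-cell histograms of one block,
    # e.g. 3*3 cells of 9 bins each -> an 81-D block vector
    v = np.concatenate(cell_hists)
    return v / np.sqrt(np.sum(v ** 2) + eps ** 2)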

(5) Collection of HOG features

The final step is to collect the HOG descriptors of all the overlapping blocks in the detection window and concatenate them into the final feature vector for classification.

(6) What is the HOG feature dimension of an image?

To summarize: the HOG feature extraction process proposed by Dalal is as follows. Cut the sample image into cells of several pixels each (Cell), divide the gradient direction evenly into 9 intervals (bins), accumulate a histogram of the gradient directions of all pixels inside each cell to obtain a 9-dimensional feature vector, and let every group of adjacent 2*2 cells form a block (Block); concatenating the feature vectors within a block gives a 36-dimensional block feature vector. The sample image is scanned with the block, with a scanning step of one cell. Finally, the features of all blocks are concatenated together to obtain the feature of the human body. For example, for a 64*128 image, every 8*8 pixels form a cell and every 2*2 cells form a block. Since each cell has 9 features, each block has 4*9 = 36 features. With a step size of 8 pixels, there are 7 scanning positions in the horizontal direction and 15 in the vertical direction. In other words, a 64*128 image has 36*7*15 = 3780 features in total.

HOG dimensions: 16*16 pixel blocks, 8*8 pixel cells.
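The arithmetic above can be checked directly (the variable names are only for illustration):

# Feature dimension of a 64x128 detection window with 16x16 blocks,
# 8x8 cells, 8-pixel block stride and 9 orientation bins.
win_w, win_h = 64, 128
cell, block, stride, n_bins = 8, 16, 8, 9

blocks_x = (win_w - block) // stride + 1      # 7
blocks_y = (win_h - block) // stride + 1      # 15
cells_per_block = (block // cell) ** 2        # 4
print(blocks_x * blocks_y * cells_per_block * n_bins)   # 3780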

Note:

Pedestrian detection HOG+SVM

Overall idea:
1. Extract HOG features from the positive and negative samples.
2. Feed them into an SVM classifier for training and obtain a model.
3. Generate a detector from the model.
4. Run the detector on the negative samples to collect hard examples (false positives).
5. Extract HOG features from the hard examples, combine them with the features from step 1, retrain, and obtain the final detector. (A sketch of this pipeline is given right below.)
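A hedged sketch of this training pipeline, assuming OpenCV and scikit-learn are available; pos_paths and neg_paths are hypothetical lists of sample image files, and a real pipeline would mine hard examples by scanning full-size negative images at multiple scales rather than re-scoring fixed negative windows as done here.

# HOG + linear SVM training with one round of hard-example mining.
import cv2
import numpy as np
from sklearn.svm import LinearSVC

hog = cv2.HOGDescriptor()                       # default 64x128 window, 3780-D descriptor

def extract(paths):
    feats = []
    for p in paths:
        img = cv2.resize(cv2.imread(p), (64, 128))
        feats.append(hog.compute(img).ravel())  # step 1: per-sample HOG feature
    return np.array(feats)

X_pos, X_neg = extract(pos_paths), extract(neg_paths)
X = np.vstack([X_pos, X_neg])
y = np.hstack([np.ones(len(X_pos)), np.zeros(len(X_neg))])

clf = LinearSVC(C=0.01).fit(X, y)               # step 2: train the SVM, get a model

# step 4: hard examples are the negatives the current model scores as positive
X_hard = X_neg[clf.decision_function(X_neg) > 0]

# step 5: add the hard examples and retrain to get the final detector
clf = LinearSVC(C=0.01).fit(np.vstack([X, X_hard]),
                            np.hstack([y, np.zeros(len(X_hard))]))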

In-depth study of the principles of the HOG algorithm:
I. Overview of HOG

Histograms of Oriented Gradients: as the name implies, a histogram of gradient orientations. It is a way of describing a target, a descriptor.

II. The proposal of HOG
HOG was proposed by Dalal in 2005; a link to the paper: Http://wenku.baidu.com/view/676f2351f01dc281e53af0b2.HTML

III. Understanding the algorithm
Eventually, by October, I managed to sort out the HOG algorithm flow. First, a general understanding: each target corresponds to a one-dimensional feature vector of some dimension n, and this n is not a blind guess; it has a solid basis. For example, why is the HOG detector that comes with OpenCV 3781-dimensional? This question was quite a headache at the beginning and I was stuck on it for a long time, but don't worry.

Let's look at the HOGDescriptor in OpenCV. The constructor of this structure is HOGDescriptor(Size winSize, Size blockSize, Size blockStride, Size cellSize, ...) (the remaining parameters are not used here). Checking OpenCV's default parameters we can see winSize (64,128), blockSize (16,16), blockStride (8,8), cellSize (8,8). It is quite clear that HOG divides a detection window (win) into many blocks, each block is divided into many cells, and the HOG feature vector is nothing more than all these cells' small feature vectors strung together into one high-dimensional feature vector. The dimension n of the window's one-dimensional feature vector is therefore equal to: (number of blocks in the window) x (number of cells in a block) x (number of feature components per cell).

Now let's work out how 3781 is obtained. The window size is 64x128, the block size 16x16 and the block stride 8x8, so the number of blocks in the window is ((64-16)/8+1) * ((128-16)/8+1) = 7*15 = 105 blocks. The block size is 16x16 and the cell size 8x8, so the number of cells in one block is (16/8) * (16/8) = 4 cells. All that is left for the final dimension n is the vector corresponding to each cell. Where does that number come from? Each cell is projected onto 9 bins (how to project? that stumped me for a very long time and is discussed later), so the vector corresponding to each cell is 9-dimensional, with one number per bin. Now all three factors needed for the window's dimension are known: n = (number of blocks in the window) x (number of cells in a block) x (feature components per cell) = 105 x 4 x 9 = 3780, which is the feature dimension corresponding to this window.

Some will ask: why does getDefaultPeopleDetector() in OpenCV return 3781 dimensions? Because the extra dimension is a one-dimensional bias (maddening, isn't it; I was also stuck on this for a long time; the next paragraph explains). We use HOG+SVM to detect pedestrians, and in the end detection comes down to the linear discriminant function w*x + b = 0. The 3780-dimensional vector is actually w, and the added dimension is b, which together form OpenCV's default 3781-dimensional detection operator. Detection is divided into a train part and a test part: during training we extract the HOG features of the training samples and use the SVM to learn our detector's w and b; at test time we extract the HOG feature x of the target and evaluate the discriminant function.
**************************************************************************************************

Gorgeous cutting line. Having written this much, you should at least have a general idea of the HOG computation flow. Online you can find plenty of descriptions of how HOG is computed: normalization, gradient computation, the projection of each cell onto the bins, and so on. For someone just getting started they all look perfectly understandable, yet you still don't know how to use HOG or how HOG and SVM cooperate, and none of that was of any use to us in the early stages. The advantage of first learning to use HOG and only then going back to the theory is that you actually gain something; that material is all over the internet, so I will not repeat it here. One thing worth mentioning: when computing the cell features, the pixels must be projected onto the various bins, and there is a lot of subtlety in this projection. A senior labmate's graduation thesis discusses it under the name 'trilinear interpolation'; if you want to understand HOG in depth, it is worth pondering carefully.

**************************************************************************************************

Continuing after the gorgeous cutting line: a few words on the use of LIBSVM and CvSVM. I find LIBSVM easier to use, but CvSVM is a rewrite based on LIBSVM 2.6 (if I remember correctly). The difference is that LIBSVM training produces a model file, while CvSVM produces an XML file. When computing the w vector of the final w*x + b = 0, with LIBSVM you can process the model file directly, while with CvSVM you can skip generating the XML file and use the properties of the CvSVM object directly (this part is a bit vague; either choice works, and it does not matter much). Criticism, exchange and discussion are all welcome.
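To make the 3780/3781 discussion above concrete, here is a small check using OpenCV's Python bindings (a sketch; the text above describes the C++ interface):

# OpenCV's default HOG people-detector dimensions.
import cv2

hog = cv2.HOGDescriptor()                           # winSize (64,128), blockSize (16,16),
                                                    # blockStride (8,8), cellSize (8,8), 9 bins
print(hog.getDescriptorSize())                      # 3780 = 105 blocks * 4 cells * 9 bins

detector = cv2.HOGDescriptor_getDefaultPeopleDetector()
print(len(detector))                                # 3781 = the 3780 weights w plus the bias b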
