This is a short overview of object recognition and scene understanding. It covers the following three parts:
1. Object Recognition from Local Scale-Invariant Features, a feature-based object recognition algorithm. The most representative example is the SIFT feature of David G. Lowe.
The author of this part has applied for a patent on it, so I will not introduce it further here.
2. Histograms of Oriented Gradients for Human Detection
Pedestrian detection based on HOG features.
3. A Discriminatively Trained, Multiscale, Deformable Part Model
DPM is one of the better object detection algorithms so far.
Within this framework I draw on online resources as much as possible, so that everyone's knowledge is pooled and shared here.
HOG features
Http://blog.csdn.net/carson2005/article/details/7782726
The histogram of oriented gradients (HOG) feature is a dense descriptor computed over overlapping local regions of an image. It forms a feature by calculating the histogram of gradient directions in each local region. The HOG feature combined with an SVM classifier has been widely used in image recognition, especially in pedestrian detection. The HOG + SVM method for pedestrian detection was proposed by Dalal and Triggs, researchers at INRIA in France, at CVPR 2005. Many pedestrian detection algorithms have been proposed since then, but most of them are still built on the HOG + SVM idea.
The HOG feature is a local region descriptor. It computes the histogram of gradient directions over local regions to form the feature of the human body, which describes the edges of the human body well. It is insensitive to illumination changes and to small offsets.
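The gradient of the pixel (x, y) in the image is

$$G_x(x, y) = H(x+1, y) - H(x-1, y), \qquad G_y(x, y) = H(x, y+1) - H(x, y-1)$$

where H(x, y) is the pixel value at (x, y), and G_x and G_y are the horizontal and vertical gradients. The gradient magnitude and gradient direction at (x, y) are then

$$G(x, y) = \sqrt{G_x(x, y)^2 + G_y(x, y)^2}, \qquad \alpha(x, y) = \arctan\frac{G_y(x, y)}{G_x(x, y)}$$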
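As a rough illustration, the gradient at one pixel can be sketched in plain Python. This assumes the centered [-1, 0, 1] difference filter and unsigned directions folded into [0, 180) degrees. The function name pixel_gradient and the border handling (reusing the nearest valid neighbour) are assumptions made for this example only.

```python
import math

def pixel_gradient(img, x, y):
    """Return (magnitude, direction in degrees) of the gradient at pixel (x, y).

    img is a list of rows of grayscale values, indexed as img[y][x].
    At the image border the nearest valid neighbour is reused (an assumption).
    """
    h, w = len(img), len(img[0])
    gx = img[y][min(x + 1, w - 1)] - img[y][max(x - 1, 0)]  # horizontal difference
    gy = img[min(y + 1, h - 1)][x] - img[max(y - 1, 0)][x]  # vertical difference
    magnitude = math.hypot(gx, gy)
    # atan2 returns an angle in (-180, 180]; fold it into [0, 180) (unsigned gradient)
    direction = math.degrees(math.atan2(gy, gx)) % 180.0
    return magnitude, direction

# A small image whose brightness increases from top to bottom
img = [[0, 0, 0],
       [0, 0, 0],
       [10, 10, 10]]
print(pixel_gradient(img, 1, 1))
```

For the centre pixel of this image the horizontal difference is 0 and the vertical difference is 10, so the gradient points straight down the image.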
The HOG feature extraction process proposed by Dalal is as follows. The sample image is divided into cells of 8*8 pixels, and the range of gradient directions is divided evenly into nine bins. Within each cell, a histogram of the gradient directions of all pixels is accumulated, giving a 9-dimensional feature vector. Every four adjacent cells (2*2) form a block, and concatenating the feature vectors of the cells in a block gives a 36-dimensional feature vector. The block is scanned over the sample image with a step of one cell. Finally, the features of all blocks are concatenated to obtain the feature of the human body. For example, in a 64*128 image, every 2*2 cells (16*16 pixels) form a block with 4*9 = 36 features. With a step of 8 pixels, the block takes 7 positions in the horizontal direction and 15 positions in the vertical direction. A 64*128 image therefore has 36*7*15 = 3780 features in total.
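The arithmetic in this example can be checked with a few lines of Python. The function name hog_descriptor_length and its parameter names are made up for this sketch. The default values are the ones given in the text (8-pixel cells, 2*2 cells per block, a step of 8 pixels, nine bins).

```python
def hog_descriptor_length(win_w, win_h, cell=8, block_cells=2, stride=8, bins=9):
    """Return (blocks across, blocks down, total feature count) for one window."""
    block_px = cell * block_cells                 # block side in pixels (16)
    blocks_x = (win_w - block_px) // stride + 1   # block positions horizontally
    blocks_y = (win_h - block_px) // stride + 1   # block positions vertically
    per_block = block_cells * block_cells * bins  # 4 cells * 9 bins = 36
    return blocks_x, blocks_y, blocks_x * blocks_y * per_block

print(hog_descriptor_length(64, 128))  # (7, 15, 3780)
```

Changing the window size or the step shows how quickly the descriptor grows. For instance, a step of 4 pixels on the same window gives 13*29 block positions instead of 7*15.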
Besides the extraction process described above, HOG feature computation also includes steps such as converting the color image to grayscale and correcting its brightness. To sum up, the HOG feature calculation steps in pedestrian detection are as follows:
(1) Convert the input color image into a grayscale image.
(2) Use gamma correction to normalize the color space of the input image. The purpose is to adjust the contrast of the image and reduce the effect of local shadows and illumination changes. It can also suppress noise.
(3) Calculate the gradient, mainly to capture contour information and to further weaken the effect of illumination.
(4) Project the gradients onto the gradient direction bins of each cell. The purpose is to provide an encoding for the local image area.
(5) Normalize all cells within blocks. Normalization further compresses the effects of illumination, shadows, and edges. Each cell is generally shared by several different blocks, and it is normalized separately within each of them, so the results differ. The features of one cell therefore appear in the final vector several times with different values. The normalized block descriptor is called the HOG descriptor.
(6) Collect the HOG features of all blocks in the detection window. This step gathers the HOG features of all overlapping blocks in the detection window and combines them into the final feature vector used for classification.
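The six steps can be put together in a minimal pure-Python sketch. It is not Dalal's implementation, and several details are assumptions chosen for brevity: grayscale weights of 0.299/0.587/0.114, square-root gamma compression, votes weighted by gradient magnitude, no interpolation between neighbouring bins or cells, and plain L2 block normalization. The names to_gray and hog are hypothetical.

```python
import math

def to_gray(rgb):
    """Step 1: convert rows of (r, g, b) tuples to grayscale (assumed weights)."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row] for row in rgb]

def hog(gray, cell=8, block_cells=2, bins=9, gamma=0.5, eps=1e-6):
    """Steps 2-6 for a grayscale image given as a list of rows (values 0..255)."""
    h, w = len(gray), len(gray[0])
    # Step 2: gamma (power-law) compression; gamma=0.5 is an assumed setting
    img = [[(v / 255.0) ** gamma for v in row] for row in gray]

    cells_y, cells_x = h // cell, w // cell
    bin_width = 180.0 / bins
    # hist[cy][cx] holds the direction histogram of one cell
    hist = [[[0.0] * bins for _ in range(cells_x)] for _ in range(cells_y)]

    # Steps 3-4: per-pixel gradient, then a magnitude-weighted vote into the cell
    for y in range(cells_y * cell):
        for x in range(cells_x * cell):
            gx = img[y][min(x + 1, w - 1)] - img[y][max(x - 1, 0)]
            gy = img[min(y + 1, h - 1)][x] - img[max(y - 1, 0)][x]
            mag = math.hypot(gx, gy)
            ang = math.degrees(math.atan2(gy, gx)) % 180.0  # unsigned direction
            b = int(ang // bin_width) % bins                # guard against ang == 180.0
            hist[y // cell][x // cell][b] += mag

    # Steps 5-6: L2-normalize every overlapping block, then concatenate all blocks
    features = []
    for by in range(cells_y - block_cells + 1):
        for bx in range(cells_x - block_cells + 1):
            block = []
            for dy in range(block_cells):
                for dx in range(block_cells):
                    block.extend(hist[by + dy][bx + dx])
            norm = math.sqrt(sum(v * v for v in block) + eps * eps)
            features.extend(v / norm for v in block)
    return features

# A 64*128 test image with a single vertical edge in the middle
gray = [[0 if x < 32 else 255 for x in range(64)] for _ in range(128)]
print(len(hog(gray)))  # 3780
```

Because blocks overlap with a step of one cell, each interior cell contributes to four blocks and is normalized differently in each, which is the repetition described in step (5). In practice the resulting vector would be passed to a classifier such as a linear SVM.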