1, SIFT (Scale-Invariant Feature Transform) introduction
1.1 Steps of SIFT feature detection
1.2 Features of the SIFT algorithm
1.3 Problems the SIFT algorithm can solve
2, Scale space
2.1 Multi-resolution pyramid
2.2 Gaussian pyramid construction example
2.3 Gaussian scale space (using different parameters)
3, DoG scale-space extremum detection (finding keypoints)
4, Removing poor extremum points (feature points)
5, Finding the dominant orientation of feature points
6, Generating feature descriptors
7, Summary
1, SIFT (Scale-Invariant Feature Transform) introduction
The core problem in image matching is to match images of the same target taken at different times, at different resolutions, under different illumination, and from different orientations.
Traditional matching algorithms often extract corners or edges directly and adapt poorly to changing conditions. What is needed is a robust target recognition method that works effectively across these different situations.
[1-2] SIFT was proposed by David Lowe in 1999 and refined in 2004. It is arguably the most popular descriptor for characterizing digital images: many researchers have improved on it, producing a whole family of SIFT variants. SIFT has been patented.
1.1 Steps of SIFT feature detection
Scale-space extremum detection: search image locations over all scales, using a difference-of-Gaussian function to identify candidate interest points that are invariant to scale and rotation.
Keypoint localization: at each candidate location, fit a detailed model to determine the location and scale precisely; keypoints are selected according to their stability.
Orientation assignment: assign one or more orientations to each keypoint based on the local image gradient directions. All subsequent operations are performed on image data transformed relative to each keypoint's assigned orientation, scale, and location, which provides invariance to these transformations.
Keypoint description: in a neighborhood around each keypoint, measure the local image gradients at the selected scale and transform them into a representation that tolerates significant local shape distortion and changes in illumination.
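Steps 3 and 4 both rest on local image gradients. The following is a minimal sketch of the orientation-assignment idea only, assuming central-difference gradients and a 36-bin (10° per bin) orientation histogram as in Lowe's 2004 paper. It deliberately omits the Gaussian weighting, histogram smoothing, peak interpolation, and secondary peaks that a full implementation uses; the function names `pixel_gradient` and `dominant_orientation` are hypothetical.

```python
import math

def pixel_gradient(img, x, y):
    """Gradient magnitude and orientation (degrees in [0, 360)) via central differences.

    img is a grayscale image stored as a list of rows; (x, y) must not lie on the border.
    """
    dx = img[y][x + 1] - img[y][x - 1]
    dy = img[y + 1][x] - img[y - 1][x]
    return math.hypot(dx, dy), math.degrees(math.atan2(dy, dx)) % 360.0

def dominant_orientation(img, x, y, radius=1, bins=36):
    """Peak of a magnitude-weighted orientation histogram around (x, y).

    Simplified: no Gaussian weighting, no smoothing or interpolation, single peak only.
    (x, y) must be at least radius + 1 pixels away from every image border.
    """
    width = 360.0 / bins
    hist = [0.0] * bins
    for j in range(y - radius, y + radius + 1):
        for i in range(x - radius, x + radius + 1):
            magnitude, angle = pixel_gradient(img, i, j)
            hist[int(angle // width) % bins] += magnitude
    best = max(range(bins), key=hist.__getitem__)
    return best * width  # left edge of the winning bin, in degrees

# Brightness increases to the right, so the gradient points along +x (0 degrees).
ramp = [[float(x) for x in range(5)] for _ in range(5)]
print(dominant_orientation(ramp, 2, 2))  # 0.0
```

Because the descriptor in step 4 is later computed relative to this orientation, rotating the image rotates the histogram peak by the same angle, which is where the rotation invariance comes from.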
1.2 Features of the SIFT algorithm
Locality: SIFT features are local image features. They are invariant to rotation, scaling, and brightness changes, and remain stable to a certain degree under viewpoint changes, affine transformations, and noise.
Distinctiveness: they are highly distinctive and information-rich, which makes them suitable for fast, accurate matching against large feature databases.
Multiplicity: even a few objects can produce a large number of SIFT features.
Speed: an optimized SIFT matching algorithm can even achieve real-time performance.
Extensibility: SIFT features can easily be combined with other kinds of feature vectors.
1.3 Problems the SIFT algorithm can solve
The performance of image registration and target recognition/tracking is affected by the state of the target itself, the scene environment, and the characteristics of the imaging equipment. To some extent, SIFT can cope with:
target rotation, scaling, and translation
image affine/projective transformations
illumination changes
target occlusion
cluttered scenes
noise
2, Scale space
Within a certain range, the human eye can recognize an object whether it appears large or small, but giving a computer the same ability is difficult. In an unknown scene, a computer vision system has no way of knowing an object's size in advance. One approach is to present the machine with images of the object at different scales, so that it can build a unified understanding of the object across scales. In building this unified understanding, the features of the image at each scale must be taken into account.
2.1 Multi-resolution pyramids
Early multi-scale image representations often used an image pyramid: a set of images derived from the same image at different resolutions. It is generated by:
Smoothing the original image
Downsampling the smoothed image
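The two steps above can be sketched in plain Python. This is a minimal illustration, assuming a grayscale image stored as a list of rows and a 3×3 binomial kernel ([1, 2, 1]/4 in each direction, borders clamped) as the smoothing filter; the function names are hypothetical.

```python
def smooth(img):
    """3x3 binomial smoothing ([1, 2, 1]/4 horizontally and vertically), borders clamped."""
    h, w = len(img), len(img[0])
    weights = (1, 2, 1)
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy, wy in zip((-1, 0, 1), weights):
                for dx, wx in zip((-1, 0, 1), weights):
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    acc += wy * wx * img[yy][xx]
            out[y][x] = acc / 16.0  # the 3x3 weights sum to 16
    return out

def downsample(img):
    """Keep every second row and every second column."""
    return [row[::2] for row in img[::2]]

def image_pyramid(img, levels):
    """Return `levels` images: the original, then repeated smooth + downsample."""
    pyramid = [img]
    for _ in range(levels - 1):
        pyramid.append(downsample(smooth(pyramid[-1])))
    return pyramid

image = [[float(x + y) for x in range(16)] for y in range(16)]
pyr = image_pyramid(image, 4)
print([(len(p), len(p[0])) for p in pyr])  # [(16, 16), (8, 8), (4, 4), (2, 2)]
```

Each level halves the width and height, which matches the traditional pyramid described next.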
Repeating these steps yields a series of images at progressively smaller scales. In a traditional pyramid, each level is half the width and half the height of the previous one. Although a multi-resolution pyramid is simple to generate, it is essentially just downsampling: local image features are hard to preserve, so the features do not remain scale-invariant.
2.2 Gaussian pyramid construction example
Building a Gaussian pyramid consists of two steps: Gaussian smoothing of the image, then downsampling of the smoothed image.
To make the scale representation continuous, Gaussian filtering is added on top of plain downsampling. One image thus produces several groups (octaves) of images, and each octave contains several layers (intervals).
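As a sketch of the Gaussian smoothing step, a 1-D kernel can be sampled from the Gaussian and applied separably (rows first, then columns), which is equivalent to convolving with the 2-D Gaussian. Truncating the kernel at 3σ and clamping at the borders are assumptions for illustration, not something the text specifies; the helper names are hypothetical.

```python
import math

def gaussian_kernel(sigma):
    """Sampled, normalized 1-D Gaussian truncated at 3 * sigma (assumed radius)."""
    radius = max(1, int(math.ceil(3 * sigma)))
    k = [math.exp(-(i * i) / (2 * sigma * sigma)) for i in range(-radius, radius + 1)]
    total = sum(k)
    return [v / total for v in k]

def blur_1d(row, kernel):
    """Convolve one row with the kernel, clamping indices at the borders."""
    r = len(kernel) // 2
    n = len(row)
    return [sum(kernel[j + r] * row[min(max(i + j, 0), n - 1)] for j in range(-r, r + 1))
            for i in range(n)]

def gaussian_blur(img, sigma):
    """Separable Gaussian blur: filter each row, then each column."""
    kernel = gaussian_kernel(sigma)
    rows = [blur_1d(row, kernel) for row in img]
    cols = [blur_1d(list(col), kernel) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]

print(len(gaussian_kernel(1.0)))  # 7
```

Blurring the same image with several increasing σ values gives the layers of one octave; downsampling then starts the next octave.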
Figure: distribution of the Gaussian pyramid (O octaves, S layers per octave).
A Gaussian pyramid therefore has several octaves, each with several layers. The layers within an octave have different scales, that is, they use different Gaussian parameters $\sigma$, and adjacent layers differ by a scale factor $k$. If each octave has $S$ layers, then $k = 2^{1/S}$. The bottom image of each octave is obtained by downsampling, by a factor of 2, the image in the previous octave whose scale is $2\sigma$ (the Gaussian pyramid is built from the bottom up). Once the Gaussian pyramid is complete, subtracting adjacent layers within each octave gives the difference-of-Gaussians (DoG) pyramid.
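The relations in this paragraph can be checked with a small sketch. It assumes Lowe's convention of S + 3 Gaussian images per octave (so that S + 2 DoG images result), which the text does not state explicitly; images are lists of rows, and the helper names are hypothetical.

```python
def octave_sigmas(sigma0, S):
    """Scales of the S + 3 Gaussian images in one octave, spaced by k = 2 ** (1 / S)."""
    k = 2 ** (1.0 / S)
    return [sigma0 * k ** s for s in range(S + 3)]

def difference_of_gaussians(gaussians):
    """Subtract each Gaussian image from the next one to get the DoG images."""
    return [[[b - a for a, b in zip(row_a, row_b)]
             for row_a, row_b in zip(img_a, img_b)]
            for img_a, img_b in zip(gaussians, gaussians[1:])]

def next_octave_seed(gaussians, S):
    """The image at index S (scale 2 * sigma0), downsampled by 2, seeds the next octave."""
    return [row[::2] for row in gaussians[S][::2]]

S, sigma0 = 3, 1.6
sig = octave_sigmas(sigma0, S)
print(round(sig[S] / sigma0, 6))  # 2.0
```

The printed ratio confirms that after $S$ steps of $k = 2^{1/S}$ the scale has exactly doubled, which is why layer $S$ is the one downsampled to start the next octave.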
Number of octaves in the Gaussian pyramid:
$$O = \lfloor \log_2 \min(M, N) \rfloor - a$$
Here $O$ is the number of octaves in the Gaussian pyramid, and $M$ and $N$ are the numbers of rows and columns of the image. The subtracted constant $a$ can take any value between $0$ and $\log_2 \min(M, N)$, chosen according to how large the image at the top of the pyramid needs to be.
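A minimal sketch of the octave-count formula. It uses exact integer arithmetic for $\lfloor \log_2 n \rfloor$ to avoid floating-point rounding, and it assumes that $a$ must stay strictly below $\lfloor \log_2 \min(M, N) \rfloor$ so that at least one octave remains; the function name is hypothetical.

```python
def num_octaves(M, N, a):
    """O = floor(log2(min(M, N))) - a for an M x N image."""
    floor_log2 = min(M, N).bit_length() - 1  # exact floor(log2(n)) for a positive int n
    if not 0 <= a < floor_log2:
        # assumption: keep at least one octave
        raise ValueError("a must satisfy 0 <= a < floor(log2(min(M, N)))")
    return floor_log2 - a

print(num_octaves(480, 640, 3))  # 5
```

For a 480 × 640 image, $\lfloor \log_2 480 \rfloor = 8$, so $a = 3$ gives 5 octaves; a larger $a$ stops the pyramid earlier and leaves a bigger top-level image.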
The Gaussian blur parameter of each layer is given by:
$$\sigma(o, s) = \sigma_0 \cdot 2^{o + s/S}$$
where $o$ is the index of the octave, $s$ is the index of the layer within the octave, $\sigma_0$ is the initial scale, and $S$ is the number of layers per octave.
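The formula can be tabulated with a short sketch. The values $\sigma_0 = 1.6$ and $S = 3$ are assumptions taken from the settings suggested in Lowe's 2004 paper, not from this text; the function name is hypothetical.

```python
def sigma(o, s, sigma0, S):
    """Blur scale of layer s in octave o: sigma0 * 2 ** (o + s / S)."""
    return sigma0 * 2 ** (o + s / S)

S, sigma0 = 3, 1.6
for o in range(2):
    print([round(sigma(o, s, sigma0, S), 3) for s in range(S + 1)])
# [1.6, 2.016, 2.54, 3.2]
# [3.2, 4.032, 5.08, 6.4]
```

Layer $s = S$ of octave $o$ has the same scale as layer $0$ of octave $o + 1$, and adjacent layers differ by $k = 2^{1/S}$, so the scale sequence continues smoothly from one octave to the next.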
The relationship between the scales of adjacent layers within the same octave: