"Sift principle and source code analysis" series of articles index: http://blog.csdn.net/xiaowei_cqu/article/details/8069548
Scale space theory
Objects in nature appear differently when observed at different scales. For example, we describe buildings in metres, but observe molecules and atoms in nanometres. A more vivid example is Google Maps: scrolling the mouse wheel changes the scale at which the map is observed, and what is drawn changes accordingly. Dolly-zoom shots in movies are another example... In a scale space, the blur of the image increases gradually with scale, which simulates how a target forms on the retina as the observer moves from near to far away.
The larger the scale, the blurrier the image. Why do we need to discuss scale space at all? When machine vision is used to analyse an unknown scene, the computer has no prior knowledge of the scale of the objects in the image, so we need to consider descriptions of the image at multiple scales in order to find the optimal one. Moreover, if the same keypoints exist across scales, keypoints detected in input images of different scales can be matched; this is scale invariance.
The scale-space representation of an image is its description at all scales.
Scale-space representation vs. pyramid multi-resolution representation
Gaussian blur
The Gaussian kernel is the only kernel that can generate a multi-scale space (Scale-Space Theory: A Basic Tool for Analysing Structures at Different Scales). The scale space L(x, y, σ) of an image is defined as the convolution of a variable-scale two-dimensional Gaussian function G(x, y, σ) with the original image I(x, y).
The two-dimensional Gaussian function:
G(x, y, σ) = 1 / (2πσ²) · exp(−(x² + y²) / (2σ²))
The scale space:
L(x, y, σ) = G(x, y, σ) * I(x, y)
where * denotes convolution over x and y.
Scale is natural and objective, not subjective; Gaussian convolution is just one way of representing the scale space.
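As a quick illustration (a minimal sketch, not the SIFT source code), a fixed-resolution scale space can be built in OpenCV by blurring the same image with a geometrically increasing σ; the function name buildScaleSpace and its parameters are purely illustrative:

// Minimal sketch: build a fixed-resolution scale space L(x, y, sigma) by
// blurring the same image with an increasing sigma (all levels keep the
// original resolution, unlike a pyramid).
#include <opencv2/opencv.hpp>
#include <cmath>
#include <vector>

std::vector<cv::Mat> buildScaleSpace(const cv::Mat& img, double sigma0, double k, int levels)
{
    std::vector<cv::Mat> L(levels);
    for (int i = 0; i < levels; i++)
    {
        double sigma = sigma0 * std::pow(k, (double)i);   // sigma grows geometrically
        // Size() lets OpenCV derive the kernel size from sigma
        cv::GaussianBlur(img, L[i], cv::Size(), sigma, sigma);
    }
    return L;
}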
The two-dimensional Gaussian function is normally distributed about its centre, with contour lines forming concentric circles:
Convolving the original image with an array formed from the points where this distribution is non-zero transforms the image: each pixel value becomes the Gaussian-weighted average of its neighbouring pixel values. A 5×5 Gaussian template is shown below:
The Gaussian template is circularly symmetric, and the convolution gives the original (central) pixel value the largest weight, with neighbouring pixels receiving smaller weights the farther they are from the centre.
In practical applications, when computing the discrete approximation of the Gaussian function, pixels beyond a distance of about 3σ can be treated as having no effect and their computation ignored. Therefore a program usually only needs to compute a (6σ + 1) × (6σ + 1) window to capture the influence of all relevant pixels.
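For example (a small sketch; the value σ = 1.6 is chosen only for illustration), the window size implied by the 3σ rule and the corresponding 2-D template can be obtained with OpenCV's getGaussianKernel:

// Minimal sketch: kernel size from the 3*sigma truncation rule, and the 2-D
// Gaussian template as the outer product of the 1-D coefficients.
#include <opencv2/opencv.hpp>
#include <cstdio>

int main()
{
    double sigma = 1.6;                          // illustrative value
    int ksize = 2 * cvCeil(3 * sigma) + 1;       // roughly 6*sigma + 1, forced odd
    cv::Mat g = cv::getGaussianKernel(ksize, sigma, CV_64F);   // 1-D coefficients
    cv::Mat tmpl = g * g.t();                    // ksize x ksize 2-D template
    std::printf("kernel size: %dx%d, weight sum = %f\n", ksize, ksize, cv::sum(tmpl)[0]);
    return 0;
}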
Another remarkable property of Gaussian blur is linear separability: the blur computed with a two-dimensional Gaussian matrix can equally be obtained by applying a one-dimensional Gaussian transform in the horizontal direction and then another in the vertical direction.
This reduces O(N² · m · n) multiplications to O(N · m · n) + O(N · m · n) multiplications (N is the Gaussian kernel size; m and n are the height and width of the image).
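A minimal sketch of this separability with OpenCV (the function name separableBlurDemo and σ = 2.0 are only illustrative): the full 2-D convolution and the two 1-D passes produce the same blur up to rounding error.

// Minimal sketch: 2-D Gaussian convolution vs. two 1-D passes (separability).
#include <opencv2/opencv.hpp>
#include <iostream>

void separableBlurDemo(const cv::Mat& src)
{
    double sigma = 2.0;                          // illustrative value
    int ksize = 2 * cvCeil(3 * sigma) + 1;       // roughly 6*sigma + 1
    cv::Mat g = cv::getGaussianKernel(ksize, sigma);

    cv::Mat full2d, separable;
    cv::filter2D(src, full2d, -1, g * g.t());    // one N x N convolution
    cv::sepFilter2D(src, separable, -1, g, g);   // one horizontal + one vertical 1-D pass

    // The two results agree up to rounding.
    std::cout << "max abs diff: " << cv::norm(full2d, separable, cv::NORM_INF) << std::endl;
}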
In fact, this part about Gaussian blur only needs to be understood in outline; in OpenCV a single line of code is enough:
GaussianBlur(dbl, dbl, Size(), sig_diff, sig_diff);
I wrote it out in detail here because it is useful when analysing the algorithm's efficiency, and the Gaussian blur algorithm really is beautiful ~
Pyramid Multi-Resolution
The pyramid is an early multi-scale representation of images. An image pyramid is built in two steps: smooth the image with a low-pass filter, then downsample the smoothed image (usually by 1/2 in both the horizontal and vertical directions) to obtain a series of reduced images.
In the figure, (a) is the result of low-pass filtering the original signal, and (b) is the signal obtained after downsampling.
For two-dimensional images, each level of a traditional pyramid has half the width and half the height of the level below it, i.e. 1/4 of the pixels:
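A minimal sketch of such a traditional pyramid using OpenCV's pyrDown (the helper name buildImagePyramid is just illustrative); pyrDown performs the low-pass smoothing and the 1/2 downsampling in one call:

// Minimal sketch: traditional multi-resolution pyramid, each level holding 1/4
// the pixels of the level below it.
#include <opencv2/opencv.hpp>
#include <vector>

std::vector<cv::Mat> buildImagePyramid(const cv::Mat& img, int levels)
{
    std::vector<cv::Mat> pyr;
    pyr.push_back(img);
    for (int i = 1; i < levels; i++)
    {
        cv::Mat down;
        cv::pyrDown(pyr.back(), down);   // Gaussian smoothing + halve width and height
        pyr.push_back(down);
    }
    return pyr;
}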
The biggest differences between the multi-scale scale-space representation and the pyramid multi-resolution representation are:
- the scale-space representation is obtained by smoothing with Gaussian kernels of different sizes, and every scale has the same resolution;
- the pyramid multi-resolution representation reduces the resolution by a fixed ratio at every level.
As a result, the pyramid multi-resolution representation is quick to generate and takes up little storage, whereas the multi-scale representation accumulates redundant information as the scale parameter grows. The advantage of the multi-scale representation is that local image features can be described at different scales in a simple form; the pyramid representation, on the other hand, has no such theoretical basis and makes it hard to analyse local image features.

DoG (Difference of Gaussian)
The Laplacian-of-Gaussian (LoG) pyramid combines the scale-space representation with the pyramid multi-resolution representation, i.e. it uses a pyramid to represent the scale space; this is the well-known Laplacian pyramid (The Laplacian Pyramid as a Compact Image Code). The Laplacian of Gaussian (LoG) operator is the Laplacian applied to the Gaussian function; its core idea is still the Gaussian, so it is not elaborated here.

The difference-of-Gaussian (DoG) pyramid: DoG is in fact an approximation of LoG. The SIFT algorithm proposes that feature detection at a given scale can be done by subtracting two adjacent Gaussian scale-space images to obtain the DoG response image D(x, y, σ); then, following the LoG approach, a local extremum search is performed on the response image D(x, y, σ) to localise feature points both in spatial position and in scale, where:

D(x, y, σ) = (G(x, y, kσ) − G(x, y, σ)) * I(x, y) = L(x, y, kσ) − L(x, y, σ)
k is a constant, the ratio between two adjacent scales. In the figure, (a) is the 3D plot of the DoG, and (b) compares the DoG with the LoG.

To obtain DoG images, the Gaussian pyramid must be constructed first, so let us go back to the Gaussian pyramid ~ The Gaussian pyramid simply adds Gaussian filtering on top of the downsampling of the multi-resolution pyramid. That is, each level of the pyramid is blurred with Gaussians of different σ, so that every level holds several Gaussian-blurred images. The images at one pyramid level are collectively called an octave (a group), and each octave contains several images (also called intervals or layers). In addition, when downsampling, the first image (the bottom one) of each higher octave is obtained by sampling every other pixel of the third-from-last image of the previous octave (the one below it in the pyramid). Below is the OpenCV code that constructs the Gaussian pyramid; I have added the corresponding comments:
// Build the Gaussian pyramid: nOctaves octaves, each with nOctaveLayers + 3 layers
void SIFT::buildGaussianPyramid( const Mat& base, vector<Mat>& pyr, int nOctaves ) const
{
    vector<double> sig(nOctaveLayers + 3);
    pyr.resize(nOctaves*(nOctaveLayers + 3));

    // precompute Gaussian sigmas using the following formula:
    //  \sigma_{total}^2 = \sigma_{i}^2 + \sigma_{i-1}^2
    // i.e. compute the scale factors used to blur the image at different scales
    sig[0] = sigma;
    double k = pow( 2., 1. / nOctaveLayers );
    for( int i = 1; i < nOctaveLayers + 3; i++ )
    {
        double sig_prev = pow(k, (double)(i-1))*sigma;
        double sig_total = sig_prev*k;
        sig[i] = std::sqrt(sig_total*sig_total - sig_prev*sig_prev);
    }

    for( int o = 0; o < nOctaves; o++ )
    {
        // The DoG pyramid needs nOctaveLayers + 2 images per octave to detect
        // extrema over nOctaveLayers scales, so the Gaussian pyramid needs
        // nOctaveLayers + 3 images per octave.
        for( int i = 0; i < nOctaveLayers + 3; i++ )
        {
            // dst is the i-th image of the o-th octave of the pyramid
            Mat& dst = pyr[o*(nOctaveLayers + 3) + i];
            // the first image of octave 0 is the original (base) image
            if( o == 0  &&  i == 0 )
                dst = base;
            // base of new octave is halved image from end of previous octave:
            // the first image of every other octave is obtained by downsampling
            // the third-from-last image of the previous octave
            else if( i == 0 )
            {
                const Mat& src = pyr[(o-1)*(nOctaveLayers + 3) + nOctaveLayers];
                resize(src, dst, Size(src.cols/2, src.rows/2),
                       0, 0, INTER_NEAREST);
            }
            // the i-th image of an octave is obtained by blurring the (i-1)-th
            // image with sig[i], i.e. it is the image at scale-space level sig[i]
            else
            {
                const Mat& src = pyr[o*(nOctaveLayers + 3) + i-1];
                GaussianBlur(src, dst, Size(), sig[i], sig[i]);
            }
        }
    }
}
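As a standalone sanity check of the sig[] pre-computation above (a sketch assuming the usual SIFT defaults sigma = 1.6 and nOctaveLayers = 3), one can print the incremental blur applied at each layer:

// Minimal sketch: reproduce the sig[] values outside the SIFT class.
#include <cmath>
#include <cstdio>

int main()
{
    const double sigma = 1.6;       // assumed initial scale (SIFT default)
    const int nOctaveLayers = 3;    // assumed layers per octave (SIFT default)
    const double k = std::pow(2., 1. / nOctaveLayers);
    double sig[nOctaveLayers + 3];
    sig[0] = sigma;
    for (int i = 1; i < nOctaveLayers + 3; i++)
    {
        double sig_prev = std::pow(k, (double)(i - 1)) * sigma;   // scale of layer i-1
        double sig_total = sig_prev * k;                          // desired scale of layer i
        // incremental blur taking layer i-1 to layer i: sig_total^2 = sig_prev^2 + sig[i]^2
        sig[i] = std::sqrt(sig_total * sig_total - sig_prev * sig_prev);
    }
    for (int i = 0; i < nOctaveLayers + 3; i++)
        std::printf("sig[%d] = %.4f\n", i, sig[i]);
    return 0;
}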
The number of groups (octaves) in the Gaussian pyramid is determined by the size of the image; in the OpenCV implementation it is roughly O = log₂(min(M, N)) − 2, where M and N are the image height and width.
The loop in the code above that fills sig[] computes the Gaussian blur scale factors. The underlying relationship is

σ(o, s) = σ₀ · 2^(o + s/S)

where σ is the scale-space coordinate, o is the group (octave) index, s is the layer index within the group, σ₀ is the initial scale, and S is the number of layers per group (usually 3 to 5). From this formula we obtain the scale of every layer in a pyramid group and the scale relationship between the images in a group. The scale relationship between adjacent images within a group is σ(o, s+1) = k · σ(o, s) with k = 2^(1/S), and the relationship between adjacent groups is σ(o+1, s) = 2 · σ(o, s). So the scale of the same layer in two adjacent groups differs by a factor of 2. Summarising, the scale sequence is σ(o, s) = σ₀ · 2^(o + s/S), where o runs over the O groups of the pyramid and s over the layers of each group.

Building the DoG pyramid: once the Gaussian pyramid has been built, the DoG pyramid is obtained by subtracting adjacent images of the Gaussian pyramid.
The following code constructs the DoG pyramid:
// Build the DoG pyramid: nOctaves octaves, each with nOctaveLayers + 2 layers
void SIFT::buildDoGPyramid( const vector<Mat>& gpyr, vector<Mat>& dogpyr ) const
{
    int nOctaves = (int)gpyr.size()/(nOctaveLayers + 3);
    dogpyr.resize( nOctaves*(nOctaveLayers + 2) );

    for( int o = 0; o < nOctaves; o++ )
    {
        for( int i = 0; i < nOctaveLayers + 2; i++ )
        {
            // the i-th DoG image of octave o is the (i+1)-th Gaussian image of
            // octave o minus the i-th Gaussian image of the same octave
            const Mat& src1 = gpyr[o*(nOctaveLayers + 3) + i];
            const Mat& src2 = gpyr[o*(nOctaveLayers + 3) + i + 1];
            Mat& dst = dogpyr[o*(nOctaveLayers + 2) + i];
            subtract(src2, src1, dst, noArray(), CV_16S);
        }
    }
}
This one is relatively simple: it just calls the subtract() function.
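The same idea can also be tried outside the SIFT class for a single pair of scales (a minimal sketch, not the library code; dogImage is just an illustrative name): D(x, y, σ) = L(x, y, kσ) − L(x, y, σ) is simply the difference of two Gaussian-blurred copies of the image.

// Minimal sketch: a single DoG response image from two adjacent scales.
#include <opencv2/opencv.hpp>

cv::Mat dogImage(const cv::Mat& img, double sigma, double k)
{
    cv::Mat L1, L2, D;
    cv::GaussianBlur(img, L1, cv::Size(), sigma, sigma);           // L(x, y, sigma)
    cv::GaussianBlur(img, L2, cv::Size(), k * sigma, k * sigma);   // L(x, y, k*sigma)
    // use a signed type so that negative responses are not clipped
    // (the SIFT code above uses CV_16S for the same reason)
    cv::subtract(L2, L1, D, cv::noArray(), CV_16S);
    return D;
}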
With this, step 1 of SIFT is complete. See the "SIFT principle and source code analysis" series index for the following steps.
(When reposting, please credit the author and source: http://blog.csdn.net/xiaowei_cqu. Commercial use is not permitted.)