Index of the "SIFT Principle and Source Code Analysis" series: http://www.cnblogs.com/tianyalu/p/5467813.html
Scale space theory
Objects in nature take different forms depending on the scale at which they are observed. For example, we describe buildings in meters but molecules and atoms in nanometers. Google Maps is a more image-oriented example: rolling the mouse wheel changes the scale at which the map is observed, and the map is drawn differently at each scale; a zoom lens in film works the same way. In a scale space, the image becomes more blurred as the scale grows, which simulates how a target forms on the retina as it moves from near to far.
Why discuss scale space at all? When a machine vision system analyzes an unknown scene, the computer has no way of knowing the scale of the objects in advance. We therefore need to consider the description of the image at multiple scales in order to learn the best scale of the object of interest. Moreover, if the same keypoints exist at different scales, then keypoints detected in input images of different scales can be matched; this is scale invariance.
The scale space representation of an image is its description at all scales.
Scale space representation and pyramid multi-resolution representation
Gaussian Blur
The Gaussian kernel is the only kernel that can produce a multi-scale space ("Scale-space theory: a basic tool for analysing structures at different scales"). The scale space of an image, L(x, y, σ), is defined as the convolution of the original image I(x, y) with a variable-scale two-dimensional Gaussian function G(x, y, σ).
Two-dimensional Gaussian function:
G(x, y, σ) = 1/(2πσ²) · e^(−(x² + y²)/(2σ²))
Scale space:
L(x, y, σ) = G(x, y, σ) * I(x, y)
Scale exists naturally; it is not a subjective creation. Gaussian convolution is only one way of expressing scale space.
The two-dimensional Gaussian function describes a surface whose contour lines are concentric circles, with values normally distributed around the center.
Convolving the original image with an array sampled from this distribution (which is non-zero everywhere) yields a transformed image in which each pixel value is a Gaussian-weighted average of its neighboring pixel values. Consider, for example, a 5×5 Gaussian template.
The Gaussian template is circularly symmetric, and in the convolution result the original (center) pixel carries the largest weight, while neighboring pixels farther from the center carry progressively smaller weights.
In practice, when computing a discrete approximation of the Gaussian function, pixels beyond a distance of roughly 3σ contribute so little that they can be ignored. Programs therefore usually compute only a (6σ+1) × (6σ+1) window, which covers all the pixels with a meaningful influence.
Another very useful property of Gaussian blur is separability: blurring with a two-dimensional Gaussian matrix can be obtained by applying one-dimensional Gaussian blurs in the horizontal and vertical directions.
This reduces O(n² · M · N) multiplications to O(n · M · N) + O(n · M · N) multiplications (n is the Gaussian kernel size; M and N are the image height and width).
In fact, only a simple understanding of this Gaussian part is needed; in OpenCV it takes just one line of code:
GaussianBlur(dbl, dbl, Size(), sig_diff, sig_diff);
I wrote about it here because this part matters for analyzing the algorithm's efficiency, and the Gaussian blur algorithm is really elegant ~
Pyramid Multi-resolution
Pyramids were an early multi-scale representation of images. Building an image pyramid typically consists of two steps: smooth the image with a low-pass filter, then downsample the smoothed image (usually by 1/2 horizontally and vertically), producing a series of images of decreasing size.
In the figure, (a) is the low-pass-filtered original signal and (b) is the downsampled signal.
For a two-dimensional image, each layer of a traditional pyramid has half the width and height of the previous layer, i.e. one quarter of its pixels.
The biggest differences between the scale space representation and the pyramid multi-resolution representation are:
- the scale space representation is obtained by smoothing with Gaussian kernels of different sizes and has the same resolution at every scale;
- the pyramid multi-resolution representation reduces the resolution by a fixed ratio at each layer.
Therefore the pyramid multi-resolution representation is faster to generate and occupies less storage, while the multi-scale representation carries more redundant information as the scale parameter grows. The advantage of the multi-scale representation is that the local features of the image can be described in a simple form at different scales; the pyramid representation, by contrast, has little theoretical grounding and makes it harder to analyze local image features.

DoG (Difference of Gaussian)

The Laplacian pyramid combines the scale space representation with the pyramid multi-resolution representation, i.e. it applies the pyramid structure while working in scale space; the most famous example in computer vision is the Laplacian pyramid ("The Laplacian Pyramid as a Compact Image Code").

The Laplacian of Gaussian (LoG) operator is the Laplacian applied to the Gaussian function; its core idea is still Gaussian, so it is not elaborated here. The Difference of Gaussian (DoG) is in fact an approximation to LoG. The SIFT paper suggests that feature detection at a given scale can be carried out by subtracting two images at adjacent Gaussian scales, obtaining the DoG response image D(x, y, σ):

D(x, y, σ) = (G(x, y, kσ) − G(x, y, σ)) * I(x, y) = L(x, y, kσ) − L(x, y, σ)

where k is the constant factor between two adjacent scales. Local maxima of the response image D(x, y, σ) are then searched for, following the LoG approach, to locate feature points both in spatial position and in scale. In the figure, (a) is a three-dimensional plot of DoG and (b) compares DoG with LoG.

Building the pyramids

Gaussian pyramid

To obtain DoG images we first construct a Gaussian pyramid. Back to pyramids again ~ The Gaussian pyramid simply adds Gaussian filtering on top of the multi-resolution pyramid's downsampling: each layer of the pyramid is blurred with Gaussians of different σ, so every pyramid level holds several Gaussian-blurred images. The images at one pyramid level are collectively called an octave (group), and each octave contains several images (also called intervals or layers). In addition, the first (bottom) image of each octave is downsampled from the third-from-last image of the previous octave. Below is the OpenCV code that constructs the Gaussian pyramid, to which I have added comments:
// Construct a Gaussian pyramid of nOctaves octaves (nOctaveLayers+3 layers per octave)
void SIFT::buildGaussianPyramid( const Mat& base, vector<Mat>& pyr, int nOctaves ) const
{
    vector<double> sig(nOctaveLayers + 3);
    pyr.resize(nOctaves*(nOctaveLayers + 3));

    // precompute Gaussian sigmas using the following formula:
    //  \sigma_{total}^2 = \sigma_{i}^2 + \sigma_{i-1}^2
    // i.e. compute the Gaussian blur coefficient for each scale
    sig[0] = sigma;
    double k = pow( 2., 1. / nOctaveLayers );
    for( int i = 1; i < nOctaveLayers + 3; i++ )
    {
        double sig_prev = pow(k, (double)(i-1))*sigma;
        double sig_total = sig_prev*k;
        sig[i] = std::sqrt(sig_total*sig_total - sig_prev*sig_prev);
    }

    for( int o = 0; o < nOctaves; o++ )
    {
        // the DoG pyramid needs nOctaveLayers+2 images per octave to detect
        // extrema at nOctaveLayers scales, so the Gaussian pyramid needs
        // nOctaveLayers+3 images per octave
        for( int i = 0; i < nOctaveLayers + 3; i++ )
        {
            // dst is the i-th image of octave o
            Mat& dst = pyr[o*(nOctaveLayers + 3) + i];
            // layer 0 of octave 0 is the original (base) image
            if( o == 0  &&  i == 0 )
                dst = base;
            // base of new octave is halved image from end of previous octave:
            // image 0 of each octave is downsampled from image nOctaveLayers
            // of the previous octave
            else if( i == 0 )
            {
                const Mat& src = pyr[(o-1)*(nOctaveLayers + 3) + nOctaveLayers];
                resize(src, dst, Size(src.cols/2, src.rows/2),
                       0, 0, INTER_NEAREST);
            }
            // image i of each octave is obtained by blurring image i-1 with
            // sig[i]; within the octave its scale-space coordinate is sig[i]
            else
            {
                const Mat& src = pyr[o*(nOctaveLayers + 3) + i-1];
                GaussianBlur(src, dst, Size(), sig[i], sig[i]);
            }
        }
    }
}
The number of octaves of the Gaussian pyramid is determined by the image size (on the order of log2 of the smaller image dimension, minus a small constant). The first loop in the code computes the Gaussian blur coefficients σ; the underlying relations are as follows:

σ(o, s) = σ0 · 2^(o + s/S)

where σ is the scale-space coordinate, o is the octave index, s is the layer index within the octave, σ0 is the initial scale, and S is the number of layers per octave (usually 3). From this formula we can derive the scale of each layer within an octave and the scale relationships between octaves.

Scale relationship between adjacent images within an octave: σ_{s+1} = k · σ_s, with k = 2^(1/S).
Relationship between adjacent octaves: σ(o+1, s) = 2 · σ(o, s).

So the scales of the same layer in two adjacent octaves differ by a factor of two. The final scale sequence is summarized as σ(o, s) = σ0 · 2^(o + s/S), with o ∈ [0, O−1] and s ∈ [0, S+2], where O is the number of octaves and S is the number of layers per octave. After the Gaussian pyramid is built, the DoG pyramid is obtained by subtracting adjacent images of the Gaussian pyramid. Here is the code that constructs the DoG pyramid:
// Construct a DoG pyramid of nOctaves octaves (nOctaveLayers+2 layers per octave)
void SIFT::buildDoGPyramid( const vector<Mat>& gpyr, vector<Mat>& dogpyr ) const
{
    int nOctaves = (int)gpyr.size()/(nOctaveLayers + 3);
    dogpyr.resize( nOctaves*(nOctaveLayers + 2) );

    for( int o = 0; o < nOctaves; o++ )
    {
        for( int i = 0; i < nOctaveLayers + 2; i++ )
        {
            // image i of octave o is image i+1 minus image i of the
            // same octave in the Gaussian pyramid
            const Mat& src1 = gpyr[o*(nOctaveLayers + 3) + i];
            const Mat& src2 = gpyr[o*(nOctaveLayers + 3) + i + 1];
            Mat& dst = dogpyr[o*(nOctaveLayers + 2) + i];
            subtract(src2, src1, dst, noArray(), CV_16S);
        }
    }
}
This part is relatively simple: it is just a subtract() call.
At this point, the first step of SIFT is complete. See the SIFT Principle and Source Code Analysis series.
This article was reposted from: http://blog.csdn.net/xiaowei_cqu/article/details/8067881
"OpenCV" SIFT Principle and Source Code Analysis: DoG Scale Space Construction