Image Feature Extraction: Key Steps of the SIFT Keypoint Localization Algorithm


1. Description of some symbols in the SIFT algorithm

$I(x, y)$ represents the original image.

$G(x, y, \sigma)$ represents the Gaussian filter, where $G(x, y, \sigma) = \frac{1}{2\pi\sigma^2}\exp\left(-(x^2+y^2)/2\sigma^2\right)$.

$L(x, y, \sigma)$ represents the image generated by convolving the original image with a Gaussian filter, i.e. $L(x, y, \sigma) = G(x, y, \sigma) \otimes I(x, y)$. Given a series of values $\sigma_i$, we can generate a series of $L(x, y, \sigma_i)$ images; taken together, this series of $L(x, y, \sigma)$ images is the scale-space representation of the original image. For background on scale space, see: Image feature extraction: scale space theory.

$DoG$ denotes the difference of Gaussians, also written $D(x, y, \sigma)$, where $D(x, y, \sigma) = (G(x, y, k\sigma) - G(x, y, \sigma)) \otimes I(x, y) = L(x, y, k\sigma) - L(x, y, \sigma)$.

It is particularly worth noting that the difference-of-Gaussians image at scale $\sigma$ is generated from the two $L$ images at scales $k\sigma$ and $\sigma$. Here $k$ is the constant ratio between two adjacent scales in scale space.

$O$: the number of octaves (groups) in the Gaussian pyramid. Note that in an actual implementation, the index of the first octave may be 0 or -1; the reason is explained below.

$S$: the number of layers per octave of the Gaussian pyramid. When the scale-space images (the $L$ images) are actually constructed, $S+3$ layers are built per octave. It must be $S+3$, not $S$; the reason is analyzed below.

2. Construction of the difference-of-Gaussians pyramid

2.1 Generating the first image of the first octave

Many people encountering SIFT for the first time are confused by this point. There are two conventions: one sets the index of the first octave to 0, and the other sets it to -1.

We first consider the convention where the first octave has index 0. We know that the first image of the first octave is generated by convolving the original image with a Gaussian filter of scale $\sigma_0$ (usually set to 1.6). But what is the original image? Is it $I(x, y)$? No! To suppress aliasing, the input image is generally assumed to have already been Gaussian-smoothed, with $\sigma_n = 0.5$, i.e. half a pixel. This means that the captured image $I(x, y)$ is treated as if it had already been smoothed by a Gaussian filter with $\sigma = \sigma_n = 0.5$. So we cannot simply smooth $I(x, y)$ with a $\sigma_0$ Gaussian filter; instead we should smooth $I(x, y)$ with a Gaussian filter of $\sigma = \sqrt{\sigma_0^2-\sigma_n^2}$, namely

$$FirstLayer(x, y) = I(x, y) \otimes G(x, y, \sqrt{\sigma_0^2-\sigma_n^2})$$

where $FirstLayer(x, y)$ is the first image of the first octave of the scale space; $\sigma_0$ is usually 1.6 and $\sigma_n = 0.5$.

Now let us consider setting the index of the first octave to -1. The first question is why the index would be set to -1 at all. If the index is 0, as in the case above, the first image of the first octave is created by blurring the original image, so detail has already been lost and the original image itself is never used at full resolution. For this reason, we first enlarge the image by a factor of two, so that the detail of the original image is preserved in it. From the analysis above, $I(x, y)$ can be regarded as an image already blurred with $\sigma_n = 0.5$; after magnifying $I(x, y)$ by a factor of two to obtain $I_s(x, y)$, the result can be regarded as blurred by a Gaussian kernel of $2\sigma_n = 1$. A Gaussian filter with $\sigma = \sqrt{\sigma_0^2-(2\sigma_n)^2}$ is then applied to $I_s$ to generate the first image of the first octave. This can be expressed as:

$$FirstLayer(x, y) = I_s(x, y) \otimes G(x, y, \sqrt{\sigma_0^2-(2\sigma_n)^2})$$

where $FirstLayer(x, y)$ is the first image of the first octave of the scale space, and $I_s(x, y)$ is $I(x, y)$ magnified by a factor of two with bilinear interpolation; $\sigma_0$ is usually 1.6 and $\sigma_n = 0.5$.
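As a quick numeric check of the two formulas above, here is a minimal sketch (the function name is mine, not from any SIFT implementation). It computes the extra blur that is actually applied under each convention, using the fact that Gaussian blurs compose as $\sigma_{total}^2 = \sigma_a^2 + \sigma_b^2$:

```cpp
#include <cassert>
#include <cmath>

// Blur that must actually be applied so that an image with an assumed
// pre-existing blur of `pre_blur` reaches a total scale of `sigma0`.
double initial_blur(double sigma0, double pre_blur) {
    return std::sqrt(sigma0 * sigma0 - pre_blur * pre_blur);
}
```

With $\sigma_0 = 1.6$: under the index-0 convention the applied blur is $\sqrt{1.6^2 - 0.5^2} \approx 1.52$, and under the index -1 convention (image doubled, so pre-blur $2\sigma_n = 1$) it is $\sqrt{1.6^2 - 1^2} \approx 1.25$.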

2.2 How many images are generated in the scale space

Recall that $S$ is the number of layers of difference-of-Gaussians images on which we ultimately search for feature points. The search requires finding local extrema in scale space: to find the local extrema on a given layer, we need the difference-of-Gaussians images of the layer above and the layer below. So if we want to search $S$ layers for feature points, we need $S+2$ layers of difference-of-Gaussians images, and the search runs from layer 2 through layer $S+1$.

Each difference-of-Gaussians image $D(x, y, \sigma)$ is generated by differencing two scale-space images, $L(x, y, k\sigma)$ and $L(x, y, \sigma)$. Suppose $S = 3$; then we need $S+2 = 5$ difference-of-Gaussians images, namely $D(x, y, \sigma), D(x, y, k\sigma), D(x, y, k^2\sigma), D(x, y, k^3\sigma), D(x, y, k^4\sigma)$, of which the three images $D(x, y, k\sigma), D(x, y, k^2\sigma), D(x, y, k^3\sigma)$ are the ones on which we look for local extrema. We therefore need $S+3 = 6$ scale-space images to generate these difference-of-Gaussians images, namely: $L(x, y, \sigma), L(x, y, k\sigma), L(x, y, k^2\sigma), L(x, y, k^3\sigma), L(x, y, k^4\sigma), L(x, y, k^5\sigma)$.

From the above analysis, we know that for the scale space we need $S+3$ layers of images per octave to build $S+2$ layers of difference-of-Gaussians images. So if the entire scale space has $O$ octaves, each with $S+3$ layers, there are $O \times (S+3)$ images in total. If we look at the SIFT source code in OpenCV, it is easy to find the following line, which illustrates the point:

pyr.resize(nOctaves * (nOctaveLayers + 3));

In the code above, pyr holds the images of the entire scale space, nOctaves is the number of octaves, and nOctaveLayers is the $S$ we defined.

2.3 Why the third image from the end?

In many descriptions of the SIFT algorithm you will read that the third image from the end of each octave is downsampled (taking every other pixel) to serve as the first image of the next octave.

The reason is to ensure the continuity of the scale space, as we analyze carefully below.

We know that layer $s$ of octave $o$ of the scale space has scale $\sigma(o, s) = \sigma_0 2^{o+s/S}$, where $k = 2^{1/S}$, $o \in \{0, 1, 2, \dots, O-1\}$, and $s \in \{0, 1, 2, \dots, S+2\}$. Let us start with octave 0 and look at the scale of each layer (taking $S = 3$).

Octave 0: $\sigma_0 \to 2^{1/3}\sigma_0 \to 2^{2/3}\sigma_0 \to 2^{3/3}\sigma_0 \to 2^{4/3}\sigma_0 \to 2^{5/3}\sigma_0$

Octave 1: $2\sigma_0 \to 2\cdot2^{1/3}\sigma_0 \to 2\cdot2^{2/3}\sigma_0 \to 2\cdot2^{3/3}\sigma_0 \to 2\cdot2^{4/3}\sigma_0 \to 2\cdot2^{5/3}\sigma_0$

Analyzing just these two octaves, we can see that the first image of octave 1 coincides in scale with the fourth image of octave 0 (the third from the end), both at scale $2\sigma_0$. So we do not need to convolve the original image again to obtain the first image of each octave; we simply downsample the third image from the end of the previous octave.

We can continue the analysis: the scales of the difference-of-Gaussians images obtained from octave 0 of the scale space are: $\sigma_0 \to 2^{1/3}\sigma_0 \to 2^{2/3}\sigma_0 \to 2^{3/3}\sigma_0 \to 2^{4/3}\sigma_0$

The scales of the difference-of-Gaussians images obtained from octave 1 are: $2\sigma_0 \to 2\cdot2^{1/3}\sigma_0 \to 2\cdot2^{2/3}\sigma_0 \to 2\cdot2^{3/3}\sigma_0 \to 2\cdot2^{4/3}\sigma_0$

If we take the middle three items of each octave together, the scales are: $2^{1/3}\sigma_0 \to 2^{2/3}\sigma_0 \to 2^{3/3}\sigma_0 \to 2\cdot2^{1/3}\sigma_0 \to 2\cdot2^{2/3}\sigma_0 \to 2\cdot2^{3/3}\sigma_0$ — exactly continuous! The direct benefit is that when locating extrema in scale space we do not miss extrema at any scale, and the scale factor is quantized uniformly.
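The continuity claim can be checked numerically with a one-line sketch of the scale formula (the function name is mine):

```cpp
#include <cassert>
#include <cmath>

// Scale of layer s in octave o, with S intervals per octave:
// sigma(o, s) = sigma0 * 2^(o + s/S).
double layer_scale(double sigma0, int o, int s, int S) {
    return sigma0 * std::pow(2.0, o + static_cast<double>(s) / S);
}
```

For $S = 3$, layer 3 of octave 0 and layer 0 of octave 1 both have scale $2\sigma_0$, which is exactly why the third image from the end can be downsampled to start the next octave.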

2.4 Creating the image of layer i from the image of layer i-1

It is worth noting that in the SIFT source code, each image in the scale space (except the first layer) is generated by convolving the previous layer's image with a Gaussian filter of a relative $\sigma$, rather than by convolving the original image with a Gaussian filter at the corresponding scale. One reason, mentioned earlier, is that there is no true "original": our input image $I(x, y)$ is already blurred at $\sigma = 0.5$. Another reason is that if every layer were computed from the original image, the differences between adjacent layers would be very small, so most values in the difference-of-Gaussians images would be nearly 0, making it almost impossible to detect feature points later.

For the above two reasons (personally, I think reason 1 is the most important; reason 2 is a conjecture formed after actual experiments, without a theoretical basis), the image of layer $i+1$ of each octave is generated by convolving the image of layer $i$ with a Gaussian filter of the relative scale.

So how is the relative scale computed? Consider octave 0 first: the relative scale between the layer-$(i+1)$ image and the layer-$i$ image is $\sigma_{diff,i} = \sqrt{(\sigma_0 k^{i+1})^2 - (\sigma_0 k^i)^2}$. To maintain scale continuity, each subsequent octave uses the same relative scales (as in the actual SIFT code). One might guess that for the octave whose base scale is $2\sigma_0$, the relative scale between layers $i$ and $i+1$ should be $\sqrt{(2\sigma_0 k^{i+1})^2 - (2\sigma_0 k^i)^2} = 2\sigma_{diff,i}$, but the code still uses $\sigma_{diff,i}$, because that octave's image has already been downsampled by a factor of two.

sig[0] = sigma;
double k = std::pow(2., 1. / nOctaveLayers);
for (int i = 1; i < nOctaveLayers + 3; i++)
{
    double sig_prev  = std::pow(k, (double)(i - 1)) * sigma;
    double sig_total = sig_prev * k;
    sig[i] = std::sqrt(sig_total * sig_total - sig_prev * sig_prev);
}
3. Searching for feature points

3.1 Search strategy

The search for keypoints is accomplished by comparing adjacent layers of the DoG images within the same octave. To find the extrema of the scale space, each sample point is compared with all of its neighbors to see whether it is larger or smaller than its neighbors in both the image domain and the scale domain. Each detection point is compared with its 8 neighbors at the same scale and the $9\times2$ corresponding points in the adjacent scales above and below, 26 points in total, to ensure that extrema are detected in both scale space and two-dimensional image space. That is, the comparison is carried out within a $3\times3\times3$ cube.

The search starts at the second layer of each octave: taking the second layer as the current layer, a $3\times3\times3$ cube is formed around each point of the second-layer DoG image, with the first and third layers as the bottom and top of the cube. The extrema found in this way have both position coordinates (the DoG image coordinates) and scale coordinates (the layer coordinates). When the search of the second layer is complete, the third layer becomes the current layer, and the process repeats. When $S = 3$, 3 layers are searched in each octave.
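A minimal sketch of the 26-neighbor comparison described above (the array layout and function name are my own, not OpenCV's):

```cpp
#include <cassert>

// True if the centre sample cube[1][1][1] of a 3x3x3 DoG neighbourhood
// (indexed as [layer][row][column]) is strictly greater than all 26
// neighbours, or strictly smaller than all of them.
bool is_local_extremum(const float cube[3][3][3]) {
    const float v = cube[1][1][1];
    bool is_max = true, is_min = true;
    for (int l = 0; l < 3; ++l)
        for (int r = 0; r < 3; ++r)
            for (int c = 0; c < 3; ++c) {
                if (l == 1 && r == 1 && c == 1)
                    continue;                  // skip the centre itself
                if (cube[l][r][c] >= v) is_max = false;
                if (cube[l][r][c] <= v) is_min = false;
            }
    return is_max || is_min;
}
```

A real implementation also applies a contrast threshold to $|D|$ at this stage; the sketch only shows the neighborhood comparison.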

3.2 Subpixel interpolation

The extremum search above is carried out in a discrete space, so the detected extrema are not extrema in the true sense: even for a one-dimensional signal, the extremum of the discrete samples can differ from the extremum of the underlying continuous function. A method known as subpixel interpolation fits the values at known discrete points to estimate the continuous-space extremum.

First, consider a one-dimensional interpolation example. Suppose we know the values of a function $f(x)$ at several points: $f(-1) = 1$, $f(0) = 6$, $f(1) = 5$, and we seek the maximum of $f(x)$ on $[-1, 1]$.

If we consider only the discrete case, a simple comparison tells us the maximum is $f(0) = 6$. Below we use subpixel interpolation to handle the continuous interval:

Using a Taylor series, $f(x)$ can be expanded in the vicinity of $x = 0$ as:

$$f(x) \approx f(0) + f'(0)x + \frac{f''(0)}{2}x^2$$

In addition, the first derivative of $f(x)$ at $x$ written in discrete form is $f'(x) = \frac{f(x+1) - f(x-1)}{2}$, and the second derivative in discrete form is $f''(x) = f(x+1) + f(x-1) - 2f(x)$.

So we can compute $f(x) \approx 6 + 2x + \frac{-6}{2}x^2 = 6 + 2x - 3x^2$.

Find the position and value of the maximum of the function $f(x)$:

$$f'(x) = 2 - 6x = 0, \quad \hat{x} = \frac{1}{3}$$

$$f(\hat{x}) = 6 + 2\times\frac{1}{3} - 3\times\left(\frac{1}{3}\right)^2 = 6\frac{1}{3}$$
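The steps above can be wrapped into a small sketch (function names are mine), using exactly the discrete derivatives $f'(0) = (f(1)-f(-1))/2$ and $f''(0) = f(1)+f(-1)-2f(0)$:

```cpp
#include <cassert>
#include <cmath>

// Offset of the fitted quadratic extremum from x = 0, given the three
// samples f(-1), f(0), f(1): x_hat = -f'(0) / f''(0).
double refined_offset(double fm1, double f0, double fp1) {
    double d1 = (fp1 - fm1) / 2.0;     // f'(0), central difference
    double d2 = fp1 + fm1 - 2.0 * f0;  // f''(0)
    return -d1 / d2;
}

// Value of the fitted quadratic at the refined extremum; substituting
// x_hat into the expansion simplifies to f(0) + f'(0) * x_hat / 2.
double refined_value(double fm1, double f0, double fp1) {
    double d1 = (fp1 - fm1) / 2.0;
    return f0 + 0.5 * d1 * refined_offset(fm1, f0, fp1);
}
```

For the example above, refined_offset(1, 6, 5) gives $\hat{x} = 1/3$ and refined_value(1, 6, 5) gives $6\frac{1}{3}$.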

Now back to SIFT keypoint detection, where we face a three-dimensional problem. Suppose we detect a local extremum on the DoG image $D(x, y)$ at scale $\sigma$, at position $(x, y, \sigma)$. From the analysis above, this is only an extremum of the discrete samples; the true extremum may lie in the vicinity of $(x, y, \sigma)$, offset from it by $(\delta x, \delta y, \delta\sigma)$. Then $D(\delta x, \delta y, \delta\sigma)$ can be written as a Taylor expansion at the point $(x, y, \sigma)$:

$$D(\delta x, \delta y, \delta\sigma) = D(x, y, \sigma) + \begin{bmatrix}
\frac{\partial D}{\partial x} & \frac{\partial D}{\partial y} & \frac{\partial D}{\partial \sigma}
\end{bmatrix}\begin{bmatrix}
\delta x\\
\delta y\\
\delta \sigma
\end{bmatrix} + \frac{1}{2}\begin{bmatrix}
\delta x & \delta y & \delta \sigma
\end{bmatrix}\begin{bmatrix}
\frac{\partial^2 D}{\partial x^2} & \frac{\partial^2 D}{\partial x\partial y} & \frac{\partial^2 D}{\partial x\partial \sigma}\\
\frac{\partial^2 D}{\partial y\partial x} & \frac{\partial^2 D}{\partial y^2} & \frac{\partial^2 D}{\partial y\partial \sigma}\\
\frac{\partial^2 D}{\partial \sigma\partial x} & \frac{\partial^2 D}{\partial \sigma\partial y} & \frac{\partial^2 D}{\partial \sigma^2}
\end{bmatrix}\begin{bmatrix}
\delta x\\
\delta y\\
\delta \sigma
\end{bmatrix}$$

The above can be written in vector form as follows:

$$D(\delta\mathbf{x}) = D + \frac{\partial D^T}{\partial \mathbf{x}}\delta\mathbf{x} + \frac{1}{2}\delta\mathbf{x}^T\frac{\partial^2 D}{\partial \mathbf{x}^2}\delta\mathbf{x}$$

Setting the first derivative to zero gives $\delta\mathbf{x} = -\left(\frac{\partial^2 D}{\partial \mathbf{x}^2}\right)^{-1}\frac{\partial D}{\partial \mathbf{x}}$

Through repeated iteration (at most 5 iterations in Lowe's algorithm), the precise position and scale $\hat{\mathbf{x}}$ of each candidate point is obtained; substituting $\hat{\mathbf{x}}$ into the formula gives $D(\hat{\mathbf{x}})$, whose absolute value $|D(\hat{\mathbf{x}})|$ is then thresholded. If the absolute value falls below the threshold, the point is deleted.

Vec3f dD((img.at<sift_wt>(r, c+1) - img.at<sift_wt>(r, c-1)) * deriv_scale,
         (img.at<sift_wt>(r+1, c) - img.at<sift_wt>(r-1, c)) * deriv_scale,
         (next.at<sift_wt>(r, c) - prev.at<sift_wt>(r, c)) * deriv_scale); // dD is the first-order derivative vector dD/dx

float v2  = (float)img.at<sift_wt>(r, c) * 2;
float dxx = (img.at<sift_wt>(r, c+1) + img.at<sift_wt>(r, c-1) - v2) * second_deriv_scale;
float dyy = (img.at<sift_wt>(r+1, c) + img.at<sift_wt>(r-1, c) - v2) * second_deriv_scale;
float dss = (next.at<sift_wt>(r, c) + prev.at<sift_wt>(r, c) - v2) * second_deriv_scale;
float dxy = (img.at<sift_wt>(r+1, c+1) - img.at<sift_wt>(r+1, c-1) -
             img.at<sift_wt>(r-1, c+1) + img.at<sift_wt>(r-1, c-1)) * cross_deriv_scale;
float dxs = (next.at<sift_wt>(r, c+1) - next.at<sift_wt>(r, c-1) -
             prev.at<sift_wt>(r, c+1) + prev.at<sift_wt>(r, c-1)) * cross_deriv_scale;
float dys = (next.at<sift_wt>(r+1, c) - next.at<sift_wt>(r-1, c) -
             prev.at<sift_wt>(r+1, c) + prev.at<sift_wt>(r-1, c)) * cross_deriv_scale;

Matx33f H(dxx, dxy, dxs,
          dxy, dyy, dys,
          dxs, dys, dss);

// dD + H x = 0  -->  x = H^-1 * (-dD)
Vec3f X = H.solve(dD, DECOMP_LU);

3.3 Removing edge responses

To obtain stable feature points, it is not enough to simply delete points with a low DoG response value. The DoG has a fairly strong response along edges in the image, and once a candidate point falls on an edge it becomes unstable: on the one hand, a point on an image edge is difficult to localize and has localization ambiguity; on the other hand, such a point is easily disturbed by noise.

A poorly defined (flat) peak in the DoG response tends to have a large principal curvature across the edge and a small principal curvature along the edge. The principal curvatures can be obtained from the $2\times2$ Hessian matrix $H$:

$$H(x, y) = \begin{bmatrix}
D_{xx}(x, y) & D_{xy}(x, y)\\
D_{xy}(x, y) & D_{yy}(x, y)
\end{bmatrix}$$

The derivative values $D$ can be obtained by differencing neighboring pixels. The eigenvalues of $H$ are proportional to the principal curvatures of $D$. We can avoid computing the eigenvalues explicitly, since we only care about their ratio. Let $\alpha = \lambda_{max}$ be the larger eigenvalue and $\beta = \lambda_{min}$ the smaller one; then their sum is given by the trace of $H$ and their product by its determinant:

$$Tr(H) = D_{xx} + D_{yy} = \alpha + \beta$$

$$Det(H) = D_{xx}D_{yy} - (D_{xy})^2 = \alpha\beta$$

If $\gamma$ is the ratio of the largest eigenvalue to the smallest, then $\alpha = \gamma\beta$, and so

$$\frac{Tr(H)^2}{Det(H)} = \frac{(\alpha+\beta)^2}{\alpha\beta} = \frac{(\gamma+1)^2}{\gamma}$$

The result of the above formula depends only on the ratio of the two eigenvalues, not on their specific values. When the two eigenvalues are equal, $\frac{(\gamma+1)^2}{\gamma}$ is minimized, and its value increases as $\gamma$ increases. So to check whether the ratio of principal curvatures is below some threshold $\gamma$, we only need to check whether the following holds:

$$\frac{Tr(H)^2}{Det(H)} < \frac{(\gamma+1)^2}{\gamma}$$

In his paper, Lowe uses $\gamma = 10$; that is, feature points whose ratio of principal curvatures is greater than 10 are deleted.

float t = dD.dot(Matx31f(xc, xr, xi)); // D(x_hat) = D + 1/2 * dD * x_hat
contr = img.at<sift_wt>(r, c) * img_scale + t * 0.5f; // interpolated value at the extremum
if (std::abs(contr) * nOctaveLayers < contrastThreshold)
    return false;

// principal curvatures are computed using the trace and det of the Hessian
float v2  = img.at<sift_wt>(r, c) * 2.f;
float dxx = (img.at<sift_wt>(r, c+1) + img.at<sift_wt>(r, c-1) - v2) * second_deriv_scale;
float dyy = (img.at<sift_wt>(r+1, c) + img.at<sift_wt>(r-1, c) - v2) * second_deriv_scale;
float dxy = (img.at<sift_wt>(r+1, c+1) - img.at<sift_wt>(r+1, c-1) -
             img.at<sift_wt>(r-1, c+1) + img.at<sift_wt>(r-1, c-1)) * cross_deriv_scale;
float tr  = dxx + dyy;
float det = dxx * dyy - dxy * dxy;

if (det <= 0 || tr * tr * edgeThreshold >= (edgeThreshold + 1) * (edgeThreshold + 1) * det)
    return false;
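The edge test in the snippet above can be isolated into a self-contained sketch (the function name is mine; OpenCV's edgeThreshold plays the role of $\gamma$, and the cross-multiplied form avoids dividing by the determinant):

```cpp
#include <cassert>

// Accept a candidate point only if the ratio of principal curvatures,
// measured through the trace and determinant of the 2x2 Hessian,
// satisfies tr(H)^2 / det(H) < (gamma + 1)^2 / gamma.
bool passes_edge_test(double dxx, double dyy, double dxy, double gamma) {
    double tr  = dxx + dyy;
    double det = dxx * dyy - dxy * dxy;
    if (det <= 0.0)
        return false;  // curvatures of opposite sign: reject outright
    return tr * tr * gamma < (gamma + 1.0) * (gamma + 1.0) * det;
}
```

An isotropic blob with $D_{xx} = D_{yy} = 1$, $D_{xy} = 0$ gives $Tr^2/Det = 4 < 12.1$ and passes; an edge-like point with $D_{xx} = 20$, $D_{yy} = 1$, $D_{xy} = 0$ gives $441/20 \approx 22$ and is rejected.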

