SIFT Algorithm Explained in Detail


http://blog.csdn.net/zddblog/article/details/7521424

Contents

      1. SIFT Overview
      2. Gaussian Blur
         2.1 The two-dimensional Gaussian function
         2.2 Two-dimensional Gaussian blur of images
         2.3 Separable Gaussian blur
      3. Scale-Space Extremum Detection
         3.1 Scale space theory
         3.2 Representation of scale space
         3.3 Construction of the Gaussian pyramid
         3.4 The difference-of-Gaussian (DoG) pyramid
         3.5 Scale-space extremum detection (preliminary detection of key points)
         3.6 Parameters to be determined when building the scale space
      4. Key Point Positioning
         4.1 Precise positioning of key points
         4.2 Eliminating edge responses
         4.3 Computing derivatives by the finite difference method
         4.4 Inverse formula for a third-order matrix
      5. Key Point Orientation Assignment
      6. Key Point Feature Description
      7. Shortcomings of SIFT
      8. Summary
      Resources
      Appendix 1: Gaussian Blur Source Code
      Appendix 2: SIFT Algorithm Source Code
A Detailed Analysis of the Scale-Invariant Feature Transform (SIFT) Matching Algorithm
Scale-Invariant Feature Transform (SIFT)
Just for Fun

zdd (zddmail@gmail.com or zddhub@gmail.com)

For beginners, there are many gaps between David G. Lowe's paper and an actual implementation; this article is meant to help you across them.

If you are studying SIFT in order to build image search, OpenSSE may be more suitable for you; you are welcome to use it.

If you want to work on this together, you are welcome to join!

1. SIFT Overview

The scale-invariant feature transform (SIFT) is a computer vision algorithm used to detect and describe local features in images. It looks for extreme points in scale space and extracts their position, scale, and rotation invariants. The algorithm was published by David Lowe in 1999 and completed in 2004.

Its applications include object recognition, robotic mapping and navigation, image stitching, 3D model building, gesture recognition, image tracking, and match moving.

The algorithm is patented; the patent holder is the University of British Columbia.

The description and detection of local image features can help in object recognition. SIFT features are based on points of interest in the local appearance of an object and are independent of the image's size and rotation. Their tolerance to illumination changes, noise, and slight changes of viewpoint is also quite high. Based on these properties, they are highly distinctive, relatively easy to extract, and allow objects to be recognized correctly with a low probability of mismatch even in a feature database containing many other objects. The detection rate for partially occluded objects using the SIFT feature description is also quite high; as few as 3 SIFT features are enough to compute an object's position and orientation. With current computer hardware and a small feature database, the recognition speed can approach real time. SIFT features carry a large amount of information and are suitable for fast and accurate matching against massive databases.

The features of the SIFT algorithm are:

1. SIFT features are local features of an image; they are invariant to rotation, scale change, and brightness change, and remain stable to a certain degree under viewpoint change, affine transformation, and noise;

2. Distinctiveness: the features carry rich information and are suitable for fast, accurate matching in massive feature databases;

3. Quantity: even a few objects can produce a large number of SIFT feature vectors;

4. Speed: an optimized SIFT matching algorithm can even meet real-time requirements;

5. Extensibility: SIFT features can conveniently be combined with other kinds of feature vectors.

Problems the SIFT algorithm can help solve:

The performance of image registration and of target recognition and tracking is affected by factors such as the state of the target, the environment of the scene, and the imaging characteristics of the equipment. The SIFT algorithm can, to some extent, remain robust to:

1. Rotation, scaling, and translation (RST) of the target

2. Affine/projective transformation of the image (viewpoint change)

3. Illumination changes

4. Occlusion of the target

5. Cluttered scenes (clutter)

6. Noise

The essence of the SIFT algorithm is to find key points (feature points) in different scale spaces and to compute an orientation for each key point. The key points SIFT finds are very prominent points that do not change due to illumination, affine transformation, or noise, such as corner points, edge points, bright spots in dark areas, and dark spots in bright areas.

Lowe decomposes the SIFT algorithm into the following four steps:

1. Scale-space extremum detection: search over image locations at all scales. A difference-of-Gaussian function is used to identify potential points of interest that are invariant to scale and rotation.

2. Key point localization: at each candidate location, a finely fitted model determines the position and scale. Key points are chosen according to their degree of stability.

3. Orientation assignment: one or more orientations are assigned to each key point location based on the local gradient directions of the image. All subsequent operations on the image data are performed relative to the orientation, scale, and position of the key points, thereby providing invariance to these transformations.

4. Key point description: the local gradients of the image are measured at the selected scale in a neighborhood around each key point. These gradients are transformed into a representation that tolerates fairly large local shape deformations and illumination changes.

This article follows Lowe's steps and, with reference to the source code of Rob Hess and Andrea Vedaldi, describes the implementation of the SIFT algorithm in detail.

2. Gaussian Blur

The SIFT algorithm finds key points in different scale spaces, and the scale space is obtained through Gaussian blur. Lindeberg and others have shown that the Gaussian convolution kernel is the only transformation kernel that realizes scale transformation, and the only linear kernel. This section therefore first describes the Gaussian blur algorithm.

2.1 The two-dimensional Gaussian function

Gaussian blur is an image filter that uses a normal distribution (the Gaussian function) to compute a blur template and convolves the template with the original image in order to blur the image.

The N-dimensional normal distribution equation is:

(1-1)    G(r) = 1 / ((sqrt(2π) σ)^N) · exp(−r² / (2σ²))

where σ is the standard deviation of the normal distribution; the larger σ is, the more blurred (smoother) the image becomes. r is the blur radius, i.e., the distance from a template element to the template center. If the two-dimensional template has size m*n, the Gaussian value of element (x, y) on the template is:

(1-2)    G(x, y) = 1 / (2πσ²) · exp(−((x − m/2)² + (y − n/2)²) / (2σ²))

In two-dimensional space, the contours of the surface generated by this formula are concentric circles distributed normally outward from the center, as shown in Figure 2.1. The original image is transformed by a convolution matrix with non-zero elements: the value of each pixel becomes a weighted average of the values of the surrounding pixels. The original (center) pixel has the largest Gaussian value and therefore the largest weight; neighboring pixels receive smaller and smaller weights the farther they are from the center pixel. Blurring in this way preserves edges better than other, more uniform blur filters.

In theory the Gaussian distribution is non-zero at every point of the image, which would mean that every pixel's computation has to include the entire image. In practice, when computing a discrete approximation of the Gaussian function, pixels beyond a distance of about 3σ can be regarded as having no effect, and their computation can be ignored. Usually an image-processing program only needs to compute a matrix of size (6σ+1)×(6σ+1) to cover the relevant pixel influence.
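To make the above concrete, here is a minimal C++ sketch (not the appendix code from this article) that builds a normalized Gaussian template of side 2·ceil(3σ)+1 from formula (1-2); the function name and layout are illustrative only.

    #include <cmath>
    #include <vector>

    // Build a normalized 2D Gaussian template covering +/- 3*sigma around the center.
    std::vector<std::vector<double>> gaussianKernel(double sigma)
    {
        int radius = static_cast<int>(std::ceil(3.0 * sigma));   // 3-sigma cutoff
        int size = 2 * radius + 1;
        std::vector<std::vector<double>> k(size, std::vector<double>(size));
        double sum = 0.0;
        for (int y = 0; y < size; ++y)
            for (int x = 0; x < size; ++x) {
                double dx = x - radius, dy = y - radius;          // distance to the center element
                k[y][x] = std::exp(-(dx * dx + dy * dy) / (2.0 * sigma * sigma));
                sum += k[y][x];
            }
        for (auto& row : k)                                       // normalize so the weights sum to 1
            for (double& v : row) v /= sum;
        return k;
    }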

2.2 Two-dimensional Gaussian blur of images

According to the value of σ, the size of the Gaussian template matrix is computed ((6σ+1)×(6σ+1)); formula (1-2) is used to compute the values of the Gaussian template matrix, which is then convolved with the original image to obtain a smoothed (Gaussian blurred) image. To ensure that the elements of the template matrix lie in [0, 1], the template matrix must be normalized. The 5*5 Gaussian template is shown in Table 2.1.


The convolution computation with the 5*5 Gaussian template is shown in the figure. The Gaussian template is centrally symmetric.

2.3 Separable Gaussian blur

As shown in Figure 2.3, a two-dimensional Gaussian template achieves the goal of blurring the image, but because the template matrix runs past the image border, edge pixels are lost (Figure 2.3 b, c); the larger the template, the more pixels are lost, and dropping the out-of-bounds part of the template produces black edges (Figure 2.3 d). More importantly, when the Gaussian template (Gaussian kernel) becomes large, the cost of the convolution increases significantly. The two-dimensional Gaussian blur can be improved by using the separability of the Gaussian function.

The separability of the Gaussian function means that the effect obtained with the two-dimensional Gaussian matrix can also be obtained by applying a one-dimensional Gaussian matrix in the horizontal direction followed by a one-dimensional Gaussian matrix in the vertical direction. From a computational point of view this is a useful property, because it requires only m·M·N + n·M·N multiplications, whereas the non-separable two-dimensional matrix requires m·n·M·N, where m, n are the dimensions of the Gaussian matrix and M, N are the dimensions of the two-dimensional image.

In addition, the two one-dimensional Gaussian convolutions eliminate the edge artifacts produced by the two-dimensional Gaussian matrix. (On the elimination of the edges, see Figure 2.4: no convolution is computed for the part of the template matrix that is out of bounds, the dashed box. In Figure 2.4, the first 1*5 template in the x direction degenerates into a 1*3 template and is convolved only with the part that lies inside the image.)
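The following is a minimal sketch of the separable blur described above, assuming a plain grayscale buffer rather than an OpenCV image; borders are handled by replication instead of being dropped, and names are illustrative.

    #include <algorithm>
    #include <cmath>
    #include <vector>

    using Image = std::vector<std::vector<double>>;   // assumed grayscale image, Image[y][x]

    // One-dimensional normalized Gaussian weights covering +/- 3*sigma.
    static std::vector<double> gauss1d(double sigma)
    {
        int radius = static_cast<int>(std::ceil(3.0 * sigma));
        std::vector<double> w(2 * radius + 1);
        double sum = 0.0;
        for (int i = -radius; i <= radius; ++i) {
            w[i + radius] = std::exp(-(i * i) / (2.0 * sigma * sigma));
            sum += w[i + radius];
        }
        for (double& v : w) v /= sum;
        return w;
    }

    // Separable Gaussian blur: horizontal pass, then vertical pass.
    Image gaussianBlurSeparable(const Image& src, double sigma)
    {
        const int h = static_cast<int>(src.size()), w = static_cast<int>(src[0].size());
        const std::vector<double> g = gauss1d(sigma);
        const int r = static_cast<int>(g.size()) / 2;

        Image tmp(h, std::vector<double>(w, 0.0)), dst = tmp;
        for (int y = 0; y < h; ++y)                        // horizontal pass
            for (int x = 0; x < w; ++x)
                for (int i = -r; i <= r; ++i) {
                    int xx = std::clamp(x + i, 0, w - 1);  // replicate the border pixels
                    tmp[y][x] += g[i + r] * src[y][xx];
                }
        for (int y = 0; y < h; ++y)                        // vertical pass
            for (int x = 0; x < w; ++x)
                for (int i = -r; i <= r; ++i) {
                    int yy = std::clamp(y + i, 0, h - 1);
                    dst[y][x] += g[i + r] * tmp[yy][x];
                }
        return dst;
    }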

Appendix 1 contains a two-dimensional Gaussian blur and a separable Gaussian blur implemented with OpenCV 2.2. Table 2.2 compares these two methods with the Gaussian blur routine of the OpenCV 2.3 open-source library.


3. Scale Space Extremum detection

The scale space is represented by a Gaussian pyramid. Tony Lindeberg pointed out that the scale-normalized LoG (Laplacian of Gaussian) operator has true scale invariance, and Lowe uses the difference-of-Gaussian (DoG) pyramid to approximate the LoG operator in order to detect stable key points in scale space.

3.1 Scale space theory

The idea of scale space was first proposed by Iijima in 1962; after being promoted by Witkin, Koenderink, and others, it gradually gained attention and is now widely used in the computer vision field.

The basic idea of scale space theory is to introduce a parameter regarded as the scale into the image processing model, obtain scale-space representations over multiple scales by continuously varying the scale parameter, extract the main contours over scale space, and use them as a feature vector to realize edge and corner detection and feature extraction at different resolutions.

The scale-space method incorporates the traditional single-scale image processing technique into a framework that varies the scale dynamically, which makes it easier to capture the essential features of an image. In scale space, the degree of blur of the images increases with the scale, which simulates the formation of a target on the retina as the target moves from near to far.

The scale space satisfies visual invariance. The visual interpretation of this invariance is as follows. When we look at an object with our eyes, the brightness level and contrast of the image perceived by the retina differ when the illumination of the object's background changes, so the analysis of an image by scale-space operators should not be affected by the gray level and contrast of the image; that is, it should satisfy gray-level invariance and contrast invariance. On the other hand, relative to a fixed coordinate system, when the relative position of the observer and the object changes, the position, size, angle, and shape of the image perceived by the retina differ, so scale-space operators should also be independent of the image's position, size, angle, and affine transformation; that is, they should satisfy translation invariance, scale invariance, Euclidean invariance, and affine invariance.

3.2 Representation of scale space

The scale space L(x, y, σ) of an image is defined as the convolution of a variable-scale Gaussian function G(x, y, σ) with the original image I(x, y):

(3-1)    L(x, y, σ) = G(x, y, σ) * I(x, y)

where * denotes the convolution operation, and

(3-2)    G(x, y, σ) = 1 / (2πσ²) · exp(−((x − m/2)² + (y − n/2)²) / (2σ²))

As in formula (1-2), m, n represent the dimensions of the Gaussian template (determined by σ), and (x, y) represents a pixel position in the image. σ is the scale-space factor: the smaller its value, the less the image is smoothed and the smaller the corresponding scale. A large scale corresponds to the coarse, overview features of an image, and a small scale corresponds to its fine details.

3.3 Construction of the Gaussian pyramid

In the implementation, the scale space is represented by a Gaussian pyramid, and the construction of the Gaussian pyramid is divided into two parts:

1. Apply Gaussian blur of different scales to the image;

2. Downsample the image (subsampling).

An image pyramid model is a series of images of different sizes, obtained by repeatedly downsampling the original image and arranged, from large to small and from bottom to top, in a tower shape. The original image is the first level of the pyramid, and each new image obtained by downsampling is one level of the pyramid (one image per level); each pyramid has n levels in total. The number of levels is determined by the original size of the image and the size of the tower-top image, and is computed as follows:

(3-3)    n = log2(min(M, N)) − t,    t ∈ [0, log2(min(M, N)))

where M, N are the dimensions of the original image and t is the base-2 logarithm of the smallest dimension of the tower-top image. For example, for an image of size 512*512, the sizes of the images in the pyramid are shown in Table 3.1: when the tower-top image is 4*4, n = 7; when the tower-top image is 2*2, n = 8.

In order to make the scale continuous, Gaussian filtering is added on top of the simple downsampling. As shown in Figure 3.1, each image of the image pyramid is blurred with Gaussian kernels of different parameters, so that each level of the pyramid contains several Gaussian-blurred images. The several images at one level of the pyramid are together called an octave (group); each level of the pyramid has exactly one octave, so the number of octaves equals the number of pyramid levels and is computed with formula (3-3). Each octave contains several (also called interval) images. In addition, when downsampling, the initial (bottom) image of an octave of the Gaussian pyramid is obtained by downsampling the third-from-last image of the previous octave.

Note: Because multiple images are stacked within an octave, the multiple images within an octave are also called layers. To avoid confusion with the concept of pyramid levels, in this article "level" refers to a pyramid level unless otherwise stated, and "layer" generally refers to the layers of images within an octave.

Note: As explained in section 3.4, in order to detect extreme points at S scales in each octave, each octave of the DoG pyramid needs S+2 layers of images, and since the DoG pyramid is obtained by subtracting adjacent layers of the Gaussian pyramid, each octave of the Gaussian pyramid needs S+3 layers of images; in practice S is between 3 and 5. Taking S = 3, suppose the Gaussian pyramid storage indices are as follows:

Octave 0 (i.e., the 1st octave): 0 1 2 3 4 5

Octave 1: 6 7 8 9 10 11

Octave 2: ?

Then the first image of octave 2 is obtained by downsampling the image with index 9 in octave 1, and the other octaves follow similarly.
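A small sketch of this bookkeeping, under the layout assumed above (S = 3, hence S + 3 = 6 Gaussian images per octave): the octave count follows formula (3-3), and each new octave is seeded by downsampling the third-from-last image of the previous octave. Names and the every-other-pixel downsampling are illustrative.

    #include <algorithm>
    #include <cmath>
    #include <vector>

    using Image = std::vector<std::vector<double>>;   // grayscale image, Image[y][x]

    // Number of octaves from formula (3-3): n = log2(min(M, N)) - t,
    // where t is log2 of the smallest dimension of the tower-top image.
    int numOctaves(int rows, int cols, int tMin = 2)
    {
        return static_cast<int>(std::log2(std::min(rows, cols))) - tMin;
    }

    // Local index (within the previous octave) of the image used to seed the next octave:
    // with S + 3 Gaussian images per octave it is the third from the last, i.e. index S.
    // E.g. S = 3 -> local index 3, which is global index 9 in the example above.
    int seedIndex(int S) { return S; }

    // Downsample by keeping every other row and column.
    Image downsample(const Image& src)
    {
        Image dst(src.size() / 2, std::vector<double>(src[0].size() / 2));
        for (std::size_t y = 0; y < dst.size(); ++y)
            for (std::size_t x = 0; x < dst[0].size(); ++x)
                dst[y][x] = src[2 * y][2 * x];
        return dst;
    }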


3.4 The difference-of-Gaussian (DoG) pyramid

In 2002, Mikolajczyk, in a detailed experimental comparison, found that the maxima and minima of the scale-normalized Laplacian of Gaussian produce the most stable image features compared with other feature extraction functions such as the gradient, the Hessian, or Harris corner features.

As early as 1994, Lindeberg found that the difference-of-Gaussian function (Difference of Gaussian, abbreviated as the DoG operator) is very similar to the scale-normalized Laplacian of Gaussian. Their relationship can be derived from the following formula:

∂G/∂σ = σ∇²G

Using a finite-difference approximation of the derivative with respect to σ (with a nearby scale kσ):

∂G/∂σ ≈ (G(x, y, kσ) − G(x, y, σ)) / (kσ − σ)

and therefore

G(x, y, kσ) − G(x, y, σ) ≈ (k − 1) σ²∇²G

where k − 1 is a constant and does not affect the location of the extremum points.

As shown in Figure 3.2, the red curve represents the difference-of-Gaussian operator and the blue curve the Laplacian of Gaussian. Lowe uses the more efficient difference-of-Gaussian operator in place of the Laplacian for extremum detection, defined as follows:

(3-4)    D(x, y, σ) = (G(x, y, kσ) − G(x, y, σ)) * I(x, y) = L(x, y, kσ) − L(x, y, σ)

In the actual computation, the difference-of-Gaussian images are obtained by subtracting adjacent layers within each octave of the Gaussian pyramid, and the extrema are then detected on them, as shown in Figure 3.3.
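A minimal sketch of this step: given the S + 3 Gaussian images of one octave, the S + 2 DoG images of formula (3-4) are pixel-wise differences of adjacent layers. The image type and names are illustrative.

    #include <vector>

    using Image = std::vector<std::vector<double>>;   // grayscale image, Image[y][x]

    // Build the DoG layers of one octave: dog[i] = gauss[i + 1] - gauss[i].
    std::vector<Image> buildDoG(const std::vector<Image>& gauss)
    {
        std::vector<Image> dog;
        for (std::size_t i = 0; i + 1 < gauss.size(); ++i) {
            Image d = gauss[i];                        // copy only to get the right shape
            for (std::size_t y = 0; y < d.size(); ++y)
                for (std::size_t x = 0; x < d[0].size(); ++x)
                    d[y][x] = gauss[i + 1][y][x] - gauss[i][y][x];
            dog.push_back(std::move(d));
        }
        return dog;
    }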

3.5 Scale-space extremum detection (preliminary detection of key points)

The key points consist of the local extreme points of the DoG space, and the preliminary detection of key points is accomplished by comparing adjacent layers of the DoG images within the same octave. To find the extreme points of the DoG function, every pixel is compared with all of its neighbors to see whether it is larger or smaller than its neighbors in both the image domain and the scale domain. As shown in Figure 3.4, the central detection point is compared with its 8 neighbors at the same scale and with the 9x2 corresponding points in the adjacent scales above and below, 26 points in total, which ensures that extreme points are detected in both scale space and two-dimensional image space.
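A sketch of the 26-neighbor comparison just described; dog is the DoG stack of one octave and (layer, y, x) is the candidate point, with the caller assumed to skip borders and the outermost layers.

    #include <vector>

    using Image = std::vector<std::vector<double>>;

    // True if dog[layer][y][x] is larger than all 26 neighbors, or smaller than all of them.
    bool isExtremum(const std::vector<Image>& dog, int layer, int y, int x)
    {
        double v = dog[layer][y][x];
        bool isMax = true, isMin = true;
        for (int dl = -1; dl <= 1; ++dl)
            for (int dy = -1; dy <= 1; ++dy)
                for (int dx = -1; dx <= 1; ++dx) {
                    if (dl == 0 && dy == 0 && dx == 0) continue;   // skip the point itself
                    double n = dog[layer + dl][y + dy][x + dx];
                    if (v <= n) isMax = false;
                    if (v >= n) isMin = false;
                }
        return isMax || isMin;
    }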

Because the comparison is carried out between adjacent scales, as shown on the right of Figure 3.3, in an octave with 4 DoG layers extremum detection can only be performed at the two middle layers, i.e. at two scales; the other scales can only be handled in other octaves. In order to detect extreme points at S scales in each octave, each octave of the DoG pyramid needs S+2 layers of images, and since the DoG pyramid is obtained by subtracting adjacent layers of the Gaussian pyramid, each octave of the Gaussian pyramid needs S+3 layers of images; in practice S is between 3 and 5.

Of course, the extreme points produced this way are not all stable feature points, because some extreme points have a weak response, and the DoG operator produces a strong edge response.

3.6 Parameters to be determined when building the scale space

σ: the scale-space coordinate

O: the number of octaves (groups)

S: the number of layers within an octave

In the scale space described above, σ, O, and S are related as follows:

(3-5)    σ(o, s) = σ0 · 2^(o + s/S),    o ∈ [0, ..., O−1],  s ∈ [0, ..., S+2]

where σ0 is the base scale, o is the octave index, and s is the layer index within the octave. The scale coordinate of a key point is computed with formula (3-5) from the octave and the layer in which the key point lies.

At the very beginning of the Gaussian pyramid, the input image must be pre-blurred as the 0th image of octave 0, which amounts to discarding the highest spatial sampling rate. Therefore, the usual practice is to first double the size of the image to generate octave -1. We assume that the initial input image has already been Gaussian blurred with σ = 0.5 in order to prevent aliasing; if the input image is enlarged by a factor of two with bilinear interpolation, this is equivalent to σ = 1.

The k in formula (3-4) is 2 raised to the reciprocal of the number of layers within an octave, i.e.

(3-6)    k = 2^(1/S)

When constructing a Gaussian pyramid, the scale coordinates of each layer within the group are calculated as follows:

(3-7)

where σ0 is the initial scale (Lowe takes σ0 = 1.6 in his paper), and s is the layer index within the octave. The scale coordinates of the same layer are the same in different octaves. The next layer of images within an octave is obtained by Gaussian blurring the previous layer. Formula (3-7) is used when generating the Gaussian images of different scales within an octave one after another; when computing the scale of a particular layer within an octave, the following formula is used directly:

(3-8)

The within-octave scale determines the size of the sampling window used in orientation assignment and feature description.

From the top, the formula (3-4) can be recorded as

(3-9)

Figure 3.5 shows the construction of the DoG pyramid; the original image is a 128*128 image of Jobs, enlarged by a factor of two before the pyramid is built.
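Before moving on to key point positioning, here is a small sketch of the scale bookkeeping of formulas (3-5) and (3-6); σ0 = 1.6 and S = 3 are the commonly used values mentioned above, and the function names are illustrative.

    #include <cmath>

    // k = 2^(1/S), formula (3-6).
    double scaleStep(int S) { return std::pow(2.0, 1.0 / S); }

    // Absolute scale of layer s in octave o, formula (3-5): sigma(o, s) = sigma0 * 2^(o + s/S).
    double absoluteScale(double sigma0, int o, int s, int S)
    {
        return sigma0 * std::pow(2.0, o + static_cast<double>(s) / S);
    }

    // Example: with sigma0 = 1.6 and S = 3, absoluteScale(1.6, 0, 3, 3) == 3.2,
    // i.e. the scale doubles after S layers, which is what lets adjacent octaves join up smoothly.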



4. Key Point Positioning

The extreme points detected by the method above are extreme points of the discrete space. Below, a three-dimensional quadratic function is fitted to determine the position and scale of the key points precisely, and at the same time key points with low contrast and unstable edge response points (the DoG operator produces a strong edge response) are removed, in order to enhance matching stability and improve noise resistance.

4.1 Precise positioning of key points

An extreme point of the discrete space is not a true extreme point; Figure 4.1 shows the difference between the extreme points of a two-dimensional function in discrete space and in continuous space. Interpolating known discrete-space extreme points to obtain the continuous-space extremum is called sub-pixel interpolation.

In order to improve the stability of the key points, curve fitting of the scale-space DoG function is required. The Taylor expansion (fitting function) of the DoG function at a sample point, with X = (x, y, σ)ᵀ the offset from that point, is:

(4-1)    D(X) = D + (∂D/∂X)ᵀ X + (1/2) Xᵀ (∂²D/∂X²) X

Taking the derivative and setting it equal to zero gives the offset of the extremum point:

(4-2)    X̂ = −(∂²D/∂X²)^(−1) (∂D/∂X)

The value of the function at this extremum point is:

(4-3)    D(X̂) = D + (1/2) (∂D/∂X)ᵀ X̂

Here X̂ represents the offset relative to the interpolation center. When the offset is greater than 0.5 in any dimension (x, y, or σ), it means that the interpolation center has already shifted to a neighboring point, so the position of the current key point must be changed, and the interpolation is repeated at the new position until it converges. It may also exceed the preset number of iterations or go outside the image border, in which case the point should be deleted; Lowe uses 5 iterations. In addition, points where |D(X̂)| is too small are susceptible to noise and become unstable, so extremum points whose |D(X̂)| is less than an empirical threshold (Lowe's paper uses 0.03; Rob Hess's implementation uses 0.04/S) are removed. In this process, the exact position of the feature point (the original position plus the fitted offset) and its scale are also obtained.
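A sketch of a single refinement step under the finite-difference derivatives of section 4.3 (h = 1): the gradient and Hessian of D are estimated at the sample point and the 3x3 system of formula (4-2) is solved for the offset with Cramer's rule; moving the point when an offset exceeds 0.5 and iterating is left to the caller. Names are illustrative.

    #include <array>
    #include <cmath>
    #include <vector>

    using Image = std::vector<std::vector<double>>;

    // One interpolation step at (layer, y, x) in the DoG stack of an octave.
    // Returns false if the Hessian is (nearly) singular; otherwise fills
    // offset = (dx, dy, dsigma) from formula (4-2) and value = D(X^) from formula (4-3).
    bool refineStep(const std::vector<Image>& dog, int layer, int y, int x,
                    std::array<double, 3>& offset, double& value)
    {
        auto D = [&](int l, int yy, int xx) { return dog[l][yy][xx]; };

        // First derivatives, central differences (4-11) with h = 1.
        double dx = (D(layer, y, x + 1) - D(layer, y, x - 1)) / 2.0;
        double dy = (D(layer, y + 1, x) - D(layer, y - 1, x)) / 2.0;
        double ds = (D(layer + 1, y, x) - D(layer - 1, y, x)) / 2.0;

        // Second derivatives (4-12) and mixed derivatives (4-13).
        double v   = D(layer, y, x);
        double dxx = D(layer, y, x + 1) + D(layer, y, x - 1) - 2 * v;
        double dyy = D(layer, y + 1, x) + D(layer, y - 1, x) - 2 * v;
        double dss = D(layer + 1, y, x) + D(layer - 1, y, x) - 2 * v;
        double dxy = (D(layer, y + 1, x + 1) - D(layer, y + 1, x - 1)
                    - D(layer, y - 1, x + 1) + D(layer, y - 1, x - 1)) / 4.0;
        double dxs = (D(layer + 1, y, x + 1) - D(layer + 1, y, x - 1)
                    - D(layer - 1, y, x + 1) + D(layer - 1, y, x - 1)) / 4.0;
        double dys = (D(layer + 1, y + 1, x) - D(layer + 1, y - 1, x)
                    - D(layer - 1, y + 1, x) + D(layer - 1, y - 1, x)) / 4.0;

        // Solve H * offset = -grad (formula (4-2)) by Cramer's rule.
        auto det3 = [](double a, double b, double c, double d, double e,
                       double f, double g, double h, double i) {
            return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g);
        };
        double det = det3(dxx, dxy, dxs, dxy, dyy, dys, dxs, dys, dss);
        if (std::fabs(det) < 1e-12) return false;

        offset[0] = det3(-dx, dxy, dxs, -dy, dyy, dys, -ds, dys, dss) / det;
        offset[1] = det3(dxx, -dx, dxs, dxy, -dy, dys, dxs, -ds, dss) / det;
        offset[2] = det3(dxx, dxy, -dx, dxy, dyy, -dy, dxs, dys, -ds) / det;

        // Contrast at the interpolated extremum, formula (4-3).
        value = v + 0.5 * (dx * offset[0] + dy * offset[1] + ds * offset[2]);
        return true;
    }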

4.2 Eliminating Edge response

A poorly defined extremum of the difference-of-Gaussian operator has a large principal curvature in the direction across the edge but a small principal curvature in the perpendicular direction along the edge.

The DoG operator produces a strong edge response, so unstable edge response points need to be eliminated. The Hessian matrix at the feature point is computed; the principal curvatures are obtained from a 2x2 Hessian matrix H:

(4-4)    H = [ Dxx  Dxy ; Dxy  Dyy ]

The eigenvalues α and β of H are proportional to the principal curvatures of D in the x and y directions. Let

(4-5)    Tr(H) = Dxx + Dyy = α + β,    Det(H) = Dxx·Dyy − Dxy² = α·β

where Tr(H) is the sum of the diagonal elements of H and Det(H) is its determinant. If α is the larger eigenvalue and β the smaller one, with α = r·β, then

(4-6)    Tr(H)² / Det(H) = (α + β)² / (α·β) = (r·β + β)² / (r·β²) = (r + 1)² / r

The derivatives are estimated from differences of neighboring sample points, as described in the next section.

The value (r + 1)²/r is smallest when the two eigenvalues are equal and increases as r increases. The larger this value, the larger the ratio between the two eigenvalues, i.e., the larger the gradient in one direction and the smaller in the other, which is exactly the situation along an edge. Therefore, to eliminate edge response points, the ratio needs to be kept below a certain threshold; to check whether the principal curvatures are below some threshold r, it is only necessary to check

(4-7)    Tr(H)² / Det(H) < (r + 1)² / r

When (4-7) holds, the key point is retained; otherwise it is rejected.

In Lowe's paper, r = 10. The right side of Figure 4.2 shows the key points remaining after the edge responses have been eliminated.
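A sketch of the edge test (4-7), using central differences with h = 1 and Lowe's r = 10 as the default threshold; a true return value means the point is kept.

    #include <vector>

    using Image = std::vector<std::vector<double>>;

    // Reject edge-like points: keep the point only if Tr(H)^2 / Det(H) < (r+1)^2 / r, formula (4-7).
    bool passesEdgeTest(const Image& D, int y, int x, double r = 10.0)
    {
        double v   = D[y][x];
        double dxx = D[y][x + 1] + D[y][x - 1] - 2 * v;
        double dyy = D[y + 1][x] + D[y - 1][x] - 2 * v;
        double dxy = (D[y + 1][x + 1] - D[y + 1][x - 1]
                    - D[y - 1][x + 1] + D[y - 1][x - 1]) / 4.0;

        double tr  = dxx + dyy;              // alpha + beta
        double det = dxx * dyy - dxy * dxy;  // alpha * beta
        if (det <= 0) return false;          // curvatures of opposite sign: discard the point
        return tr * tr / det < (r + 1) * (r + 1) / r;
    }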

4.3 Computing derivatives by the finite difference method

The finite difference method approximates the derivatives of a differential equation by differences of the function values at discrete values of the independent variable. It gives up the property that the independent variable can take continuous values and concentrates on the function values at discrete points. In principle the method can still reach any desired accuracy, because the continuous solution of the equation can be approximated by refining the spacing of the discrete values, or by interpolating the function values at the discrete points. The approach developed together with the advent and application of computers; its computation schemes and programs are intuitive and simple, so it is widely used in computational mathematics.

The specific operation of the finite difference method is divided into two parts:

1. Replace the derivatives with differences and discretize the continuous variables, obtaining a system of difference equations;

2. Solve the system of difference equations.

The first- and second-order derivatives of a function at a point x can be approximately represented by differences of the function values at its neighboring points. For a univariate function f(x), with x a continuous variable defined on the interval [a, b], discretize [a, b] with step size h to obtain a series of nodes x_i = a + i·h (i = 0, 1, ..., n),

and then compute approximate values of f(x) at these nodes. Clearly, the smaller the step h, the better the accuracy of the approximate solution. The neighbors of a node x are x + h and x − h, so the following differences can be constructed at the node:

First-order forward difference at a node:    f'(x) ≈ (f(x + h) − f(x)) / h

First-order backward difference at a node:   f'(x) ≈ (f(x) − f(x − h)) / h

First-order central difference at a node:    f'(x) ≈ (f(x + h) − f(x − h)) / (2h)

In this article, the central difference method is used to derive the derivatives needed for the Taylor expansions of Section 4, as follows.

The Taylor expansions of the function f at x + h and x − h are:

(4-8)    f(x ± h) = f(x) ± h·f'(x) + (h²/2!)·f''(x) ± (h³/3!)·f'''(x) + ...

That is,

(4-9)    f(x + h) = f(x) + h·f'(x) + (h²/2)·f''(x) + O(h³)

(4-10)   f(x − h) = f(x) − h·f'(x) + (h²/2)·f''(x) + O(h³)

Ignoring the terms after h² and combining (4-9) and (4-10), the solutions of the resulting equations are:

(4-11)   f'(x) ≈ (f(x + h) − f(x − h)) / (2h)

(4-12)   f''(x) ≈ (f(x + h) + f(x − h) − 2·f(x)) / h²

The Taylor expansion of a bivariate function f(x, y) can be written in the same way. Ignoring the terms above second order and combining the expansions at (x ± h, y ± h) gives the two-dimensional mixed partial derivative:

(4-13)   ∂²f/∂x∂y ≈ ( f(x + h, y + h) + f(x − h, y − h) − f(x + h, y − h) − f(x − h, y + h) ) / (4h²)

With the above, all the derivatives needed in sections 4.1 and 4.2 can be computed. Similarly, approximate difference expressions for arbitrary partial derivatives can be obtained from multivariate Taylor expansions.

In image processing h = 1 is taken, so for the pixel neighborhood shown in Figure 4.2, with pixel 0 at the center, the derivative formulas at the center pixel follow directly from (4-11) to (4-13).
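A tiny self-check of formulas (4-11) and (4-12) on an assumed test function f(x) = sin(x): the central-difference estimates approach the analytic derivatives as h shrinks.

    #include <cmath>
    #include <cstdio>

    int main()
    {
        auto f = [](double x) { return std::sin(x); };
        double x = 1.0;
        for (double h : {0.5, 0.1, 0.01}) {
            double d1 = (f(x + h) - f(x - h)) / (2 * h);            // formula (4-11)
            double d2 = (f(x + h) + f(x - h) - 2 * f(x)) / (h * h); // formula (4-12)
            std::printf("h=%.2f  f'~%.6f (exact %.6f)  f''~%.6f (exact %.6f)\n",
                        h, d1, std::cos(x), d2, -std::sin(x));
        }
        return 0;
    }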

4.4 Inverse formula for a third-order matrix

The main algorithms for inverting a high-order matrix are the normalization method and the elimination method; the inverse formula for a third-order matrix is summarized below.

If the matrix

A = [ a11 a12 a13 ; a21 a22 a23 ; a31 a32 a33 ]

is invertible, i.e. |A| ≠ 0, then

(4-14)   A^(−1) = (1/|A|) · [ a22·a33 − a23·a32   a13·a32 − a12·a33   a12·a23 − a13·a22 ;
                              a23·a31 − a21·a33   a11·a33 − a13·a31   a13·a21 − a11·a23 ;
                              a21·a32 − a22·a31   a12·a31 − a11·a32   a11·a22 − a12·a21 ]
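A direct transcription of formula (4-14) as a sketch; the matrix is stored row-major and a near-zero determinant is reported through the return value.

    #include <array>
    #include <cmath>

    using Mat3 = std::array<std::array<double, 3>, 3>;

    // Invert a 3x3 matrix via the adjugate formula (4-14). Returns false if |A| is (nearly) zero.
    bool invert3x3(const Mat3& a, Mat3& inv)
    {
        double det = a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
                   - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
                   + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0]);
        if (std::fabs(det) < 1e-12) return false;

        inv[0][0] = (a[1][1] * a[2][2] - a[1][2] * a[2][1]) / det;
        inv[0][1] = (a[0][2] * a[2][1] - a[0][1] * a[2][2]) / det;
        inv[0][2] = (a[0][1] * a[1][2] - a[0][2] * a[1][1]) / det;
        inv[1][0] = (a[1][2] * a[2][0] - a[1][0] * a[2][2]) / det;
        inv[1][1] = (a[0][0] * a[2][2] - a[0][2] * a[2][0]) / det;
        inv[1][2] = (a[0][2] * a[1][0] - a[0][0] * a[1][2]) / det;
        inv[2][0] = (a[1][0] * a[2][1] - a[1][1] * a[2][0]) / det;
        inv[2][1] = (a[0][1] * a[2][0] - a[0][0] * a[2][1]) / det;
        inv[2][2] = (a[0][0] * a[1][1] - a[0][1] * a[1][0]) / det;
        return true;
    }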

5. Key Point Orientation Assignment

In order for the descriptor to be rotation invariant, the local characteristics of the image are used to assign a reference orientation to each key point. A stable orientation of the local structure is obtained using the image gradients. For each key point detected in the DoG pyramid, the gradient and orientation distribution of the pixels in a 3σ neighborhood window of its Gaussian pyramid image are collected. The magnitude and direction of the gradient are:

(5-1)    m(x, y) = sqrt( (L(x+1, y) − L(x−1, y))² + (L(x, y+1) − L(x, y−1))² )
         θ(x, y) = tan^(−1)( (L(x, y+1) − L(x, y−1)) / (L(x+1, y) − L(x−1, y)) )

where L is the scale-space image at the key point's scale. Following Lowe, the gradient magnitudes m(x, y) are weighted with a Gaussian distribution whose σ is 1.5 times the key point's scale, and by the 3σ sampling principle the neighborhood window radius is 3 × 1.5σ.

After the gradients of the key point's neighborhood have been computed, a histogram is used to collect the gradient magnitudes and orientations of the pixels within the neighborhood. The gradient histogram divides the orientation range of 0 to 360 degrees into 36 bins of 10 degrees each. As shown in Figure 5.1, the peak direction of the histogram represents the main orientation of the key point (for simplicity, only eight directions are drawn in the figure).

The peak of the orientation histogram represents the dominant direction of the neighborhood gradients at the feature point, and the maximum of the histogram is taken as the main orientation of the key point. To enhance the robustness of matching, any direction whose peak is greater than 80% of the main peak is also retained as a secondary orientation of the key point. Therefore, at positions where several peaks of similar magnitude occur, several key points are created at the same position and scale but with different orientations. Only about 15% of the key points are assigned multiple orientations, but they contribute significantly to the stability of matching. In practical programming, such a key point is copied into several key points whose orientation values are set to the respective peak directions; in addition, the discrete gradient orientation histogram should be interpolated to obtain a more accurate orientation angle. The results are shown in Figure 5.2.
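A sketch of the orientation histogram described above: gradients in a window around the key point are accumulated into 36 bins of 10 degrees, weighted by magnitude and by a Gaussian of σ equal to 1.5 times the key point scale; the main peak and any peak above 80% of it are returned. The histogram smoothing and parabolic peak interpolation used in real implementations are omitted, and names are illustrative.

    #include <algorithm>
    #include <cmath>
    #include <vector>

    using Image = std::vector<std::vector<double>>;

    // Orientation candidates (in radians) for a key point at (x, y) with scale sigma,
    // computed on the Gaussian image L of the key point's scale.
    std::vector<double> keypointOrientations(const Image& L, int x, int y, double sigma)
    {
        const double PI = 3.14159265358979323846;
        const int nbins = 36;                                   // 36 bins of 10 degrees each
        const double sigmaW = 1.5 * sigma;                      // Gaussian weighting of the window
        const int radius = static_cast<int>(std::round(3.0 * sigmaW));
        std::vector<double> hist(nbins, 0.0);

        for (int j = -radius; j <= radius; ++j)
            for (int i = -radius; i <= radius; ++i) {
                int px = x + i, py = y + j;
                if (px < 1 || py < 1 || px + 1 >= (int)L[0].size() || py + 1 >= (int)L.size())
                    continue;                                   // skip points too close to the border
                double gx = L[py][px + 1] - L[py][px - 1];      // gradient, formula (5-1)
                double gy = L[py + 1][px] - L[py - 1][px];
                double mag = std::sqrt(gx * gx + gy * gy);
                double ang = std::atan2(gy, gx);                // in (-pi, pi]
                double w = std::exp(-(i * i + j * j) / (2.0 * sigmaW * sigmaW));
                int bin = static_cast<int>((ang + PI) / (2.0 * PI) * nbins) % nbins;
                hist[bin] += w * mag;
            }

        double peak = *std::max_element(hist.begin(), hist.end());
        std::vector<double> orientations;                       // main peak plus peaks above 80% of it
        for (int b = 0; b < nbins; ++b)
            if (peak > 0.0 && hist[b] >= 0.8 * peak)
                orientations.push_back((b + 0.5) * 2.0 * PI / nbins - PI);
        return orientations;
    }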

At this point, the detected key points, each having a position, a scale, and an orientation, are the SIFT feature points of the image.

6. Key Point Feature Description

Through the steps above, each key point has three pieces of information: position, scale, and orientation. The next step is to create a descriptor for each key point, describing it with a set of vectors so that it does not change under various transformations such as illumination change or viewpoint change. The descriptor includes not only the key point itself, but also the pixels around it that contribute to it, and it should be highly distinctive in order to improve the probability that feature points are matched correctly.

The SIFT descriptor is a representation of statistics of the gradients of the Gaussian image in the neighborhood of the key point. By partitioning the image region around the key point into blocks and computing a gradient histogram within each block, a distinctive vector is generated; this vector is an abstract and unique summary of the image information in that region.

Lowe recommends characterizing a key point with a 4*4*8 = 128-dimensional vector, i.e. the gradient information in 8 orientations computed on a 4*4 grid of sub-windows within the key point's scale space. The procedure is as follows:

1. Determine the area of the image needed to calculate the descriptors

The feature descriptor is related to the scale of the feature point, so the gradients should be computed on the Gaussian image corresponding to that scale. The neighborhood around the key point is divided into d*d (Lowe recommends d = 4) sub-regions, each of which acts as a seed point with 8 orientations. Each sub-region has the same size as in orientation assignment, i.e. a square sampling window whose edge length is 3σ_oct (by formula (3-8) this value is small, so for simplicity the edge length is taken as 3σ_oct, and it is better to take more sample points than fewer). Since bilinear interpolation is required in the actual computation, the required image window edge length is 3σ_oct·(d + 1). Taking the rotation factor into account as well (to make it convenient to rotate the coordinate axes to the key point orientation in the next step), as shown in Figure 6.1, the radius of the image region actually required is:

(6-1)    radius = 3σ_oct · sqrt(2) · (d + 1) / 2

The result is rounded to the nearest integer.

2. Rotate the coordinate axes to the orientation of the key point to ensure rotation invariance, as shown in Figure 6.2.

After rotation, the new coordinates of a sample point in the neighborhood are:

(6-2)    ( x' ; y' ) = ( cosθ  −sinθ ; sinθ  cosθ ) ( x ; y )

3. Assign the sample points in the neighborhood to the corresponding sub-regions, distribute the gradient values within each sub-region over 8 orientations, and compute their weights.

The rotated sample point coordinates are assigned to the d*d sub-regions within a circle of the radius given by (6-1); the gradients and orientations of the sample points affecting each sub-region are computed and distributed over 8 orientations.

The sub-region row and column indices in which a rotated sample point falls are

(6-3)

Lowe suggests that the gradient magnitudes of the pixels in the sub-regions be computed with Gaussian weighting, i.e.

(6-4)

where a, b are the position coordinates of the key point in the Gaussian pyramid image.

4. Interpolation calculates the gradient in eight directions for each seed point.

As shown in Figure 6.3, the contribution of a sample point (the red point inside the blue window in the figure), whose sub-region coordinates were obtained by formula (6-3), to each neighboring seed point is computed by linear interpolation. The red point falls between row 0 and row 1 and contributes to both rows: its contribution factor to the seed point in row 0, column 3 is dr, and to the seed point in row 1, column 3 it is 1 − dr; similarly, the contribution factors to the two adjacent columns are dc and 1 − dc, and to the two adjacent orientations do and 1 − do. The gradient magnitude finally accumulated in each orientation is:

(6-5)    weight = w · dr^k · (1 − dr)^(1−k) · dc^m · (1 − dc)^(1−m) · do^n · (1 − do)^(1−n)

where k, m, n are 0 or 1.

5. The 4*4*8 = 128 gradient values obtained above form the feature vector of the key point. After the feature vector has been formed, it is normalized in order to remove the influence of illumination changes. An overall (additive) drift of the image gray values is already removed, because the gradient at each point is obtained by subtracting neighboring pixels. Let the descriptor vector be H = (h1, h2, ..., h128) and the normalized feature vector be L = (l1, l2, ..., l128); then

(6-7)    l_j = h_j / sqrt( h1² + h2² + ... + h128² ),    j = 1, 2, ..., 128

6. Thresholding the descriptor vector. Non-linear illumination changes and camera saturation cause the gradient magnitudes in some directions to change too much, while their influence on the orientations is weak. Therefore a threshold is set (generally 0.2 after the vector has been normalized) and larger gradient values are truncated to it. The vector is then normalized once more, which improves the distinctiveness of the features. (A code sketch of this normalize, clip, renormalize step is given after this list.)

7. Sort the feature description vectors by the scale of the feature points.

Thus, the SIFT feature description vector is generated.
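As promised in step 6, here is a minimal sketch of the normalize, clip, renormalize operation on the 128-dimensional descriptor (formula (6-7) plus the 0.2 truncation); the function name is illustrative.

    #include <algorithm>
    #include <cmath>
    #include <vector>

    // Normalize, clip at `threshold`, then normalize again (steps 5 and 6 of the descriptor).
    void normalizeDescriptor(std::vector<double>& desc, double threshold = 0.2)
    {
        auto normalize = [](std::vector<double>& v) {
            double norm = 0.0;
            for (double d : v) norm += d * d;
            norm = std::sqrt(norm);
            if (norm > 0.0)
                for (double& d : v) d /= norm;
        };
        normalize(desc);                                         // formula (6-7)
        for (double& d : desc) d = std::min(d, threshold);       // truncate large gradient values
        normalize(desc);                                         // renormalize to improve distinctiveness
    }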

This descriptor-vector part is not easy to understand, so I drew a sketch for reference:

7. Shortcomings of SIFT

SIFT has unparalleled advantages in extracting invariant image features, but it is not perfect; shortcomings remain:

1. Real-time performance is limited.

2. Sometimes there are few feature points.

3. Feature points cannot be extracted accurately from targets with smooth edges.

As shown in Figure 7.1, for blurred images and images with smooth edges too few feature points are detected, and for circular targets SIFT is powerless. Recently some improvements have appeared, the most famous of which are SURF and CSIFT.

8. Summary

I have studied the SIFT algorithm for more than a month. Owing to gaps in my knowledge, the scale-space technique and the finite-difference approximation of derivatives held me up for quite a while. Lowe gives very little detail in his paper, and some things are not mentioned at all, which makes implementation quite difficult. After reading many sources and implementing the algorithm myself, I have summarized it in this article. I believe this is so far one of the most detailed descriptions of the SIFT algorithm available; I share it with everyone and welcome criticism and corrections.

I am also sharing the Gaussian blur source code and the SIFT algorithm source code that I implemented; see Appendices 1 and 2. The source code is implemented with VS2010 + OpenCV 2.2.

Zdd

April 28, 2012 at Beijing Normal University

May 17, 2012, 15:33:23: first correction

Corrections: part of section 3.3, Figure 3.1, Figure 3.5.

Revised Code: http://download.csdn.net/detail/zddmail/4309418

Resources

1. David G. Lowe. Distinctive Image Features from Scale-Invariant Keypoints. January 5, 2004.

2. David G. Lowe. Object Recognition from Local Scale-Invariant Features. 1999.

3. Matthew Brown and David Lowe. Invariant Features from Interest Point Groups. In British Machine Vision Conference, Cardiff, Wales, pp. 656-665.

4. Peter J. Burt and Edward H. Adelson. The Laplacian Pyramid as a Compact Image Code. IEEE Transactions on Communications, Vol. COM-31, No. 4, April 1983.

5. Song Dan. Scale Invariant Feature Transform (SIFT) (PPT).

6. RaySaint's blog: SIFT algorithm research. http://underthehood.blog.51cto.com/2531780/658350

7. Jason Clemons. SIFT: Scale Invariant Feature Transform by David Lowe (PPT).

8. Tony Lindeberg. Scale-Space Theory: A Basic Tool for Analysing Structures at Different Scales. 1994.

9. SIFT official site, Rob Hess <[email protected]>: SIFT source code.

10. Andrea Vedaldi (UCLA VisionLab), SIFT implementation used by OpenCV 2.2: http://www.vlfeat.org/~vedaldi/code/siftpp.html; OpenCV 2.3 switched to Rob Hess's source code.

11. Yang Le (ed.). The Finite Difference Method for Partial Differential Equations in Scientific Computing.

12. Wikipedia SIFT entry: http://zh.wikipedia.org/zh-cn/Scale-invariant_feature_transform

13. Baidu Baike SIFT entry: http://baike.baidu.com/view/2832304.htm

14. Other Internet resources.

Appendix 1: Gaussian Blur Source Code

http://blog.csdn.net/zddmail/article/details/7450033

http://download.csdn.net/detail/zddmail/4217704

Appendix 2: SIFT Algorithm Source Code

http://download.csdn.net/detail/zddmail/4309418

Sponsorship: If you think this humble article is useful to you, please scan the QR code below and give me a small donation; even a few cents will do. A programmer who doesn't save up to marry a wife is not a good programmer!
