A Detailed Explanation of the SIFT Matching Algorithm
Scale-Invariant Feature Transform (SIFT)
Just for Fun

zdd (zddmail@gmail.com or zddhub@gmail.com)

For beginners, there are many gaps between David G. Lowe's paper and a working implementation; this article helps you across them.

If you are learning SIFT in order to do search, perhaps OpenSSE is more suitable for you; you are welcome to use it.

1. Overview of SIFT

The scale-invariant feature transform (SIFT) is a computer vision algorithm used to detect and describe local features in images. It looks for extremum points in scale space and extracts invariant descriptors of their position, scale, and rotation. The algorithm was published by David Lowe in 1999 and refined and summarized in 2004.

Its applications include object recognition, robot mapping and navigation, image stitching, 3D model building, gesture recognition, image tracking, and motion matching.

The algorithm is patented; the patent owner is the University of British Columbia.

The description and detection of local image features help with object recognition. SIFT features are based on points of interest in the local appearance of an object and are independent of the image's size and rotation. Their tolerance to changes in illumination, noise, and small changes in viewpoint is also quite high. Based on these properties, they are highly distinctive, relatively easy to extract, and easy to recognize with a low mismatch rate even in a feature database with a huge population. The detection rate for partially occluded objects using SIFT features is also very high; as few as three SIFT features from an object are enough to compute its position and orientation. With current computer hardware and a small feature database, recognition speed can approach real-time operation. SIFT features carry a large amount of information and are suitable for fast, accurate matching in massive databases.

The features of the SIFT algorithm are:

1. SIFT features are local features of the image; they are invariant to rotation, scaling, and changes in brightness, and remain stable to some degree under viewpoint change, affine transformation, and noise.

2. Distinctiveness: the features are information-rich and suited to fast, accurate matching in massive feature databases;

3. Quantity: even a few objects can produce a large number of SIFT feature vectors;

4. Speed: an optimized SIFT matching algorithm can even meet real-time requirements;

5. Extensibility: SIFT features can easily be combined with other forms of feature vectors.

Problems the SIFT algorithm can solve:

The performance of image registration/target recognition and tracking is affected by the state of the target itself, the environment of the scene, and the imaging characteristics of the imaging equipment. The SIFT algorithm can, to some extent, solve for:

1. Rotation, scaling, and translation (RST) of the target

2. Affine/projective transformation of the image (viewpoint change)

3. Illumination effects (illumination)

4. Target occlusion (occlusion)

5. Cluttered scenes (clutter)

6. Noise

The essence of the SIFT algorithm is to find key points (feature points) in different scale spaces and compute their orientations. The key points SIFT finds are highly salient points that do not change with illumination, affine transformation, or noise, such as corner points, edge points, bright spots in dark regions, and dark spots in bright regions.

Lowe decomposes the SIFT algorithm into the following four steps:

1. Scale-space extremum detection: search over all scales and image locations. Potential interest points invariant to scale and rotation are identified using a difference-of-Gaussian function.

2. Keypoint localization: at each candidate location, a detailed model is fitted to determine location and scale. Keypoints are selected according to measures of their stability.

3. Orientation assignment: one or more orientations are assigned to each keypoint location based on local image gradient directions. All subsequent operations on the image data are performed relative to the orientation, scale, and location of each keypoint, thereby providing invariance to these transformations.

4. Keypoint description: the local image gradients are measured at the selected scale in the region around each keypoint. These gradients are transformed into a representation that allows for significant local shape deformation and illumination change.
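Before walking through these steps in detail, here is a minimal end-to-end sketch. It is not Lowe's, Hess's, or Vedaldi's code: it assumes OpenCV 4.4 or later, where SIFT ships in the main module (the patent expired in March 2020), and "input.jpg" is a placeholder path.

#include <opencv2/opencv.hpp>
#include <iostream>
#include <vector>

int main() {
    // "input.jpg" is a placeholder path for any grayscale test image.
    cv::Mat img = cv::imread("input.jpg", cv::IMREAD_GRAYSCALE);
    if (img.empty()) return 1;

    // detectAndCompute runs all four steps internally: extremum
    // detection, keypoint localization, orientation assignment, and
    // descriptor computation (one 128-dimensional row per keypoint).
    cv::Ptr<cv::SIFT> sift = cv::SIFT::create();
    std::vector<cv::KeyPoint> keypoints;
    cv::Mat descriptors;
    sift->detectAndCompute(img, cv::noArray(), keypoints, descriptors);

    std::cout << keypoints.size() << " keypoints detected\n";
    return 0;
}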

This article follows Lowe's steps, referring to the source code of Rob Hess and Andrea Vedaldi, and describes the implementation of the SIFT algorithm in detail.

2. Gaussian Blur

The SIFT algorithm finds key points in different scale spaces, and scale space is built with Gaussian blur. Lindeberg and others have shown that the Gaussian convolution kernel is the only kernel that realizes the scale transformation, and the only linear kernel. This section first introduces the Gaussian blur algorithm.

2.1 The Two-Dimensional Gaussian Function

Gaussian blur is an image filter. It uses the normal distribution (Gaussian function) to compute a blur template, then convolves the template with the original image to blur the image.

The normal distribution equation of n-dimensional space is:

G(r) = \frac{1}{(\sqrt{2\pi}\,\sigma)^N} e^{-r^2 / (2\sigma^2)}    (1-1)

where σ is the standard deviation of the normal distribution; the larger its value, the more blurred (smoother) the image. r is the blur radius, that is, the distance from a template element to the center of the template. If the two-dimensional template has size m×n, the element (x, y) on the template corresponds to the Gaussian formula:

G(x, y) = \frac{1}{2\pi\sigma^2} e^{-\frac{(x - m/2)^2 + (y - n/2)^2}{2\sigma^2}}    (1-2)

In two-dimensional space, the contours of the surface produced by this formula are concentric circles, normally distributed outward from the center, as shown in Figure 2.1. Convolving the original image with a matrix built from the non-zero part of this distribution turns each pixel's value into a weighted average of its neighbors' values. The original pixel, having the largest Gaussian value, receives the largest weight; neighboring pixels receive smaller and smaller weights the farther they lie from the original pixel. Blurring in this way preserves edges far better than more uniform blur filters.

In theory, the distribution is non-zero at every point of the image, which would mean that every pixel's computation must take the entire image into account. In practice, when computing a discrete approximation of the Gaussian function, pixels beyond a distance of about 3σ can be treated as having no effect, and their computation can be ignored. Usually, an image processing program only needs to compute a (6σ+1)×(6σ+1) matrix to cover the relevant pixel influence.

2.2 Two-Dimensional Gaussian Blur of an Image

According to the value of σ, we compute the size of the Gaussian template matrix (per the 3σ rule above, (6σ+1)×(6σ+1)), use formula (1-2) to fill in the Gaussian template matrix, and convolve it with the original image to obtain a smoothed (Gaussian-blurred) image. To ensure that the elements of the template matrix lie in [0, 1], the template matrix should be normalized. The 5×5 Gaussian template is shown in Table 2.1.


The following figure illustrates the convolution computation with the 5×5 Gaussian template. The Gaussian template is center-symmetric.
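As a concrete sketch of the procedure above (not the article's appendix code; gaussianTemplate is a hypothetical helper name), the following builds a normalized Gaussian template from formula (1-2), taking the template radius from the 3σ rule of section 2.1:

#include <opencv2/opencv.hpp>
#include <cmath>

// Build a normalized (2k+1)x(2k+1) Gaussian template, k = ceil(3*sigma),
// centered on the middle element, following formula (1-2).
cv::Mat gaussianTemplate(double sigma) {
    const int k = static_cast<int>(std::ceil(3.0 * sigma));
    const int size = 2 * k + 1;
    cv::Mat kernel(size, size, CV_64F);
    double sum = 0.0;
    for (int y = 0; y < size; ++y) {
        for (int x = 0; x < size; ++x) {
            const double dx = x - k, dy = y - k;
            const double v = std::exp(-(dx * dx + dy * dy) / (2.0 * sigma * sigma))
                             / (2.0 * CV_PI * sigma * sigma);
            kernel.at<double>(y, x) = v;
            sum += v;
        }
    }
    kernel /= sum;  // normalize so the template weights sum to 1
    return kernel;
}

// Usage: convolve the template with the image.
// cv::Mat blurred; cv::filter2D(src, blurred, -1, gaussianTemplate(1.6));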

2.3 Separation of Gaussian Blur

As shown in Figure 2.3, using a two-dimensional Gaussian template achieves the desired blur, but the template matrix causes pixel loss at the edges of the image (Figure 2.3 b, c); the larger the template, the more pixels are lost, and discarding those template regions produces black edges (Figure 2.3 d). More importantly, as the Gaussian template (Gaussian kernel) grows, the cost of the convolution operation rises sharply. The two-dimensional Gaussian blur can be improved by exploiting the separability of the Gaussian function.

The separability of the Gaussian function means that the effect obtained with a two-dimensional matrix transform can equally be obtained with one Gaussian matrix transform in the horizontal direction plus one Gaussian matrix transform in the vertical direction. From a computational point of view this is a useful property, since it requires only O(n·M·N) + O(m·M·N) operations, whereas an inseparable two-dimensional matrix requires O(m·n·M·N), where m, n are the dimensions of the Gaussian matrix and M, N are the dimensions of the two-dimensional image.

In addition, the two one-dimensional Gaussian convolutions eliminate the edge artifacts produced by the two-dimensional Gaussian matrix. (On edge elimination, see Figure 2.4: the part of the template matrix that extends beyond the image boundary, drawn as a dashed box, takes no part in the convolution. The first 1×5 template in the x direction in Figure 2.4 thus degenerates into a 1×3 template that convolves only within the image.)
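A sketch of this separation with OpenCV (separableGaussianBlur is a hypothetical name): cv::getGaussianKernel produces the one-dimensional kernel, and cv::sepFilter2D applies it as a horizontal pass followed by a vertical pass, which is also how cv::GaussianBlur works internally.

#include <opencv2/opencv.hpp>
#include <cmath>

// Blur with two 1-D passes instead of one 2-D convolution:
// O(n*M*N) + O(m*M*N) operations instead of O(m*n*M*N).
void separableGaussianBlur(const cv::Mat& src, cv::Mat& dst, double sigma) {
    const int k = static_cast<int>(std::ceil(3.0 * sigma));       // 3-sigma rule
    cv::Mat g = cv::getGaussianKernel(2 * k + 1, sigma, CV_64F);  // 1-D kernel
    cv::sepFilter2D(src, dst, -1, g, g);  // row pass, then column pass
}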


Appendix 1 gives a two-dimensional Gaussian blur and a separable Gaussian blur implemented with OpenCV 2.2. Table 2.2 compares the two methods above with the Gaussian blur implemented by the OpenCV 2.3 open-source library.


3. Scale-Space Extremum Detection

The scale space is represented with a Gaussian pyramid. Tony Lindeberg pointed out that the scale-normalized LoG (Laplacian of Gaussian) operator has true scale invariance; Lowe uses a difference-of-Gaussian pyramid to approximate the LoG operator and thereby detect stable key points in scale space.

3.1 Scale-Space Theory

The concept of scale space was first proposed by Iijima in 1962; after promotion by Witkin and Koenderink, it became widely used in the computer vision field.

The basic idea of scale-space theory is to introduce into the image-processing model a parameter regarded as scale, obtain a sequence of scale-space representations by continuously varying the scale parameter, extract the main contours of the scale space from this sequence, and use the main contours as a feature vector to realize edge and corner detection and feature extraction at different resolutions.

The scale-space method embeds traditional single-scale image processing into a dynamic framework of changing scale, making it easier to obtain the essential features of an image. The degree of blur of the images in scale space increases gradually, simulating how a target forms on the human retina as it moves from near to far.

Scale space satisfies visual invariance. The visual interpretation of this invariance is as follows. When we look at an object with our eyes: on the one hand, when the illumination of the object changes, the brightness level and contrast of the image perceived by the retina differ, so scale-space operators are required to be unaffected by the gray level and contrast of the image, that is, to satisfy gray-level invariance and contrast invariance. On the other hand, relative to a fixed coordinate system, when the relative position between the observer and the object changes, the position, size, angle, and shape of the image perceived by the retina differ, so scale-space operators are required to be unaffected by the position, size, angle, and affine transformation of the image, that is, to satisfy translation invariance, scale invariance, Euclidean invariance, and affine invariance.

3.2 Representation of Scale Space

The scale space L(x, y, σ) of an image is defined as the convolution of a variable-scale Gaussian function G(x, y, σ) with the original image I(x, y):

L(x, y, \sigma) = G(x, y, \sigma) * I(x, y)    (3-1)

where * represents the convolution operation, and

G(x, y, \sigma) = \frac{1}{2\pi\sigma^2} e^{-\frac{(x - m/2)^2 + (y - n/2)^2}{2\sigma^2}}    (3-2)

As in formula (1-2), m, n represent the dimensions of the Gaussian template (determined by σ), and (x, y) represents the pixel position in the image. σ is the scale-space factor: the smaller its value, the less the image has been smoothed and the smaller the corresponding scale. A large scale corresponds to the coarse, overall features of an image; a small scale corresponds to its fine details.

3.3 Construction of the Gaussian Pyramid

When scale space is implemented with a Gaussian pyramid, the construction of the Gaussian pyramid consists of two parts (see the sketch after this list):

1. Applying Gaussian blur at different scales to the image;

2. Downsampling the image (sampling every other point).
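A minimal sketch of these two parts is given below. It is not the article's implementation: buildGaussianPyramid is a hypothetical helper, and it assumes Lowe's conventions of S intervals per octave, S + 3 images per octave, and a scale step k = 2^(1/S); the incremental-blur optimization used by real implementations is omitted for clarity.

#include <opencv2/opencv.hpp>
#include <cmath>
#include <vector>

// Build a Gaussian pyramid: each octave holds S + 3 images whose scales
// differ by k = 2^(1/S); the next octave is seeded by downsampling the
// image whose blur is twice the octave's base sigma (index S).
std::vector<std::vector<cv::Mat>> buildGaussianPyramid(
        const cv::Mat& base, int octaves, int S, double sigma0) {
    const double k = std::pow(2.0, 1.0 / S);
    std::vector<std::vector<cv::Mat>> pyr(octaves);
    cv::Mat cur = base.clone();
    for (int o = 0; o < octaves; ++o) {
        for (int i = 0; i < S + 3; ++i) {
            cv::Mat img;
            // Part 1: blur the octave base to scale sigma0 * k^i
            // (a simplification; production code blurs incrementally).
            cv::GaussianBlur(cur, img, cv::Size(), sigma0 * std::pow(k, i));
            pyr[o].push_back(img);
        }
        // Part 2: downsample by taking every other point (factor 2).
        cv::resize(pyr[o][S], cur, cv::Size(), 0.5, 0.5, cv::INTER_NEAREST);
    }
    return pyr;
}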
