SIFT stands for scale-invariant feature transform. As the name suggests, SIFT features do not change when the image is scaled or rotated, so they survive enlargement and rotation of the image. In addition, because of some special processing during feature extraction, SIFT features are also fairly robust to illumination changes. The algorithm flow is described below.
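Before walking through the individual steps, a minimal end-to-end usage sketch may help. It assumes the OpenCV implementation of SIFT (the opencv-python package, version 4.4 or newer, where cv2.SIFT_create is available); the image path is a placeholder.

```python
# Minimal SIFT usage sketch, assuming OpenCV (opencv-python >= 4.4).
# "building.jpg" is a placeholder path, not from the original article.
import cv2

img = cv2.imread("building.jpg", cv2.IMREAD_GRAYSCALE)
sift = cv2.SIFT_create()
keypoints, descriptors = sift.detectAndCompute(img, None)
# Assuming keypoints were found: descriptors has shape (N, 128).
print(len(keypoints), descriptors.shape)
vis = cv2.drawKeypoints(img, keypoints, None,
                        flags=cv2.DRAW_MATCHES_FLAGS_DRAW_RICH_KEYPOINTS)
cv2.imwrite("sift_result.jpg", vis)
```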
1 Building the scale space

The original image is downsampled several times to obtain a series of smaller images, as shown in Figure 1; each image has the same content, but its width and height are half those of the previous one. Stacking the images from largest (bottom) to smallest (top) forms a pyramid-like structure (a sketch of this step follows Figure 1). Tips: the number of pyramid levels can be chosen to suit the actual situation. Because upsampling increases memory consumption and computation, larger images are usually not upsampled first.
Figure 1 Scale space formed by repeatedly downsampling the original image
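A minimal sketch of this downsampling step in Python/NumPy might look as follows; the function name is illustrative, and smoothing before subsampling is omitted for brevity.

```python
import numpy as np

def build_pyramid(image: np.ndarray, num_octaves: int) -> list:
    """Repeatedly halve the image to form the base image of each octave.
    Simplified sketch: a real implementation would smooth before subsampling."""
    octaves = [image]
    for _ in range(1, num_octaves):
        octaves.append(octaves[-1][::2, ::2])  # keep every second row and column
    return octaves
```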
Each image in the pyramid is then blurred with a Gaussian kernel 5 times (the number of layers can also be customized), turning it into 6 layers (indexed from -1 to 4), where layer -1 is the original image of that octave. The Gaussian blur parameter of each layer in each octave is computed according to Formula 1.
\sigma(o,s) = \sigma_0 \cdot 2^{o + s/S}    Formula 1
In Formula 1, σ0 is 1.6 (an empirical value), o is the index of the current octave, s is the layer index within the current octave, and S is the number of layers per octave. It is assumed that the input image has already been smoothed by a Gaussian with σn = 0.5 before processing. In David G. Lowe's paper, the bottom level of the pyramid is obtained by interpolating (upsampling) the input image, so for that level σn = 1.0 is used. After this adjustment, the actual σ(o,s) is given by Formula 2.
\sigma(o,s) = \sqrt{\left(\sigma_0 \cdot 2^{o + s/S}\right)^2 - \sigma_n^2}    Formula 2
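A small sketch of Formulas 1 and 2 as written above, assuming σ0 = 1.6 and σn = 0.5 as given; the function name is illustrative.

```python
import math

def blur_sigma(o: int, s: int, S: int,
               sigma0: float = 1.6, sigma_n: float = 0.5) -> float:
    """Gaussian sigma for layer s of octave o (Formula 1), corrected for the
    sigma_n blur assumed already present in the input image (Formula 2)."""
    target = sigma0 * 2.0 ** (o + s / S)                      # Formula 1
    return math.sqrt(max(target ** 2 - sigma_n ** 2, 0.0))    # Formula 2
```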
Through the above calculation we obtain the multi-layer images of the scale space shown in Figure 2.
Fig. 2 The results of applying Gaussian blurs with several different σ values to the image at one pyramid level
At this point the scale space is essentially established. The reason the SIFT algorithm is invariant to changes of image scale lies in this scale space. The scale space is not all-powerful, however: in an actual SIFT implementation its range is limited and can only cover most of the scales an image may take, so SIFT cannot guarantee scale invariance for features whose scale falls far outside this range.

2 Finding the extreme points
In 2002, Mikolajczyk's detailed experimental comparison found that, compared with other methods, the extrema of the scale-normalized Laplacian of Gaussian produce the most stable image features, and the difference-of-Gaussian function is a very close approximation to the scale-normalized Laplacian of Gaussian. All of the images were already Gaussian-blurred in the previous step, so a difference-of-Gaussian image is obtained simply by subtracting two adjacent Gaussian images of the same octave. To make this more intuitive, the difference operation is illustrated in Figure 3:
Fig. 3 Schematic diagram of computing the difference-of-Gaussian images
The same operation is performed on each octave, and finally a difference-of-Gaussian pyramid similar to the previous Gaussian pyramid is obtained. Once the difference-of-Gaussian pyramid is available, we can search for extreme points. The search is very simple: the difference-of-Gaussian images of one octave are stacked into a three-dimensional data set and, ignoring the border pixels, every pixel has 26 neighbours. If a pixel's value is the largest or the smallest of these 27 points, it is recorded as a candidate extreme point. The procedure is illustrated in Figure 4 (a code sketch follows the figure): if the black dot has the maximum or minimum value among all of the grey points, the black dot is tentatively taken as an extreme point.
Fig. 4 Searching for and locating extreme points in the difference-of-Gaussian scale space
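A sketch of the differencing step and the 26-neighbour comparison, assuming the Gaussian images of one octave are available as NumPy arrays of the same size; names are illustrative, and ties at the extremum are not handled specially.

```python
import numpy as np

def difference_of_gaussians(gaussians: list) -> np.ndarray:
    """Subtract adjacent Gaussian layers of one octave (Figure 3)."""
    return np.stack([b - a for a, b in zip(gaussians[:-1], gaussians[1:])])

def is_extremum(dog: np.ndarray, s: int, y: int, x: int) -> bool:
    """True if dog[s, y, x] is the maximum or minimum of its 3x3x3
    neighbourhood, i.e. the 26 grey neighbours of Figure 4."""
    cube = dog[s - 1:s + 2, y - 1:y + 2, x - 1:x + 2]
    centre = dog[s, y, x]
    return centre == cube.max() or centre == cube.min()
```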
3 Filtering the extreme points
3.1 Removing points with large offsets or low response values
The extreme points obtained by comparing 27 pixels are not extreme points in the true sense, because they are selected in a discrete space; the real extremum may not lie exactly on a sample point. Figure 5 illustrates the relationship between a discrete extremum and the continuous extremum in two-dimensional space.
Fig. 5 The relationship between the discrete extremum and the continuous extremum in two-dimensional space
Because we are now dealing with a three-dimensional discrete data set, we take a second-order Taylor expansion of the difference-of-Gaussian function and set its derivative to zero, which gives Formula 3.
D(\hat{x}) = D + \frac{\partial D^T}{\partial x}\hat{x} + \frac{1}{2}\hat{x}^T \frac{\partial^2 D}{\partial x^2}\hat{x} = D + \frac{1}{2}\frac{\partial D^T}{\partial x}\hat{x}    Formula 3
Because we are working in a three-dimensional data set, x̂ = (x, y, s) is a three-dimensional vector. When the derivative is set to zero, if the offset of the interpolated centre (x, y, s) from the original discrete centre exceeds 0.5 in any dimension, the interpolated extremum has in fact shifted towards a neighbouring sample point, so the point is removed. In addition, if |D(x̂)| < 0.03, the point has a low response and is unstable, so it is also deleted.
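A sketch of these two tests, assuming the finite-difference gradient and Hessian of the difference-of-Gaussian function at the candidate point have already been computed; names follow the quantities in Formula 3 and the thresholds quoted above.

```python
import numpy as np

def refine_and_test_contrast(grad: np.ndarray, hess: np.ndarray, d: float,
                             contrast_thresh: float = 0.03):
    """grad is the 3-vector dD/dx, hess the 3x3 matrix d2D/dx2, and d the DoG
    value at the sample point. Returns (offset, value) or None if rejected."""
    x_hat = -np.linalg.solve(hess, grad)       # sub-pixel offset of the extremum
    if np.any(np.abs(x_hat) > 0.5):
        return None                            # extremum lies nearer another sample point
    value = d + 0.5 * grad.dot(x_hat)          # D(x_hat) from Formula 3
    if abs(value) < contrast_thresh:
        return None                            # low-contrast, unstable point
    return x_hat, value
```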
3.2 Removing edge response points

Because the extreme points of the difference-of-Gaussian function tend to have a relatively large response along image edges, and such points are easily disturbed by noise and therefore unstable, the points kept after step 3.1 still contain a lot of interference and must be filtered further; the Hessian matrix is used to remove the edge points. The principle of using the Hessian matrix to test the extreme points found so far is that although the difference-of-Gaussian response has a large principal curvature across the edge, the principal curvature in the perpendicular direction (along the edge) is small. Because the eigenvalues of the Hessian matrix are proportional to the principal curvatures of the difference-of-Gaussian function, the principal curvatures can be analysed through the Hessian matrix. The Hessian matrix is given in Formula 4.
H(x,y) = \begin{bmatrix} D_{xx}(x,y) & D_{xy}(x,y) \\ D_{xy}(x,y) & D_{yy}(x,y) \end{bmatrix}    Formula 4
In the calculation we do not need to compute the eigenvalues of the Hessian matrix themselves, because we only care about their ratio. For an edge point the principal curvatures in the two directions differ greatly, so the ratio is large. Let α = λmax and β = λmin; then we have the results shown in Formulas 5 and 6.
Tr(H) = D_{xx} + D_{yy} = \alpha + \beta    Formula 5
Det(H) = D_{xx} D_{yy} - D_{xy}^2 = \alpha\beta    Formula 6
Letting γ = α/β, the ratio can be expressed using Formula 7:
\frac{Tr(H)^2}{Det(H)} = \frac{(\alpha + \beta)^2}{\alpha\beta} = \frac{(\gamma + 1)^2}{\gamma} = \gamma + \frac{1}{\gamma} + 2    Formula 7
When the principal curvatures in the two directions are equal, Tr(H)²/Det(H) is at its minimum, and as the difference between the two principal curvatures grows, Tr(H)²/Det(H) grows with it. In Lowe's paper the threshold chosen for γ is 10, so points whose ratio Tr(H)²/Det(H) reaches (γ+1)²/γ are rejected. The points that survive all of these filters are the final detected key points.
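A sketch of the edge test based on Formulas 5-7, assuming the second derivatives Dxx, Dyy and Dxy at the key point have been estimated by finite differences; the default γ = 10 follows the value quoted above.

```python
def passes_edge_test(dxx: float, dyy: float, dxy: float,
                     gamma: float = 10.0) -> bool:
    """Reject edge-like points using Formulas 4-7; gamma is the ratio alpha/beta."""
    tr = dxx + dyy                      # Formula 5
    det = dxx * dyy - dxy * dxy         # Formula 6
    if det <= 0:                        # curvatures of opposite sign: reject
        return False
    return tr * tr / det < (gamma + 1) ** 2 / gamma   # Formula 7 threshold
```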
4 Computing the main direction of each feature point

This step processes each feature point in turn to extract its main direction. The main purpose of extracting a main direction is to achieve rotation invariance. Since the scale of each feature point is known, it is easy to obtain the Gaussian image at that scale; L(x,y) denotes the Gaussian image with scale σ. On this Gaussian image, the gradient orientation and magnitude around the feature point are computed using Formulas 8 and 9:
m(x,y) = \sqrt{\left(L(x+1,y) - L(x-1,y)\right)^2 + \left(L(x,y+1) - L(x,y-1)\right)^2}    Formula 8
\theta(x,y) = \tan^{-1}\left(\frac{L(x,y+1) - L(x,y-1)}{L(x+1,y) - L(x-1,y)}\right)    Formula 9
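A sketch of Formulas 8 and 9 for a single pixel, assuming L is an already-blurred Gaussian image indexed as L[y][x]; atan2 is used here so that the angle covers the full 0 to 360 degree range.

```python
import math

def gradient(L, x: int, y: int):
    """Gradient magnitude and orientation at one pixel of a Gaussian image L."""
    dx = L[y][x + 1] - L[y][x - 1]
    dy = L[y + 1][x] - L[y - 1][x]
    m = math.hypot(dx, dy)                              # Formula 8
    theta = math.degrees(math.atan2(dy, dx)) % 360.0    # Formula 9, mapped to [0, 360)
    return m, theta
```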
In the calculation, the 0° to 360° range of gradient orientations is divided into 36 bins, producing a histogram with 36 columns; Figure 6 shows only 8 columns for ease of display:
Fig. 6 The gradient magnitudes and orientations are computed over a region of radius 3 x 1.5σ on the Gaussian image, and the histogram is built from them
There is not necessarily only one main direction: when selecting main directions, any orientation whose histogram value reaches 80% of the highest peak and is greater than both of its neighbouring bins can be taken as a main direction (see the sketch below). Tips: for larger images you may extract thousands of feature points whose neighbourhoods overlap heavily, so the gradients of all Gaussian images at all scales can be precomputed before calculating the main directions, which greatly reduces the amount of computation; the disadvantage is that it uses more memory.
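A sketch of the 36-bin histogram and the 80% peak rule described above, assuming the gradient magnitudes and angles of the samples around one key point have already been collected; Gaussian weighting of the samples and parabolic peak interpolation are omitted.

```python
import numpy as np

def dominant_orientations(magnitudes, angles,
                          num_bins: int = 36, peak_ratio: float = 0.8):
    """Build the orientation histogram and return every bin that reaches 80%
    of the highest peak and exceeds both neighbouring bins."""
    bin_width = 360.0 / num_bins
    hist = np.zeros(num_bins)
    for m, a in zip(magnitudes, angles):
        hist[int(a / bin_width) % num_bins] += m
    peaks = []
    for i in range(num_bins):
        left, right = hist[(i - 1) % num_bins], hist[(i + 1) % num_bins]
        if hist[i] >= peak_ratio * hist.max() and hist[i] > left and hist[i] > right:
            peaks.append(i * bin_width)   # bin start angle in degrees
    return peaks
```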
5 Turning each feature point into a 128-D vector

This step converts every main direction of every feature point into a 128-dimensional vector. The coordinates of each feature point, its scale space and the σ of the corresponding Gaussian image are all known. A region of Bp x Bp sub-regions around the feature point is selected, each sub-region being mσ pixels on a side, with m = 3 and Bp = 4 (again empirical values). Taking bilinear interpolation and rotation into account, the radius of the region actually sampled is mσ(Bp + 1)·√2/2. Similarly to the main-direction step, the orientation of each sample in every sub-region is computed, but this time the 360° range is divided into 8 bins whose centres are 45° apart. During this computation the coordinates are rotated to the main direction, which strengthens the descriptor's rotation invariance, and a Gaussian weighting is applied so that samples at different positions in the region contribute different weights. The result is a set of vectors like Figure 7 (a code sketch follows the figure), which is exactly a 4 x 4 x 8 = 128-dimensional vector.
Fig. 7 Orientations computed over the (mσBp)² region of the Gaussian image, and the resulting 128-D vector
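A heavily simplified sketch of assembling the 128-dimensional vector, assuming the gradient magnitudes and orientations of the sample region have already been rotated to the main direction and Gaussian-weighted; bilinear/trilinear interpolation between sub-regions and orientation bins is omitted.

```python
import numpy as np

def descriptor_sketch(mags: np.ndarray, angles: np.ndarray,
                      bp: int = 4, bins: int = 8) -> np.ndarray:
    """mags/angles are square arrays covering the bp x bp sub-regions; each
    sub-region is reduced to an 8-bin orientation histogram."""
    side = mags.shape[0] // bp                 # side length of one sub-region in samples
    desc = np.zeros((bp, bp, bins))
    for i in range(mags.shape[0]):
        for j in range(mags.shape[1]):
            b = int(angles[i, j] / (360.0 / bins)) % bins
            desc[i // side, j // side, b] += mags[i, j]
    return desc.reshape(-1)                    # 4 x 4 x 8 = 128 components
```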
After this calculation, the resulting vector is normalized; any component whose value is greater than 0.2 is then truncated (values greater than 0.2 are set to 0.2) in order to suppress the influence of illumination changes. After truncation the vector is normalized again, and finally, to save storage and computation, the floating-point components are converted to 8-bit unsigned integers in the range 0 to 255. We can then obtain the experimental result shown in Figure 8.
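A sketch of this final normalization step; the scale factor used to map the components to the 0-255 range varies between implementations (512 is a common choice, assumed here).

```python
import numpy as np

def finalize_descriptor(vec: np.ndarray, clip: float = 0.2) -> np.ndarray:
    """Normalize, truncate components above 0.2, renormalize, then convert
    to 8-bit unsigned integers."""
    vec = vec / (np.linalg.norm(vec) + 1e-12)      # first normalization
    vec = np.minimum(vec, clip)                    # truncate large components
    vec = vec / (np.linalg.norm(vec) + 1e-12)      # second normalization
    return np.clip(np.round(vec * 512), 0, 255).astype(np.uint8)
```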
Figure 8 Result of running SIFT on the original image (only 50 feature points are displayed)
Image source: Oxford Building 5k Database
References:
[1] David G. Lowe. Object recognition from local scale-invariant features. Proceedings of the International Conference on Computer Vision, September 1999, Vol. 2, pp. 1150-1157.
[2] David G. Lowe. Distinctive Image Features from Scale-Invariant Keypoints. International Journal of Computer Vision, November 2004, Volume 60, Issue 2, pp. 91-110.
From: http://www.duzhongxiang.com/sift_algorithm/