The code has been open-sourced on GitHub at https://github.com/alibaba/simpleimageproject; the SIFT implementation lives in the analyze module.
Original Image:
The main call sequence:

BufferedImage img = ImageIO.read(logoFile);
RenderImage ri = new RenderImage(img);
SIFT sift = new SIFT();
sift.detectFeatures(ri.toPixelFloatArray(null));
List<KDFeaturePoint> al = sift.getGlobalKDFeaturePoints();
You can read a second image the same way to get another List<KDFeaturePoint> al1, and then match the two lists:

List<Match> ms = MatchKeys.findMatchesBBF(al, al1);
ms = MatchKeys.filterMore(ms);
Starting from the call entry above, this article explains in detail how SIFT generates an image's features; matching will be covered another time. The hardest part to understand is the search for extreme points, which is the main focus of this section.
1. Construct a scale space and detect extreme points to obtain scale invariance
ImagePixelArray is an array holding the pixels of the image after grayscale conversion. In RenderImage's toPixelFloatArray method, the default grayscale processing simply replaces each pixel's RGB values with the average of R, G, and B, then normalizes (divides by 255).
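That default conversion can be sketched as a single pure function (a minimal standalone version for illustration; the actual toPixelFloatArray in simpleimage may differ in detail):

```java
public class GrayscaleSketch {
    // Convert one packed ARGB pixel to a normalized gray value in [0, 1]:
    // gray = (R + G + B) / 3, then divided by 255.
    public static float toGray(int argb) {
        int r = (argb >> 16) & 0xFF;
        int g = (argb >> 8) & 0xFF;
        int b = argb & 0xFF;
        return (r + g + b) / 3.0f / 255.0f;
    }

    public static void main(String[] args) {
        System.out.println(toGray(0xFFFFFFFF)); // white -> 1.0
        System.out.println(toGray(0xFF000000)); // black -> 0.0
    }
}
```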
From the call entry we step into detectFeatures, the public interface of the main class SIFT. It actually calls detectFeaturesDownscaled(ImageMap img, int bothDimHi, double startScale). After a few lines of initialization, it uses the preprocSigma (1.5) parameter to apply a Gaussian blur to the image as preprocessing:
if (preprocSigma > 0.0) {
    GaussianArray gaussianPre = new GaussianArray(preprocSigma);
    img = gaussianPre.convolve(img);
}
The purpose of the Gaussian blur is to merge large regions of similar gray values into one piece and make the genuinely prominent points stand out more from their neighbors, much like a woodcut-print filter removes large contiguous areas and leaves only the outlines.
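Gaussian blurring boils down to convolving the image with a normalized Gaussian kernel. A minimal sketch of building the 1-D kernel (the library's GaussianArray presumably does something equivalent; the 3-sigma radius here is a common convention, not necessarily the library's choice):

```java
public class GaussianKernelSketch {
    // Build a normalized 1-D Gaussian kernel for a given sigma.
    // A radius of 3*sigma covers ~99.7% of the distribution's mass.
    public static float[] kernel(double sigma) {
        int radius = (int) Math.ceil(3 * sigma);
        float[] k = new float[2 * radius + 1];
        double sum = 0;
        for (int i = -radius; i <= radius; i++) {
            double v = Math.exp(-(i * i) / (2 * sigma * sigma));
            k[i + radius] = (float) v;
            sum += v;
        }
        for (int i = 0; i < k.length; i++) k[i] /= sum; // weights sum to 1
        return k;
    }

    public static void main(String[] args) {
        float[] k = kernel(1.5); // the preprocSigma used above
        float s = 0;
        for (float v : k) s += v;
        System.out.println(k.length + " taps, sum = " + s);
    }
}
```

A 2-D blur is usually applied as two passes of this 1-D kernel, one horizontal and one vertical, which is equivalent and much cheaper than a full 2-D convolution.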
The Preprocessing result is as follows:
The effect is not obvious yet, but this is only preprocessing. Next, the buildOctaves method of Pyramid constructs the octave pyramid. Stepping into the method:
public int buildOctaves(ImagePixelArray source, float scale, int levelsPerOctave, float octaveSigm, int minSize) {
    this.octaves = new ArrayList<OctaveSpace>();
    OctaveSpace downSpace = null;
    ImagePixelArray prev = source;
    while (prev != null && prev.width >= minSize && prev.height >= minSize) {
        OctaveSpace osp = new OctaveSpace();
        // Create both the Gaussian filtered images and the DoG maps
        osp.makeGaussianImgs(prev, scale, levelsPerOctave, octaveSigm); // Gaussian-blurred images for this octave
        osp.makeGaussianDiffImgs();
        octaves.add(osp);
        prev = osp.getLastGaussianImg().halved(); // source image for the next octave
        if (downSpace != null)
            downSpace.up = osp;
        osp.down = downSpace;
        downSpace = osp;
        scale *= 2.0;
    }
    return octaves.size();
}
Set these two calls aside for now:

osp.makeGaussianImgs(prev, scale, levelsPerOctave, octaveSigm);
osp.makeGaussianDiffImgs();
The method as a whole keeps building layers from the source image, and this is the subtlest part of the Gaussian pyramid. The scale space is not simply a tower of images of different sizes, nor a straight tower of same-sized images processed with different blur factors. "Pyramid" is actually not quite accurate; it is more like a traditional Chinese pagoda.
First, the original image is blurred with several different blur factors (that is, made smoother: neighboring color values are pulled closer together so that the strong contrasts stand out), all at the same size. This set of same-sized images processed with different Gaussian blur factors is called an octave space; it corresponds to one story of the pagoda. Then comes downsampling: one of the blurred images is scaled down by 1/2 and used as the source image of the next octave, where Gaussian blurring is applied again. This continues until the width or height of a layer falls below minSize (32). The collection of all the octave spaces produced by this repeated blurring and downsampling is the Gaussian pyramid. Its purpose is to ensure that a given point of the source image has stable features across different scales.
Now let's look back at the osp.makeGaussianImgs method. For the source image of an octave, it constructs multiple Gaussian-blurred images with different sigma parameters. Because the bottom-level source image is the largest, I picked the smoothedImgs of the octave with scale 2 in the tower for illustration.
We can see the degree of blur increase as the blur factor grows. Each octave in the tower yields one smoothedImgs array; six images are generated here. As the comments in the method explain, we ultimately need extreme points across three layers of differential scale, and every candidate extreme point must be compared in three-dimensional space (this is also the revolutionary breakthrough that most distinguishes SIFT from other features): it is compared against the corresponding points of the scale above and the scale below. Three usable scale layers therefore require at least five differential images, and five differential images require at least six Gaussian-blurred images.
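The counting in the paragraph above can be written down directly (a trivial sketch; the parameter name usableScales is mine, not the library's):

```java
public class OctaveLevelCount {
    // For s scales of usable extrema per octave, SIFT needs s+2 DoG images
    // (each extremum is also compared against the DoG layer above and below it),
    // and s+2 DoG images require s+3 Gaussian-blurred images, since each DoG
    // image is the difference of two adjacent Gaussian images.
    public static int dogsNeeded(int usableScales)      { return usableScales + 2; }
    public static int gaussiansNeeded(int usableScales) { return usableScales + 3; }

    public static void main(String[] args) {
        // The text's case: 3 usable scales -> 5 DoG images -> 6 Gaussian images.
        System.out.println(gaussiansNeeded(3) + " Gaussian, " + dogsNeeded(3) + " DoG");
    }
}
```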
Then osp.makeGaussianDiffImgs() takes the images in smoothedImgs and subtracts adjacent pairs, in order, into the diffImgs array.
For two images blurred with different parameters, the difference over contiguous regions of similar pixels is extremely small; only points with features such as edges and corners differ significantly:
Don't be misled by the large contiguous black areas: the values after differencing are tiny and have been normalized. To render them visibly, I took the absolute value (a difference may be negative), multiplied by 10, and took it modulo 255. You can see that the strongly featured points in these images lie along edges, at corners, and in similar places.
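The difference-of-Gaussians step itself is just a pixel-wise subtraction. A standalone sketch (flat float arrays standing in for the library's image type):

```java
public class DoGSketch {
    // Difference-of-Gaussians: subtract adjacent blurred images pixel by pixel.
    // Both inputs are flat arrays of identical size with values in [0, 1].
    public static float[] diff(float[] moreBlurred, float[] lessBlurred) {
        float[] d = new float[lessBlurred.length];
        for (int i = 0; i < d.length; i++)
            d[i] = moreBlurred[i] - lessBlurred[i];
        return d;
    }

    public static void main(String[] args) {
        // Flat regions cancel out; only where the two blurs disagree
        // (edges, corners) does the difference carry a signal.
        float[] a = {0.5f, 0.5f, 0.9f};
        float[] b = {0.5f, 0.5f, 0.2f};
        for (float v : diff(a, b)) System.out.print(v + " ");
    }
}
```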
2. Filter and precisely locate feature points
Now return to detectFeaturesDownscaled. After the processing above, each octave in the pyramid is an OctaveSpace object holding an array of differential images. The findPeaks call below loops from the second image to the second-to-last one; in the current image, each point is compared with its surrounding points, and if it is the maximum or minimum among them it is taken as an extreme point. (The neighborhood is three-dimensional: the point is compared not only with the 8 points around it in the current image but also with the 9 corresponding points in the previous image and the 9 in the next.)
checkMinMax(current, c, x, y, ref, true);
checkMinMax(below, c, x, y, ref, false);
checkMinMax(above, c, x, y, ref, false);
if (ref.isMin == false && ref.isMax == false)
    continue;
peaks.add(new ScalePoint(x, y, curLev));
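The 26-neighbour test can be sketched as one self-contained function (a hypothetical helper for illustration, not the library's checkMinMax; dog[0..2] stand for the layers below, at, and above the candidate's scale):

```java
public class ExtremumCheckSketch {
    // Return true if dog[1][y][x] is a strict maximum or minimum among its
    // 26 neighbours: 8 in its own DoG layer plus 9 each in the layers
    // below (dog[0]) and above (dog[2]).
    public static boolean isExtremum(float[][][] dog, int x, int y) {
        float c = dog[1][y][x];
        boolean isMax = true, isMin = true;
        for (int s = 0; s <= 2; s++)
            for (int dy = -1; dy <= 1; dy++)
                for (int dx = -1; dx <= 1; dx++) {
                    if (s == 1 && dx == 0 && dy == 0) continue; // skip the point itself
                    float v = dog[s][y + dy][x + dx];
                    if (v >= c) isMax = false;
                    if (v <= c) isMin = false;
                }
        return isMax || isMin;
    }

    public static void main(String[] args) {
        float[][][] dog = new float[3][3][3]; // all zeros
        dog[1][1][1] = 1.0f;                  // lone peak in the middle layer
        System.out.println(isExtremum(dog, 1, 1));
    }
}
```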
Back in detectFeaturesDownscaled, the filterAndLocalizePeaks method below filters the peaks and localizes them precisely, following the method described in the original paper (pages 12-13).
isTooEdgeLike weeds out points that look too much like an edge: points on a continuous line are not good extreme points, and only isolated points such as corners should survive. Why are edge-like points bad? An edge point has a very small gradient along the edge direction (neighboring points on the line barely differ from one another) but a very large gradient in the perpendicular direction. Because that perpendicular gradient is large, edge points easily become extreme points; but because two adjacent points along the line are almost indistinguishable, and an extreme point is supposed to be a distinctive local feature, point 1 and point 2 on the same edge cannot be told apart. These linearly continuous points are therefore removed.
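The standard test for this, from the original paper, uses the 2x2 Hessian of the DoG image: at an edge, one principal curvature is large and the other is tiny, so the ratio trace^2/det blows up. A sketch of that test (this is the paper's formulation; whether isTooEdgeLike is implemented exactly this way is an assumption):

```java
public class EdgeResponseSketch {
    // Lowe's edge test: build the 2x2 Hessian at (x, y) from finite
    // differences of the DoG image d, and reject the point when the
    // principal-curvature ratio exceeds r (the paper suggests r = 10).
    public static boolean isTooEdgeLike(float[][] d, int x, int y, double r) {
        double dxx = d[y][x + 1] + d[y][x - 1] - 2 * d[y][x];
        double dyy = d[y + 1][x] + d[y - 1][x] - 2 * d[y][x];
        double dxy = (d[y + 1][x + 1] - d[y + 1][x - 1]
                    - d[y - 1][x + 1] + d[y - 1][x - 1]) / 4.0;
        double tr = dxx + dyy;
        double det = dxx * dyy - dxy * dxy;
        if (det <= 0) return true; // curvatures of opposite sign: unstable
        return tr * tr / det >= (r + 1) * (r + 1) / r;
    }

    public static void main(String[] args) {
        float[][] corner = { {0, 0, 0}, {0, 1, 0}, {0, 0, 0} }; // isolated blob
        float[][] edge   = { {0, 1, 0}, {0, 1, 0}, {0, 1, 0} }; // vertical line
        System.out.println(isTooEdgeLike(corner, 1, 1, 10)); // kept
        System.out.println(isTooEdgeLike(edge, 1, 1, 10));   // rejected
    }
}
```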
With the "too edge-like" points removed, the localizeIsWeak call below precisely refines the extreme points in each scale space. An extreme point is a very precise coordinate computed in a continuous scale space, e.g. (0.12345678, 0.23456789), while the points of the original image have integer coordinates; the scale-space coordinates are discretized, so mapping an extreme point back to the integer coordinates of the original image necessarily involves an adjustment step.
Based on the coordinates of the extreme point and the sigma parameter, the refinement mainly uses second-order derivatives and sub-pixel computation. After fitting back to the original point, it produces a coordinate in the original image, an adjusted scale, and a refined value local.dValue, which feed the following checks:
if (Math.abs(peak.local.scaleAdjust) > scaleAdjustThresh)
    continue;
if (Math.abs(peak.local.dValue) <= dValueLoThresh)
    continue;
Simply put, a point is discarded when its refined position falls outside a certain range, or when its refined value is too small. The original paper recommends 0.03 for dValueLoThresh; this implementation actually uses 0.008.
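The sub-pixel refinement fits a quadratic through the samples around the discrete extremum and moves the point to the quadratic's vertex. The paper does this in 3-D (x, y, scale); a 1-D sketch shows the idea (the helper names here are mine, not the library's):

```java
public class SubPixelSketch {
    // Fit a parabola through three samples (left, center, right) and return
    // the offset of its vertex from the center sample. For a genuine local
    // extremum the offset lies in (-0.5, 0.5); larger offsets mean the true
    // peak belongs to a neighbouring sample (cf. scaleAdjustThresh).
    public static double offset(double left, double center, double right) {
        double d1 = (right - left) / 2.0;        // first derivative
        double d2 = right + left - 2.0 * center; // second derivative
        return -d1 / d2;
    }

    // Interpolated value at the refined position (the analogue of
    // local.dValue, which feeds the low-contrast test).
    public static double valueAt(double left, double center, double right) {
        double off = offset(left, center, right);
        return center + 0.5 * ((right - left) / 2.0) * off;
    }

    public static void main(String[] args) {
        // Samples of f(t) = -(t - 0.3)^2 at t = -1, 0, 1: vertex at t = 0.3.
        System.out.println(offset(-1.69, -0.09, -0.49));
    }
}
```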
3. Assign a direction parameter to each key point
The search for extreme points is now complete. Next, these points are processed further to generate feature vectors:
Before generating each key point's direction and gradient, a precomputation method calculates the gradient direction and gradient magnitude of every point in the difference images. A feature point's direction is ultimately computed from the gradient directions and magnitudes of the 64 points around it, so one point may serve as a "surrounding point" for several feature points. If each point's gradient were computed only on demand, many points would be computed multiple times, and the total number of computations would exceed the number of points. So every point is computed once up front and stored in an array.
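That precomputation amounts to one pass over the image with central differences. A standalone sketch (flat row-major float array standing in for the library's image type; the method name is mine):

```java
public class GradientSketch {
    // Precompute gradient magnitude and direction for every interior pixel
    // of a grayscale image. Returns { magnitudes, directions }, both flat
    // row-major arrays; computing them once up front avoids recomputing
    // shared neighbours for every feature point.
    public static float[][] magnitudeAndDirection(float[] img, int w, int h) {
        float[] mag = new float[w * h];
        float[] dir = new float[w * h];
        for (int y = 1; y < h - 1; y++)
            for (int x = 1; x < w - 1; x++) {
                float dx = img[y * w + x + 1] - img[y * w + x - 1];
                float dy = img[(y + 1) * w + x] - img[(y - 1) * w + x];
                mag[y * w + x] = (float) Math.sqrt(dx * dx + dy * dy);
                dir[y * w + x] = (float) Math.atan2(dy, dx); // radians in (-pi, pi]
            }
        return new float[][] { mag, dir };
    }

    public static void main(String[] args) {
        // A pure left-to-right ramp: gradient points along +x everywhere.
        float[] img = {0, 0.5f, 1, 0, 0.5f, 1, 0, 0.5f, 1};
        float[][] md = magnitudeAndDirection(img, 3, 3);
        System.out.println("mag=" + md[0][4] + " dir=" + md[1][4]);
    }
}
```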
For more details, see the comments of makeFeaturePoint.
4. Generate the descriptors of the key points
createDescriptors has detailed comments in the source.
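One part of descriptor generation worth calling out is the post-processing the paper prescribes for the final 128-dimensional vector; a sketch of that step (this follows the paper's recipe, and whether createDescriptors implements it exactly this way is an assumption):

```java
public class DescriptorNormSketch {
    // Standard post-processing of a SIFT descriptor from the paper:
    // normalize to unit length (invariance to contrast changes), clamp
    // entries at 0.2 to limit the influence of single large gradient
    // magnitudes (non-linear illumination), then renormalize.
    public static double[] normalize(double[] desc) {
        double[] d = desc.clone();
        clampAndScale(d, Double.MAX_VALUE); // first pass: plain normalization
        clampAndScale(d, 0.2);              // second pass: clamp, renormalize
        return d;
    }

    private static void clampAndScale(double[] d, double cap) {
        double norm = 0;
        for (int i = 0; i < d.length; i++) d[i] = Math.min(d[i], cap);
        for (double v : d) norm += v * v;
        norm = Math.sqrt(norm);
        for (int i = 0; i < d.length; i++) d[i] /= norm;
    }

    public static void main(String[] args) {
        // Two dominant bins get capped and end up equal after renormalizing.
        double[] out = normalize(new double[]{3, 4});
        System.out.println(out[0] + " " + out[1]);
    }
}
```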