A general understanding of the SIFT/SURF algorithms


The SURF algorithm is an accelerated version of the SIFT algorithm. The SURF implementation in OpenCV can match objects across two images at close to real-time speed under moderate conditions, and the foundation of that speed is really just one thing: Haar-wavelet responses computed on an integral image. For other differences, refer to another article on SIFT in this blog.
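To make that speed trick concrete, here is a minimal NumPy sketch of the integral image (my own illustration, not the OpenCV implementation; all names in it are invented for the example). Once the cumulative sums are built, the sum over any axis-aligned rectangle, and therefore any box or Haar-like filter response, costs four lookups regardless of the filter's size.

```python
import numpy as np

def integral_image(img):
    """Cumulative sums over rows and columns: ii[y, x] = sum of img[:y+1, :x+1]."""
    return img.cumsum(axis=0).cumsum(axis=1)

def box_sum(ii, y0, x0, y1, x1):
    """Sum of img[y0:y1, x0:x1] in O(1) via inclusion-exclusion on four corners."""
    total = ii[y1 - 1, x1 - 1]
    if y0 > 0:
        total -= ii[y0 - 1, x1 - 1]
    if x0 > 0:
        total -= ii[y1 - 1, x0 - 1]
    if y0 > 0 and x0 > 0:
        total += ii[y0 - 1, x0 - 1]
    return total

img = np.random.rand(480, 640)
ii = integral_image(img)
# A horizontal Haar-like response: right half of a box minus its left half.
haar_x = box_sum(ii, 100, 120, 140, 160) - box_sum(ii, 100, 80, 140, 120)
```

Note that the cost of `box_sum` does not grow with the rectangle, which is why box-filter approximations on integral images let SURF evaluate large filters as cheaply as small ones.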

Both research and everyday applications want a program to do what human vision does: recognize the same scene appearing in two images and establish the correspondence between them. The SIFT (Scale-Invariant Feature Transform) algorithm, proposed just a few years ago, provides a solution: under certain conditions it can match particular points (the key points discussed below) of the same scene across two images. Why not match every point? That question is taken up in the discussion below.

Object recognition with the SIFT algorithm has three main steps: 1. extract key points; 2. attach detailed local information to each key point, known as a descriptor; 3. compare the feature points of the two images (key points together with their attached feature vectors) to find a number of matching pairs and establish the correspondence between the scenes.
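These three steps map directly onto OpenCV's feature API. Below is a hedged sketch: `cv2.SIFT_create` exists in OpenCV 4.4+, while older builds expose it as `cv2.xfeatures2d.SIFT_create` in the contrib module (SURF likewise lives in contrib and may be disabled in some builds for patent reasons). The image file names are placeholders.

```python
import cv2

# Placeholders: substitute your own reference and scene images.
img1 = cv2.imread("reference.png", cv2.IMREAD_GRAYSCALE)  # object to find
img2 = cv2.imread("scene.png", cv2.IMREAD_GRAYSCALE)      # image to search in

sift = cv2.SIFT_create()  # cv2.xfeatures2d.SIFT_create() on older builds

# Steps 1 and 2: extract key points and attach 128-dimensional descriptors.
kp1, des1 = sift.detectAndCompute(img1, None)
kp2, des2 = sift.detectAndCompute(img2, None)

# Step 3: match descriptors, keeping pairs that pass Lowe's ratio test.
matcher = cv2.BFMatcher(cv2.NORM_L2)
good = [m for m, n in matcher.knnMatch(des1, des2, k=2)
        if m.distance < 0.75 * n.distance]
print(len(good), "matches survived the ratio test")
```

The ratio test is one concrete way of doing the suppression mentioned later: a match is kept only when its best descriptor distance is clearly better than the second-best.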

In daily applications, a reference image containing an object is generally provided, and the object must then be found in another image. The most common situation is that the object in the two images is related by rotation and scaling, and that the brightness and contrast of the images differ; matching has to work under these conditions. The pioneers of the SIFT algorithm reasoned that once more than three pairs of matching points between the objects are found, their one-to-one correspondence can be established through projective geometry.

First, in terms of shape the object is both rotated and rescaled, so how can such corresponding points be found at all? Their idea was to first locate some "stable points" in the image: highly salient points that do not disappear as the imaging conditions change, such as corner points, edge points, bright spots in dark regions, and dark spots in bright regions. Since the two images contain the same scene, extracting each image's stable points by some method should yield matching pairs among them. On this reasonable assumption, stable points are the foundation of the SIFT algorithm.

SIFT locates the most stable points in the grayscale image. Because a digital image is discrete, operations such as differentiation and extremum-finding are carried out with filters, and a filter has a fixed size. Using a filter of one fixed size to find local extrema in two images containing the same object at different sizes may produce an extremum on one side but not on the other; yet it is easy to see that if the objects were the same size, their local extrema would agree. The beauty of SIFT lies in using an image pyramid to solve this problem. Think of each image as continuous and use it as the base of a pyramid; every cross-section then resembles the original image, so the two pyramids must contain cross-sections showing the object at the same size. In practice everything is discrete, so only a finite number of layers can be built. More layers are of course better, but processing time grows accordingly, and with too few layers the sub-sampled cross-sections may never show the two objects at the same size.

With the image pyramid we can find local extrema at every layer, but the number of such points is very large, so some method is needed to suppress and remove the unstable ones while keeping the stable points at the appropriate scale. Given a stable point, how can the program recognize that two such points lie at the same position on an object? The investigators' idea was to carve out a small region centered on the point and extract features from that region, attaching them to the stable point. This is another subtle trait of SIFT: a stable point carries a feature vector and, like a tree with well-developed roots firmly gripping its soil, becomes an even more robust feature point. But then another problem arises: what happens under rotation? The inventor's solution was to find a "main direction" for each point; comparing the main directions then reveals the rotation angle between the two objects. The next section describes the defects of the SIFT algorithm.
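Before turning to those defects, the pyramid idea can be sketched in a few lines. The toy example below only approximates what SIFT actually does (real SIFT works in octaves with sub-pixel refinement plus contrast and edge-response rejection to suppress unstable points): it blurs the image with increasing sigma, subtracts adjacent blurred copies (a difference of Gaussians), and declares a pixel stable when it is an extremum against its 26 neighbors across space and scale. The function names and the sigma values are invented for the illustration.

```python
import cv2
import numpy as np

def dog_stack(gray, sigmas=(1.0, 1.6, 2.6, 4.2, 6.8)):
    """Blur with increasing sigma and subtract neighbours: a difference-of-
    Gaussians stack that approximates filtering the image at several scales."""
    blurred = [cv2.GaussianBlur(gray, (0, 0), s) for s in sigmas]
    return np.stack([b2 - b1 for b1, b2 in zip(blurred, blurred[1:])])

def is_extremum(dog, s, y, x):
    """True when dog[s, y, x] is the max or min of its 3x3x3 neighbourhood,
    i.e. an extremum across both space and scale (valid for interior s, y, x)."""
    patch = dog[s - 1:s + 2, y - 1:y + 2, x - 1:x + 2]
    v = dog[s, y, x]
    return v == patch.max() or v == patch.min()

gray = cv2.imread("reference.png", cv2.IMREAD_GRAYSCALE).astype(np.float32)
dog = dog_stack(gray)
```

Scanning `is_extremum` over the interior of the stack yields far more candidates than are usable, which is exactly why the suppression step described above is needed.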

SIFT/SURF uses the Hessian matrix to locate the image's local extrema, and this step is quite stable. In the main-direction phase, however, the method depends too heavily on the gradient directions of the pixels in the local region, which can make the estimated main direction inaccurate; the feature-vector extraction and matching that follow depend strongly on the main direction, so even a small angular deviation is amplified into errors in the subsequent feature matching, and the match fails. In addition, the layers of the image pyramid are not spaced closely enough, which introduces scale errors, and descriptor extraction likewise depends on the corresponding scale; the inventor's compromise for this problem was to use a moderate number of layers and then interpolate between them. SIFT is also an algorithm that uses only grayscale information and ignores color. Several descriptors are said to be more stable than SIFT, some of which do use color information; let's wait and see.
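To see why the main direction is fragile, here is a rough sketch of dominant-orientation assignment (my own simplification, not SIFT's exact procedure, which also applies Gaussian weighting to the patch and parabolic interpolation of the peak): histogram the gradient directions around the keypoint, weighted by gradient magnitude, and take the peak bin. All names are hypothetical, and the keypoint is assumed to lie far enough from the border for the patch to fit.

```python
import cv2
import numpy as np

def dominant_orientation(gray, x, y, radius=8, bins=36):
    """Peak of the magnitude-weighted gradient-direction histogram
    in a square patch centred on (x, y); angle returned in degrees."""
    patch = gray[y - radius:y + radius, x - radius:x + radius].astype(np.float32)
    gx = cv2.Sobel(patch, cv2.CV_32F, 1, 0)
    gy = cv2.Sobel(patch, cv2.CV_32F, 0, 1)
    mag = np.hypot(gx, gy)
    ang = np.degrees(np.arctan2(gy, gx)) % 360.0
    hist, edges = np.histogram(ang, bins=bins, range=(0.0, 360.0), weights=mag)
    peak = hist.argmax()
    return 0.5 * (edges[peak] + edges[peak + 1])  # centre of the winning bin
```

When two bins have nearly equal weight, a slight change in the local pixels can flip which bin wins, rotating the whole descriptor; that is the amplification of error described in the paragraph above.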

Finally, we know that the same scene may appear with different shapes, sizes, angles, brightness, and even distortion in different photos. Computer vision tells us that images of a planar object taken through an optical lens are related by a projective correspondence, but no such linear correspondence exists between two views of a curved object, such as a face, taken from different angles with different camera parameters. That is to say, even if we obtain some matching points on the faces in the two images, we still cannot derive the remaining corresponding points from them.
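To make the planar case concrete, here is a hedged continuation of the matching sketch shown earlier (it reuses the hypothetical `kp1`, `kp2`, and `good` names from that example). With at least four good pairs, `cv2.findHomography` can estimate the projective mapping between the two views using RANSAC; the estimate is only meaningful for a planar scene, which is precisely the limitation just described.

```python
import cv2
import numpy as np

# Reuses kp1, kp2, and good from the earlier matching sketch (hypothetical names).
# findHomography needs at least 4 point pairs, consistent with the
# "more than three pairs" mentioned earlier in the article.
if len(good) >= 4:
    src = np.float32([kp1[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst = np.float32([kp2[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
    # RANSAC rejects mismatched pairs that survived the ratio test.
    H, inlier_mask = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    print("estimated planar homography:\n", H)
```

For a curved surface such as a face, `H` would fit at best a local patch of the matches, not the whole object, which is the failure mode this paragraph warns about.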
