Characteristics of Three Object Recognition Algorithms: SIFT/SURF, Haar Features, and the Generalized Hough Transform


(Based on the analysis by CSDN blogger cy513.)


First, consider how humans recognize objects:


How does a human recognize an object? Of course one must already have a concept of the object in front of one. From the beginning of life, humans acquire information about everything in the world through vision, including an object's shape, color, and composition, and through learning they come to know other information about the object, such as its physical and chemical properties, which cannot be obtained by observation alone. Once a person has become certain of a new thing, the object's various features form data stored in the brain, and when a similar object is encountered later, it is identified by grasping those features. Feature-based recognition algorithms in image science likewise work entirely by computing and comparing features.

Still, of the mechanisms of human intelligence, humans have so far understood only the surface. The way people grasp and compare features is nothing like an image algorithm, which must work through complex procedures and precise calculations. Why can we recognize a thing at a glance, no matter what shape it is twisted into, what color it turns, or from what viewpoint we see it? Humans can always quickly recognize the features of objects mainly because people possess a powerful knowledge base, an inference system, and as-yet-unknown brain mechanisms of storage and search. Human beings acquire input at every moment and add it to the knowledge base, and more often expand it further through reasoning, induction, abstraction, and other methods of thought, building indexes through complex links for later search. Nor does a person rely on a single method when recognizing: computation, search, reasoning, simulation, and other methods may be combined, one or several at a time.
Image recognition algorithms use the feature-based method but differ essentially from human recognition. Even so, although it is only such a small step, it is enough to produce enormous applications, and it continues to improve.


SIFT/SURF are based on grayscale images.

First, build an image pyramid to form a three-dimensional (x, y, scale) image space, and obtain candidate responses in each layer (SIFT via difference-of-Gaussians, SURF via the determinant of an approximate Hessian matrix). Then perform non-maximum suppression against the 26 neighbors around each candidate to obtain rough feature points, and finally refine each point's position and scale (layer) by quadratic interpolation. This achieves scale invariance.
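The 26-neighbor non-maximum suppression step can be sketched as follows, assuming the per-layer responses have already been stacked into a `(scales, height, width)` array; the data here is synthetic noise with one planted peak, not a real difference-of-Gaussian stack:

```python
import numpy as np

def scale_space_extrema(stack):
    """Find voxels in a (scales, h, w) response stack that are strictly
    greater than all 26 neighbours (8 in-layer, 9 above, 9 below)."""
    s, h, w = stack.shape
    points = []
    for k in range(1, s - 1):
        for y in range(1, h - 1):
            for x in range(1, w - 1):
                nbhd = stack[k - 1:k + 2, y - 1:y + 2, x - 1:x + 2]
                # strict maximum: the centre is the unique largest voxel
                if stack[k, y, x] == nbhd.max() and \
                        np.count_nonzero(nbhd == nbhd.max()) == 1:
                    points.append((k, y, x))
    return points

# Synthetic response stack with one planted peak at scale 2, (10, 12);
# the low-amplitude noise also produces some spurious local maxima,
# which is exactly why SIFT/SURF filter the rough points afterwards.
rng = np.random.default_rng(2)
stack = rng.random((4, 20, 20)) * 0.1
stack[2, 10, 12] = 1.0
points = scale_space_extrema(stack)
print((2, 10, 12) in points)
```

In a real detector the surviving extrema would next be refined to sub-voxel position and scale by fitting a quadratic to the neighborhood.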

Second, select a neighborhood around each feature point and find its main direction. SIFT builds a histogram of the gradient directions of all points in a square neighborhood and keeps every direction whose histogram value reaches at least 80% of the peak; SURF chooses a circular neighborhood and finds the main direction by sweeping a sliding sector and summing the Haar-wavelet responses inside it. Aligning to the main direction achieves rotation invariance.
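The SIFT-style orientation assignment can be sketched roughly as below; this toy version omits the Gaussian weighting and peak interpolation of the real algorithm, and the 36-bin histogram size follows the common convention:

```python
import numpy as np

def dominant_orientations(patch, num_bins=36, peak_ratio=0.8):
    """Return the peak gradient orientations (degrees) of a patch,
    keeping every histogram bin within peak_ratio of the highest."""
    dy, dx = np.gradient(patch.astype(float))
    mag = np.hypot(dx, dy)                          # gradient magnitude
    ang = np.degrees(np.arctan2(dy, dx)) % 360.0    # direction in [0, 360)
    hist, edges = np.histogram(ang, bins=num_bins, range=(0, 360),
                               weights=mag)
    centers = (edges[:-1] + edges[1:]) / 2
    return centers[hist >= peak_ratio * hist.max()]

# A horizontal intensity ramp: every gradient points along +x,
# so the single dominant orientation falls in the first 10-degree bin.
patch = np.tile(np.arange(16.0), (16, 1))
peaks = dominant_orientations(patch)
print(peaks)   # [5.]
```

Keeping every bin above 80% of the peak (rather than only the maximum) is what lets one keypoint spawn several oriented copies in ambiguous neighborhoods.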

Third, with the main direction as the axis, a local coordinate frame can be set up at each feature point. SIFT selects a square region around the point whose size corresponds to its scale, divides it into 4×4 = 16 blocks, and computes for each block the proportion of gradients along eight directions, so the feature point forms a 128-dimensional feature vector; normalizing this vector achieves illumination invariance. SURF divides its region into 4×4 = 16 subregions and accumulates dx, dy, |dx|, |dy| in each, forming a 64-dimensional vector in the standard variant, which is then likewise normalized to achieve contrast and illumination invariance.
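A toy illustration of the 4×4×8 = 128-dimensional descriptor layout, assuming a 16×16 patch already rotated to its main direction; real SIFT additionally applies Gaussian weighting, trilinear interpolation, and clipping, all omitted here:

```python
import numpy as np

def sift_like_descriptor(patch):
    """Toy 128-dim descriptor from a 16x16 patch: an 8-bin gradient
    orientation histogram per 4x4 cell, concatenated and L2-normalised."""
    assert patch.shape == (16, 16)
    dy, dx = np.gradient(patch.astype(float))
    mag = np.hypot(dx, dy)
    ang = np.degrees(np.arctan2(dy, dx)) % 360.0
    desc = []
    for cy in range(4):
        for cx in range(4):
            sl = (slice(4 * cy, 4 * cy + 4), slice(4 * cx, 4 * cx + 4))
            hist, _ = np.histogram(ang[sl], bins=8, range=(0, 360),
                                   weights=mag[sl])
            desc.extend(hist)
    desc = np.asarray(desc, dtype=float)
    n = np.linalg.norm(desc)
    return desc / n if n > 0 else desc    # normalisation: illumination invariance

desc = sift_like_descriptor(np.random.default_rng(0).random((16, 16)))
print(desc.shape)   # (128,)
```

The final normalization is the step that makes the vector insensitive to uniform changes in image intensity.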


The SURF algorithm is an accelerated version of the SIFT algorithm, and the foundation of its speed is essentially one thing: approximating Haar-wavelet (box-filter) derivatives with the integral image, so that each filter response costs a constant number of lookups regardless of filter size.
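The integral-image trick can be sketched as follows: after one pass of precomputation, any rectangular box sum, which is the building block of Haar-like derivative approximations, costs exactly four lookups:

```python
import numpy as np

def integral_image(img):
    """Summed-area table with a zero border: ii[y, x] = img[:y, :x].sum()."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1))
    ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    return ii

def box_sum(ii, y0, x0, y1, x1):
    """Sum of img[y0:y1, x0:x1] in four lookups, independent of box size."""
    return ii[y1, x1] - ii[y0, x1] - ii[y1, x0] + ii[y0, x0]

img = np.arange(36.0).reshape(6, 6)
ii = integral_image(img)
s = box_sum(ii, 1, 2, 4, 5)
print(s)   # equals img[1:4, 2:5].sum() = 135.0
```

A Haar-like derivative is then just the difference of two (or three) such box sums, which is why enlarging the filter costs nothing extra.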


In both scientific research and applications, we would like a program, like human vision, to find the same scene in two images automatically and establish the correspondence between them. The SIFT (scale-invariant feature transform) algorithm, proposed some years ago, provides a solution: under certain conditions it can match particular points of the same scene across two images (the keypoints mentioned above). Why not every point can be matched is discussed below.


The SIFT algorithm realizes object recognition in three major stages: 1) extract keypoints; 2) attach detailed local information to each keypoint, the so-called descriptor; 3) match the feature points (keypoints with their feature vectors) of the two images to find several corresponding pairs, thereby establishing the correspondence between the scenes.


In everyday applications, a reference image containing an object is given, and it is then matched in another image containing the same object. The objects in the two images generally differ only by rotation and scaling, plus changes in the images' brightness and contrast; these are the most common situations. To achieve matching between objects under these conditions, the pioneers of the SIFT algorithm reasoned that once more than three pairs of matching points are found, the full point-to-point correspondence can be established by the theory of projective geometry. But when the object's shape has undergone both rotation and scaling, how can such corresponding points be found? Their idea was first to find certain "stable points" in each image: very salient points that will not be lost under changes in lighting conditions, such as corners, edge points, dark spots in bright regions and bright spots in dark regions. Since the two images contain the same scene, extracting the stable points of each by some method will yield matching points among them. This reasonable assumption of stable points is what the SIFT algorithm is built on.
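The matching stage is commonly done with a nearest-neighbour search plus Lowe's ratio test, which accepts a match only when the best candidate is clearly closer than the runner-up. A sketch over synthetic descriptors (the data below is made up for illustration, not from a real image):

```python
import numpy as np

def ratio_test_matches(desc_a, desc_b, ratio=0.8):
    """For each row of desc_a, accept its nearest neighbour in desc_b
    only if it is clearly closer than the second-nearest neighbour."""
    matches = []
    for i, d in enumerate(desc_a):
        dists = np.linalg.norm(desc_b - d, axis=1)
        j1, j2 = np.argsort(dists)[:2]          # two closest candidates
        if dists[j1] < ratio * dists[j2]:
            matches.append((i, j1))
    return matches

rng = np.random.default_rng(1)
desc_b = rng.random((50, 128))
desc_a = rng.random((5, 128))
# desc_a[0] is a near-copy of desc_b[7]; the rest are unrelated noise
# and should be rejected because their two nearest distances are similar.
desc_a[0] = desc_b[7] + 0.01 * rng.random(128)
matches = ratio_test_matches(desc_a, desc_b)
print(matches)
```

The ratio test is what suppresses ambiguous matches: in high-dimensional descriptor space, a wrong match's nearest and second-nearest distances are almost equal, so it fails the test.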


The method SIFT uses to find stable points is to locate local extrema of the gray values. Because digital images are discrete, operations such as differentiation and extremum-finding are carried out with filters, and a filter has a size. Using a filter of one fixed size to find local extrema in two images containing the same object at different sizes risks finding an extremum in one image but not in the other; yet it is easy to see that if the scales of the objects were consistent, their local extrema would agree. The subtlety of SIFT is to use the image pyramid method to solve this problem. We can think of each image as continuous and place it at the base of a pyramid of progressively smaller cross-sections; the two pyramids must then contain cross-sections in which the object appears at a consistent size. In practice only a finite number of layers can be constructed: more layers are better, but processing time increases, while with too few layers the downsampled cross-sections may never show the two objects at a consistent size. With the image pyramid, local extrema can be found in every layer, but the number of such stable points is considerable, so some method is needed to suppress and remove points while preserving the stable points at each scale. Given the stable points, how does the program decide which of them are the same position on the object?
The researchers take each point as the center of a small region and extract some features of that region, attaching those features to the stable point. This is another subtlety of SIFT: the feature vector attached to the stable point is like a well-developed root system firmly gripping its soil, making the point an even more robust feature point. But then another problem arises: what about rotation? The inventors' solution is to find a "main direction" for each point; aligning to it reveals the angle of rotation between the two objects. The defects of the SIFT algorithm are discussed below.
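The pyramid idea above can be sketched with a simple downsampling pyramid; real SIFT/SURF pyramids use Gaussian blurring and finer scale steps, so the 2×2 mean pooling here is only a stand-in for blur plus subsample:

```python
import numpy as np

def build_pyramid(img, levels):
    """Downsampling pyramid: each level halves the previous one by
    2x2 mean pooling (a crude stand-in for blur + subsample)."""
    pyramid = [img.astype(float)]
    for _ in range(levels - 1):
        prev = pyramid[-1]
        h, w = prev.shape[0] // 2 * 2, prev.shape[1] // 2 * 2  # even crop
        pooled = prev[:h, :w].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
        pyramid.append(pooled)
    return pyramid

pyr = build_pyramid(np.arange(64.0).reshape(8, 8), levels=3)
shapes = [p.shape for p in pyr]
print(shapes)   # [(8, 8), (4, 4), (2, 2)]
```

Searching for extrema in every level of such a stack is what gives the detector a chance to see the object at a consistent size in both images.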


SIFT/SURF obtain the image's local extrema through difference-of-Gaussian or Hessian-matrix responses, which is quite stable, but the main-direction phase depends too heavily on the gradient directions of the pixels in a local region, so the estimated main direction may be inaccurate. The subsequent feature-vector extraction and matching depend heavily on the main direction, and even a small angular deviation causes an amplified error in the feature matching that follows, so that matching fails. Likewise, if the layers of the image pyramid are not close enough, scale errors arise, and descriptor extraction also relies on the corresponding scale; the inventors' compromise on this issue is to take an appropriate number of layers and then interpolate. SIFT is an algorithm that uses only the grayscale properties of the image, ignoring color information. Several descriptors said to be more stable than SIFT have since appeared, some of which use color information; let us wait and see.


Finally, we know that the same scene may appear in different photographs with different shapes, sizes, angles, brightness, and even distortion. Computer vision tells us that two images of a planar object obtained through an optical lens can be related by a projective mapping, but for images of a curved object such as a human face, taken from different angles with different camera parameters, no such linear correspondence exists between the two images. That is, even if we obtain some matching pairs on the faces in the two images, we cannot deduce the correspondence of the remaining points.


Haar features are also based on grayscale images.

First, a cascade classifier is trained from a large number of object images with salient Haar (rectangle) features; at each level, the cascade passes candidate windows with roughly the same detection rate on to the next level. Each level's sub-classifier is composed of many Haar features (computed from the integral image, with their positions saved); the features may be horizontal, vertical, or tilted, each carries a threshold and two branch values, and each sub-classifier has an overall threshold. At recognition time, an integral image is again computed to prepare for fast Haar-feature evaluation. A window of the same size as the training window is slid across the whole image; the window is then enlarged and the search repeated. Whenever the window reaches a position, the Haar features inside it are computed: each feature is compared with its threshold to select the left or right branch value, the branch values accumulated within a level are compared with that level's threshold, and only windows exceeding the threshold enter the next round of filtering. A window that passes the entire cascade is, with high probability, the object.
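The cascade logic described above can be sketched as follows; the stage structure and numbers here are hypothetical, not taken from a trained Viola-Jones classifier:

```python
# A toy attentional cascade: every window must clear each stage's
# threshold in turn, so most windows are rejected early and cheaply.
def run_cascade(window_features, stages):
    """stages: list of (weak_classifiers, stage_threshold); each weak
    classifier is (feature_index, threshold, left_value, right_value)."""
    for weak_classifiers, stage_threshold in stages:
        total = 0.0
        for idx, thr, left, right in weak_classifiers:
            # compare the Haar feature with its threshold, pick a branch
            total += left if window_features[idx] < thr else right
        if total < stage_threshold:
            return False          # rejected at this stage
    return True                   # survived every stage: likely the object

# Hypothetical 2-stage cascade over 3 precomputed Haar feature values.
stages = [
    ([(0, 0.5, 0.0, 1.0), (1, 0.3, 0.0, 1.0)], 1.5),
    ([(2, 0.7, 0.0, 1.0)], 0.5),
]
print(run_cascade([0.9, 0.8, 0.9], stages))  # True
print(run_cascade([0.9, 0.1, 0.9], stages))  # False: fails stage 1
```

The early-rejection structure is the source of the cascade's speed: the vast majority of windows contain no object and are discarded after only a handful of feature evaluations.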

The generalized Hough transform is also based on grayscale images.

It uses the contour as the feature, fuses in gradient information, and identifies the object by the voting method.
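A simplified sketch of generalized Hough voting, reduced to translation only; a full implementation would index the R-table by gradient orientation, which is omitted here for brevity:

```python
import numpy as np

def build_r_table(template_points, reference):
    """Offsets from each template contour point to the reference point."""
    return [(reference[0] - y, reference[1] - x) for y, x in template_points]

def ght_vote(image_points, r_table, shape):
    """Each image contour point votes for every reference position the
    R-table allows; the accumulator peak is the detected location."""
    acc = np.zeros(shape, dtype=int)
    for y, x in image_points:
        for dy, dx in r_table:
            ry, rx = y + dy, x + dx
            if 0 <= ry < shape[0] and 0 <= rx < shape[1]:
                acc[ry, rx] += 1
    peak = np.unravel_index(np.argmax(acc), acc.shape)
    return peak, acc

# A small L-shaped contour, then the same shape translated by (6, 9):
# every true contour point votes for the same reference cell.
template = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2)]
r_table = build_r_table(template, reference=(1, 1))
scene = [(y + 6, x + 9) for y, x in template]
peak, acc = ght_vote(scene, r_table, shape=(20, 20))
print(peak)   # (7, 10): the reference point translated by (6, 9)
```

Because every contour point contributes one vote toward the true reference position, the peak height also measures how much of the contour was actually found in the scene.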

Characteristics, similarities and differences, and typical applications:


All three algorithms are based on intensity (grayscale) information, and all are feature methods, but the SIFT/SURF feature is one with strong direction and illumination structure, which makes it suitable for rigid deformation and slight perspective deformation. The Haar-feature recognition method carries a touch of artificial intelligence. For objects such as human faces, which have an obvious, stable structure, Haar features are the best fit: as long as the structure is relatively fixed, the object remains recognizable even under distortion or nonlinear deformation. The generalized Hough transform, by contrast, is exact matching, and it can recover parameter information such as the object's position and orientation. The first two methods basically obtain local features and then match them one by one, but their ways of computing local features differ: SIFT/SURF is more complex and comparatively stable, while the Haar method is relatively simple, leaning toward a statistical approach to form its features, which also gives it a certain fuzzy tolerance. The generalized Hough transform uses a global feature, the contour with its gradients, though it can equally be seen as treating the position and gradient of every contour point as features; every point contributes to recognition through what amounts to a vote, and the vote count decides whether the object is identified.
