Distinctive image features from scale-invariant keypoints (Personal translation + notes)-Introduction

Source: Internet
Author: User

Distinctive Image Features from Scale-Invariant Keypoints is the classic paper on the SIFT algorithm in the image recognition field, and it was the first paper my instructor assigned. Although many Chinese translations can already be found online, I took the time to translate it and add notes here, which may also benefit later readers.

--------------------------------------------------------------------------------------------------------

Distinctive image features from scale-invariant keypoints

Distinctive image features from scale-invariant keypoints

Abstract
Summary

This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene. The features are invariant to image scale and rotation, and are shown to provide robust matching across a substantial range of affine distortion, change in 3D viewpoint, addition of noise, and change in illumination. The features are highly distinctive, in the sense that a single feature can be correctly matched with high probability against a large database of features from many images. This paper also describes an approach to using these features for object recognition. The recognition proceeds by matching individual features to a database of features from known objects using a fast nearest-neighbor algorithm, followed by a Hough transform to identify clusters belonging to a single object, and finally performing verification through least-squares solution for consistent pose parameters. This approach to recognition can robustly identify objects among clutter and occlusion while achieving near real-time performance.
This paper presents a method for extracting distinctive features from an image that can be used to match an object or scene between different viewpoints. The features are invariant to image scaling and rotation, and give robust matching under substantial affine distortion, 3D viewpoint change, added noise, and illumination change. The features are highly distinctive: a single feature can be correctly matched with high probability against a large database of features from many images. The paper also gives an approach for using these features in object recognition: individual features are matched against a database of features from known objects with a fast nearest-neighbor algorithm, clusters belonging to a single object are then identified with a Hough transform, and finally consistent pose parameters are verified through a least-squares solution. The approach can robustly recognize objects among clutter and occlusion while running in near real time.

Note: The SIFT method can reliably match images taken under different viewpoints, illumination, and noise, and the matching is between one image and a collection of images. The paper matches features with a fast nearest-neighbor search, groups the candidate matches with the Hough transform, and then verifies the pose with least squares.


1. Introduction

Image matching is a fundamental aspect of many problems in computer vision, including object or scene recognition, solving for 3D structure from multiple images, stereo correspondence, and motion tracking. This paper describes image features that have properties that make them suitable for matching differing images of an object or scene. The features are invariant to image scaling and rotation, and partially invariant to change in illumination and 3D camera viewpoint. They are well localized in both the spatial and frequency domains, which reduces the probability of disruption by occlusion, clutter, or noise. Large numbers of features can be extracted from typical images with efficient algorithms. In addition, the features are highly distinctive, which allows a single feature to be correctly matched with high probability against a large database of features, providing a basis for object and scene recognition.
Image matching is a fundamental problem in computer vision, underlying object recognition, scene recognition, recovering 3D structure from multiple images, stereo correspondence, and motion tracking. This paper describes image features whose properties make them well suited to matching different images of an object or scene. The features are invariant to image scaling and rotation, and partially invariant to changes in illumination and 3D camera viewpoint. They are well localized in both the spatial and frequency domains, which reduces the probability of disruption by occlusion, clutter, or noise. Large numbers of features can be extracted from typical images with efficient algorithms. In addition, the features are highly distinctive, allowing a single feature to be matched with high probability against a large database of features, which provides a basis for object and scene recognition.


The cost of extracting these features is minimized by taking a cascade filtering approach, in which the more expensive operations are applied only at locations that pass an initial test. Following are the major stages of computation used to generate the set of image features:
A cascade filtering approach minimizes the overhead of feature extraction: the more expensive operations are applied only at locations that pass an initial test. The major stages used to generate the set of image features are as follows:


1. Scale-space extrema detection: The first stage of computation searches over all scales and image locations. It is implemented efficiently by using a difference-of-Gaussian function to identify potential interest points that are invariant to scale and orientation.
1. Scale-space extrema detection: the first stage searches over all scales and image locations. It is implemented efficiently with a difference-of-Gaussian function that identifies potential interest points invariant to scale and orientation.

 

2. Keypoint localization: At each candidate location, a detailed model is fit to determine location and scale. Keypoints are selected based on measures of their stability.
2. Keypoint localization: at each candidate location, a detailed model is fit to determine location and scale, and keypoints are selected based on measures of their stability.

 

3. Orientation assignment: One or more orientations are assigned to each keypoint location based on local image gradient directions. All future operations are performed on image data that has been transformed relative to the assigned orientation, scale, and location for each feature, thereby providing invariance to these transformations.
3. Orientation assignment: one or more orientations are assigned to each keypoint based on the local image gradient directions. All subsequent operations are performed on image data transformed relative to the assigned orientation, scale, and location of each feature, which provides invariance to these transformations.


4. Keypoint descriptor: The local image gradients are measured at the selected scale in the region around each keypoint. These are transformed into a representation that allows for significant levels of local shape distortion and change in illumination.
4. Keypoint descriptor: the local image gradients are measured at the selected scale in the region around each keypoint, then transformed into a representation that tolerates significant local shape distortion and illumination change.
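As a concrete illustration of stage 1, the sketch below builds one octave of difference-of-Gaussian images and scans for extrema among each pixel's 26 scale-space neighbors. This is a minimal NumPy sketch, not the paper's full pipeline: the values sigma = 1.6 and k = sqrt(2) are common SIFT choices assumed here, and octave downsampling, contrast thresholding, and subpixel refinement are omitted.

```python
import numpy as np


def gaussian_blur(image, sigma):
    """Separable Gaussian blur using plain NumPy convolution."""
    radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-(x ** 2) / (2 * sigma ** 2))
    kernel /= kernel.sum()
    blurred = np.apply_along_axis(
        lambda row: np.convolve(row, kernel, mode="same"), 1, image)
    return np.apply_along_axis(
        lambda col: np.convolve(col, kernel, mode="same"), 0, blurred)


def difference_of_gaussian(image, sigma=1.6, k=2 ** 0.5, levels=4):
    """One octave of DoG images: D_i = G(k^(i+1)*sigma) - G(k^i*sigma)."""
    blurred = [gaussian_blur(image, sigma * k ** i) for i in range(levels)]
    return [b2 - b1 for b1, b2 in zip(blurred, blurred[1:])]


def scale_space_extrema(dog):
    """Pixels that are maxima or minima of their 26 neighbors in x, y, scale.

    Ties and flat regions are not filtered; real SIFT applies a contrast
    threshold and an edge-response test (stage 2) on top of this.
    """
    points = []
    for s in range(1, len(dog) - 1):
        layers = np.stack(dog[s - 1:s + 2])  # below, current, above
        for y in range(1, layers.shape[1] - 1):
            for x in range(1, layers.shape[2] - 1):
                cube = layers[:, y - 1:y + 2, x - 1:x + 2]
                v = layers[1, y, x]
                if v == cube.max() or v == cube.min():
                    points.append((x, y, s))
    return points
```

A bright spot on a dark background produces a strong (negative) DoG response at its center, which is the kind of blob-like structure this stage is designed to find.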


This approach has been named the Scale Invariant Feature Transform (SIFT), as it transforms image data into scale-invariant coordinates relative to local features.
The method is named SIFT because it transforms image data into scale-invariant coordinates relative to local features.


An important aspect of this approach is that it generates large numbers of features that densely cover the image over the full range of scales and locations. A typical image of size 500x500 pixels will give rise to about 2000 stable features (although this number depends on both image content and choices for various parameters). The quantity of features is particularly important for object recognition, where the ability to detect small objects in cluttered backgrounds requires that at least 3 features be correctly matched from each object for reliable identification.
An important aspect of this method is that it generates a large number of features densely covering the image across all scales and locations. A typical 500x500-pixel image yields about 2000 stable features (although the number depends on image content and parameter choices). The quantity of features is particularly important for object recognition: to detect small objects in cluttered backgrounds reliably, at least three features must be correctly matched from each object.


For image matching and recognition, SIFT features are first extracted from a set of reference images and stored in a database. A new image is matched by individually comparing each feature from the new image to this database and finding candidate matching features based on the Euclidean distance of their feature vectors. This paper will discuss fast nearest-neighbor algorithms that can perform this computation rapidly against large databases.
For image matching and recognition, SIFT features are first extracted from a set of reference images and stored in a database. Each feature of a new image is then compared individually against this database, and candidate matches are found by the Euclidean distance between feature vectors. The paper discusses fast nearest-neighbor algorithms that keep this computation fast even against large databases.
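A brute-force version of this Euclidean-distance matching can be sketched as follows; it accepts a match only when the nearest neighbor is clearly closer than the second nearest, which rejects ambiguous matches. The 0.8 ratio is an illustrative threshold, and the brute-force scan stands in for the fast nearest-neighbor search the paper discusses.

```python
import numpy as np


def match_features(query, database, ratio=0.8):
    """Match each query descriptor to its nearest database descriptor.

    A match is kept only when the nearest neighbor is clearly closer
    than the second nearest. The O(n*m) scan here is what a k-d tree
    or best-bin-first search would speed up on large databases.
    """
    matches = []
    for i, q in enumerate(query):
        dists = np.linalg.norm(database - q, axis=1)  # Euclidean distances
        order = np.argsort(dists)
        nearest, second = order[0], order[1]
        if dists[nearest] < ratio * dists[second]:
            matches.append((i, int(nearest)))
    return matches
```

When the two closest database descriptors are nearly equidistant from the query, no match is returned, which is exactly the ambiguous-background case described below.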


The keypoint descriptors are highly distinctive, which allows a single feature to find its correct match with good probability in a large database of features. However, in a cluttered image, many features from the background will not have any correct match in the database, giving rise to many false matches in addition to the correct ones. The correct matches can be filtered from the full set of matches by identifying subsets of keypoints that agree on the object and its location, scale, and orientation in the new image. The probability that several features will agree on these parameters by chance is much lower than the probability that any individual feature match will be in error. The determination of these consistent clusters can be performed rapidly by using an efficient hash table implementation of the generalized Hough transform.
The keypoint descriptors are highly distinctive, so a single feature can find its correct match in a large database with good probability. In a cluttered image, however, many background features have no correct match in the database, producing many false matches alongside the correct ones. The correct matches can be filtered from the full set by identifying subsets of keypoints that agree on an object and its location, scale, and orientation in the new image. The chance that several features agree on these parameters accidentally is far lower than the chance that any single feature match is wrong. These consistent clusters can be found quickly with an efficient hash table implementation of the generalized Hough transform.
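A hash-table version of this Hough voting can be sketched like so. Each match is assumed to carry the object pose it predicts (x, y, scale, orientation); agreeing matches fall into the same coarse bin. The bin widths and the dictionary-based match format are illustrative choices for this sketch, not the paper's exact scheme (the paper uses broad bins and also votes into neighboring bins).

```python
import math
from collections import defaultdict


def hough_pose_clusters(matches, location_bin=32.0, scale_base=2.0,
                        orientation_bin=30.0, min_votes=3):
    """Bin matches by predicted pose; keep bins with enough agreeing votes.

    Each match is a dict with a "pose" entry (x, y, scale, orientation
    in degrees) -- a hypothetical format assumed for this sketch.
    """
    table = defaultdict(list)
    for m in matches:
        x, y, scale, orientation = m["pose"]
        key = (
            int(x // location_bin),
            int(y // location_bin),
            int(math.log(scale, scale_base)),  # truncation is fine here
            int(orientation // orientation_bin) % int(360 // orientation_bin),
        )
        table[key].append(m)
    return [votes for votes in table.values() if len(votes) >= min_votes]
```

Clusters of three or more agreeing votes are exactly the candidates that go on to the detailed verification described next.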


Each cluster of 3 or more features that agree on an object and its pose is then subject to further detailed verification. First, a least-squares estimate is made for an affine approximation to the object pose. Any other image features consistent with this pose are identified, and outliers are discarded. Finally, a detailed computation is made of the probability that a particular set of features indicates the presence of an object, given the accuracy of fit and number of probable false matches. Object matches that pass all these tests can be identified as correct with high confidence.
Each cluster of three or more features agreeing on an object and its pose undergoes further detailed verification. First, a least-squares estimate produces an affine approximation to the object pose. Other image features consistent with this pose are then identified, and outliers are discarded. Finally, given the accuracy of fit and the number of probable false matches, the probability that this particular set of features indicates the presence of the object is computed in detail. Object matches passing all these tests can be identified as correct with high confidence.
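The least-squares affine step can be written as a small linear system: each correspondence (x, y) to (u, v) contributes two equations in the six affine parameters. A minimal NumPy sketch (the residual check for discarding outliers would sit on top of this):

```python
import numpy as np


def fit_affine(src, dst):
    """Least-squares 2x3 affine transform mapping src points onto dst.

    Stacks two rows per correspondence and solves the overdetermined
    system with np.linalg.lstsq; needs at least 3 correspondences.
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    A = np.zeros((2 * len(src), 6))
    b = dst.reshape(-1)  # [u0, v0, u1, v1, ...]
    for i, (x, y) in enumerate(src):
        A[2 * i] = [x, y, 0, 0, 1, 0]      # u = m1*x + m2*y + tx
        A[2 * i + 1] = [0, 0, x, y, 0, 1]  # v = m3*x + m4*y + ty
    (m1, m2, m3, m4, tx, ty), *_ = np.linalg.lstsq(A, b, rcond=None)
    return np.array([[m1, m2, tx], [m3, m4, ty]])


def apply_affine(M, pts):
    """Apply a 2x3 affine matrix to an (n, 2) array of points."""
    pts = np.asarray(pts, dtype=float)
    return pts @ M[:, :2].T + M[:, 2]
```

With more than three correspondences the system is overdetermined and lstsq returns the pose that minimizes the squared residuals, which is the sense in which the verification is "least-squares."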

