"Feature matching" brisk original translation

Source: Internet
Author: User
Tags: benchmark, comparison

Original: Stefan Leutenegger, Margarita Chli et al., "BRISK: Binary Robust Invariant Scalable Keypoints".

Translation source: http://blog.csdn.net/luoshixian099/article/details/50731801

Abstract: Efficiently finding keypoints in an image has long been an active research topic, and it forms the basis of many computer vision applications. In this field, the pioneering algorithms SIFT and SURF have demonstrated strong performance under a variety of image transformations, with SURF widely regarded as the most computationally efficient among the established high-performance methods. The BRISK algorithm presented in this paper is a novel method for keypoint detection, description, and matching. A comprehensive evaluation on benchmark datasets shows that BRISK delivers comparably high-quality performance at much lower computational cost (in some cases an order of magnitude faster than SURF). The key to its speed is twofold: a novel scale-space FAST-based detector, combined with a carefully assembled bit-string descriptor obtained from dedicated intensity comparisons sampled around each keypoint.

1. Introduction

Decomposing an image into regions of interest, or features, is a widely used technique in computer vision to reduce the complexity of reasoning about the raw image content. Image representation, object recognition, matching, 3D scene reconstruction, and motion tracking all depend on stable, representative features in the image; this need has driven extensive research and produced a large number of methods.

The ideal keypoint detector finds salient image regions that can be repeatably detected even under changes of viewpoint; more generally, it is robust to all reasonable image transformations. Similarly, the ideal keypoint descriptor captures the most important and distinctive information content enclosed in the detected salient region, so that the same structure can be recognized when it is encountered again. Moreover, beyond meeting these quality requirements, the speed of detection, description, and matching also needs to be optimized to satisfy the time constraints of the task at hand.

State-of-the-art algorithms inevitably trade matching precision against computational speed. Lowe's SIFT is one of the most widely accepted high-quality options, offering strong distinctiveness and invariance to a variety of common image transformations; however, it is computationally expensive. At the other end of the spectrum, the combination of the FAST keypoint detector with the BRIEF descriptor offers a far more real-time-friendly alternative. Yet despite its obvious speed advantage, the latter suffers in reliability and robustness, as it tolerates only minimal image distortion, in-plane rotation, and scale change. As a result, applications such as SLAM fall back on probabilistic data-association methods to establish matching consistency.

The inherent difficulty in extracting good keypoints from an image lies in balancing two conflicting goals: high descriptor quality and low computational cost. The purpose of this work is to set a new milestone in that trade-off with the BRISK method. The most relevant reference point is SURF, which has demonstrated both robustness and speed; BRISK achieves comparably impressive matching quality at considerably lower computation time. In short, this paper presents a new method for obtaining keypoints from images, whose main components are as follows:

Scale-space keypoint detection: points of interest are identified across both the image and scale dimensions using a saliency criterion. To improve computational efficiency, keypoints are detected in the octave layers of an image pyramid as well as in the intra-octave layers in between. The position and scale of each keypoint are obtained by fitting quadratic functions and refining in the continuous domain.

Keypoint description: a sampling pattern consisting of points on appropriately scaled concentric circles is applied in the neighbourhood of each keypoint to retrieve gray values; processing the local intensity gradients determines the characteristic direction of the feature. Finally, the oriented BRISK sampling pattern is used to obtain pairwise brightness-comparison results, which are assembled into the binary BRISK descriptor.

Once generated, BRISK keypoints can be matched very efficiently thanks to the binary nature of the descriptor. With a strong focus on computational speed, BRISK also makes use of SSE instructions, which are widely supported on today's architectures.

2. Related work

Identifying local regions of interest for image matching has a long history in the literature, beginning with the first and most famous corner detector proposed by Harris and Stephens. Mikolajczyk et al. later presented a comprehensive evaluation of the most competitive detection methods, which revealed that no single detector is universally best; rather, detectors of complementary qualities should be chosen depending on the application context. More recently, the FAST criterion for keypoint detection has become one of the most popular choices among state-of-the-art methods with hard real-time constraints, and AGAST has extended this line of work with improved performance.

The best-established keypoint descriptor in the literature in terms of quality is SIFT. Its high descriptive power and robustness to illumination and viewpoint changes have placed the SIFT descriptor at the top of several surveys. However, its high dimensionality makes SIFT extraction and matching slow. PCA-SIFT reduces the descriptor from 128 to 36 dimensions, but the loss of distinctiveness and the increased time for descriptor formation almost cancel out the gain in matching speed. Notably, the GLOH descriptor also belongs to the SIFT-like family and has proved highly distinctive, but it is even more expensive to compute than SIFT.

The growing demand for features that are both high-quality and fast has driven further research on processing large amounts of data at high rates. Notably, Agrawal et al. use center-symmetric local binary patterns as an alternative to SIFT's orientation histograms. The more recent BRIEF descriptor is an ultra-fast approach in which the descriptor is a binary string assembled from simple brightness comparisons at randomly predetermined pixel locations. Although simple and effective, this method is very sensitive to image rotation and scale change, which restricts its general applicability.

Perhaps the most appealing keypoint extraction method at present is SURF, which has been shown to achieve quality comparable to SIFT while being significantly faster. SURF detects keypoints using the determinant of the Hessian matrix (a blob detector), and its descriptor summarizes Haar-wavelet responses within the region of interest. While SURF exhibits state-of-the-art timings, it still falls short of the fastest detectors by an order of magnitude in speed; those fastest detectors, in turn, are currently limited in the quality of the keypoints they extract.

This paper presents BRISK, a novel algorithm offering high-quality, fast keypoint detection, description, and matching. As its name suggests, the method is rotation- and scale-invariant to a significant degree, reaching performance on par with the state of the art while drastically reducing computational cost. The descriptor is evaluated on the standard benchmark datasets using the established evaluation protocol, with BRISK compared against SURF and SIFT, which are widely accepted as reference methods under common image transformations.

3. BRISK

In this section, the key steps of BRISK (feature detection, descriptor composition, and keypoint matching) are described in detail, so as to motivate the design choices and aid understanding. It is important to note the modularity of the method: the BRISK detector can be combined with any other keypoint descriptor and vice versa, allowing the pipeline to be optimized for the performance requirements of the task at hand.

3.1 Scale-space keypoint detection

With the emphasis on computational efficiency, the detection method is inspired by the work of Mair et al. on region-of-interest detection in images. Their AGAST detector is essentially an extension of the now-popular FAST detector with improved performance, and it proves an effective basis for feature extraction. To achieve the scale invariance that is crucial for high-quality keypoints, BRISK goes a step further and searches for maxima not only in the image plane but also in scale space, using the FAST score as the measure of saliency. Despite discretizing the scale axis at coarser intervals than alternative high-performance detectors, the BRISK detector estimates the true scale of each keypoint in the continuous scale space.

In the BRISK framework, the scale-space pyramid consists of n octave layers (denoted c_i) and n intra-octave layers (denoted d_i), with n = 4 and i = {0, 1, ..., n-1} in the paper. Given an image img, the octave layers are generated as follows: c_0 is the original image img, c_1 is c_0 downsampled by a factor of 2, c_2 is c_1 downsampled by a factor of 2, and so on. The intra-octave layers are generated as follows: d_0 is img downsampled by a factor of 1.5, d_1 is d_0 downsampled by a factor of 2 (i.e. img downsampled by a total factor of 2 × 1.5), d_2 is d_1 downsampled by a factor of 2, and so on. Denoting scale by t, the scale of each layer relative to the original image is:

t(c_i) = 2^i,  t(d_i) = 1.5 · 2^i
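The layer scales above can be sketched in a few lines (a minimal sketch assuming n = 4 octaves as in the paper; `layer_scales` is a hypothetical helper name, not from the original implementation):

```python
# Scales of the BRISK pyramid layers relative to the original image:
# octave c_i has t = 2^i, intra-octave d_i has t = 1.5 * 2^i.
def layer_scales(n=4):
    octaves = [2.0 ** i for i in range(n)]      # t(c_i)
    intra = [1.5 * 2.0 ** i for i in range(n)]  # t(d_i)
    return octaves, intra

c, d = layer_scales()
# c -> [1.0, 2.0, 4.0, 8.0], d -> [1.5, 3.0, 6.0, 12.0]
```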

It is worth noting that both FAST and AGAST provide alternative mask shapes for keypoint detection. BRISK mainly uses the 9-16 mask, which requires at least 9 consecutive pixels out of a 16-pixel circle to be sufficiently brighter or darker than the central pixel for the FAST criterion to be fulfilled.

Fig. 1 Fig. 2

Initially, the FAST 9-16 detector is applied with the same threshold T on each octave and intra-octave layer to identify potential regions of interest. Next, the points in these regions are subjected to non-maxima suppression in scale space (analogous to the suppression used in SIFT): first, a candidate must attain the maximum FAST score s within its 8-neighbourhood in its own layer, where the score s is defined as the largest threshold for which the image point is still detected as a corner. Second, its score must also exceed the scores in the layers immediately above and below. For this, square patches of corresponding size are checked: the patches have a side length of 2 pixels in each of the neighbouring layers. Because the neighbouring layers (and therefore their FAST scores) are sampled at different discretizations, some interpolation is applied at the patch boundaries. Figure 1 illustrates this sampling and maxima-detection process.
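The in-layer half of this suppression can be illustrated with a plain 8-neighbourhood maximum test on a score grid (a toy sketch, not the paper's implementation; `is_local_max` and the grid layout are assumptions of mine):

```python
def is_local_max(scores, x, y):
    """True when scores[y][x] strictly exceeds all 8 neighbours.
    `scores` is a 2D list standing in for the per-layer FAST scores;
    (x, y) must be an interior point of the grid."""
    s = scores[y][x]
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if (dx or dy) and scores[y + dy][x + dx] >= s:
                return False
    return True
```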

Maxima detection along the scale axis at octave c_0 is a special case: since c_0 is the lowest layer, obtaining a FAST score below it requires a virtual intra-octave layer d_{-1}, computed by applying the FAST 5-8 detector on c_0. In this case, however, the scores in the d_{-1} patch are only required to be lower than the score of the candidate maximum detected in octave c_0.

Considering image saliency as a continuous quantity not only across the image but also along the scale dimension, a sub-pixel and continuous-scale refinement is performed for each detected maximum. To limit the complexity of this refinement, first a 2D quadratic function is fitted in the least-squares sense to each of the three score patches (in the layer of the keypoint and the layers above and below), yielding three sub-pixel refined saliency maxima. To avoid resampling, a 3×3 score patch is considered in each layer. Next, these refined scores are used to fit a 1D parabola along the scale axis, whose maximum yields the final score estimate and the scale estimate. As a last step, the image coordinates are re-interpolated between the patch maxima of the layers adjacent to the determined scale. Figure 2 shows an example of BRISK detections at close range on two images of the boat sequence.
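The 1D refinement along the scale axis can be sketched as fitting a parabola through the three refined scores at scale offsets -1, 0, +1 (a minimal sketch under those assumptions; the function name is my own):

```python
def parabola_peak(s_below, s_mid, s_above):
    """Vertex of the parabola through (-1, s_below), (0, s_mid), (1, s_above).
    Returns (offset, value); the offset refines the scale estimate."""
    denom = s_below - 2.0 * s_mid + s_above
    if denom >= 0:  # the three scores do not form a maximum
        return 0.0, s_mid
    offset = 0.5 * (s_below - s_above) / denom
    value = s_mid - 0.25 * (s_below - s_above) * offset
    return offset, value
```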

3.2 Feature Point Description

Given a set of keypoints (consisting of sub-pixel refined image positions and associated floating-point scale values), the BRISK descriptor is composed as a binary string obtained by concatenating the results of simple brightness comparison tests. This idea has been demonstrated to be very efficient in BRIEF, but here a far more qualitative sampling pattern is used. In BRISK, the characteristic direction of each keypoint is identified so that the descriptor can be normalized with respect to orientation, achieving the rotation invariance that is key to general robustness. In addition, the brightness comparisons are carefully selected with the focus on maximizing descriptiveness.

Figure 3

3.2.1 Sampling pattern and rotation estimation

The key concept of the BRISK descriptor is the sampling pattern used to probe the neighbourhood of each keypoint. As shown in Figure 3, N sample locations are arranged on circles concentric with the keypoint, each defining a small, equally-sized circular region. The pattern resembles that of the DAISY descriptor; it is important to note, however, that its use in BRISK is entirely different, as DAISY was built specifically for dense matching and deliberately captures far more information, with correspondingly higher speed and storage demands.

To avoid aliasing effects, Gaussian smoothing is applied at each sample point p_i in the pattern, with standard deviation σ_i proportional to the distance between the points on the respective circle; the pattern is positioned and scaled in the image according to the keypoint k. Consider one of the N(N-1)/2 sample-point pairs, written (p_i, p_j), with smoothed intensity values I(p_i, σ_i) and I(p_j, σ_j) respectively. The local gradient g(p_i, p_j) is estimated as:

g(p_i, p_j) = (p_j - p_i) · (I(p_j, σ_j) - I(p_i, σ_i)) / ||p_j - p_i||²   (1)

The set of all sampling-point pairs is denoted:

A = { (p_i, p_j) ∈ R² × R² | i < N, j < i, i, j ∈ N }   (2)

Define the subset S of short-distance pairs and the subset L of long-distance pairs as:

S = { (p_i, p_j) ∈ A | ||p_j − p_i|| < δ_max } ⊆ A
L = { (p_i, p_j) ∈ A | ||p_j − p_i|| > δ_min } ⊆ A   (3)

The distance thresholds are set to δ_max = 9.75t and δ_min = 13.67t, where t is the scale of the keypoint k.
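Assuming the pattern points are given as (x, y) tuples, the split into S and L can be sketched as follows (`partition_pairs` is a hypothetical helper name; the thresholds are those of the paper):

```python
import itertools
import math

def partition_pairs(points, t, d_max=9.75, d_min=13.67):
    """Split all sample-point pairs into the short-distance set S
    (distance < d_max * t) and the long-distance set L
    (distance > d_min * t), returned as lists of index pairs."""
    short, long_ = [], []
    for (i, pi), (j, pj) in itertools.combinations(enumerate(points), 2):
        dist = math.hypot(pj[0] - pi[0], pj[1] - pi[1])
        if dist < d_max * t:
            short.append((i, j))
        if dist > d_min * t:
            long_.append((i, j))
    return short, long_
```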

The information above is now used to compute the characteristic direction of the keypoint k (note: only the long-distance subset L is used here) as the average local gradient:

g = (g_x, g_y) = (1/|L|) · Σ_{(p_i, p_j) ∈ L} g(p_i, p_j)
Only long-distance pairs enter this computation, based on the assumption that local gradients annihilate each other and are therefore unnecessary for determining the global gradient direction. This assumption is also confirmed experimentally by varying the distance threshold δ_min.
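This direction estimate can be sketched by reusing the local-gradient formula (1) on toy inputs (`intensities[i]` stands in for the smoothed value I(p_i, σ_i); the function name is my own):

```python
import math

def keypoint_orientation(points, intensities, long_pairs):
    """Average g(p_i, p_j) over the long-distance pairs and return
    the characteristic angle alpha = arctan2(g_y, g_x)."""
    gx = gy = 0.0
    for i, j in long_pairs:
        dx = points[j][0] - points[i][0]
        dy = points[j][1] - points[i][1]
        norm2 = dx * dx + dy * dy
        scale = (intensities[j] - intensities[i]) / norm2
        gx += dx * scale
        gy += dy * scale
    n = len(long_pairs)
    return math.atan2(gy / n, gx / n)
```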

3.2.2 Building the descriptor

To build a rotation- and scale-normalized descriptor, BRISK applies the sampling pattern around the keypoint k rotated by the angle α = arctan2(g_y, g_x). Similar to BRIEF, the BRISK descriptor is a vector of 512 bits, each produced from one short-distance pair (p_i^α, p_j^α) ∈ S, where the superscript α denotes the rotated pattern. Each bit b is obtained as:

b = 1 if I(p_j^α, σ_j) > I(p_i^α, σ_i), and b = 0 otherwise, for all (p_i^α, p_j^α) ∈ S
Although BRIEF likewise relies on brightness comparisons, BRISK differs fundamentally from it beyond the pre-scaling and pre-rotation of the sampling pattern: 1. BRISK uses a deterministic sampling pattern, with points sampled uniformly on circles of given radius around the keypoint; the tailored Gaussian smoothing therefore does not accidentally distort the brightness content of a comparison (the kernels of two adjacent sample points blur their neighbourhoods separately, guaranteeing smooth brightness transitions). 2. BRISK uses dramatically fewer sample points than pairwise comparisons (a single sample point participates in multiple comparisons), limiting the complexity of the brightness lookups. 3. The comparisons here are restricted spatially, so that brightness variations only need to be locally consistent. With the sampling pattern and distance thresholds above, a bit string of length 512 is obtained; the 64-byte BRIEF64 descriptor is likewise a vector of 512 bits, so a descriptor pair can be matched at the same speed.
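The bit assembly itself can be sketched in one line (a toy stand-in: `smoothed[k]` represents the smoothed intensity at rotated sample point k, and `short_pairs` indexes the set S; the function name is my own):

```python
def brisk_bits(smoothed, short_pairs):
    """One descriptor bit per short-distance pair:
    1 when I(p_j^a) > I(p_i^a), else 0."""
    return [1 if smoothed[j] > smoothed[i] else 0 for i, j in short_pairs]
```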

3.3 Descriptor Matching

Matching two BRISK descriptors is a simple computation of their Hamming distance, as in BRIEF: the number of bits in which the two descriptors differ is their distance. The required operations reduce to a bitwise XOR followed by a bit count, which can be computed very efficiently on today's architectures.
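In Python this reduces to a one-liner, since a 512-bit descriptor fits in an arbitrary-precision int (a sketch only, not the SSE-optimized implementation the paper uses):

```python
def hamming(d1, d2):
    """Number of differing bits between two descriptors held as ints:
    XOR, then population count."""
    return bin(d1 ^ d2).count("1")

# Example: descriptors differing in exactly one bit.
# hamming(0b1011, 0b0011) -> 1
```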

3.4 Implementation Notes

Here, a brief overview of some implementation details is given; these contribute significantly to the overall computational performance and aid the reproducibility of the method. All BRISK functionality is built on the common 2D-feature interface of OpenCV, so it can easily be integrated and interchanged with existing feature extraction algorithms (SIFT, SURF, BRIEF, etc.).

The AGAST implementation is used to compute the saliency scores in the detection process. Non-maxima suppression benefits from early-termination capabilities, which keep the number of saliency-score evaluations close to the minimum. The image pyramid is built using SSE2 and SSE3 instructions, for both the half-sampling and the 1.5× downsampling steps.

In order to retrieve gray values in the sampling pattern efficiently, a lookup table of discrete rotated and scaled versions of the BRISK pattern is generated, containing the sample-point locations, the properties of the Gaussian smoothing kernels, and the indices of the distance-based pairings; it consumes approximately 40 MB of RAM, which remains acceptable even for applications with limited computational resources. Furthermore, the Gaussian smoothing is approximated using integral images: each Gaussian kernel is replaced by a simple square box-mean filter with floating-point boundaries and side length ρ = 2.6σ, which scales with σ at no additional computational cost. Thus, instead of expensively smoothing the entire image with many different kernels, a single smoothed value can be retrieved for an arbitrary σ. Finally, an SSE-optimized Hamming distance routine is integrated, achieving matching roughly six times as fast as the then-current OpenCV implementation (e.g. as used with the OpenCV BRIEF version).
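The constant-time box-mean retrieval via an integral image can be sketched as follows (a minimal sketch on a plain 2D list; the floating-point box boundaries the paper uses are omitted, and in practice the integral image would be built once rather than per call):

```python
def box_mean(img, x0, y0, x1, y1):
    """Mean intensity over the inclusive box [x0, x1] x [y0, y1].
    Builds the integral image here for brevity; each box query on a
    prebuilt integral image costs only four lookups."""
    h, w = len(img), len(img[0])
    # integral[y][x] = sum of img over rows < y and cols < x
    integral = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row = 0
        for x in range(w):
            row += img[y][x]
            integral[y + 1][x + 1] = integral[y][x + 1] + row
    total = (integral[y1 + 1][x1 + 1] - integral[y0][x1 + 1]
             - integral[y1 + 1][x0] + integral[y0][x0])
    return total / ((x1 - x0 + 1) * (y1 - y0 + 1))
```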

4 Experiments

The proposed method has been tested extensively using the evaluation protocol and datasets established in the field by Mikolajczyk and Schmid. For consistency with other work, the online MATLAB evaluation code is used. Each dataset contains a sequence of six images with increasing degrees of transformation. All comparisons here are performed against the first image of each dataset. Figure 4 shows one image from each analyzed dataset.

Figure 4

The transformations include viewpoint changes (Graffiti and Wall), zoom and rotation (Boat), blur (Bikes and Trees), brightness changes (Leuven), and JPEG compression (UBC). Since the viewpoint-change scenes are planar, ground-truth homography matrices are available for all image pairs of the sequences, against which the BRISK detector and descriptor are evaluated alongside the OpenCV 2.2 version of SIFT and the original SURF implementation. The evaluation uses similarity matching, which counts each pair of keypoints whose descriptor distance falls below a certain threshold as a match, as opposed to nearest-neighbour matching, which searches the database for the lowest descriptor distance. Finally, timings are compared, since computational speed is one of the major advantages of BRISK.

4.1 BRISK detector repeatability

Detector repeatability is defined as the number of corresponding keypoints divided by the smaller of the numbers of keypoints detected in the two images. A correspondence is declared by projecting the region of a keypoint in one image (i.e. the extracted circle) into the other image and examining the overlap with the keypoint regions there: if the intersection area is greater than 50% of the union area, the pair is counted as a correspondence. This criterion depends strongly on the constant factor relating keypoint scale to circle radius; the average radius produced by the BRISK detector is therefore chosen to roughly match the average radii produced by SURF and SIFT.
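The 50% overlap criterion can be sketched as an intersection-over-union test on two keypoint circles (a sketch using the standard circle-intersection formula; the real evaluation additionally projects the regions through the ground-truth homography first):

```python
import math

def circle_overlap_ratio(c1, r1, c2, r2):
    """Intersection area over union area of two circles; the keypoint
    pair counts as a correspondence when this ratio exceeds 0.5."""
    d = math.hypot(c2[0] - c1[0], c2[1] - c1[1])
    a1, a2 = math.pi * r1 * r1, math.pi * r2 * r2
    if d >= r1 + r2:            # disjoint circles
        inter = 0.0
    elif d <= abs(r1 - r2):     # one circle contains the other
        inter = min(a1, a2)
    else:                       # partial overlap: area of the lens
        inter = (r1 * r1 * math.acos((d * d + r1 * r1 - r2 * r2) / (2 * d * r1))
                 + r2 * r2 * math.acos((d * d + r2 * r2 - r1 * r1) / (2 * d * r2))
                 - 0.5 * math.sqrt((-d + r1 + r2) * (d + r1 - r2)
                                   * (d - r1 + r2) * (d + r1 + r2)))
    return inter / (a1 + a2 - inter)
```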

The repeatability scores (selected results shown in Figure 5) are computed on the sequences using a constant BRISK detection threshold. For a fair comparison with the SURF detector, its Hessian threshold is chosen such that it outputs approximately the same number of correspondences in a similar basic matching step.

As shown in Figure 5, the BRISK detector exhibits repeatability equivalent to SURF as long as the applied image transformation is not too severe. Given the clear advantage of BRISK over SURF in computational speed, the method proposed here is a strong competitor, even though it appears somewhat weaker under large scale transformations.

Figure 5

4.2 Overall evaluation and comparison of the BRISK algorithm

Since the goal is to provide an overall fast yet robust detector, descriptor, and matcher, the joint performance of all BRISK stages is evaluated against SIFT and SURF. Figure 6 shows precision-recall curves for image pairs selected from the different datasets, using threshold-based similarity matching. For fairness, the detection thresholds are again adapted so that the methods output approximately equal numbers of correspondences. Note that the results therefore differ from [3], where all descriptors were extracted on identical regions (obtained with the Fast-Hessian detector).

As shown in Figure 6, BRISK is competitive with SIFT and SURF across all datasets, and in some cases far outperforms both. The weaker BRISK performance on the Trees dataset can be traced to the detector: whereas SURF detects 2606 and 2624 image regions, BRISK detects only 2004 regions in image 4, while achieving an approximately equal number of correspondences with respect to the regions found in image 1. The same holds for the other blurred dataset, Bikes: the FAST saliency measure is inherently more sensitive to blur than blob-like detectors. For this reason, an additional evaluation on the Trees dataset is included in which BRISK descriptors are extracted on the SURF regions.

Figure 6

Clearly, the performance of SIFT deteriorates markedly on the Trees, Boat, and UBC datasets, which can be explained by the limited repeatability of its detector in these cases. On the other hand, SIFT and BRISK handle pure in-plane rotation very well, better than SURF.

To complete this part of the experiments, the BRIEF algorithm is brought into the comparison. Figure 7 compares a single-scale, unrotated BRISK variant (SU-BRISK) against 64-byte BRIEF descriptors computed on the same (single-scale) AGAST keypoints. Also included are the rotation-invariant single-scale BRISK (S-BRISK) and the standard BRISK. Two image pairs are used. On the one hand, on the first two images of the Wall dataset, which involve neither scale change nor in-plane rotation, SU-BRISK and BRIEF64 show very similar performance; note that this is exactly the operating condition BRIEF was designed for. On the other hand, the variants are applied to the first two images of the Boat sequence: this test shows that SU-BRISK is somewhat more robust than BRIEF to small in-plane rotations and scale changes. Moreover, the well-known cost of full rotation and scale invariance is clearly observable.

4.3 Timings

The timings were recorded on a laptop with a quad-core i7 2.67 GHz processor (using only one core), running Ubuntu 10.04 (32-bit), with the implementation and settings detailed above. Table 1 shows the detection and description timings for the first image of the Graffiti sequence, and Table 2 shows the matching times; values are averaged over more than 100 runs. Note that all matches were computed by brute-force descriptor distance without any early-termination optimizations.

Table 1 Table 2

The timings show a clear advantage for BRISK: its keypoints and descriptors are typically computed an order of magnitude faster than SURF, which is itself considered the fastest existing rotation- and scale-invariant feature matching method. It is also worth emphasizing that BRISK can easily be scaled for even faster execution by reducing the number of sample points in the pattern, at some cost in matching quality, where the application allows it. Furthermore, scale or rotation invariance can be omitted where they are not needed, improving both speed and matching quality.

4.4 Examples

Complementing the extensive evaluation above, a real-world example of matching with BRISK is shown. Figure 8 shows an image pair exhibiting several transformations at once. Similarity matching with a threshold of 90 yields robust matches (out of more than 512 detected keypoint pairs); notably, there are no obvious outliers.

Figure 8

5 Conclusion

This paper presents a novel method called BRISK, which tackles the classic computer vision problem of detecting, describing, and matching image keypoints when little prior knowledge of the scene and camera poses is available. In contrast to well-established, proven high-performance algorithms such as SIFT and SURF, the method offers a dramatically faster alternative at comparable matching performance, a claim supported by a thorough evaluation using the established benchmark framework. BRISK relies on an easily configurable circular sampling pattern, from which brightness comparisons are computed to form the binary descriptor string. The unique properties of BRISK make it useful for a wide spectrum of applications, in particular those with hard real-time constraints or limited computational power; BRISK finally brings state-of-the-art feature quality to such time-critical applications.

In further research on BRISK, the goal is to explore alternatives to the scale-space maxima search that yield even better repeatability while maintaining speed. In addition, the BRISK pattern and its configuration will be analyzed theoretically and experimentally in comparison with alternatives, so as to maximize the information content and robustness of the resulting descriptors.


"Feature matching" brisk original translation

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.