Abstract
In this paper, the line segment matching algorithm exploits the local appearance and the geometric properties of line segments. The algorithm has the following advantages: (1) a multi-scale line extraction strategy improves robustness to image transformations; (2) the LBD descriptor speeds up the computation of local appearance similarity between segments and reduces the dimension of the graph matching problem; (3) a pairwise geometric consistency evaluation improves matching accuracy in low-texture images.
Introduction
I. Line segment detection and description (line detection and description)
1.1. Detecting line segments in scale space (detecting lines in the scale space)
To overcome the fragmentation problem in line segment extraction and to improve performance under large scale changes, we repeatedly down-sample and Gaussian-blur the original image to form a scale-space pyramid of n octave images. First, the EDLine algorithm is applied to each octave to produce a set of lines, and the direction of each line is assigned so that most of its edge-pixel gradients point from the left side of the line to the right side. We then reorganize the lines according to their correspondence across the scale space and assign a unique ID to each group: lines in the scale space that correspond to the same event in the image (i.e., the same image region with the same direction) are stored together in a vector called a LineVec. The final result of line extraction is the set of LineVecs shown in Fig. 1. The difference between this extraction scheme and the algorithm of Wang et al. is that, by reorganizing the segments extracted at different scales into LineVecs, the method effectively reduces the dimension of the graph matching problem.
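The following is a minimal Python sketch of this step. It assumes the EDLine detection itself is done elsewhere and that each detected line is given as (x1, y1, x2, y2, direction); the grouping rule used here (nearby midpoints after rescaling plus similar direction) is a simplified stand-in for the paper's "same event" criterion, and the pyramid parameters are illustrative only.

```python
import cv2
import numpy as np

def build_scale_pyramid(image, n_octaves=5, blur_sigma=1.0, scale=0.5):
    """Down-sample and Gaussian-blur the image to form an n-octave pyramid."""
    pyramid, current = [], image
    for _ in range(n_octaves):
        pyramid.append(current)
        blurred = cv2.GaussianBlur(current, (5, 5), blur_sigma)
        current = cv2.resize(blurred, None, fx=scale, fy=scale,
                             interpolation=cv2.INTER_AREA)
    return pyramid

def group_into_linevecs(lines_per_octave, scale=0.5, angle_tol=0.1, dist_tol=5.0):
    """Group lines extracted at different octaves into LineVecs.

    lines_per_octave[o] is a list of (x1, y1, x2, y2, direction) tuples detected
    in octave o.  Lines are rescaled to the base image; lines lying in roughly
    the same image region with the same direction are stored in one LineVec.
    """
    linevecs = []
    for octave, lines in enumerate(lines_per_octave):
        factor = (1.0 / scale) ** octave          # map back to base resolution
        for (x1, y1, x2, y2, direction) in lines:
            mid = np.array([x1 + x2, y1 + y2]) * 0.5 * factor
            for vec in linevecs:
                d_ang = abs(np.arctan2(np.sin(direction - vec["dir"]),
                                       np.cos(direction - vec["dir"])))
                if d_ang < angle_tol and np.linalg.norm(mid - vec["mid"]) < dist_tol:
                    vec["lines"].append((octave, (x1, y1, x2, y2)))
                    break
            else:
                # no existing LineVec matched: start a new one with a unique id
                linevecs.append({"id": len(linevecs), "dir": direction,
                                 "mid": mid, "lines": [(octave, (x1, y1, x2, y2))]})
    return linevecs
```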
As shown in Fig. 1, each LineVec contains at least one line segment in the scale space. To describe the local appearance of a LineVec, we compute a descriptor for each of its line segments.
1.2. Band representation of the line support region (the band representation of the line support region)
For a line segment in a given image, its descriptor is computed from the line support region (LSR), the local rectangular region that contains the segment. As shown in Fig. 2, the LSR is divided into a set of parallel bands {B1, B2, ..., Bm}, where m is the number of bands, each band has width w, and the band length equals the length of the segment. In the example of Fig. 2, m = 5 and w = 3.
Similar to the MSLD algorithm, in order to distinguish parallel segments with opposite directions and to make the descriptor rotation invariant, we introduce two directions, dL and d⊥, which form a local 2D coordinate frame. The line direction dL is the direction of the segment (assigned from the edge-pixel gradients as described above), and d⊥ is the clockwise orthogonal direction of dL. The gradient of every pixel in the LSR is projected into this local frame: g' = (g'_{dL}, g'_{d⊥}) = (gᵀ·dL, gᵀ·d⊥), where g is the image gradient of the pixel and g' is its representation in the local coordinate system.
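A small sketch of the gradient projection. It assumes image coordinates with y pointing down; whether the "clockwise" orthogonal direction corresponds to this particular 90° rotation depends on the coordinate convention, so treat the sign choice as an assumption.

```python
import numpy as np

def project_gradients(gx, gy, line_direction):
    """Project image gradients (gx, gy) into the local line frame.

    d_L is the unit vector along the line direction; d_perp is taken as its
    90-degree rotation (assumed to be the clockwise orthogonal direction in
    image coordinates with y pointing down).
    Returns (g_dL, g_dperp), the gradient components in the local frame.
    """
    d_l = np.array([np.cos(line_direction), np.sin(line_direction)])
    d_perp = np.array([-d_l[1], d_l[0]])      # 90-degree rotation of d_L
    g_dl = gx * d_l[0] + gy * d_l[1]
    g_dperp = gx * d_perp[0] + gy * d_perp[1]
    return g_dl, g_dperp
```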
Inspired by the SIFT and MSLD algorithms, two Gaussian weighting functions (a global one and a local one) are applied to the rows of the LSR. First, a global weighting coefficient is applied to the i-th row of the LSR: f_g(i) = (1/(√(2π)·σ_g))·exp(−d_i²/(2σ_g²)), where d_i is the distance from row i to the center row of the LSR and σ_g is the global standard deviation. Then, for band B_j, a local weighting coefficient is applied to the k-th row of B_j and of its neighboring bands B_{j−1} and B_{j+1}: f_l(k) = (1/(√(2π)·σ_l))·exp(−d_k'²/(2σ_l²)), where d_k' is the distance from row k to the center row of B_j and σ_l is the local standard deviation. The global Gaussian reduces the weight of rows far from the line along the d⊥ direction; the local Gaussian reduces boundary effects and avoids abrupt changes of the descriptor when pixels shift between adjacent bands.
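A sketch of the two weighting functions. The default standard deviations follow the choices reported in the original LBD paper (σ_g = 0.5(m·w − 1), σ_l = w); they are assumptions here and can be overridden.

```python
import numpy as np

def global_weight(i, m, w, sigma_g=None):
    """Global Gaussian coefficient f_g(i) for the i-th row of the LSR
    (rows indexed from 0 to m*w - 1)."""
    n_rows = m * w
    center = (n_rows - 1) / 2.0
    if sigma_g is None:
        sigma_g = 0.5 * (n_rows - 1)          # assumed default from the LBD paper
    d_i = abs(i - center)
    return np.exp(-d_i ** 2 / (2 * sigma_g ** 2)) / (np.sqrt(2 * np.pi) * sigma_g)

def local_weight(d_k, w, sigma_l=None):
    """Local Gaussian coefficient f_l(k) for a row at distance d_k (in rows)
    from the center row of band B_j."""
    if sigma_l is None:
        sigma_l = w                            # assumed default from the LBD paper
    return np.exp(-float(d_k) ** 2 / (2 * sigma_l ** 2)) / (np.sqrt(2 * np.pi) * sigma_l)
```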
The band representation has two advantages over a sub-region representation. First, it is more robust to small position changes along the line direction, because most of the image content inside a band stays unchanged and only a small amount changes at the band boundaries. This is an important property, since the positional accuracy along dL is lower than along d⊥ due to the instability of segment endpoints. Second, it is more efficient to compute, because the bands do not overlap in the dL direction and the Gaussian weights are applied to each row rather than to each pixel.
1.3. Construction of the band descriptor (the construction of the line band descriptor)
For band B_j of the LSR, its descriptor is computed from B_j and its two neighboring bands B_{j−1} and B_{j+1}. For the top band B_1 and the bottom band B_m, rows outside the LSR are not taken into account, so these two bands are handled separately. The LBD descriptor is then the concatenation of the band descriptors: LBD = (BD_1ᵀ, BD_2ᵀ, ..., BD_mᵀ)ᵀ.
Now we construct the band descriptor BD_j. For the k-th row of band B_j (and of its neighboring bands), we accumulate the projected gradients of the pixels in that row:

v1_j^k = λ·Σ_{g'_{d⊥}>0} g'_{d⊥},   v2_j^k = λ·Σ_{g'_{d⊥}<0} (−g'_{d⊥}),
v3_j^k = λ·Σ_{g'_{dL}>0} g'_{dL},   v4_j^k = λ·Σ_{g'_{dL}<0} (−g'_{dL}),

where λ = f_g(k)·f_l(k) is the product of the global and local Gaussian coefficients of row k.

With the four accumulated gradients of all rows contributing to band B_j, the band description matrix BDM_j is constructed as

BDM_j = [ v1_j^1 ... v1_j^n ; v2_j^1 ... v2_j^n ; v3_j^1 ... v3_j^n ; v4_j^1 ... v4_j^n ] ∈ R^{4×n},

where n is the number of rows used to compute the descriptor of band B_j: n = 2w for the top and bottom bands (j = 1 or j = m) and n = 3w otherwise.

The band descriptor BD_j consists of the mean vector M_j and the standard deviation vector S_j of the rows of BDM_j: BD_j = (M_jᵀ, S_jᵀ)ᵀ ∈ R^8, and the full line band descriptor is LBD = (M_1ᵀ, S_1ᵀ, M_2ᵀ, S_2ᵀ, ..., M_mᵀ, S_mᵀ)ᵀ ∈ R^{8m}.
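A sketch of the descriptor assembly under the notation above: accumulate_row computes v1..v4 for one weighted row, band_descriptor turns the 4×n matrix BDM_j into BD_j = (M_j, S_j), and lbd_from_bands concatenates the band descriptors. Selecting the rows of each band and computing their projected gradients and weights is assumed to be done by the caller.

```python
import numpy as np

def accumulate_row(g_dl_row, g_dperp_row, weight):
    """Accumulate the four gradient sums v1..v4 for one row of a band,
    weighted by lambda = f_g(k) * f_l(k)."""
    v1 = weight * g_dperp_row[g_dperp_row > 0].sum()
    v2 = weight * (-g_dperp_row[g_dperp_row < 0]).sum()
    v3 = weight * g_dl_row[g_dl_row > 0].sum()
    v4 = weight * (-g_dl_row[g_dl_row < 0]).sum()
    return np.array([v1, v2, v3, v4])

def band_descriptor(bdm):
    """Build BD_j from the 4 x n band description matrix BDM_j
    (n = 2w for the top/bottom bands, 3w otherwise)."""
    mean = bdm.mean(axis=1)                   # M_j in R^4
    std = bdm.std(axis=1)                     # S_j in R^4
    return np.concatenate([mean, std])        # BD_j in R^8

def lbd_from_bands(band_matrices):
    """Concatenate the m band descriptors into the 8m-dimensional LBD vector."""
    return np.concatenate([band_descriptor(bdm) for bdm in band_matrices])
```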
The mean part and the standard deviation part of LBD are normalized separately because their magnitudes differ. In addition, to reduce the influence of non-linear illumination changes, each dimension of the LBD is clamped so that it does not exceed a threshold (empirically, 0.4 is a good value). Finally, the clamped vector is re-normalized to obtain a unit-length LBD.
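A sketch of this normalization step, assuming the (M1, S1, ..., Mm, Sm) layout described above; the clamp value 0.4 is the empirical threshold quoted in the text.

```python
import numpy as np

def normalize_lbd(lbd, m, clip=0.4):
    """Normalize the mean and standard-deviation halves of LBD separately,
    clamp every dimension at `clip`, and re-normalize to unit length."""
    lbd = lbd.reshape(m, 8).astype(float)     # per band: 4 means then 4 stds
    means, stds = lbd[:, :4], lbd[:, 4:]
    means /= (np.linalg.norm(means) + 1e-12)  # normalize all means together
    stds /= (np.linalg.norm(stds) + 1e-12)    # normalize all stds together
    out = np.column_stack([means, stds]).ravel()   # restore (M1,S1,...,Mm,Sm) order
    out = np.minimum(out, clip)               # suppress large dimensions
    return out / (np.linalg.norm(out) + 1e-12)
```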
II. Graph matching based on the spectral technique (graph matching using spectral technique)
2.1. Generating candidate matching pairs (generating the candidate matching pairs)
Given the LineVecs detected in the reference image and the query image, a candidate pair is considered a non-match if it fails either the unary geometric attribute test or the local appearance similarity test.
Unary geometric attribute: the unary geometric attribute used in this paper is the direction of a LineVec; the segments grouped into the same LineVec share the same direction, so each LineVec has a unique direction. Because the image pair may be related by a rotation of arbitrary angle, the directions of corresponding LineVecs are by themselves ambiguous and unreliable. We therefore estimate an approximate global rotation angle of the image pair and use it to reduce the number of candidate matching pairs.
First, we compute the direction histograms of the LineVecs in the image pair (reference image and query image) and normalize them, obtaining (h_r, h_q), where the subscript r denotes the reference image and q the query image. Then h_q is shifted by an angle θ varying over [0, 2π) to search for the approximate global rotation angle θ_g, which is obtained as θ_g = argmin_θ ‖h_r − h_q(θ)‖, where h_q(θ) is h_q shifted by θ. In practice, if this shifted histogram distance is small, the transformation between the image pair can be approximated by a rotation. As shown in Fig. 3, the estimated approximate global rotation angle is 0.349 rad and the shifted histogram distance is 0.243. However, the histogram-based approach may fail when the lines extracted in the image are repetitive, in which case a wrong rotation angle could be accepted. To improve robustness, the lengths of the lines falling into the same histogram bin are accumulated: alongside the direction histogram there is a length vector whose i-th element is the accumulated length of all lines falling into the i-th bin. In our experiments, the estimated global rotation angle is accepted when the minimum shifted histogram distance is below a threshold (0.4) and the minimum shifted length-vector distance is below a threshold (1). Once the rotation angle is accepted, a pair of LineVecs is kept as a candidate match only if the angle between their directions is consistent with θ_g; if it deviates from θ_g by more than a threshold (π/4), the pair is considered a non-match. If no acceptable rotation angle is found between the two images, only the appearance similarity test is applied.
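A sketch of the rotation-angle estimation using shifted direction histograms. The number of histogram bins is not specified in the text, so n_bins = 20 is an arbitrary assumption; the thresholds 0.4 and 1 are the values quoted above, and the length vector is built here as an L2-normalized, length-weighted histogram (an implementation assumption).

```python
import numpy as np

def estimate_global_rotation(dirs_r, lens_r, dirs_q, lens_q,
                             n_bins=20, t_hist=0.4, t_len=1.0):
    """Estimate the approximate global rotation angle between an image pair
    from the direction histograms of their LineVecs.

    dirs_* : LineVec directions in [0, 2*pi);  lens_* : corresponding line lengths.
    Returns the accepted angle in radians, or None if no rotation is accepted.
    """
    def hist(dirs, weights=None):
        h, _ = np.histogram(dirs, bins=n_bins, range=(0, 2 * np.pi), weights=weights)
        return h / (np.linalg.norm(h) + 1e-12)

    h_r, h_q = hist(dirs_r), hist(dirs_q)          # direction histograms
    l_r, l_q = hist(dirs_r, lens_r), hist(dirs_q, lens_q)   # length vectors

    # Shifting the query histogram by s bins corresponds to rotating the query
    # image by s * (2*pi / n_bins); keep the shift minimizing the histogram distance.
    dists = [np.linalg.norm(h_r - np.roll(h_q, s)) for s in range(n_bins)]
    best = int(np.argmin(dists))
    len_dist = np.linalg.norm(l_r - np.roll(l_q, best))

    if dists[best] < t_hist and len_dist < t_len:
        return best * 2 * np.pi / n_bins
    return None
```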
Local appearance similarity: a candidate pair must also have similar local appearance, which is evaluated by the distance between the LBD descriptors of the two LineVecs.
2.2. Building the relational graph (building the relational graph)
2.3. Generating the final matching results (generating the final matching results)
III. Descriptor performance evaluation
First, we analyze the effect of the LSR parameters, namely the number of bands m and the band width w, on the performance of the band descriptor, and then compare the performance of the LBD and MSLD algorithms under the same evaluation. We use the dataset of Mikolajczyk and Schmid to evaluate descriptor performance; it includes eight kinds of image transformations: illumination changes, in-plane rotation, JPEG compression, image blurring, image occlusion, viewpoint changes in a low-texture scene, viewpoint changes in a textured scene, and scale changes. Each group contains six images ordered from small to large transformation. Sequences (a), (c), and (d) in Fig. 5 come from this dataset; the others are captured by ourselves to ensure that the images contain line features. To better evaluate descriptor performance under different image transformations, we extract line segments from the original images rather than from the octave (down-sampled) images.
To evaluate descriptor matching performance in this part, we use the nearest neighbor matching criterion, i.e., line segments are matched purely according to the distance between their descriptors. This avoids the bias introduced by a distance threshold, since different descriptors prefer different thresholds. Another advantage of this criterion is that both the recall (the ratio of the number of correct matches to the number of all true correspondences) and the precision (the ratio of the number of correct matches to the number of all returned matches) are determined by the number of correct matches, because the denominators are the same for different descriptors.
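A minimal sketch of the nearest-neighbor criterion and of how recall and precision are computed from it; ground_truth is assumed to be a set of correct (query, reference) index pairs.

```python
import numpy as np

def nearest_neighbor_matches(desc_r, desc_q):
    """Match each query descriptor to its nearest reference descriptor
    (no distance threshold, as required by the nearest-neighbor criterion)."""
    dists = np.linalg.norm(desc_q[:, None, :] - desc_r[None, :, :], axis=2)
    return [(q, int(np.argmin(dists[q]))) for q in range(desc_q.shape[0])]

def recall_precision(matches, ground_truth):
    """recall = correct / all ground-truth correspondences,
    precision = correct / all returned matches."""
    correct = sum(1 for m in matches if m in ground_truth)
    recall = correct / max(len(ground_truth), 1)
    precision = correct / max(len(matches), 1)
    return recall, precision
```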
3.1. Descriptor dimension (the descriptor dimension)
We analyze the effect of the LSR parameters on descriptor performance by varying the number of bands m and the band width w from 3 to 13. Fig. 6 shows how the number of correct matches is affected by these two parameters. For both LBD and MSLD, the performance first increases quickly with m or w, peaks around m = 9 and w = 7 or 9, and then levels off.
We also evaluate the time performance of the two descriptors on a 900×600 image with 573 extracted line segments; the results are given in Table 1. As shown in Table 1, the running time of both algorithms increases with m and w, but the running time of LBD is less sensitive to the increase of m, and especially of w, than that of MSLD (i.e., LBD takes less time).
Based on the above evaluation, the descriptor experiments use the LSR parameters m = 9 and w = 7, which gives a 72-dimensional descriptor. With these settings, the LBD and MSLD algorithms take 28 ms and 137 ms, respectively.
3.2. Further comparison of LBD and MSLD (further comparison of MSLD and LBD)
In this section, we compare the two algorithms on the dataset in more detail. Fig. 7 shows the recall ratios of the LBD and MSLD algorithms.
(a) shows the performance of MSLD and LBD under illumination changes. From image 1 to image 5, the lighting condition gets worse, and the recall ratios decrease as the lighting distortion increases.
(b) shows the results for images generated by in-plane rotations varying from 15° to 75°. Interestingly, when the rotation angle is in the middle of this range (between image 3 and the reference image, i.e., around 45°), both LBD and MSLD perform worst because of the aliasing of discrete lines.
(c) and (d) show the descriptor performance against image compression and image blurring, respectively. Not surprisingly, the performance decreases as the compression ratio or the amount of blurring increases.
(e) shows the descriptor performance against image occlusion. To evaluate the occlusion effect, we first artificially add some vertical line features to a background image, then shift the region of interest along the vertical direction of this artificial image to generate a set of smaller images, as shown in Fig. 5(e). This ensures that, for most of the lines, their LSRs change gradually across the image sequence (part of the LSR moves out of or into the image). The results show that the descriptor performance decreases as the amount of occlusion increases.
(f) shows the descriptor performance in the low-texture scene. Images in this sequence are captured in front of windows with small viewpoint changes. The performance does not change drastically because of the small baseline between the images.
(g) shows the descriptor performance against large viewpoint changes. The view angles between the query images and the reference image range approximately from −70° to 60°. As expected, the descriptors perform better when the absolute value of the view angle is smaller (image 3 and image 4).
(h) shows the most challenging case for the descriptors, i.e., large scale changes. The scale ratio between the query images and the reference image ranges from 0.9 to 0.3. The performance drops quickly as the scale ratio decreases.
IV. Line matching performance evaluation
Paper notes: the LBD line segment descriptor algorithm (draft)