ORB-SLAM (VI): Loop Closure Detection

Source: Internet
Author: User

As mentioned in the previous article, pose tracking accumulates error whether the input is monocular, stereo, or RGB-D. As the path grows longer, the error of earlier frames is passed on to later ones, so the pose of the last frame can carry a very large error in the world coordinate system. Besides adjusting the poses with local and global optimization, loop closure detection (called loopback detection in some translations) can also be used to correct the poses.

It is like a person walking in an unfamiliar city. At the beginning he can still tell north from south, but after winding through small streets and lanes he has no idea where he is. By carefully observing his surroundings, he can build up local map information (local optimization). By recalling the path he has already walked, he can correct some of the earlier map information (global optimization). Yet he is still unsure of his exact whereabouts in the city. Only when he sees a place he has passed before does it dawn on him: "Oh! I'm back at this place." Propagating this information back through the entire map then yields a fairly accurate map. This is loop closure detection.

Loop closure detection is therefore very useful in large-scale map building. It can work from two-dimensional images or from three-dimensional point clouds. At present, the approach based on two-dimensional images is the recommended one.

DBoW2

The approach based on two-dimensional images is essentially a scene recognition problem. I have not studied it in depth, so I will only introduce the DBoW2 method used in ORB-SLAM.

BoW (bag of words) can be understood as a dictionary whose elements are feature descriptors. If the features are ORB features, it is an ORB dictionary; if they are SIFT features, it is a SIFT dictionary. A dictionary is trained from an image data set. Here is a simple example. Suppose we have a data set of 10,000 images and believe it basically covers the scenes we will encounter.

    1. Extract feature points and feature descriptors from each image. A descriptor is generally a multidimensional vector, so the distance between two descriptors can be computed.

    2. Cluster the descriptors (for example, with K-means). The number of clusters is the number of words in the dictionary, for example 1000. Bayes, SVM, and similar methods can also be used.

    3. DBoW2 organizes the dictionary as a tree so that it can be searched quickly.
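The training steps above can be sketched in a few lines. This is only a minimal illustration under stated assumptions: the descriptors are made-up 2-D float vectors compared by Euclidean distance (real ORB descriptors are binary and compared by Hamming distance), the clustering is plain flat K-means rather than the tree that DBoW2 builds, and the function names are hypothetical.

```python
import random

def squared_distance(a, b):
    """Squared Euclidean distance between two descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest_word(descriptor, words):
    """Index of the dictionary word closest to the descriptor."""
    return min(range(len(words)),
               key=lambda i: squared_distance(descriptor, words[i]))

def train_vocabulary(descriptors, k, iterations=20, seed=0):
    """Plain K-means: returns k cluster centres, which serve as the words."""
    rng = random.Random(seed)
    words = rng.sample(descriptors, k)  # initialise from random descriptors
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for d in descriptors:
            clusters[nearest_word(d, words)].append(d)
        for i, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centre
                words[i] = tuple(sum(axis) / len(members)
                                 for axis in zip(*members))
    return words

# Toy data: 2-D "descriptors" scattered around three made-up centres.
rng = random.Random(1)
centres = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
descriptors = [(cx + rng.uniform(-1, 1), cy + rng.uniform(-1, 1))
               for cx, cy in centres for _ in range(50)]

vocabulary = train_vocabulary(descriptors, k=3)
# Quantise a few descriptors: each one is replaced by the id of its word.
word_ids = [nearest_word(d, vocabulary) for d in descriptors[:5]]
print(len(vocabulary), word_ids)
```

With a real data set, k would be much larger (the text uses 1000 as an example), and a tree-structured dictionary avoids comparing each descriptor against every word.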

In practice, each feature in an image is assigned to its nearest word in the dictionary, and a mark is left under that word. If images A and B both hit the same word, the two images may contain similar feature points. When A and B share enough words, the two images can be considered similar to some degree.
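To make the idea of "sharing enough words" concrete, here is a small sketch that turns each image into a normalised word histogram and compares two histograms with an L1-based score. It is an illustration, not DBoW2's code: the word ids are invented, and the histogram uses plain relative frequencies without the per-word weighting a real vocabulary would apply.

```python
from collections import Counter

def bow_vector(word_ids):
    """Normalised word histogram: word id -> relative frequency."""
    counts = Counter(word_ids)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}

def l1_score(v1, v2):
    """Similarity in [0, 1]: 1 for identical histograms, 0 for no shared words."""
    words = set(v1) | set(v2)
    l1 = sum(abs(v1.get(w, 0.0) - v2.get(w, 0.0)) for w in words)
    return 1.0 - 0.5 * l1

# Hypothetical word ids for the features found in three images.
image_a = bow_vector([1, 1, 2, 3])
image_b = bow_vector([1, 2, 2, 3])   # mostly the same words as A
image_c = bow_vector([7, 8, 9, 9])   # no words in common with A

print(l1_score(image_a, image_b))  # 0.75
print(l1_score(image_a, image_c))  # 0.0
```

A threshold on this score decides whether two images are similar enough to be treated as a loop closure candidate; the threshold itself has to be tuned for the application.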

The authors of ORB-SLAM modified DBoW2 to output a list of candidate images rather than only the single most similar image.

The BoW-based approach has several clear advantages:

    1. The dictionary can be trained offline. In a real-time application, the more work that can be moved offline, the better. The author provides BRIEF and SIFT dictionaries trained on a large amount of data.

    2. Search is fast. For small images it takes on the order of milliseconds. The author provides two auxiliary indexes, the direct index and the inverse index. The inverse index stores, at each node (word), the weight and the image ID of every image feature that reaches that node, so it can be used to find similar images quickly. The direct index stores, for each image, its features and the parent node in the dictionary tree under which each feature's word lies, so it can be used for fast feature matching (only features whose words lie under the same parent node need to be compared).

    3. Many SLAM applications already compute feature points and descriptors, so these can be reused directly for the search.

    4. The authors of ORB-SLAM also use the dictionary for fast feature screening, which reduces the time needed for feature matching (especially when searching for features over a large range).
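The two indexes mentioned in point 2 can be pictured with a toy database. The sketch below is an assumption-laden illustration rather than DBoW2's data structures: the class and method names are hypothetical, each feature is given as a made-up (word id, parent node id) pair, and the score simply sums the smaller of the two weights for every shared word.

```python
from collections import Counter, defaultdict

class ImageDatabase:
    """Toy image database with an inverse index and a direct index."""

    def __init__(self):
        self.inverse_index = defaultdict(list)  # word id -> [(image id, weight)]
        self.direct_index = {}  # image id -> {parent node id: [feature indices]}

    def add(self, image_id, features):
        """features: one (word_id, parent_node_id) pair per image feature."""
        counts = Counter(word for word, _ in features)
        for word, count in counts.items():
            self.inverse_index[word].append((image_id, count / len(features)))
        by_node = defaultdict(list)
        for feature_idx, (_, node) in enumerate(features):
            by_node[node].append(feature_idx)
        self.direct_index[image_id] = dict(by_node)

    def query(self, features):
        """Rank stored images by visiting only the words seen in the query."""
        counts = Counter(word for word, _ in features)
        scores = defaultdict(float)
        for word, count in counts.items():
            weight = count / len(features)
            for image_id, stored_weight in self.inverse_index.get(word, []):
                scores[image_id] += min(weight, stored_weight)
        return sorted(scores.items(), key=lambda item: item[1], reverse=True)

    def feature_pairs_to_compare(self, image_a, image_b):
        """Feature index pairs whose words lie under the same parent node."""
        nodes_a = self.direct_index[image_a]
        nodes_b = self.direct_index[image_b]
        return sorted((i, j)
                      for node in nodes_a.keys() & nodes_b.keys()
                      for i in nodes_a[node] for j in nodes_b[node])

db = ImageDatabase()
db.add("kf0", [(1, 10), (2, 10), (5, 11)])
db.add("kf1", [(7, 12), (8, 12), (9, 13)])

current = [(1, 10), (5, 11), (9, 13)]
candidates = db.query(current)          # ranked list of candidates
db.add("kf2", current)
pairs = db.feature_pairs_to_compare("kf0", "kf2")
print(candidates, pairs)
```

The query never touches images that share no word with the current frame, which is where the speed comes from, and it returns a ranked list of candidates rather than a single best image.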

Of course, it also has disadvantages:

    1. If your scene is unusual, you need to train your own dictionary. A general-purpose dictionary will not be very useful.

    2. BoW generally does not consider the relationships between features (some work addresses this, but its effectiveness and computational cost are unclear).

A few further remarks:

    1. If the application does not otherwise need to compute features, the extra computation time must be taken into account.

    2. As in ORB-SLAM, it is advisable to use BoW only to filter images quickly. Each candidate then needs to be verified rigorously, one by one, with other methods. If a wrong loop closure is accepted, the whole map is ruined.

    3. If the scene has few features, or too many repeated features, loop closure detection becomes difficult.

    4. In dense point cloud reconstruction, if the scene has rich geometric texture, three-dimensional point cloud matching between the two frames (including frames at adjacent positions) can be used to verify the loop closure. If the matching error is small enough, the loop closure is likely correct.

Loop Closure Verification and Sim3 Optimization

For each candidate loop frame, the authors first match its feature points against those of the current frame, then use the three-dimensional points corresponding to the matched features to solve for a similarity transformation within a RANSAC framework. If the transformation for a candidate has enough inliers, Sim3 optimization is performed. The optimized result is used to find more feature matches, and the optimization is run again. If there are still enough inliers, the loop closure is accepted.
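The RANSAC step can be illustrated with a deliberately simplified stand-in. The sketch below estimates a 2-D similarity transformation (scale, rotation, translation) by representing points as complex numbers, so that the transformation is q = a * p + b and two correspondences determine it. The real system works with 3-D points and a Sim3 transformation; the threshold, iteration count, inlier minimum, and all names here are assumptions chosen for the toy data.

```python
import cmath
import math
import random

def similarity_from_two_pairs(p1, q1, p2, q2):
    """Solve q = a * p + b; a encodes scale and rotation, b translation."""
    a = (q2 - q1) / (p2 - p1)
    b = q1 - a * p1
    return a, b

def ransac_similarity(src, dst, threshold=0.1, iterations=200, seed=0):
    """Return the model with the most inliers and the inlier indices."""
    rng = random.Random(seed)
    best_model, best_inliers = None, []
    for _ in range(iterations):
        i, j = rng.sample(range(len(src)), 2)  # minimal sample: two matches
        if src[i] == src[j]:
            continue  # degenerate sample, the transform is undefined
        a, b = similarity_from_two_pairs(src[i], dst[i], src[j], dst[j])
        inliers = [k for k in range(len(src))
                   if abs(a * src[k] + b - dst[k]) < threshold]
        if len(inliers) > len(best_inliers):
            best_model, best_inliers = (a, b), inliers
    return best_model, best_inliers

# Toy matches: 20 points, the first 5 corrupted to imitate wrong matches.
rng = random.Random(42)
true_a = cmath.rect(2.0, math.pi / 6)   # scale 2, rotation 30 degrees
true_b = complex(1.0, 3.0)
src = [complex(rng.uniform(-5, 5), rng.uniform(-5, 5)) for _ in range(20)]
dst = [true_a * p + true_b for p in src]
for k in range(5):
    dst[k] += complex(rng.uniform(3, 6), rng.uniform(3, 6))

model, inliers = ransac_similarity(src, dst)
MIN_INLIERS = 10  # hypothetical acceptance threshold
accepted = model is not None and len(inliers) >= MIN_INLIERS
print(accepted, len(inliers))
```

Only a candidate that passes this kind of inlier check would move on to the optimization and re-matching described above.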

Sim3 optimization is described in the previous article.

Loop Closure Fusion

There is an implicit assumption here: error accumulates over time, so earlier information is trusted more than current information. This step mainly fuses the information of the loop frame into the current frame, including the three-dimensional information of the matched feature points (depth, scale, and so on) and their positions in the world coordinate system (obtained by applying the Sim3 transformation). The fusion also covers the neighbors of the loop frame and the neighbors of the current frame.

Global optimization

During global optimization, the loop frame and its neighbors, as well as the current frame and its neighbors, are held fixed, and the world-coordinate poses of the remaining frames are optimized. See the previous article.
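The effect of fixing both ends and optimizing the frames in between can be seen in a one-dimensional toy problem. This is a heavily simplified sketch, not the optimization used in ORB-SLAM: poses are single numbers instead of 3-D poses, the only constraints are made-up odometry steps between consecutive frames, and the least-squares problem is solved with simple Gauss-Seidel sweeps. The function name is hypothetical.

```python
def optimise_chain(odometry, start, end, sweeps=100):
    """Minimise sum((x[i+1] - x[i] - odometry[i])**2) with both ends fixed."""
    n = len(odometry)
    poses = [start]
    for step in odometry:            # initial guess: dead reckoning
        poses.append(poses[-1] + step)
    poses[-1] = end                  # the loop closure pins the last pose
    for _ in range(sweeps):
        for i in range(1, n):        # interior poses only; the ends stay fixed
            from_prev = poses[i - 1] + odometry[i - 1]
            from_next = poses[i + 1] - odometry[i]
            poses[i] = 0.5 * (from_prev + from_next)
    return poses

# Each step was really 1.0 long, but odometry over-estimates it as 1.1.
odometry = [1.1, 1.1, 1.1, 1.1]
# Dead reckoning would put the last frame at 4.4; the loop closure says 4.0.
poses = optimise_chain(odometry, start=0.0, end=4.0)
print([round(p, 3) for p in poses])
```

The accumulated drift of 0.4 is spread evenly over the four steps, so the intermediate poses move back toward 1, 2, and 3 while the two fixed frames stay where the loop closure put them.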

Note:

This series pauses here for now. I am not satisfied with the part about tracking and will try to rewrite it in the future.

Next I would like to write about LSD, but that will probably have to wait; I will first write one or two articles about grid generation.
