Copyright notice: This is an original article by the blog author and may not be reproduced without the author's permission.
ORB-SLAM [1] fully inherits the approach of PTAM (http://www.cnblogs.com/zonghaochen/p/8442699.html) and makes two major improvements: 1) real-time loop closure detection; 2) very robust relocalization. To achieve these, ORB-SLAM splits PTAM's mapping thread, which ran both local BA and global BA, into two threads (local mapping and loop closing), replaces patch matching with ORB descriptors, and adds a well-designed map management strategy.
In the tracking thread, ORB-SLAM, like PTAM, first builds an image pyramid (8 levels by default) and then extracts feature points (a common technique is to divide the image into a grid in which different cells can use different FAST thresholds, so that the extracted feature points are spread across all regions of the image). The difference is that ORB-SLAM computes ORB descriptors on top of the FAST feature points; these descriptors are robust to changes in viewpoint and illumination and are much faster to compute than SIFT or SURF. The descriptors are used with DBoW [2] for feature matching and loop closure detection. Specifically, each descriptor maps to a word in the vocabulary, and the vocabulary is stored as a tree in which each word is a leaf node. This structure supports two kinds of lookup: from image to words (which words each image contains, and which feature points in the image correspond to each word), and from word to images (which images each word was observed in, and the weight of the word in each of those images). Pose estimation for the current frame is almost identical to PTAM: the map points of the previous frame are projected into the current frame (using an initial pose predicted by a constant-velocity motion model), matches are searched (ORB matching instead of PTAM's patch matching), and once enough matches are found the pose is optimized. The Track Local Map step then projects more nearby map points into the current frame (the previous step involves only the map points of the previous frame). This resembles PTAM's two-round coarse-to-fine solve, where the result of the coarse optimization serves as the initial value for the fine one.
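The two lookup directions described above (image to words, word to images) can be illustrated with a small sketch. This is not DBoW's actual API: the class `BowDatabase`, its method names, and the use of plain term frequency as the word weight are assumptions made for illustration, and the word ids are assumed to have already been obtained by descending the vocabulary tree.

```python
from collections import defaultdict


class BowDatabase:
    """Toy bag-of-words database (hypothetical, not the DBoW API)."""

    def __init__(self):
        # Image -> words: image_id -> {word_id: [feature indices in that image]}
        self.direct = {}
        # Word -> images: word_id -> {image_id: weight of the word in that image}
        self.inverse = defaultdict(dict)

    def add_image(self, image_id, feature_words):
        """feature_words[i] is the word id assigned to feature i of the image."""
        words = defaultdict(list)
        for feature_idx, word_id in enumerate(feature_words):
            words[word_id].append(feature_idx)
        self.direct[image_id] = dict(words)
        total = len(feature_words)
        for word_id, feats in words.items():
            # Assumption: plain term frequency as the weight.
            self.inverse[word_id][image_id] = len(feats) / total

    def features_for_word(self, image_id, word_id):
        """Image -> words lookup: which features of the image map to this word."""
        return self.direct[image_id].get(word_id, [])

    def images_with_word(self, word_id):
        """Word -> images lookup: which images contain the word, and its weight."""
        return dict(self.inverse.get(word_id, {}))


db = BowDatabase()
db.add_image("kf0", [3, 7, 3, 9])
db.add_image("kf1", [7, 7, 2, 5])
```

The image-to-words direction restricts descriptor matching to features that fall on the same word, while the word-to-images direction quickly finds keyframes that share words with a query, which is what loop closure detection and relocalization need.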
When deciding whether the current frame should become a keyframe, ORB-SLAM is fairly lenient: on top of a few minor conditions similar to PTAM's, a new keyframe is considered whenever the current frame tracks fewer than 90% of the points of the reference keyframe. In the 2016 ORB-SLAM2 [3], map points are divided into close points and far points at a depth of 40 times the baseline; close points contribute to estimating scale, translation, and rotation, while far points contribute only to rotation, so a new keyframe is also inserted when the number of close points falls below a threshold. The rationale is that the denser the keyframes, the less likely tracking is to fail. The downside is redundant keyframes, so the local mapping thread later deletes the surplus ones to keep the complexity of BA under control.
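A minimal sketch of the two criteria above, assuming point depths are available (stereo or RGB-D). The function names, the `min_close` threshold of 100, and the omission of the other minor conditions are assumptions for illustration, not values taken from the ORB-SLAM2 source.

```python
def split_close_far(depths, baseline, factor=40.0):
    """Split point depths into close and far at factor * baseline."""
    limit = factor * baseline
    close = [d for d in depths if 0.0 < d <= limit]
    far = [d for d in depths if d > limit]
    return close, far


def need_new_keyframe(tracked_in_frame, tracked_in_ref_kf,
                      n_close=None, min_close=100, ratio=0.9):
    """Simplified keyframe decision; the other minor conditions are omitted."""
    # Criterion 1: the frame tracks fewer than 90% of the reference keyframe's points.
    few_matches = tracked_in_frame < ratio * tracked_in_ref_kf
    # Criterion 2 (stereo / RGB-D only): too few close points.
    # min_close is a hypothetical threshold.
    few_close = n_close is not None and n_close < min_close
    return few_matches or few_close


close, far = split_close_far([1.0, 3.9, 4.1, 50.0], baseline=0.1)
```

With a baseline of 0.1 m the split is at 4 m, so the first two depths are close points and the last two are far points.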
In the local mapping thread, the first thing done after a new keyframe is inserted is to update the covisibility graph and the spanning tree. The covisibility graph describes how many map points different keyframes observe in common: each keyframe is a node, and if two keyframes share more than 15 map points, an edge is created between them whose weight is the number of shared map points. The spanning tree is a subset of the covisibility graph that keeps all nodes (keyframes), but each node keeps only the edge to the keyframe with which it shares the most map points. Next, the bag-of-words representation of the new keyframe is computed, i.e. the "image to words" and "word to images" lookups described earlier; this serves feature matching for triangulating new map points on the one hand, and loop closure detection on the other. Recent map point culling checks each newly created map point over the first three keyframes after its creation and deletes it if it fails the check (i.e. it is observed in only a few images). Map points that passed the check are also deleted later if they end up observed by fewer than three keyframes, which usually happens when redundant keyframes are deleted or local BA discards outlier observations; this keeps the map points accurate and non-redundant. New map point creation takes the features of the new keyframe that have no matching map point, searches for matches in other keyframes (the 10 keyframes sharing the most map points, retrieved from the covisibility graph), and triangulates a new map point if a match is found and a series of constraints is satisfied. Local BA follows the same idea as PTAM: nearby map points are projected into nearby keyframes and the reprojection error is minimized.
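The covisibility graph and spanning tree update can be sketched as follows. The data layout (a dict from keyframe id to the set of observed map point ids) and the function names are assumptions for illustration; whether the comparison against 15 is strict is also an assumption, so `min_shared` is left as a parameter.

```python
from itertools import combinations


def build_covisibility(observations, min_shared=15):
    """observations: keyframe id -> set of map point ids it observes."""
    graph = {kf: {} for kf in observations}
    for a, b in combinations(observations, 2):
        shared = len(observations[a] & observations[b])
        if shared > min_shared:
            # Edge weight = number of map points seen by both keyframes.
            graph[a][b] = shared
            graph[b][a] = shared
    return graph


def spanning_tree_parent(new_points, existing):
    """Return the existing keyframe sharing the most map points, or None."""
    best, best_shared = None, 0
    for kf, pts in existing.items():
        shared = len(new_points & pts)
        if shared > best_shared:
            best, best_shared = kf, shared
    return best


observations = {
    "kf0": set(range(0, 40)),
    "kf1": set(range(20, 60)),
    "kf2": set(range(50, 90)),
}
graph = build_covisibility(observations)
existing = {kf: observations[kf] for kf in ("kf0", "kf1")}
parent = spanning_tree_parent(observations["kf2"], existing)
```

Here `kf2` shares only 10 points with `kf1`, so it gets no covisibility edge, yet the spanning tree still links it to `kf1`. This is why the spanning tree keeps every keyframe connected even where covisibility is weak.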
Local BA differs from Track Local Map in the tracking thread in that it adjusts both the keyframes (poses) and the map points (positions), whereas the tracking thread only adjusts the pose of the current frame. In the local keyframe culling stage, a keyframe is considered redundant and is deleted if 90% of the map points it observes are also observed by at least three other keyframes.
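The keyframe culling rule can be written down as a short check. This is a sketch under assumptions: the function name and data layout are hypothetical, and any additional conditions an actual implementation may apply (for example on the scale at which a point is observed) are ignored.

```python
def is_redundant(keyframe, observations, min_observers=3, ratio=0.9):
    """observations: keyframe id -> set of map point ids it observes."""
    points = observations[keyframe]
    if not points:
        return False
    covered = 0
    for p in points:
        # Count the other keyframes that also observe this map point.
        others = sum(1 for kf, pts in observations.items()
                     if kf != keyframe and p in pts)
        if others >= min_observers:
            covered += 1
    # Redundant if at least 90% of its points are seen by 3 or more other keyframes.
    return covered >= ratio * len(points)


shared = set(range(10))
observations = {
    "A": set(shared),
    "B": shared | set(range(10, 20)),
    "C": shared | set(range(20, 30)),
    "D": shared | set(range(30, 40)),
}
```

In this example every point seen by `A` is also seen by `B`, `C`, and `D`, so `A` is redundant, while `B` still contributes ten points that no other keyframe observes and is kept.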
In the loop closing thread, the bag-of-words vector of the new keyframe is compared with those of other keyframes; if two vectors are similar enough, a loop has been detected. How similar is similar enough? ORB-SLAM takes the lowest similarity between the new keyframe and its neighboring keyframes (those connected in the covisibility graph with a weight above 30) as a dynamic threshold, and another keyframe qualifies as a loop candidate only if its similarity exceeds this threshold (for robustness, three consecutive keyframes connected in the covisibility graph must satisfy this condition). Because ORB matching can be performed between the new keyframe and the loop keyframe, correspondences are also established between their map points, so the transformation between the two keyframes can be optimized. (The 2015 ORB-SLAM is monocular only and suffers from scale drift, so it computes a similarity transformation; in the 2016 ORB-SLAM2, with a stereo or RGB-D camera the scale is no longer unknown, so a rigid-body transformation can be computed directly.) In the loop fusion stage, the first step is to fuse duplicated map points and insert the loop edges into the covisibility graph. The poses of the new keyframe and its neighbors are then corrected using the previously computed transformation between the new keyframe and the loop keyframe, so that the two ends of the loop are roughly aligned. Next, the map points around the loop keyframe are projected into the new keyframe and the matched map points are fused. Finally, the poses of all keyframes are optimized over the essential graph (a reduced version of the covisibility graph that keeps all nodes but, beyond the spanning tree and loop edges, only keeps edges between keyframes sharing more than 100 map points), which distributes the loop closing error over all keyframes.
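The dynamic threshold can be sketched as follows. The score is the L1 similarity from the DBoW paper [2], s(v1, v2) = 1 - 0.5 * | v1/|v1| - v2/|v2| |; the function names and the sparse-dict representation are assumptions, and the check that three consecutive keyframes are consistent is left out for brevity.

```python
def bow_score(v1, v2):
    """L1 similarity between two sparse BoW vectors (dict: word id -> weight)."""
    n1 = sum(abs(w) for w in v1.values())
    n2 = sum(abs(w) for w in v2.values())
    diff = sum(abs(v1.get(k, 0.0) / n1 - v2.get(k, 0.0) / n2)
               for k in set(v1) | set(v2))
    return 1.0 - 0.5 * diff


def loop_candidates(new_kf, bow, neighbors):
    """Keyframes scoring above the worst score among covisible neighbors."""
    # Dynamic threshold: lowest similarity to the covisible neighbors.
    min_score = min(bow_score(bow[new_kf], bow[n]) for n in neighbors)
    return [kf for kf in bow
            if kf != new_kf and kf not in neighbors
            and bow_score(bow[new_kf], bow[kf]) > min_score]


bow = {
    "new": {1: 1.0, 2: 1.0},
    "n1": {1: 1.0, 2: 1.0, 3: 2.0},   # covisible neighbor of "new"
    "old_a": {1: 1.0, 2: 1.0},        # revisited place
    "old_b": {9: 1.0},                # unrelated place
}
candidates = loop_candidates("new", bow, ["n1"])
```

Using the neighbors' own score as the threshold adapts the test to the scene: in repetitive environments the neighbors also score low, so the bar drops accordingly instead of relying on a fixed absolute score.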
The 2015 ORB-SLAM held that after essential graph optimization the accuracy was already high enough that a global BA (which optimizes all keyframes and map points simultaneously) was not worth running, but ORB-SLAM2 added a global BA anyway (because global BA is computationally expensive, a new thread is spawned to run it exclusively, so that it does not block subsequent loop detection). When tracking fails, the system enters relocalization mode, which works similarly to loop detection.
ORB-SLAM has two problems: 1) Its computational cost is relatively high, the direct cause being that descriptors are extracted for every frame. 2) In practical tests, ORB-SLAM shows more jitter than SVO. My personal feeling is that this is because ORB-SLAM's map points are simply triangulated, and the additional constraints only serve to reject outliers without further modeling the uncertainty of the map points, whereas SVO's depth filter makes full use of multiple frames from a probabilistic perspective and inserts a map point only once its depth uncertainty has converged to a small interval.
References:
[1] Mur-Artal R, Montiel J M M, Tardós J D. ORB-SLAM: a versatile and accurate monocular SLAM system[J]. IEEE Transactions on Robotics, 2015, 31(5): 1147-1163.
[2] Gálvez-López D, Tardós J D. Bags of binary words for fast place recognition in image sequences[J]. IEEE Transactions on Robotics, 2012, 28(5): 1188-1197.
[3] Mur-Artal R, Tardós J D. ORB-SLAM2: an open-source SLAM system for monocular, stereo, and RGB-D cameras[J]. IEEE Transactions on Robotics, 2017, 33(5): 1255-1262.
Framework analysis of visual SLAM algorithms (2): ORB-SLAM