Framework Analysis of Visual SLAM Algorithms (3): SVO


Copyright notice: This is the blogger's original article and may not be reproduced without permission.

SVO (Semi-direct Visual Odometry) [1] is, as its name suggests, a visual odometry (VO) algorithm. Compared with ORB-SLAM, it drops loop-closure detection and relocalization, and does not attempt to build or maintain a global map. It prioritizes tracking quality, high computational speed, and low CPU usage, which makes SVO well suited to mobile devices with limited computing resources. SVO improves on PTAM mainly in two respects: 1) efficient feature matching, and 2) a robust depth filter. The main reason SVO is much faster than PTAM and ORB-SLAM is that it does not extract feature points in every frame. In the tracking thread, the feature points of the current frame are propagated from the previous frame by optical flow; features are extracted only when the mapping thread inserts a new keyframe. Another reason is the depth filter: PTAM and ORB-SLAM triangulate a map point from just two frames, and once a map point is not judged an outlier it stays fixed (unless adjusted during bundle adjustment), whereas SVO's depth filter keeps shrinking a map point's uncertainty across multiple frames to obtain a more reliable estimate. Because its map points are more reliable, SVO needs to maintain fewer of them (PTAM typically maintains about 160 to 220 feature points, while SVO maintains roughly 120 map points in fast mode), which further speeds up computation.

In the tracking thread, SVO combines the advantages of the direct method and the feature-point method in a new three-step strategy. The first step (sparse model-based image alignment) estimates the pose of the current frame in a Lucas-Kanade style: the map points seen in the previous frame are projected into the current frame (the relative pose is initialized to the identity), and the pose is solved by minimizing the photometric error between the patch around each feature point in the previous frame and the patch around its projection in the current frame. Because this step is only a coarse alignment, the patch size is 4×4 and no affine warp is applied, which keeps it fast. The second step (feature alignment) finds the exact pixel coordinates of each feature point in the current frame. For each feature point, the covisible keyframe closest in viewing angle to the current frame is chosen (the smaller the viewing-angle difference between that keyframe and the current frame, the smaller the patch deformation and the more accurate the match), and the photometric error is again minimized Lucas-Kanade style. In contrast to the coarse first step, this step uses an 8×8 patch and applies an affine warp, achieving sub-pixel accuracy. The third step (pose and structure refinement) uses the exact matches found in the first two steps to further optimize the camera pose and the map-point positions by minimizing reprojection error. (This step no longer involves patches: both map points and their projections are treated as points, and the cost function is a point-to-point distance.) It is divided into three sub-steps: 1) motion-only BA, which keeps the map points fixed and optimizes only the current frame's pose; 2) structure-only BA, which keeps the current frame fixed and optimizes only the map-point positions; 3) local BA, which jointly optimizes nearby keyframes and the map points they observe; this sub-step is skipped in fast mode.
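The patch alignment used in the first two steps can be sketched as an iterative least-squares problem. Below is a minimal 2-D translation-only Lucas-Kanade alignment in Python on a synthetic image; SVO itself optimizes an SE(3) pose (step 1) or a 2-D feature position with an affine warp (step 2) over many patches, so everything here (function names, the synthetic blob image, the patch location) is illustrative only:

```python
import numpy as np

def bilinear(img, x, y):
    """Bilinearly interpolate img (indexed [row, col]) at float (x, y)."""
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    dx, dy = x - x0, y - y0
    return ((1 - dx) * (1 - dy) * img[y0, x0] + dx * (1 - dy) * img[y0, x0 + 1]
            + (1 - dx) * dy * img[y0 + 1, x0] + dx * dy * img[y0 + 1, x0 + 1])

def align_patch(ref, cur, cx, cy, half=4, iters=30):
    """Gauss-Newton Lucas-Kanade: find the translation t that minimizes
    the summed squared photometric error cur(p + t) - ref(p) over a patch."""
    ys, xs = np.mgrid[-half:half, -half:half]
    px = (cx + xs.ravel()).astype(float)
    py = (cy + ys.ravel()).astype(float)
    tmpl = np.array([bilinear(ref, x, y) for x, y in zip(px, py)])  # fixed template
    t = np.zeros(2)
    for _ in range(iters):
        vals, gx, gy = [], [], []
        for x, y in zip(px + t[0], py + t[1]):
            vals.append(bilinear(cur, x, y))
            gx.append(bilinear(cur, x + 0.5, y) - bilinear(cur, x - 0.5, y))  # dI/dx
            gy.append(bilinear(cur, x, y + 0.5) - bilinear(cur, x, y - 0.5))  # dI/dy
        r = np.array(vals) - tmpl              # photometric residual
        J = np.stack([gx, gy], axis=1)         # Jacobian of residual wrt t
        t -= np.linalg.solve(J.T @ J, J.T @ r) # Gauss-Newton step
    return t

# Synthetic check: a smooth blob shifted by a known sub-pixel amount.
H = W = 48
yy, xx = np.mgrid[0:H, 0:W].astype(float)
blob = lambda x, y: np.exp(-((x - 20) ** 2 + (y - 22) ** 2) / 50.0)
ref = blob(xx, yy)
cur = blob(xx - 0.6, yy + 0.3)      # cur is ref shifted by (+0.6, -0.3) pixels
t = align_patch(ref, cur, 20, 22)   # t should be close to (0.6, -0.3)
```

On this smooth synthetic image the Gauss-Newton iteration recovers the known shift with sub-pixel accuracy, which is exactly what the 8×8 feature-alignment step relies on.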

SVO is called semi-direct because the first two steps, like the direct methods (represented by LSD-SLAM and DSO), minimize photometric error, while the third step, like the feature-point methods (represented by PTAM and ORB-SLAM), minimizes reprojection error. If the first step were omitted and the computation started directly from the second step, it would be slower: the patch would have to be made very large to match feature points over long displacements, and outliers would then need to be rejected. If the second and third steps were omitted, there would be serious accumulated drift, because the first step only relates consecutive frames, whereas the second and third steps align the current frame with keyframes and map points.
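The two cost functions being contrasted here can be made concrete with a toy pinhole model (the intrinsics, pose, 3-D point, observed pixel, and intensity values below are made-up numbers for illustration, not from the paper):

```python
import numpy as np

K = np.array([[500.0,   0.0, 320.0],   # made-up pinhole intrinsics
              [  0.0, 500.0, 240.0],
              [  0.0,   0.0,   1.0]])

def project(K, R, t, p_world):
    """Project a 3-D world point into pixel coordinates with a pinhole camera."""
    p_cam = R @ p_world + t            # world -> camera frame
    uv = K @ (p_cam / p_cam[2])        # perspective division, then intrinsics
    return uv[:2]

# Reprojection error (feature-point methods, and SVO's third step):
# pixel distance between an observed feature and the projected map point.
p = np.array([0.2, -0.1, 2.0])                         # hypothetical map point
uv_pred = project(K, np.eye(3), np.zeros(3), p)        # -> (370, 215)
uv_obs = np.array([371.0, 214.0])                      # hypothetical measurement
reproj_err = uv_obs - uv_pred                          # 2-D geometric residual

# Photometric error (direct methods, and SVO's first two steps):
# intensity difference between the reference patch and the patch at the
# projected location (a single pixel here for brevity).
intensity_ref, intensity_cur = 0.62, 0.60              # hypothetical intensities
photo_err = intensity_cur - intensity_ref              # 1-D photometric residual
```

The geometric residual needs an explicit 2-D feature measurement per point, while the photometric residual only needs raw image intensities, which is why the direct steps can skip per-frame feature extraction.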

In the mapping thread, SVO first decides whether the current frame is a keyframe (a new keyframe is inserted if the number of successfully tracked feature points falls below a threshold). If it is a keyframe, feature points are extracted and new depth filters are initialized; if not, the existing depth filters are updated and checked for convergence, and each converged filter yields a new map point that assists the tracking thread. SVO extracts feature points much like ORB-SLAM: it first builds an image pyramid, then divides each level into a grid and extracts the most salient FAST corner in each cell. (When there are not enough salient FAST corners, SVO 2.0 [2] takes the pixel with the largest gradient magnitude as an edgelet feature; the only computational difference from an ordinary FAST corner is that, in the feature-alignment step of the tracking thread, an edgelet is optimized only along its gradient direction.) Each feature point has its own depth filter. The depth measurement model is a weighted sum of a Gaussian and a uniform distribution (the authors argue this fits better than a single Gaussian): the Gaussian describes inlier measurements and the uniform distribution describes outliers. When a depth filter is initialized its uncertainty is very large (a stereo or RGB-D camera can directly supply an initial value close to the true depth, making the filter converge faster). Then, for each new frame whose pose has been estimated, a matching feature can be searched for along the epipolar line using the geometric constraint, and triangulation yields a posterior depth estimate with reduced uncertainty. When the uncertainty drops below a threshold, a map point is created and immediately used for motion estimation.
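The per-feature depth filter can be sketched in Python. The update below follows the Gaussian × Beta approximation of the Gaussian-plus-uniform measurement model from Vogiatzis and Hernández on which SVO's depth filter is based; the prior, the measurement variance `tau2`, and the measurement sequence are invented for illustration:

```python
import numpy as np

def update_seed(mu, sigma2, a, b, x, tau2, z_range):
    """One Bayesian update of a depth-filter seed under the
    Gaussian (inlier) + uniform (outlier) measurement model, with the
    posterior approximated as a Gaussian times a Beta distribution.
    mu, sigma2 : current Gaussian depth estimate (mean, variance)
    a, b       : Beta pseudo-counts of inlier / outlier evidence
    x, tau2    : new triangulated depth and its measurement variance
    z_range    : support of the uniform outlier distribution
    """
    norm_pdf = lambda v, m, s2: np.exp(-(v - m) ** 2 / (2 * s2)) / np.sqrt(2 * np.pi * s2)
    s2 = 1.0 / (1.0 / sigma2 + 1.0 / tau2)              # fused Gaussian variance
    m = s2 * (mu / sigma2 + x / tau2)                   # fused Gaussian mean
    C1 = a / (a + b) * norm_pdf(x, mu, sigma2 + tau2)   # inlier hypothesis weight
    C2 = b / (a + b) / z_range                          # outlier hypothesis weight
    C1, C2 = C1 / (C1 + C2), C2 / (C1 + C2)
    # First and second moments of the inlier ratio, used to refit the Beta.
    f = C1 * (a + 1) / (a + b + 1) + C2 * a / (a + b + 1)
    e = (C1 * (a + 1) * (a + 2) / ((a + b + 1) * (a + b + 2))
         + C2 * a * (a + 1) / ((a + b + 1) * (a + b + 2)))
    mu_new = C1 * m + C2 * mu
    sigma2_new = C1 * (s2 + m * m) + C2 * (sigma2 + mu * mu) - mu_new * mu_new
    a_new = (e - f) / (f - e / f)
    b_new = a_new * (1.0 - f) / f
    return mu_new, sigma2_new, a_new, b_new

# A vague prior converging toward a true depth of 2.0 despite two outliers.
mu, sigma2, a, b = 3.0, 4.0, 10.0, 10.0
for x in [2.01, 1.98, 2.02, 5.50, 2.00, 1.99, 7.30, 2.01, 2.00, 1.98]:
    mu, sigma2, a, b = update_seed(mu, sigma2, a, b, x, tau2=0.01, z_range=10.0)
```

Note how the two gross outliers (5.50 and 7.30) barely move the mean: their likelihood under the narrowing Gaussian is negligible, so the uniform component absorbs them and only the Beta outlier count grows. Once `sigma2` falls below the convergence threshold, a map point would be created.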

References:

[1] Forster C, Pizzoli M, Scaramuzza D. SVO: Fast semi-direct monocular visual odometry. IEEE International Conference on Robotics and Automation (ICRA), 2014: 15-22.

[2] Forster C, Zhang Z, Gassner M, et al. SVO: Semidirect visual odometry for monocular and multicamera systems. IEEE Transactions on Robotics, 2017, 33(2): 249-265.

[3] Vogiatzis G, Hernández C. Video-based, real-time multi-view stereo. Image and Vision Computing, 2011, 29(7): 434-441.
