Author: jangchul, Ph.D. in robotics, New York State University; automation, Tsinghua University. Research and development focus: map reconstruction, location tracking, autonomous obstacle avoidance and navigation for robots, and on-device and cloud algorithm optimization. Worked at Microsoft from 2009 to 2014 and at Magic Leap from 2014 to early 2016.
http://weibo.com/ttarticle/p/show?id=2309403994589869514382&mod=zwenzhang
SLAM has drawn wide attention in recent years thanks to robots, VR, and AR, and it has advanced on several fronts: sensors, algorithms, software, and hardware. This article briefly explains the definition and classification of SLAM, analyzes in detail how SLAM is currently applied in VR, AR, robotics, and other areas, discusses some engineering details of implementing SLAM in practice, and looks ahead to a future for SLAM that is only just beginning.
What is SLAM
According to Wikipedia's introduction: "Simultaneous Localization and Mapping (SLAM) is the computational problem of constructing or updating a map of an unknown environment while simultaneously keeping track of an agent's location within it." In the simplest and most essential terms, SLAM means that when a device (such as a robot or a VR device) arrives in a completely unfamiliar environment, it needs to establish accurate correspondences in time and space and answer the following questions well: Where was I, and where am I now? What do I see, and how does what I see now compare with what I saw before? What trajectory have I traveled? What does the world I see now look like, and how has it changed compared with the past? Is my trajectory jittering, and is my position drifting? Can I still track my own trajectory, and what should I do if I get lost? Is the knowledge of the world that I built in the past still useful? Can I quickly localize my current position within my existing abstraction of the world?
Main classification of SLAM
Robotic SLAM
From its earliest military uses to later applications in robotics, SLAM has been studied in ever greater depth. Robotic SLAM mainly relies on the Kalman filter and the particle filter. The Kalman filter is used in many engineering fields, and the earliest Kalman filters for robots assumed a linear system with Gaussian noise. Under those assumptions the classic Kalman filter gives the optimal solution directly, but reality is far more complex, so many variants of the Kalman filter exist. If the system is not linear or the noise is not Gaussian, a particle filter can be used instead: it generates many particles, each representing one hypothesis of the model state, and the particle set converges toward a result as it is updated against observations. The particle filter has practical problems of its own, such as the classic particle depletion problem and the engineering question of how to balance accuracy against convergence speed.
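As a minimal sketch of the linear, Gaussian case described above, the following scalar Kalman filter estimates a constant quantity from noisy measurements. The constant-state model, the function names, and the noise variances `q` and `r` are assumptions chosen for illustration; they do not come from any particular SLAM system.

```python
def kalman_step(x, p, z, q, r):
    """One predict/update cycle of a scalar Kalman filter.

    x, p: previous state estimate and its variance
    z:    new measurement
    q, r: assumed process and measurement noise variances
    """
    # Predict: the assumed model is x_k = x_{k-1} + noise with variance q.
    x_pred = x
    p_pred = p + q
    # Update: blend prediction and measurement using the Kalman gain.
    gain = p_pred / (p_pred + r)
    x_new = x_pred + gain * (z - x_pred)
    p_new = (1.0 - gain) * p_pred
    return x_new, p_new


def run_filter(measurements, x0=0.0, p0=1.0, q=1e-4, r=0.25):
    """Run the filter over a sequence of measurements and return (x, p)."""
    x, p = x0, p0
    for z in measurements:
        x, p = kalman_step(x, p, z, q, r)
    return x, p
```

Feeding the filter repeated measurements of the same value pulls the estimate toward that value while the variance shrinks, which is the behavior the optimality claim refers to when the model assumptions actually hold.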
PTAM
The PTAM (Parallel Tracking and Mapping) architecture is more of a system design: pose tracking (Tracking) and map building (Mapping) run as two parallel threads, which is essentially a multithreaded design for SLAM. By today's standards PTAM looks elementary, but at the time it was pioneering. For the first time it showed that map optimization could be integrated into real-time computation and that the whole system could actually run. Specifically, the tracking thread does not modify the map; it uses the known map to track quickly, while the mapping thread concentrates on building, maintaining, and updating the map. Even if the mapping thread takes a little longer, the tracking thread still has a map to track against (as long as the device is still within the area already mapped). This is one benefit of doing the two things in parallel. The practical problem is that if the map is built or optimized too slowly, the tracking thread can easily get lost because it has no up-to-date or optimized map. Another practical engineering question is how the tracking thread obtains the mapping thread's latest map data, whether by locking shared data or by copying data between threads, along with the implementation quality of the threading itself.
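The lock-or-copy question above can be sketched with Python's `threading` module. This is only an illustration of one option (publishing an immutable copy under a short lock), not PTAM's actual implementation; the class `SharedMap`, the worker functions, and the fake keyframe data are all hypothetical.

```python
import threading


class SharedMap:
    """Map written by the mapping thread; readers get immutable snapshots."""

    def __init__(self):
        self._lock = threading.Lock()
        self._snapshot = ()  # immutable tuple of map points

    def publish(self, points):
        new_snapshot = tuple(points)  # the copy is made outside the lock
        with self._lock:              # the lock only guards the swap
            self._snapshot = new_snapshot

    def latest(self):
        with self._lock:
            return self._snapshot


def mapping_worker(shared, keyframes):
    points = []
    for keyframe_points in keyframes:
        points.extend(keyframe_points)  # stand-in for triangulation/optimization
        shared.publish(points)


def tracking_worker(shared, n_frames, seen_sizes):
    for _ in range(n_frames):
        snapshot = shared.latest()        # never blocks on a long map update
        seen_sizes.append(len(snapshot))  # stand-in for pose estimation


def run_demo():
    shared = SharedMap()
    keyframes = [[(float(i), float(i), 1.0)] * 3 for i in range(5)]
    seen_sizes = []
    mapper = threading.Thread(target=mapping_worker, args=(shared, keyframes))
    tracker = threading.Thread(target=tracking_worker, args=(shared, 100, seen_sizes))
    mapper.start()
    tracker.start()
    mapper.join()
    tracker.join()
    return len(shared.latest()), seen_sizes
```

The trade-off in this sketch is that the tracker may work from a slightly stale snapshot, and copying costs time and memory as the map grows. Locking the live map avoids the copy but lets a slow map update stall tracking.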
Sparse SLAM
What is commonly called sparse SLAM falls architecturally into two major categories: filter based and keyframe based. The filters here are much more complicated than the earlier robotic SLAM filters. The most representative is EKF SLAM, whose core idea is a linear approximation of a nonlinear system. In the simplest case of a single variable, the approximation is expressed with the current model value and its derivative; with multiple variables, the derivative becomes the Jacobian matrix. A complete filter-based SLAM system needs to balance the size of the filter state against computation time, and in real engineering implementations it needs block-wise matrix updates, because directly inverting a high-dimensional sparse matrix is prohibitively expensive. The core idea of keyframe-based SLAM is the keyframe: because using every image to create or update the map costs too much computation, a number of good keyframes are selected from the image stream to create and update the map. The map in PTAM is built from keyframes. This approach has been widely accepted in the industry. Keyframe selection is itself a deep subject, as are maintaining and updating the local and global maps and balancing all of this against efficiency.
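To make the linear approximation concrete, here is a small sketch that linearizes a range measurement around the current state estimate using its Jacobian. The range model (distance from a 2D position to a known landmark) and all function names are assumptions picked for illustration; a real EKF SLAM state also contains orientation and landmark positions.

```python
import math


def measure_range(state, landmark):
    """Nonlinear measurement model h(x): distance from state (x, y) to landmark."""
    return math.hypot(landmark[0] - state[0], landmark[1] - state[1])


def range_jacobian(state, landmark):
    """Jacobian of h with respect to (x, y), evaluated at `state`."""
    dx = landmark[0] - state[0]
    dy = landmark[1] - state[1]
    r = math.hypot(dx, dy)
    return (-dx / r, -dy / r)


def linearized_range(state0, delta, landmark):
    """First-order approximation: h(x0 + d) is roughly h(x0) + H(x0) * d."""
    jac = range_jacobian(state0, landmark)
    return measure_range(state0, landmark) + jac[0] * delta[0] + jac[1] * delta[1]
```

With the state at the origin and the landmark at (3, 4), the Jacobian is (-0.6, -0.8), and for a small step such as (0.01, -0.02) the linearized prediction stays very close to the true nonlinear value. The approximation degrades as the step grows, which is one reason filter consistency is a practical concern.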
Dense SLAM
Dense SLAM is the other major category of SLAM. Sparse versus dense refers to how sparse or dense the map points are. As a simple example, a sparse map is computed by triangulation, and 1000 points are enough for one bedroom. A dense map, by contrast, usually comes from a depth sensor with an active light source (such as Intel's RealSense, which uses an ASIC). Assuming each depth frame has a resolution of 640x480, even if 2/3 of the depth values are invalid there are still about 100,000 3D points, so sparse and dense maps usually differ by about two orders of magnitude. Because dense SLAM has ample map information, it is well suited to fine 3D reconstruction. If a dense SLAM system builds the map from scratch while tracking in real time, the depth data of every valid pixel in each frame is used for both mapping and tracking. The representative dense SLAM system is KinectFusion, which has many variants and successors, such as ElasticFusion and DynamicFusion.
Recent SLAM "mutants"
DTAM: Dense Tracking and Mapping
DTAM inherits the keyframe framework, but it processes keyframes very differently from traditional feature point extraction. Instead of extracting sparse feature points from each frame, DTAM uses a direct method: under the assumption that scene brightness stays constant (the brightness constancy assumption), it back-computes depth for every pixel. It extracts and continuously refines depth to build a dense map and to achieve stable pose tracking. A comparison of raw numbers shows the difference between DTAM and PTAM: one DTAM keyframe has depth estimates for 300,000 pixels, while PTAM typically has at most 1000. The advantages and disadvantages of DTAM are obvious (see Figure 1): it is accurate and stable, but speed is a problem. Every pixel has to be computed, and although this parallelizes easily on a GPU, power consumption and the difficulty of productization rise accordingly.
Fig. 1 Comparison of keyframe processing methods; picture from Jakob Engel's ICCV 2015 slides
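The direct method described above can be illustrated with a toy one-dimensional search. For one pixel in a reference scanline, the sketch tries several candidate shifts into a second scanline and keeps the one with the lowest photometric (intensity difference) cost, relying on brightness constancy. This is not DTAM's actual pipeline, which optimizes over many frames and adds regularization; the function names, the patch size, and the use of a pure horizontal shift as a stand-in for depth are assumptions for illustration.

```python
def photometric_cost(ref_row, other_row, u, shift, half_window=1):
    """Sum of absolute intensity differences between the patch around
    ref_row[u] and the patch around other_row[u - shift]."""
    cost = 0.0
    for k in range(-half_window, half_window + 1):
        cost += abs(ref_row[u + k] - other_row[u + k - shift])
    return cost


def best_shift(ref_row, other_row, u, max_shift, half_window=1):
    """Return the candidate shift with the lowest photometric cost.

    Only shifts that keep the whole patch inside other_row are tried.
    """
    candidates = [s for s in range(max_shift + 1) if u - half_window - s >= 0]
    return min(
        candidates,
        key=lambda s: photometric_cost(ref_row, other_row, u, s, half_window),
    )
```

For a sideways-translating camera with a known baseline, a larger shift corresponds to a closer point, so choosing the best shift per pixel amounts to choosing a depth per pixel. Repeating this for every pixel is what makes the approach both dense and expensive.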
Semi-dense LSD-SLAM: Semi-Dense Large-Scale Direct SLAM
Note that the "semi" in LSD-SLAM refers to the number of pixels: depth is estimated only for regions that carry "information", rather than for every pixel as in DTAM. Put more vividly (though not with 100% academic rigor), it estimates depth only in "textured" places and not in textureless ones, the ultimate monster that every SLAM practitioner fears: the big white wall. See Figure 2: the upper right is the semi-dense approach, the lower left is the sparse approach, and the lower right is the dense approach. The depth estimation itself also uses a direct method, similar to DTAM, so it is not described further here. As for speed, the semi-dense approach can run in real time on a computer with only a Core i7 processor.
Figure 2 Semi-dense; the picture is from "Semi-Dense Visual Odometry for a Monocular Camera", Jakob Engel, Jürgen Sturm, and Daniel Cremers, 2013
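One simple way to picture "only estimate textured places" is to keep just the pixels whose intensity gradient is large. The sketch below does this with central differences on a grayscale image stored as a list of rows. The threshold value and the function name are assumptions for illustration; this is not LSD-SLAM's actual selection code.

```python
def semi_dense_mask(image, threshold):
    """Return (row, col) positions whose gradient magnitude exceeds threshold.

    `image` is a list of equally long rows of grayscale intensities.
    Border pixels are skipped because central differences need both neighbors.
    """
    rows, cols = len(image), len(image[0])
    selected = []
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            gx = (image[r][c + 1] - image[r][c - 1]) / 2.0
            gy = (image[r + 1][c] - image[r - 1][c]) / 2.0
            if (gx * gx + gy * gy) ** 0.5 > threshold:
                selected.append((r, c))
    return selected
```

A uniformly white image yields an empty selection, which is the "big white wall" case where no depth can be estimated. An image containing an edge yields pixels only along that edge.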
VIO: Visual-Inertial Odometry
VIO differs from the SLAM approaches above in two main ways. First, VIO requires sensor fusion at the hardware level, combining a camera with a six-axis IMU (accelerometer plus gyroscope): the camera produces images, and the IMU produces acceleration and angular velocity. The camera is relatively accurate but relatively slow. If the raw acceleration from the IMU is integrated directly, the result drifts away within a very short time, but the IMU runs at a very high frequency, reaching 200 Hz on mobile phones. Second, VIO uses more complex and effective Kalman filtering, such as MSCKF (Multi-State Constraint Kalman Filter), which focuses on fast pose tracking. It does not spend effort maintaining a global map, and it does not run the global optimization (bundle adjustment) that keyframe-based SLAM applies to its map. The most notable commercial examples are Google's Project Tango and Flyby Media, which has been acquired by Apple. The second generation of Project Tango is a tablet with an NVIDIA TK1 and a depth camera with an active light source. This hardware is a dream companion for anyone working on algorithms, but it is not discussed in detail here.
What kind of SLAM to use in different applications
The author has always held that SLAM is like any other technology. On one hand, from a research and development perspective, the technology needs to reach a higher academic or industrial standard. On the other hand, the technology must land in real products. Pushing pure technology to a score of 100 certainly has its value, but pursuing technical or mathematical "beauty" while completely disregarding engineering implementation and product requirements is likely to go astray. SLAM today has a great many fine-grained branches. Large companies such as Microsoft, Apple, Google, and Facebook can afford years of core algorithm and software development, but the vast majority of small and medium-sized companies, and large companies that had not previously invested in this area, urgently need the technology and must base their decisions on their own product plans and specific needs. The following covers applications of SLAM in several directions or industries:
VR Products: The essence of VR is to let users experience a completely different virtual world through immersion, while SLAM is the perception and understanding of the real world. If a VR product needs SLAM, it must be at the point where the virtual world and the real world meet. At present, the three major manufacturers (Oculus, Sony, and HTC) each have their own outside-in tracking. Most VR products without an outside-in tracking solution can only track the user's head rotation through a six-axis IMU and cannot track the user's translation, whereas SLAM can solve six-degree-of-freedom tracking. Whether a VR product also needs the mapping part of SLAM, in what form, and for which scenarios, requires further thought.
AR Products: The essence of AR is the seamless fusion of virtual elements into reality. Compared with VR, AR products are much harder in algorithms, software complexity, hardware complexity, and mass production. For AR, SLAM is not a "nice to have" but a "must have". Moreover, SLAM as the technology for perceiving the world is only one of the technologies an AR product must possess. Learning about and interpreting the world, the content to display, the quality of the optical display, hardware comfort, hardware production capacity, and more all have to be solved as well, which takes a great deal of people and money and a willingness to work overtime.
Robot Products: In his 2007 New Year outlook, Bill Gates put forward the vision of "A Robot in Every Home". Nearly 10 years later, robots come in a great many varieties, including industrial, service, and household robots. If a robot needs to explore autonomously, for example localization, mapping, following, monitoring, path planning, recognition, and response, then SLAM is likewise a "must have". The specific type of SLAM varies with the kind of robot and its application.
Industry applications: Different industries have their own specific needs. Take children's toys: if the goal is only to track a toy card, marker-based tracking should be enough for a basic AR effect. Or take a real estate company that wants to offer VR house viewing: it needs to consider what kind of experience to give the user, a 360-degree HD panorama or a 3D mesh. If it is a 360-degree panorama from a few locations inside a house, SLAM is not needed. If it is a true 3D reconstruction (dense reconstruction), a dense map is required, and the company needs to consider the minimum map resolution that is acceptable for what the user sees. Because the 3D cameras on current phones and tablets generally have VGA resolution (640x480), a 3D model reconstructed this way does not look photorealistic. If computing time and cost are not a concern, a precisely calibrated system combining a high-precision, high-end lidar with an HD camera should deliver a very good experience. As another example, a game company building a VR or AR game needs six-degree-of-freedom pose tracking of the user. It needs very accurate location and map information, but the map is used only for localization, so a sparse map that is updated accurately and quickly is sufficient.
The "love, hate, and enmity" of SLAM in practical applications
First, SLAM requires some basic mathematical knowledge, including matrices, calculus, numerical computation, and spatial geometry. It also requires basic knowledge of computer vision, including feature points, maps, multi-view geometry, bundle adjustment, filters, and camera models. This knowledge takes some foundation and accumulation, but it does not require a pure mathematics background. SLAM is generally programmed in C++. Optimizing for a particular instruction set or platform may require knowledge of and experience with SSE, NEON, or GPUs. It also takes some experience and feel for system design, solid hands-on ability, and both the ability and the willingness to write code. Overall, the threshold is a certain background in mathematics and engineering, a certain grounding in computer vision, a certain amount of programming experience, and, most critically, the willingness to write code.
Second, SLAM emphasizes both real-time performance and accuracy. SLAM is a large system, and real-time systems generally run multiple threads concurrently. Resource allocation, read/write coordination, map data management, optimization and accuracy, the uncertainty of some key parameters and variables, and high-speed, high-precision pose tracking (for example, VR/AR applications must reach at least 90 fps to address motion sickness and rendering quality) are all challenges that must be solved.
Third, SLAM is hard to adapt to hardware and even harder to integrate into a system. SLAM data comes from sensors, and more and more kinds of SLAM fuse multiple sensors, so sensor quality has a great impact on SLAM results. For example, suppose a SLAM system uses a camera whose images are very noisy even when the camera is stationary and the lighting is completely unchanged. Pose tracking stability will then be very poor, because feature point extraction will be highly inconsistent. As another very practical example, if multiple sensors are used (a camera and a six-axis IMU) and their timestamps are inconsistent (they should agree to at least the millisecond level), the algorithm will also suffer. Calibrating multiple sensors and tuning the dozens or even hundreds of parameters of the whole system are very practical and time-consuming tasks.
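One common way to handle sensors that sample at different times is to interpolate the faster stream at the timestamps of the slower one. The sketch below linearly interpolates a scalar IMU reading at a camera timestamp. It assumes both timestamps are already expressed on the same clock, which is itself a calibration task; the function name and the data layout are hypothetical.

```python
from bisect import bisect_left


def interpolate_imu(imu_times, imu_values, t_cam):
    """Linearly interpolate a scalar IMU reading at camera time t_cam.

    imu_times must be sorted ascending and expressed on the same clock
    as t_cam. Raises ValueError if t_cam is outside the IMU time range.
    """
    if not imu_times or t_cam < imu_times[0] or t_cam > imu_times[-1]:
        raise ValueError("camera timestamp outside IMU sample range")
    i = bisect_left(imu_times, t_cam)  # first sample at or after t_cam
    if imu_times[i] == t_cam:
        return imu_values[i]
    t0, t1 = imu_times[i - 1], imu_times[i]
    weight = (t_cam - t0) / (t1 - t0)
    return imu_values[i - 1] + weight * (imu_values[i] - imu_values[i - 1])
```

With IMU samples 5 ms apart (200 Hz), a camera frame that falls halfway between two samples receives the average of the two readings. Refusing to extrapolate outside the IMU range is a deliberate choice here, since a frame without surrounding IMU data usually signals a synchronization problem.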
In addition to its mathematical demands, SLAM currently has many engineering problems. Solving them takes settling down, stumbling through the various pitfalls at least twice, and fixing them one by one. Someone who only reads existing code and believes the math and algorithms are all under control, yet has grand ambitions with weak hands-on skills or does not write code at all, is a frightening prospect for a team that really wants to do product development. Because hardware systems and product applications differ so much from field to field, a great deal of work remains before SLAM is realized everywhere, but the author believes big breakthroughs will come in the near future. However, because SLAM is complex and many algorithms and products still need to be built on top of it, it is foreseeable that SLAM-related product development will need substantial manpower and resources for a long time to come.
Fig. 3 SLAM's future is only just beginning
SLAM's future is just beginning
Over the past two years, as VR, AR, and robotics have grown hotter in both capital and consumer markets, two trends have become increasingly clear across SLAM-related sensors, algorithms, software, and hardware: small companies innovate rapidly in key niche areas, while large companies position themselves in every key direction and make frequent acquisitions. Software companies are moving into hardware, hardware companies are working on algorithms, large companies are eager to own their technology and hardware, and small companies hope to implement and ship quickly. Products in every SLAM-related area are changing rapidly. Large and medium-sized companies that need SLAM should therefore act as soon as possible, while for start-ups, focusing on SLAM research and development, building products, or going deep into one application or industry are all feasible paths.
Because products and hardware are highly differentiated, while integrating and optimizing SLAM-related technology is very complex, algorithms and software are highly fragmented. The market therefore has no general-purpose solution today and will not have one in the short term. As mentioned above, SLAM is the perception and understanding of the world and is the skeleton that supports VR, AR, and robots, but much work remains between that skeleton and a truly polished user experience. SLAM's future is just beginning. Meanwhile, mobile hardware computing power is far from sufficient, so SLAM-related technology can move, and is moving, from the software and algorithm level into hardware. The author believes this process will certainly give rise to a number of new companies.
From the standpoint of personal career development, the author strongly encourages like-minded people to work in VR-, AR-, and robot-related computer vision, sensors, and adjacent fields. Doing interesting and challenging work while one's career rises with the overall trend is a very good choice.
Author's statement: This article represents only the author's personal opinions. It does not represent the technical views, technical direction, or specific implementations of any company the author has worked for.