Objective
In this section we delve into the main optimization method used in visual SLAM: graph optimization (graph-based optimization). In the next section, we describe a very popular graph optimization library: g2o.
I already wrote a document about g2o back in 2013, but as my understanding deepened I grew increasingly dissatisfied with it. In the spirit of being more responsible to readers, this article retells graph optimization and g2o from the start. Besides this document, readers can also consult a blog about graph optimization: http://blog.csdn.net/heyijia0327. That article walks through a simple example, while this one focuses on understanding and explaining graph optimization and g2o.
This section mainly introduces the mathematical theory of graph optimization; the next section covers the structure and usage of g2o.
Prerequisites: Optimization
Graph optimization is essentially an optimization problem, so let us first look at what an optimization problem is.
An optimization problem has three essential ingredients: an objective function, optimization variables, and optimization constraints. A simple optimization problem can be described as follows: \[\begin{equation} \min\limits_{x} F(x) \end{equation}\] where $x$ is the optimization variable and $F(x)$ is the objective function. This problem is called unconstrained optimization because we impose no constraints of any kind. Since the optimization problems in SLAM are unconstrained, we focus on the unconstrained form.
When $F(x)$ has certain special properties, the corresponding optimization problem can be solved with specialized methods. For example, when $F(x)$ is a linear function, the problem is a linear optimization problem (although linear optimization is usually discussed with constraints); otherwise it is a nonlinear optimization problem. For unconstrained nonlinear optimization, if we know the analytic form of the gradient, we can solve the problem directly by looking for the points where the gradient is zero:
\[\begin{equation} \frac{dF}{dx} = 0 \end{equation}\]
A point where the gradient is zero may be a maximum, a minimum, or a saddle point of the function. Since we do not know the form of $F(x)$ in advance, we would have to examine all such extremal points and take the smallest as the optimal solution.
But why don't we do this in practice? Because very often the form of $F(x)$ is too complex: we cannot write down the analytic form of its derivative, or we cannot solve the equation that sets the derivative to zero. Therefore, most of the time we solve the problem iteratively: starting from an initial value $x_0$, we repeatedly find a direction in which the objective function decreases (for example, the negative gradient), take a step along that direction, and thereby reduce the function value a little. Iterating in this way, we can in principle find a local minimum of any function.
The iteration strategy is mainly reflected in how to choose the descent direction and how to choose the step size. The two main methods are Gauss-Newton (GN) and Levenberg-Marquardt (LM); their details can be found on the wiki, and we will not elaborate here. Just keep in mind that they differ mainly in the iteration strategy, but both look for a gradient and iterate.
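To make the iterative idea concrete, here is a minimal sketch (not GN or LM themselves, just plain gradient descent) on a made-up one-dimensional function; the function, step size, and stopping rule are chosen purely for illustration.

```cpp
#include <cmath>
#include <iostream>

// Toy objective F(x) = (x - 2)^2 + 1 and its analytic derivative.
double F(double x)  { return (x - 2.0) * (x - 2.0) + 1.0; }
double dF(double x) { return 2.0 * (x - 2.0); }

int main() {
    double x = 10.0;          // initial value x_0
    const double step = 0.1;  // fixed step size (GN/LM pick this more cleverly)
    for (int iter = 0; iter < 100; ++iter) {
        double g = dF(x);
        if (std::fabs(g) < 1e-8) break;  // gradient is (numerically) zero: stop
        x -= step * g;                   // one step along the descent direction
    }
    std::cout << "x* = " << x << ", F(x*) = " << F(x) << "\n";
    return 0;
}
```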
Graph Optimization
So-called graph optimization expresses an ordinary optimization problem in the form of a graph.
What is a graph?
A graph is a structure composed of vertices and edges, and graph theory is the study of graphs. We write a graph as $G=\{V, E\}$, where $V$ is the set of vertices and $E$ the set of edges.
There is not much to say about vertices; just think of them as ordinary points.
What about edges? An edge connects one or more vertices and represents a relationship between them. Edges can be directed or undirected, and the corresponding graphs are called directed or undirected graphs. An edge may connect one vertex (a unary edge), two vertices (a binary edge), or several vertices (a hyper-edge). The most common kind of edge connects two vertices. When a graph contains edges connecting more than two vertices, it is called a hyper-graph. The SLAM problem can be expressed as such a hyper-graph (when there is no ambiguity, we will simply call it a graph below).
How do we express the SLAM problem as a graph?
The core of SLAM is to estimate the robot's trajectory and the map from the observed data. Suppose that at time $k$ the robot is at pose $x_k$ and obtains an observation $z_k$ with its sensor. The observation equation of the sensor is:
\[\begin{equation} z_k = h\left( x_k \right) \end{equation}\]
Because of noise, $z_k$ cannot be exactly equal to $h(x_k)$, so there is an error:
\[\begin{equation} e_k = z_k - h\left( x_k \right) \end{equation}\]
Now, if we take $x_k$ as the optimization variable and $\min\limits_{x_k} F_k(x_k) = \| e_k \|$ as the objective function, we can obtain an estimate of $x_k$, which is exactly what we want. This is, in essence, the idea of solving SLAM by optimization.
You keep speaking of the optimization variable $x_k$ and the observation equation $z_k = h(x_k)$, but what exactly are they?
That depends on our parameterization. $x$ can be a robot pose (a $4\times 4$ transformation matrix $\mathbf{T}$ with 6 degrees of freedom, or a planar position and heading $[x,y,\theta]$ with 3 degrees of freedom), or it can be a point in space (three-dimensional $[x,y,z]$ or two-dimensional $[x,y]$). Accordingly, the observation equation also takes many forms, for example:
- The transformation between two robot poses;
- The robot measures a point in space with a laser from some pose and obtains its range and bearing relative to itself;
- The robot observes a point in space with a camera from some pose and obtains its pixel coordinates;
Their concrete forms are likewise diverse, which allows us to discuss the SLAM problem without being confined to a particular sensor or pose representation.
I understand what is being optimized, but how is it expressed as a graph?
In the graph, optimization variables are represented by vertices, and observation equations are represented by edges. Since an edge may connect one or more vertices, we write the observation equation in the more general form $z_k = h(x_{k1}, x_{k2}, \ldots)$ to indicate that the number of vertices is not limited. What, then, are the vertices and edges of the three observation equations just mentioned?
- The transformation between two robot poses: a binary edge; the vertices are the two poses, and the observation equation is $T_1 = \Delta T \cdot T_2$.
- The robot measures a point in space with a laser from some pose and obtains its range and bearing: a binary edge. The vertices are a 2D pose $[x,y,\theta]^T$ and a point $[\lambda_x, \lambda_y]^T$; the observed data are the range $r$ and the bearing $b$, so the observation equation is:
\[\begin{equation}
\left[ \begin{array}{c} r \\ b \end{array} \right] =
\left[ \begin{array}{c}
\sqrt{ (\lambda_x - x)^2 + (\lambda_y - y)^2 } \\
\tan^{-1}\left( \frac{\lambda_y - y}{\lambda_x - x} \right) - \theta
\end{array} \right]
\end{equation}\]
- The robot observes a point in space with a camera from some pose and obtains its pixel coordinates: a binary edge. The vertices are a 3D pose $T$ and a space point $\mathbf{x} = [x,y,z]^T$; the observed data are the pixel coordinates $z=[u,v]^T$, so the observation equation is: \[\begin{equation} z = C \left( R \mathbf{x} + t \right) \end{equation}\]
Here $C$ is the camera intrinsic matrix, and $R, t$ are the rotation and translation.
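To make this last observation equation concrete, here is a minimal sketch (assuming the Eigen library, and with the division by depth written out explicitly, which the compact formula $z = C(R\mathbf{x}+t)$ leaves implicit) of the resulting edge error $e = z - h(T, \mathbf{x})$:

```cpp
#include <Eigen/Dense>

// Sketch: reprojection error of a camera edge, e = z - proj(R * X + t).
// fx, fy, cx, cy play the role of the intrinsic matrix C.
Eigen::Vector2d reprojectionError(const Eigen::Vector2d& z,   // measured pixel [u, v]
                                  const Eigen::Matrix3d& R,   // rotation
                                  const Eigen::Vector3d& t,   // translation
                                  const Eigen::Vector3d& X,   // 3D point
                                  double fx, double fy, double cx, double cy) {
    const Eigen::Vector3d p = R * X + t;  // point expressed in the camera frame
    const Eigen::Vector2d z_hat(fx * p.x() / p.z() + cx,
                                fy * p.y() / p.z() + cy);
    return z - z_hat;                     // error vector of this binary edge
}
```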
These examples are meant to give the reader a concrete feel for what vertices and edges are. Since a robot may carry many kinds of sensors, we do not restrict what vertices and edges look like after parameterization. For example, the robot I work with (hiding inside Little Carrot's body) carries a laser, a camera, an IMU, wheel encoders, ultrasonic sensors, and more, all used for SLAM. To solve the whole problem, my graph contains many kinds of vertices and edges. But whatever they are, the problem can be solved with graph optimization.
(Little Carrot's expression designed by Orchid Zhang. If I can't find a job later, I'll just go and become an illustrator...)
How to Do Graph Optimization
Now let us look at how graph optimization is actually done. Suppose a graph has $n$ edges; its objective function can be written as:
\[\begin{equation} \min\limits_{x} \sum\limits_{k = 1}^n e_k\left( x_k, z_k \right)^T \Omega_k\, e_k\left( x_k, z_k \right) \end{equation}\]
A few remarks about this objective function. They are important, so please read them carefully.
- The function $e$ is, in principle, an error and is a vector; it measures how well the optimization variable $x_k$ fits the observation $z_k$. The larger it is, the worse $x_k$ matches $z_k$. However, the objective function must be a scalar, so we take a quadratic form of the error. The simplest is the plain square $e(x,z)^T e(x,z)$. Going further, to express how much we care about the different components of the error, we weight them with an information matrix $\Omega$ (a small sketch of this weighted cost follows this list).
- The information matrix $\Omega$ is the inverse of the covariance matrix and is symmetric. Each of its elements $\Omega_{i,j}$ acts as a coefficient of the product $e_i e_j$ and can be seen as expressing how much we weight the correlation between the error components $e_i$ and $e_j$. The simplest choice is to make $\Omega$ diagonal; the size of each diagonal element then indicates how much importance we attach to the corresponding error component.
- The $x_k$ here may refer to one vertex, two vertices, or several vertices, depending on the actual type of the edge. Strictly speaking, we should write $e_k(z_k, x_{k1}, x_{k2}, \ldots)$, but that is too cumbersome, so we keep the simpler notation. Since $z_k$ is known, to keep the notation concise we will also write the error as $e_k(x_k)$.
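As mentioned in the list above, here is a tiny sketch (assuming Eigen; the names are illustrative) of the weighted scalar cost $e^T \Omega e$ contributed by one edge:

```cpp
#include <Eigen/Dense>

// Scalar contribution of one edge: e^T * Omega * e.
// Omega is the information matrix (inverse covariance); a diagonal Omega
// simply weights each error component individually.
double edgeCost(const Eigen::VectorXd& e, const Eigen::MatrixXd& Omega) {
    return e.dot(Omega * e);
}
```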
The overall optimization problem then becomes a sum over the $n$ edges:
\[\begin{equation} \min\limits_{x} F(x) = \sum\limits_{k = 1}^n e_k\left( x_k \right)^T \Omega_k\, e_k\left( x_k \right) \end{equation}\]
Once again, edges come in many forms: unary, binary, or multi-vertex, and their mathematical expression depends on the sensor or on what you wish to describe. For example, in visual SLAM, if we observe a space point $\mathbf{x}_k$ from a camera pose $T_k$ and obtain the measurement $z_k$, then the cost term of this binary edge is $$ F_k\left( \mathbf{x}_k, T_k, z_k \right) = \left( z_k - C\left( R \mathbf{x}_k + t \right) \right)^T \Omega_k \left( z_k - C\left( R \mathbf{x}_k + t \right) \right). $$ A single edge is really not complicated.
Now we have a graph made of many nodes and edges, forming one large optimization problem. We have no wish to expand its mathematical form explicitly; we only care about its optimal solution. To solve it iteratively, we need two things: an initial point and an iteration direction. For mathematical convenience, consider first the $k$-th edge $e_k(x_k)$.
Suppose its current estimate is ${\widetilde x}_k$ and we give it an increment $\delta x$; then the edge's cost becomes $F_k({\widetilde x}_k + \delta x)$, and the error changes from $e_k({\widetilde x}_k)$ to $e_k({\widetilde x}_k + \delta x)$. Expanding the error term to first order:
\[\begin{equation} e_k\left( {\widetilde x}_k + \delta x \right) \approx e_k\left( {\widetilde x}_k \right) + \frac{d e_k}{d x_k} \delta x = e_k + J_k \delta x \end{equation}\]
Here $J_k$ is the derivative of $e_k$ with respect to $x_k$; in matrix form it is a Jacobian. We have made a linearity assumption near the current estimate: the function value can be approximated by its first-order derivative. This, of course, no longer holds when $\delta x$ is large.
Thus, for the objective-function term of the $k$-th edge, expanding further:
$$\begin{array}{lll}
F_k\left( {\widetilde x}_k + \delta x \right) &=& e_k\left( {\widetilde x}_k + \delta x \right)^T \Omega_k\, e_k\left( {\widetilde x}_k + \delta x \right) \\
& \approx & \left( e_k + J_k \delta x \right)^T \Omega_k \left( e_k + J_k \delta x \right) \\
&=& e_k^T \Omega_k e_k + 2 e_k^T \Omega_k J_k \delta x + \delta x^T J_k^T \Omega_k J_k \delta x \\
&=& C_k + 2 b_k^T \delta x + \delta x^T H_k \delta x
\end{array}$$
To a practiced student, this derivation is as simple as $(a+b)^2 = a^2 + 2ab + b^2$ (in essence, it is). The last line is just a rearrangement: we collect the terms unrelated to $\delta x$ into a constant $C_k$, write the coefficient of the first-order term as $2 b_k$ with $b_k = J_k^T \Omega_k e_k$, and write the quadratic coefficient as $H_k = J_k^T \Omega_k J_k$ (note that this quadratic coefficient plays the role of the Hessian matrix).
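In code, the per-edge bookkeeping amounts to two matrix products. A minimal sketch (assuming Eigen; the error $e_k$ and Jacobian $J_k$ are assumed to be already evaluated at the current estimate):

```cpp
#include <Eigen/Dense>

// Per-edge Gauss-Newton quantities H_k = J_k^T * Omega_k * J_k and
// b_k = J_k^T * Omega_k * e_k, evaluated at the current estimate.
struct EdgeTerms {
    Eigen::MatrixXd H;  // quadratic coefficient (approximate Hessian block)
    Eigen::VectorXd b;  // linear coefficient
};

EdgeTerms linearizeEdge(const Eigen::VectorXd& e_k,
                        const Eigen::MatrixXd& J_k,
                        const Eigen::MatrixXd& Omega_k) {
    EdgeTerms out;
    out.H = J_k.transpose() * Omega_k * J_k;
    out.b = J_k.transpose() * Omega_k * e_k;
    return out;
}
```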
Note that $C_k$ is simply the value of the edge's cost before the increment. So after giving $x_k$ the increment, the change in the objective-function term $F_k$ is $$\Delta F_k = 2 b_k^T \delta x + \delta x^T H_k \delta x. $$
Our goal is to find the $\delta x$ that minimizes this change. So we set its derivative with respect to $\delta x$ to zero:
\[\begin{equation} \frac{d F_k}{d \delta x} = 2 b_k + 2 H_k \delta x = 0 \quad \Rightarrow \quad H_k \delta x = -b_k \end{equation}\]
So, in the end, we are solving a system of linear equations: \[\begin{equation} H_k \delta x = -b_k \end{equation}\]
If we consider all the edges together, we can drop the subscript and say that we are solving $$ H \delta x = -b. $$
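Once $H$ and $b$ have been assembled (by summing the per-edge blocks into the right rows and columns), the increment comes from a single linear solve. Below is a sketch using Eigen's sparse Cholesky factorization; g2o delegates this step to backends such as CSparse or Cholmod:

```cpp
#include <Eigen/Sparse>
#include <Eigen/SparseCholesky>

// Solve H * dx = -b for the increment dx, with H stored as a sparse matrix.
Eigen::VectorXd solveIncrement(const Eigen::SparseMatrix<double>& H,
                               const Eigen::VectorXd& b) {
    Eigen::SimplicialLDLT<Eigen::SparseMatrix<double>> solver;
    solver.compute(H);        // sparse LDL^T factorization
    return solver.solve(-b);  // back-substitution
}
```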
So it turns out to be nothing but a linear system! Anyone can solve a linear system!
Of course the reader may feel this way, since solving a linear system is elementary. Then why wasn't SLAM done this way before the 21st century? Because at every iteration we must compute the Jacobian and the Hessian, and a graph often has thousands of edges and hundreds of thousands of parameters to estimate, which used to be considered impossible to solve in real time.
Why, then, can it be solved in real time now?
SLAM researchers gradually realized that the graphs built in SLAM are not fully connected; on the contrary, they tend to be sparse. For example, most landmarks in a map are seen by the robot at only a few moments, producing a few edges; the rest of the time they are not visible at all. Expressed in the formulas: although the overall objective function $F(x)$ has many terms, a given vertex $x_k$ appears only in the edges related to it!
What does this imply? It implies that for the many edges unrelated to $x_k$, say $e_j$, the corresponding Jacobian block with respect to $x_k$ is simply a zero matrix! In the overall Jacobian $J$, the columns related to $x_k$ are mostly zero; only the few places corresponding to edges connected to the vertex $x_k$ have non-zero values.
Correspondingly, the second-order matrix $H$ also has many zero entries. This sparsity helps us solve the above linear system quickly. For reasons of space we will not dwell on how this is done; sparse linear algebra libraries such as SBA, PCG, CSparse, and Cholmod handle it, and g2o uses them to solve the graph optimization problem.
One more remark: in the numerical computation, we can either supply analytic forms of the Jacobian and the Hessian, or let the computer compute these two matrices numerically; in the latter case we only need to provide the definition of the error.
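As a sketch of the numerical option, here is a central-difference Jacobian that needs only the error definition (assuming Eigen; the step size h is a made-up default):

```cpp
#include <functional>
#include <Eigen/Dense>

// Numeric Jacobian of an error function by central differences: column j is
// (e(x + h*u_j) - e(x - h*u_j)) / (2h), where u_j is the j-th unit direction.
Eigen::MatrixXd numericJacobian(
        const std::function<Eigen::VectorXd(const Eigen::VectorXd&)>& err,
        const Eigen::VectorXd& x, double h = 1e-6) {
    const Eigen::VectorXd e0 = err(x);
    Eigen::MatrixXd J(e0.size(), x.size());
    for (int j = 0; j < x.size(); ++j) {
        Eigen::VectorXd xp = x, xm = x;
        xp(j) += h;
        xm(j) -= h;
        J.col(j) = (err(xp) - err(xm)) / (2.0 * h);
    }
    return J;
}
```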
Manifold
Wait a moment, teacher! There is still a problem with the derivation above!
Very good, Little Carrot; please tell us what the problem is.
When we discussed giving the objective function $F(x)$ an increment $\delta x$, we simply wrote $F(x+\delta x)$. But teacher, this addition may not even be defined!
Little Carrot has spotted a serious problem, one that we indeed glossed over in the discussion above. Since we place no restriction on the types of the vertices, the parameterized $x$ may very well have no addition defined on it.
The simplest examples are the familiar $4\times 4$ transformation matrix $T$ and the $3\times 3$ rotation matrix $R$. They are not closed under addition: the sum of two transformation matrices is not a transformation matrix, and the sum of two orthogonal matrices is not orthogonal. Their multiplication behaves very nicely, but without addition we cannot take derivatives as discussed above.
However, if graph optimization could not handle elements of $SE(3)$ or $SO(3)$, that would be very frustrating, because the robot trajectories that SLAM wants to estimate must be described by them.
Recall the Lie algebra material we discussed earlier. Although the Lie groups $SE(3)$ and $SO(3)$ have no addition, their corresponding Lie algebras $\mathfrak{se}(3)$ and $\mathfrak{so}(3)$ do! Speaking mathematically, we compute gradients in the tangent space of the manifold. If the reader finds this hard to grasp, put it this way: through the exponential and logarithmic maps, we convert the transformation or rotation matrix to a Lie algebra element, do the addition on the Lie algebra, and then map the result back to the Lie group. In this way the derivative is well defined.
The beauty of this is that we do not need to re-derive any of the formulas above. In the program we simply redefine what the incremental "addition" means for an optimization variable $x$: if $x$ is a transformation matrix in $SE(3)$, we apply the Lie-algebra conversion just described; and if $x$ is something else entirely, as long as its addition is defined, the program can automatically compute its Jacobian.
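As an illustration for the rotation case, here is a sketch (assuming Eigen; left multiplication is just one common convention) of applying an increment $\delta\phi \in \mathfrak{so}(3)$ to a rotation matrix through the exponential map:

```cpp
#include <Eigen/Dense>
#include <Eigen/Geometry>

// "Addition" on SO(3): map the small vector delta_phi in so(3) to a rotation
// via the exponential map (angle-axis form) and compose it with R.
Eigen::Matrix3d oplusSO3(const Eigen::Matrix3d& R,
                         const Eigen::Vector3d& delta_phi) {
    const double angle = delta_phi.norm();
    if (angle < 1e-12) return R;                  // exp(0) = identity
    const Eigen::Vector3d axis = delta_phi / angle;
    const Eigen::Matrix3d dR =
        Eigen::AngleAxisd(angle, axis).toRotationMatrix();
    return dR * R;                                // increment applied on the left
}
```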
Kernel function
Kernel functions again! Students who have taken a machine learning course are bound to say so.
Unfortunately, graph optimization has kernel functions too. The reason for introducing them is that wrong edges may appear in SLAM. Data association in SLAM has given researchers headaches for a long time: because of scene changes, noise, and so on, the robot cannot be certain that a landmark it sees is really the landmark in its database. What if it misrecognizes it? What if an edge that should never have been added ends up in the graph?
Then the optimization algorithm is in trouble. It sees an edge with a huge error and tries to adjust the estimates of the nodes this edge connects so that they satisfy the edge's unreasonable demand. Because this edge's error is really large, it tends to drown out the effect of the other, correct edges, making the optimization algorithm concentrate on adjusting a wrong value.
Hence the kernel function. A kernel function ensures that the error of any single edge can never grow so large that it masks the other edges. Concretely, we replace the original quadratic (two-norm) error metric with a function that grows more slowly, while keeping it smooth (otherwise it could not be differentiated!). Because they make the whole optimization more robust to outliers, such functions are called robust kernels.
Many robust kernels are piecewise functions that grow only linearly for large inputs, such as the Cauchy kernel and the Huber kernel. We will not go into the details here.
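For concreteness, here is one common form of the Huber kernel as a small sketch (the threshold delta is a tuning parameter): quadratic for small errors, only linear growth beyond the threshold, with matching value and slope at the switch point.

```cpp
#include <cmath>

// Huber robust kernel applied to the magnitude of an edge error:
// quadratic inside |e| <= delta, linear growth outside, continuous and
// differentiable at the threshold.
double huber(double e, double delta) {
    const double a = std::fabs(e);
    if (a <= delta) return e * e;            // quadratic region
    return 2.0 * delta * a - delta * delta;  // linear region
}
```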
Kernel functions appear in many optimization settings. When this blogger was a student, people were adding all sorts of kernels to machine learning algorithms, and the SVM we use today also takes a kernel function.
Summary
Finally, let us summarize the workflow of graph optimization (a toy code skeleton follows the list).
- Choose the types of nodes and edges you want in the graph and determine their parameterization;
- Add the actual nodes and edges to the graph;
- Choose an initial value and start iterating;
- In each iteration, compute the Jacobian and the Hessian corresponding to the current estimate;
- Solve the sparse linear system $H \delta x = -b$ to obtain the update direction;
- Continue iterating with GN or LM. When the iteration converges, return the optimized values.
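To tie the steps together, here is a toy end-to-end skeleton under simplifying assumptions: the whole state is one stacked vector (so plain addition works), the errors of all edges are stacked into one function, and $H$ is kept dense. All names are illustrative and this is not g2o's API; real systems keep per-vertex blocks, a sparse $H$, and manifold updates.

```cpp
#include <functional>
#include <Eigen/Dense>

// Toy Gauss-Newton loop: err(x) returns the stacked error of all edges,
// jac(x) its Jacobian, Omega the stacked information matrix.
Eigen::VectorXd optimize(
        const std::function<Eigen::VectorXd(const Eigen::VectorXd&)>& err,
        const std::function<Eigen::MatrixXd(const Eigen::VectorXd&)>& jac,
        const Eigen::MatrixXd& Omega,
        Eigen::VectorXd x, int max_iters = 10) {
    for (int it = 0; it < max_iters; ++it) {
        const Eigen::VectorXd e = err(x);
        const Eigen::MatrixXd J = jac(x);
        const Eigen::MatrixXd H = J.transpose() * Omega * J;  // approximate Hessian
        const Eigen::VectorXd b = J.transpose() * Omega * e;
        const Eigen::VectorXd dx = H.ldlt().solve(-b);        // solve H dx = -b
        if (dx.norm() < 1e-8) break;                          // converged
        x += dx;  // plain addition: assumes a vector-space parameterization
    }
    return x;
}
```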
In fact, g2o can do steps 3 through 6 for you; all you need to do yourself are the first two steps. We will try it out in the next section.