1. Introduction
Deep learning (DL) solves the VO problem: end-to-end visual odometry with a Recurrent Convolutional Neural Network (RCNN).
2. Network structure
A. CNN-based Feature Extraction
The paper uses the KITTI dataset.
The CNN part has 9 convolutional layers; each convolutional layer except Conv6 is followed by one ReLU activation layer, so the feature-extraction stack has 17 layers in total.
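Below is a minimal PyTorch sketch of such an encoder. It is an illustration under assumptions, not the paper's exact implementation: the kernel sizes, strides, and channel widths follow the FlowNet-style configuration the paper builds on, and the class and function names are made up.

```python
import torch
import torch.nn as nn

def conv(in_ch, out_ch, k, stride, relu=True):
    """A convolutional layer, optionally followed by a ReLU."""
    layers = [nn.Conv2d(in_ch, out_ch, kernel_size=k, stride=stride, padding=k // 2)]
    if relu:
        layers.append(nn.ReLU(inplace=True))
    return nn.Sequential(*layers)

class CNNFeatureExtractor(nn.Module):
    """9 convolutional layers; every layer except Conv6 is followed by a ReLU,
    giving 17 layers in total. Input is two stacked RGB frames (6 channels)."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            conv(6,    64, 7, 2),              # Conv1
            conv(64,  128, 5, 2),              # Conv2
            conv(128, 256, 5, 2),              # Conv3
            conv(256, 256, 3, 1),              # Conv3_1
            conv(256, 512, 3, 2),              # Conv4
            conv(512, 512, 3, 1),              # Conv4_1
            conv(512, 512, 3, 2),              # Conv5
            conv(512, 512, 3, 1),              # Conv5_1
            conv(512, 1024, 3, 2, relu=False), # Conv6: no ReLU
        )

    def forward(self, x):  # x: (batch, 6, H, W) pair of consecutive frames
        return self.features(x)
```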
B. RNN-based Sequential Modelling
An RNN differs from a CNN in that it maintains hidden states as memory over time and has feedback loops among them, which enables its current hidden state to be a function of the previous ones.
Given the convolutional feature x_k at time k, an RNN updates at time step k by

h_k = H(W_{xh} x_k + W_{hh} h_{k-1} + b_h)
y_k = W_{hy} h_k + b_y

where h_k and y_k are the hidden state and output at time k respectively, the W terms denote the corresponding weight matrices, the b terms denote the bias vectors, and H is an element-wise nonlinear activation function.
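As a concrete illustration, a minimal NumPy sketch of this vanilla RNN update follows; tanh is assumed as one common choice for the activation H, and all variable names are illustrative.

```python
import numpy as np

def rnn_step(x_k, h_prev, W_xh, W_hh, W_hy, b_h, b_y):
    """One vanilla RNN step implementing the two update equations above."""
    h_k = np.tanh(W_xh @ x_k + W_hh @ h_prev + b_h)  # hidden state update
    y_k = W_hy @ h_k + b_y                           # output at time k
    return h_k, y_k

# Usage: unroll over a sequence of convolutional features x_1, ..., x_t,
# starting from h = zeros, feeding each step's h_k into the next step.
```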
LSTM
Figure: folded and unfolded LSTMs and the internal structure of an LSTM unit.
The LSTM updates at time step k as

i_k = σ(W_{xi} x_k + W_{hi} h_{k-1} + b_i)
f_k = σ(W_{xf} x_k + W_{hf} h_{k-1} + b_f)
g_k = tanh(W_{xg} x_k + W_{hg} h_{k-1} + b_g)
c_k = f_k ⊙ c_{k-1} + i_k ⊙ g_k
o_k = σ(W_{xo} x_k + W_{ho} h_{k-1} + b_o)
h_k = o_k ⊙ tanh(c_k)

where ⊙ is the element-wise product of vectors, σ is the sigmoid non-linearity, tanh is the hyperbolic tangent non-linearity, the W terms denote the corresponding weight matrices, the b terms denote the bias vectors, and i_k, f_k, g_k, c_k and o_k are the input gate, forget gate, input modulation gate, memory cell and output gate at time k.
Each of the LSTM layers has its own hidden states.
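For concreteness, a minimal NumPy sketch of one LSTM step, transcribing the gate equations above (the dict-based weight and bias naming is an assumption for readability):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_k, h_prev, c_prev, W, b):
    """One LSTM step; W and b are dicts of weight matrices and bias vectors."""
    i_k = sigmoid(W['xi'] @ x_k + W['hi'] @ h_prev + b['i'])   # input gate
    f_k = sigmoid(W['xf'] @ x_k + W['hf'] @ h_prev + b['f'])   # forget gate
    g_k = np.tanh(W['xg'] @ x_k + W['hg'] @ h_prev + b['g'])   # input modulation gate
    o_k = sigmoid(W['xo'] @ x_k + W['ho'] @ h_prev + b['o'])   # output gate
    c_k = f_k * c_prev + i_k * g_k   # memory cell; * is the element-wise product
    h_k = o_k * np.tanh(c_k)         # new hidden state
    return h_k, c_k
```

In a framework such as PyTorch, stacking LSTM layers amounts to e.g. `nn.LSTM(input_size, hidden_size, num_layers=2)`, where each layer keeps its own hidden and cell states.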
3. Loss function and optimization
The RCNN models the conditional probability of the poses Y_t = (y_1, ..., y_t) given a sequence of monocular RGB images X_t = (x_1, ..., x_t) up to time t:

p(Y_t | X_t) = p(y_1, ..., y_t | x_1, ..., x_t)

The optimal parameters θ* are the ones that maximise this conditional probability:

θ* = argmax_θ p(Y_t | X_t; θ)

To learn the hyperparameters θ of the DNNs, the loss is the mean squared error (MSE) of all positions p and orientations φ:

θ* = argmin_θ (1/N) Σ_{i=1}^{N} Σ_{k=1}^{t} ||p̂_k − p_k||₂² + κ ||φ̂_k − φ_k||₂²

where:
(p_k, φ_k) is the ground truth pose at time k,
(p̂_k, φ̂_k) is the estimated pose,
κ (100 in the experiments) is a scale factor to balance the weights of positions and orientations, and
N is the number of samples.
The orientation φ is represented by Euler angles rather than a quaternion, since quaternions are subject to an extra unit constraint which hinders the optimisation problem of DL.
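A minimal PyTorch sketch of this loss follows; the batching of poses as (N, T, 3) tensors of translations and Euler angles, as well as the function name, are assumptions for illustration.

```python
import torch

def pose_loss(p_hat, phi_hat, p_gt, phi_gt, kappa=100.0):
    """MSE over positions and Euler-angle orientations.
    p_*:   (N, T, 3) translations
    phi_*: (N, T, 3) Euler angles
    kappa balances position and orientation terms (100 in the experiments)."""
    pos_term = ((p_hat - p_gt) ** 2).sum(dim=-1)      # ||p̂_k − p_k||₂² per step
    ori_term = ((phi_hat - phi_gt) ** 2).sum(dim=-1)  # ||φ̂_k − φ_k||₂² per step
    # sum over the time steps of each trajectory, then average over the N samples
    return (pos_term + kappa * ori_term).sum(dim=-1).mean()
```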
DeepVO: Towards End-to-End Visual Odometry with Deep Recurrent Convolutional Neural Networks