Visual Tracking Method: DLT (Deep Learning Tracker) is very popular right now, and it can fairly be said to represent the state of the art in tracking as of 2013. I recently studied it carefully, and what follows is an "in-depth analysis" of its framework, core ideas, and outlook.
Framework
The whole algorithm still sits inside the mainstream particle filter (PF) probabilistic framework. What a PF does is select multiple candidate regions in a frame and then score those candidates with some measure. In brief, a PF consists of three parts: drift, diffuse, and measure. The goal is to pick the most probable tracking box in each video frame. To do this, "particles" model the affine parameters between two adjacent frames, and we look for the particle with the largest weight. The tracking result for the current frame is then the known result from the previous frame combined with that highest-weight particle.
Drift and diffuse carry out the random perturbation and the diffusion of the particles, so that the particles cover the neighborhood of the actual affine transformation (the "actual particle") as well as possible. Deciding which particles are close to the actual particle is the job of measure; usually the tracking box corresponding to each particle is computed first, and the measure step is then applied to those boxes. Tracking algorithms differ mainly in the measure part. For example, IVT (Incremental Visual Tracking) uses incremental PCA to update its template online and compares each particle's tracking box with that template; the most similar box is taken as the final tracking result. DLT is very different: it connects the encoder half of a sparse autoencoder to a sigmoid layer to form a measure classifier. Each particle's tracking box is fed into this classifier, and the box with the highest output score is taken as the final tracking result.
In summary, DLT tracks with a PF plus a measure classifier, and the measure classifier is a network made of encoder layers plus a sigmoid layer. Because the measure part involves training a multi-layer autoencoder, the method is called the Deep Learning Tracker.
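To make the drift, diffuse, measure loop concrete, here is a minimal NumPy sketch of one PF step. It is only an illustration under assumptions of mine: the function name particle_filter_step, the Gaussian diffusion, the treatment of drift as a fixed shift, and the pluggable measure callable are not taken from the DLT code.

```python
import numpy as np

def particle_filter_step(prev_state, measure, n_particles=100,
                         drift=None, sigma=None, rng=None):
    """One drift-diffuse-measure step (illustrative sketch, not DLT's code).

    prev_state : 1-D array of affine parameters known from the previous frame.
    measure    : callable mapping an (n_particles, d) array of candidate
                 states to n_particles scores (higher is better).
    """
    rng = np.random.default_rng() if rng is None else rng
    prev_state = np.asarray(prev_state, dtype=float)
    d = prev_state.size
    drift = np.zeros(d) if drift is None else np.asarray(drift, dtype=float)
    sigma = np.ones(d) if sigma is None else np.asarray(sigma, dtype=float)

    # Drift: the same shift is applied to every particle (assumed convention).
    particles = np.tile(prev_state + drift, (n_particles, 1))
    # Diffuse: Gaussian noise spreads the particles around that point.
    particles += rng.normal(0.0, sigma, size=(n_particles, d))
    # Measure: score every candidate and keep the highest-scoring one.
    scores = np.asarray(measure(particles), dtype=float)
    best = int(np.argmax(scores))
    return particles[best], float(scores[best]), particles

# Toy usage: the "measure" prefers candidates close to a hidden true state.
true_state = np.array([5.0, -3.0])
toy_measure = lambda p: -np.sum((p - true_state) ** 2, axis=1)
best_state, best_score, _ = particle_filter_step(
    np.zeros(2), toy_measure, n_particles=200, sigma=[4.0, 4.0],
    rng=np.random.default_rng(0))
```

In IVT the measure callable would be the PCA-template similarity, and in DLT it would be the encoder plus sigmoid classifier; the rest of the loop stays the same.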
Core Ideas
The core idea of DLT lies in the measure part of the PF framework. As described above, online tracking needs the encoder plus the sigmoid layer. So how are they trained?
1. Training the encoder (offline)
A sparse autoencoder consists of an encoding part (encoder) and a decoding part (decoder). To obtain the hidden-layer representation produced by the encoder layers, the full sparse autoencoder has to be trained first. Once that training is complete, the learned parameters are used directly as the encoder's parameters. Because there are many parameters between the layers of the sparse autoencoder and the network is deep, it has to be trained offline on a massive amount of data; after training, it captures generic, invariant features of targets.
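The idea of "train the whole autoencoder, then keep only the encoder" can be sketched with a tiny single-hidden-layer autoencoder in NumPy. This is a toy under my own assumptions: the L1-style sparsity penalty on the hidden activations, the layer sizes, the learning rate, and all function names are illustrative and are not DLT's actual settings (the real network is deeper and trained on far more data).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def encode(params, x):
    # Encoder: input -> hidden representation.
    return sigmoid(x @ params["W1"] + params["b1"])

def decode(params, h):
    # Decoder: hidden representation -> reconstruction of the input.
    return sigmoid(h @ params["W2"] + params["b2"])

def loss_and_grads(params, x, sparsity_weight):
    n = x.shape[0]
    h = encode(params, x)
    x_hat = decode(params, h)
    # Reconstruction error plus a penalty that keeps hidden activations small.
    loss = 0.5 * np.sum((x_hat - x) ** 2) / n + sparsity_weight * np.sum(h) / n
    # Backpropagation through the two sigmoid layers.
    d_out = (x_hat - x) * x_hat * (1.0 - x_hat) / n
    d_hid = (d_out @ params["W2"].T + sparsity_weight / n) * h * (1.0 - h)
    grads = {"W2": h.T @ d_out, "b2": d_out.sum(axis=0),
             "W1": x.T @ d_hid, "b1": d_hid.sum(axis=0)}
    return loss, grads

def train_autoencoder(x, n_hidden=8, steps=300, lr=0.5,
                      sparsity_weight=0.01, seed=0):
    rng = np.random.default_rng(seed)
    n_in = x.shape[1]
    params = {"W1": rng.normal(0.0, 0.1, (n_in, n_hidden)),
              "b1": np.zeros(n_hidden),
              "W2": rng.normal(0.0, 0.1, (n_hidden, n_in)),
              "b2": np.zeros(n_in)}
    history = []
    for _ in range(steps):
        loss, grads = loss_and_grads(params, x, sparsity_weight)
        history.append(loss)
        for key in params:
            params[key] -= lr * grads[key]
    # Only the encoder half is kept for tracking; the decoder is discarded.
    encoder = {"W1": params["W1"], "b1": params["b1"]}
    return encoder, history

# Toy usage: 64 samples built from two random 16-dimensional prototypes.
rng = np.random.default_rng(1)
prototypes = rng.random((2, 16))
data = prototypes[rng.integers(0, 2, size=64)]
encoder, history = train_autoencoder(data)
features = encode(encoder, data)  # what the sigmoid layer would receive
```

The returned encoder dictionary plays the role of the offline-trained part: it is fixed after this stage, and only the small layer stacked on top of it is trained online.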
2. Training the sigmoid layer (at the start of online tracking)
Training the sigmoid part mainly means learning the 200-plus parameters that connect the sigmoid layer to the encoder layer. Because the number of parameters is relatively small, this training can be done when online tracking starts. Prerequisite: samples with known labels. 10 positive and 100 negative samples can be obtained from the manually annotated first frame. Forward propagation, backpropagation, and batch gradient descent are then used to find a local minimum of the loss function (least squares: the sum over all samples of the squared difference between the sigmoid output and the known label). This yields the parameters for the entire measure part.
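The training described above (forward pass, backward pass, batch gradient descent on a least-squares loss) can be sketched as follows. This is a sketch under assumptions: the function name train_sigmoid_layer, the learning rate, the step count, and the 20-dimensional synthetic features standing in for encoder outputs are all mine; only the 10 positive / 100 negative split and the squared-error loss come from the description.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_sigmoid_layer(features, labels, steps=500, lr=0.1, w=None, b=0.0):
    """Fit one sigmoid unit on top of fixed encoder features (sketch)."""
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels, dtype=float)
    n, d = features.shape
    w = np.zeros(d) if w is None else np.array(w, dtype=float)
    losses = []
    for _ in range(steps):
        # Forward propagation over the whole batch.
        out = sigmoid(features @ w + b)
        err = out - labels
        # Least-squares loss, averaged over the batch.
        losses.append(0.5 * np.mean(err ** 2))
        # Backpropagation through the sigmoid.
        delta = err * out * (1.0 - out) / n
        # Batch gradient descent update.
        w -= lr * (features.T @ delta)
        b -= lr * delta.sum()
    return w, b, losses

# Toy usage: 10 positives and 100 negatives, as in the first-frame setup.
rng = np.random.default_rng(0)
pos = rng.normal(+1.0, 0.5, size=(10, 20))
neg = rng.normal(-1.0, 0.5, size=(100, 20))
feats = np.vstack([pos, neg])
labels = np.concatenate([np.ones(10), np.zeros(100)])
w, b, losses = train_sigmoid_layer(feats, labels)
pos_scores = sigmoid(pos @ w + b)
neg_scores = sigmoid(neg @ w + b)
```

The optional w and b arguments allow a later retraining to start from the current parameters instead of from zero, which is useful for the online update in step 3.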
3. Updating the sigmoid layer (in real time)
During tracking, the tracking boxes corresponding to the particles after drift and diffuse are fed to the measure part, which outputs a score for each; the highest-scoring box is taken as the final tracking result. Note: one positive sample is updated each time a frame has been tracked, so the positive set reflects the most recent part of the sequence. If the maximum score falls below the threshold we set, we re-collect negative samples and repeat the training of step 2 with them and the updated positive samples. The updated sigmoid layer better reflects the influence of recent frames on the current frame and so makes better use of temporal information.
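The per-frame logic above can be sketched as a single function. Everything named here is a hypothetical stand-in: select_and_update, the fixed-length buffer of positives, and the collect_negatives and retrain callbacks are my assumptions about how the pieces could be wired together, not the DLT implementation.

```python
import numpy as np
from collections import deque

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def select_and_update(cand_feats, w, b, pos_buffer, threshold,
                      collect_negatives, retrain):
    """Score candidates, refresh positives, retrain if confidence is low.

    cand_feats        : (n_particles, d) encoder features of the candidates.
    pos_buffer        : deque with maxlen, holding recent positive features.
    collect_negatives : hypothetical callback, best index -> (m, d) negatives.
    retrain           : hypothetical callback, (pos, neg, w, b) -> (w, b).
    """
    scores = sigmoid(cand_feats @ w + b)
    best = int(np.argmax(scores))
    # The newest tracking result replaces the oldest stored positive sample.
    pos_buffer.append(cand_feats[best])
    retrained = False
    if scores[best] < threshold:
        # Low confidence: re-collect negatives and repeat the step 2 training.
        neg = collect_negatives(best)
        w, b = retrain(np.array(list(pos_buffer)), neg, w, b)
        retrained = True
    return best, float(scores[best]), w, b, retrained

# Toy usage with stub callbacks (real ones would crop patches and retrain).
w0, b0 = np.array([1.0, 0.0]), 0.0
candidates = np.array([[0.0, 0.0], [2.0, 0.0], [-1.0, 0.0]])
positives = deque([np.zeros(2)] * 10, maxlen=10)
stub_negatives = lambda best: np.zeros((100, 2))
stub_retrain = lambda pos, neg, w, b: (w, b)
best, score, w1, b1, retrained = select_and_update(
    candidates, w0, b0, positives, 0.5, stub_negatives, stub_retrain)
```

Using a deque with maxlen keeps the positive set at a fixed size, so appending the latest result automatically drops the oldest one.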
Outlook
DLT tracks very well in many unconstrained scenes, close to perfectly. It is robust to illumination changes, to motion speed, and to pose changes, but for multi-target scenes and scenes with partial overlap (occlusion), the tracking results are not ideal.