CREST: Convolutional Residual Learning for Visual Tracking | Neural Network | Deep Learning | MATLAB

Source: Internet
Author: User
Article overview

This article analyzes the shortcomings of the DCF family of tracking algorithms and proposes improvements.

Core idea one of this paper: treat DCF as a convolutional layer in a CNN;

Core idea two of this paper: integrate feature extraction, response map generation, and model updating into one CNN for end-to-end training;

Core idea three of this paper: use residual learning to update the deep tracking network, which handles both large and small changes in target appearance more effectively.

Experimental results: 0.837 precision and 0.623 overlap success on OTB-100.

Analysis of the DCF algorithm

Advantage one of DCF: model learning and target detection are fast, since both can be carried out in the frequency domain;

Advantage two of DCF: the response is dense over the search region, which benefits high-precision tracking;

Advantage three of DCF: combined with deep features, it achieves strong tracking performance;

Shortcoming one of DCF: feature extraction and model learning are separated, so the method cannot benefit from end-to-end learning;

Shortcoming two of DCF: the model is updated with a sliding weighted average, which is not optimal. Once noise enters an update it can cause the model to drift, so it is difficult to obtain model stability and adaptability at the same time.
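
Shortcoming two can be made concrete: a fixed-rate sliding average treats a noisy frame the same as a clean one, so corruption leaks into the model. A toy NumPy sketch (the function name, learning rate, and values are illustrative, not the paper's code):

```python
import numpy as np

def dcf_update(model, new_estimate, lr=0.02):
    """Sliding weighted average used by classic DCF trackers: each
    frame's new filter estimate is blended in with a fixed weight."""
    return (1 - lr) * model + lr * new_estimate

# A corrupted estimate (e.g. from an occluded frame) is blended in with
# the same weight as a clean one, so errors accumulate and the model drifts.
model = np.ones(4)
noisy_estimate = np.full(4, 10.0)
drifted = dcf_update(model, noisy_estimate)
```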

Improvement one: treat the DCF model as a convolutional filter in a deep convolutional neural network.
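
Improvement one can be illustrated numerically: the DCF response computed by pointwise multiplication in the frequency domain equals a circular correlation in the spatial domain, which is exactly what a (circular) convolutional layer computes. A minimal NumPy sketch, not the paper's code:

```python
import numpy as np

def dcf_response_freq(x, w):
    # Classic DCF: circular correlation computed as a pointwise
    # product in the frequency domain.
    return np.real(np.fft.ifft2(np.conj(np.fft.fft2(w)) * np.fft.fft2(x)))

def dcf_response_conv(x, w):
    # The same response computed directly in the spatial domain:
    # what a convolutional layer with a circularly shifted kernel does.
    h, width = x.shape
    r = np.zeros_like(x)
    for dy in range(h):
        for dx in range(width):
            r[dy, dx] = np.sum(x * np.roll(np.roll(w, dy, axis=0), dx, axis=1))
    return r

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8))   # stand-in feature channel
w = rng.standard_normal((8, 8))   # stand-in DCF filter
```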

Improvement two: integrate VGG16 feature extraction, response map generation, and model updating into one end-to-end pipeline;

Improvement three: carry out the convolution in the spatial domain, avoiding the boundary effect of solving DCF in the frequency domain;

Improvement four: to update the model more appropriately, residual learning captures appearance changes from the difference between the convolutional layer's output (the response map) and the ground truth, and this difference guides the model update. This both reduces the influence of noisy updates and keeps the update robust when the target's appearance changes greatly.

Algorithm framework of this paper

VGG16 serves as the front end for feature extraction (any CNN features could be used), followed by three parallel layers: the base convolutional layer, the temporal residual layer, and the spatial residual layer;

Base convolutional layer: replaces the DCF filter; its loss function is the standard DCF loss with a linear kernel;
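
For reference, the linear-kernel DCF loss has a closed-form single-channel solution in the frequency domain; the sketch below uses that closed form (the paper instead trains the equivalent convolutional layer with gradient descent). Sizes, the regularizer, and the label are illustrative:

```python
import numpy as np

def learn_dcf_filter(x, y, lam=1e-4):
    # Closed-form linear-kernel DCF: minimize ||w (*) x - y||^2 + lam*||w||^2
    # ((*) = circular convolution), solved independently per frequency bin.
    X, Y = np.fft.fft2(x), np.fft.fft2(y)
    W = Y * np.conj(X) / (X * np.conj(X) + lam)
    return np.real(np.fft.ifft2(W))

def apply_filter(w, x):
    # Circular convolution of the learned filter with the features.
    return np.real(np.fft.ifft2(np.fft.fft2(w) * np.fft.fft2(x)))

rng = np.random.default_rng(0)
x = rng.standard_normal((16, 16))                    # stand-in feature channel
ys, xs = np.mgrid[0:16, 0:16]
y = np.exp(-((ys - 8) ** 2 + (xs - 8) ** 2) / 8.0)   # Gaussian label, peak at (8, 8)
response = apply_filter(learn_dcf_filter(x, y), x)   # should closely match y
```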

Residual learning layers: in principle the base convolutional layer's output should match the ground-truth label. Deepening the base layer could approach this, but at the cost of generalization, so residual layers are added instead: they learn from the difference between the base layer's output and the ground truth and drive the update;
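
The residual idea in one line: the final response is the base layer's output plus the residual branches' output, so the residual target is just the gap between the base output and the label. A toy 1-D sketch with made-up values:

```python
import numpy as np

rng = np.random.default_rng(0)
label = np.exp(-np.linspace(-3, 3, 32) ** 2)          # 1-D stand-in for the Gaussian map
base_output = label + 0.1 * rng.standard_normal(32)   # imperfect base-layer response
residual_target = label - base_output                 # what the residual layers must learn
final_response = base_output + residual_target        # with perfect residual learning
```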

Target tracking process

Model initialization: extract features with the VGG network, randomly initialize the convolutional layer and residual learning layers, then fine-tune them on the first frame until the output is close to the ground truth;

Online detection: extract features of the search region, run a forward pass through the network to obtain the response map, and locate the target at its maximum;
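
The localization step reduces to an argmax over the response map; a minimal sketch:

```python
import numpy as np

def locate_target(response):
    """Return the (row, col) of the response map's maximum."""
    return np.unravel_index(np.argmax(response), response.shape)

response = np.zeros((5, 5))
response[1, 3] = 1.0   # pretend the network's peak landed here
```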

Scale estimation: sample at multiple scales, select the scale whose response maximum is largest, and update the scale with smoothing;
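
A sketch of the smoothed scale update, using the 0.6 smoothing parameter from the experimental configuration (which term it weights is an assumption here; the sampled scales and peaks are made up):

```python
def smooth_scale(prev_scale, new_scale, beta=0.6):
    # Exponential smoothing of the scale estimate; beta = 0.6 is the
    # paper's smoothing parameter (weighting convention assumed).
    return beta * new_scale + (1 - beta) * prev_scale

scales = [0.95, 1.00, 1.05]      # multi-scale samples around the last size
peaks = [0.40, 0.50, 0.70]       # response maximum obtained at each scale
best = scales[peaks.index(max(peaks))]
smoothed = smooth_scale(1.00, best)
```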

Model update: generate training data during online tracking to update the tracking model.

Experiments

Experimental configuration 1: the search region is 5 times the target size;

Experimental configuration 2: VGG16 conv4-3 features reduced by PCA to 64 channels;
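
The channel-reduction step can be sketched as PCA over the channel dimension of the feature map; the spatial size and random features below are illustrative stand-ins for conv4-3 activations:

```python
import numpy as np

def pca_reduce_channels(feat, k=64):
    """Project an (H, W, C) feature map onto its top-k channel
    principal components, yielding an (H, W, k) map."""
    H, W, C = feat.shape
    X = feat.reshape(-1, C)
    X = X - X.mean(axis=0)                            # center per channel
    _, _, Vt = np.linalg.svd(X, full_matrices=False)  # rows = principal directions
    return (X @ Vt[:k].T).reshape(H, W, k)

feat = np.random.default_rng(1).standard_normal((16, 16, 512))
reduced = pca_reduce_channels(feat)
```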

Experimental configuration 3: the label is a Gaussian distribution;
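
A Gaussian label map of this kind serves as the regression target for the response; the map size and sigma below are assumptions for illustration:

```python
import numpy as np

def gaussian_label(h, w, sigma=2.0):
    """2-D Gaussian regression target centered on the target position."""
    ys, xs = np.mgrid[0:h, 0:w]
    cy, cx = (h - 1) / 2, (w - 1) / 2
    return np.exp(-((ys - cy) ** 2 + (xs - cx) ** 2) / (2 * sigma ** 2))

label = gaussian_label(7, 7)   # peak of 1.0 at the center (3, 3)
```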

Experimental configuration 4: the scale smoothing parameter is 0.6;

Experimental configuration 5: implemented with the MatConvNet framework;

Experimental configuration 6: for model initialization, the learning rate is 5e-8, training stops when the loss drops below 0.02, and convergence takes a few hundred iterations;

Experimental configuration 7: during tracking, the model is updated every two frames, with 2 iterations per update at a learning rate of 2e-9;

Experimental datasets: OTB-2013, OTB-2015, VOT-2016.
