Notes on "Learning Multi-Domain Convolutional Neural Networks for Visual Tracking"

Source: Internet
Author: User

This paper applies deep learning (DL) to visual tracking and, at the time of writing, achieves state-of-the-art results on the Object Tracking Benchmark (OTB) and VOT2015. The method is fairly intuitive:

The authors call it a multi-domain network. In essence, they first take a large number of positive and negative samples from the first frame to train the shared conv1-conv2-conv3-fc4-fc5 layers, while a separate fc6 layer is trained for each kind of input image sequence (for example, Basketball or Bolt in OTB). The k in the image above is the number of domains, that is, the number of different categories in OTB. Note that the number of videos in OTB is not equal to the number of categories, because some videos belong to the same class, such as Car4, CarDark and CarScale. The authors simply treat such videos as one domain that shares the same fc6 layer during training; this is where the term "domain" comes from.
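The shared-layers-plus-per-domain-fc6 idea can be sketched as follows. This is only a toy illustration, not the authors' implementation: the class name `MultiDomainNet`, the tiny layer sizes, and the single linear layer standing in for conv1 through fc5 are all assumptions made to keep the example short.

```python
import random


class MultiDomainNet:
    """Toy stand-in: one shared feature layer plus one fc6 head per domain."""

    def __init__(self, num_domains, in_dim=4, feat_dim=3, seed=0):
        rng = random.Random(seed)
        # Shared part (hypothetical stand-in for conv1-conv3, fc4, fc5).
        self.shared = [[rng.uniform(-1, 1) for _ in range(in_dim)]
                       for _ in range(feat_dim)]
        # Domain-specific part: one fc6 per domain, each with two outputs
        # (background score, target score).
        self.fc6 = [[[rng.uniform(-1, 1) for _ in range(feat_dim)]
                     for _ in range(2)]
                    for _ in range(num_domains)]

    def features(self, x):
        # Linear layer followed by ReLU; identical for every domain.
        return [max(0.0, sum(w * v for w, v in zip(row, x)))
                for row in self.shared]

    def forward(self, x, domain):
        # Route the shared features through the fc6 branch of one domain.
        feat = self.features(x)
        return [sum(w * f for w, f in zip(row, feat))
                for row in self.fc6[domain]]


net = MultiDomainNet(num_domains=3)
x = [0.5, -0.2, 0.1, 0.9]
scores_domain0 = net.forward(x, domain=0)
scores_domain1 = net.forward(x, domain=1)
```

Only the `fc6[domain]` branch differs between the two calls; the features come from the same shared weights, which is what lets the shared layers learn from all domains at once.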

Next, let's look at what the authors actually do:

First, the authors collect a large number of positive and negative samples from the first frame of every image sequence in the dataset to train the network above. This is an iterative process, because some "good" negative samples have to be selected:

This selection is called hard negative mining. Intuitively, it tries to fill each mini-batch with negatives that lie close to the positive samples (which, after all, are the most helpful for learning to discriminate). The samples in the batch are resized to 107×107 and fed into the network, and training iterates until the network converges. Note that each domain has its own fc6 layer, while the earlier layers are trained jointly across domains. Hence the name multi-domain network.
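A minimal sketch of the selection step, under stated assumptions: `score_fn` is a hypothetical stand-in for the network's target score, and the toy score below simply decays with distance from the target center so the example runs without a trained model. The function names and sample values are illustrative, not taken from the paper.

```python
import math


def mine_hard_negatives(negatives, score_fn, num_hard):
    """Keep the num_hard negatives that the scorer rates most target-like."""
    ranked = sorted(negatives, key=score_fn, reverse=True)
    return ranked[:num_hard]


def make_toy_score(target_center):
    """Hypothetical scorer: higher when a sample center is near the target."""
    tx, ty = target_center

    def score(sample):
        x, y = sample
        return 1.0 / (1.0 + math.hypot(x - tx, y - ty))

    return score


# Negative sample centers; two of them sit right next to the target at (50, 50).
negatives = [(0, 0), (10, 10), (48, 52), (55, 50), (100, 100)]
hard = mine_hard_negatives(negatives, make_toy_score((50, 50)), num_hard=2)
```

In a real training loop the scorer would be the current network, so the set of hard negatives changes from iteration to iteration as the network improves.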

The second step is to use the trained model for tracking. Because the network itself is not very complex, during tracking the authors also fine-tune the network on the first frame, rather than keeping a fixed general-purpose network and updating only the classifier. Meanwhile, the authors use different update strategies for different situations: long-term updates and short-term updates. The model is updated immediately when the target is classified as background. There is also the case where the positive samples drawn around a frame's predicted position differ considerably from the real ground truth, that is, they do not match it well. To address this, the authors borrow bounding box regression from object detection to bring the predicted bounding box closer to the ground truth.
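The two update strategies can be sketched as a small scheduling rule. Everything numeric here is an assumption for illustration (the failure threshold of 0.5, the update interval of 10 frames, and the window sizes of 20 and 100 frames), and the function names are hypothetical; the notes above only state that an update happens immediately when the target is classified as background.

```python
def choose_update(frame_idx, target_score, long_interval=10, fail_threshold=0.5):
    """Decide which kind of update, if any, to run after this frame."""
    if target_score < fail_threshold:
        # Target looks like background: react immediately.
        return "short_term"
    if frame_idx % long_interval == 0:
        # Otherwise refresh the model at regular intervals.
        return "long_term"
    return None


def select_training_frames(history, update, short_window=20, long_window=100):
    """Pick the most recent frames whose samples are used for the update."""
    window = short_window if update == "short_term" else long_window
    return history[-window:]


history = list(range(150))              # indices of frames tracked so far
kind = choose_update(frame_idx=150, target_score=0.3)
frames = select_training_frames(history, kind)
```

The design intent is that the short-term update adapts quickly using recent appearance only, while the long-term update draws on a longer history and is therefore more robust to brief disturbances.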

The third step is prediction.

For each frame, the authors sample 256 candidates with different positions and scales, then pick the best one as the position of the target in that frame.
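A rough sketch of this step, assuming boxes in (center x, center y, width, height) form and Gaussian perturbations around the previous frame's box. The spread parameters (`pos_sigma`, `scale_sigma`, `scale_base`) and the toy scorer are assumptions for illustration; only the count of 256 candidates comes from the notes above.

```python
import math
import random


def sample_candidates(prev_box, n=256, pos_sigma=0.3, scale_sigma=0.5,
                      scale_base=1.05, rng=None):
    """Draw n boxes around prev_box with random shifts and scale changes."""
    if rng is None:
        rng = random.Random(0)
    cx, cy, w, h = prev_box
    r = (w + h) / 2.0  # shifts are proportional to the mean box size
    candidates = []
    for _ in range(n):
        s = scale_base ** rng.gauss(0.0, scale_sigma)
        candidates.append((cx + rng.gauss(0.0, pos_sigma * r),
                           cy + rng.gauss(0.0, pos_sigma * r),
                           w * s, h * s))
    return candidates


def pick_best(candidates, score_fn):
    """Return the candidate with the highest score."""
    return max(candidates, key=score_fn)


def toy_score(box):
    # Hypothetical scorer: pretends the true target center is at (60, 60).
    return -math.hypot(box[0] - 60.0, box[1] - 60.0)


candidates = sample_candidates((50.0, 50.0, 20.0, 40.0))
best = pick_best(candidates, toy_score)
```

In the tracker, `toy_score` would be replaced by the network's target score for the image patch inside each candidate box.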

The fourth part is the experiments.

The first experiment was done on OTB, which has 100 videos. For it, the authors trained on videos from VOT2013, VOT2014 and VOT2015, so a large amount of training data is needed in any case. The results on OTB50 and OTB100 are as follows:

The scores are very high indeed.

It is worth mentioning that the authors use two techniques: hard negative mining and bounding box regression. The former has a large effect on distance precision (DP), and the latter has a large effect on overlap precision (OP). This can be seen from the following comparative experiments:
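For reference, the two quantities behind these metrics can be computed as below. This is a generic sketch, assuming boxes in (x, y, width, height) form with (x, y) the top-left corner: DP is built on the distance between predicted and ground-truth box centers, and OP on their intersection over union (IoU). The function names are illustrative.

```python
import math


def iou(a, b):
    """Intersection over union of two (x, y, w, h) boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def center_error(a, b):
    """Euclidean distance between the centers of two (x, y, w, h) boxes."""
    acx, acy = a[0] + a[2] / 2.0, a[1] + a[3] / 2.0
    bcx, bcy = b[0] + b[2] / 2.0, b[1] + b[3] / 2.0
    return math.hypot(acx - bcx, acy - bcy)


pred = (5, 0, 10, 10)
truth = (0, 0, 10, 10)
overlap = iou(pred, truth)           # half of each box is shared
distance = center_error(pred, truth)
```

This also suggests why the two techniques affect different curves: a box can be well centered on the target yet have the wrong size, which hurts overlap but not center distance, and bounding box regression mainly corrects the box extent.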
