Robust Visual Tracking via Convolutional Networks: Reading Notes

Source: Internet
Author: User

The paper is by Kaihua Zhang, Qingshan Liu, Yi Wu, and Ming-Hsuan Yang, all leading researchers in visual tracking. A few days ago I saw that they had released a new version of CNT, so I took the time to read it; the overall idea does not differ much from their earlier article. A detailed introduction follows.

Basic Ideas

The authors construct an online two-layer convolutional neural network for visual tracking. First, in the initial frame, many normalized image patches are extracted from the target region, and the k-means algorithm is used to select representative patches as filters. These filter patches also contain contextual information about the area around the target. They are used to define the feature maps of subsequent frames: each map measures the similarity between a filter and the local intensity patterns of the target, so the local structural information of the target is encoded. In addition, all the maps are combined into a global, mid-level representation which, unlike part-based representations, preserves the geometric layout of the target. Finally, a soft shrinkage method with an adaptive threshold denoises this global representation and yields a sparse representation. The representation is updated with a simple and effective online strategy that adapts to the changing appearance of the target. The overall model framework is as follows:

First, each image sample is warped to a fixed 32×32 size and densely sampled into small patches; the k-means algorithm clusters a certain number of these normalized patches, including normalized local patches extracted from the area around the target. The cluster centers are then used as filters to convolve the normalized samples extracted from subsequent frames, producing a set of feature maps. Finally, soft shrinkage is applied to this set of feature maps to produce a sparse representation.

Working through this pipeline, we can see that it is very different from a standard CNN; as the authors note, it contains no pooling operations. The specific implementation is as follows, starting with the image representation.
Step 1: Pre-processing. The input image is converted to grayscale and warped to a fixed n×n size. Dense sampling then extracts a large set of image patches Y = {y1, ..., yl}, each of size w×w, so the total number of patches is (n-w+1)×(n-w+1). Every patch is mean-subtracted and L2-normalized to remove the effects of illumination and contrast.
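A minimal sketch of this pre-processing step in Python/NumPy is given below; the default patch size w and the simple index-based resize stand in for the authors' warping and are assumptions for illustration.

```python
import numpy as np


def preprocess(image_gray, n=32, w=6):
    """Warp a grayscale image to n x n, densely sample w x w patches,
    then zero-mean and L2-normalize each patch."""
    # Crude nearest-neighbour resize via index mapping (stand-in for a proper warp).
    h0, w0 = image_gray.shape
    rows = np.arange(n) * h0 // n
    cols = np.arange(n) * w0 // n
    warped = image_gray[np.ix_(rows, cols)].astype(np.float64)

    patches = []
    for i in range(n - w + 1):              # (n - w + 1)^2 overlapping patches
        for j in range(n - w + 1):
            p = warped[i:i + w, j:j + w].ravel()
            p = p - p.mean()                # remove mean (illumination)
            norm = np.linalg.norm(p)
            if norm > 1e-8:
                p = p / norm                # L2-normalize (contrast)
            patches.append(p)
    return warped, np.asarray(patches)      # patches: ((n-w+1)^2, w*w)
```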

Step 2: The first layer, i.e., the simple cell layer. Using k-means clustering, the authors select a set of patches F from Y to serve as filter templates. Given a filter template, the corresponding feature map of the input image I is obtained by convolution, as shown in the following illustration:

From this figure we can see that although the appearance of the target changes noticeably with illumination and scale, the output of the convolution filtering, i.e., the simple cell feature maps, not only preserves the local structure of the target but also keeps its global geometric layout almost unchanged. Moreover, because the context around the target provides a lot of useful information for distinguishing target from background, the background is also sampled and clustered with k-means to obtain a set of background templates; average pooling over these templates yields an average background template, which is then convolved with the input image I to obtain the corresponding background response.
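Below is a minimal sketch of this simple cell layer, assuming that the averaged background response is subtracted from each target filter response; the number of filters d, the use of scikit-learn's KMeans, and the preprocess() helper sketched above are illustrative choices, not the authors' exact implementation.

```python
import numpy as np
from scipy.signal import correlate2d
from sklearn.cluster import KMeans


def learn_filters(patches, d=100, w=6):
    """Cluster the normalized patches and use the d cluster centers as filters."""
    km = KMeans(n_clusters=d, n_init=10).fit(patches)
    return km.cluster_centers_.reshape(d, w, w)


def simple_cell_maps(warped, target_filters, bg_filters=None):
    """Cross-correlate the warped sample with each filter to get feature maps.
    If background filters are given, their average (average pooling) is also
    applied to the image and subtracted from every target response."""
    maps = np.stack([correlate2d(warped, f, mode='valid') for f in target_filters])
    if bg_filters is not None:
        bg_avg = bg_filters.mean(axis=0)                    # average pooling
        maps = maps - correlate2d(warped, bg_avg, mode='valid')
    return maps                                             # (d, n-w+1, n-w+1)
```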

Step 3: The complex cell layer

In this layer the authors use a 3-D tensor to represent the collection of d feature maps produced by the simple cell layer. The complex cell layer of a traditional CNN is shift-invariant, which is not acceptable for tracking (it would cause positional ambiguity), so the authors instead use shift-variant complex cell features to keep the method robust to positional confusion. In addition, this feature is particularly robust to scale changes: by warping targets at different scales to a fixed size, each useful part of the target changes little in the warped image, so the complex cell features retain the geometric layout of the useful parts across scales.

To make this tensor C robust to the noise caused by appearance changes of the target, a sparse vector c is used to approximate vec(C), which can be obtained by sparse coding:

$$\hat{\mathbf{c}} = \arg\min_{\mathbf{c}} \; \frac{1}{2}\,\big\|\mathrm{vec}(\mathbf{C}) - \mathbf{c}\big\|_2^2 + \lambda\,\|\mathbf{c}\|_1 \qquad (2)$$

The model can then be solved in closed form with the soft shrinkage method:

$$\hat{\mathbf{c}} = \mathrm{sign}\big(\mathrm{vec}(\mathbf{C})\big) \odot \max\big(0,\ |\mathrm{vec}(\mathbf{C})| - \lambda\big) \qquad (3)$$

where λ is the adaptive shrinkage threshold.
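As a minimal sketch, the sparse complex cell representation of Equations (2)-(3) can be written as follows in NumPy; taking the median of |vec(C)| as the adaptive threshold is an assumption made here for illustration.

```python
import numpy as np


def sparse_representation(feature_maps):
    """Vectorize the 3-D tensor of simple cell maps and apply soft shrinkage (Eq. 3)."""
    c = feature_maps.ravel()                    # vec(C)
    lam = np.median(np.abs(c))                  # adaptive threshold (assumed: median)
    return np.sign(c) * np.maximum(np.abs(c) - lam, 0.0)
```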

Step 4: Model update

This part concerns the online update of c in Equation (2); it is not repeated here.

Proposed Tracking Algorithm

The authors' framework is based on the particle filter. Given the observations of the first t frames, $\mathbf{o}_{1:t}$, the posterior probability of the target state $\mathbf{s}_t$ is

$$p(\mathbf{s}_t \mid \mathbf{o}_{1:t}) \propto p(\mathbf{o}_t \mid \mathbf{s}_t) \int p(\mathbf{s}_t \mid \mathbf{s}_{t-1})\, p(\mathbf{s}_{t-1} \mid \mathbf{o}_{1:t-1})\, d\mathbf{s}_{t-1},$$

where $p(\mathbf{s}_t \mid \mathbf{s}_{t-1})$ is the motion model that predicts the current state from the previous state, and $p(\mathbf{o}_t \mid \mathbf{s}_t)$ is the observation model, i.e., the probability that the observation $\mathbf{o}_t$ corresponds to the target under the current state.
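The loop below is a hedged sketch of one step of such a particle filter; the (x, y, scale) state parameterization, the Gaussian random-walk motion model, and the Gaussian-kernel likelihood on the distance between a candidate's sparse representation and the target template are assumptions for illustration, not the authors' exact settings.

```python
import numpy as np


def track_frame(frame, particles, template, extract_representation,
                sigma=(4.0, 4.0, 0.01)):
    """One particle-filter step: predict states, weight them with the
    observation model, and resample.  extract_representation(frame, state)
    is assumed to return the sparse complex cell feature of the candidate
    region described by state = (x, y, scale)."""
    # Motion model p(s_t | s_{t-1}): Gaussian random walk around each particle.
    particles = particles + np.random.randn(*particles.shape) * np.asarray(sigma)

    # Observation model p(o_t | s_t): similarity to the target template.
    weights = np.empty(len(particles))
    for i, state in enumerate(particles):
        c = extract_representation(frame, state)
        weights[i] = np.exp(-np.sum((c - template) ** 2))
    weights /= weights.sum()

    # Take the most likely particle as the tracking result, then resample.
    best_state = particles[np.argmax(weights)]
    idx = np.random.choice(len(particles), size=len(particles), p=weights)
    return best_state, particles[idx]
```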







