This post covers Kaihua Zhang's paper published at ECCV 2012, which proposes a single-target tracking algorithm based on compressive sensing. It uses a random measurement matrix satisfying the restricted isometry property (RIP) condition of compressive sensing to reduce the dimensionality of multi-scale image features, and then classifies the compressed features with a naive Bayes classifier to predict the target location.

First, let's introduce the key concepts behind the paper:

1. Random Projection

A vector x in a high-dimensional image space (m-dimensional) is projected into a low-dimensional space as v (n-dimensional) by a matrix R (n×m):

v = Rx    (n << m)

This is what we usually call dimensionality reduction, but a good reduction should not only lower the dimension; it should also preserve as much of the high-dimensional information as possible. How? The Johnson-Lindenstrauss lemma states that points in a vector space can be projected onto a randomly selected subspace of suitably high dimension while preserving pairwise distances with high probability, where the "suitable dimension" in that sentence is still much smaller than the original one. Moreover, Baraniuk et al. proved that a random matrix satisfying the Johnson-Lindenstrauss lemma also satisfies the restricted isometry property (RIP) condition of compressive sensing. So if the random matrix R satisfies the Johnson-Lindenstrauss lemma, and x is a compressible signal such as speech or an image, then with high probability we can reconstruct the high-dimensional x from the low-dimensional v with minimal error.
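As a quick illustration of this distance-preserving property, here is a toy sketch in pure Python (the sizes and helper names are my own choices for the demo, not anything from the paper): a random Gaussian projection from 2000 dimensions down to 200 roughly preserves the distance between two points.

```python
import math
import random

def random_projection_matrix(n, m, seed=0):
    """n x m matrix with i.i.d. N(0, 1) entries scaled by 1/sqrt(n),
    so squared distances are preserved in expectation."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) / math.sqrt(n) for _ in range(m)]
            for _ in range(n)]

def project(R, x):
    """v = Rx (plain dense matrix-vector product)."""
    return [sum(r * xi for r, xi in zip(row, x)) for row in R]

def dist(a, b):
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

# Two random points in a 2000-dimensional space.
rng = random.Random(42)
m, n = 2000, 200
x1 = [rng.uniform(-1, 1) for _ in range(m)]
x2 = [rng.uniform(-1, 1) for _ in range(m)]

R = random_projection_matrix(n, m)
v1, v2 = project(R, x1), project(R, x2)

ratio = dist(v1, v2) / dist(x1, x2)
print(f"distance ratio after projection: {ratio:.3f}")  # close to 1
```

The ratio concentrates around 1 as n grows, which is exactly the Johnson-Lindenstrauss guarantee in miniature.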

2. Random Measurement Matrix

A typical random measurement matrix satisfying the RIP condition is the random Gaussian matrix R, where each entry Rij follows N(0, 1). This matrix has one drawback: it is generally dense, which makes storing it and multiplying by it prohibitively expensive.

The highlight of the paper is the use of a very sparse random measurement matrix, with entries defined as: rij = sqrt(s) × (+1 with probability 1/2s, 0 with probability 1 − 1/s, −1 with probability 1/2s).

Achlioptas proved that this matrix satisfies the Johnson-Lindenstrauss lemma for s = 2 or 3, and with s = 3 the matrix is already quite sparse: each entry is 0 with probability 1 − 1/3 = 2/3, cutting two thirds of the computation. The paper goes further and sets s = m/4, where m is the dimension of the signal x to be compressed, so that for each row of R only about c (c ≤ 4) entries need to be computed, and the cost of the matrix multiplication drops to O(cn). Storing the matrix only requires keeping the nonzero entries, so the space cost is greatly reduced as well.
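A minimal sketch of how such a sparse matrix can be generated and applied (my own illustrative code, not the paper's implementation): each row stores only its nonzero entries, so computing v = Rx costs just a handful of multiplications per row.

```python
import math
import random

def sparse_measurement_matrix(n, m, seed=0):
    """Each row stores only its nonzero entries as (column, value) pairs.
    Entry r_ij = sqrt(s) * (+1 w.p. 1/2s, -1 w.p. 1/2s, 0 otherwise),
    with s = m/4, so each row has about 4 nonzeros in expectation."""
    rng = random.Random(seed)
    s = m / 4.0
    scale = math.sqrt(s)
    rows = []
    for _ in range(n):
        row = []
        for j in range(m):
            u = rng.random()
            if u < 1.0 / (2 * s):
                row.append((j, scale))
            elif u < 1.0 / s:
                row.append((j, -scale))
        rows.append(row)
    return rows

def compress(rows, x):
    """v = Rx using only the nonzero entries: O(c*n) multiplications."""
    return [sum(val * x[j] for j, val in row) for row in rows]

m, n = 10000, 50
R = sparse_measurement_matrix(n, m)
x = [1.0] * m
v = compress(R, x)
avg_nnz = sum(len(row) for row in R) / n
print(f"average nonzeros per row: {avg_nnz:.1f}")  # about 4
```

In a real tracker R is generated once at initialization and reused for every frame, which is what keeps the feature compression cheap.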

3. Scale Invariance

To handle scale changes during tracking, each sample is convolved with a set of multi-scale rectangle filters, defined as: h(i,j)(x, y) = 1 if 1 ≤ x ≤ i and 1 ≤ y ≤ j, and 0 otherwise,

where i and j are the width and height of the rectangle filter. Convolving an image with one filter yields a w×h feature map, and there are w×h filters in total. Reshaping each filtered image into a column vector of length wh and concatenating all of them produces a high-dimensional multi-scale image feature vector of size m = (wh)², typically on the order of 10^6 to 10^10, which is far too large to compute with directly. This is where the random measurement matrix R comes in: projecting x down to v achieves the dimensionality reduction. In plain terms, the actual computation stays small because the random measurement matrix only needs to be generated once at program initialization, and the matrix multiplication only considers nonzero entries; since each row has at most about 4 nonzeros, v = Rx can be computed very efficiently.
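Each rectangle-filter response is simply the sum of pixels inside a rectangle, which can be evaluated in O(1) with the classic integral-image trick used for Haar-like features (the helper names below are mine, for illustration):

```python
def integral_image(img):
    """ii[y][x] = sum of img over all rows < y and columns < x
    (padded with an extra zero row and column)."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii

def rect_sum(ii, x, y, i, j):
    """Response of the i x j rectangle filter at top-left corner (x, y):
    the sum of pixels in that rectangle, in O(1) via the integral image."""
    return ii[y + j][x + i] - ii[y][x + i] - ii[y + j][x] + ii[y][x]

img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
ii = integral_image(img)
print(rect_sum(ii, 0, 0, 2, 2))  # 1+2+4+5 = 12
print(rect_sum(ii, 1, 1, 2, 2))  # 5+6+8+9 = 28
```

Because every feature the tracker needs is a rectangle sum, one integral image per frame lets all w×h filter responses be read off in constant time each.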

4. Building and Updating the Classifier

Assuming the elements of v are independently distributed, they are modeled with a naive Bayes classifier: H(v) = Σi log( p(vi | y=1) p(y=1) / ( p(vi | y=0) p(y=0) ) ), where y ∈ {0, 1} is the sample label.

First, a uniform prior p(y=1) = p(y=0) is assumed. Diaconis and Freedman [3] showed that random projections of high-dimensional random vectors are almost always Gaussian, so the conditional distributions in the classifier H(v) are assumed to satisfy p(vi | y=1) ~ N(μi1, σi1) and p(vi | y=0) ~ N(μi0, σi0).

What remains is to model these four parameters (μi1, σi1, μi0, σi0) and update them at every frame: μi1 ← λμi1 + (1 − λ)μ1 and σi1 ← sqrt( λ(σi1)² + (1 − λ)(σ1)² + λ(1 − λ)(μi1 − μ1)² ), with the negative-class parameters updated analogously,

where λ > 0 is the learning rate, and μ1 and σ1 are the maximum-likelihood mean and standard deviation of feature i over the positive samples collected in the current frame: μ1 = (1/n) Σ_{k:y=1} vi(k) and σ1 = sqrt( (1/n) Σ_{k:y=1} (vi(k) − μ1)² ). The parameters are initialized with these estimates from the first frame.
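Putting the classifier together, here is an illustrative sketch of the Gaussian naive Bayes response and the incremental update above (my own toy code, not the paper's implementation; the learning-rate value below is just a plausible choice):

```python
import math

LAMBDA = 0.85  # learning rate (illustrative value)

def gauss_logpdf(x, mu, sigma):
    sigma = max(sigma, 1e-6)  # guard against zero variance
    return (-0.5 * math.log(2 * math.pi * sigma * sigma)
            - (x - mu) ** 2 / (2 * sigma * sigma))

def response(v, pos, neg):
    """H(v) = sum_i [log p(v_i|y=1) - log p(v_i|y=0)], equal priors."""
    return sum(gauss_logpdf(vi, mp, sp) - gauss_logpdf(vi, mn, sn)
               for vi, (mp, sp), (mn, sn) in zip(v, pos, neg))

def update(params, samples, lam=LAMBDA):
    """Incrementally update (mu_i, sigma_i) per feature from this
    frame's samples, following the update rule above."""
    n = len(samples)
    for i, (mu, sigma) in enumerate(params):
        col = [s[i] for s in samples]
        mu_new = sum(col) / n
        var_new = sum((c - mu_new) ** 2 for c in col) / n
        mu_upd = lam * mu + (1 - lam) * mu_new
        sigma_upd = math.sqrt(lam * sigma ** 2 + (1 - lam) * var_new
                              + lam * (1 - lam) * (mu - mu_new) ** 2)
        params[i] = (mu_upd, sigma_upd)

# Two compressed features; positives centered near +1, negatives near -1.
pos = [(0.0, 1.0), (0.0, 1.0)]
neg = [(0.0, 1.0), (0.0, 1.0)]
update(pos, [[1.0, 0.9], [1.1, 1.0]])
update(neg, [[-1.0, -0.9], [-1.1, -1.0]])

print(response([1.0, 1.0], pos, neg) > 0)   # positive-looking sample
print(response([-1.0, -1.0], pos, neg) < 0) # negative-looking sample
```

Note that evaluating H(v) is just additions of log-probabilities, which is why the classifier is so cheap per candidate window.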

The tracking algorithm proceeds as follows (the paper also gives a flowchart):

1. At frame t, sample image windows within a radius r around the target location from frame t−1, and extract their low-dimensional compressed features.

2. Classify the sampled features with the naive Bayes classifier H; the sample with the highest response is taken as the target location at frame t.

3. With the target location at frame t determined, collect positive samples near it and negative samples farther away, and update the classifier H for predicting the target location in the next frame.
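The search in steps 1-2 can be sketched as a single tracking step (a toy stand-in: the feature extractor and score function below are simplified placeholders, not the compressed Haar features and naive Bayes response of the actual algorithm):

```python
def extract_feature(frame, x, y):
    """Placeholder feature: mean intensity of a 3x3 patch (the real
    tracker uses compressed multi-scale rectangle features)."""
    vals = [frame[y + dy][x + dx] for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
    return sum(vals) / 9.0

def track_step(frame, prev_x, prev_y, radius, score):
    """Scan candidate windows within `radius` of the previous location
    and return the one with the highest classifier response `score`."""
    best, best_pos = float("-inf"), (prev_x, prev_y)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            x, y = prev_x + dx, prev_y + dy
            r = score(extract_feature(frame, x, y))
            if r > best:
                best, best_pos = r, (x, y)
    return best_pos

def make_frame():
    """10x10 frame with a bright 3x3 blob centered at (6, 5)."""
    frame = [[0.0] * 10 for _ in range(10)]
    for y in range(4, 7):
        for x in range(5, 8):
            frame[y][x] = 1.0
    return frame

frame = make_frame()
x, y = track_step(frame, 4, 4, 2, score=lambda v: v)
print((x, y))  # (6, 5): the blob center, found from the old location (4, 4)
```

In the real tracker, `score` would be the naive Bayes response H(v) and the classifier update of step 3 would follow each call.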

I have tested compressive tracking myself and found that in practice the tracker often loses the target or drifts; comparison studies published later reach similar conclusions. I see several reasons for this:

First, the Haar-like features. As is well known, Haar-like features are quite simple and are most widely used in face detection, but for tracking texture-rich objects such as pedestrians or hands their discriminative power is unsatisfactory, so swapping in more expressive features could improve the algorithm's performance.

Second, when collecting samples to update the classifier, the author draws positive samples at random within a small circle around the target and negative samples within a ring farther out, without weighting the samples by importance. For example, a sample right at the target center and one slightly farther away are treated identically during classification, which makes the classifier less precise. Likewise, the candidate locations at prediction time are drawn at random within a small circle around the previous target, so I suspect the exact target position may simply never be sampled.

In short, the strength of the compressive tracking algorithm is speed, and that is the selling point of the article: the naive Bayes classifier was chosen for efficiency, reducing classification to simple additions of the two log-probabilities. But when the object is occluded or its appearance changes, the classifier cannot be updated properly, and the classification quality degrades.

The paper works very well as a primer: it is easy to understand after one reading, and code is available so you can get started right away. The author's work reminds us that these algorithms are not as inscrutable as they may seem.

Finally, some thoughts of my own. Target tracking is an important research field in computer vision, and these days every school has its own tricks: multiple-instance learning, spatio-temporal context, convolutional neural networks, structured SVMs, FFT acceleration, linear regression, and so on all show real strength, and performance keeps soaring. Having just finished this paper only to see a flood of newer ones, I felt overwhelmed at first, but it has only made me more interested in studying the field. I will keep reading recent papers and share my notes here; if anything is ill-considered, I hope you will help point it out.

If you have any questions, feel free to leave a message and we can discuss. I have put my own C++ implementation of the CT tracker on my GitHub; anyone interested can download it and take a look.

My Sina Weibo: http://weibo.com/1270453892/profile?topnav=1&wvr=6

My github: https://github.com/pbypby

Paper discussed: Real-Time Compressive Tracking, an interpretation of the real-time compressive tracking algorithm.