Tracking Learning Series. Original article; when reproducing it, please credit the source: http://blog.csdn.net/ikerpeng/article/details/40144497
This paper is excellent! It is well worth studying carefully. Today I will first record the ideas behind its code (the detailed derivation will be given later).
First, the decision function used in this paper is one that minimizes the structural risk. In the notation of the paper:

$$\min_{w,b} \sum_{i=1}^{m} L(y_i, f(x_i)) + \lambda \|w\|^2$$

In this function, the first term is the loss function L, in which f(x) is the discriminant function we ultimately want; the second term is the structural penalty (regularization) term. For an SVM classifier the loss is the hinge loss. In practice, however, regularized least squares (RLS) with kernels can achieve the same effect, so the paper uses that scheme to solve the problem. The resulting solution is:

$$\alpha = (K + \lambda I)^{-1} y$$
The details are discussed later. Here is the main idea of the code.
First: read the video file and obtain the ground-truth information, that is, the position and size of the target. Then build the distribution function of the target over the target box (a Gaussian distribution; I am not entirely clear on this part, and it does not match the formula in the paper);
Next: read the first frame and convert it to grayscale. The data inside the box is multiplied by a window function to reduce edge effects, and is normalized to the range -0.5 to 0.5;
Then: the kernel function k is computed from the data above, and k is then used to obtain the alpha required by f(x). (Notably, both of these quantities are computed in the Fourier domain. This is one of the paper's innovations and also the reason it is so fast.)
Next: for each subsequent frame, first convert it to grayscale and preprocess the input data with a Hann window. Then combine it with the information from the previous frame to compute k again, compute the response values from the current alpha and k, and select the position with the highest response. (Note that the response computed here covers every possible target region within the frame being processed.)
Finally: compute the current k at the position with the highest response and use it to update alpha, then process the next frame. (Note that the k used to compute the response and the k used to update alpha are different. In the code, the k for the response is the convolution of the target with the image patch to be searched, while the k for the update is the convolution of the target with itself.)
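The Gaussian target function from this step can be sketched in NumPy as follows. This is only an illustration under stated assumptions: the function name `gaussian_labels`, the default `sigma_factor`, and the choice to place the peak at the window centre are mine, not taken from the post.

```python
import numpy as np

def gaussian_labels(window_size, target_size, sigma_factor=1.0 / 16):
    """Regression targets y: a 2-D Gaussian peaked at the window centre.

    sigma_factor is an assumed setting that ties the bandwidth to the
    target size; it is not taken from the post.
    """
    h, w = window_size
    sigma = np.sqrt(target_size[0] * target_size[1]) * sigma_factor
    rows = np.arange(h) - h // 2
    cols = np.arange(w) - w // 2
    rs, cs = np.meshgrid(rows, cols, indexing="ij")
    return np.exp(-0.5 * (rs ** 2 + cs ** 2) / sigma ** 2)

# Example: a 64x48 search window around a 32x24 target.
y = gaussian_labels((64, 48), (32, 24))
```

Note that y depends only on the window and target sizes, not on where the target sits in the frame.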
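A minimal sketch of this preprocessing step, assuming an 8-bit grayscale patch, and assuming the normalization is applied before the window so that the patch borders fade to zero. The name `preprocess` is hypothetical.

```python
import numpy as np

def preprocess(patch):
    """Scale a grayscale uint8 patch to [-0.5, 0.5], then apply a Hann window."""
    h, w = patch.shape
    # 2-D Hann window built as the outer product of two 1-D windows.
    window = np.outer(np.hanning(h), np.hanning(w))
    x = patch.astype(np.float64) / 255.0 - 0.5
    return window * x

# Example with a synthetic all-white patch.
patch = np.full((64, 48), 255, dtype=np.uint8)
x = preprocess(patch)
```

The window is zero on the border, so the discontinuity that would otherwise appear when the patch is treated as periodic by the FFT is suppressed.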
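The step above can be sketched as follows, assuming a Gaussian kernel and the Fourier-domain form of the RLS solution, alpha = F^-1( F(y) / (F(k) + lambda) ). The names `gaussian_correlation` and `train`, the values of `sigma` and `lam`, and the convention that zero shift sits at index (0, 0) are assumptions of this sketch and may differ from the code the post describes.

```python
import numpy as np

def gaussian_correlation(x, z, sigma):
    """Gaussian kernel between x and every cyclic shift of z, via the FFT."""
    xf = np.fft.fft2(x)
    zf = np.fft.fft2(z)
    # Dot products of x with all cyclic shifts of z at once.
    xz = np.real(np.fft.ifft2(np.conj(xf) * zf))
    # Squared distances, clipped at zero against rounding error,
    # then averaged over the number of pixels.
    d = np.maximum(0.0, np.sum(x ** 2) + np.sum(z ** 2) - 2.0 * xz) / x.size
    return np.exp(-d / sigma ** 2)

def train(x, y, sigma=0.2, lam=1e-2):
    """Return the FFT of alpha for patch x and regression targets y."""
    k = gaussian_correlation(x, x, sigma)
    return np.fft.fft2(y) / (np.fft.fft2(k) + lam)

# Toy data: a random "patch" and Gaussian targets peaked at index (0, 0).
rng = np.random.default_rng(0)
x = rng.uniform(-0.5, 0.5, size=(32, 32))
r = np.minimum(np.arange(32), 32 - np.arange(32))  # wrapped distances
y = np.exp(-0.5 * (r[:, None] ** 2 + r[None, :] ** 2) / 2.0 ** 2)
alphaf = train(x, y)
```

Both k and alpha are obtained with a handful of FFTs and element-wise operations, with no explicit matrix inversion, which is where the speed comes from.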
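The detection step can be sketched like this, assuming `alphaf` (the FFT of alpha) and the kernel map `k` between the stored target and the new patch are already available, and that zero shift is stored at index (0, 0). The function name `detect` is hypothetical.

```python
import numpy as np

def detect(alphaf, k):
    """Response map over all cyclic shifts, plus the displacement of its peak."""
    response = np.real(np.fft.ifft2(alphaf * np.fft.fft2(k)))
    row, col = np.unravel_index(np.argmax(response), response.shape)
    h, w = response.shape
    # Indices past the midpoint wrap around to negative displacements.
    dy = row - h if row > h // 2 else row
    dx = col - w if col > w // 2 else col
    return response, (int(dy), int(dx))

# Toy check: alpha is a unit impulse, so the response equals k itself.
alphaf = np.ones((8, 8), dtype=complex)
k = np.zeros((8, 8))
k[6, 1] = 1.0
response, shift = detect(alphaf, k)
```

A single inverse FFT yields the response for every candidate shift, and the peak location is then read as the target's displacement relative to the previous frame.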
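The post only says that alpha is updated. One common way to do this, assumed here rather than taken from the post, is to blend the newly trained alpha and target appearance into the stored ones by linear interpolation. The name `update_model` and the value of `interp_factor` are placeholders.

```python
import numpy as np

def update_model(alphaf, z, new_alphaf, new_x, interp_factor=0.075):
    """Blend the new model into the stored one.

    alphaf, z: stored FFT of alpha and stored target appearance.
    new_alphaf, new_x: the same quantities trained on the patch at the
    newly detected position (new_alphaf uses the kernel of new_x with itself).
    """
    alphaf = (1.0 - interp_factor) * alphaf + interp_factor * new_alphaf
    z = (1.0 - interp_factor) * z + interp_factor * new_x
    return alphaf, z

# Toy example with constant arrays to make the blending visible.
old_alphaf = np.zeros((4, 4), dtype=complex)
old_z = np.zeros((4, 4))
new_alphaf = np.ones((4, 4), dtype=complex)
new_x = np.ones((4, 4))
alphaf, z = update_model(old_alphaf, old_z, new_alphaf, new_x, interp_factor=0.25)
```

In the next frame, the stored appearance z would be the one correlated with the new patch to get the detection kernel, which is why the two k values differ.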
Open questions (to be resolved gradually):
1. What is y in alpha? According to the original formula it should be the label, so how does it end up as a Gaussian distribution? (I think I understand now: instead of directly labeling the target 1 or 0 as others do, the author assigns a confidence value. This is the idea of regression. O(∩_∩)O haha~)
1'. Why is y in alpha never updated? (In fact, y has nothing to do with the position, only with the size of the box. Did you notice?)
2. Why does this single computation give the response values of all the blocks in an image?
3. What is the reason for processing the data with a Hann window?
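On question 2, a small numerical check may help. It assumes the "blocks" are the cyclic shifts of the patch and uses a plain dot product (linear kernel) for simplicity: evaluating every shift one by one gives the same numbers as a single FFT-based correlation.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 6))
z = rng.standard_normal((8, 6))

# Brute force: dot product of x with z cyclically shifted by every (dr, dc).
brute = np.empty_like(x)
for dr in range(x.shape[0]):
    for dc in range(x.shape[1]):
        brute[dr, dc] = np.sum(x * np.roll(z, (-dr, -dc), axis=(0, 1)))

# FFT: the same values for all shifts in one pass.
fast = np.real(np.fft.ifft2(np.conj(np.fft.fft2(x)) * np.fft.fft2(z)))

same = np.allclose(brute, fast)
```

The brute-force loop costs one full dot product per shift, while the FFT version obtains all of them together. The same treatment of the patch as periodic is also why the borders need to be faded out with a window (question 3).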
Iker Cross
2014.10.17
Target Tracking Series 11: Code Ideas for "Exploiting the Circulant Structure of Tracking-by-Detection with Kernels"