Moving Target Tracking (19): TLD


A quick, rough collection of notes on TLD; somewhat disorganized.

TLD is the abbreviation of the algorithm's name, which the original author calls Tracking-Learning-Detection.

Author website Link: http://personal.ee.surrey.ac.uk/Personal/Z.Kalal/

Paper:

(1) Tracking-Learning-Detection

(2) Forward-Backward Error: Automatic Detection of Tracking Failures

(3) Online Learning of Robust Object Detectors During Unstable Tracking

Source Code Resources:

1. Original author Zdenek Kalal

Author's homepage: http://info.ee.surrey.ac.uk/Personal/Z.Kalal/

Source code page: https://github.com/zk00006/OpenTLD

Programming language: Matlab + C

2. Alan Torres Edition

Source code page: https://github.com/alantrrs/OpenTLD

Implementation language: C++

Blog Resources (Chinese):

1. "TLD (Tracking-Learning-Detection) Learning and Source Code Understanding" (zouxy09)

http://blog.csdn.net/zouxy09/article/details/7893011

2. "Revisiting P-N Learning":

http://blog.csdn.net/carson2005/article/details/7647519

3. "TLD Vision Tracking Technology Analysis"

http://www.asmag.com.cn/number/n-50168.shtml

http://blog.sina.com.cn/s/blog_627250020102ux9p.html

By comparison, TLD is a relatively approachable tracking algorithm, with no especially complex theory behind it. Summarized in one sentence: it fuses median-flow tracking with a cascaded detector (variance filter, random ferns, nearest neighbor), and then uses the fused result to update the cascaded detector. It also takes multiple scales into account.

Reading the experts' analyses alongside the code is enough to follow it without trouble.

Principle Analysis:

TLD (Tracking-Learning-Detection) is a single-target, long-term tracking algorithm proposed by Zdenek Kalal, a Czech PhD student, during his doctoral studies at the University of Surrey in the UK. What distinguishes it from traditional tracking algorithms is that it combines a conventional tracking algorithm with a conventional detection algorithm to handle deformation and partial occlusion of the tracked target during tracking. At the same time, an improved online learning mechanism continuously updates the "salient feature points" of the tracking module and the target model of the detection module, making the tracking more stable, robust and reliable.

A key issue in long-term tracking is that when the target re-enters the camera's field of view, the system should be able to re-detect it and resume tracking. Over a long tracking sequence, however, the target will inevitably undergo shape deformation, illumination changes, scale changes, occlusion, and so on. In a traditional tracking pipeline, a detection module is only used at the front end: once the tracked target is detected, the tracking module takes over, and the detector no longer participates in the tracking process. This approach has a fatal flaw: when the target deforms or becomes occluded, tracking easily fails. For long-term tracking, or for targets whose shape changes, many people therefore replace tracking with detection. Although this can improve the tracking result in some cases, it requires an offline learning process: before tracking, a large number of samples of the target must be collected for training, and these samples must cover all the deformations and the variations in scale, pose and illumination the target may undergo. In other words, when detection is used to achieve long-term tracking, the choice of training samples is critical; otherwise the robustness of the tracking is hard to guarantee.

Since neither tracking alone nor detection alone achieves ideal results over long sequences, TLD combines the two and adds an improved online learning mechanism, making the overall target tracking more stable and effective.

In simple terms, the TLD algorithm consists of three parts: a tracking module, a detection module, and a learning module, as shown in the figure below.


Its operating mechanism is as follows: the detection module and the tracking module run in parallel without interfering with each other. The tracking module assumes that the motion of the object between adjacent video frames is limited and that the target is visible, and uses this to estimate the target's motion; if the target leaves the camera's field of view, tracking fails. The detection module assumes that each video frame is independent of the others and, based on the target model detected and learned so far, performs a full-image search in every frame to locate the regions where the target may appear. As with other detection methods, the detector in TLD can make errors, and these errors are of exactly two kinds: false positives and false negatives. The learning module evaluates these two kinds of detector errors against the results of the tracking module, generates training samples from the evaluation to update the detector's target model, and updates the "salient feature points" of the tracking module, so that similar errors are avoided later. The detailed flow of the TLD modules is shown below:
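The tracker/detector fusion described above can be sketched as a toy rule. This is only an illustration of the idea, not the OpenTLD API: the `(x, y, w, h, confidence)` tuple format and the confidence threshold are assumptions.

```python
def fuse(tracked, detections, conf_threshold=0.6):
    """Toy fusion rule in the spirit of TLD: trust the tracker when it is
    confident; otherwise re-initialize from the most confident detection.
    Boxes are (x, y, w, h, confidence) tuples; names are illustrative."""
    if tracked is not None and tracked[4] >= conf_threshold:
        return tracked
    if detections:
        return max(detections, key=lambda d: d[4])
    return None  # target considered invisible in this frame
```

In the real algorithm the fused result then drives the learning module, which updates both the detector's model and the tracker's feature points.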



Before detailing the TLD workflow, some basic knowledge and concepts need clarification:

Basic knowledge:

Detection features:

The detection stage uses a structure the author calls a fern, which improves on random forests; it might be called a random fern.

The 2bitBP feature (2-bit Binary Pattern)

This is a Haar-like feature, consisting of a feature type and a corresponding feature value.

Suppose we want to decide whether a patch is the target we are looking for. The feature type refers to a rectangle placed at coordinates (x, y) within the patch, with a given width and height; the combination (x, y, width, height) is the feature type.

Now for the feature value. Once a feature type has been chosen, divide its rectangle into equal left and right halves and compare their mean gray levels: either (1) the left half has the larger gray level, or (2) the right half does; intuitively, one side is brighter than the other. Likewise, dividing the rectangle into equal top and bottom halves gives two more cases, i.e., whether the upper or the lower half is brighter. Combining left/right with top/bottom yields 4 cases in total, which can be encoded in 2 bits; this encoding is the feature value. The process is illustrated in Figure 1.

In effect, each feature type looks at the object we want to track from one particular point of view. Take the red box in Figure 1: within this rectangle, the region around the headlight should be darker. So this feature type effectively says: if the patch is a car, then at this position, for this width and height, the color should be darker.



Figure 1. 2bitBP Feature Description

In essence, this is the same idea as FAST features; the difference is that FAST compares pixels on a circle around a point, whereas here only the adjacent halves of a rectangle are compared.
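The 2bitBP computation described above can be sketched in a few lines. This is a minimal illustration of the idea (mean-intensity comparison of halves packed into 2 bits), not the OpenTLD implementation:

```python
import numpy as np

def two_bit_bp(patch, x, y, w, h):
    """2bitBP feature value for one feature type (x, y, w, h) on a
    grayscale patch: compare the mean intensity of the left/right halves
    and of the top/bottom halves, then pack the two comparisons into
    2 bits, giving one of 4 possible values."""
    box = patch[y:y + h, x:x + w].astype(float)
    left, right = box[:, :w // 2], box[:, w // 2:]
    top, bottom = box[:h // 2, :], box[h // 2:, :]
    bit0 = 1 if left.mean() > right.mean() else 0    # left brighter?
    bit1 = 1 if top.mean() > bottom.mean() else 0    # top brighter?
    return (bit1 << 1) | bit0                        # value in {0, 1, 2, 3}
```

For example, a patch whose left half is bright yields a different value from one whose top half is bright, which is exactly the 4-way distinction Figure 1 depicts.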

Random Fern

As mentioned earlier, each feature type represents one view of the tracked object. Can a combination of several feature types describe the object better? The answer is yes. Returning to the example of Figure 1: the car has a headlight on the left and a headlight on the right, and if we use both rectangles, we can expect detection to work better than with only one. The idea of the random fern is exactly this: use a combination of multiple features to express the object.

Next, let us look at how a single fern is generated and used, and how a unified decision is made when there are multiple ferns.

Assume we have selected nfeat feature types to express the object. Each fern is in fact a 4-ary tree, as shown in Figure 2: the tree has as many layers as there are selected feature types. For a given patch, each layer evaluates the feature value of the corresponding feature type; since 2bitBP features are used, there are 4 possible values at each layer, and the same happens at the next layer, so every patch eventually ends up at exactly one leaf node at the bottom.

For the training process, record the number of positive samples that fall into each leaf node (denoted np), and likewise the number of negative samples (denoted nn). The posterior probability of a positive sample at each leaf node is then np/(np+nn).

For the detection process, the patch to be classified also ends up at one leaf node; since the training process has recorded the positive-sample posterior for every leaf, that posterior is output as the probability that the patch is a positive sample.
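The leaf-counting scheme above is simple enough to sketch directly. This is an illustrative toy, not the author's code; walking the 4-ary tree layer by layer is equivalent to packing the 2-bit feature values into one integer leaf index:

```python
from collections import defaultdict

class Fern:
    """Minimal random-fern sketch: each leaf keeps positive/negative
    sample counts, and the posterior is np / (np + nn) as described."""
    def __init__(self):
        self.pos = defaultdict(int)   # leaf index -> # positive samples
        self.neg = defaultdict(int)   # leaf index -> # negative samples

    def leaf(self, feature_values):
        """Pack a sequence of 2-bit feature values (one per layer)
        into a single leaf index."""
        idx = 0
        for v in feature_values:
            idx = (idx << 2) | v
        return idx

    def train(self, feature_values, is_positive):
        idx = self.leaf(feature_values)
        if is_positive:
            self.pos[idx] += 1
        else:
            self.neg[idx] += 1

    def posterior(self, feature_values):
        """Probability that a patch with these feature values is positive."""
        idx = self.leaf(feature_values)
        p, n = self.pos[idx], self.neg[idx]
        return p / (p + n) if (p + n) else 0.0
```

An unvisited leaf returns 0.0, i.e., a patch landing there is treated as a negative.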




Figure 2. Structure of the Fern

This binary fern value is rather like a simplified version of the ORB feature.

The previous paragraphs described how a single fern is generated and how it assigns a patch the probability of being a positive sample. When multiple ferns judge the same patch, multiple posterior probabilities are produced. It is as if many people each vote on whether the thing is a positive sample, with each person corresponding to one fern. Finally, we average the posteriors output by this series of ferns and compare the mean with a threshold to decide whether the patch is ultimately a positive sample.
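The ensemble decision above is just a thresholded average over fern posteriors; a minimal sketch (the default threshold value is illustrative, not OpenTLD's exact setting):

```python
def ensemble_is_positive(posteriors, threshold=0.5):
    """Combine the posterior probabilities output by several ferns by
    averaging them and comparing against a threshold, as described above."""
    return sum(posteriors) / len(posteriors) > threshold
```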

The author skillfully combines the idea of random forests with ORB-like binary features to form his own classifier.

P-N Learning

Reference: http://blog.csdn.net/carson2005/article/details/7483027

P-N learning is a method that uses the structural relationships (see below) between labeled samples (normally used for classifier training, hereinafter "training samples") and unlabeled samples (normally used for classifier testing, hereinafter "test samples") to iteratively train a binary classifier and improve its classification performance.

Positive constraints and negative constraints restrict how the test samples may be labeled, and P-N learning is driven by these constraints. P-N learning evaluates the classifier's results on the test samples, finds the samples that contradict the constraints, re-adjusts the training set accordingly, and repeats the training iteratively until a stopping condition is satisfied. During target tracking, changes in the target's shape and pose mean the target is frequently lost and re-acquired, so in this setting, online learning and detection of the tracked target is a good strategy. This is exactly where P-N learning comes in.

Many learning algorithms assume the test samples are independent of one another. In computer vision applications, however, the labels of some test samples depend on each other; we call this dependency between labels "structure". For example, in target detection the task is to assign a label to every region of the image where the target may appear, i.e., each region is either foreground or background, and the label can only be one of the two. Likewise, when tracking a target through a video sequence, regions adjacent to the target's trajectory can be assumed to carry the foreground label, while regions far from the trajectory carry the background label. The positive constraint mentioned above specifies which samples must be labeled positive (for example, regions adjacent to the trajectory), and the negative constraint specifies which samples must be labeled negative.

From the above analysis, it is not hard to see that P-N learning can be defined as the following process:

(1) Prepare a small set of training samples and a large set of test samples;

(2) Train an initial classifier on the training samples, and use them to adjust the (prior) constraints accordingly;

(3) Use the classifier to assign labels to the test samples, and identify those samples whose assigned labels contradict the constraints;

(4) Re-label the conflicting samples, add them to the training set, and retrain the classifier;

(5) Repeat steps (3)-(4) until a stopping condition is met.
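The loop above can be sketched generically. All names here are illustrative assumptions: `fit` trains a classifier on labeled pairs, `predict` labels one sample, and `constraints(sample, label)` returns the label the structural constraints require, or `None` if the prediction is acceptable.

```python
def pn_learning(train_set, test_set, fit, predict, constraints, max_iter=10):
    """Sketch of the P-N learning loop from steps (1)-(5)."""
    train = list(train_set)
    clf = fit(train)                                 # step (2)
    for _ in range(max_iter):
        relabeled = []
        for s in test_set:                           # step (3)
            pred = predict(clf, s)
            forced = constraints(s, pred)
            if forced is not None and forced != pred:
                relabeled.append((s, forced))        # conflicting sample
        if not relabeled:                            # stopping condition
            break
        train.extend(relabeled)                      # step (4)
        clf = fit(train)                             # retrain
    return clf
```

In TLD itself, the P-constraints come from the tracker's trajectory (samples near it must be positive) and the N-constraints from regions far away from it.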





At any given moment, the tracked target can be represented by its state attributes. A state can be a tracking box giving the target's location and scale, or a flag indicating whether the target is currently visible. The spatial similarity of two tracking boxes is measured by their overlap, computed as the area of the intersection of the two boxes divided by the area of their union. The target's appearance is represented by image patches (which, in my view, can be understood as sliding windows); each patch is sampled from inside a tracking box and normalized to a size of 15*15. Two image patches ...
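The overlap measure just defined is the familiar intersection-over-union; a minimal sketch with boxes given as (x, y, w, h):

```python
def overlap(a, b):
    """Overlap of two tracking boxes (x, y, w, h): the area of their
    intersection divided by the area of their union, as defined above."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2 = min(a[0] + a[2], b[0] + b[2])
    y2 = min(a[1] + a[3], b[1] + b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union else 0.0
```

Identical boxes give 1.0 and disjoint boxes give 0.0, so the measure is a convenient [0, 1] similarity score.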
Here is the source code analysis:

Original:

http://blog.csdn.net/zouxy09/article/details/7893026

Starting from the main() function, the workflow of the whole TLD program is analyzed as follows:

(This only analyzes the workflow; for the fully annotated code, see the blog's updates.)

1. Parse the command-line arguments:

./run_tld -p ../parameters.yml -s ../datasets/06_car/car.mpg -b ../datasets/06_car/init.txt -r

2. Read the initialization parameter file parameters.yml (its entries become program variables);

3. Specify the bounding box of the target to be tracked, either from a file or by having the user draw a box with the mouse;

4. Initialize the TLD system with the bounding box and first-frame image obtained above to track the target:

tld.init(last_gray, box, bb_file); the initialization includes the following work:

4.1 buildGrid(frame1, box);

The detector uses a scanning-window strategy: the scan-window step is 10% of the window width, and the scale factor is 1.2. This function constructs the grid of all scan windows and computes the overlap between each scan window and the input target box, where overlap is defined as the area of the intersection of the two boxes divided by the area of their union;
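The scanning-window construction can be sketched as follows. This is an illustration of the strategy, not the buildGrid() source: the number of scales and the exact step rounding are assumptions, not OpenTLD's precise values.

```python
def build_grid(img_w, img_h, box_w, box_h, scale=1.2, n_scales=5, shift=0.1):
    """Generate scanning windows (x, y, w, h) at several scales, sliding
    each with a step of 10% of the scaled window size, as described above."""
    windows = []
    for s in range(n_scales):
        w = int(round(box_w * scale ** s))
        h = int(round(box_h * scale ** s))
        if w > img_w or h > img_h:
            break                                  # window no longer fits
        step_x = max(1, int(round(shift * w)))     # 10% of window width
        step_y = max(1, int(round(shift * h)))
        for y in range(0, img_h - h + 1, step_y):
            for x in range(0, img_w - w + 1, step_x):
                windows.append((x, y, w, h))
    return windows
```

Each generated window is then compared against the input target box via the overlap (intersection over union) to label it as a positive or negative candidate.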

4.2 Allocate memory for the various variables and containers;
