TLD (Tracking-Learning-Detection) is a new single-target, long-term tracking algorithm. Its significant difference from traditional tracking algorithms is that it combines a traditional tracking algorithm with a traditional detection algorithm to handle the deformation and partial occlusion of the tracked target during tracking. At the same time, through an improved online learning mechanism, the "salient feature points" of the tracking module and the target model of the detection module are updated continuously, which makes tracking more stable, robust, and reliable.
A key issue for long-term tracking is that when the target reappears in the camera's field of view, the system should be able to re-detect it and resume tracking. However, over the course of long-term tracking, the target will inevitably undergo changes in shape, illumination, and scale, as well as occlusion. In the traditional approach, a detection module sits in front of the tracker: once the detector finds the target, the tracking module takes over, and from then on the detection module takes no further part in tracking. This approach has a fatal flaw: when the tracked target changes shape or is occluded, tracking easily fails. For this reason, for long-term tracking or for targets whose shape changes, many people replace tracking with detection. Although this can improve results in some cases, it requires an offline learning process: before testing, a large number of samples of the tracked target must be selected for learning and training. The training samples therefore have to cover the deformations and the variations in scale, pose, and illumination that the tracked target may undergo. In other words, when detection is used for long-term tracking, the choice of training samples is critical; otherwise the robustness of tracking is hard to guarantee.
Since neither tracking alone nor detection alone achieves ideal results in long-term tracking, the TLD method combines the two and adds an improved online learning mechanism, which makes the overall tracking more stable and effective.
In short, the TLD algorithm consists of three parts: the tracking module, the detection module, and the learning module, as shown in the accompanying figure.
Its operating mechanism is as follows: the detection module and the tracking module run in parallel and complement each other. First, the tracking module assumes that the object's motion between adjacent video frames is limited and that the tracked target is visible, and estimates the target's motion on that basis. If the target disappears from the camera's view, tracking fails. The detection module assumes that video frames are independent of one another and, using the target model detected and learned so far, searches the whole image in every frame to locate regions where the target may appear. As with other object detection methods, the detection module in TLD can make errors, and these are of just two kinds: false negatives and false positives. The learning module evaluates these two kinds of detector error against the results of the tracking module, generates training samples from that evaluation to update the detection module's target model, and also updates the "key feature points" of the tracking module, so that similar errors are avoided later. The detailed structure of the TLD modules and the flow between them are shown in the accompanying diagram.
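As a rough illustration of the mechanism described above, the per-frame interaction of the three modules might be sketched as follows. This is only a sketch under stated assumptions: `tld_step`, `track`, `detect`, and `learn` are hypothetical names, the box layout is assumed to be `(x, y, width, height)`, and the rule for falling back to the detector when the tracker fails is a simplification that the text does not specify.

```python
from typing import Callable, List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x, y, width, height); layout is an assumption


def tld_step(
    frame,
    prev_box: Optional[Box],
    track: Callable[[object, Box], Optional[Box]],
    detect: Callable[[object], List[Box]],
    learn: Callable[[object, Box, List[Box]], None],
) -> Optional[Box]:
    """Process one frame; return the target box, or None if the target is not visible."""
    # Tracking module: needs the previous box, so it cannot run while the target is lost.
    tracked = track(frame, prev_box) if prev_box is not None else None

    # Detection module: scans the whole frame independently of earlier frames.
    detections = detect(frame)

    if tracked is not None:
        # Learning module: the tracker's result is used to judge the detector's
        # output and to update the detector's target model.
        learn(frame, tracked, detections)
        return tracked

    # Tracker failed: re-initialise from the detector.
    # Taking the first detection is a simplified, assumed rule.
    return detections[0] if detections else None
```

The three callables are passed in so that the sketch stays independent of any particular tracker, detector, or learning rule; a caller would feed the returned box back in as `prev_box` for the next frame.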
Before detailing the TLD process, some background knowledge and basic concepts need to be clarified:
Basic knowledge:
At any given moment, the tracked target can be represented by its state. The state can be a tracking box that gives the target's location and scale, or a flag that indicates whether the tracked target is visible. The spatial similarity of two tracking boxes is measured by their overlap, which is computed as the area of their intersection divided by the area of their union. The target's appearance is represented by image patches (which I personally think of as sliding windows); each patch is sampled from inside the tracking box and normalized to a size of 15×15. Two image patches
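The overlap measure and the patch normalization described above can be sketched in a few lines of Python. This is a minimal sketch, not the reference implementation: the `(x, y, width, height)` box layout, the function names `overlap` and `normalize_patch`, the image as a list of pixel rows, and the use of nearest-neighbour resampling are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x, y, width, height); layout is an assumption


def overlap(a: Box, b: Box) -> float:
    """Area of the intersection of two boxes divided by the area of their union."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    iw = min(ax + aw, bx + bw) - max(ax, bx)  # width of the intersection
    ih = min(ay + ah, by + bh) - max(ay, by)  # height of the intersection
    if iw <= 0 or ih <= 0:
        return 0.0  # the boxes do not intersect
    inter = iw * ih
    union = aw * ah + bw * bh - inter
    return inter / union


def normalize_patch(image: Sequence[Sequence[float]], box: Box, size: int = 15) -> List[List[float]]:
    """Resample the contents of `box` to a size x size patch.

    Nearest-neighbour sampling is an assumption chosen for brevity;
    the box must lie entirely inside the image.
    """
    x, y, w, h = box
    patch = []
    for r in range(size):
        src_r = int(y + (r + 0.5) * h / size)  # source row for output row r
        row = []
        for c in range(size):
            src_c = int(x + (c + 0.5) * w / size)  # source column for output column c
            row.append(image[src_r][src_c])
        patch.append(row)
    return patch
```

For example, two 10×10 boxes offset horizontally by 5 share an area of 50 out of a union of 150, so `overlap((0, 0, 10, 10), (5, 0, 10, 10))` evaluates to 1/3.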