Analysis of the CMT Tracking Algorithm
Clustering of Static-Adaptive Correspondences for Deformable Object Tracking
Fundamentals
The basic idea of object tracking is to continuously detect characteristic features of the object, so that its position can be recovered in every frame. There are three common approaches:
The first is holistic, model-based tracking, e.g. TLD, which maintains a good representation of the object by continuously updating (learning) its model;
The second is part-based tracking, which decomposes the object into multiple parts and tracks each part separately, for instance with optical flow;
The third is feature-point-based tracking, which detects the object's feature points in each frame in real time and matches them against the feature points of the first frame to follow the object.
Seen from the above, current tracking algorithms cannot be described simply by the word "tracking": they in fact draw on object detection, recognition, machine learning and other methods. Any algorithm that can draw a box around an object in a video and then keep following it counts as a tracking algorithm, and its quality depends entirely on how well it keeps that box on the object. In fact, many of today's tracking algorithms are essentially detection algorithms.
The CMT tracking algorithm follows the third approach, using feature points. The key question is how to decide which feature points in the next frame match the feature points inside the current box; once the object's feature points are reliably found in the next frame, the tracking step is done. To solve this, the author made a seemingly simple innovation: exploit the positions of the feature points relative to the box center. For a non-deforming object, no matter how the object moves or rotates, the distance of each feature point to the center is determined up to scale. This invariant makes it possible to reject feature points that do not belong to the object.
The author obtains the next frame's feature points from two sources: one set comes from computing optical flow on the feature points inside the previous frame's box, giving their positions in the current frame; the other comes from detecting feature points directly in the current frame and matching them against the feature points of the first frame. The two sets are merged to form the initial feature points of the current frame, which are then filtered using the method described in the previous paragraph.
Algorithmic Flow
Input: Video frame, initial object frame
Output: an object box for each video frame
Requirement: the box in each successive frame keeps enclosing the original object
Steps:
Step 1: Detect all feature points and compute feature descriptors for the initial video frame — not just points inside the box, but points over the entire image. In the code, the initial database of descriptors therefore contains both foreground and background features.
Step 2: Assign the feature descriptors inside the initial box to the foreground class K1
Step 3: Iterate from the second frame onward
Step 4: Detect feature points P in the current video frame
Step 5: Match the feature points P against the initial database O to obtain matched feature points M
Step 6: Track the previous frame's feature points into this frame with optical flow to obtain tracked feature points T
Step 7: Fuse the matched points M and the tracked points T to obtain this frame's combined feature points K'
Step 8: Estimate the scale change of the feature points K' relative to the initial frame's feature points
Step 9: Estimate the rotation of the feature points K' relative to the initial frame's feature points
Step 10: Compute a vote for each feature point from the results of steps 7, 8 and 9
Step 11: Cluster the votes and select the largest, i.e. most consistent, cluster VoteC
Step 12: Map the votes in VoteC back to their feature points to obtain the valid feature points of this frame
Step 13: If the size of VoteC exceeds a minimum threshold, compute the parameters of the new rotated rectangle; otherwise the box is too small and 0 is output
Steps 1, 2: initialization
One thing that is clear from the code in CMT.cpp is the use of OpenCV's FAST or BRISK for feature detection and description. The key is then to store valid data in the database for subsequent matching.
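A rough sketch of this initialization step (not the author's exact code): assuming keypoints are (x, y) positions and the box is (x, y, w, h), points inside the box get foreground class ids 1..n while points outside get class 0, reflecting the idea that the database holds both foreground and background features.

```python
import numpy as np

def build_database(keypoints, box):
    """Split detected keypoints into foreground/background classes.

    keypoints: (N, 2) array of (x, y) positions over the whole image.
    box: (x, y, w, h) initial object rectangle.
    Returns an (N,) array of class ids: 0 = background, 1..n = foreground.
    """
    x, y, w, h = box
    inside = ((keypoints[:, 0] >= x) & (keypoints[:, 0] <= x + w) &
              (keypoints[:, 1] >= y) & (keypoints[:, 1] <= y + h))
    classes = np.zeros(len(keypoints), dtype=int)
    # Foreground points are numbered consecutively so each one can be
    # re-identified by descriptor matching in later frames.
    classes[inside] = np.arange(1, inside.sum() + 1)
    return classes
```

In the real code the descriptors of these keypoints (not just the positions) are what is stored and later matched.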
Steps 3, 4, 5, 6: tracking and matching
The basic idea is to first propagate the previous frame's feature points points_prev into this frame via optical flow, obtaining points_tracked, and then run optical flow backwards from points_tracked to obtain the corresponding positions points_back in the previous frame. The distance between points_prev and points_back should in principle be close to 0, but optical-flow error can make it large for some points. The author therefore sets a threshold THR_FB = 30: any point whose forward-backward distance exceeds it is considered unreliable and deleted. This makes the tracking result more robust.
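The forward-backward check itself can be sketched independently of the optical-flow call (in the real implementation, something like cv2.calcOpticalFlowPyrLK supplies points_tracked and points_back); given the three point sets, points whose forward-backward error exceeds THR_FB are dropped:

```python
import numpy as np

THR_FB = 30.0  # forward-backward error threshold used by the author

def filter_fb_error(points_prev, points_tracked, points_back, thr=THR_FB):
    """Keep only tracked points whose backward-propagated position
    lands close to the original point in the previous frame.

    All inputs are (N, 2) arrays of (x, y) positions.
    Returns the surviving tracked points and a boolean keep-mask.
    """
    fb_err = np.linalg.norm(points_prev - points_back, axis=1)
    keep = fb_err <= thr
    return points_tracked[keep], keep
```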
This frame's feature points are thus obtained both by tracking and by descriptor matching, and the tracked and matched points are merged together.
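A simplified fusion step might look as follows, assuming each point carries a class id from the initial database and preferring the tracked point when a class appears in both sets (a simplifying assumption of this sketch, not necessarily the author's exact rule):

```python
def fuse_points(tracked, matched):
    """Merge tracked and matched feature points.

    Each input is a list of (point, class_id) pairs. When the same
    class id appears in both sets, the tracked point wins.
    """
    fused = dict()
    for pt, cls in matched:
        fused[cls] = pt
    for pt, cls in tracked:   # tracked points overwrite matched duplicates
        fused[cls] = pt
    return [(pt, cls) for cls, pt in fused.items()]
```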
Steps 8, 9: estimating the scale ratio and rotation angle
At the start, the normalized initial feature points points_normalized are stored, and the pairwise distances and pairwise angles between all of them are computed. For the new feature points, the same pairwise distances and angles are computed, then divided by (for distances) or subtracted from (for angles) the initial values, giving the per-pair changes. Finally, the medians of these changes are taken as the overall scale ratio and rotation angle.
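This median-of-pairwise-changes estimate can be sketched directly with numpy (angle wrap-around at ±π is ignored here for brevity):

```python
import numpy as np

def estimate_scale_rotation(points_normalized, points_current):
    """Estimate overall scale and rotation from pairwise geometry.

    For every pair of points, compare the current pairwise distance and
    angle with the initial (normalized) ones; the medians of the ratios
    and differences give robust scale and rotation estimates.
    """
    n = len(points_normalized)
    i, j = np.triu_indices(n, k=1)          # all point pairs (i < j)
    d0 = points_normalized[j] - points_normalized[i]
    d1 = points_current[j] - points_current[i]
    scales = np.linalg.norm(d1, axis=1) / np.linalg.norm(d0, axis=1)
    angles = np.arctan2(d1[:, 1], d1[:, 0]) - np.arctan2(d0[:, 1], d0[:, 0])
    return np.median(scales), np.median(angles)
```

Using the median rather than the mean keeps a few mismatched points from corrupting the estimate.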
Steps 10, 11, 12, 13: removing bad feature points
The basic idea of the vote is that, once scale and rotation are compensated for, the position of each feature point relative to the object center is essentially constant; that is, in the next frame each feature point should still predict the same center. But because the image itself changes, the predicted positions will not agree exactly: some votes will fall near the center while others deviate strongly.
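A simplified version of the voting and consensus step (the real implementation clusters the votes agglomeratively and keeps the largest cluster; here that is approximated by keeping points whose vote lies within a threshold of the median vote, an assumption of this sketch):

```python
import numpy as np

def vote_and_filter(points, points_normalized, scale, rotation, thr=20.0):
    """Each point votes for the object center by subtracting its
    scaled-and-rotated normalized offset; points whose vote deviates
    from the median vote by more than `thr` pixels are rejected."""
    c, s = np.cos(rotation), np.sin(rotation)
    R = np.array([[c, -s], [s, c]])
    votes = points - scale * points_normalized @ R.T
    center = np.median(votes, axis=0)          # robust consensus center
    keep = np.linalg.norm(votes - center, axis=1) <= thr
    return center, keep
```

Mapping the surviving votes back to their feature points (step 12) is then just indexing with the keep-mask, and step 13 checks whether enough points survive.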
Evaluation of the CMT algorithm
Disadvantages:
1) There is no model-update process, so when the object's viewing angle changes its feature points can no longer be found;
2) Too many feature points make the algorithm slow;
3) Some feature points fail to be tracked from frame to frame;
4) For a moving object the feature points often change, which easily leads to tracking failure;
Advantages:
1) The code is simple; both C++ and Python versions exist, implemented with OpenCV;
2) It is relatively fast;
3) For static objects in particular, the tracking is almost perfect;