Original from http://blog.csdn.net/roamer_nuptgczx/article/details/47953357
Introduction
While studying object tracking algorithms recently, I found that the code of the CMT algorithm is very well written. Compared with the sparse-representation trackers I had studied earlier, such as SCM, CMT is not necessarily more robust. However, sparse-representation methods are generally very time-consuming, which makes them impractical for real projects, whereas CMT balances real-time performance and robustness.
Looking into the literature, I found that the paper behind CMT won the Best Paper Award at WACV 2014. The author then refined the algorithm further and published a follow-up paper at CVPR 2015, so CMT seems well worth studying. Commendably, the author has published the complete source code on the project page http://www.gnebehay.com/cmt/, including both a C++ and a Python implementation. Since I don't know Python, I spent the past few days reading the C++ source carefully. I have to say that the author's coding style is very clean and the comments are detailed, which makes the code a pleasure to read. The site also carries the author's profile; he is also the author of the OpenTLD code, which says a lot about his programming skill.
Incidentally, the C++ source previously downloaded from the website could fail to compile with errors such as M_PI not being declared, or N not being a constant in int S[2*N-1]. The author has since fixed these small errors in the GitHub repository, so simply download the latest version of the source. The author has also added user-friendly command-line argument parsing to the project. The source folder contains detailed instructions on testing with a webcam, a video file, or an image sequence, so I won't repeat them here.
Read alongside the paper, the CMT source is not hard to understand. The author applies some engineering treatment in the code, mainly in the two rounds of keypoint matching and fusion, which looks simple but is very effective. I used Visio to draw a flowchart of the whole algorithm to make the core idea of CMT and its concrete implementation clearer.
Processing flow of the void CMT::processFrame(Mat im_gray) function
Source Summary
All functionality of the CMT algorithm is implemented in the CMT class, which consists of four main components, each wrapped in its own class: Tracker, Matcher, Consensus, and Fusion. In addition, the CMT class holds a FAST detector and a BRISK descriptor.
Tracker – uses the pyramidal Lucas-Kanade (LK) optical flow method
<code class="language-cpp">tracker.track(im_prev, im_gray, points_active, points_tracked, status);</code>
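The filtering behind this call is a forward-backward consistency check, which the paragraph that follows describes in words. Here is a minimal sketch of just that check, assuming the two optical flow passes have already been computed elsewhere (for example with OpenCV's calcOpticalFlowPyrLK). The Pt struct, the function name forwardBackwardStatus, and the thr_fb parameter are illustrative names rather than the CMT source itself.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Minimal 2D point type (illustrative stand-in for cv::Point2f).
struct Pt { float x, y; };

// Forward-backward check. For each keypoint i:
//   pts_prev[i] is its position in frame t-1,
//   pts_back[i] is where it lands after tracking t-1 -> t and then t -> t-1,
//   flow_ok[i]  says whether both optical flow passes succeeded.
// A keypoint survives only if the flow succeeded and the round trip ends
// within thr_fb pixels of where it started.
std::vector<bool> forwardBackwardStatus(const std::vector<Pt>& pts_prev,
                                        const std::vector<Pt>& pts_back,
                                        const std::vector<bool>& flow_ok,
                                        float thr_fb)
{
    assert(pts_prev.size() == pts_back.size());
    assert(pts_prev.size() == flow_ok.size());

    std::vector<bool> status(pts_prev.size(), false);
    for (std::size_t i = 0; i < pts_prev.size(); ++i) {
        const float fb_err = std::hypot(pts_back[i].x - pts_prev[i].x,
                                        pts_back[i].y - pts_prev[i].y);
        status[i] = flow_ok[i] && fb_err <= thr_fb;
    }
    return status;
}
```

Keypoints whose status is true play the role of points_tracked. The value passed as thr_fb is left to the caller in this sketch.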
Given the active keypoints points_active from frame t-1, the tracker computes the forward optical flow (frame t-1 to frame t) and the backward optical flow (frame t to frame t-1), then compares the distance between each original keypoint and its position after the round trip. Keypoints whose distance is greater than a threshold are discarded; the remaining keypoints are the successfully tracked ones.
Matcher – uses a "BruteForce-Hamming" descriptor matcher
Matcher Initialization
<code class="language-cpp">matcher.initialize(points_normalized, descs_fg, classes_fg, descs_bg, center);</code>
The descriptors of all foreground and background keypoints in the first frame are assembled into a descriptor library called database. Note that the background descriptors (descs_bg) are stored first, so when database_potential is built later, the foreground keypoint indices stored in indices_potential must be offset by the total number of background keypoints, num_bg_points.
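To make the index offset concrete, here is a small sketch of the layout, assuming background rows are labelled with a sentinel class of -1. The function names buildDatabaseClasses, databaseRow, and classOfForeground are hypothetical helpers for illustration, and descriptors themselves are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Class labels for a database laid out as [background rows..., foreground rows...].
// Background rows get the label -1 (an assumed sentinel); foreground rows
// keep their own class labels.
std::vector<int> buildDatabaseClasses(std::size_t num_bg_points,
                                      const std::vector<int>& classes_fg)
{
    std::vector<int> classes(num_bg_points, -1);
    classes.insert(classes.end(), classes_fg.begin(), classes_fg.end());
    return classes;
}

// Row of the fg_index-th foreground keypoint in the combined database.
std::size_t databaseRow(std::size_t fg_index, std::size_t num_bg_points)
{
    return num_bg_points + fg_index;
}

// Class label of the fg_index-th foreground keypoint, looked up via its row.
int classOfForeground(const std::vector<int>& classes,
                      std::size_t fg_index, std::size_t num_bg_points)
{
    const std::size_t row = databaseRow(fg_index, num_bg_points);
    assert(row < classes.size());
    return classes[row];
}
```

With two background keypoints, the foreground keypoint at position 1 lives in row 3 of the combined database, which is why the offset matters whenever a foreground-only index is used to address database.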
Global keypoint matching
<code class="language-cpp">matcher.matchGlobal(keypoints, descriptors, points_matched_global, classes_matched_global);</code>
The descriptors of all keypoints found by the detector in the current frame are matched against database with knnMatch (k=2), so each descriptor gets its two best matches in database. A keypoint is discarded if any of the following holds: its best match is a background keypoint; the distance of its best match is greater than the threshold 0.25; or the ratio of the best match distance to the second-best match distance is greater than the threshold 0.8 (the smaller the ratio, the more clearly the best match beats the second-best).
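The three rejection rules can be sketched as a single predicate. This is an illustration under stated assumptions, not the CMT source: the Candidate struct and the name acceptGlobalMatch are made up, background entries are assumed to carry class -1, and distances are assumed to be already normalized to the range [0, 1] so that the 0.25 threshold applies.

```cpp
#include <cassert>

// One k-nearest-neighbour result for a descriptor (illustrative type).
struct Candidate {
    int matched_class;  // class of the database entry; -1 marks background (assumed)
    float distance;     // descriptor distance, assumed normalized to [0, 1]
};

// Returns true if the keypoint should be kept after global matching.
// best and second are the two nearest database entries, best being closer.
bool acceptGlobalMatch(const Candidate& best, const Candidate& second,
                       float thr_dist = 0.25f, float thr_ratio = 0.8f)
{
    assert(best.distance <= second.distance);

    // Rule 1: the best match is a background keypoint.
    if (best.matched_class == -1) return false;
    // Rule 2: the best match is not close enough.
    if (best.distance > thr_dist) return false;
    // Rule 3: ratio test, written as a product to avoid dividing by zero.
    if (best.distance > thr_ratio * second.distance) return false;
    return true;
}
```

A match with distances 0.10 and 0.40 passes all three rules, while 0.20 against 0.22 fails the ratio test because the two candidates are nearly equally good.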
Local keypoint matching
<code class="language-cpp">matcher.matchLocal(keypoints, descriptors, center, scale, rotation, points_matched_local, classes_matched_local);</code>
For each keypoint detected in the current frame, compute its Euclidean distance to every first-frame foreground keypoint after the rotation and scale transformations have been applied. Foreground keypoints closer than the threshold 20 are possible matches, and their descriptors are collected into a descriptor library called database_potential. The descriptor of each keypoint detected in the current frame is then matched against database_potential with knnMatch, which finds its two best matches in database_potential. The policy for rejecting unstable keypoints is similar to that of matchGlobal.
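The candidate selection step can be sketched as follows. The sketch assumes that the first-frame foreground keypoints are stored relative to the target center, that rotation is in radians and applied counter-clockwise, and that the returned indices address a database with the background rows first. Pt, rotate, and potentialForegroundIndices are illustrative names, and the sign convention may differ from the CMT source.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Pt { float x, y; };

// Rotate p counter-clockwise by rad radians (assumed convention).
Pt rotate(const Pt& p, float rad)
{
    const float c = std::cos(rad), s = std::sin(rad);
    return {c * p.x - s * p.y, s * p.x + c * p.y};
}

// For one detected keypoint, list the database rows of all foreground
// keypoints whose transformed first-frame position lies within thr_cutoff.
std::vector<std::size_t> potentialForegroundIndices(
    const Pt& keypoint, const Pt& center,
    const std::vector<Pt>& pts_fg_norm,  // first-frame points, relative to center
    float scale, float rotation, float thr_cutoff,
    std::size_t num_bg_points)
{
    assert(thr_cutoff >= 0.0f);

    // Keypoint position relative to the current target center.
    const Pt rel = {keypoint.x - center.x, keypoint.y - center.y};

    std::vector<std::size_t> indices_potential;
    for (std::size_t i = 0; i < pts_fg_norm.size(); ++i) {
        const Pt r = rotate(pts_fg_norm[i], rotation);
        const float dist = std::hypot(scale * r.x - rel.x, scale * r.y - rel.y);
        if (dist < thr_cutoff) {
            // Offset by the background rows stored at the front of the database.
            indices_potential.push_back(num_bg_points + i);
        }
    }
    return indices_potential;
}
```

Only the descriptors at the returned rows would then be matched against the keypoint's descriptor, which keeps the local search small compared with matching against the whole database.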
Consensus – consistency constraints on the target keypoints
Consensus initialization
<code class="language-cpp">consensus.initialize(points_normalized);</code>
Computes and stores, for every pair of normalized foreground keypoints in the first frame (points_normalized), the pairwise distance distances_pairwise and the angle with respect to the x-axis (via the arctangent) angles_pairwise.
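A minimal sketch of this precomputation, assuming the angle of a pair is the arctangent of the difference vector between the two points. The Table and PairwiseGeometry types and the function name computePairwise are illustrative stand-ins for the matrices the text refers to.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Pt { float x, y; };

// Dense n x n table stored row-major (illustrative stand-in for a matrix).
struct Table {
    std::size_t n;
    std::vector<float> values;
    explicit Table(std::size_t size) : n(size), values(size * size, 0.0f) {}
    float& at(std::size_t i, std::size_t j)
    {
        assert(i < n && j < n);
        return values[i * n + j];
    }
};

struct PairwiseGeometry {
    Table distances_pairwise;
    Table angles_pairwise;
};

// For every ordered pair (i, j), store the length of the vector from point j
// to point i and its angle against the x-axis.
PairwiseGeometry computePairwise(const std::vector<Pt>& points_normalized)
{
    const std::size_t n = points_normalized.size();
    PairwiseGeometry g{Table(n), Table(n)};
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            const float vx = points_normalized[i].x - points_normalized[j].x;
            const float vy = points_normalized[i].y - points_normalized[j].y;
            g.distances_pairwise.at(i, j) = std::hypot(vx, vy);
            g.angles_pairwise.at(i, j) = std::atan2(vy, vx);
        }
    }
    return g;
}
```

Because these values depend only on the first frame, they are computed once and reused in every later frame as the reference geometry.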
Estimating the current rotation angle and scale factor
<code class="language-cpp">consensus.estimateScaleRotation(points_fused, classes_fused, scale, rotation);</code>
Computes the pairwise angles and distances between the fused keypoints points_fused (the tracked keypoints plus those matched by matchGlobal). For each pair, it takes the angle difference and the distance quotient relative to the corresponding pair in points_normalized, and then takes the median of each set of values. The results are the estimates of the target's scale factor scale and rotation angle rotation in the current frame.
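The estimate can be sketched as below. Two choices here are assumptions rather than statements about the CMT source: the per-pair changes are aggregated with the median, which is what the CMT paper describes, and the first-frame pair geometry is recomputed on the fly instead of being read from cached tables. The Pt struct and the extra points_normalized parameter are illustrative.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Pt { float x, y; };

// Median of a non-empty list (takes a copy, so the caller's order is kept).
float median(std::vector<float> v)
{
    assert(!v.empty());
    std::sort(v.begin(), v.end());
    const std::size_t n = v.size();
    return (n % 2 == 1) ? v[n / 2] : 0.5f * (v[n / 2 - 1] + v[n / 2]);
}

// points[i] is a keypoint in the current frame and classes[i] is the index of
// its first-frame counterpart in points_normalized.
void estimateScaleRotation(const std::vector<Pt>& points,
                           const std::vector<int>& classes,
                           const std::vector<Pt>& points_normalized,
                           float& scale, float& rotation)
{
    assert(points.size() == classes.size());
    const float kPi = 3.14159265f;
    std::vector<float> changes_scale, changes_angle;

    for (std::size_t i = 0; i < points.size(); ++i) {
        for (std::size_t j = 0; j < points.size(); ++j) {
            if (classes[i] == classes[j]) continue;

            // Vector between the pair in the current frame.
            const float vx = points[i].x - points[j].x;
            const float vy = points[i].y - points[j].y;
            // Vector between the same pair in the first frame.
            const Pt& a = points_normalized[classes[i]];
            const Pt& b = points_normalized[classes[j]];
            const float ox = a.x - b.x, oy = a.y - b.y;

            const float dist_orig = std::hypot(ox, oy);
            if (dist_orig > 0.0f) {
                changes_scale.push_back(std::hypot(vx, vy) / dist_orig);
            }

            // Angle difference, wrapped into [-pi, pi].
            float change_angle = std::atan2(vy, vx) - std::atan2(oy, ox);
            if (change_angle > kPi) change_angle -= 2.0f * kPi;
            else if (change_angle < -kPi) change_angle += 2.0f * kPi;
            changes_angle.push_back(change_angle);
        }
    }

    // With fewer than two samples, fall back to "no change".
    scale = changes_scale.size() < 2 ? 1.0f : median(changes_scale);
    rotation = changes_angle.size() < 2 ? 0.0f : median(changes_angle);
}
```

Using pairs instead of single keypoints makes the estimate independent of the target's translation, and a median tolerates a minority of wrongly matched keypoints better than a mean would.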
Getting the target location and the inlier keypoints
<code class="language-cpp">consensus.findConsensus(points_fused, classes_fused, scale, rotation, center, points_inlier, classes_inlier);</code>
First, computes the vote of each fused keypoint, that is, the vector between the keypoint and its corresponding points_normalized keypoint after scaling and rotation. Next, computes the pairwise distances between the votes and sorts them in ascending order. The votes are then clustered (two clusters are merged when the distance between them is less than a threshold) and the largest cluster is selected. All keypoints in this cluster become points_inlier, and the mean of the votes of the points_inlier keypoints is taken as the center of the target.
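A compact sketch of the voting and clustering idea follows, with several assumptions stated up front. Merging clusters whenever two votes are closer than a cutoff is implemented here with a union-find structure, which gives the same groups as single-linkage clustering cut at that distance but is a swapped-in technique, not the clustering library the CMT source uses. The center is taken as the mean of the inlier votes, based on a reading of the CMT source. Pt, rotate, findRoot, and the thr_cutoff parameter are illustrative.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <numeric>
#include <vector>

struct Pt { float x, y; };

Pt rotate(const Pt& p, float rad)
{
    const float c = std::cos(rad), s = std::sin(rad);
    return {c * p.x - s * p.y, s * p.x + c * p.y};
}

// Union-find root lookup with path halving.
std::size_t findRoot(std::vector<std::size_t>& parent, std::size_t i)
{
    while (parent[i] != i) {
        parent[i] = parent[parent[i]];
        i = parent[i];
    }
    return i;
}

// Returns the indices (into points) of the inlier keypoints and writes the
// estimated target center. With no input points the center is NaN.
std::vector<std::size_t> findConsensus(const std::vector<Pt>& points,
                                       const std::vector<int>& classes,
                                       const std::vector<Pt>& points_normalized,
                                       float scale, float rotation,
                                       float thr_cutoff, Pt& center)
{
    assert(points.size() == classes.size());
    const std::size_t n = points.size();
    if (n == 0) {
        center.x = center.y = std::numeric_limits<float>::quiet_NaN();
        return {};
    }

    // 1. Each keypoint votes for the target center.
    std::vector<Pt> votes(n);
    for (std::size_t i = 0; i < n; ++i) {
        assert(classes[i] >= 0);
        const Pt r = rotate(points_normalized[classes[i]], rotation);
        votes[i] = {points[i].x - scale * r.x, points[i].y - scale * r.y};
    }

    // 2. Merge the clusters of any two votes closer than thr_cutoff.
    std::vector<std::size_t> parent(n);
    std::iota(parent.begin(), parent.end(), std::size_t{0});
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = i + 1; j < n; ++j) {
            const float d = std::hypot(votes[i].x - votes[j].x,
                                       votes[i].y - votes[j].y);
            if (d < thr_cutoff) {
                const std::size_t ri = findRoot(parent, i);
                const std::size_t rj = findRoot(parent, j);
                parent[ri] = rj;
            }
        }
    }

    // 3. Find the largest cluster.
    std::vector<std::size_t> cluster_size(n, 0);
    std::size_t best_root = 0;
    for (std::size_t i = 0; i < n; ++i) {
        const std::size_t root = findRoot(parent, i);
        if (++cluster_size[root] > cluster_size[best_root]) best_root = root;
    }

    // 4. Its members are the inliers; average their votes for the center.
    std::vector<std::size_t> inliers;
    center = {0.0f, 0.0f};
    for (std::size_t i = 0; i < n; ++i) {
        if (findRoot(parent, i) == best_root) {
            inliers.push_back(i);
            center.x += votes[i].x;
            center.y += votes[i].y;
        }
    }
    center.x /= static_cast<float>(inliers.size());
    center.y /= static_cast<float>(inliers.size());
    return inliers;
}
```

Keypoints that belong to the target agree on where the center is, so their votes pile up in one tight cluster, while mismatched keypoints scatter their votes and end up in small clusters that are ignored.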
Fusion – merges two sets of keypoints without duplicates
First fusion of keypoints
<code class="language-cpp">fusion.preferFirst(points_tracked, classes_tracked, points_matched_global, classes_matched_global, points_fused, classes_fused);</code>
The keypoints tracked by optical flow are fused with the keypoints matched by matchGlobal. The fused keypoints are used to estimate the target's rotation angle and scale and to vote for the target's center position.
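The fusion rule suggested by the name preferFirst can be sketched like this: keep everything from the first set, and add a keypoint from the second set only if its class does not already appear in the first. The Pt struct is an illustrative stand-in for the point type, and the body is a sketch of that rule rather than a copy of the CMT source.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct Pt { float x, y; };

// Merge two labelled keypoint sets without duplicating classes.
// On a class collision, the entry from the first set wins.
void preferFirst(const std::vector<Pt>& points_first,
                 const std::vector<int>& classes_first,
                 const std::vector<Pt>& points_second,
                 const std::vector<int>& classes_second,
                 std::vector<Pt>& points_fused,
                 std::vector<int>& classes_fused)
{
    assert(points_first.size() == classes_first.size());
    assert(points_second.size() == classes_second.size());

    // Start from a copy of the first set.
    points_fused = points_first;
    classes_fused = classes_first;

    // Append entries of the second set whose class is new.
    for (std::size_t i = 0; i < points_second.size(); ++i) {
        const bool found = std::find(classes_first.begin(), classes_first.end(),
                                     classes_second[i]) != classes_first.end();
        if (!found) {
            points_fused.push_back(points_second[i]);
            classes_fused.push_back(classes_second[i]);
        }
    }
}
```

The argument order therefore encodes a priority: whichever set is passed first overrides the other wherever both contain the same keypoint class.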
Second fusion of keypoints
<code class="language-cpp">fusion.preferFirst(points_matched_local, classes_matched_local, points_inlier, classes_inlier, points_active, classes_active);</code>
The keypoints matched by matchLocal are fused with the inlier keypoints to obtain the final active target keypoints points_active, which are used for tracking in the next frame.