TLD Target Tracking Algorithm Study (3): Understanding the Code



Transferred from: http://blog.csdn.net/zouxy09/article/details/7893026

Starting from the main() function, the whole TLD run can be broken down into the following 8 stages.
(This is only an analysis of the workflow; for the fully annotated code, see the blog updates.)

1, parse the command-line arguments the program was run with;
./run_tld -p ./parameters.yml -s ./datasets/06_car/car.mpg -b ./datasets/06_car/init.txt -r

2, read the initialization parameters (variables in the program) from the file parameters.yml;


3, specify the bounding box of the target to be tracked, either from a file or by having the user drag a box with the mouse;

4, initialize the TLD system with the bounding box and the first frame obtained above, so that the target can be tracked:
tld.init(last_gray, box, bb_file); The initialization consists of the following work:

4.1, buildGrid(frame1, box);
The detector uses a scanning-window strategy: the window step is 10% of the window width and the scale factor between successive scales is 1.2. This function builds the grid of all scan windows and computes the overlap between each scan window and the input target box, where the overlap is defined as the ratio of the intersection of the two boxes to their union;
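
As a minimal sketch of the overlap measure described above (not the project's own code), the following computes intersection over union for two axis-aligned boxes. The Box struct and the name boxOverlap are assumptions made for this illustration.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical box type for this sketch: top-left corner plus size, in pixels.
struct Box {
    double x, y, width, height;
};

// Overlap = area of intersection / area of union, a value in [0, 1].
double boxOverlap(const Box& a, const Box& b) {
    assert(a.width > 0 && a.height > 0 && b.width > 0 && b.height > 0);
    const double iw = std::min(a.x + a.width, b.x + b.width) - std::max(a.x, b.x);
    const double ih = std::min(a.y + a.height, b.y + b.height) - std::max(a.y, b.y);
    if (iw <= 0.0 || ih <= 0.0) return 0.0;  // the boxes do not intersect
    const double intersection = iw * ih;
    const double unionArea = a.width * a.height + b.width * b.height - intersection;
    return intersection / unionArea;
}
```

For example, two 10x10 boxes offset by 5 pixels horizontally share an area of 50 out of a union of 150, giving an overlap of about 0.33.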

4.2, allocate memory for the various variables and containers;

4.3, getOverlappingBoxes(box, num_closest_init);
Given the incoming box (the target bounding box), this function searches all scan windows in the frame for the num_closest_init (10) windows closest to the box (that is, the most similar ones, with the largest overlap) and puts them into the good_boxes container. At the same time, windows whose overlap is less than 0.2 are put into the bad_boxes container, which amounts to filtering all scan windows. The outer boundary of these scan windows is then obtained through the BBhull function.
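
The selection described above can be sketched roughly as follows, assuming the overlap of every scan window with the target box has already been computed. BoxSelection and selectBoxes are hypothetical names; the real getOverlappingBoxes may apply further criteria that this sketch does not model.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical result type: indices into the scan-window grid.
struct BoxSelection {
    std::vector<std::size_t> good;  // the numClosest windows with the largest overlap
    std::vector<std::size_t> bad;   // windows with overlap below badOverlap
};

// overlap[i] is the overlap of scan window i with the target box.
BoxSelection selectBoxes(const std::vector<double>& overlap,
                         std::size_t numClosest,
                         double badOverlap = 0.2) {
    assert(badOverlap >= 0.0 && badOverlap <= 1.0);
    BoxSelection sel;
    std::vector<std::size_t> candidates;
    for (std::size_t i = 0; i < overlap.size(); ++i) {
        if (overlap[i] < badOverlap) {
            sel.bad.push_back(i);
        } else {
            candidates.push_back(i);
        }
    }
    const std::size_t keep = std::min(numClosest, candidates.size());
    // Move the `keep` best candidates to the front, ordered by decreasing overlap.
    std::partial_sort(candidates.begin(), candidates.begin() + keep, candidates.end(),
                      [&overlap](std::size_t a, std::size_t b) {
                          return overlap[a] > overlap[b];
                      });
    sel.good.assign(candidates.begin(), candidates.begin() + keep);
    return sel;
}
```

Only indices are stored, which matches the note later in the text that good_boxes holds indices into the grid array.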


4.5, classifier.prepare(scales);
Prepare the classifier. The scales container holds the sizes of the scan windows at every scale and was filled by the buildGrid() function above;
The TLD classifier has three parts: the variance classifier, the ensemble classifier and the nearest neighbor classifier. The three are cascaded: each scan window passes through them in turn, and only a window accepted by all three is considered to contain the foreground target. The prepare function mainly initializes the ensemble classifier;
The ensemble classifier (random forest / random ferns) consists of n base classifiers (10 trees in total). Each base classifier (tree) is built on a group of pixel comparisons (13 per tree), i.e. each tree has 13 decision nodes. The input image patch is tested at each decision node (one comparison of two pixels), yielding 0 or 1, and the 13 bits are concatenated into a 13-bit binary code x (2^13 possible values). Each x corresponds to a posterior probability P(y|x) = #p / (#p + #n) (so there are also 2^13 of them), where #p and #n are the numbers of positive and negative image patches respectively. The whole ensemble (10 base classifiers) therefore yields 10 posterior probabilities; these are averaged, and if the average exceeds a threshold (initially an empirical value of 0.6, later tuned during training) the patch is considered to contain the foreground target;
How the posterior probability P(y|x) = #p / (#p + #n) is produced: at initialization every posterior is set to 0. At run time it is updated as follows: each sample with a known class label (training sample) is classified by the n base classifiers; if the classification result is wrong, the corresponding #p or #n is updated, and P(y|x) is updated accordingly.
How the pixel comparisons are produced: first discretize the pixel space of a normalized patch and generate all possible horizontal and vertical pixel comparisons; then distribute these comparisons randomly among the n base classifiers. Each base classifier receives a completely different group of pixel comparisons (its feature group), so that the feature groups of all base classifiers together cover the entire patch.
Features are defined relative to the rectangular box of one scale. In the TLD code, the i-th feature at scale s is features[s][i] = Feature(x1, y1, x2, y2), i.e. two randomly assigned pixel coordinates (comparing these two pixels yields 0 or 1). Each scan-window scale has totalFeatures = nstructs * structSize features; nstructs is the number of trees (each tree is built from one feature group, and each feature group represents a different view of the image patch), and structSize is the number of features per tree, i.e. the number of decision nodes per tree; every feature of a tree is one decision node;
The job of the prepare function is thus to initialize the pixel comparisons (two randomly assigned pixel coordinates each) for the scan windows of every scale, and then to initialize the posterior probabilities to 0;
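
To make the fern mechanics concrete, here is a small sketch of how 13 pixel comparisons become one binary code, how a posterior is derived from the counts, and how the per-tree posteriors are averaged. It is an illustration under assumptions, not the classifier's actual code: the Patch type and the function names are hypothetical, and the direction of the comparison (first pixel brighter gives 1) is an arbitrary choice.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

constexpr int kStructSize = 13;  // pixel comparisons (decision nodes) per tree

// One pixel comparison: two pixel coordinates inside the patch.
struct PixelPair {
    int x1, y1, x2, y2;
};

// Hypothetical grayscale patch, stored row by row.
struct Patch {
    int width, height;
    std::vector<unsigned char> pixels;
    unsigned char at(int x, int y) const {
        return pixels[static_cast<std::size_t>(y) * width + x];
    }
};

// Concatenate the 13 comparison results into one code in [0, 2^13).
int fernCode(const Patch& patch, const std::array<PixelPair, kStructSize>& pairs) {
    int code = 0;
    for (const PixelPair& p : pairs) {
        const int bit = patch.at(p.x1, p.y1) > patch.at(p.x2, p.y2) ? 1 : 0;
        code = (code << 1) | bit;
    }
    return code;
}

// Posterior P(y|x) = #p / (#p + #n); defined as 0 while no sample has been seen.
double posterior(int numPositive, int numNegative) {
    assert(numPositive >= 0 && numNegative >= 0);
    const int total = numPositive + numNegative;
    return total == 0 ? 0.0 : static_cast<double>(numPositive) / total;
}

// Average the per-tree posteriors and compare the result with the threshold.
bool ensembleAccepts(const std::vector<double>& posteriors, double threshold) {
    assert(!posteriors.empty());
    double sum = 0.0;
    for (double p : posteriors) sum += p;
    return sum / posteriors.size() > threshold;
}
```

Each tree would keep one pair of counters per possible code, so a tree with 13 comparisons needs 2^13 entries for #p and 2^13 for #n.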


4.6, generatePositiveData(frame1, num_warps_init);
This function synthesizes the positive samples for training the initial classifier by applying affine transformations to the target box of the first frame (the user-specified target to be tracked). The procedure is: take the 10 bounding boxes from the scan windows closest to the initial target box (already obtained by the getOverlappingBoxes function above), and within each bounding box apply a shift in the ±1% range, a scale change in the ±1% range and an in-plane rotation in the ±10° range, and add Gaussian noise with a variance of 5 to every pixel (the exact values are drawn at random within the given ranges). Each box gets 20 such geometric transformations, so the 10 boxes yield 200 affine-transformed bounding boxes as positive samples. The concrete implementation is as follows:
getPattern(frame(best_box), pEx, mean, stdev); This function converts the image patch of the best_box region of the frame into a zero-mean patch of size 15*15, stored in pEx (the positive sample for the nearest neighbor classifier). This is the pattern of the closest box, so there is only one such positive sample.
generator(frame, pt, warped, bbhull.size(), rng); This call uses the () operator of the PatchGenerator class to apply an affine transformation to an image region: rng supplies the random factors, and the call produces one transformed positive sample.
classifier.getFeatures(patch, grid[idx].sidx, fern); This function computes the features fern (13-bit binary codes) of the input patch;
pX.push_back(make_pair(fern, 1)); // positive ferns <features, label=1>. The features are then labeled as a positive sample and stored in pX (the positive sample set for the ensemble classifier);
The above operations loop num_warps * good_boxes.size() times, i.e. 20 * 10 times, so in the end pEx holds one positive sample and pX holds 200 positive samples;
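
The zero-mean step attributed to getPattern above can be illustrated in isolation. This sketch only subtracts the mean intensity; the resize to 15*15 that the text also mentions is left out, and zeroMean is a hypothetical name.

```cpp
#include <cassert>
#include <vector>

// Subtract the mean intensity so that the patch values sum to zero.
std::vector<float> zeroMean(const std::vector<float>& patch) {
    assert(!patch.empty());
    double sum = 0.0;
    for (float v : patch) sum += v;
    const float mean = static_cast<float>(sum / patch.size());
    std::vector<float> out;
    out.reserve(patch.size());
    for (float v : patch) out.push_back(v - mean);
    return out;
}
```

Removing the mean makes the stored pattern insensitive to a uniform brightness offset between frames.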


4.7, meanStdDev(frame1(best_box), mean, stdev);
Compute the mean and standard deviation of best_box, then set var = pow(stdev.val[0], 2) * 0.5; as the threshold of the variance classifier.

4.8, generateNegativeData(frame1);

Since TLD tracks only one target and the target box has already been determined, every image region other than the target box is a negative sample, with no affine transformation needed. The implementation is as follows:
All windows with an overlap of less than 0.2 were put into bad_boxes earlier, so there are quite a lot of them. Those bad_boxes whose variance is greater than var*0.5f are added as negative samples. As above, classifier.getFeatures(patch, grid[idx].sidx, fern) and nX.push_back(make_pair(fern, 0)) compute the fern features and label of each negative sample and store them in nX (the negative samples for the ensemble classifier);
Then bad_patches (100) boxes are drawn at random from bad_boxes, and the getPattern function converts the image patch of each bad_box region of the frame into a 15*15 patch, which is stored in nEx (the negative samples for the nearest neighbor classifier).
So both nEx and nX now hold negative samples; (the variance of a box is computed from the integral image)

4.9, then half of nEx is used as the training set nEx and the other half as the test set nExT; likewise nX is split into the training set nX and the test set nXT;


4.10, merge the negative samples nX and the positive samples pX into ferns_data[], used to train the ensemble classifier;

4.11, merge the positive sample pEx obtained above and nEx into nn_data[], used to train the nearest neighbor classifier;

4.12, train the ensemble classifier (forest) and the nearest neighbor classifier with the training sets above:
classifier.trainF(ferns_data, 2); // bootstrap = 2
For each sample ferns_data[i]: if its label is positive, the measure_forest function first returns the sum of the posterior probabilities that correspond to all of the sample's feature values. If this sum is less than the positive threshold (0.6 * nstructs, which means the average must exceed 0.6, since 0.6 * nstructs / nstructs = 0.6; 0.6 is the empirical ensemble-classifier threshold set at program initialization, later evaluated on the test set and adjusted toward a better value), then the input was a positive sample but was classified as negative. A classification error has occurred, so the sample is added to the positive sample set and the posterior probabilities are updated with the update function. Negative samples are handled the same way: if a negative sample is misclassified, it is added to the negative sample set.
classifier.trainNN(nn_data);
For each sample in nn_data: if its label is positive, NNConf(nn_examples[i], isin, conf, dummy) computes the relative similarity conf between the input image patch and the online model. If the relative similarity is less than 0.65, the patch is judged not to contain the foreground target, i.e. it was misclassified, so it is added to the positive sample set with pEx.push_back(nn_examples[i]). Likewise, if a negative sample is misclassified, it is added to the negative sample set.
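
The "update only on mistakes" rule for the ensemble classifier can be sketched as follows. FernForest is a hypothetical stand-in for the real classifier, the negative threshold thrNeg is an assumption (the text above only gives the positive threshold of 0.6 * nstructs), and the two bootstrap passes and the nearest neighbor part are not modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the ensemble classifier: per tree and per code,
// counts of positive (#p) and negative (#n) training patches.
class FernForest {
public:
    FernForest(std::size_t numTrees, std::size_t numCodes)
        : pos_(numTrees, std::vector<int>(numCodes, 0)),
          neg_(numTrees, std::vector<int>(numCodes, 0)) {}

    // Sum over trees of P(y|x) = #p / (#p + #n); 0 for codes never seen.
    double measure(const std::vector<int>& codes) const {
        assert(codes.size() == pos_.size());
        double sum = 0.0;
        for (std::size_t t = 0; t < codes.size(); ++t) {
            const int p = pos_[t][codes[t]];
            const int n = neg_[t][codes[t]];
            if (p + n > 0) sum += static_cast<double>(p) / (p + n);
        }
        return sum;
    }

    // Train on one labeled sample; returns true if the model was updated.
    bool train(const std::vector<int>& codes, bool positive,
               double thrPos, double thrNeg) {
        const double m = measure(codes);
        // A positive scoring too low, or a negative scoring too high, is an error.
        const bool misclassified = positive ? (m < thrPos) : (m >= thrNeg);
        if (!misclassified) return false;
        for (std::size_t t = 0; t < codes.size(); ++t) {
            (positive ? pos_ : neg_)[t][codes[t]] += 1;
        }
        return true;
    }

private:
    std::vector<std::vector<int>> pos_, neg_;
};
```

With two trees and thrPos = 0.6 * 2, the first positive sample is misclassified (all posteriors start at 0) and updates the counts; presenting the same sample again changes nothing because it is now classified correctly.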


4.13, classify the test sets with the ensemble classifier (forest) and nearest neighbor classifier obtained above, then evaluate and adjust to get the best classifier thresholds.
classifier.evaluateTh(nXT, nExT);

For the ensemble classifier: for each sample in the test set nXT, if the average of the posterior probabilities of all base classifiers is greater than thr_fern (0.6), the sample is considered to contain the foreground target. Since the test sets hold only negative samples, any such response is a false positive, so the largest such average (greater than thr_fern) becomes the new threshold of the ensemble classifier.
For the nearest neighbor classifier: for each sample in the test set nExT, if its maximum relative similarity is greater than nn_fern (0.65), it is considered to contain the foreground target; the largest such similarity (greater than nn_fern) then becomes the new threshold of the nearest neighbor classifier.
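
Both threshold adjustments above follow the same pattern, which can be sketched with one hypothetical helper (raiseThreshold is not a function of the TLD code): the threshold is raised to the highest score that any negative test sample achieves.

```cpp
#include <cassert>
#include <vector>

// Raise `threshold` to the largest score produced by a negative test sample,
// if any such score exceeds it; otherwise return it unchanged.
double raiseThreshold(double threshold, const std::vector<double>& negativeScores) {
    assert(threshold >= 0.0);
    for (double score : negativeScores) {
        if (score > threshold) threshold = score;
    }
    return threshold;
}
```

Starting from 0.6 with negative test scores 0.2, 0.7 and 0.65, the threshold would become 0.7, so none of those negatives is accepted afterwards.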


5, enter a loop: read in a new frame, convert it to a grayscale image, and process each frame with processFrame;

6, processFrame(last_gray, current_gray, pts1, pts2, pbox, status, tl, bb_file); reads the image sequence frame by frame and runs the algorithm. processFrame consists of four modules (executed in order): the tracking module, the detection module, the integration module and the learning module;


6.1, tracking module: track(img1, img2, points1, points2);
The track function tracks the feature points points1 of the previous frame img1 forward to predict the feature points points2 of the current frame img2;

6.1.1, the concrete implementation is as follows:

(1) First sample 10*10=100 feature points uniformly inside lastbox (a regular grid of points) and store them in points1:
bbPoints(points1, lastbox);

(2) Use the pyramidal LK optical flow method to track these feature points and predict their positions in the current frame (see the explanation below). Compute the FB error and the matching similarity sim, then keep only the feature points with FB_error[i] <= median(FB_error) and sim_error[i] > median(sim_error) (discarding the poorly tracked points), which leaves fewer than 50% of the feature points:
tracker.trackf2f(img1, img2, points, points2);

(3) Use the remaining (fewer than half) tracked points to predict the position and size tbb of the bounding box in the current frame:
bbPredict(points, points2, lastbox, tbb);

(4) Tracking failure detection: if the median FB error is greater than 10 pixels (an empirical value), or the predicted box has moved out of the image, tracking is considered to have failed and no bounding box is returned:
if (tracker.getFB()>10 || tbb.x>img2.cols || tbb.y>img2.rows || tbb.br().x < 1 || tbb.br().y <1)

(5) Normalize the patch corresponding to img2(bb) (resized to patch_size = 15*15) and store it in pattern:
getPattern(img2(bb), pattern, mean, stdev);

(6) Compute the conservative similarity of the pattern to the online model M:
classifier.NNConf(pattern, isin, dummy, tconf);

(7) If the conservative similarity is greater than the threshold, this tracking result is judged valid; otherwise it is invalid:
if (tconf>classifier.thr_nn_valid) tvalid = true;

6.1.2, implementation principle of the TLD tracking module and the trackf2f function:
The TLD tracking module combines a Median-Flow (median optical flow) tracker with a tracking-failure detection algorithm. Median-Flow tracking relies on the forward-backward error and NCC. The principle is simple: track point A in the image at time t forward to point B in the image at time t+1, then track B backward from time t+1 to time t, arriving at a point C. This gives a forward and a backward trajectory. Compare the distance between A and C at time t: if the distance is less than a threshold, the forward tracking is considered correct. This distance is the FB_error;
bool LKTracker::trackf2f(const Mat& img1, const Mat& img2, vector<Point2f> &points1, vector<cv::Point2f> &points2)
The function is implemented as follows:
(1) Track forward with the pyramidal LK optical flow method to produce the forward trajectory:
calcOpticalFlowPyrLK(img1, img2, points1, points2, status, similarity, window_size, level, term_criteria, lambda, 0);

(2) Track backward to produce the backward trajectory:
calcOpticalFlowPyrLK(img2, img1, points2, pointsFB, FB_status, FB_error, window_size, level, term_criteria, lambda, 0);

(3) Then compute the FB error, i.e. the distance between the start of the forward trajectory and the end of the backward trajectory:
for (int i = 0; i < points1.size(); ++i)
    FB_error[i] = norm(pointsFB[i] - points1[i]);

(4) From the previous frame and the current frame, extract a 10x10 pixel rectangle centered on each feature point with sub-pixel accuracy (using the function getRectSubPix), match the two extracted rectangles against each other (by calling matchTemplate), and obtain the NCC correlation coefficient (i.e. the similarity) for each point:
normCrossCorrelation(img1, img2, points1, points2);

(5) Then keep only the feature points with FB_error[i] <= median(FB_error) and sim_error[i] > median(sim_error) (discarding the poorly tracked points), which leaves fewer than 50% of the feature points:
filterPts(points1, points2);
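
Steps (3) and (5) can be sketched without OpenCV as follows. This is an illustration under assumptions: Point2 and the function names are hypothetical, the median is taken as the element at index size/2, and both medians are computed over all points in a single pass, which may differ in detail from what filterPts does.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Point2 {
    float x, y;
};

// Median as the element at index size/2 after partial ordering (the upper
// median for even sizes). Takes a copy because nth_element reorders its input.
float medianOf(std::vector<float> values) {
    assert(!values.empty());
    const std::size_t mid = values.size() / 2;
    std::nth_element(values.begin(), values.begin() + mid, values.end());
    return values[mid];
}

// FB error of each point: distance between where it started (A) and where
// forward-then-backward tracking ended (C).
std::vector<float> forwardBackwardError(const std::vector<Point2>& start,
                                        const std::vector<Point2>& backTracked) {
    assert(start.size() == backTracked.size());
    std::vector<float> error(start.size());
    for (std::size_t i = 0; i < start.size(); ++i) {
        error[i] = std::hypot(backTracked[i].x - start[i].x,
                              backTracked[i].y - start[i].y);
    }
    return error;
}

// Indices of the points with FB error <= median and similarity > median.
std::vector<std::size_t> reliablePoints(const std::vector<float>& fbError,
                                        const std::vector<float>& similarity) {
    assert(fbError.size() == similarity.size());
    std::vector<std::size_t> kept;
    if (fbError.empty()) return kept;
    const float fbMedian = medianOf(fbError);
    const float simMedian = medianOf(similarity);
    for (std::size_t i = 0; i < fbError.size(); ++i) {
        if (fbError[i] <= fbMedian && similarity[i] > simMedian) kept.push_back(i);
    }
    return kept;
}
```

Because the similarity test is a strict "greater than the median", at most half of the points can pass it, which is why fewer than 50% of the points remain.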


6.2, detection module: detect(img2);
The TLD detector has three parts: the variance classifier, the ensemble classifier and the nearest neighbor classifier, arranged as a cascade. Each scan window of the current frame img2 passes through the three classifiers in turn, and only windows accepted by all of them are considered to contain the foreground target. The concrete implementation is as follows:
First compute the integral images of img2 so that variances can be computed quickly:
integral(frame, iisum, iisqsum);
Then denoise with a Gaussian blur:
GaussianBlur(frame, img, Size(9,9), 1.5);
Next comes the variance classifier module:

6.2.1, variance classifier module: getVar(grid[i], iisum, iisqsum) >= var
Use the integral images to compute the variance of each candidate window. A window whose variance is greater than the var threshold (50% of the variance of the target patch) is considered to contain the foreground target, and windows that pass this module go on to the ensemble classifier module.
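
The reason integral images make this fast is that the variance of any window follows from four lookups in each of two tables, using Var = E[x^2] - (E[x])^2. The sketch below builds both tables by hand instead of calling OpenCV's integral; IntegralImages, buildIntegrals and boxVariance are hypothetical names and do not reproduce getVar itself.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Integral image of the pixel values and of their squares. Both tables have
// one extra row and column of zeros at the top and left.
struct IntegralImages {
    int width, height;  // size of the source image
    std::vector<double> sum, sqsum;
    double at(const std::vector<double>& table, int x, int y) const {
        return table[static_cast<std::size_t>(y) * (width + 1) + x];
    }
};

IntegralImages buildIntegrals(const std::vector<unsigned char>& img, int width, int height) {
    assert(img.size() == static_cast<std::size_t>(width) * height);
    const std::size_t stride = static_cast<std::size_t>(width) + 1;
    const std::size_t n = stride * (static_cast<std::size_t>(height) + 1);
    IntegralImages ii{width, height, std::vector<double>(n, 0.0), std::vector<double>(n, 0.0)};
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            const double v = img[static_cast<std::size_t>(y) * width + x];
            const std::size_t idx = (static_cast<std::size_t>(y) + 1) * stride + x + 1;
            ii.sum[idx] = v + ii.sum[idx - 1] + ii.sum[idx - stride] - ii.sum[idx - stride - 1];
            ii.sqsum[idx] = v * v + ii.sqsum[idx - 1] + ii.sqsum[idx - stride]
                            - ii.sqsum[idx - stride - 1];
        }
    }
    return ii;
}

// Variance of the w x h window whose top-left pixel is (x, y).
double boxVariance(const IntegralImages& ii, int x, int y, int w, int h) {
    assert(w > 0 && h > 0 && x >= 0 && y >= 0 && x + w <= ii.width && y + h <= ii.height);
    auto boxSum = [&](const std::vector<double>& t) {
        return ii.at(t, x + w, y + h) - ii.at(t, x, y + h) - ii.at(t, x + w, y) + ii.at(t, x, y);
    };
    const double area = static_cast<double>(w) * h;
    const double mean = boxSum(ii.sum) / area;
    return boxSum(ii.sqsum) / area - mean * mean;
}
```

The tables are built once per frame, after which every scan window costs a constant number of operations regardless of its size.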

6.2.2, ensemble classifier module:
The ensemble classifier (random forest) has 10 trees (base classifiers), each with 13 decision nodes. Each decision node performs one comparison and yields one bit, 0 or 1, so each tree produces a 13-bit binary code x (a leaf), and this code x corresponds to a posterior probability P(y|x). The whole ensemble (10 base classifiers) therefore yields 10 posterior probabilities; these are averaged, and if the average is greater than the threshold (initially an empirical value of 0.6, later tuned during training) the patch is considered to contain the foreground target. The concrete process is as follows:

(1) First compute the feature values of the patch (13-bit binary codes):
classifier.getFeatures(patch, grid[i].sidx, ferns);

(2) Compute the sum of the posterior probabilities corresponding to these feature values:
conf = classifier.measure_forest(ferns);

(3) If the average posterior probability of the ensemble classifier is greater than the threshold fern_th (obtained by training), the window is considered to contain the foreground target:
if (conf > numtrees * fern_th) dt.bb.push_back(i);

(4) The scan windows that passed the two detection modules above are recorded in the detect structure;

(5) If more than 100 scan windows passed the two detection modules above, keep only the 100 with the largest posterior probability:
nth_element(dt.bb.begin(), dt.bb.begin()+100, dt.bb.end(), CComparator(tmp.conf));
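
The top-100 selection in step (5) relies on std::nth_element, which partitions a range without fully sorting it. A minimal sketch of that idea follows; keepTopK is a hypothetical helper, and a lambda takes the place of the CComparator functor used in the original call.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Keep the k window indices with the highest confidence, in unspecified order.
// conf[i] is the confidence of scan window i.
void keepTopK(std::vector<int>& indices, const std::vector<float>& conf, std::size_t k) {
    if (indices.size() <= k) return;  // nothing to discard
    std::nth_element(indices.begin(), indices.begin() + k, indices.end(),
                     [&conf](int a, int b) {
                         assert(a >= 0 && b >= 0);
                         return conf[a] > conf[b];  // higher confidence first
                     });
    indices.resize(k);
}
```

After the call the first k entries are the k most confident windows, which is all the next stage needs; a full sort would do more work than necessary.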
Then enter the nearest neighbor classifier:

6.2.3, nearest neighbor classifier module:

(1) First normalize the size of the patch (patch_size = 15*15) and store it in dt.patch[i]:
getPattern(patch, dt.patch[i], mean, stdev);

(2) Compute the relative similarity and conservative similarity of the pattern to the online model M:
classifier.NNConf(dt.patch[i], dt.isin[i], dt.conf1[i], dt.conf2[i]);

(3) If the relative similarity is greater than the threshold, the window is considered to contain the foreground target:
if (dt.conf1[i]>nn_th) dbb.push_back(grid[idx]);
At this point detection is complete, and all scan windows that passed the three detection modules are stored in dbb;


6.3, integration module:
TLD tracks only a single target, so the integration module combines the single target tracked by the tracker with the possibly multiple targets found by the detector, and outputs only the one target with the highest conservative similarity. The implementation is as follows:

(1) Cluster the bounding boxes detected by the detector according to their overlap; the overlap between different clusters is less than 0.5:
clusterConf(dbb, dconf, cbb, cconf);

(2) Then find the clusters (boxes found by the detector) that are far from the box predicted by the tracker (overlap below 0.5) and whose confidence cconf is greater than the tracker's tconf, and count how many satisfy these conditions, i.e. the number of highly confident detections:
if (bbOverlap(tbb, cbb[i])<0.5 && cconf[i]>tconf) confident_detections++;

(3) If exactly one box satisfies the conditions above, use it to reinitialize the tracker (i.e. correct the tracker with the detector's result):
if (confident_detections==1) bbnext=cbb[didx];

(4) If more than one box satisfies the conditions above, find the detected boxes that are close to the box predicted by the tracker (overlap greater than 0.7) and accumulate their coordinates and sizes:
if (bbOverlap(tbb, dbb[i])>0.7) cx += dbb[i].x;

(5) Average the coordinates and sizes of these close boxes together with the tracker's own box to obtain the bounding box, giving the tracker a larger weight:
bbnext.x = cvRound((float)(10*tbb.x+cx)/(float)(10+close_detections));

(6) In addition, if the tracker did not track the target but the detector found some candidate target boxes, these are clustered as well, but cbb[0] is simply taken as the new tracked target box (without any comparison? or are the clusters already sorted?), and the tracker is reinitialized:
bbnext=cbb[0];
At this point, the integration module ends.
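
The weighted averaging of steps (4) and (5) can be sketched for all four box fields at once. IntBox and fuseBoxes are hypothetical names, and std::lround stands in for OpenCV's cvRound (the two may differ on exact .5 ties); the weight of 10 for the tracker is taken from the formula above.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct IntBox {
    int x, y, width, height;
};

// Weighted average of the tracker box (weight 10) and the detections that lie
// close to it (weight 1 each).
IntBox fuseBoxes(const IntBox& tracked, const std::vector<IntBox>& closeDetections) {
    const int trackerWeight = 10;
    long cx = 0, cy = 0, cw = 0, ch = 0;
    for (const IntBox& d : closeDetections) {
        cx += d.x;
        cy += d.y;
        cw += d.width;
        ch += d.height;
    }
    const float denom = static_cast<float>(trackerWeight + closeDetections.size());
    assert(denom > 0.0f);
    auto average = [&](int trackedValue, long detectionSum) {
        return static_cast<int>(
            std::lround((trackerWeight * trackedValue + detectionSum) / denom));
    };
    return IntBox{average(tracked.x, cx), average(tracked.y, cy),
                  average(tracked.width, cw), average(tracked.height, ch)};
}
```

With a tracker box at x = 100 and two close detections at x = 112 and x = 124, the fused x is (10*100 + 236) / 12 = 103, so the detections nudge the result only slightly.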


6.4, learning module: learn(img2);

The learning module is also divided into the following four parts:

6.4.1, consistency check:

(1) Normalize the patch corresponding to img(bb) (resized to patch_size = 15*15) and store it in pattern:

getPattern(img(bb), pattern, mean, stdev);

(2) Compute the relative similarity conf between the input image patch (the tracker's target box) and the online model:

classifier.NNConf(pattern, isin, conf, dummy);

(3) If the similarity is too small, or the variance is too small, or the patch is recognized as a negative sample, no training is performed;

if (conf<0.5) ... or if (pow(stdev.val[0], 2) < var) ... or if (isin[2]==1) ...

6.4.2, generating samples:
First, the samples for the ensemble classifier, fern_examples:

(1) First compute the overlap of all scan windows with the current target box:
grid[i].overlap = bbOverlap(lastbox, grid[i]);

(2) Given the incoming lastbox, search all windows in the frame for the num_closest_update windows closest to lastbox (i.e. the most similar, with the largest overlap) and put them into the good_boxes container (only the indices into the grid array are stored); windows with an overlap of less than 0.2 go into the bad_boxes container:
getOverlappingBoxes(lastbox, num_closest_update);

(3) Then generate positive samples with the affine model (similar to the method used for the first frame, but producing only 10*10=100):
generatePositiveData(img, num_warps_update);

(4) Add negative samples whose similarity is at least 1?? Isn't the similarity supposed to lie between 0 and 1? (tmp.conf appears to hold the accumulated, not averaged, posterior returned by measure_forest, which would allow values above 1.)
idx=bad_boxes[i];
if (tmp.conf[idx]>=1) fern_examples.push_back(make_pair(tmp.patt[idx],0));
Then, the samples for the nearest neighbor classifier, nn_examples:
if (bbOverlap(lastbox, grid[idx]) < bad_overlap)
    nn_examples.push_back(dt.patch[i]);


6.4.3, classifier training:
classifier.trainF(fern_examples, 2);
classifier.trainNN(nn_examples);

6.4.4, display all positive samples in the positive sample set (online model) in a window:
classifier.show();

At this point, the tld.processFrame function ends.


7, if tracking succeeded, draw the corresponding points and box:
if (status) {
    drawPoints(frame, pts1);
    drawPoints(frame, pts2, Scalar(0,255,0)); // current feature points are drawn as green dots
    drawBox(frame, pbox);

    detections++;
}

8, then show the window and swap the image frames before processing the next frame:

imshow("TLD", frame);
swap(last_gray, current_gray);

At this point, the main() function ends (only the overall framework has been analyzed).



