A complete walkthrough of the TLD code

Analysis of the main() function

Reference post: http://blog.csdn.net/zouxy09/article/details/7893056

Starting from the main() function, the whole TLD run proceeds as follows:

1. Parse the command-line arguments the program was started with:

./run_tld -p ./parameters.yml -s ./datasets/06_car/car.mpg -b ./datasets/06_car/init.txt -r

2. Read the initialization parameters (variables used in the program) from the file parameters.yml.

3. Specify the bounding box of the target to be tracked, either from a file or by having the user draw the box with the mouse.

4. Initialize the TLD system with the bounding box and the first frame obtained above, so that it can track the target:

tld.init(last_gray, box, bb_file);

The initialization consists of the following work:

4.1. buildGrid(frame1, box);

The detector uses a scanning-window strategy: the scanning step is 10% of the window width, and the scale factor between successive scales is 1.2. This function builds the entire grid of scan windows and computes the overlap between each scan window and the input target box. The overlap is defined as the ratio of the area of the intersection of the two boxes to the area of their union.
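The overlap measure can be sketched in a few lines. This is a minimal illustration, not the code from the TLD sources: the Box struct is a hypothetical stand-in for the rectangle type used there, and bbOverlap is an illustrative name.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical stand-in for a rectangle: top-left corner plus size, in pixels.
struct Box {
    float x, y, width, height;
};

// Overlap = intersection area / union area, a value in [0, 1].
float bbOverlap(const Box& a, const Box& b) {
    assert(a.width >= 0 && a.height >= 0 && b.width >= 0 && b.height >= 0);
    const float left   = std::max(a.x, b.x);
    const float top    = std::max(a.y, b.y);
    const float right  = std::min(a.x + a.width,  b.x + b.width);
    const float bottom = std::min(a.y + a.height, b.y + b.height);
    if (right <= left || bottom <= top) return 0.0f;  // boxes do not intersect
    const float inter = (right - left) * (bottom - top);
    const float uni   = a.width * a.height + b.width * b.height - inter;
    return inter / uni;
}
```

For two 10*10 boxes shifted horizontally by 5 pixels, the intersection is 50 and the union is 150, so the overlap is 1/3. A window like that would be neither among the closest windows nor below the 0.2 limit mentioned in step 4.3.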

4.2. Allocate memory for the various variables and containers.

4.3. getOverlappingBoxes(box, num_closest_init);

Based on the box passed in (the target bounding box), this function searches all scan windows of the frame for the num_closest_init (10) windows closest to that box (that is, the most similar ones, with the highest overlap) and puts them into the good_boxes container. At the same time, windows whose overlap is less than 0.2 are put into the bad_boxes container. This amounts to filtering all scan windows. The maximum boundary of the good_boxes windows is then obtained through the BBhull function.

4.5. classifier.prepare(scales);

This prepares the classifier. The scales container (of Size type) holds the sizes of all scan-window scales (21 of them) and is filled by the buildGrid() function above.

The TLD classifier has three parts: the variance classifier, the ensemble classifier and the nearest neighbor classifier. The three are cascaded: each scan window passes through them in turn, and only a window that passes all three is considered to contain the foreground target. The prepare function mainly initializes the ensemble classifier.

The ensemble classifier (a random forest) consists of n base classifiers (10 trees in total). Each base classifier (tree) is built on one set of pixel comparisons (13 comparisons per set), so each tree has 13 decision nodes. The input image patch is evaluated at every decision node (the two corresponding pixels are compared), which yields 0 or 1, and the 13 bits are concatenated into a 13-bit binary code x (2^13 possible values). Each x corresponds to a posterior probability P(y|x) = #p/(#p + #n) (so there are also 2^13 of them), where #p and #n are the numbers of positive and negative image patches respectively. The whole ensemble (10 base classifiers) therefore yields 10 posterior probabilities. These are averaged, and if the average is greater than the threshold (initially an empirical value of 0.6, later tuned during training), the image patch is considered to contain the foreground target.
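The three ingredients described here (binary code, posterior, averaged vote) can be sketched as below. This is a simplified illustration under assumptions: the patch is a flattened grayscale vector, PixelComparison stores two indices into it rather than scaled coordinates, and the function names are made up for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One pixel comparison: two positions inside a flattened grayscale patch.
// (Hypothetical type; the original code stores coordinates per scale.)
struct PixelComparison {
    std::size_t first, second;
};

// Concatenate the comparison outcomes of one tree into a binary code.
int fernCode(const std::vector<unsigned char>& patch,
             const std::vector<PixelComparison>& tree) {
    int code = 0;
    for (const PixelComparison& c : tree) {
        assert(c.first < patch.size() && c.second < patch.size());
        code = (code << 1) | (patch[c.first] > patch[c.second] ? 1 : 0);
    }
    return code;  // with 13 comparisons the code lies in [0, 2^13)
}

// Posterior P(y|x) = #p / (#p + #n); 0 while the code has never been seen.
float posterior(int positives, int negatives) {
    const int total = positives + negatives;
    return total == 0 ? 0.0f : static_cast<float>(positives) / total;
}

// Average the per-tree posteriors and compare the mean with the threshold.
bool isForeground(const std::vector<float>& treePosteriors, float threshold) {
    assert(!treePosteriors.empty());
    float sum = 0.0f;
    for (float p : treePosteriors) sum += p;
    return sum / treePosteriors.size() > threshold;
}
```

With three comparisons whose outcomes are 1, 0, 1 the code is binary 101, that is 5. In the real classifier each tree has 13 comparisons, so a lookup table of 8192 posteriors per tree is enough.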

How the posterior probability P(y|x) = #p/(#p + #n) is generated: at initialization every posterior is set to 0. At run time it is updated as follows: labeled samples (training samples) are classified by the n base classifiers, and if the classification result is wrong, the corresponding #p and #n are updated, and P(y|x) is updated accordingly.

How the pixel comparisons are generated: first the pixel space of a normalized patch is discretized and all possible horizontal and vertical pixel comparisons are generated. These pixel comparisons are then randomly assigned to the n base classifiers, so that each classifier gets a completely different set of pixel comparisons (its feature set) and the feature sets of all classifiers together cover the entire patch.

The features (a std::vector of fern features for every scale) are defined relative to a rectangle of each scale. The i-th feature at the s-th scale in TLD is features[s][i] = Feature(x1, y1, x2, y2); that is, the coordinates of two randomly assigned pixels (scales * rng()), and comparing these two pixels yields 0 or 1. The scan windows of each scale have totalFeatures = nstructs * structSize features. nstructs is the number of trees (each tree is built from one feature set, and each feature set represents a different view of the image patch). structSize is the number of features per tree, that is, the number of decision nodes per tree; each feature in a tree is one decision node.

The job of the prepare function is therefore to initialize the corresponding pixel comparisons (two randomly assigned pixel coordinates each) for each scan window and then to initialize the posterior probabilities to 0.

posteriors, pCounter and nCounter each have n rows (one per base classifier, 10 in total), with 8192 (2^13) values per row.

4.6. generatePositiveData(frame1, num_warps_init);

This function synthesizes the positive samples for training the initial classifier by applying affine transformations to the target box of the first frame (the target the user chose to track). It works as follows: take the 10 bounding boxes closest to the initial target box (already found by getOverlappingBoxes above and stored in good_boxes). Within each bounding box, apply a shift within ±1%, a scale change within ±1% and an in-plane rotation within ±10%, and add Gaussian noise with a variance of 5 to every pixel (the exact values are chosen at random within the given ranges). Each box undergoes this geometric transformation 20 times, so the 10 boxes produce 200 affine-transformed bounding boxes, which serve as positive samples. The implementation is as follows:

getPattern(frame(best_box), pEx, mean, stdev); This function converts the image patch in the best_box region of the frame into a zero-mean patch of size 15*15 and stores it in pEx (the positive sample for the nearest neighbor classifier). This is the pattern of the closest box, and there is only one such positive sample. The function also computes the mean and standard deviation inside the current box and stores them in mean and stdev.

generator(frame, pt, warped, bbhull.size(), rng); This call uses the () operator of the PatchGenerator class, which applies affine transformations to an image region: rng first supplies a random factor, and the () operator then produces a transformed positive sample.

classifier.getFeatures(patch, grid[idx].sidx, fern); This function computes the features fern of the input patch (13-bit binary codes). As described above, the prepare function has already initialized the pixel comparisons (two randomly assigned pixel coordinates each) for every scan window; each feature bit is the result of comparing the patch pixel values at those two points.

pX.push_back(make_pair(fern, 1)); // positive ferns <features, label = 1>. The fern is then labeled as a positive sample and stored in pX (the positive sample library for the ensemble classifier).

The operations above are repeated num_warps * good_boxes.size() times, that is 20 * 10 times, so in the end pEx holds one positive sample and pX holds 200 positive samples.

The resulting pEx is a zero-mean Mat of size 15*15, and pX holds 200 entries, each consisting of 10 numbers converted from the binary codes.

4.7. meanStdDev(frame1(best_box), mean, stdev);

This computes the mean and standard deviation of best_box. var = pow(stdev.val[0], 2) * 0.5; serves as the threshold of the variance classifier. var is a global variable, equal to half the variance of the best_box window.

4.8. generateNegativeData(frame1);

random_shuffle(bad_boxes.begin(), bad_boxes.end()); randomly shuffles the entries of bad_boxes.

Because TLD tracks only one target and the target box has already been determined, everything in the image outside the target box is a negative sample and no affine transformation is needed. The procedure is as follows:

All windows with an overlap below 0.2 were put into bad_boxes earlier, so there are quite a lot of them. Those bad_boxes whose variance is greater than var*0.5f are added as negative samples. As above, classifier.getFeatures(patch, grid[idx].sidx, fern); and nX.push_back(make_pair(fern, 0)); obtain the fern features and the label of each negative sample and store them in nX (the negative samples for the ensemble classifier).

Then bad_patches (100) boxes are taken at random from the bad_boxes above, and the getPattern function converts the image patch in each bad_box region of the frame into a patch of size 15*15, which is stored in nEx (the negative samples for the nearest neighbor classifier).

As can be seen, nEx (100 entries) holds 15*15 patches just like pEx, while nX (windows with variance greater than var*0.5f; there are many and the number is not fixed) holds numbers converted from binary codes just like pX.

So both nEx and nX now contain negative samples. (The variance of a box is computed with integral images.)
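To show why integral images make the variance test cheap, here is a minimal sketch using only the standard library. The Integral type and the function names are assumptions made for this illustration, and the original code obtains its integral images differently. The idea is that the variance of any box equals E[v^2] - (E[v])^2, and both expectations need only four lookups each.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Integral image with one extra row and column of zeros, so that
// at(r, c) holds the sum of all pixels above and to the left of (r, c).
struct Integral {
    std::size_t rows, cols;    // size of the source image
    std::vector<double> data;  // (rows + 1) x (cols + 1), row-major
    double at(std::size_t r, std::size_t c) const { return data[r * (cols + 1) + c]; }
};

// Build the integral image of img (row-major, rows x cols); when `squared`
// is true the pixel values are squared first.
Integral buildIntegral(const std::vector<unsigned char>& img,
                       std::size_t rows, std::size_t cols, bool squared) {
    assert(img.size() == rows * cols);
    Integral ii{rows, cols, std::vector<double>((rows + 1) * (cols + 1), 0.0)};
    for (std::size_t r = 0; r < rows; ++r) {
        for (std::size_t c = 0; c < cols; ++c) {
            const double v = img[r * cols + c];
            ii.data[(r + 1) * (cols + 1) + (c + 1)] =
                (squared ? v * v : v) + ii.at(r, c + 1) + ii.at(r + 1, c) - ii.at(r, c);
        }
    }
    return ii;
}

// Sum over the box with top-left (x, y) and size w x h, using four lookups.
double boxSum(const Integral& ii, std::size_t x, std::size_t y,
              std::size_t w, std::size_t h) {
    assert(x + w <= ii.cols && y + h <= ii.rows);
    return ii.at(y + h, x + w) - ii.at(y, x + w) - ii.at(y + h, x) + ii.at(y, x);
}

// Variance of the box: E[v^2] - (E[v])^2.
double boxVariance(const Integral& sum, const Integral& sqsum,
                   std::size_t x, std::size_t y, std::size_t w, std::size_t h) {
    assert(w > 0 && h > 0);
    const double area = static_cast<double>(w * h);
    const double mean = boxSum(sum, x, y, w, h) / area;
    return boxSum(sqsum, x, y, w, h) / area - mean * mean;
}
```

A window would then be kept as a negative sample when boxVariance(...) exceeds var * 0.5, matching the test described above. For the 2*2 image with pixels 1, 2, 3, 4 the variance of the whole image is 7.5 - 2.5^2 = 1.25.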

4.9. Half of nEx is then kept as the training set nEx and the other half becomes the test set nExT; likewise nX is split into the training set nX and the test set nXT.

4.10. The negative samples nX and the positive samples pX are merged into ferns_data[], which is used to train the ensemble classifier.

4.11. The positive sample pEx obtained above and nEx are merged into nn_data[], which is used to train the nearest neighbor classifier.

4.12. Train the ensemble classifier (forest) and the nearest neighbor classifier with the training sets above:

classifier.trainF(ferns_data, 2); // bootstrap = 2

For each sample ferns_data[i]: if its label is positive, the measure_forest function first returns the sum of the posterior probabilities that correspond to the sample's feature values in all trees. If that sum is less than the positive threshold (0.6*nstructs, which means the average must exceed 0.6, i.e. 0.6*nstructs/nstructs; 0.6 is the empirical value the ensemble classifier threshold is initialized with, and it is later evaluated on the test set and adjusted to find the optimum), then a positive input has been classified as negative. This is a classification error, so the sample is added to the positive sample library and the posterior probabilities are updated with the update function. Negative samples are handled in the same way: if a negative sample is misclassified, it is added to the negative sample library.
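The error-driven update can be sketched as follows. This is a simplified model, not the original class: Ensemble, its members and the constructor parameters are names chosen for this sketch, and the negative threshold is an assumed parameter because the text only states the positive one.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Simplified stand-in for the ensemble classifier (hypothetical type).
struct Ensemble {
    std::vector<std::vector<int>> pCounter, nCounter;  // [tree][code]
    std::vector<std::vector<float>> posteriors;        // [tree][code]
    float thrPos;  // positive threshold on the SUM of posteriors
    float thrNeg;  // negative threshold on the SUM (assumed parameter)

    Ensemble(std::size_t trees, std::size_t codes, float thrFern, float thrNegAvg)
        : pCounter(trees, std::vector<int>(codes, 0)),
          nCounter(trees, std::vector<int>(codes, 0)),
          posteriors(trees, std::vector<float>(codes, 0.0f)),
          thrPos(thrFern * static_cast<float>(trees)),
          thrNeg(thrNegAvg * static_cast<float>(trees)) {}

    // Sum of the posteriors selected by the sample's code in every tree.
    float measure(const std::vector<int>& fern) const {
        assert(fern.size() == posteriors.size());
        float votes = 0.0f;
        for (std::size_t t = 0; t < fern.size(); ++t) votes += posteriors[t][fern[t]];
        return votes;
    }

    // Count the sample and recompute P(y|x) = #p / (#p + #n) in every tree.
    void update(const std::vector<int>& fern, bool positive) {
        for (std::size_t t = 0; t < fern.size(); ++t) {
            const int code = fern[t];
            (positive ? pCounter : nCounter)[t][code] += 1;
            posteriors[t][code] = static_cast<float>(pCounter[t][code]) /
                                  (pCounter[t][code] + nCounter[t][code]);
        }
    }

    // Only misclassified samples change the model.
    void train(const std::vector<std::pair<std::vector<int>, int>>& data) {
        for (const auto& sample : data) {
            const bool positive = sample.second == 1;
            const float votes = measure(sample.first);
            if (positive ? votes < thrPos : votes > thrNeg) update(sample.first, positive);
        }
    }
};
```

Feeding the same positive sample twice changes the counters only once: after the first update its posteriors are 1 in every tree, so the second pass classifies it correctly and skips the update.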

classifier.trainNN(nn_data);

For each sample in nn_data: if its label is positive, NNConf(nn_examples[i], isin, conf, dummy); computes the relative similarity conf between the input image patch and the online model. If the relative similarity is less than 0.65, the patch is considered not to contain the foreground target, that is, it has been misclassified, so it is added to the positive sample library by pEx.push_back(nn_examples[i]);. Likewise, if a negative sample is misclassified, it is added to the negative sample library.
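The similarity measures behind this test can be sketched without any image library. The sketch below rests on several assumptions: patches are flattened zero-mean float vectors, the normalized cross-correlation is mapped to [0, 1] by (ncc + 1) / 2, the conservative similarity looks only at the first half of the positive examples (as in the TLD paper), and the conventions for an empty model are choices made for this sketch. All names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Patch = std::vector<float>;  // flattened, zero-mean patch

// Normalized cross-correlation of two equally sized patches, mapped to [0, 1].
float similarity(const Patch& a, const Patch& b) {
    assert(a.size() == b.size());
    double ab = 0.0, aa = 0.0, bb = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        ab += a[i] * b[i];
        aa += a[i] * a[i];
        bb += b[i] * b[i];
    }
    if (aa == 0.0 || bb == 0.0) return 0.5f;  // flat patch: treat correlation as 0
    return static_cast<float>((ab / std::sqrt(aa * bb) + 1.0) * 0.5);
}

// Best similarity between p and the first `count` examples of a model.
float maxSimilarity(const Patch& p, const std::vector<Patch>& model, std::size_t count) {
    float best = 0.0f;
    for (std::size_t i = 0; i < count && i < model.size(); ++i)
        best = std::max(best, similarity(p, model[i]));
    return best;
}

struct Confidence {
    float relative, conservative;
};

// dN / (dN + dP), guarded against a zero denominator.
float ratio(float dN, float dP) {
    const float d = dN + dP;
    return d > 0.0f ? dN / d : 0.0f;
}

Confidence nnConfidence(const Patch& p, const std::vector<Patch>& pEx,
                        const std::vector<Patch>& nEx) {
    if (pEx.empty()) return {0.0f, 0.0f};  // nothing positive to resemble
    if (nEx.empty()) return {1.0f, 1.0f};  // nothing negative to resemble
    const float dN = 1.0f - maxSimilarity(p, nEx, nEx.size());
    const float dP = 1.0f - maxSimilarity(p, pEx, pEx.size());
    const float dPearly = 1.0f - maxSimilarity(p, pEx, (pEx.size() + 1) / 2);
    return {ratio(dN, dP), ratio(dN, dPearly)};
}
```

A positive training patch whose relative value falls below 0.65 would be appended to pEx. The conservative value can be lower than the relative one when the best-matching positive example sits in the second half of pEx.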

The NNConf function computes several of the similarity measures defined in Section 5.2 (Object Model) of the TLD paper, such as the relative similarity and the conservative similarity. Whether the positive and negative sample libraries change is decided from these similarities.

4.13. Classify the test sets with the ensemble classifier (forest) and the nearest neighbor classifier obtained above, then evaluate and adjust to obtain the best classifier thresholds.

classifier.evaluateTh(nXT, nExT);

For the ensemble classifier: for each sample in the test set nXT, the average of the posterior probabilities of all base classifiers is computed. If it is greater than thr_fern (0.6), the sample is considered to contain the foreground target, and the largest such average (greater than thr_fern) becomes the new threshold of the ensemble classifier. Otherwise the threshold stays unchanged.

For the nearest neighbor classifier: for each sample in the test set nExT, the maximum relative similarity is computed. If it is greater than nn_fern (0.65), the sample is considered to contain the foreground target, and the largest such similarity (greater than nn_fern) becomes the new threshold of the nearest neighbor classifier. Otherwise the threshold stays unchanged.
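Both threshold adjustments follow the same rule, which can be sketched as below. The function name is hypothetical, and the scores are assumed to be already computed (mean posteriors for nXT, maximum similarities for nExT).

```cpp
#include <cassert>
#include <vector>

// The test sets contain only negatives, so any score above the current
// threshold is a false positive. Raise the threshold to the highest such
// score; leave it unchanged when no score exceeds it.
float raiseThreshold(float threshold, const std::vector<float>& negativeScores) {
    assert(threshold >= 0.0f && threshold <= 1.0f);
    for (float score : negativeScores) {
        if (score > threshold) threshold = score;
    }
    return threshold;
}
```

For example, starting from 0.6 with negative test scores 0.3, 0.72 and 0.65, the new threshold would be 0.72.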

5. Enter a loop: read a new frame, convert it to a grayscale image, and process it with processFrame.

6. tld.processFrame(last_gray, current_gray, pts1, pts2, pbox, status, tl, bb_file); The image sequence is read frame by frame and processed by the algorithm. processFrame consists of four modules, executed in order: the tracking module, the detection module, the integration module and the learning module.

6.1. Tracking module:

track(img1, img2, points1, points2);

The track function tracks the feature points points1 of the previous frame img1 and predicts the corresponding feature points points2 of the current frame img2.

6.1.1. The implementation proceeds as follows:

(1) First sample 10*10 = 100 feature points uniformly inside lastbox (a regular grid of points) and store them in points1:

bbPoints(points1, lastbox); // the box width and height are each divided by 10

(2) Track these feature points with the pyramidal LK optical flow method to predict their positions in the current frame (see the explanation below), and compute the FB error and the matching similarity sim. Then keep only the feature points with fb_error[i] <= median(fb_error) and sim_error[i] > median(sim_error), discarding those that were tracked poorly. Fewer than 50% of the feature points remain:

tracker.trackf2f(img1, img2, points, points2);
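The median-based filtering in step (2) can be sketched as follows. This follows the description above and applies both tests at once; the original code may apply them one after the other. The function names are hypothetical, and the median here is the upper median for even-sized lists.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Median of a non-empty list (upper median for even sizes). The vector is
// taken by value so the caller's order is preserved.
float median(std::vector<float> v) {
    assert(!v.empty());
    const std::size_t mid = v.size() / 2;
    std::nth_element(v.begin(), v.begin() + mid, v.end());
    return v[mid];
}

// Indices of the points that pass both median tests.
std::vector<std::size_t> keepReliable(const std::vector<float>& fbError,
                                      const std::vector<float>& sim) {
    assert(fbError.size() == sim.size());
    std::vector<std::size_t> kept;
    if (fbError.empty()) return kept;
    const float medFb = median(fbError);
    const float medSim = median(sim);
    for (std::size_t i = 0; i < fbError.size(); ++i) {
        if (fbError[i] <= medFb && sim[i] > medSim) kept.push_back(i);
    }
    return kept;
}
```

Because a point has to be in the better half on both criteria, the surviving set is at most half of the original points and usually smaller.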

(3) Use the remaining tracked points (fewer than half) to predict the position and size tbb of the bounding box in the current frame:

bbPredict(points, points2, lastbox, tbb);

This computes the displacements between points and points2 in the x and y directions, takes the median displacement, and then derives the size and position of tbb with a formula related to the optical flow method.
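A minimal sketch of such a prediction is given below. The text only mentions the median displacement; estimating the scale change as the median ratio of pairwise point distances is an assumption taken from the Median Flow approach, and the types and names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Point { float x, y; };
struct Box { float x, y, width, height; };

// Median of a non-empty list (upper median for even sizes).
float median(std::vector<float> v) {
    assert(!v.empty());
    const std::size_t mid = v.size() / 2;
    std::nth_element(v.begin(), v.begin() + mid, v.end());
    return v[mid];
}

Box predictBox(const std::vector<Point>& p1, const std::vector<Point>& p2,
               const Box& last) {
    assert(!p1.empty() && p1.size() == p2.size());
    std::vector<float> dx, dy, ratios;
    for (std::size_t i = 0; i < p1.size(); ++i) {
        dx.push_back(p2[i].x - p1[i].x);
        dy.push_back(p2[i].y - p1[i].y);
    }
    // Assumed scale estimate: how much pairwise distances grew or shrank.
    for (std::size_t i = 0; i < p1.size(); ++i) {
        for (std::size_t j = i + 1; j < p1.size(); ++j) {
            const float before = std::hypot(p1[i].x - p1[j].x, p1[i].y - p1[j].y);
            const float after  = std::hypot(p2[i].x - p2[j].x, p2[i].y - p2[j].y);
            if (before > 0.0f) ratios.push_back(after / before);
        }
    }
    const float s = ratios.empty() ? 1.0f : median(ratios);
    // Shift by the median displacement and scale around the box center.
    return Box{last.x + median(dx) - 0.5f * (s - 1.0f) * last.width,
               last.y + median(dy) - 0.5f * (s - 1.0f) * last.height,
               last.width * s, last.height * s};
}
```

If every point moves by (2, 3) the box moves by (2, 3) and keeps its size. If all pairwise distances double, the box doubles in size around its center.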

(4) Tracking failure detection: if the median FB error is greater than 10 pixels (an empirical value), or the predicted box has moved out of the image, tracking is considered to have failed and no bounding box is returned:

if (tracker.getFB() > 10 || tbb.x > img2.cols || tbb.y > img2.rows || tbb.br().x < 1 || tbb.br().y < 1)

(5) Normalize the patch of img2 that corresponds to bb to patch_size = 15*15 and store it in pattern:

getPattern(img2(bb), pattern, mean, stdev);

(6) Compute the conservative similarity between the image patch pattern and the online model M:

classifier.NNConf(pattern, isin, dummy, tconf);

(7) If the conservative similarity is greater than the threshold, this tracking result is rated valid; otherwise it is invalid:

if (tconf > classifier.thr_nn_valid) tvalid = true;

6.1.2. How the TLD tracking module works, and the implementation of the trackf2f function:

The TLD tracking module is based on the Median Flow tracker combined with a tracking failure detection algorithm. Median Flow tracking relies on the forward-backward error and NCC. The principle is simple: a point A in the image at time t is tracked forward to a point B in the image at time t+1, and then B is tracked backward from time t+1 to time t, where it arrives at a point C. This gives a forward and a backward trajectory. Compare the distance between A and C at time t: if it is less than a threshold, the forward tracking is considered correct. This distance is the FB error.

bool LKTracker::trackf2f(const Mat& img1, const Mat& img2, vector<Point2f> &points1, vector<cv::Point2f> &points2)

The function works as follows:

(1) Track forward with the pyramidal LK optical flow method to produce the forward trajectory:

calcOpticalFlowPyrLK(img1, img2, points1, points2, status, similarity, window_size, level, term_criteria, lambda, 0);

(2) Track backward to produce the backward trajectory:

calcOpticalFlowPyrLK(img2, img1, points2, pointsFB, FB_status, FB_error, window_size, level, term_criteria, lambda, 0);

(3) Then compute the FB error, the distance between the start of the forward trajectory and the end of the backward trajectory:

for (int i = 0; i < points1.size(); ++i)

FB_error[i] = norm(pointsFB[i] - points1[i]);
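For readers without OpenCV at hand, the FB error itself can be sketched with the standard library alone. The Point type and the function name are hypothetical; the two input lists stand for the original points at time t and the points returned by the backward pass.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Point { float x, y; };

// fbError[i] is the Euclidean distance between where point i started (A)
// and where the backward pass brought it back to (C).
std::vector<float> forwardBackwardError(const std::vector<Point>& start,
                                        const std::vector<Point>& returned) {
    assert(start.size() == returned.size());
    std::vector<float> err(start.size());
    for (std::size_t i = 0; i < start.size(); ++i) {
        err[i] = std::hypot(returned[i].x - start[i].x, returned[i].y - start[i].y);
    }
    return err;
}
```

A point that returns exactly to its starting position has an FB error of 0, while a point that ends up 3 pixels to the right and 4 pixels below has an FB error of 5.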
