Revisiting PN Learning

Last Update:2018-12-03 Source: Internet

Author: User

I previously translated an article on PN learning. PN learning is an important module, especially in the TLD tracking algorithm; without a solid understanding of this part, it is hard to grasp the essence of TLD. Building on that earlier translation, this article explains the principle of PN learning through its concrete application in the TLD algorithm.

PN learning stands for P-N learning. P refers to the positive constraint, also known as the P-expert or growing event. N refers to the negative constraint, also known as the N-expert or pruning event.

The P-expert discovers new appearances (deformations) of the target and thereby increases the number of positive samples, which makes the detection module more robust.

The N-expert generates negative training samples. Its premise is that the (tracked) foreground target can appear at only one position in a video frame. Therefore, once the position of the foreground target is known, the regions around that position must be negative samples.

Within TLD, the role of PN learning is to gradually improve the performance of the detection module (the detection part of TLD) by processing the video sequence online. For each frame, we want to evaluate the errors the detection module makes in the current frame and update the target model so that similar errors are avoided in later frames. The key to PN learning lies in two types of "experts": P-experts identify data that the detection module wrongly classified as negative samples (background); N-experts identify data that the detection module wrongly classified as positive samples (foreground target). Note that both P-experts and N-experts can themselves make errors. So if we update the detection module (target model) with such imperfect data, will its performance deteriorate? The author's analysis found that the experts' errors are tolerable under certain conditions, and that the performance of the detection module still improves.

PN learning consists of four parts: (1) a classifier to be learned; (2) a training sample set, that is, samples with known class labels; (3) supervised learning, a method for training the classifier from the training sample set; (4) P-N experts, which produce positive and negative training samples during the learning process. The original article illustrated the relationship among these four parts with a figure.

First, supervised learning is applied to the samples with known class labels to obtain an initial classifier. Learning then proceeds iteratively: the classifier obtained in the previous iteration classifies all unlabeled samples, the P-N experts find the samples that were misclassified, and the training sample set is corrected accordingly so that the classifier trained in the next iteration performs better. P-experts take samples that the classifier labeled negative but that should be positive according to the structural constraints, assign them positive labels, and add them to the training sample set. N-experts take samples that the classifier labeled positive but that should be negative according to the structural constraints, assign them negative labels, and add them to the training sample set. In this way, P-experts increase the robustness of the classifier, while N-experts increase its discriminative ability.
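The iterative procedure just described can be sketched roughly as follows. This is only an illustration under stated assumptions, not the TLD implementation: the nearest-class-mean learner, the function names `train_nearest_mean` and `pn_learning`, and the representation of each expert as a simple predicate are all hypothetical choices made to keep the example small.

```python
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[float, ...]      # hypothetical feature vector
Labeled = Tuple[Sample, int]    # (features, label), label 1 = positive, 0 = negative


def train_nearest_mean(training_set: Sequence[Labeled]) -> Callable[[Sample], int]:
    """Toy supervised learner (an assumption): classify by the nearer class mean.

    Requires at least one sample of each class in the training set.
    """
    def class_mean(label: int) -> Sample:
        rows = [x for x, y in training_set if y == label]
        return tuple(sum(col) / len(rows) for col in zip(*rows))

    mu_neg, mu_pos = class_mean(0), class_mean(1)

    def dist2(a: Sample, b: Sample) -> float:
        return sum((p - q) ** 2 for p, q in zip(a, b))

    return lambda x: 1 if dist2(x, mu_pos) < dist2(x, mu_neg) else 0


def pn_learning(
    labeled: Sequence[Labeled],
    unlabeled: Sequence[Sample],
    p_expert: Callable[[Sample], bool],  # True if the constraints say "positive"
    n_expert: Callable[[Sample], bool],  # True if the constraints say "negative"
    iterations: int = 3,
) -> Tuple[Callable[[Sample], int], List[Labeled]]:
    training_set = list(labeled)
    classify = train_nearest_mean(training_set)      # initial classifier
    for _ in range(iterations):
        predictions = [(x, classify(x)) for x in unlabeled]
        # P-expert: predicted negative, but the constraints say positive.
        new_pos = [x for x, y in predictions if y == 0 and p_expert(x)]
        # N-expert: predicted positive, but the constraints say negative.
        new_neg = [x for x, y in predictions if y == 1 and n_expert(x)]
        if not new_pos and not new_neg:
            break                                    # nothing left to correct
        training_set += [(x, 1) for x in new_pos] + [(x, 0) for x in new_neg]
        classify = train_nearest_mean(training_set)  # retrain on the corrected set
    return classify, training_set
```

In this sketch the experts only ever touch samples the classifier got wrong according to the constraints, which mirrors the division of labor described above: the P-expert adds missed positives, the N-expert adds false alarms as negatives.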

The following example illustrates the PN learning mechanism. Assume three consecutive video frames, each covered by several scan windows, as shown in (a) of the original figure.

Each scan window corresponds to an image patch, and the class label of each patch is represented by the colored dots in (b) and (c). The detection module assigns a class label to each image patch independently of the others, so N scan windows admit 2^N possible combinations of class labels. (b) shows one such labeling. It implies that the target may appear in several regions of one video frame at the same time, and that the motion of the target between adjacent frames is not continuous (for example, the red dot in the top-right corner of one image in (b) does not appear in the other two images). Such a labeling is clearly incorrect. In the labeling shown in (c), by contrast, the target appears in only one region per frame, and the regions detected in adjacent frames are continuous, forming a motion trajectory of the target. This property is called "structure". The key to PN learning is to exploit such structure in the data to identify the wrong labels produced by the detection module.
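To make the notion of "structure" concrete, here is a minimal sketch of a check that separates a labeling like (c) from one like (b). It assumes binary labels (so N windows admit 2^N labelings), represents each frame by the centers of its positively labeled windows, and uses a hypothetical `max_step` threshold for "continuous" motion; none of these names or values come from the original article.

```python
from math import hypot
from typing import List, Optional, Tuple

Point = Tuple[float, float]  # center of a scan window labeled positive


def count_labelings(num_windows: int) -> int:
    """Number of ways to assign a binary label to each of N windows."""
    return 2 ** num_windows


def is_structured(positives_per_frame: List[List[Point]], max_step: float) -> bool:
    """Check the two constraints from the text on a sequence of frames.

    1. Spatial: at most one positive window per frame.
    2. Temporal: the positive window moves at most `max_step` pixels
       relative to the last frame in which the target was seen.
    """
    previous: Optional[Point] = None
    for positives in positives_per_frame:
        if len(positives) > 1:
            return False                      # target in several places at once
        if positives:
            current = positives[0]
            if previous is not None:
                step = hypot(current[0] - previous[0], current[1] - previous[1])
                if step > max_step:
                    return False              # trajectory is not continuous
            previous = current
    return True


# A labeling like (b): two positives in the first frame.
labeling_b = [[(10.0, 10.0), (80.0, 5.0)], [(12.0, 11.0)], [(14.0, 12.0)]]
# A labeling like (c): one positive per frame, moving smoothly.
labeling_c = [[(10.0, 10.0)], [(12.0, 11.0)], [(14.0, 12.0)]]
```

With `max_step=5.0`, `is_structured(labeling_b, 5.0)` is rejected by the spatial constraint, while `is_structured(labeling_c, 5.0)` passes both constraints.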

The preceding example shows that P-experts look for structure in the temporal domain of the video sequence and assume that the target moves along a trajectory, that is, its movement between adjacent frames is small and the positions are correlated. P-experts record the position of the target in the previous frame and predict its position in the current frame with a frame-to-frame tracking algorithm (the LK optical flow method is used here). If the detection module labels the position predicted by the tracking algorithm in the current frame as negative, P-experts generate a positive training sample.

N-experts look for structure in the spatial domain of the video sequence and assume that the target can appear at only one position in a video frame. N-experts analyze all outputs of the detection module in the current frame together with the output of the tracking module, and find the most likely region. Among all regions where the target might appear in the current frame, any region that does not overlap the most likely region is treated as a negative sample. In addition, the most likely region is used to reinitialize the tracking module.
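The two experts described above can be sketched on bounding boxes as follows. This is a simplified illustration, not the TLD source code: the box format `(x1, y1, x2, y2)`, the use of intersection-over-union as the overlap measure, the `min_overlap` threshold, and the per-candidate confidence score used to pick the "most likely region" are all assumptions introduced for the example. The tracker output itself (for example from LK optical flow) is taken as given.

```python
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]   # (x1, y1, x2, y2), assumed format
Scored = Tuple[Box, float]                # (box, confidence), assumed scoring


def overlap(a: Box, b: Box) -> float:
    """Intersection over union of two boxes (0.0 when they are disjoint)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def p_expert(tracked_box: Optional[Box], detections: List[Box],
             min_overlap: float = 0.5) -> Optional[Box]:
    """Return the tracked box as a new positive sample if the detector missed it.

    `detections` are the boxes the detector labeled positive. If none of them
    covers the tracker's prediction, the detector labeled that location
    negative, so the P-expert emits it as a positive training sample.
    """
    if tracked_box is None:
        return None
    if any(overlap(tracked_box, d) >= min_overlap for d in detections):
        return None
    return tracked_box


def n_expert(candidates: List[Scored]) -> Tuple[Optional[Box], List[Box]]:
    """Pick the most likely region and mark non-overlapping candidates negative.

    `candidates` holds the detector outputs plus the tracker output, each with
    a confidence. The returned best box can be used to reinitialize the tracker.
    """
    if not candidates:
        return None, []
    best_box, _ = max(candidates, key=lambda c: c[1])
    negatives = [box for box, _ in candidates if overlap(box, best_box) == 0.0]
    return best_box, negatives
```

For instance, with a tracked box `(10, 10, 50, 50)` and a single detection far away at `(200, 200, 240, 240)`, `p_expert` returns the tracked box as a new positive sample; given candidates at those two locations plus one overlapping the first, `n_expert` keeps the highest-scoring box and marks only the distant one as negative.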

Next, we give an example to illustrate this.

The original figure shows three consecutive video frames. PN learning has to handle the car inside the yellow box at time t. The tracking module locates the car from frame to frame. From the analysis above, we know that the region supplied by the tracking module can be used by P-experts to generate positive training samples. However, because of occlusion, P-experts produce an incorrect positive training sample at time t + 2. At the same time, N-experts identify the most likely location of the target (marked with a red asterisk) and label all other regions as negative training samples. Here, N-experts correct the P-experts' error at time t + 2.

This article is an English version of an article originally written in Chinese on aliyun.com and is provided for information purposes only.
