Study notes on the "Tracking Using Multilevel Quantizations" algorithm



Article: Tracking Using Multilevel Quantizations

Zhibin Hong, Chaohui Wang, Xue Mei, Danil Prokhorov, and Dacheng Tao

Source: ECCV 2014

Tracking remains a difficult problem in computer vision. It is hard to track targets in all scenarios using a single level of information. This tracker fuses three levels of information to track the target: the pixel layer, the superpixel layer, and the target box layer. Its main strength is robust tracking of non-rigid targets.

1 Overview

Many tracking algorithms use only a single level of image information, for example pixels, superpixels, or target boxes. Each level has its advantages and disadvantages, and none suits every situation. This paper fuses multiple levels of target information in a conditional random field (CRF), so that the target location is optimized jointly within a single framework. At the pixel layer, an online random forest provides a soft decision: the probability that each pixel belongs to the target. At the middle layer, pixels are grouped into superpixel regions according to their spatial relationships and feature similarity, and a second random forest classifier is trained on superpixel histograms. At the highest layer, a regularization term based on the target box is introduced. Finally, dynamic graph cuts are used to solve the optimization problem efficiently and determine the target position. The overall block diagram of the algorithm is as follows:


In this framework, the nodes of the conditional random field are the pixels, the superpixels, and the target box.

2 Algorithm

The tracker combines multilevel quantizations in a single graphical model for efficient and robust target tracking.

2.1 Multi-layer quantization model

The entire model is built on a three-layer appearance representation: the pixel layer, the superpixel layer, and the target box layer. Information is first extracted at each layer, and a graphical model then fuses it for inference.

First layer: Pixel layer

Pixels are the finest representation of an image. Suppose each pixel i is represented by a d-dimensional feature vector and carries a label x_i at its position (0 for background, 1 for foreground). The unary energy function of the pixel layer is defined as follows:


Here p(x_i; H_p) is the probability that pixel i belongs to class x_i; it is the output of the random forest with parameters H_p.
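A minimal sketch of how a forest probability can be turned into a unary energy. It assumes the common negative-log form, which is an assumption here because the equation itself is not reproduced in these notes. The function name `unary_energy` and the clamp `eps` are hypothetical.

```python
import math

def unary_energy(p_foreground, label, eps=1e-9):
    """Energy of giving one pixel the label 1 (foreground) or 0 (background).

    Assumption: energy = -log(probability of the chosen label).
    p_foreground stands in for the random forest output for this pixel.
    """
    p = p_foreground if label == 1 else 1.0 - p_foreground
    return -math.log(max(p, eps))  # clamp so that log(0) never occurs

# A pixel the forest believes is foreground is cheap to label 1
# and expensive to label 0.
e_fg = unary_energy(0.9, 1)
e_bg = unary_energy(0.9, 0)
```

Low energy then corresponds to a label the classifier considers likely, which is what the later minimization relies on.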

Second layer: superpixel layer

Superpixels provide a useful mid-level description of image structure. First, the SLIC (simple linear iterative clustering) algorithm clusters the pixels into a set of superpixels. Each superpixel k is assigned a class label y_k (1 or 0). As with the pixel layer, an online random forest (ORF) is trained, and its output is taken as the probability that a given superpixel belongs to the target or the background. The energy function of the superpixel layer is defined as:


Third layer: target box layer

At the highest level, a target box delimits the target. Let B(z) denote the target box with parameters z; the energy function φ(B(z)) reflects how likely the target is to appear in B(z). In other trackers, the target location is obtained by optimizing this term alone; here it is optimized together with the information from the other two layers.

The quantization at the target box layer is provided by the Median Flow tracker. The energy function is defined as follows:
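For readers unfamiliar with Median Flow, the sketch below illustrates its core idea: the box is shifted by the median displacement of tracked points and rescaled by the median ratio of pairwise point distances. This is an illustration only, not the paper's implementation; the forward-backward error filtering of the full tracker is omitted, and the function name and the (cx, cy, w, h) box layout are hypothetical.

```python
from itertools import combinations
from math import dist
from statistics import median

def median_flow_update(box, prev_pts, cur_pts):
    """Move and rescale box = (cx, cy, w, h) using matched point pairs."""
    # Translation: median displacement along each axis (robust to outliers).
    dx = median(c[0] - p[0] for p, c in zip(prev_pts, cur_pts))
    dy = median(c[1] - p[1] for p, c in zip(prev_pts, cur_pts))
    # Scale: median ratio of pairwise distances after / before.
    ratios = [
        dist(cur_pts[i], cur_pts[j]) / dist(prev_pts[i], prev_pts[j])
        for i, j in combinations(range(len(prev_pts)), 2)
        if dist(prev_pts[i], prev_pts[j]) > 0
    ]
    scale = median(ratios) if ratios else 1.0
    cx, cy, w, h = box
    return (cx + dx, cy + dy, w * scale, h * scale)

# Four points move by (3, 1); the fifth is a badly tracked outlier.
prev_pts = [(0, 0), (2, 0), (0, 2), (2, 2), (1, 1)]
cur_pts = [(3, 1), (5, 1), (3, 3), (5, 3), (40, 40)]
new_box = median_flow_update((10, 10, 4, 6), prev_pts, cur_pts)
```

Because medians are used, the single outlier does not affect the predicted box.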


With these three layers in place, a conditional random field (CRF) is used to fuse the information from the different layers. Each unit in each layer is a node in the graph, and pairwise potential functions connect the nodes.

At the pixel layer, pairs of pixel nodes are connected to each other, with the following potential function:


The connection between the pixel layer and the superpixel layer uses the following potential function:


The connection between the pixel layer and the target box layer uses the following potential function:


Here d(z, i) is the distance from pixel i to the boundary of the target box B(z).
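The sketch below shows one plausible way a distance-based pixel-to-box term could behave. It assumes that a pixel is penalized by its distance to the box boundary only when its label disagrees with the box (foreground outside, or background inside); that rule is an assumption, since the potential itself is not reproduced here. The function names and the (x0, y0, x1, y1) box layout are hypothetical.

```python
from math import hypot

def boundary_distance(box, px, py):
    """Return (inside, distance from (px, py) to the boundary of box)."""
    x0, y0, x1, y1 = box
    inside = x0 <= px <= x1 and y0 <= py <= y1
    if inside:
        # Nearest edge of the rectangle.
        return inside, min(px - x0, x1 - px, py - y0, y1 - py)
    # Outside: Euclidean distance to the rectangle.
    ddx = max(x0 - px, 0, px - x1)
    ddy = max(y0 - py, 0, py - y1)
    return inside, hypot(ddx, ddy)

def box_consistency_penalty(box, px, py, label):
    """Assumed rule: pay the boundary distance when label and box disagree."""
    inside, d = boundary_distance(box, px, py)
    disagrees = (label == 1 and not inside) or (label == 0 and inside)
    return d if disagrees else 0.0

box = (0, 0, 10, 10)
far_foreground = box_consistency_penalty(box, 13, 14, 1)   # foreground outside
center_background = box_consistency_penalty(box, 5, 5, 0)  # background inside
```

Under this assumption, disagreements far from the boundary cost more than disagreements near it, which lets the box and the segmentation pull each other into agreement.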

Finally, for a given image I, the joint distribution of all random variables in the conditional random field (CRF) is described by a Gibbs distribution. The Gibbs energy function E(z, x, y) is defined as the sum of the potential functions described above.


The energy E(z, x, y) must then be minimized to determine the parameters z of the tracking box. For a fixed z, the minimization of E with respect to x and y can be carried out with an efficient graph cut algorithm. An auxiliary function is therefore introduced, giving the final objective function to be optimized:


To optimize the expression above, the optimal x and y are first found for each z, and the minimum of the resulting energy is then found over z.
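The two-stage optimization can be sketched on a toy problem. Everything below is a simplification under stated assumptions: the "image" is a 1-D row of pixels, a box z is an index interval, the superpixel labels y are omitted, the weights `lam` and `gamma` are hypothetical, and brute-force enumeration stands in for graph cuts.

```python
from itertools import product
from math import log

def energy(z, x, probs, lam=0.5, gamma=0.3):
    """Toy energy: pixel unary + box consistency + neighbor smoothness."""
    lo, hi = z  # candidate box = pixel index interval [lo, hi]
    e = 0.0
    for i, (xi, p) in enumerate(zip(x, probs)):
        e += -log(p if xi == 1 else 1.0 - p)   # probs must lie strictly in (0, 1)
        inside = lo <= i <= hi
        if (xi == 1) != inside:                # label disagrees with the box
            e += lam * min(abs(i - lo), abs(i - hi))
    e += gamma * sum(a != b for a, b in zip(x, x[1:]))
    return e

def best_labels(z, probs):
    """Inner step: minimize over x for a fixed z (brute force, not graph cuts)."""
    return min(product((0, 1), repeat=len(probs)),
               key=lambda x: energy(z, x, probs))

def track(candidates, probs):
    """Outer step: choose the box whose minimized energy is lowest."""
    scored = [(energy(z, best_labels(z, probs), probs), z) for z in candidates]
    return min(scored)[1]

probs = [0.1, 0.2, 0.9, 0.8, 0.9, 0.1]  # per-pixel foreground probabilities
best_box = track([(0, 2), (2, 4), (3, 5)], probs)
```

The inner minimization plays the role of the auxiliary function: it reduces the joint energy to a function of z alone, which the outer step then searches over.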

2.2 Online Feature Forests

1 Feature selection

For the pixel layer, 3-dimensional RGB, 3-dimensional CIELAB, and 48-dimensional texture features are extracted, forming a 54-dimensional feature vector for each pixel.

For the superpixel layer, a 64-dimensional normalized histogram in HSV space and a 10-dimensional rotation-invariant local binary pattern (LBP) are extracted, forming a 74-dimensional feature.
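As an illustration of the color part of the superpixel feature, here is a minimal sketch of a normalized 64-bin HSV histogram. The 4 x 4 x 4 bin layout is an assumption (the notes only state the dimension), the function name is hypothetical, and the LBP part is not shown.

```python
import colorsys

def hsv_histogram(rgb_pixels, bins=4):
    """Normalized HSV histogram with bins**3 entries (64 when bins=4).

    rgb_pixels: iterable of (r, g, b) tuples with values in 0..255.
    """
    hist = [0.0] * bins ** 3
    for r, g, b in rgb_pixels:
        h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        # Map each channel in [0, 1] to a bin index, keeping 1.0 in the last bin.
        hb, sb, vb = (min(int(c * bins), bins - 1) for c in (h, s, v))
        hist[(hb * bins + sb) * bins + vb] += 1.0
    total = sum(hist)
    return [c / total for c in hist] if total else hist

# Two pure red pixels and one pure blue pixel from a hypothetical superpixel.
hist = hsv_histogram([(255, 0, 0), (255, 0, 0), (0, 0, 255)])
```

Normalizing by the pixel count makes histograms of superpixels with different sizes comparable.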

2 Classifier selection

The output of a random forest is used as the potential function for both the pixel layer and the superpixel layer.

3 Training the random forests

A key question in training the random forests for the pixel layer and the superpixel layer is how to select the training samples.

For the pixel layer: in the first frame, the initial target position is given as a target box. Because the box also contains background pixels, the target is first segmented with the GrabCut method; the pixels in the segmented target region are used as positive samples and all pixels outside the region as negative samples. To guard against unreliable segmentation, if the segmented target area is less than 70% of the target box, the segmentation is considered unreliable and all pixels inside the target box are used as positive samples.
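The 70% fallback rule can be sketched as follows. The segmentation result is passed in as a plain set of pixel coordinates, standing in for the GrabCut output; the function name and data layout are hypothetical.

```python
def positive_pixels(box, segmented, min_ratio=0.7):
    """Choose first-frame positive samples for the pixel-layer forest.

    box: (x0, y0, x1, y1) with inclusive pixel coordinates.
    segmented: set of (x, y) pixels that the segmentation marked as target.
    """
    x0, y0, x1, y1 = box
    box_pixels = {(x, y) for x in range(x0, x1 + 1) for y in range(y0, y1 + 1)}
    target = segmented & box_pixels
    if len(target) < min_ratio * len(box_pixels):
        # Segmentation deemed unreliable: fall back to the whole box.
        return box_pixels
    return target

box = (0, 0, 9, 9)  # 100 pixels
good_mask = {(x, y) for x in range(10) for y in range(8)}   # 80 pixels
poor_mask = {(x, y) for x in range(10) for y in range(5)}   # 50 pixels
reliable = positive_pixels(box, good_mask)
fallback = positive_pixels(box, poor_mask)
```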

For the superpixel layer: the class label of each superpixel is determined by the class label of the majority of the pixels inside it.
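A minimal sketch of this majority vote, assuming the pixel labels and the superpixel index of each pixel are available as two parallel lists (a hypothetical layout). Ties are broken arbitrarily here, since the notes do not say how they are handled.

```python
from collections import Counter

def superpixel_labels(pixel_labels, superpixel_ids):
    """Label each superpixel with the most common label of its pixels."""
    votes = {}
    for label, sp in zip(pixel_labels, superpixel_ids):
        votes.setdefault(sp, Counter())[label] += 1
    return {sp: counts.most_common(1)[0][0] for sp, counts in votes.items()}

pixel_labels = [1, 1, 0, 0, 0, 1]
superpixel_ids = [0, 0, 0, 1, 1, 1]
labels = superpixel_labels(pixel_labels, superpixel_ids)
```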

During tracking, the random forests must also be trained online, and the training samples are selected according to the following rules:

For the pixel layer:


For the superpixel layer:


An occlusion check is also added during tracking: when the proportion of target pixels inside the target box falls below 0.3, the target is considered occluded and the classifiers are not updated.
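The occlusion rule amounts to a simple gate on the online update, sketched below. The function name and the flat list of 0/1 labels are hypothetical; treating an empty box as "do not update" is an added assumption.

```python
def should_update(pixel_labels_in_box, min_foreground_ratio=0.3):
    """Return True when the classifiers may be updated on this frame.

    pixel_labels_in_box: 0/1 labels of the pixels inside the target box.
    """
    if not pixel_labels_in_box:
        return False  # assumption: no pixels means no reliable update
    ratio = sum(pixel_labels_in_box) / len(pixel_labels_in_box)
    return ratio >= min_foreground_ratio  # below 0.3 is treated as occlusion

occluded = should_update([1] * 2 + [0] * 8)   # foreground ratio 0.2
visible = should_update([1] * 5 + [0] * 5)    # foreground ratio 0.5
```

Skipping the update while occluded keeps occluder pixels from being learned as target samples.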

The entire algorithm pseudo-code is as follows:



These notes are a bit messy, hehe; I will keep working to improve them!
