Abstract: Adaptive tracking by detection has been extensively studied and shows promising results. The key idea of such trackers is to train an online discriminative classifier that separates an object from its local background. The classifier is continuously updated with positive and negative samples extracted from the current frame near the detected target location. However, if the detection is inaccurate, the samples are likely to be extracted less accurately, resulting in visual drift. Recently, a tracker based on multiple instance learning (MIL) has been proposed to address these problems to some degree. It puts the samples into positive and negative bags, and then selects some features with an online boosting method by maximizing the bag likelihood function. Finally, the selected features are combined for classification. However, in the MIL tracker the features are selected through a likelihood function, so they can be less informative for telling the target apart from a complex background. Inspired by active learning, in this paper we propose an active feature selection approach that uses the Fisher information criterion to measure the uncertainty of the classification model and can therefore select more informative features than the MIL tracker. More specifically, we propose an online boosting feature selection method that optimizes the Fisher information criterion, which yields more robust and efficient real-time tracking performance. Experimental evaluations on challenging sequences demonstrate the efficiency, accuracy, and robustness of the proposed tracker in comparison with state-of-the-art trackers.

Keywords: active learning, Fisher information criterion, multiple instance learning, visual tracking

1 Introduction

In computer vision, visual tracking is a very active research topic with important applications such as vehicle navigation, traffic monitoring, and human-computer interaction [1]. Although many tracking algorithms have been proposed in recent decades, tracking remains a challenging problem because the appearance of the target object can change significantly due to factors such as illumination change, pose change, occlusion, and abrupt motion. These factors can cause the tracking result to drift. Therefore, the key to designing a high-performance tracking system is a robust appearance model that handles these problems well.

Some appearance models represent only the object, while others take into account both the object and its local background. The latter usually perform better because they treat tracking as a binary classification problem, separating the target from the background with a discriminative classifier. Because these methods are closely related to object detection, they are often referred to as tracking by detection. When the classifier is trained, the selection of positive and negative samples affects tracking performance. Most tracking systems select only the currently tracked location as a positive sample. If the tracked location is inaccurate, the classifier is updated with an incorrect positive sample, resulting in visual drift over time. To mitigate drift, multiple samples near the tracked target location can be used to train the classifier. However, training the classifier on these samples with traditional supervised learning introduces ambiguity [2].

Recently, multiple instance learning (MIL) was introduced to resolve this ambiguity in tracking. Samples are placed into bags, and labels are provided only for the bags. A bag is positive if it contains at least one positive sample, and negative if all of its samples are negative. Samples near the tracked location are put into the positive bag, and samples far from the tracked location are put into the negative bag. The classifier is designed by optimizing the bag likelihood function. To cope with appearance changes, an online MIL boosting algorithm selects discriminative features from a feature pool by maximizing the bag likelihood function. Finally, the selected weak classifiers are linearly combined into a strong classifier that separates the object from the background in the next frame. Experiments show that the MIL tracker handles visual drift better than several state-of-the-art trackers [2].

Despite its success, the MIL tracker [2] has the following drawbacks. First, the selected features can be less informative. To make the classifier sufficiently discriminative, a large number of features must be selected from the feature pool, which is computationally expensive. Second, the more features are selected, the smaller the difference between them, which also degrades the performance of the classifier and leads to drift.

To solve these problems, and inspired by the active learning method [3], this paper proposes an active feature selection (AFS) method that selects more informative features. The proposed online feature selection method optimizes the Fisher information function of the bags rather than the bag likelihood function. The selected features therefore carry more information than the features selected by the bag likelihood function in the MIL tracker. As a result, the classifier can be designed with fewer features, and it is more efficient and robust than the MIL tracker classifier. Experimental results on challenging video sequences show that AFS has advantages in terms of efficiency, accuracy, and robustness.

The remainder of this paper is organized as follows. Section 2 reviews related work. Section 3 describes our tracking algorithm in detail. Section 4 compares our tracker with state-of-the-art trackers. Finally, Section 5 concludes the paper.

2 Related Work

Visual tracking has been extensively studied, and a thorough review can be found in [1]. Based on how they handle appearance changes of the target object and the background, recent algorithms fall into two categories: generative methods [4]-[12] and discriminative methods [2], [13]-[21]. Generative methods learn an appearance model of the target object and locate the target by minimizing the difference between the searched region and the target model. Black et al. [4] represent objects with a subspace model learned offline. To handle the changing appearance of the target, several online appearance update models have been proposed. Jepson et al. [5] proposed a Gaussian mixture model that is updated by an online expectation-maximization (EM) algorithm. Ho et al. [6] and Ross et al. [7] use incremental subspace update methods to adapt to appearance changes. Adam et al. [8] proposed a patch-based appearance model to handle pose changes and partial occlusion. Recently, sparse representation methods were proposed to handle partial occlusion in visual tracking [9]. Kwon [10] decomposes the observation model into several basic observation models that cover different types of features and motions, in order to handle changes in pose, illumination, and scale. Sun et al. [11] proposed an object appearance model that combines local scale-invariant features with a global incremental principal component analysis (PCA) model.

Discriminative methods treat tracking as a binary classification problem and train a discriminative classifier that separates the object from the background. Avidan [13] trains a support vector machine (SVM) offline and integrates it into an optical-flow-based tracker. To accommodate appearance changes of the object and the background, Avidan [14] proposes an online boosting method to train the classifier: some weak classifiers are updated online and then combined into a strong classifier. Collins et al. [15] present an online feature selection scheme that evaluates multiple features, selects the most discriminative ones, and combines this step with the mean-shift tracking system [12]. In [16], the relationships between objects and the structured environment are exploited to improve tracking performance. Grabner et al. [17] developed an online boosting feature selection technique that adapts well to appearance changes. To better handle visual drift, Grabner et al. [18] present an online semi-supervised tracker that labels samples only in the first frame and leaves the samples in subsequent frames unlabeled. Babenko et al. [2] proposed an online MIL method that handles the ambiguity of the tracked location in order to reduce visual drift. Kalal et al. [19] proposed a semi-supervised learning method in which positive and negative samples are selected by an online classifier with structural constraints. Recently, an effective tracking algorithm [21] based on compressive sensing theory [22] has been proposed, which shows that low-dimensional features randomly extracted from the multi-scale image feature space preserve discriminative ability and thus facilitate target tracking.

3 Tracking Based on Active Feature Selection

A System Overview

Figure 1 illustrates the basic flow of our tracking system, which has two main parts: detecting the position of the object in the next frame, and updating the classifier. Let $l_t(x_0)$ denote the location of the target in frame $t$. In frame $t+1$ we crop a set of image patches near the old target location, $X^s = \{x : \|l_{t+1}(x) - l_t(x_0)\| < s\}$, where $s$ is the search radius and $x$ denotes an image patch. Then, for every $x \in X^s$, we compute the classifier response $H_K(x)$, where the classifier is a linear combination of weak classifiers. Finally, we update the target location with a greedy strategy:

$$l_{t+1}(x_0) = l\big(\arg\max_{x \in X^s} H_K(x)\big) \qquad (1)$$
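The greedy update in (1) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions rather than the authors' implementation: locations are integer pixel coordinates, and `response` is a hypothetical callable that stands in for evaluating the strong classifier on the patch at a given location.

```python
import math


def update_location(prev_loc, search_radius, response):
    """Return the candidate location with the highest classifier response.

    prev_loc: (row, col) of the target in the previous frame.
    search_radius: the search radius s, in pixels (must be positive).
    response: hypothetical callable mapping a (row, col) location to the
        strong classifier score of the patch at that location.
    """
    r0, c0 = prev_loc
    s = int(math.floor(search_radius))
    best_loc, best_score = None, -math.inf
    for dr in range(-s, s + 1):
        for dc in range(-s, s + 1):
            # Keep only locations strictly inside the search circle.
            if dr * dr + dc * dc >= search_radius ** 2:
                continue
            loc = (r0 + dr, c0 + dc)
            score = response(loc)
            if score > best_score:
                best_loc, best_score = loc, score
    return best_loc


# Toy response that peaks at (12, 9); a real tracker would score image patches.
def toy_response(loc):
    return -((loc[0] - 12) ** 2 + (loc[1] - 9) ** 2)


new_loc = update_location((10, 10), 5, toy_response)
```

The exhaustive scan mirrors the dense cropping described in the text; its cost grows with the square of the search radius.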

Once the target location is updated, a set of samples $X^r = \{x : \|l_{t+1}(x) - l_{t+1}(x_0)\| < r\}$ is cropped and placed in the positive bag, where $r$ is a scalar radius. For negative samples, we randomly select some samples from $X^{r,\beta} = \{x : r < \|l_{t+1}(x) - l_{t+1}(x_0)\| < \beta\}$, where $\beta$ is a scalar radius, because this set contains a large number of samples. If there is little change between two consecutive frames, negative patches taken from outside the region around the target may benefit classification because they remain relevant. However, if the background changes severely, such negative patches may have only a marginal effect on classification because they are not very relevant. As a compromise, we only consider negative samples around the target. We put all the negative samples into one negative bag and update the classifier online by optimizing the Fisher information criterion of the bags.
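The sampling scheme described above can be illustrated as follows. The sketch assumes integer pixel locations, takes every location within radius r as the positive bag, and draws a random subset of the ring between r and the outer radius as the negative bag. The function names, the outer radius argument `beta`, and the example radii are illustrative choices, not values from the paper.

```python
import math
import random


def locations_between(center, inner, outer):
    """Integer (row, col) locations at distance d from center, inner < d < outer."""
    r0, c0 = center
    n = int(math.ceil(outer))
    found = []
    for dr in range(-n, n + 1):
        for dc in range(-n, n + 1):
            if inner < math.hypot(dr, dc) < outer:
                found.append((r0 + dr, c0 + dc))
    return found


def sample_bags(target_loc, r, beta, num_negative, rng):
    # Positive bag: every patch location closer than r to the target
    # (inner = -1 so that the target location itself, at distance 0, is kept).
    positive_bag = locations_between(target_loc, -1, r)
    # Negative bag: a random subset of the ring r < d < beta.
    ring = locations_between(target_loc, r, beta)
    negative_bag = rng.sample(ring, min(num_negative, len(ring)))
    return positive_bag, negative_bag


positive_bag, negative_bag = sample_bags(
    (50, 50), r=4, beta=12, num_negative=20, rng=random.Random(0)
)
```

Subsampling the ring keeps the negative bag small even though the ring itself contains many more locations than the positive disc.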

B MIL Tracker

We start with a brief review of the MIL tracker [2], which is closely related to our work. MIL was introduced by Dietterich et al. for drug activity prediction. Suppose we have $n$ bags $\{X_1, \dots, X_n\}$, where bag $X_i = \{x_{i1}, \dots, x_{iN_i}\}$ contains $N_i$ instances. Let $y_i$ be the label of bag $X_i$ and $y_{ij}$ the label of instance $x_{ij}$. By the MIL definition, if a bag is positive, at least one of its instance labels is positive; if the bag label is 0, all the corresponding instance labels are 0. The MIL tracker learns a discriminative classifier that returns conditional probabilities. Since the discriminative classifier is an instance classifier and yields the conditional probability of each instance, the noisy-OR model is used to estimate the bag probability $p_i = p(y_i = 1 \mid X_i)$ from the instance probabilities $p_{ij} = p(y_{ij} = 1 \mid x_{ij})$:

$$p_i = 1 - \prod_{j=1}^{N_i} (1 - p_{ij}) \qquad (2)$$

The instance probabilities are

$$p_{ij} = \sigma\big(H_K(x_{ij})\big) \qquad (3)$$

where $\sigma(z) = 1/(1 + e^{-z})$ is the sigmoid function. The classifier is learned by maximizing the log-likelihood of the bags

$$\mathcal{L} = \sum_{i=1}^{n} \big( y_i \log p_i + (1 - y_i) \log(1 - p_i) \big) \qquad (4)$$
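As a small worked example of (2) to (4), the following sketch computes the noisy-OR bag probability and the bag log-likelihood from raw classifier scores. The clipping constant `eps` is an added numerical safeguard and not part of the formulation in the text.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def bag_probability(scores):
    """Noisy-OR of the instance probabilities: 1 - prod_j (1 - sigmoid(score_j))."""
    prod = 1.0
    for z in scores:
        prod *= 1.0 - sigmoid(z)
    return 1.0 - prod


def bag_log_likelihood(bags, labels, eps=1e-12):
    """Sum over bags of y*log(p) + (1-y)*log(1-p), with p clipped away from 0 and 1."""
    total = 0.0
    for scores, y in zip(bags, labels):
        p = min(max(bag_probability(scores), eps), 1.0 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total


# Classifier scores for one positive bag and one negative bag (toy values).
bags = [[3.0, -2.0], [-3.0, -4.0]]
good = bag_log_likelihood(bags, [1, 0])
bad = bag_log_likelihood(bags, [0, 1])
```

A single confident instance is enough to make a bag probability high, which is what allows the positive bag to tolerate inaccurately cropped samples.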

To cope with appearance changes, an online MIL boosting algorithm is used to update the classifier. It first maintains a pool of weak classifiers, and then selects a small number of them, one at a time, by maximizing the log-likelihood of the bags:

$$h_k = \arg\max_{h \in \Phi} \mathcal{L}(H_{k-1} + h) \qquad (5)$$

where $H_{k-1}$ is the strong classifier consisting of the first $k-1$ weak classifiers, and $\Phi = \{h_1, \dots, h_M\}$ is the weak classifier pool with $M$ candidate weak classifiers. As in the boosting feature selection method used for face detection [24], selecting a weak classifier can be regarded as feature selection because each weak classifier corresponds to one feature. Feature selection is very useful for reducing visual drift [15]. In addition, the classifier runs efficiently because the number of selected features is much smaller than the size of the feature pool.
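The greedy selection step can be sketched as below, assuming the responses of every candidate weak classifier on every instance have already been computed. The data layout `pool[m][i][j]` (candidate m, bag i, instance j) and the helper names are illustrative assumptions, not the data structures of the MIL tracker.

```python
import math


def _sigmoid(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def _log_likelihood(scores, labels, eps=1e-12):
    """Bag log-likelihood under the noisy-OR model for scores[i][j]."""
    total = 0.0
    for bag, y in zip(scores, labels):
        prod = 1.0
        for z in bag:
            prod *= 1.0 - _sigmoid(z)
        p = min(max(1.0 - prod, eps), 1.0 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total


def greedy_select(pool, labels, K):
    """Pick K candidates one at a time, each maximizing the bag log-likelihood."""
    H = [[0.0] * len(bag) for bag in pool[0]]  # current strong classifier scores
    chosen = []
    for _ in range(min(K, len(pool))):
        best_m, best_ll = None, -math.inf
        for m, h in enumerate(pool):
            if m in chosen:
                continue
            trial = [[H[i][j] + h[i][j] for j in range(len(H[i]))]
                     for i in range(len(H))]
            ll = _log_likelihood(trial, labels)
            if ll > best_ll:
                best_m, best_ll = m, ll
        chosen.append(best_m)
        H = [[H[i][j] + pool[best_m][i][j] for j in range(len(H[i]))]
             for i in range(len(H))]
    return chosen


# Three toy candidates on one positive bag and one negative bag.
pool = [
    [[2.0, -1.0], [-2.0, -2.0]],   # informative
    [[0.0, 0.0], [0.0, 0.0]],      # uninformative
    [[-2.0, -2.0], [2.0, 2.0]],    # misleading
]
selected = greedy_select(pool, labels=[1, 0], K=2)
```

Each round costs one likelihood evaluation per remaining candidate, so the total work scales with the product of the number of selected features and the pool size.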

C AFS Principle

From the log-likelihood function in (4), we can see that the feature selection method in (5) selects the weak classifiers that maximize the conditional probability of the positive bag and minimize the conditional probability of the negative bag. We argue that the features selected in this way are less informative than those selected by optimizing the Fisher information criterion, as described below. Consequently, to ensure sufficient discriminative information, the MIL tracker selects a relatively large number of features (K = 50, M = 250), whereas AFS uses K = 15 and M = 50. In addition, if too many features are selected, the discrimination between the target and background features is reduced.

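Similar to the MIL tracker [2], we define the classifier in the following form:

$$H(x) = \alpha^\top h(x) \qquad (6)$$

where $\alpha$ is a weight vector and $h(x)$ is a vector of weak classifiers. Each element of $h$ is a decision tree function that returns a binary label (+1 or -1). To design the classifier, we need to estimate the corresponding parameter $\alpha$. The Cramér-Rao inequality [25] states that, for any unbiased estimator $T_n$ of $\alpha$ based on $n$ independent and identically distributed samples, the covariance of $T_n$ satisfies $\mathrm{cov}(T_n) - I(\alpha)^{-1} \succeq 0$, that is, the difference is a non-negative definite matrix, where $I(\alpha)$ is the Fisher information matrix [25], defined as

$$I(\alpha) = -E\left[ \frac{\partial^2}{\partial \alpha \, \partial \alpha^\top} \log p(y \mid x; \alpha) \right]$$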
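To make the form of the classifier concrete, here is a minimal sketch of a weighted combination of weak classifiers that each return +1 or -1. Single-threshold stumps are used as the simplest decision tree; the feature indices, thresholds, and weights are made-up values for illustration only.

```python
def make_stump(feature_index, threshold, polarity=1):
    """Weak classifier returning +polarity above the threshold, -polarity otherwise."""
    def stump(x):
        return polarity if x[feature_index] > threshold else -polarity
    return stump


def strong_classifier(alphas, stumps, x):
    """Weighted sum of the binary weak classifier outputs."""
    return sum(a * h(x) for a, h in zip(alphas, stumps))


# Hypothetical weights and stumps over a two-dimensional feature vector.
stumps = [make_stump(0, 0.5), make_stump(1, 0.2, polarity=-1)]
alphas = [0.8, 0.3]
score_target = strong_classifier(alphas, stumps, [0.9, 0.1])
score_background = strong_classifier(alphas, stumps, [0.1, 0.9])
```

Because every weak output is +1 or -1, the magnitude of the score is bounded by the sum of the absolute weights.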

The Fisher information matrix represents the overall uncertainty of the classification model and is often used in active learning methods [26]. In [26], each active learning query selects the unlabeled sample that can reduce the Fisher information. To measure the uncertainty of the AFS classification model, we use the Fisher information matrix based on the bag probabilities of the samples (equation (8)).

Here $y_i$ is the bag label, and the term $\delta I_M$ ($\delta > 0$ is a scalar parameter and $I_M$ is an identity matrix) is added so that $I(\alpha)$ is non-singular. The choice of $\delta$ does not affect the feature selection step. The bag and instance probabilities that appear in (8) follow from (2), (3), and (6).

Note that our information matrix (8) differs from the objective function of the recently developed multiple-instance active learning (MIAL) methods [27], [28]: our goal is to measure the uncertainty of the classification model when the labels are known, whereas the goal of MIAL is to measure the uncertainty of the classification model with respect to unlabeled samples.
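For intuition about how a Fisher information matrix reflects model uncertainty, the sketch below computes it for a plain instance-level logistic model, where the probability of a positive label is the sigmoid of the weighted weak classifier outputs. This is a deliberately simplified stand-in and is not the bag-level matrix (8): it ignores bags and labels, and the regularization value `delta` is an arbitrary choice.

```python
import math


def sigmoid(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fisher_information(alpha, features, delta=1e-3):
    """Fisher information of a logistic model: sum_j p_j (1 - p_j) h_j h_j^T + delta * I."""
    m = len(alpha)
    info = [[delta if a == b else 0.0 for b in range(m)] for a in range(m)]
    for h in features:
        p = sigmoid(sum(a * v for a, v in zip(alpha, h)))
        w = p * (1.0 - p)  # largest when p = 0.5, i.e. when the model is unsure
        for a in range(m):
            for b in range(m):
                info[a][b] += w * h[a] * h[b]
    return info


def trace(matrix):
    return sum(matrix[i][i] for i in range(len(matrix)))


# Weak classifier outputs (+1 or -1) for two instances.
features = [[1, -1], [1, 1]]
unsure = fisher_information([0.0, 0.0], features, delta=0.0)
confident = fisher_information([5.0, 0.0], features, delta=0.0)
```

In this simplified model the per-instance weight p(1 - p) is exactly the term that measures how uncertain the classifier is about an instance.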

The inverse Fisher information matrix $I(\alpha)^{-1}$ is a lower bound on the covariance matrix of any estimate of $\alpha$ [25]. In particular, $\det(I(\alpha)^{-1})$ is a lower bound on the product of the variances of the elements of $\alpha$. Liao et al. [29] therefore proposed selecting samples by optimizing $\det(I(\alpha)^{-1})$ so as to reduce the uncertainty of $\alpha$. However, since $\det(I(\alpha)^{-1})$ is difficult to compute, we instead reduce the trace of the matrix $I(\alpha)$, which enters an upper bound on $\det(I(\alpha)^{-1})$. Because $I(\alpha)$ is a symmetric positive definite matrix [25], all of its eigenvalues are positive [30], and the bound follows from a standard inequality on these eigenvalues [30].
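One standard inequality of this kind is the arithmetic-geometric mean bound: for a symmetric positive definite $M \times M$ matrix $A$ with eigenvalues $\lambda_1, \dots, \lambda_M > 0$, $\det(A) = \prod_m \lambda_m \le \big(\tfrac{1}{M} \sum_m \lambda_m\big)^M = (\mathrm{tr}(A)/M)^M$. Whether this is exactly the bound used in the text is an assumption. The sketch below checks it numerically on a made-up 2 x 2 example.

```python
def det2(a):
    """Determinant of a 2 x 2 matrix."""
    return a[0][0] * a[1][1] - a[0][1] * a[1][0]


def trace2(a):
    """Trace of a 2 x 2 matrix."""
    return a[0][0] + a[1][1]


def inverse2(a):
    """Inverse of a non-singular 2 x 2 matrix."""
    d = det2(a)
    return [[a[1][1] / d, -a[0][1] / d],
            [-a[1][0] / d, a[0][0] / d]]


def amgm_bound(a):
    """(tr(a) / M)^M with M = 2: upper bound on det(a) for positive definite a."""
    return (trace2(a) / 2.0) ** 2


# A made-up symmetric positive definite "information" matrix and its inverse.
info = [[2.0, 0.5], [0.5, 1.0]]
cov_bound = inverse2(info)
```

Equality holds only when all eigenvalues are equal, as for the identity matrix, so the trace gives a loose but cheap handle on the determinant.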

In (11), a simplifying substitution is used, which is valid because each element of $h$ is a decision tree function. See Appendix A for details.

Although (11) looks complex, its physical meaning is simple. For the positive bag, the corresponding term in the trace of $I(\alpha)$ can be simplified [31]. Minimizing this term requires maximizing two quantities. As with the bag likelihood function (4), the first is the conditional probability of the positive bag. The second is a quantity that measures the classification uncertainty of the instances and should reach its maximum. The negative-bag term in the trace of $I(\alpha)$ consists of two parts, and its analysis is the same as for the positive bag. Therefore, minimizing the trace of the matrix can be regarded as a tradeoff between bag probability and classification uncertainty. Below, we propose AFS to select informative features by minimizing the trace of the matrix.

D Online AFS Boost

When weak classifiers are selected to optimize a particular objective function, boosting [32] can be viewed from a statistical perspective as a sequential procedure (each weak classifier corresponds to one feature): at step $k$, a weak classifier is chosen from $\Phi$ and added to $H_{k-1}$ so as to optimize the objective.

Here $H_{k-1}$ is the strong classifier composed of the first $k-1$ weak classifiers, and $\Phi$ is the set of all possible weak classifiers. For online learning, we maintain a feature pool of $M$ candidate weak classifiers. When updating the strong classifier, we first update the weak classifiers with the latest samples, and then select $h_k$ by minimizing the trace of the Fisher information matrix.
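The two-step structure described above (update every weak classifier in the pool, then greedily select) can be sketched as follows. Everything specific here is a hypothetical stand-in: the `OnlineStump` running-mean update rule, its learning rate, and the `vote_error` criterion, which merely takes the place of the trace of the Fisher information matrix so that the loop can run.

```python
class OnlineStump:
    """Weak classifier on one feature with running class means (hypothetical rule)."""

    def __init__(self, feature_index, rate=0.85):
        self.feature_index = feature_index
        self.rate = rate
        self.mean_pos = 0.0
        self.mean_neg = 0.0
        self.seen = False

    def update(self, samples, labels):
        pos = [x[self.feature_index] for x, y in zip(samples, labels) if y == 1]
        neg = [x[self.feature_index] for x, y in zip(samples, labels) if y == 0]
        if not pos or not neg:
            return
        new_pos, new_neg = sum(pos) / len(pos), sum(neg) / len(neg)
        if not self.seen:
            self.mean_pos, self.mean_neg, self.seen = new_pos, new_neg, True
        else:
            self.mean_pos = self.rate * self.mean_pos + (1 - self.rate) * new_pos
            self.mean_neg = self.rate * self.mean_neg + (1 - self.rate) * new_neg

    def __call__(self, x):
        v = x[self.feature_index]
        return 1 if abs(v - self.mean_pos) < abs(v - self.mean_neg) else -1


def boost_step(pool, samples, labels, K, criterion):
    """Update all weak classifiers, then greedily pick K that minimize the criterion."""
    for h in pool:
        h.update(samples, labels)
    selected = []
    for _ in range(min(K, len(pool))):
        candidates = [m for m in range(len(pool)) if m not in selected]
        best = min(candidates,
                   key=lambda m: criterion([pool[i] for i in selected + [m]],
                                           samples, labels))
        selected.append(best)
    return selected


def vote_error(classifiers, samples, labels):
    """Stand-in criterion: samples misclassified by the unweighted vote."""
    errors = 0
    for x, y in zip(samples, labels):
        predicted = 1 if sum(h(x) for h in classifiers) > 0 else 0
        errors += int(predicted != y)
    return errors


samples = [[0.9, 0.5, 0.9], [0.8, 0.1, 0.4], [0.1, 0.5, 0.5], [0.2, 0.1, 0.3]]
labels = [1, 1, 0, 0]
pool = [OnlineStump(0), OnlineStump(1), OnlineStump(2)]
selected = boost_step(pool, samples, labels, K=2, criterion=vote_error)
```

Passing the criterion as a callable keeps the selection loop independent of the objective, so a likelihood-based or information-based criterion can be swapped in without changing the loop.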

Target tracking article translation--robust target tracking based on active feature selection