Person Re-identification: Past, Present and Future (2016)
Liang Zheng, Yi Yang, and Alexander G. Hauptmann

This is a summary of a survey article on person Re-id. When reprinting, please include the original address: http://blog.csdn.net/zdh2010xyz/article/details/53741682

Abstract

Re-id is becoming more and more important. Early work consisted mainly of hand-crafted algorithms evaluated at small scale. In recent years, large-scale datasets and deep learning systems have emerged. The article divides current Re-id work into two major categories, image-based and video-based, and reviews both hand-crafted and deep learning systems in each. It also discusses two new Re-id tasks that are closer to real applications: end-to-end Re-id and fast Re-id in very large galleries.

Contributions of the article:
1) It introduces the history of person Re-id and its relationship with image classification and instance retrieval.
2) It analyzes in detail the hand-crafted systems and large-scale methods in image-based and video-based Re-id.
3) It describes end-to-end Re-id and fast retrieval in large galleries as future directions.
4) It closes with a short description of some under-developed but very important problems.

1 Introduction

The introduction explains what Re-id is. It opens with a story about the Trojan War, which I did not read closely. In any case, Re-id is important and has practical value. Technically speaking, a person Re-id system in a real video surveillance setting can be divided into three modules: person detection, person tracking, and person retrieval. The first two are generally treated as independent computer vision tasks, so most Re-id work concentrates on the last module.

Organization of the article: it mainly discusses the vision part of Re-id. Unlike previous surveys, it focuses on the sub-tasks of Re-id (those available now and those likely in the future) without going into too much detail about techniques or architectures. Special emphasis is placed on deep learning methods, end-to-end Re-id, and large-scale Re-id. Section 1.2 describes the history of Re-id. Section 1.3 describes the relationship between Re-id and classification and retrieval. Sections 2 and 3 review the image-based and video-based literature respectively, each divided into hand-crafted and deeply-learned systems. Section 4 reviews related work on detection, tracking, and Re-id, and points out future research emphases. Section 5 discusses large-scale Re-id with better retrieval models, which is also a direction for future research. Section 6 introduces some open issues. Section 7 concludes.

As for the relationship with classification and retrieval, person Re-id combines the advantages of both. On the one hand, in the training stage, distance metrics or feature embeddings can be learned in the person space. On the other hand, in the retrieval stage, efficient indexing structures and hashing techniques help retrieve the query in a large gallery.

2 Image-based Person Re-id

The main setting uses a single image as the query. It can be described as a closed-world model: G is a gallery containing N images, and these N images belong to N different identities. Given a probe (query) q, its identity can be obtained through the following formula:
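
Formula (1) is not reproduced on this page. From the description above, it is presumably the usual nearest-neighbor rule. The notation below (g_i for the i-th gallery image, sim for an unspecified similarity function) is an assumption, not a quotation:

```latex
i^{*} = \arg\max_{i \in \{1, 2, \ldots, N\}} \operatorname{sim}(q, g_i) \tag{1}
```

Here i^{*} is the identity assigned to the probe q.
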

2.1 Hand-crafted Systems

From formula (1), a Re-id system consists of two components: image description and distance metric.

(1) Pedestrian description

The most commonly used feature is color; texture features are used less often. Typical descriptors are the weighted color histogram (WH), maximally stable color regions (MSCR), and recurrent high-structured patches (RHSP). WH gives higher weight to pixels near the axis of symmetry and produces a color histogram for each body part. MSCR detects stable color regions and extracts features such as color, area, and centroid. RHSP is a texture feature that captures recurrent texture patches.

More recent hand-crafted features largely follow a similar design. Zhao et al. extract a 32-dim LAB color histogram and a 128-dim SIFT descriptor from each 10*10 patch. Adjacency constrained search is then used to find the best-matching patch in a gallery image, searching within the horizontal stripe at a similar position. Many others have studied this line of work; representative descriptors include SCNCD, LOMO, and BoW.

Besides extracting low-level color and texture features directly, there is an alternative: attribute-based features, which can be seen as mid-level representations. Attributes are believed to be more robust to image translations than low-level descriptors. Much work has been done in this area, and the results show that it works well.

(2) Distance metric learning

In hand-crafted Re-id systems, a good distance metric is essential, because high-dimensional visual features typically do not capture the invariant factors under sample variances. A detailed survey of metric learning methods already exists; it categorizes them as supervised versus unsupervised learning and global versus local learning. In person Re-id, most work falls into supervised global distance metric learning.

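Supervised global methods of this kind usually learn a distance of the Mahalanobis form, d(x, y)^2 = (x - y)^T M (x - y), where M is a positive semidefinite matrix. The following is a minimal sketch of that distance only, not of any particular learning method. The function names, the toy vectors, and the hand-picked factor L are hypothetical, and plain Python lists are used so the example runs without extra dependencies.

```python
def mahalanobis_sq(x, y, M):
    """Squared Mahalanobis-form distance (x - y)^T M (x - y)."""
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    return sum(d[i] * M[i][j] * d[j] for i in range(n) for j in range(n))


def psd_from_factor(L):
    """Build M = L^T L, which is positive semidefinite for any real L."""
    rows, cols = len(L), len(L[0])
    return [[sum(L[k][i] * L[k][j] for k in range(rows)) for j in range(cols)]
            for i in range(cols)]


# Toy 2-D features (made-up numbers, not taken from any dataset).
x = [1.0, 2.0]
y = [3.0, 5.0]

# With M = I the distance reduces to the squared Euclidean distance.
identity = [[1.0, 0.0], [0.0, 1.0]]
d_euclid = mahalanobis_sq(x, y, identity)

# A hypothetical "learned" factor that down-weights the second dimension.
M = psd_from_factor([[1.0, 0.0], [0.0, 0.5]])
d_learned = mahalanobis_sq(x, y, M)

print(d_euclid, d_learned)
```

Metric learning methods differ in how they choose M from labeled pairs; the sketch simply fixes M by hand to show how it reshapes the distance.
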
Global metric learning, in general, aims to pull vectors of the same class closer together while pushing vectors of different classes further apart. The most commonly used formulation is the Mahalanobis distance. In person Re-id, the best-known metric learning method is KISSME (I do not understand the principle yet; to be filled in later). Many metric learning methods build on the Mahalanobis distance. Weinberger proposed large margin nearest neighbor learning (LMNN), and Davis proposed information-theoretic metric learning (ITML). More recently, Hirzer proposed relaxing the positivity constraint, which lowers the computational overhead. Chen adds a bilinear similarity to the Mahalanobis distance so that cross-patch similarities can be modeled. And so on.

In addition to learning distance metrics, some work focuses on learning discriminative subspaces (I do not understand this yet; more detail later). Others use different learning tools, such as SVM and boosting.

2.2 Deeply-learned Systems

CNN-based deep learning models have become popular since Krizhevsky won the ILSVRC'12 competition. Two types of CNN models are widely used: 1) the classification model, as used in image classification and object detection; 2) the Siamese model, which takes image pairs or triplets as input. The bottleneck for deep learning in Re-id is the lack of training data. Since most datasets provide only two images per identity, current CNN-based Re-id methods are mainly based on the Siamese model.

One drawback of the Siamese model is that it cannot take full advantage of the Re-id annotations: it uses only the pairwise (or triplet) labels. Another potential strategy is the classification/identification mode, which makes full use of the Re-id labels. On large-scale datasets such as PRW and MARS, the classification model performs very well without careful training sample selection. For the model to converge, however, the identification loss requires more training instances per ID.

The work mentioned above learns deep features in an end-to-end way. Alternatively, low-level features such as SIFT and color histograms can be aggregated into Fisher vectors and used as the input.

2.3 Datasets and Evaluation

Several trends can be seen in the datasets. First, dataset scale is growing. Second, bounding boxes are increasingly produced by pedestrian detectors such as DPM and ACF. Third, more cameras are used.

The main evaluation metric is the cumulative matching characteristics (CMC) curve. As research has progressed, and especially because a query may have multiple ground truths in the gallery, mean average precision (mAP) has also been adopted. Re-id accuracy has improved steadily over the years.

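To make the difference between the two metrics concrete, here is a minimal sketch that computes a CMC curve and mAP from ranked gallery labels. The function names and the toy rankings are hypothetical, and real benchmark protocols add details (for example, camera-based filtering of gallery images) that are left out here.

```python
def average_precision(ranked_ids, query_id):
    """AP for one query: mean of the precision at each rank holding a true match."""
    hits, precisions = 0, []
    for rank, gallery_id in enumerate(ranked_ids, start=1):
        if gallery_id == query_id:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0


def mean_average_precision(all_ranked_ids, query_ids):
    """mAP: average of the per-query AP values."""
    aps = [average_precision(r, q) for r, q in zip(all_ranked_ids, query_ids)]
    return sum(aps) / len(aps)


def cmc_curve(all_ranked_ids, query_ids, max_rank):
    """cmc[k-1] = fraction of queries whose first true match is within the top k."""
    counts = [0] * max_rank
    for ranked_ids, query_id in zip(all_ranked_ids, query_ids):
        for rank, gallery_id in enumerate(ranked_ids[:max_rank], start=1):
            if gallery_id == query_id:
                # Only the first match matters for CMC.
                for k in range(rank - 1, max_rank):
                    counts[k] += 1
                break
    return [c / len(query_ids) for c in counts]


# Toy rankings: gallery identity labels sorted by descending similarity.
query_ids = ["A", "B"]
rankings = [
    ["A", "C", "A", "B"],  # query A: true matches at ranks 1 and 3
    ["C", "B", "A", "B"],  # query B: true matches at ranks 2 and 4
]

cmc = cmc_curve(rankings, query_ids, max_rank=3)
mAP = mean_average_precision(rankings, query_ids)
print(cmc, mAP)
```

CMC looks only at the first correct match per query, while mAP also rewards ranking the remaining ground truths high, which is why it becomes relevant once a query has several true matches.
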

3 Video-based Person Re-id

Video-based methods focus on multi-shot matching scenarios and the integration of temporal information.

3.1 Hand-crafted Systems

These are mainly color-based descriptors, similar to image-based Re-id. The main difference is the distance calculation, which involves two sets of bounding box features; this is called "multi-shot" person Re-id. These methods mainly build appearance models from multiple shots. A newer trend is to incorporate temporal cues into the model. Wang uses spatial-temporal descriptors to re-identify pedestrians; the features include HOG3D and the gait energy image (GEI). Gao exploits the periodicity of pedestrian walking, divides the walking cycle into several segments, and performs recognition on them.

3.2 Deeply-learned Systems

The obvious difference between video-based and image-based Re-id is that each matching unit (a video sequence) contains multiple images, so either a multi-match strategy or a single-match strategy after video pooling is used. Earlier work used the multi-match strategy, which is computationally expensive. Pooling-based methods, on the other hand, pool the multiple vectors of a sequence into one global vector and scale well. As a result, current video-based Re-id usually contains a pooling step, which can be max/average pooling or learned by a fully connected layer. Another good practice is injecting temporal information into the final representation.

3.3 Datasets and Evaluation

Multi-shot Re-id datasets include ETH, 3DPES, PRID-2011, iLIDS-VID, and MARS.

4 Future: Detection, Tracking, and Person Re-id

4.1 Previous Works

Although person Re-id is currently studied as an independent research task, the article believes it will be combined with pedestrian detection and tracking in the future. In particular, the article considers the end-to-end Re-id system (spotting a query person from raw videos), which takes raw videos as input and integrates pedestrian detection and tracking with re-identification.

At present, most Re-id work rests on two assumptions: 1) the gallery of pedestrian bounding boxes is given; 2) the bounding boxes are hand-drawn, and therefore have very good detection quality. In practice, neither assumption holds. On the one hand, the gallery size changes with the detector threshold: a low threshold generates more bounding boxes (larger gallery, higher recall, lower precision), and vice versa. Re-id accuracy therefore changes with the threshold instead of staying stable. On the other hand, pedestrian detectors inevitably produce errors in the bounding boxes (misalignment, missed detections, and false alarms), which greatly affects Re-id accuracy, yet this is rarely considered.

Regarding the second problem, several datasets, such as CUHK03, Market-1501, and MARS, are closer to the practical scenario. In these datasets, both detected and hand-drawn bounding boxes are used, and accuracy with the former is lower than with the latter. In the MARS dataset, although tracking errors and detection errors are present, it is not known how tracking errors affect Re-id accuracy. In an end-to-end person Re-id system, choosing the detector and tracker will be a challenge.

In 2016, Xiao and Zheng almost simultaneously proposed end-to-end Re-id systems based on large-scale datasets, which take raw video frames and a query bounding box as input. As the corresponding figure in the article shows, a better pedestrian detector yields higher Re-id accuracy given the same Re-id feature. From multiple papers, it can be concluded that good pedestrian detection helps person Re-id. However, none of these so-called end-to-end systems has studied pedestrian tracking. Integrating detection, tracking, and retrieval into one framework is seen as the ultimate goal. Such work will require large-scale datasets with bounding box annotations for all three tasks.

4.2 Future Issues

1) System performance evaluation

An appropriate evaluation methodology for the end-to-end Re-id task is extremely important. End-to-end Re-id differs from the conventional Re-id problem in that its gallery is dynamic. It is also not yet known how to evaluate detection/tracking performance in the context of person Re-id. The article raises questions from two aspects.

1. Evaluation metrics for pedestrian detection and tracking in Re-id. The evaluation protocol should quantify and rank detector/tracker performance in a realistic, unbiased manner that is also informative of Re-id accuracy. In the person Re-id task the goal is just to find the person, so exact detection accuracy is less of a concern. The article therefore believes that miss rate (MR) and average precision (AP) can be used to evaluate pedestrian detection for person Re-id. Computing AP/MR involves an IoU threshold; the reported results suggest that detection accuracy is more stable with an IoU threshold of 0.7 than with 0.5. The article suggests that a larger IoU guarantees better localization, but the choice still depends on the circumstances.

While pedestrian detection has received some evaluation in this setting, tracking remains largely unexplored for person Re-id. Previous multiple object tracking (MOT) benchmarks commonly use multiple object tracking precision (MOTP), mostly tracked (MT) targets, the total number of false positives (FP), the total number of ID switches (IDS), the total number of times a trajectory is fragmented (Frag), the number of frames processed per second (Hz), and so on. Some of these metrics, such as processing speed, may be of limited use here, because tracking is an off-line step in Re-id. For Re-id, the article envisions that tracking precision is critical, as it is undesirable to have outlier images in the tracklets, which compromise the effectiveness of pooling.
The article also speculates that 80% might not be the optimal threshold for evaluating MT under Re-id. For future datasets that take the tracking problem in Re-id into account, the first task is to design appropriate metrics for evaluating different trackers.

2. The evaluation procedure for the Re-id accuracy of the entire system. This involves the detector threshold: if it is too strict, the gallery is smaller and the target may not be fully covered; if it is too loose, the gallery is larger and may include more background. Neither outcome is good for Re-id. There is no effective workaround yet, but since the gallery size is controlled by the detector threshold, new evaluation metrics should be designed with this in mind.

Another point is how to localize the identity of the query in a given video. This task is simpler than detection/tracking + re-identification and does not require such high detection accuracy, as long as the person can be located. In this task, a loose IoU can be used and more focus placed on matching, that is, finding a specific person among a large set of bounding boxes or spatial-temporal tubes.

2) The influence of the detector/tracker on Re-id

For end-to-end Re-id systems, the contribution of detection/tracking methods and data to Re-id should be studied. First, pedestrian detection/tracking errors do affect Re-id accuracy. But some studies suggest that detection/tracking errors can be avoided at earlier stages. For example, in Xiao's network, a localization loss is added in the Fast R-CNN sub-network, which helps the Re-id system localize pedestrians effectively. Future research can focus on making person Re-id less dependent on detection/tracking quality. Since developing an error-free detector and tracker is unrealistic, the article suggests integrating detection scores into the Re-id matching confidence.
Open questions include how to correct errors by effectively identifying outliers, and how to train context models that do not rely solely on detected bounding boxes.

Second, more attention needs to be paid to detection and tracking, which, if properly designed, will greatly benefit Re-id. Although it is not directly evident that pedestrian detection/tracking helps Re-id, some clues can be found in the relationship between generic image classification and fine-grained classification. If different identities can be better distinguished, this helps distinguish pedestrians from the background, and vice versa.

Another topic worth studying is unsupervised tracking data. Pedestrian tracks in video are not hard to obtain, though they inevitably contain errors. Face recognition, color, and non-background information all help improve tracking accuracy. During tracking, a pedestrian's appearance can change considerably. The resulting image sequences, known as tracking results, can be used to train pedestrian verification/identification models, reducing the reliance on large-scale supervised data.

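Two quantities recur throughout Section 4: the IoU between a detected box and the ground truth, and the detector score threshold that controls the gallery size. The sketch below illustrates both on toy data. The box coordinates, scores, and function names are hypothetical, and boxes are assumed to be given as (x1, y1, x2, y2).

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)


def build_gallery(detections, score_threshold):
    """Keep only detections whose confidence reaches the threshold."""
    return [d for d in detections if d["score"] >= score_threshold]


ground_truth = (10, 10, 50, 110)
detections = [
    {"box": (10, 10, 50, 110), "score": 0.95},    # well-aligned detection
    {"box": (20, 10, 60, 110), "score": 0.70},    # misaligned by 10 pixels
    {"box": (200, 30, 240, 130), "score": 0.30},  # background false alarm
]

# The misaligned box counts as correct under IoU >= 0.5 but not under IoU >= 0.7.
ious = [iou(ground_truth, d["box"]) for d in detections]

# A loose threshold gives a larger gallery with more false alarms;
# a strict one gives a smaller gallery that may drop useful boxes.
loose_gallery = build_gallery(detections, 0.2)
strict_gallery = build_gallery(detections, 0.9)
print(ious, len(loose_gallery), len(strict_gallery))
```

Even in this tiny example, the set of boxes that Re-id has to rank changes with the score threshold, which is why an evaluation protocol for end-to-end Re-id has to account for it.
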
5 Future: Person Re-id in Very Large Galleries

Although dataset sizes have been growing, they are clearly still far from what practical use requires. Person Re-id in very large galleries should therefore be a critical direction in the future.

6 Other Important yet Under-developed Open Issues

6.1 Battle Against Data Volume

Labeled datasets for person Re-id are hard to come by, because not only the bounding boxes but also the IDs must be annotated. In the last two years, a number of large datasets have appeared, such as Market-1501, PRW, LSPS, and MARS, thanks to their creators, but these datasets are still far from practical scale. The article proposes two alternative strategies for easing the problem. First, the use of annotations from tracking and detection needs to be explored in depth. Second, transfer learning: transferring a trained model from the source domain to the target domain.

6.2 Re-ranking Re-id Results

Re-identification can be seen as a retrieval process, so re-ranking is very important for improving retrieval accuracy.

7 Conclusion

Omitted.