Article Link: http://isee.sysu.edu.cn/~zhwshi/Research/PreprintVersion/Top-push%20Video-based%20Person%20Re-identification.pdf
About the authors
In person Re-id, the strongest researcher in China is probably Wei-Shi Zheng of Sun Yat-sen University. Another Chinese researcher is Shaogang Gong of Queen Mary University of London. The two have co-authored several papers; according to Zheng's bio, he did postdoctoral work in 2008 with Shaogang Gong and Tao Xiang. If you work on person Re-id, these two are definitely worth following.
Summary
This is a video-based person Re-id paper. There is relatively little work in this area, and the related work is summarized later in this post.
The proposed method is called the top-push distance learning model (TDL). The top-push idea extends ref [15] (from one dimension to high dimensions). The method can basically be summed up as: further enlarge the differences between classes and narrow the differences within each class. The method is very simple and the results are surprisingly good, roughly double the previous best accuracy; the experiments are discussed later.
Motivation
There are several motivations, among which the authors highlight the following phenomenon they observed:
The horizontal axis indexes the pictures/videos, and the vertical axis is a distance measure representing the intra-class distance, that is, the difference between pictures/videos of the same individual. Red represents pictures and blue represents videos. The authors show only 20 pictures/videos from each of two databases, PRID 2011 and iLIDS-VID. It can be observed that the intra-class difference for videos is larger than for pictures; in other words, videos of one person from two cameras are harder to recognize as the same person. So the natural idea is to constrain the intra-class differences and enlarge the inter-class differences. The figure also compares before and after applying the authors' approach (TDL), and you can see that the goal of enlarging the inter-class differences and narrowing the intra-class differences is indeed achieved.
There are a number of other motivations, listed below:
- Previous video-based person Re-id work did not make effective use of video information; here the authors fuse appearance features (LBP and color histograms) with spatio-temporal information (HOG3D).
Related work
Basically, many single-shot methods (that is, methods based on a single image) can be extended to multi-shot and then applied to video-based person Re-id. Besides that part of the related work, there are three video-based methods:
S. Karanam, Y. Li, and R. Radke. Sparse re-id: Block sparsity for person re-identification. In CVPR Workshop, 2015.
D. Simonnet, M. Lewandowski, S. Velastin, J. Orwell, and E. Turkbeyler. Re-identification of pedestrians in crowds using dynamic time warping. In ECCV, 2012.
T. Wang, S. Gong, X. Zhu, and S. Wang. Person re-identification by video ranking. In ECCV, 2014.
I have not read ref [10]; the authors describe it as follows (I do not fully understand it):
Srikrishna et al. [10] introduced a block sparse model to handle the video-based person Re-id problem by the recovery problem on embedding space.
However, the authors point out a shortcoming of the three works above:
However, these works assume all image sequences are synchronized, but it becomes unapplicable due to different actions taken by different people.
Alignment is itself difficult and time-consuming, so practical applicability is naturally more limited.
The authors also discuss some closely related approaches, such as LMNN.
The contrast is easy to understand: LMNN aims to reduce the distance to nearby positive samples and penalizes all nearby negative samples, while TDL aims to reduce the distance between positive samples and penalizes only the nearest negative sample, so TDL imposes a stronger constraint than LMNN.
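The difference on the negative side can be made concrete for a single anchor sample. The sketch below is only an illustration under assumptions: the function names, the margin value, and the example distances are hypothetical, and it ignores how each method chooses its positive samples.

```python
def hinge(v):
    """Standard hinge: only positive violations are penalized."""
    return max(v, 0.0)


def lmnn_style_penalty(d_pos, d_negs, margin=1.0):
    """LMNN-style push term for one positive pair: sum a hinge over every negative."""
    return sum(hinge(d_pos - d_neg + margin) for d_neg in d_negs)


def top_push_penalty(d_pos, d_negs, margin=1.0):
    """Top-push term for one positive pair: hinge against the nearest negative only."""
    return hinge(d_pos - min(d_negs) + margin)


# Hypothetical distances from one anchor: one positive and three negatives.
d_pos = 2.0
d_negs = [1.5, 2.5, 6.0]
print(lmnn_style_penalty(d_pos, d_negs))  # 1.5 + 0.5 + 0.0 = 2.0
print(top_push_penalty(d_pos, d_negs))    # 1.5
```

In this sketch the top-push term needs one comparison per positive pair, while the LMNN-style term needs one per negative.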
RDC is an earlier work by the same authors. The differences between TDL and RDC and LDA are described as follows:
While RDC was limited by the scale of relative comparison, the proposed TDL can largely reduce the number of relative comparisons in the context of top-push modeling. In addition, compared with LDA [5], our model replaces the maximum of inter-class distance by the minimization of hinge loss of top-push comparison, so that our model have imposed much more powerful constraint on the inter-class modeling.
Approach
After all that build-up, the method itself is very simple: it minimizes an objective over the distance measure D, in which y_i is the label of the i-th sample, x_i is the i-th sample, ρ is the margin, and α is a weight.
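Written out from the description in this post, an objective of this kind can be sketched as below. This is a reconstruction, not a quotation: in particular, splitting the weight as (1 - α) and α between the two terms is an assumption.

```latex
f(D) = (1-\alpha) \sum_{i,j:\; y_i = y_j} D(\vec{x}_i, \vec{x}_j)
     + \alpha \sum_{i,j:\; y_i = y_j}
       \max\Bigl\{ D(\vec{x}_i, \vec{x}_j)
       - \min_{k:\; y_k \neq y_i} D(\vec{x}_i, \vec{x}_k) + \rho,\; 0 \Bigr\}
```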
The first term measures the intra-class difference, that is, the distances between all positive sample pairs. The second term widens the inter-class difference. It looks a lot like a triplet loss, except for the min inside the second term: it picks only the negative sample nearest to x_i.
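The two terms can be evaluated directly on a small labelled set. The following is a minimal sketch, not the authors' implementation: the function names are hypothetical, squared Euclidean distance stands in for the learned distance D, and weighting the two terms by (1 - alpha) and alpha is an assumption.

```python
def sq_euclidean(a, b):
    """Stand-in for the learned distance D (assumption: plain squared Euclidean)."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def top_push_objective(X, y, dist=sq_euclidean, rho=1.0, alpha=0.5):
    """Intra-class term plus top-push hinge term over all ordered positive pairs.

    Assumes every sample has at least one negative (at least two labels in y).
    """
    n = len(X)
    intra = 0.0
    push = 0.0
    for i in range(n):
        # Nearest negative of x_i: the closest sample with a different label.
        nearest_neg = min(dist(X[i], X[k]) for k in range(n) if y[k] != y[i])
        for j in range(n):
            if j != i and y[j] == y[i]:
                d_pos = dist(X[i], X[j])
                intra += d_pos  # first term: pull positives together
                # second term: the positive must beat the nearest negative by rho
                push += max(d_pos - nearest_neg + rho, 0.0)
    return (1.0 - alpha) * intra + alpha * push


# Two identities, two samples each; the classes are well separated.
X = [[0.0, 0.0], [1.0, 0.0], [0.0, 3.0], [1.0, 3.0]]
y = [0, 0, 1, 1]
print(top_push_objective(X, y, rho=1.0))   # margin satisfied, only the intra term remains
print(top_push_objective(X, y, rho=10.0))  # larger margin activates the hinge term
```

Each positive pair is counted in both orders here, which only rescales the objective.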
I did not read the optimization algorithm, and it is not essential for understanding the method; anyone interested can look it up in the paper.
Experiments
The experimental results are striking (compared with the previous best results on the two databases).
The comparison here is with the ECCV 2014 paper, [2014, ECCV] Person Re-identification by Video Ranking (a journal version came out later with better results), but even so, the accuracy is roughly doubled.
Comparison of results with related work
The experimental results are still very good.
Still, I want to share my view of metric learning: I can never shake the subjective feeling that this kind of approach is just fitting the database. The irritating part is that it works very well!
[2016, CVPR] Top-push Video-based Person Re-identification