Reposted from: https://github.com/sujiongming/awesome-video-understanding
Awesome Video Understanding
Understanding video: perceiving dynamic actions could be a huge advance in how software makes sense of the world. (from MIT Technology Review, December 6, 2017)
A list of resources for video understanding. Most of the papers can be found by searching scholar.google.com.
This list was last updated on December 13th, 2017.

Video Classification
Action Recognition
Video Captioning: will be updated
Temporal Action Detection: will be updated
Datasets

Table of Contents

Papers
  Video Classification
  Action Recognition
Video Datasets

Papers

Video Classification

Image-based methods

Zha S, Luisier F, Andrews W, et al. Exploiting image-trained CNN architectures for unconstrained video classification[J]. Computer Science, 2015.
Sánchez J, Perronnin F, Mensink T, et al. Image classification with the Fisher vector: theory and practice[J]. International Journal of Computer Vision, 2013, 105: 222-245.

CNN-based methods

Karpathy A, Toderici G, Shetty S, et al. Large-scale video classification with convolutional neural networks[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2014: 1725-1732.
Tran D, Bourdev L D, Fergus R, et al. C3D: generic features for video analysis[J]. CoRR, abs/1412.0767, 2014, 2(7): 8.
Fernando B, Gould S. Learning end-to-end video classification with rank-pooling[C]//International Conference on Machine Learning. 2016: 1187-1196.

RNN-based methods

Wu Z, Wang X, Jiang Y G, et al. Modeling spatial-temporal clues in a hybrid deep learning framework for video classification[C]//Proceedings of the 23rd ACM International Conference on Multimedia. ACM, 2015: 461-470.
Yue-Hei Ng J, Hausknecht M, Vijayanarasimhan S, et al. Beyond short snippets: deep networks for video classification[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015: 4694-4702.

Action Recognition

CNN-based methods

Ji S, Xu W, Yang M, et al. 3D convolutional neural networks for human action recognition[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2012, 35(1): 221-231.
Tran D, Bourdev L D, Fergus R, et al. C3D: generic features for video analysis[J]. CoRR, abs/1412.0767, 2014, 2(7): 8.
Varol G, Laptev I, Schmid C. Long-term temporal convolutions for action recognition[J]. arXiv preprint arXiv:1604.04494, 2016.
Sun L, Jia K, Yeung D Y, et al. Human action recognition using factorized spatio-temporal convolutional networks[C]//Proceedings of the IEEE International Conference on Computer Vision. 2015: 4597-4605.
Simonyan K, Zisserman A. Two-stream convolutional networks for action recognition in videos[C]//Advances in Neural Information Processing Systems. 2014: 568-576.
Ye H, Wu Z, Zhao R W, et al. Evaluating two-stream CNN for video classification[C]//Proceedings of the 5th ACM on International Conference on Multimedia Retrieval. ACM, 2015: 435-442.
Wang L, Qiao Y, Tang X. Action recognition with trajectory-pooled deep-convolutional descriptors[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015: 4305-4314.
Feichtenhofer C, Pinz A, Zisserman A. Convolutional two-stream network fusion for video action recognition[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016: 1933-1941.
Wang L, Xiong Y, Wang Z, et al. Temporal segment networks: towards good practices for deep action recognition[C]//European Conference on Computer Vision. Springer International Publishing, 2016: 20-36.
Zhang B, Wang L, Wang Z, et al. Real-time action recognition with enhanced motion vector CNNs[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016: 2718-2726.
Wang X, Farhadi A, Gupta A. Actions ~ transformations[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016: 2658-2667.
Zhu W, Hu J, Sun G, et al. A key volume mining deep framework for action recognition[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016: 1991-1999.
Bilen H, Fernando B, Gavves E, et al. Dynamic image networks for action recognition[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016: 3034-3042.
Fernando B, Anderson P, Hutter M, et al. Discriminative hierarchical rank pooling for activity recognition[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016: 1924-1932.
Cherian A, Fernando B, Harandi M, et al. Generalized rank pooling for activity recognition[J]. arXiv preprint arXiv:1704.02112, 2017.
Fernando B, Gavves E, Oramas J, et al. Rank pooling for action recognition[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(4): 773-787.
Fernando B, Gould S. Discriminatively learned hierarchical rank pooling networks[J]. arXiv preprint arXiv:1705.10420, 2017.

RNN-based methods

Baccouche M, Mamalet F, Wolf C, et al. Sequential deep learning for human action recognition[C]//International Workshop on Human Behavior Understanding. Springer, Berlin, Heidelberg, 2011: 29-39.
Donahue J, Anne Hendricks L, Guadarrama S, et al. Long-term recurrent convolutional networks for visual recognition and description[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015: 2625-2634.
Veeriah V, Zhuang N, Qi G J. Differential recurrent neural networks for action recognition[C]//Proceedings of the IEEE International Conference on Computer Vision. 2015: 4041-4049.
Li Q, Qiu Z, Yao T, et al. Action recognition by learning deep multi-granular spatio-temporal video representation[C]//Proceedings of the 2016 ACM on International Conference on Multimedia Retrieval. ACM, 2016: 159-166.
Wu Z, Jiang Y G, Wang X, et al. Multi-stream multi-class fusion of deep networks for video classification[C]//Proceedings of the 2016 ACM on Multimedia Conference. ACM, 2016: 791-800.
Sharma S, Kiros R, Salakhutdinov R. Action recognition using visual attention[J]. arXiv preprint arXiv:1511.04119, 2015.
Li Z, Gavves E, Jain M, et al. VideoLSTM convolves, attends and flows for action recognition[J]. arXiv preprint arXiv:1607.01794, 2016.

Unsupervised learning methods

Taylor G W, Fergus R, LeCun Y, et al. Convolutional learning of spatio-temporal features[C]//European Conference on Computer Vision. Springer, Berlin, Heidelberg, 2010: 140-153.
Le Q V, Zou W Y, Yeung S Y, et al. Learning hierarchical invariant spatio-temporal features for action recognition with independent subspace analysis[C]//Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on. IEEE, 2011: 3361-3368.
Yan X, Chang H, Shan S, et al. Modeling video dynamics with deep dynencoder[C]//European Conference on Computer Vision. Springer, Cham, 2014: 215-230.
Srivastava N, Mansimov E, Salakhudinov R. Unsupervised learning of video representations using LSTMs[C]//International Conference on Machine Learning. 2015: 843-852.
Pan Y, Li Y, Yao T, et al. Learning deep intrinsic video representation by exploring temporal coherence and graph structure[C]//IJCAI. 2016: 3832-3838.
Ballas N, Yao L, Pal C, et al. Delving deeper into convolutional networks for learning video representations[J]. arXiv preprint arXiv:1511.06432, 2015.

Video Datasets

HMDB51
Kuehne H, Jhuang H, Garrote E, et al. HMDB: a large video database for human motion recognition[C]//Computer Vision (ICCV), 2011 IEEE International Conference on. IEEE, 2011: 2556-2563.
State-of-the-art: 75%
Lan Z, Zhu Y, Hauptmann A G. Deep local video feature for action recognition[J]. arXiv preprint arXiv:1701.07368, 2017.

UCF-101
Soomro K, Zamir A R, Shah M. UCF101: a dataset of 101 human actions classes from videos in the wild[J]. arXiv preprint arXiv:1212.0402, 2012.
State-of-the-art: 95.6%
Diba A, Sharma V, Van Gool L. Deep temporal linear encoding networks[J]. arXiv preprint arXiv:1611.06678, 2016.

ActivityNet
Caba Heilbron F, Escorcia V, Ghanem B, et al. ActivityNet: a large-scale video benchmark for human activity understanding[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015: 961-970.
State-of-the-art: 91.3%
Wang L, Xiong Y, Lin D, et al. UntrimmedNets for weakly supervised action recognition and detection[J]. arXiv preprint arXiv:1703.03329, 2017.

Sports-1M
Karpathy A, Toderici G, Shetty S, et al. Large-scale video classification with convolutional neural networks[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2014: 1725-1732.
State-of-the-art: 67.6%
Abu-El-Haija S, Kothari N, Lee J, et al. YouTube-8M: a large-scale video classification benchmark[J]. arXiv preprint arXiv:1609.08675, 2016.

YouTube-8M
Abu-El-Haija S, Kothari N, Lee J, et al. YouTube-8M: a large-scale video classification benchmark[J]. arXiv preprint arXiv:1609.08675, 2016.
State-of-the-art: 84.967%
Miech A, Laptev I, Sivic J. Learnable pooling with context gating for video classification[J]. arXiv preprint arXiv:1706.06905, 2017.

Kinetics
Kay W, Carreira J, Simonyan K, et al. The Kinetics human action video dataset[J]. arXiv preprint arXiv:1705.06950, 2017.
State-of-the-art: ?

Moments in Time Dataset
Mathew Monfort, Bolei Zhou, Sarah Adel Bargal, Tom Yan, Alex Andonian, Kandan Ramakrishnan, Lisa Brown, Quanfu Fan, Dan Gutfreund, Carl Vondrick, Aude Oliva. Moments in Time Dataset: one million videos for event understanding. Tech report.
State-of-the-art: ?