Video Understanding: Papers and Datasets


Last Update: 2018-08-22 Source: Internet

Author: User



Reposted from: https://github.com/sujiongming/awesome-video-understanding

Awesome Video Understanding

Understanding video: Perceiving dynamic actions could be a huge advance in how software makes sense of the world. (from MIT Technology Review, December 6, 2017)

A list of resources for video understanding. Most of the papers can be found through scholar.google.com.

This list was last updated on December 13, 2017. It covers video classification, action recognition, video captioning (to be updated), temporal action detection (to be updated), and datasets.

Table of Contents

Papers: Video Classification, Action Recognition

Video Datasets

Papers

Video Classification

Image-based methods

Zha S, Luisier F, Andrews W, et al. Exploiting image-trained CNN architectures for unconstrained video classification[J]. Computer Science, 2015.

Sánchez J, Perronnin F, Mensink T, et al. Image classification with the Fisher vector: Theory and practice[J]. International Journal of Computer Vision, 2013, 105: 222-245.

CNN-based methods

Karpathy A, Toderici G, Shetty S, et al. Large-scale video classification with convolutional neural networks[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2014: 1725-1732.

Tran D, Bourdev L D, Fergus R, et al. C3D: Generic features for video analysis[J]. CoRR, abs/1412.0767, 2014, 2(7): 8.

Fernando B, Gould S. Learning end-to-end video classification with rank-pooling[C]//International Conference on Machine Learning. 2016: 1187-1196.

RNN-based methods

Wu Z, Wang X, Jiang Y G, et al. Modeling spatial-temporal clues in a hybrid deep learning framework for video classification[C]//Proceedings of the 23rd ACM International Conference on Multimedia. ACM, 2015: 461-470.

Yue-Hei Ng J, Hausknecht M, Vijayanarasimhan S, et al. Beyond short snippets: Deep networks for video classification[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015: 4694-4702.

Action Recognition

CNN-based methods

Ji S, Xu W, Yang M, et al. 3D convolutional neural networks for human action recognition[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2012, 35(1): 221-231.

Tran D, Bourdev L D, Fergus R, et al. C3D: Generic features for video analysis[J]. CoRR, abs/1412.0767, 2014, 2(7): 8.

Varol G, Laptev I, Schmid C. Long-term temporal convolutions for action recognition[J]. arXiv preprint arXiv:1604.04494, 2016.

Sun L, Jia K, Yeung D Y, et al. Human action recognition using factorized spatio-temporal convolutional networks[C]//Proceedings of the IEEE International Conference on Computer Vision. 2015: 4597-4605.

Simonyan K, Zisserman A. Two-stream convolutional networks for action recognition in videos[C]//Advances in Neural Information Processing Systems. 2014: 568-576.

Ye H, Wu Z, Zhao R W, et al. Evaluating two-stream CNN for video classification[C]//Proceedings of the 5th ACM on International Conference on Multimedia Retrieval. ACM, 2015: 435-442.

Wang L, Qiao Y, Tang X. Action recognition with trajectory-pooled deep-convolutional descriptors[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015: 4305-4314.

Feichtenhofer C, Pinz A, Zisserman A. Convolutional two-stream network fusion for video action recognition[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016: 1933-1941.

Wang L, Xiong Y, Wang Z, et al. Temporal segment networks: Towards good practices for deep action recognition[C]//European Conference on Computer Vision. Springer International Publishing, 2016: 20-36.

Zhang B, Wang L, Wang Z, et al. Real-time action recognition with enhanced motion vector CNNs[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016: 2718-2726.

Wang X, Farhadi A, Gupta A. Actions ~ Transformations[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016: 2658-2667.

Zhu W, Hu J, Sun G, et al. A key volume mining deep framework for action recognition[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016: 1991-1999.

Bilen H, Fernando B, Gavves E, et al. Dynamic image networks for action recognition[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016: 3034-3042.

Fernando B, Anderson P, Hutter M, et al. Discriminative hierarchical rank pooling for activity recognition[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016: 1924-1932.

Cherian A, Fernando B, Harandi M, et al. Generalized rank pooling for activity recognition[J]. arXiv preprint arXiv:1704.02112, 2017.

Fernando B, Gavves E, Oramas J, et al. Rank pooling for action recognition[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(4): 773-787.

Fernando B, Gould S. Discriminatively learned hierarchical rank pooling networks[J]. arXiv preprint arXiv:1705.10420, 2017.

RNN-based methods

Baccouche M, Mamalet F, Wolf C, et al. Sequential deep learning for human action recognition[C]//International Workshop on Human Behavior Understanding. Springer, Berlin, Heidelberg, 2011: 29-39.

Donahue J, Anne Hendricks L, Guadarrama S, et al. Long-term recurrent convolutional networks for visual recognition and description[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015: 2625-2634.

Veeriah V, Zhuang N, Qi G J. Differential recurrent neural networks for action recognition[C]//Proceedings of the IEEE International Conference on Computer Vision. 2015: 4041-4049.

Li Q, Qiu Z, Yao T, et al. Action recognition by learning deep multi-granular spatio-temporal video representation[C]//Proceedings of the 2016 ACM on International Conference on Multimedia Retrieval. ACM, 2016: 159-166.

Wu Z, Jiang Y G, Wang X, et al. Multi-stream multi-class fusion of deep networks for video classification[C]//Proceedings of the 2016 ACM on Multimedia Conference. ACM, 2016: 791-800.

Sharma S, Kiros R, Salakhutdinov R. Action recognition using visual attention[J]. arXiv preprint arXiv:1511.04119, 2015.

Li Z, Gavves E, Jain M, et al. VideoLSTM convolves, attends and flows for action recognition[J]. arXiv preprint arXiv:1607.01794, 2016.

Unsupervised learning methods

Taylor G W, Fergus R, LeCun Y, et al. Convolutional learning of spatio-temporal features[C]//European Conference on Computer Vision. Springer, Berlin, Heidelberg, 2010: 140-153.

Le Q V, Zou W Y, Yeung S Y, et al. Learning hierarchical invariant spatio-temporal features for action recognition with independent subspace analysis[C]//Computer Vision and Pattern Recognition (CVPR), IEEE Conference on. IEEE, 2011: 3361-3368.

Yan X, Chang H, Shan S, et al. Modeling video dynamics with deep dynencoder[C]//European Conference on Computer Vision. Springer, Cham, 2014: 215-230.

Srivastava N, Mansimov E, Salakhudinov R. Unsupervised learning of video representations using LSTMs[C]//International Conference on Machine Learning. 2015: 843-852.

Pan Y, Li Y, Yao T, et al. Learning deep intrinsic video representation by exploring temporal coherence and graph structure[C]//IJCAI. 2016: 3832-3838.

Ballas N, Yao L, Pal C, et al. Delving deeper into convolutional networks for learning video representations[J]. arXiv preprint arXiv:1511.06432, 2015.

Video Datasets

HMDB51

Kuehne H, Jhuang H, Garrote E, et al. HMDB: A large video database for human motion recognition[C]//Computer Vision (ICCV), IEEE International Conference on. IEEE, 2011: 2556-2563.

State of the art: 75%. Lan Z, Zhu Y, Hauptmann A G. Deep local video feature for action recognition[J]. arXiv preprint arXiv:1701.07368, 2017.

UCF-101

Soomro K, Zamir A R, Shah M. UCF101: A dataset of 101 human actions classes from videos in the wild[J]. arXiv preprint arXiv:1212.0402, 2012.

State of the art: 95.6%. Diba A, Sharma V, Van Gool L. Deep temporal linear encoding networks[J]. arXiv preprint arXiv:1611.06678, 2016.

ActivityNet

Caba Heilbron F, Escorcia V, Ghanem B, et al. ActivityNet: A large-scale video benchmark for human activity understanding[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015: 961-970.

State of the art: 91.3%. Wang L, Xiong Y, Lin D, et al. UntrimmedNets for weakly supervised action recognition and detection[J]. arXiv preprint arXiv:1703.03329, 2017.

Sports-1M

Karpathy A, Toderici G, Shetty S, et al. Large-scale video classification with convolutional neural networks[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2014: 1725-1732.

State of the art: 67.6%. Abu-El-Haija S, Kothari N, Lee J, et al. YouTube-8M: A large-scale video classification benchmark[J]. arXiv preprint arXiv:1609.08675, 2016.

YouTube-8M

Abu-El-Haija S, Kothari N, Lee J, et al. YouTube-8M: A large-scale video classification benchmark[J]. arXiv preprint arXiv:1609.08675, 2016.

State of the art: 84.967%. Miech A, Laptev I, Sivic J. Learnable pooling with context gating for video classification[J]. arXiv preprint arXiv:1706.06905, 2017.

Kinetics

Kay W, Carreira J, Simonyan K, et al. The Kinetics human action video dataset[J]. arXiv preprint arXiv:1705.06950, 2017.

State of the art: ?

Moments in Time Dataset

Mathew Monfort, Bolei Zhou, Sarah Adel Bargal, Tom Yan, Alex Andonian, Kandan Ramakrishnan, Lisa Brown, Quanfu Fan, Dan Gutfreund, Carl Vondrick, Aude Oliva. Moments in Time Dataset: One million videos for event understanding. Tech report.

State of the art: ?

This article is an English translation of an article originally published in Chinese on aliyun.com.
