Video Understanding: Papers and Datasets

Adapted from: https://github.com/sujiongming/awesome-video-understanding

Awesome Video Understanding

Understanding video: perceiving dynamic actions could be a huge advance in how software makes sense of the world. (from MIT Technology Review, December 6, 2017)

A curated list of resources for video understanding. Most of these papers can be found via scholar.google.com.

This list was last updated on December 13th, 2017. Video Captioning and Temporal Action Detection sections will be added in future updates.

Table of Contents

  • Papers
      • Video Classification
      • Action Recognition
  • Video Datasets

Papers

Video Classification

Image-based methods

  • Zha S, Luisier F, Andrews W, et al. Exploiting image-trained CNN architectures for unconstrained video classification[J]. Computer Science, 2015.
  • Sánchez J, Perronnin F, Mensink T, et al. Image classification with the Fisher vector: theory and practice[J]. International Journal of Computer Vision, 2013, 105: 222-245.

CNN-based methods

  • Karpathy A, Toderici G, Shetty S, et al. Large-scale video classification with convolutional neural networks[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2014: 1725-1732. (The frame-pooling baseline behind this line of work is sketched at the end of this section.)
  • Tran D, Bourdev L D, Fergus R, et al. C3D: generic features for video analysis[J]. CoRR, abs/1412.0767, 2014, 2(7): 8.
  • Fernando B, Gould S. Learning end-to-end video classification with rank-pooling[C]//International Conference on Machine Learning. 2016: 1187-1196.

RNN-based methods

  • Wu Z, Wang X, Jiang Y G, et al. Modeling spatial-temporal clues in a hybrid deep learning framework for video classification[C]//Proceedings of the 23rd ACM International Conference on Multimedia. ACM, 2015: 461-470.
  • Yue-Hei Ng J, Hausknecht M, Vijayanarasimhan S, et al. Beyond short snippets: deep networks for video classification[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015: 4694-4702.
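Several of the CNN-based classifiers above, starting with Karpathy et al. (2014), reduce video classification to running a 2D CNN on sampled frames and pooling the per-frame features over time. A minimal PyTorch sketch of that baseline follows; the ResNet-18 backbone, the class count of 101, and all layer choices are our own illustrative assumptions, not any paper's published architecture.

```python
import torch
import torch.nn as nn
from torchvision import models

class FramePoolClassifier(nn.Module):
    """Frame-level 2D CNN + temporal average pooling (late fusion).

    A rough sketch of the baseline behind the CNN-based entries above;
    the backbone, head, and class count are illustrative assumptions.
    """
    def __init__(self, num_classes=101):  # 101 is hypothetical (UCF-101-sized)
        super().__init__()
        backbone = models.resnet18(weights=None)
        # Keep everything up to and including global average pooling.
        self.features = nn.Sequential(*list(backbone.children())[:-1])
        self.fc = nn.Linear(512, num_classes)

    def forward(self, clips):
        # clips: (batch, time, 3, H, W), a fixed number of sampled frames
        b, t, c, h, w = clips.shape
        x = clips.reshape(b * t, c, h, w)
        feats = self.features(x).flatten(1)          # (b*t, 512) frame features
        feats = feats.reshape(b, t, -1).mean(dim=1)  # average over time
        return self.fc(feats)                        # video-level logits

logits = FramePoolClassifier()(torch.randn(2, 8, 3, 224, 224))
print(logits.shape)  # torch.Size([2, 101])
```

Average pooling is the simplest temporal fusion; the RNN-based entries above essentially replace the mean over time with an LSTM over the same per-frame features.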
Action Recognition

CNN-based methods

  • Ji S, Xu W, Yang M, et al. 3D convolutional neural networks for human action recognition[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2012, 35(1): 221-231. (The 3D-convolution idea is sketched at the end of this section.)
  • Tran D, Bourdev L D, Fergus R, et al. C3D: generic features for video analysis[J]. CoRR, abs/1412.0767, 2014, 2(7): 8.
  • Varol G, Laptev I, Schmid C. Long-term temporal convolutions for action recognition[J]. arXiv preprint arXiv:1604.04494, 2016.
  • Sun L, Jia K, Yeung D Y, et al. Human action recognition using factorized spatio-temporal convolutional networks[C]//Proceedings of the IEEE International Conference on Computer Vision. 2015: 4597-4605.
  • Simonyan K, Zisserman A. Two-stream convolutional networks for action recognition in videos[C]//Advances in Neural Information Processing Systems. 2014: 568-576.
  • Ye H, Wu Z, Zhao R W, et al. Evaluating two-stream CNN for video classification[C]//Proceedings of the 5th ACM on International Conference on Multimedia Retrieval. ACM, 2015: 435-442.
  • Wang L, Qiao Y, Tang X. Action recognition with trajectory-pooled deep-convolutional descriptors[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015: 4305-4314.
  • Feichtenhofer C, Pinz A, Zisserman A. Convolutional two-stream network fusion for video action recognition[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016: 1933-1941.
  • Wang L, Xiong Y, Wang Z, et al. Temporal segment networks: towards good practices for deep action recognition[C]//European Conference on Computer Vision. Springer International Publishing, 2016: 20-36.
  • Zhang B, Wang L, Wang Z, et al. Real-time action recognition with enhanced motion vector CNNs[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016: 2718-2726.
  • Wang X, Farhadi A, Gupta A. Actions ~ transformations[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016: 2658-2667.
  • Zhu W, Hu J, Sun G, et al. A key volume mining deep framework for action recognition[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016: 1991-1999.
  • Bilen H, Fernando B, Gavves E, et al. Dynamic image networks for action recognition[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016: 3034-3042.
  • Fernando B, Anderson P, Hutter M, et al. Discriminative hierarchical rank pooling for activity recognition[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016: 1924-1932.
  • Cherian A, Fernando B, Harandi M, et al. Generalized rank pooling for activity recognition[J]. arXiv preprint arXiv:1704.02112, 2017.
  • Fernando B, Gavves E, Oramas J, et al. Rank pooling for action recognition[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(4): 773-787.
  • Fernando B, Gould S. Discriminatively learned hierarchical rank pooling networks[J]. arXiv preprint arXiv:1705.10420, 2017.

RNN-based methods

  • Baccouche M, Mamalet F, Wolf C, et al. Sequential deep learning for human action recognition[C]//International Workshop on Human Behavior Understanding. Springer, Berlin, Heidelberg, 2011: 29-39.
  • Donahue J, Anne Hendricks L, Guadarrama S, et al. Long-term recurrent convolutional networks for visual recognition and description[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015: 2625-2634.
  • Veeriah V, Zhuang N, Qi G J. Differential recurrent neural networks for action recognition[C]//Proceedings of the IEEE International Conference on Computer Vision. 2015: 4041-4049.
  • Li Q, Qiu Z, Yao T, et al. Action recognition by learning deep multi-granular spatio-temporal video representation[C]//Proceedings of the 2016 ACM International Conference on Multimedia Retrieval. ACM, 2016: 159-166.
  • Wu Z, Jiang Y G, Wang X, et al. Multi-stream multi-class fusion of deep networks for video classification[C]//Proceedings of the 2016 ACM on Multimedia Conference. ACM, 2016: 791-800.
  • Sharma S, Kiros R, Salakhutdinov R. Action recognition using visual attention[J]. arXiv preprint arXiv:1511.04119, 2015.
  • Li Z, Gavves E, Jain M, et al. VideoLSTM convolves, attends and flows for action recognition[J]. arXiv preprint arXiv:1607.01794, 2016.

Unsupervised learning methods

  • Taylor G W, Fergus R, LeCun Y, et al. Convolutional learning of spatio-temporal features[C]//European Conference on Computer Vision. Springer, Berlin, Heidelberg, 2010: 140-153.
  • Le Q V, Zou W Y, Yeung S Y, et al. Learning hierarchical invariant spatio-temporal features for action recognition with independent subspace analysis[C]//Computer Vision and Pattern Recognition (CVPR), IEEE Conference on. IEEE, 2011: 3361-3368.
  • Yan X, Chang H, Shan S, et al. Modeling video dynamics with deep Dynencoder[C]//European Conference on Computer Vision. Springer, Cham, 2014: 215-230.
  • Srivastava N, Mansimov E, Salakhudinov R. Unsupervised learning of video representations using LSTMs[C]//International Conference on Machine Learning. 2015: 843-852.
  • Pan Y, Li Y, Yao T, et al. Learning deep intrinsic video representation by exploring temporal coherence and graph structure[C]//IJCAI. 2016: 3832-3838.
  • Ballas N, Yao L, Pal C, et al. Delving deeper into convolutional networks for learning video representations[J]. arXiv preprint arXiv:1511.06432, 2015.
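Ji et al. (2012) and C3D, cited in the CNN-based list above, extend convolution into the temporal dimension, so kernels span short runs of frames as well as space. Below is a toy PyTorch sketch of that idea; the 3x3x3 kernels follow C3D's convention, but the depth, widths, and class count are illustrative assumptions, not the published architecture.

```python
import torch
import torch.nn as nn

class Tiny3DConvNet(nn.Module):
    """Toy 3D ConvNet in the spirit of Ji et al. (2012) and C3D.

    Kernel sizes follow C3D's 3x3x3 convention, but the number of
    layers and channel widths here are illustrative only.
    """
    def __init__(self, num_classes=101):  # hypothetical class count
        super().__init__()
        self.conv = nn.Sequential(
            # input: (batch, 3, T, H, W); kernels span time and space
            nn.Conv3d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d((1, 2, 2)),  # pool space only in the first stage
            nn.Conv3d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d(2),          # now pool time and space together
            nn.AdaptiveAvgPool3d(1),  # global spatio-temporal pooling
        )
        self.fc = nn.Linear(64, num_classes)

    def forward(self, x):
        return self.fc(self.conv(x).flatten(1))

logits = Tiny3DConvNet()(torch.randn(2, 3, 16, 112, 112))  # 16-frame clips
print(logits.shape)  # torch.Size([2, 101])
```

Pooling only spatially in the first stage, as C3D does, preserves temporal resolution in the early layers; the later pooling stages collapse time as well.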
Video Datasets

HMDB51

  • Kuehne H, Jhuang H, Garrote E, et al. HMDB: a large video database for human motion recognition[C]//Computer Vision (ICCV), IEEE International Conference on. IEEE, 2011: 2556-2563.
  • State-of-the-art: 75% (Lan Z, Zhu Y, Hauptmann A G. Deep local video feature for action recognition[J]. arXiv preprint arXiv:1701.07368, 2017.)

UCF-101

  • Soomro K, Zamir A R, Shah M. UCF101: a dataset of 101 human actions classes from videos in the wild[J]. arXiv preprint arXiv:1212.0402, 2012.
  • State-of-the-art: 95.6% (Diba A, Sharma V, Van Gool L. Deep temporal linear encoding networks[J]. arXiv preprint arXiv:1611.06678, 2016.)

ActivityNet

  • Caba Heilbron F, Escorcia V, Ghanem B, et al. ActivityNet: a large-scale video benchmark for human activity understanding[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015: 961-970.
  • State-of-the-art: 91.3% (Wang L, Xiong Y, Lin D, et al. UntrimmedNets for weakly supervised action recognition and detection[J]. arXiv preprint arXiv:1703.03329, 2017.)

Sports-1M

  • Karpathy A, Toderici G, Shetty S, et al. Large-scale video classification with convolutional neural networks[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2014: 1725-1732.
  • State-of-the-art: 67.6% (Abu-El-Haija S, Kothari N, Lee J, et al. YouTube-8M: a large-scale video classification benchmark[J]. arXiv preprint arXiv:1609.08675, 2016.)

YouTube-8M

  • Abu-El-Haija S, Kothari N, Lee J, et al. YouTube-8M: a large-scale video classification benchmark[J]. arXiv preprint arXiv:1609.08675, 2016.
  • State-of-the-art: 84.967% (Miech A, Laptev I, Sivic J. Learnable pooling with context gating for video classification[J]. arXiv preprint arXiv:1706.06905, 2017.)

Kinetics

  • Kay W, Carreira J, Simonyan K, et al. The Kinetics human action video dataset[J]. arXiv preprint arXiv:1705.06950, 2017.
  • State-of-the-art: ?

Moments in Time

  • Monfort M, Zhou B, Bargal S A, Yan T, Andonian A, Ramakrishnan K, Brown L, Fan Q, Gutfreund D, Vondrick C, Oliva A. Moments in Time dataset: one million videos for event understanding. Tech report.
  • State-of-the-art: ?
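The state-of-the-art figures above are mostly video-level top-1 accuracies (YouTube-8M is multi-label and scored with a ranking-based metric, so its number is not directly comparable). A common evaluation recipe is to score several clips per video and average the predictions before taking the argmax. A minimal sketch follows, assuming clips have already been sampled from each video; the stand-in model and the fake data are fabricated for illustration.

```python
import torch
import torch.nn as nn

@torch.no_grad()
def video_top1_accuracy(model, videos, labels):
    """Video-level top-1 accuracy by averaging clip-level scores.

    Assumes each element of `videos` is a tensor of pre-sampled clips
    shaped (num_clips, 3, T, H, W); the clip-sampling scheme itself
    varies between papers and is not specified here.
    """
    model.eval()
    correct = 0
    for clips, label in zip(videos, labels):
        logits = model(clips)                             # (num_clips, num_classes)
        video_scores = logits.softmax(dim=1).mean(dim=0)  # fuse clips by averaging
        correct += int(video_scores.argmax().item() == label)
    return correct / len(videos)

# Toy stand-in for a clip-level classifier; any such model fits here.
model = nn.Sequential(nn.AdaptiveAvgPool3d(1), nn.Flatten(), nn.Linear(3, 101))
videos = [torch.randn(10, 3, 16, 112, 112) for _ in range(4)]  # fake data
labels = [0, 1, 2, 3]
print(video_top1_accuracy(model, videos, labels))
```

Papers differ mainly in how many clips and crops they sample per video, so reported numbers on the same dataset are not always strictly comparable.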
