Preamble
This repository contains the lecture slides and course description for the Deep Natural Language Processing course offered in Hilary term at the University of Oxford.
This is an advanced course on natural language processing. Automatically processing natural language inputs and producing language outputs is a key component of Artificial General Intelligence. The ambiguities and noise inherent in human communication render traditional symbolic AI techniques ineffective for representing and analysing language data. Recently, statistical techniques based on neural networks have achieved a number of remarkable successes in natural language processing, leading to a great deal of commercial and academic interest in the field.
This is an applied course focussing on recent advances in analysing and generating speech and text using recurrent neural networks. We introduce the mathematical definitions of the relevant machine learning models and derive their associated optimisation algorithms. The course covers a range of applications of neural networks in NLP, including analysing latent dimensions in text, transcribing speech to text, translating between languages, and answering questions. These topics are organised into three high-level themes forming a progression from understanding the use of neural networks for sequential language modelling, to understanding their use as conditional language models for transduction tasks, and finally to approaches employing these techniques in combination with other mechanisms for advanced applications. Throughout the course, the practical implementation of such models on CPU and GPU hardware is also discussed.
This course is organised by Phil Blunsom and delivered in partnership with the DeepMind Natural Language Group.
Lecturers
- Phil Blunsom (Oxford University and DeepMind)
- Chris Dyer (Carnegie Mellon University and DeepMind)
- Edward Grefenstette (DeepMind)
- Karl Moritz Hermann (DeepMind)
- Andrew Senior (DeepMind)
- Wang Ling (DeepMind)
- Jeremy Appleyard (NVIDIA)
TAs
- Yannis Assael
- Yishu Miao
- Brendan Shillingford
- Jan Buys
Timetable
Practicals
- Group 1 - Monday, 9:00-11:00 (Weeks 2-8), 60.05 Thom Building
- Group 2 - Friday, 16:00-18:00 (Weeks 2-8), 379
- Practical 1: word2vec
- Practical 2: text classification
- Practical 3: recurrent neural networks for text classification and language modelling
- Practical 4: open practical
Lectures
Public lectures are held in Lecture Theatre 1 of the Maths Institute, on Tuesdays and Thursdays, 16:00-18:00 (Hilary term Weeks 1, 3-8).
Lecture Materials
1. Lecture 1a - Introduction [Phil Blunsom]
This lecture introduces the course and motivates why it is interesting to study language processing using deep learning techniques.
[Slides]
[Video]
2. Lecture 1b - Deep Neural Networks Are Our Friends [Wang Ling]
This lecture revises basic machine learning concepts that students should know before embarking on this course.
[Slides]
[Video]
3. Lecture 2a - Word Level Semantics [Ed Grefenstette]
Words are the core meaning-bearing units in language. Representing and learning the meanings of words is a fundamental task in NLP, and in this lecture the concept of a word embedding is introduced as a practical and scalable solution.
[Slides]
[Video]
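As a rough illustration of why embeddings are practical, comparing word meanings reduces to vector arithmetic. The sketch below is not course material: the three-dimensional vectors and the names `toy_embeddings` and `most_similar` are made up for illustration, whereas real embeddings are learned from data and have far more dimensions.

```python
import math

# Hypothetical toy vectors (assumed values, not trained embeddings).
toy_embeddings = {
    "king": [0.9, 0.8, 0.1],
    "queen": [0.85, 0.75, 0.2],
    "banana": [0.1, 0.2, 0.9],
}


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def most_similar(word, embeddings):
    """Return the other word whose vector has the highest cosine similarity."""
    candidates = (w for w in embeddings if w != word)
    return max(
        candidates,
        key=lambda w: cosine_similarity(embeddings[word], embeddings[w]),
    )
```

With these toy values, `most_similar("king", toy_embeddings)` picks `"queen"` because the two vectors point in nearly the same direction.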
Reading
Embeddings Basics
- Firth, John R. "A Synopsis of Linguistic Theory, 1930-1955." (1957): 1-32.
- Curran, James Richard. "From distributional to semantic similarity." (2004).
- Collobert, Ronan, et al. "Natural language processing (almost) from scratch." Journal of Machine Learning Research 12 (2011): 2493-2537.
- Mikolov, Tomas, et al. "Distributed representations of words and phrases and their compositionality." Advances in neural information processing systems. 2013.
Datasets and visualisation
- Finkelstein, Lev, et al. "Placing search in context: The concept revisited." Proceedings of the 10th International Conference on World Wide Web. ACM, 2001.
- Hill, Felix, Roi Reichart, and Anna Korhonen. "SimLex-999: Evaluating semantic models with (genuine) similarity estimation." Computational Linguistics (2016).
- Maaten, Laurens van der, and Geoffrey Hinton. "Visualizing data using t-SNE." Journal of Machine Learning Research 9.Nov (2008): 2579-2605.
Blog posts
- Deep Learning, NLP, and Representations, Christopher Olah.
- Visualizing Top Tweeps with t-SNE, in Javascript, Andrej Karpathy.
Further Reading
- Hermann, Karl Moritz, and Phil Blunsom. "Multilingual models for compositional distributed semantics." arXiv preprint arXiv:1404.4641 (2014).
- Levy, Omer, and Yoav Goldberg. "Neural word embedding as implicit matrix factorization." Advances in neural information processing systems. 2014.
- Levy, Omer, Yoav Goldberg, and Ido Dagan. "Improving distributional similarity with lessons learned from word embeddings." Transactions of the Association for Computational Linguistics 3 (2015): 211-225.
- Ling, Wang, et al. "Two/too simple adaptations of word2vec for syntax problems." HLT-NAACL. 2015.
4. Lecture 2b - Overview of the Practicals [Chris Dyer]
This lecture motivates the practical segment of the course.
[Slides]
[Video]
5. Lecture 3 - Language Modelling and RNNs Part 1 [Phil Blunsom]
Language modelling is an important task of great practical use in many NLP applications. This lecture introduces language modelling, including traditional n-gram based approaches and more contemporary neural approaches. In particular, the popular Recurrent Neural Network (RNN) language model is introduced and its basic training and evaluation algorithms are described.
[Slides]
[Video]
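To make the n-gram approach concrete, here is a minimal sketch of a bigram language model evaluated by perplexity. It is not taken from the lecture: the add-one smoothing, the `<s>`/`</s>` boundary markers, and all function names are assumptions chosen to keep the example short.

```python
import math
from collections import Counter


def train_bigram_lm(sentences):
    """Count contexts and bigrams over tokenised sentences with boundary markers."""
    context_counts, bigram_counts, vocab = Counter(), Counter(), set()
    for tokens in sentences:
        padded = ["<s>"] + tokens + ["</s>"]
        vocab.update(padded)
        for prev, word in zip(padded, padded[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, word)] += 1
    return context_counts, bigram_counts, vocab


def bigram_prob(prev, word, model):
    """Add-one smoothed estimate of p(word | prev)."""
    context_counts, bigram_counts, vocab = model
    return (bigram_counts[(prev, word)] + 1) / (context_counts[prev] + len(vocab))


def perplexity(tokens, model):
    """Perplexity of one sentence: exp of the average negative log-probability."""
    padded = ["<s>"] + tokens + ["</s>"]
    log_prob = sum(
        math.log(bigram_prob(prev, word, model))
        for prev, word in zip(padded, padded[1:])
    )
    return math.exp(-log_prob / (len(padded) - 1))
```

A sentence seen in training should receive lower perplexity than the same words in a scrambled order. A neural language model replaces the count-based `bigram_prob` with a learned function of the whole history, but is evaluated in the same way.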
Reading
Textbook
- Deep Learning, Chapter 10.
Blogs
- The Unreasonable Effectiveness of Recurrent Neural Networks, Andrej Karpathy.
- The Unreasonable Effectiveness of Character-level Language Models, Yoav Goldberg.
- Explaining and Illustrating Orthogonal Initialization for Recurrent Neural Networks, Stephen Merity.
6. Lecture 4 - Language Modelling and RNNs Part 2 [Phil Blunsom]
This lecture continues on from the previous one and considers some of the issues involved in producing an effective implementation of an RNN language model. The vanishing and exploding gradient problem is described, and architectural solutions, such as Long Short-Term Memory (LSTM), are introduced.
[Slides]
[Video]
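The vanishing and exploding gradient problem can be seen in a deliberately simplified setting. The sketch below assumes a linear scalar recurrence h_t = w * h_{t-1}, which is far simpler than a real RNN, and pairs it with gradient norm clipping, a common remedy for the exploding case. The function names are illustrative and not from the lecture.

```python
import math


def backprop_factor(recurrent_weight, steps):
    """Gradient of h_T with respect to h_0 for h_t = w * h_{t-1}."""
    factor = 1.0
    for _ in range(steps):
        factor *= recurrent_weight  # chain rule: one multiplication per time step
    return factor


def clip_by_norm(gradient, max_norm):
    """Rescale a gradient vector so that its L2 norm does not exceed max_norm."""
    norm = math.sqrt(sum(g * g for g in gradient))
    if norm <= max_norm:
        return list(gradient)
    scale = max_norm / norm
    return [g * scale for g in gradient]
```

With a weight below 1 the factor shrinks geometrically over time steps (vanishing), and with a weight above 1 it grows geometrically (exploding). Clipping only bounds the exploding case; vanishing gradients are what gated architectures such as the LSTM are designed to mitigate.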
Reading
Textbook
- Deep Learning, Chapter 10.
Vanishing gradients, LSTMs etc.
- On the difficulty of training recurrent neural networks. Pascanu et al., ICML 2013.
- Long Short-Term Memory. Hochreiter and Schmidhuber, Neural Computation 1997.
- Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation. Cho et al., EMNLP 2014.
- Blog: Understanding LSTM Networks, Christopher Olah.
Dealing with large vocabularies
- A Scalable Hierarchical Distributed Language Model. Mnih and Hinton, NIPS 2009.
- A Fast and Simple Algorithm for Training Neural Probabilistic Language Models. Mnih and Teh, ICML 2012.
- On Using Very Large Target Vocabulary for Neural Machine Translation. Jean et al., ACL 2015.
- Exploring the Limits of Language Modeling. Jozefowicz et al., arXiv 2016.
- Efficient Softmax Approximation for GPUs. Grave et al., arXiv 2016.
- Notes on Noise Contrastive Estimation and Negative Sampling. Dyer, arXiv 2014.
- Pragmatic Neural Language Modelling in Machine Translation. Baltescu and Blunsom, NAACL 2015.
Regularisation and dropout
- A Theoretically Grounded Application of Dropout in Recurrent Neural Networks. Gal and Ghahramani, NIPS 2016.
- Blog: Uncertainty in Deep Learning, Yarin Gal.
Other stuff
- Recurrent Highway Networks. Zilly et al., arXiv 2016.
- Capacity and Trainability in Recurrent Neural Networks. Collins et al., arXiv 2016.
7. Lecture 5 - Text Classification [Karl Moritz Hermann]
This lecture discusses text classification, beginning with basic classifiers, such as Naive Bayes, and progressing through to RNNs and convolutional networks.
[Slides]
[Video]
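For readers who have not met the baseline mentioned above, the following is a minimal sketch of a multinomial Naive Bayes text classifier. It assumes add-one smoothing and simply ignores words unseen in training; the function names and the toy data in the usage note are illustrative, not taken from the lecture.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(documents):
    """documents: list of (tokens, label) pairs. Returns the counts needed to predict."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in documents:
        label_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return label_counts, word_counts, vocab


def predict(tokens, model):
    """Return the label maximising log p(label) + sum of log p(word | label)."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in label_counts.items():
        total_words = sum(word_counts[label].values())
        score = math.log(count / total_docs)
        for word in tokens:
            if word in vocab:  # assumption: skip words never seen in training
                smoothed = (word_counts[label][word] + 1) / (total_words + len(vocab))
                score += math.log(smoothed)
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

Trained on a handful of labelled token lists such as `(["great", "film"], "pos")` and `(["awful", "film"], "neg")`, `predict` assigns a new document to the label whose word statistics explain it best. The neural models in the lecture replace these independent word counts with learned representations of the whole sequence.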
Reading
- Recurrent Convolutional Neural Networks for Text Classification. Lai et al., AAAI 2015.
- A Convolutional Neural Network for Modelling Sentences. Kalchbrenner et al., ACL 2014.
- Semantic Compositionality through Recursive Matrix-Vector Spaces. Socher et al., EMNLP 2012.
- Blog: Understanding Convolutional Neural Networks for NLP, Denny Britz.
- Thesis: Distributional Representations for Compositional Semantics, Hermann (2014).
8. Lecture 6 - Deep NLP on Nvidia GPUs [Jeremy Appleyard]
This lecture introduces Graphics Processing Units (GPUs) as an alternative to CPUs for executing deep learning algorithms. The strengths and weaknesses of GPUs are discussed, as well as the importance of understanding how memory bandwidth and computation impact throughput for RNNs.
[Slides]
[Video]
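One way to see how memory bandwidth and computation interact is a back-of-the-envelope estimate of arithmetic intensity (floating point operations per byte moved) for the matrix product at the heart of an RNN step. This is a simplified roofline-style sketch, not the lecture's analysis: it assumes each value is moved exactly once, and the peak throughput and bandwidth figures in the usage note are hypothetical.

```python
def arithmetic_intensity(hidden_size, batch_size, bytes_per_value=4):
    """FLOPs per byte moved for one (hidden x hidden) @ (hidden x batch) product."""
    # One multiply and one add per term of each output element.
    flops = 2 * hidden_size * hidden_size * batch_size
    values_moved = (
        hidden_size * hidden_size    # read the weight matrix
        + hidden_size * batch_size   # read the input activations
        + hidden_size * batch_size   # write the output activations
    )
    return flops / (values_moved * bytes_per_value)


def is_bandwidth_bound(intensity, peak_flops, peak_bytes_per_second):
    """Below the machine balance point, memory traffic limits throughput."""
    return intensity < peak_flops / peak_bytes_per_second
```

For a hidden size of 1024 in 32-bit floats, a batch size of 1 gives roughly 0.5 FLOPs per byte, while a batch size of 64 gives roughly 28. On a hypothetical device with 10 TFLOP/s peak compute and 500 GB/s memory bandwidth (balance point 20 FLOPs per byte), the first case is bandwidth bound and the second is not, which is one reason batching matters for RNN throughput.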
Reading
- Optimizing Performance of Recurrent Neural Networks on GPUs. Appleyard et al., arXiv 2016.
- Persistent RNNs: Stashing Recurrent Weights On-Chip. Diamos et al., ICML 2016.
- Efficient Softmax Approximation for GPUs. Grave et al., arXiv 2016.
9. Lecture 7 - Conditional Language Models [Chris Dyer]
In this lecture we extend the concept of language modelling to incorporate prior information. By conditioning an RNN language model on an input representation we can generate contextually relevant language. This very general idea can be applied to transduce sequences into new sequences for tasks such as translation and summarisation, or images into captions describing their content.
[Slides]
[Video]
Reading
- Recurrent Continuous Translation Models. Kalchbrenner and Blunsom, EMNLP 2013.
- Sequence to Sequence Learning with Neural Networks. Sutskever et al., NIPS 2014.
- Multimodal Neural Language Models. Kiros et al., ICML 2014.
- Show and Tell: A Neural Image Caption Generator. Vinyals et al., CVPR 2015.
10. Lecture 8 - Generating Language with Attention [Chris Dyer]
This lecture introduces one of the most important and influential mechanisms employed in deep neural networks: attention. Attention augments recurrent networks with the ability to condition on specific parts of the input and is key to achieving high performance in tasks such as machine translation and image captioning.
[Slides]
[Video]
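The core computation behind attention is small enough to sketch directly. The example below assumes dot-product scoring for brevity, whereas Bahdanau et al. (in the reading list) score with a small feed-forward network; the function names and the plain-list representation of vectors are illustrative choices.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, encoder_states):
    """Return (weights, context) for one decoder query over the encoder states."""
    # Score each input position by its dot product with the query.
    scores = [sum(q * h for q, h in zip(query, state)) for state in encoder_states]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    # The context vector is the attention-weighted average of the encoder states.
    context = [
        sum(w * state[i] for w, state in zip(weights, encoder_states))
        for i in range(dim)
    ]
    return weights, context
```

The weights sum to one and concentrate on the input positions most similar to the query, so the context vector lets the decoder condition on specific parts of the input at each output step.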
Reading
- Neural Machine Translation by Jointly Learning to Align and Translate. Bahdanau et al., ICLR 2015.
- Show, Attend, and Tell: Neural Image Caption Generation with Visual Attention. Xu et al., ICML 2015.
- Incorporating Structural Alignment Biases into an Attentional Neural Translation Model. Cohn et al., NAACL 2016.
- BLEU: a Method for Automatic Evaluation of Machine Translation. Papineni et al., ACL 2002.
11. Lecture 9 - Speech Recognition (ASR) [Andrew Senior]
Automatic Speech Recognition (ASR) is the task of transducing raw audio signals of spoken language into text transcriptions. This lecture covers the history of ASR models, from Gaussian mixtures to attention augmented RNNs, the basic linguistics of speech, and the various input and output representations frequently employed.
[Slides]
[Video]
12. Lecture 10 - Text to Speech (TTS) [Andrew Senior]
This lecture introduces algorithms for converting written language into spoken language (Text to Speech). TTS is the inverse process to ASR, but there are some important differences in the models applied. Here we review traditional TTS models, and then cover more recent neural approaches such as DeepMind's WaveNet model.
[Slides]
[Video]
13. Lecture - (Coming Soon) Question Answering [Karl Moritz Hermann]
[Slides]
[Video]
14. Lecture - (Coming Soon) Memory [Ed Grefenstette]
[Slides]
[Video]
Piazza
We will be using Piazza to facilitate class discussion during the course. Rather than emailing questions directly, I encourage you to post your questions on Piazza to be answered by your fellow students, instructors, and lecturers. However, please do note that the lecturers for this course are volunteering their time and may not always be available to give a response.
Find our class page at: https://piazza.com/ox.ac.uk/winter2017/dnlpht2017/home
Assessment
The primary assessment for this course is a take-home assignment issued at the end of the term. This assignment will ask questions drawing on the concepts and models discussed in the course, as well as from selected research publications. The nature of the questions will include analysing mathematical descriptions of models and proposing extensions, improvements, or evaluations to such models. The assignment will also ask students to read specific publications and discuss their proposed algorithms in the context of the course. In answering questions, students will be expected to both present coherent written arguments and use appropriate mathematical formulae, and possibly pseudo-code, to illustrate answers.
The practical component of the course will be assessed in the usual way.
Acknowledgements
This course would not have been possible without the support of DeepMind, the University of Oxford Department of Computer Science, Nvidia, and the generous donation of GPU resources from Microsoft Azure.