Yoshua Bengio's latest talk: Attention has made deep learning a great success (46 slides)
Yoshua Bengio is a computer scientist who received his PhD from McGill University, held postdoctoral positions at MIT and AT&T Bell Labs, and has taught at the Université de Montréal since 1993. Together with Yann LeCun and Geoffrey Hinton he is known as one of the three giants of deep learning, and he was one of the main drivers of the revival of neural networks, making significant contributions to pre-training, to autoencoder-based generative models such as the denoising autoencoder, and to related problems. His early paper on neural probabilistic language models pioneered the use of neural networks as language models and inspired a line of NLP work that has had a major impact in industry. His team also developed the Theano platform.
Below is a transcript of the slides from Yoshua Bengio's talk at Twitter Boston on May 11, 2016, compiled and translated by AI Era (新智元). If the slides alone are not enough, you can also copy the link to watch the video directly: https://www.periscope.tv/Hugo_larochelle/1myxndlqkppgw
Reply "0516" to the AI Era (新智元) WeChat public account to download all 46 slides.
Original title: Deep learning of meaning in natural language
Three key ingredients for moving from ML to AI:
1. Lots and lots of data
2. Very flexible models
3. Powerful prior knowledge that can defeat the curse of dimensionality
Breaking the curse of dimensionality
We need to build compositionality into our machine learning models
Just as human languages exploit compositionality to give representations and meanings to complex ideas
Exploiting compositionality gives an exponential gain in representational power
Distributed representations / embeddings: feature learning (see the sketch after this list)
Deep architecture: multiple levels of feature learning
Prior knowledge: compositionality is useful to describe the world around us efficiently
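As a concrete illustration of the "distributed representation / embedding" ingredient above, here is a minimal sketch in plain NumPy (my own toy example, not code from the slides): each word is mapped to a dense vector of learned features, so that combinations of features can describe exponentially many word configurations with a modest number of parameters.

```python
import numpy as np

# Hypothetical toy vocabulary; in practice it would be built from a corpus.
vocab = {"the": 0, "cat": 1, "sat": 2, "on": 3, "mat": 4}
embedding_dim = 8

# One dense feature vector (row) per word: a distributed representation,
# as opposed to a one-hot / purely local representation.
rng = np.random.default_rng(0)
embeddings = rng.normal(scale=0.1, size=(len(vocab), embedding_dim))

def embed(words):
    """Look up the learned feature vector of each word in a sequence."""
    return np.stack([embeddings[vocab[w]] for w in words])

print(embed(["the", "cat", "sat"]).shape)  # (3, 8)
```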
Advances in the theory of deep learning
Exponential advantage of distributed representations
Exponential advantage of depth
A debunked myth: local minima in non-convex optimization
Exponential advantage of distributed representations
Compared with nearest-neighbour-like or clustering-like models, learning a set of non-mutually-exclusive features makes much more efficient use of the data.
Related recommended papers
Exponential advantage of depth
The myth being debunked: local minima in neural networks
Convexity is not a must
Recommended papers
Saddle Point
Local minima dominate in low dimensions, but saddle points dominate in high dimensions
Most local minima are close to the bottom (the global minimum error)
Why n-grams generalize poorly
Neural language model
Next challenge: rich semantic representations of word sequences
Impressive progress in capturing meaning
Easier to learn: non-parametric (just a lookup table)
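To make the contrast with n-grams concrete, here is a minimal sketch of a feed-forward neural language model in the spirit of the approach discussed above (plain NumPy, randomly initialised, no training loop; all sizes are hypothetical): the previous words are embedded, concatenated, passed through a hidden layer, and a softmax gives the distribution over the next word. Because similar words receive similar embeddings, probability mass generalizes to word sequences never seen in training, which a count-based n-gram table cannot do.

```python
import numpy as np

rng = np.random.default_rng(1)
vocab_size, emb_dim, context, hidden = 5000, 64, 3, 128

# Randomly initialised parameters of a tiny feed-forward neural language model
# (a sketch of the idea only; no training loop shown).
E = rng.normal(0.0, 0.1, (vocab_size, emb_dim))          # word embeddings
W1 = rng.normal(0.0, 0.1, (context * emb_dim, hidden))   # hidden layer
W2 = rng.normal(0.0, 0.1, (hidden, vocab_size))          # output layer

def next_word_probs(prev_word_ids):
    """P(next word | previous `context` words) via embeddings + MLP + softmax."""
    x = E[prev_word_ids].reshape(-1)          # concatenate the context embeddings
    h = np.tanh(x @ W1)
    logits = h @ W2
    logits -= logits.max()                    # numerical stability
    p = np.exp(logits)
    return p / p.sum()

probs = next_word_probs(np.array([12, 7, 431]))
print(probs.shape, probs.sum())               # (5000,) 1.0
```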
The problem of generating sequences calls for richer and more fully optimized representations
A good test case: machine translation with the encoder-decoder framework
The attention mechanism in deep learning
Consider an input (or intermediate) sequence or image
Consider a higher-level representation that chooses where to look by assigning a weight or probability to each input position, computed by an MLP applied at each position.
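A minimal sketch of that idea (plain NumPy, my own illustration rather than code from the talk): a small MLP scores each input position given a query, a softmax turns the scores into weights that sum to one, and the weighted average of the annotations is the context passed upward. The parameter shapes below are hypothetical.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def soft_attention(annotations, query, Wa, Ua, va):
    """Soft attention over a sequence of annotation vectors.

    annotations: (S, d) one vector per input position
    query:       (q,)   e.g. the decoder's previous hidden state
    Wa, Ua, va:  parameters of the scoring MLP (hypothetical shapes)
    Returns the attention weights and the weighted-average context vector.
    """
    scores = np.tanh(annotations @ Wa + query @ Ua) @ va   # (S,) one score per position
    weights = softmax(scores)                              # probabilities over positions
    context = weights @ annotations                        # (d,) weighted average
    return weights, context

rng = np.random.default_rng(2)
S, d, q, a = 6, 4, 3, 5
weights, context = soft_attention(
    rng.normal(size=(S, d)), rng.normal(size=q),
    rng.normal(size=(d, a)), rng.normal(size=(q, a)), rng.normal(size=a))
print(weights.round(2), context.shape)   # weights sum to 1, context is (4,)
```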
Applications of the attention mechanism in translation, speech, images, video and memory
End-to-end machine translation
Traditional machine translation: several components are trained independently by maximum likelihood, and a log-linear (logistic-regression-style) model on top combines them, built over n-grams.
Neural language models have already been shown to generalize better than n-gram models.
So why not train a neural translation model end-to-end to estimate p(target sentence | source sentence)?
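In standard notation (not quoted from the slides), the end-to-end objective is a single conditional distribution over the target sentence, factorised one target word at a time:

$$p(y_1,\dots,y_T \mid x_1,\dots,x_S) \;=\; \prod_{t=1}^{T} p\bigl(y_t \mid y_1,\dots,y_{t-1},\, x_1,\dots,x_S\bigr)$$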
2014: a breakthrough year for neural machine translation
Main papers
The early work
The encoder-decoder framework
Intermediate representation of meaning = a universal representation
Encoder: from a word sequence to a sentence representation
Decoder: from the representation to a distribution over word sequences
Bidirectional RNN on the input side
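A rough sketch of the encoding side (simple tanh RNN cells in plain NumPy; my own simplification of the GRU-based architecture): a forward and a backward RNN read the source embeddings, and their states are concatenated into one annotation vector per source word, which the attention-based decoder then reads from.

```python
import numpy as np

rng = np.random.default_rng(3)
emb_dim, hid = 8, 16

# Hypothetical parameters of two simple tanh RNNs (forward and backward).
Wf, Uf = rng.normal(0.0, 0.1, (emb_dim, hid)), rng.normal(0.0, 0.1, (hid, hid))
Wb, Ub = rng.normal(0.0, 0.1, (emb_dim, hid)), rng.normal(0.0, 0.1, (hid, hid))

def rnn(xs, W, U):
    """Run a simple tanh RNN over a sequence of embeddings, returning all states."""
    h = np.zeros(U.shape[0])
    states = []
    for x in xs:
        h = np.tanh(x @ W + h @ U)
        states.append(h)
    return states

def bidirectional_encode(source_embeddings):
    """One annotation vector per source word: [forward state; backward state]."""
    fwd = rnn(source_embeddings, Wf, Uf)
    bwd = rnn(source_embeddings[::-1], Wb, Ub)[::-1]
    return np.stack([np.concatenate([f, b]) for f, b in zip(fwd, bwd)])

sentence = rng.normal(size=(5, emb_dim))      # 5 source-word embeddings
annotations = bidirectional_encode(sentence)
print(annotations.shape)                      # (5, 32): inputs to the attention model
```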
Inspired by Alex Graves's work on handwriting generation
Attention: related and earlier papers
Soft attention vs. stochastic hard attention
Attention-based neural machine translation
Predicting the alignment
Different alignments for French and German
Improvement over the plain encoder-decoder (pure auto-encoding) model
RNNenc: encodes the whole sentence
RNNsearch: predicts the alignment
BLEU scores on all test sets (including UNK)
End-to-end machine translation with recurrent networks and the attention mechanism
Starting from scratch, the state of the art one year later:
English to German
From image to text: caption generation with an attention model
Attention selects a part of the image while each corresponding word of the description is generated
Say what you see.
Show, Attend and Tell: neural image caption generation with visual attention
Good examples (correct captions)
Bad examples (incorrect captions)
Interesting extensions
Handling very large vocabularies efficiently (a mini-batch of words serves as the negative examples) with an importance-sampling approximation (Jean et al., ACL 2015); a sketch follows this list
Multilingual NMT: shared encoders and decoders, with the attention mechanism conditioned on the language pair
Character-level NMT
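For the large-vocabulary point above, here is a minimal sketch of the general sampled-softmax idea (plain NumPy; my own illustration, not the exact estimator of Jean et al.): the loss normalises over the target word plus a small set of sampled negative words instead of the full output vocabulary.

```python
import numpy as np

rng = np.random.default_rng(4)
vocab_size, hid, n_neg = 50_000, 32, 100

W_out = rng.normal(0.0, 0.01, (vocab_size, hid))    # one output embedding per word

def sampled_softmax_loss(hidden_state, target_id):
    """Approximate cross-entropy: normalise over the target plus sampled negatives
    instead of the full 50k-word vocabulary (uniform proposal for simplicity)."""
    negatives = rng.integers(0, vocab_size, size=n_neg)
    candidates = np.concatenate(([target_id], negatives))
    logits = W_out[candidates] @ hidden_state       # only |candidates| dot products
    logits -= logits.max()
    log_probs = logits - np.log(np.exp(logits).sum())
    return -log_probs[0]                            # the target word sits at index 0

h = rng.normal(size=hid)
print(sampled_softmax_loss(h, target_id=42))
```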
Multilingual neural machine translation with a shared attention mechanism
One encoder + one decoder per language
A shared attention model, plus a "representation translation function" for each language's encoder and decoder
Multilingual neural machine translation with a shared attention mechanism
Character-based models
are almost impossible with n-gram-based models;
but they are needed to handle open vocabularies, spelling errors, transliteration, numbers and similar issues end-to-end;
they are needed for languages with no clear word boundaries or with free compounding (to allow an open vocabulary);
they are needed to exploit morphology (prefixes, suffixes, compounding, etc.);
Obstacles:
For RNNs: longer-term dependencies
Worse capacity and computation speed
Preliminary experiments two years ago: not as good as word-based models
Character-based NMT experiment
Character-based NMT experiment
An attention model for memory access
Neural Turing Machines
Memory Networks
Using a form of attention mechanism to control reads and writes to memory
The attention mechanism outputs a softmax over memory locations
For efficiency, this softmax should be sparse (mostly zeros), for example by using something like a hash-table structure.
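A minimal sketch of such an attention-based memory read (plain NumPy; my own illustration, not the Neural Turing Machine or Memory Network implementation): the controller emits a key, a softmax over content similarities gives the read weights, and the value read back is the weighted sum of memory slots. The `sharpness` parameter is hypothetical and simply pushes the softmax toward one-hot, i.e. an approximately sparse access.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def attention_read(memory, key, sharpness=5.0):
    """Content-based read: softmax over slot/key similarities, weighted sum of slots."""
    scores = memory @ key                     # similarity of each slot to the key
    weights = softmax(sharpness * scores)     # attention distribution over slots
    return weights, weights @ memory          # read weights and the value read back

rng = np.random.default_rng(5)
memory = rng.normal(size=(10, 6))             # 10 slots of 6-dimensional state
key = memory[3] + 0.01 * rng.normal(size=6)   # query close to slot 3
weights, value = attention_read(memory, key)
print(weights.argmax(), weights.round(2))     # weight mass concentrates on slot 3
```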
Large memory networks: sparse memory access for long-term dependencies
A state stored in external memory can persist for an arbitrarily long time, until it is read or overwritten
Forgetting = vanishing gradients
Memory = a larger state, avoiding the need to forget or to let information vanish
A longer delay does not by itself mean the information can be carried further
Ongoing project: knowledge extraction
Learning to fill a memory network with facts expressed in natural language
Forcing the neural network to understand language
Extracting knowledge from documents and condensing it into a usable form
The next big challenge: unsupervised learning
Most of the recent breakthroughs have been in supervised deep learning.
The real challenge lies in unsupervised learning
Potential benefits:
The ability to exploit huge amounts of unlabelled data
Answering new questions about the observed variables
Regularizer; transfer learning; domain adaptation
Easier to optimize (local training signal)
Structured outputs
needed in the absence of an explicit model, e.g. for model-based RL
Conclusion
Deep learning theory has made significant progress on several fronts: why does it generalize better? Why are local minima not the problem people thought? A probabilistic interpretation of deep unsupervised learning.
The attention mechanism allows the learned model to make choices, whether with soft attention or (stochastic) hard attention.
It has brought great success in machine translation and in caption generation.
It could also be useful in speech recognition and video, especially if we use it to capture multiple time scales.
It can be used to address long-term dependencies, by allowing some states to persist for an arbitrarily long time.