Yoshua Bengio's talk slides, Twitter Boston, May 11, 2016


Yoshua Bengio's latest talk: attention has made deep learning a great success (46 slides)

Yoshua Bengio is a computer scientist who graduated from McGill University, was a postdoctoral researcher at MIT and AT&T Bell Labs, and has taught at the Université de Montréal since 1993. Together with Yann LeCun and Geoffrey Hinton he is known as one of the "deep learning trio," the three main drivers of the neural network revival, and he has made significant contributions to pre-training, to autoencoder architectures such as the denoising autoencoder, and to generative models. His early paper on a neural probabilistic language model pioneered the use of neural networks as language models and inspired a line of NLP work that has had a major impact in industry. In addition, his team developed the Theano platform.

Below is a transcript of the slides from Yoshua Bengio's talk at Twitter Boston on May 11, 2016, compiled and translated by Xinzhiyuan. If the slides are not enough, you can also copy this link to watch the video directly: https://www.periscope.tv/Hugo_larochelle/1myxndlqkppgw

Reply "0516" to the Xinzhiyuan public account to download all 46 slides.

Original title: Deep Learning of Meaning in Natural Language

Three key ingredients for going from ML to AI:

1. Lots and lots of data

2. Very flexible models

3. Strong prior knowledge that can break the "curse of dimensionality"

Breaking the curse of dimensionality

    • We need to build compositionality into our machine learning models

      Just as human languages exploit compositionality to give representations and meanings to complex ideas

    • Exploiting compositionality gives an exponential gain in representational power

      Distributed representations / embeddings: feature learning

      Deep architectures: multiple levels of feature learning

    • Prior: compositionality is useful for describing the world around us efficiently

Advances in deep learning theory

    • Exponential advantage of distributed representations

    • Exponential advantage of depth

    • Myth busting: local minima in non-convex optimization

Exponential advantage of distributed representations

Compared with nearest-neighbor-like or clustering-like models, learning a set of non-mutually-exclusive features makes exponentially more efficient use of the data.
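
As a concrete illustration of this claim (my own sketch, not from the slides), consider d binary features defined by random hyperplanes: together they can distinguish a number of input regions that grows roughly exponentially with d, whereas a nearest-neighbor or clustering-like model needs roughly one template per region it must tell apart.

    # Minimal sketch; random linear-threshold features stand in for learned,
    # non-mutually-exclusive features (an assumption made for brevity).
    import numpy as np

    rng = np.random.default_rng(0)
    points = rng.normal(size=(10_000, 2))           # 2-D inputs

    for d in (2, 4, 8, 16):
        planes = rng.normal(size=(2, d))            # d random linear features
        codes = points @ planes > 0                 # binary feature pattern per input
        n_regions = len({tuple(c) for c in codes})  # distinct patterns actually observed
        print(f"{d:2d} features -> {n_regions} distinct regions")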


Related recommended papers


Exponential advantage of depth

Myth busting: local minima in neural networks

Convexity is not a must

Recommended papers

Saddle points

    • Local minima dominate in low dimensions, but saddle points dominate in high dimensions

    • Most local minima are close to the bottom (the global minimum of error)

Why n-grams generalize poorly

Neural language models
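
To recall what the neural approach buys over counting, here is a minimal sketch in the spirit of Bengio et al.'s neural probabilistic language model (the layer sizes and the plain numpy forward pass are my own simplifications): contexts are mapped to learned embeddings, so similar contexts share statistical strength, which an n-gram count table cannot do.

    import numpy as np

    rng = np.random.default_rng(0)
    V, d, h, context = 10_000, 64, 128, 3           # vocab, embed dim, hidden dim, n-1

    C = rng.normal(0, 0.1, size=(V, d))             # word embedding (lookup) table
    W1 = rng.normal(0, 0.1, size=(context * d, h))  # hidden layer
    W2 = rng.normal(0, 0.1, size=(h, V))            # output layer over the vocabulary

    def next_word_probs(prev_word_ids):
        """P(w_t | previous context words) for one context window."""
        x = C[prev_word_ids].reshape(-1)            # concatenate the context embeddings
        hidden = np.tanh(x @ W1)
        logits = hidden @ W2
        e = np.exp(logits - logits.max())           # numerically stable softmax
        return e / e.sum()

    probs = next_word_probs([12, 845, 3])
    print(probs.shape, probs.sum())                 # (10000,) ~1.0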

Next challenge: rich semantic representations for word sequences

    • Impressive progress in capturing meaning

    • Easier to learn: non-parametric (a lookup table)

    • Generating sequences poses a harder optimization problem for richer, more complete representations

    • Good test case: machine translation in an encoder-decoder (auto-encoder-like) framework



The attention mechanism in deep learning

Consider an input (or intermediate) sequence or image

Consider a higher-level representation that assigns a weight or probability to each input position, for example computed by an MLP applied to every position.
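
A minimal sketch of this generic soft-attention computation (the dimensions and parameter names are assumptions): a small MLP scores every position against the higher-level state, a softmax turns the scores into weights, and the context vector is the weighted sum of the positions.

    import numpy as np

    rng = np.random.default_rng(0)
    T, d_in, d_state, d_att = 20, 64, 128, 32       # positions, input dim, state dim, attention dim

    H = rng.normal(size=(T, d_in))                  # input (or intermediate) sequence
    s = rng.normal(size=(d_state,))                 # current higher-level state

    Wa = rng.normal(0, 0.1, size=(d_in, d_att))     # small MLP applied to every position
    Ua = rng.normal(0, 0.1, size=(d_state, d_att))
    va = rng.normal(0, 0.1, size=(d_att,))

    scores = np.tanh(H @ Wa + s @ Ua) @ va          # one score per position
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                        # attention weights (probabilities over positions)
    context = weights @ H                           # weighted sum of the input positions
    print(weights.shape, context.shape)             # (20,) (64,)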



Applications of the attention mechanism in translation, speech, images, video and memory

End-to-end machine translation

    • Traditional machine translation: several components are trained separately by maximum likelihood and combined with a log-linear (logistic-regression-like) model on top of n-gram features.

    • Neural language models have been shown to generalize better than n-gram models.

    • Why not train a neural translation model end-to-end, estimating P(target sentence | source sentence)?

2014: the breakthrough year for neural machine translation

Main papers

The early work

Encoder-decoder framework

    • Intermediate meaning representation = universal representation

    • Encoding: from a word sequence to a sentence representation

    • Decoding: from the representation to a distribution over word sequences (a sketch follows)
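
A minimal sketch of the plain encoder-decoder idea, before attention is added (the tanh recurrences, the shared embedding table and the sizes are simplifications assumed here): the encoder compresses the word sequence into a single intermediate vector, and the decoder maps that vector, one step at a time, to distributions over target words.

    import numpy as np

    rng = np.random.default_rng(0)
    V, d, h = 5_000, 32, 64                         # vocab, embedding dim, state dim

    E = rng.normal(0, 0.1, size=(V, d))             # embeddings (shared here for brevity)
    W_enc = rng.normal(0, 0.1, size=(d + h, h))
    W_dec = rng.normal(0, 0.1, size=(d + h, h))
    W_out = rng.normal(0, 0.1, size=(h, V))

    def encode(src_ids):
        state = np.zeros(h)
        for w in src_ids:                           # word sequence -> sentence representation
            state = np.tanh(np.concatenate([E[w], state]) @ W_enc)
        return state

    def decode_step(prev_word, state):
        state = np.tanh(np.concatenate([E[prev_word], state]) @ W_dec)
        logits = state @ W_out
        p = np.exp(logits - logits.max())
        return p / p.sum(), state                   # distribution over the next target word

    meaning = encode([5, 42, 7])                    # the intermediate "universal" representation
    probs, _ = decode_step(0, meaning)
    print(probs.argmax(), round(probs.sum(), 3))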



Bidirectional RNN on the input side

Following Alex Graves's work on handwriting generation

Attention: related papers and earlier work

Soft attention vs stochastic hard attention

Attention-based neural machine translation

Predictive alignment

Different alignments for French and German

Improvement over the pure auto-encoder model

    • RNNenc: encodes the whole sentence

    • RNNsearch: predicts (soft) alignments

    • BLEU scores on all test sets (including UNK)

End-to-end machine translation with recurrent networks and attention

Starting from scratch, the state of the art one year later:

English to German

From image to text: caption generation with an attention model

Attention selects a part of the image while the corresponding word of the description is generated

Say what you see.

Show, Attend and Tell: Neural Image Caption Generation with Visual Attention

Examples of good captions

Examples of bad captions

Interesting extensions

    • Efficient handling of very large vocabularies using an importance-sampling approximation, with a minibatch of words serving as the negative examples (Jean et al., ACL 2015); a sketch follows this list

    • Multilingual NMT: shared encoders and decoders, with the attention mechanism conditioned on the language pair

    • Character-level NMT
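
The sketch below illustrates the sampled-softmax idea behind the first bullet (uniform negative sampling and the sizes are simplifications; the actual method also corrects the logits for the proposal distribution): only the target word plus a small sampled set of words are normalized, so the cost of one update no longer scales with the vocabulary size.

    import numpy as np

    rng = np.random.default_rng(0)
    V, h, k = 100_000, 256, 64                      # vocab size, hidden dim, sampled negatives

    W_out = rng.normal(0, 0.02, size=(V, h))        # output word representations
    hidden = rng.normal(size=(h,))                  # decoder state for one prediction
    target = 12345                                  # the correct next word

    neg = rng.choice(V, size=k, replace=False)      # sampled "negative example" words
    cand = np.concatenate(([target], neg))
    logits = W_out[cand] @ hidden                   # only k+1 dot products instead of V
    p = np.exp(logits - logits.max())
    p /= p.sum()                                    # softmax restricted to the sampled set
    loss = -np.log(p[0])                            # approximate negative log-likelihood
    print(round(float(loss), 3))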

Multilingual neural machine translation with a shared attention mechanism

    • One encoder and one decoder per language

    • A shared attention model, with each language's encoder and decoder acting as a "representation translation" function

Multilingual neural machine translation with a shared attention mechanism

    • Transfer learning plays a role

    • Beneficial in most cases, across the parallel corpora used.

Character-based models

    • Almost impossible with n-gram-based models;

    • But needed to handle open vocabularies, spelling errors, transliteration, numbers and similar issues end-to-end;

    • Needed for languages without clear word boundaries or with heavy compounding (keeps the vocabulary open);

    • Needed to exploit word morphology (prefixes, suffixes, inflection, etc.);

Obstacles:

    • For RNNs: even longer-term dependencies

    • Capacity and computation-speed issues

    • Preliminary tests two years ago: not as competitive as word-level models

Character-based NMT experiments

    • A two-level architecture

    • A higher-level RNN dynamically decides when to update its state, using a soft, GRU-like update (a sketch follows)
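
A minimal sketch of such a soft, GRU-like update (parameter shapes are assumptions, and a full GRU would also include a reset gate): an update gate between 0 and 1 decides at each character step how much of the higher-level state is rewritten and how much is carried over unchanged.

    import numpy as np

    rng = np.random.default_rng(0)
    d_in, d_h = 32, 64                              # input dim (lower level), state dim (higher level)

    Wz = rng.normal(0, 0.1, size=(d_in + d_h, d_h)) # update-gate parameters
    Wh = rng.normal(0, 0.1, size=(d_in + d_h, d_h)) # candidate-state parameters

    def soft_update(x, h_prev):
        xh = np.concatenate([x, h_prev])
        z = 1.0 / (1.0 + np.exp(-(xh @ Wz)))        # soft "when to update" gate
        h_tilde = np.tanh(xh @ Wh)                  # candidate new state
        return (1.0 - z) * h_prev + z * h_tilde     # mostly keeps the old state when z is near 0

    h = np.zeros(d_h)
    for _ in range(10):                             # one lower-level input per step
        h = soft_update(rng.normal(size=d_in), h)
    print(h.shape)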

Character-based NMT experiments

An attention model for memory access

    • Neural Turing Machines

    • Memory Networks

    • Reads from and writes to memory are controlled through a form of attention mechanism

    • The attention mechanism outputs a softmax over memory locations

    • For efficiency, the softmax should be sparse (mostly zeros), e.g. perhaps using a hashed table format; a sketch of a sparse read follows this list
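
A minimal sketch of such an attention-based memory read with a sparse softmax (the top-k scheme is just one way of getting sparsity, chosen for illustration): the controller's query is scored against every slot, only the k best slots receive non-zero weight, and the value read back is their weighted sum.

    import numpy as np

    rng = np.random.default_rng(0)
    slots, d, k = 1_000, 64, 16                     # memory size, slot dim, non-zero weights kept

    keys = rng.normal(size=(slots, d))              # memory addresses
    values = rng.normal(size=(slots, d))            # memory contents
    query = rng.normal(size=(d,))                   # controller / reader state

    scores = keys @ query                           # one score per memory location
    top = np.argpartition(scores, -k)[-k:]          # indices of the k best-matching slots
    w = np.zeros(slots)
    w[top] = np.exp(scores[top] - scores[top].max())
    w /= w.sum()                                    # sparse softmax: mostly zeros
    read = w @ values                               # soft read from memory
    print(int((w > 0).sum()), read.shape)           # 16 (64,)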

Large memory networks: sparse memory access for long-term dependencies

    • A state stored in external memory can persist for an arbitrarily long time, until it is read or written

    • Forgetting = vanishing gradients

    • Memory = a larger state, reducing the need to forget or to let information vanish

Delay alone does not mean being able to reach further

Ongoing project: knowledge extraction

    • Learn to fill a memory network with descriptions of facts expressed in natural language

    • Forces the neural network to understand language

    • Extract knowledge from document collections and condense it into a usable form

Next big problem: unsupervised learning

Most of the recent breakthroughs have been in supervised deep learning.

The real challenge lies in unsupervised learning

Potential benefits:

    • The ability to exploit massive amounts of unlabeled data

    • Answering new questions about the observed variables

    • Regularizer -- transfer learning -- domain adaptation

    • Easier optimization (local training signal)

    • Structured outputs

    • Necessary for model-based RL and when no task-specific model is available

Conclusion

Deep learning theory has made significant progress on several fronts: why does it generalize better? why are local minima not the problem people thought they were? and how can deep unsupervised learning be interpreted probabilistically?

Attention mechanisms, whether soft or (stochastic) hard, allow the learned model to make choices.

Attention has been a great success in machine translation and caption generation.

It can also be useful in speech recognition and video, especially if attention is used to capture a variety of markers.

Attention can be used to address long-term dependency problems, by allowing some states to persist for an arbitrarily long time.


