Deep Learning (Chatbot Direction) Learning Notes (1): Sequence-to-Sequence Learning

Series Catalog:
Seq2seq Chatbot: Building a Demo Based on Torch Code
Deep Learning (Chatbot Direction) Learning Notes (1): Sequence-to-Sequence Learning
Deep Learning (Chatbot Direction) Learning Notes (2): RNN Encoder-Decoder and LSTM

1 Preface

These deep learning notes are actually a cleaned-up version of my weekly paper notes; that is, the main content of each post is my summary of one paper. Limited by my personal level, there may well be places where my understanding is off or incomplete, so if you find anything wrong, you are welcome to discuss it with me. I originally did not want to publish these posts because I feel my level is really limited, but for better or worse, here they are, dragged out into the light.

So the main content of this series is my weekly paper-reading notes, occasionally interspersed with some experiments. The bibliographic information of each paper is given at the end; if you need the original, please look it up by its title.

2 What is seq2seq?

There are many explanations of seq2seq on the internet, but from my point of view I like to think of seq2seq as:
a task that maps one sequence to another sequence
In practice, the following tasks, for example, can be regarded as seq2seq tasks:
1. Statistical machine translation (SMT): (source-language sentence, target-language sentence)
2. Dialogue: (context sentence, reply sentence)

As shown in the figure above, this is an example of mapping the sequence ABC to the sequence WXYZ.

3 RNN Encoder-Decoder Framework

Generally speaking, in deep learning there is a classic framework for dealing with this type of problem, called the encoder-decoder:
The encoder encodes the input sequence into a fixed-length vector.
The decoder generates a sequence of tokens as the output, conditioned on the background vector (that is, the last vector output by the encoder).
The lengths of the two sequences may differ.

The most classic encoder-decoder framework uses RNNs: the encoder is an RNN and the decoder is also an RNN.
Training is done by maximum likelihood estimation: the input sequence passes through the encoder to produce an intermediate vector v, and the parameters are chosen so that the probability of the decoder generating the target sequence from v is as large as possible. A minimal code sketch follows.
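To make the structure concrete, here is a minimal sketch of such an RNN (LSTM) encoder-decoder in PyTorch. This is my own toy illustration, not the implementation from the paper; the class names, the dimensions, and the single-layer LSTM are assumptions made for this example (the paper actually uses a deep 4-layer LSTM).

import torch
import torch.nn as nn

class Encoder(nn.Module):
    # Encodes the input token sequence into a fixed-length state.
    def __init__(self, vocab_size, emb_dim=32, hidden_dim=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.LSTM(emb_dim, hidden_dim, batch_first=True)

    def forward(self, src):                      # src: (batch, src_len)
        _, state = self.rnn(self.embed(src))     # keep only the final (h, c)
        return state                             # the "background vector" v

class Decoder(nn.Module):
    # Generates output tokens conditioned on the encoder's final state.
    def __init__(self, vocab_size, emb_dim=32, hidden_dim=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tgt, state):               # tgt: (batch, tgt_len)
        output, state = self.rnn(self.embed(tgt), state)
        return self.out(output), state           # per-step vocabulary logits

# Toy training step with teacher forcing: maximize log p(T | S).
src_vocab, tgt_vocab = 1000, 1000                # tiny toy vocabularies
enc, dec = Encoder(src_vocab), Decoder(tgt_vocab)
src = torch.randint(0, src_vocab, (2, 7))        # fake source batch
tgt = torch.randint(0, tgt_vocab, (2, 5))        # fake target batch
logits, _ = dec(tgt[:, :-1], enc(src))           # predict each next target token
loss = nn.functional.cross_entropy(
    logits.reshape(-1, tgt_vocab), tgt[:, 1:].reshape(-1))
loss.backward()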

This encoder-decoder framework is a classic, and using an RNN (LSTM) inside it is also classic. I originally wanted to go into the details, but my understanding is not deep enough, so I can probably only go this far here.

A detailed explanation of Encoder-decoder can be read here: http://blog.csdn.net/malefactor/article/details/51124732

The sequence-to-sequence model in this paper is exactly such an RNN encoder-decoder. Its formulas are essentially the standard RNN formulas and the paper does not spend much space on them; if you need the details, read the original text directly.

Experiment

The experiments use the evaluation task provided by WMT'14: an English-to-French translation task.
1. The training data set has 12M sentence pairs, containing 348M French words and 304M English words.
2. The authors use the 160,000 most frequent English words and the 80,000 most frequent French words as the vocabularies; all words outside these are represented as UNK.
3. The training objective is given below, where S is the source-language sequence and T is the target-language sequence; training maximizes the probability in the first equation, and translation after training proceeds as in the second equation (see the reconstruction after this list).
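For reference, based on the original paper the two equations are roughly the following (my reconstruction, writing the training set as D and the model parameters as theta):

% Training objective: maximize the average log-probability of the correct
% translation T given the source sentence S over the training set D
\[
  \max_{\theta} \; \frac{1}{|\mathcal{D}|} \sum_{(T,\,S) \in \mathcal{D}} \log p_{\theta}(T \mid S)
\]

% Translation after training: output the most probable target sequence
% (found approximately with a beam search in the paper)
\[
  \hat{T} = \operatorname*{arg\,max}_{T} \; p_{\theta}(T \mid S)
\]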

The evaluation metric is the BLEU score; since I do not particularly care about translation tasks, it is enough for me to know that higher is better.
The conclusion is that this model can come very close to the best WMT'14 score (36.5 vs. 37).

Discussion

The sequence-to-sequence model presented in this paper is very simple, but with the help of the magic of deep learning its results are outstanding.

However, precisely because the model is so simple, the first version that used a plain RNN performed poorly on long sentences, mainly because after many input steps the RNN loses the information from the beginning of the sequence; switching to an LSTM, as an improvement, solves this problem well.
There is also the attention mechanism, which further improves performance on long sentences; I will introduce it in a later note.

In fact, one of the more interesting findings is that if the input sequence is fed to this seq2seq model in reverse order, its performance is better (though the authors do not know exactly why). A tiny illustration of the trick follows.
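For reference, the trick is simply to reverse the order of the source tokens before feeding them to the encoder, while leaving the target sequence untouched; a small hypothetical illustration in Python:

# Reverse only the source side; the target side keeps its original order.
src_tokens = ["A", "B", "C"]                    # source sentence
encoder_input = list(reversed(src_tokens))      # feed ["C", "B", "A"] to the encoder
target_tokens = ["W", "X", "Y", "Z"]            # target sentence, unchanged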

In short, this is a simple but rather magical model; much of the later work is based on this framework, which is why it serves as the first paper in these notes.

The End

This is my first time writing one of these, so the writing is rough; I also have a regular group meeting tomorrow, so I expect to come back and revise it later. For now, consider this a placeholder.

@MebiuW

Article Information

Article title: Sequence to Sequence Learning with Neural Networks
Authors: Ilya Sutskever, Oriol Vinyals, Quoc V. Le
Source: Advances in Neural Information Processing Systems (NIPS 2014)
http://papers.nips.cc/paper/5346-sequence-to-sequence-learning-with-neural-networks.pdf
