Series Catalog:
Seq2seq Chatbot: building a demo based on Torch
Deep Learning (chatbot direction) learning notes (1): Sequence-to-Sequence Learning
Deep Learning (chatbot direction) learning notes (2): RNN Encoder-Decoder and LSTM
1 Preface
This series is essentially a cleaned-up version of my weekly paper notes; that is, the main content of each post is my summary of one paper. Limited by my own level, there are probably places where my understanding is off or simply not deep enough, so if you spot anything wrong, you are very welcome to discuss it with me. I originally did not want to publish these posts because I feel my level is really limited, but for better or worse, here they are, dragged out into the light.
So the main content of this series is my weekly paper-reading notes, occasionally interspersed with some experiments. The details of each paper are given at the end; if you need them, please look up the original paper by its title.
2 What is seq2seq
There are many explanations of seq2seq on the internet, but from my point of view, I like to think of seq2seq as:
a task that maps one sequence to another sequence
In practice, the following tasks, for example, can all be regarded as seq2seq tasks:
1. Machine translation (SMT): (source-language sentence, target-language sentence)
2. Dialogue: (context sentence, reply sentence)
The figure above shows an example of exactly this: mapping the sequence ABC to the sequence WXYZ.
3 RNN Encoder-Decoder Framework
Generally speaking, in deep learning there is a classic framework for handling this type of problem, called the encoder-decoder:
The encoder encodes the input sequence into a fixed-length vector.
The decoder generates a sequence of tokens as the output, conditioned on the context vector (that is, the last vector output by the encoder).
The lengths of the two sequences do not have to match.
The most classic instantiation of the encoder-decoder framework uses RNNs: the encoder is an RNN, and the decoder is also an RNN.
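To make the structure concrete, here is a minimal sketch of an RNN (LSTM) encoder-decoder. Note that PyTorch, the layer sizes, and the variable names are my own choices for illustration; the paper itself used a deep LSTM implemented in (Lua)Torch.

```python
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    """Minimal LSTM encoder-decoder: encode the source, decode the target
    from the encoder's final state (the fixed-length context vector)."""
    def __init__(self, src_vocab, tgt_vocab, emb_dim=256, hid_dim=512):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, emb_dim)
        self.tgt_emb = nn.Embedding(tgt_vocab, emb_dim)
        self.encoder = nn.LSTM(emb_dim, hid_dim, batch_first=True)
        self.decoder = nn.LSTM(emb_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, tgt_vocab)

    def forward(self, src, tgt_in):
        # Encode the whole source sequence; keep only the final (h, c) state,
        # which plays the role of the context vector v.
        _, (h, c) = self.encoder(self.src_emb(src))
        # Decode conditioned on that state (teacher forcing: the ground-truth
        # previous target tokens are fed in during training).
        dec_out, _ = self.decoder(self.tgt_emb(tgt_in), (h, c))
        return self.out(dec_out)            # logits over the target vocabulary

# Toy usage: batch of 2, source length 5, target length 6 (lengths may differ).
model = Seq2Seq(src_vocab=1000, tgt_vocab=1200)
src = torch.randint(0, 1000, (2, 5))
tgt_in = torch.randint(0, 1200, (2, 6))
logits = model(src, tgt_in)                 # shape: (2, 6, 1200)
```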
With this framework, training is done by maximum-likelihood estimation: the input sequence is passed through the encoder to obtain the intermediate vector v, and the parameters are chosen so that the decoder's probability of producing the target sequence from v is as large as possible.
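Written out (this is the factorization used in the paper), the decoder models the conditional probability of the output sequence given the input sequence through the context vector $v$:

$$p(y_1,\ldots,y_{T'} \mid x_1,\ldots,x_T) = \prod_{t=1}^{T'} p(y_t \mid v, y_1,\ldots,y_{t-1})$$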
This encoder-decoder framework is very classic, and using an RNN (LSTM) inside it is also very classic. I originally wanted to go through the details, but my own understanding is not deep enough, so this is about as far as I can take it here.
A detailed explanation of the encoder-decoder framework can be read here: http://blog.csdn.net/malefactor/article/details/51124732
The sequence-to-sequence model in this paper is exactly this RNN encoder-decoder. Its formulas are essentially the basic RNN/LSTM formulas, and the paper does not spend much space on them; if you need the details, please read the original text directly.
Experiment
The experiments use the evaluation task provided by WMT'14: English-to-French translation.
1. The training set contains 12M sentence pairs, with 348M French words and 304M English words.
2. The authors use the 160,000 most frequent English (source) words and the 80,000 most frequent French (target) words as the vocabularies; every out-of-vocabulary word is replaced with UNK.
3. The training objective is given below, where S is the source-language sequence and T is the target-language sequence: training maximizes the first expression, and translation after training proceeds according to the second expression.
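Reproduced from the paper (with $\mathcal{S}$ denoting the training set), the two expressions are:

$$\frac{1}{|\mathcal{S}|}\sum_{(T,S)\in\mathcal{S}}\log p(T\mid S), \qquad \hat{T} = \arg\max_{T}\, p(T\mid S)$$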
The evaluation metric is a score called BLEU; since I am not particularly concerned with translation tasks, it is enough for me to know that higher is better.
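For completeness, here is one way to compute a sentence-level BLEU score; NLTK is purely my own choice for illustration and is not what the paper uses (the paper reports corpus-level BLEU on the WMT'14 test set).

```python
# Illustrative only: compute sentence-level BLEU with NLTK.
from nltk.translate.bleu_score import sentence_bleu

reference = ["the", "cat", "sat", "on", "the", "mat"]   # ground-truth tokens
hypothesis = ["the", "cat", "is", "on", "the", "mat"]   # system output tokens

# sentence_bleu takes a list of reference token lists plus one hypothesis.
score = sentence_bleu([reference], hypothesis)
print(f"BLEU = {score:.3f}")   # higher is better; 1.0 is a perfect match
```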
The conclusion is that this model comes very close to the best WMT'14 score (36.5 vs. 37.0).
Discussion
The sequence-to-sequence model presented in this paper is very simple, but with the magic of deep learning its results are outstanding.
However, precisely because the model is so simple, early versions using a plain RNN performed poorly on long sentences, mainly because after many input steps the RNN loses the information from the beginning of the sequence; the improved version using an LSTM largely solves this problem.
There is also the attention mechanism, which further improves performance on long sentences; I will introduce it in a later note.
In fact, one of the more fun findings is that if the source sequence is fed into the model in reverse order, performance improves (and the authors do not really know why).
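As a small illustration of this trick, here is how the source batch from the earlier sketch could be reversed along the time dimension before encoding; the variable names are my own.

```python
import torch

# Toy source batch: (batch, src_len), as in the earlier encoder-decoder sketch.
src = torch.randint(0, 1000, (2, 5))

# Reverse each source sequence along the time dimension before encoding;
# the target side is left in its normal order. (With padded batches you would
# reverse only the non-padding positions of each sequence.)
src_reversed = torch.flip(src, dims=[1])   # token order A B C -> C B A
```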
In short, this is a simple but almost magical model, and much of the later work is built on this framework, which is why it comes first in these notes.
Finish
This is my first time writing such a post, so it is rough; I also have a regular group meeting tomorrow, so I will probably come back and revise it later. For now, consider this a placeholder.
@MebiuW
Article Information
Title: Sequence to Sequence Learning with Neural Networks
Authors: Ilya Sutskever, Oriol Vinyals, Quoc V. Le
Source: Advances in Neural Information Processing Systems
http://papers.nips.cc/paper/5346-sequence-to-sequence-learning-with-neural-networks.pdf