"Paper reading" Sequence to Sequence learning with neural Network

Source: Internet
Author: User
Tags: dnn

Sequence to Sequence Learning with Neural Networks

"Sequence-to-sequence learning based on neural networks" was downloaded from the original Google Scholar.

Authors: Ilya Sutskever (Google) et al.

I. Overview

DNNs have achieved remarkable results on many difficult problems. The paper mentions, for example, that a neural network with two hidden layers can sort N N-bit numbers. Given good supervision and the backpropagation algorithm, a DNN can learn good parameters and solve many computationally complex problems. In general, the problems DNNs handle well are those that are simple to state yet seemingly hard to compute: a well-designed multilayer network, trained with a suitable strategy, can make them tractable.

However, DNNs have an obvious limitation: they can only handle inputs and outputs of fixed dimensionality. When the input and output are variable-length sequences, a recurrent neural network (RNN) is the more natural tool.

In an RNN, each cell is usually an LSTM. The GRU is a common substitute; it may not be quite as accurate as the LSTM, but it is cheaper to compute because it is a simplification of the LSTM.

The model in this paper follows the encoder-decoder pattern: the encoder and decoder are two different RNNs. Using two separate RNNs allows more parameters to be trained at negligible extra computational cost.

Specifically, in this sequence-to-sequence setup, one RNN (whose units are LSTMs) first reads the variable-length source sequence, and the hidden state of its last LSTM unit is taken as a fixed-length feature vector.
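As a rough illustration (not the paper's actual code), a minimal PyTorch-style encoder could look like the sketch below; the embedding size, hidden size, and layer count here are placeholder values, not the paper's settings.

```python
import torch.nn as nn

class Encoder(nn.Module):
    """Reads a variable-length source sequence and returns fixed-length LSTM states."""
    def __init__(self, vocab_size, emb_dim=256, hidden_dim=512, num_layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden_dim, num_layers, batch_first=True)

    def forward(self, src_ids):            # src_ids: (batch, src_len) token ids
        emb = self.embed(src_ids)          # (batch, src_len, emb_dim)
        _, (h_n, c_n) = self.lstm(emb)     # keep only the final hidden/cell states
        return h_n, c_n                    # fixed-length summary of the source
```

The decoder would be initialized with these final states, so the whole source sentence is summarized in a vector of fixed size regardless of how long the input was.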

This vector is then fed into another RNN, which acts as a conditional language model, and beam search is used to find the most probable output sentence.
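A hedged, framework-free sketch of beam search is below; `step_fn` is a hypothetical callable standing in for the decoder, returning a mapping from next-token ids to log-probabilities given the current prefix and the encoder state.

```python
def beam_search(step_fn, encoder_state, bos_id, eos_id, beam_size=4, max_len=50):
    """Keep the `beam_size` highest-scoring partial outputs at each decoding step."""
    beams = [([bos_id], 0.0)]                 # (token ids, summed log-probability)
    finished = []
    for _ in range(max_len):
        candidates = []
        for tokens, score in beams:
            next_log_probs = step_fn(tokens, encoder_state)   # {token_id: log prob}
            for tok, lp in next_log_probs.items():
                candidates.append((tokens + [tok], score + lp))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for tokens, score in candidates[:beam_size]:
            (finished if tokens[-1] == eos_id else beams).append((tokens, score))
        if not beams:                          # every surviving beam ended with EOS
            break
    finished.extend(beams)                     # include any unfinished beams
    return max(finished, key=lambda c: c[1])[0]
```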

The innovation of this paper is that the source sentence, which is the input to the first RNN, is fed in with its words in reverse order. Doing so yields a higher BLEU score.
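A minimal illustration of the reversal trick (the tokens are made up for the example): only the source side is reversed before it enters the encoder, while the target sentence keeps its natural order.

```python
src_tokens = ["the", "cat", "sat", "on", "the", "mat"]
encoder_input = list(reversed(src_tokens))   # ["mat", "the", "on", "sat", "cat", "the"]
# The target tokens (e.g. the French translation) are left in their original order.
```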

Although the model in this paper does not beat the score of the best existing system, the reversed-input trick provides a new way of thinking.

II. The Model

The model of this paper is shown below; it is an English → French translation model:

The source string "C B A" (the source read in reverse) is fed to the encoder, and the output "W X Y Z" is produced.

Dataset: the WMT'14 English-to-French dataset.

The English vocabulary contains 160,000 words and the French vocabulary 80,000 words. Word vectors (embeddings) are learned during training. Out-of-vocabulary words are mapped to the UNK token, and every sentence ends with an EOS token.
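The vocabulary handling described here could be sketched roughly as follows; the token names and helper functions are illustrative, not taken from the paper.

```python
from collections import Counter

UNK, EOS = "<unk>", "<eos>"

def build_vocab(sentences, max_size):
    """Keep the `max_size` most frequent words; everything else will map to UNK."""
    counts = Counter(word for sent in sentences for word in sent)
    words = [w for w, _ in counts.most_common(max_size)]
    return {w: i for i, w in enumerate([UNK, EOS] + words)}

def encode(sentence, vocab):
    """Map words to ids, replacing unknown words with UNK and appending EOS."""
    return [vocab.get(w, vocab[UNK]) for w in sentence] + [vocab[EOS]]
```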

III. Training Details

  • A 4-layer LSTM is used; deeper LSTMs perform better.
  • Each layer has 1000 LSTM cells, and the recurrence is unrolled over the sentence length (most sentences are only about 30 words, so padding everything to one long fixed length would waste computation).
  • Parameters are randomly initialized from the uniform distribution U(-0.08, 0.08).
  • The output layer in the decoding phase is a large softmax over the vocabulary, which consumes most of the computation.
  • Word vectors are 1000-dimensional.
  • Training uses stochastic gradient descent. The learning rate starts at 0.7 and training runs for 7.5 epochs: the rate is fixed at 0.7 for the first 5 epochs and then halved every half epoch.
  • Mini-batches of 128 sentences are used.
  • To avoid exploding gradients, the gradient norm is constrained: if the L2 norm ||g|| of the gradient g exceeds 5, g is rescaled to 5g/||g|| (see the training sketch after this list).
  • To reduce the waste mentioned above, sentences within the same mini-batch are chosen to be of roughly the same length, which gives about a 2x speedup.
  • The experiments use 8 GPUs: 4 handle the LSTM layers (one per layer) and the remaining 4 handle the softmax layer.
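Putting several of these details together, a rough PyTorch-style sketch of the training setup might look like the following. `model`, `batches`, and `compute_loss` are hypothetical placeholders; only the numbers (uniform init in [-0.08, 0.08], initial learning rate 0.7, the halving schedule, clipping at norm 5, batches of 128) come from the paper, and the learning-rate halving is shown per epoch here for brevity rather than per half epoch.

```python
import torch
import torch.nn as nn

def bucket_by_length(pairs, batch_size=128):
    """Put sentence pairs of similar length into the same mini-batch (~2x speedup)."""
    pairs = sorted(pairs, key=lambda p: len(p[0]))
    return [pairs[i:i + batch_size] for i in range(0, len(pairs), batch_size)]

def train(model, batches, compute_loss, num_epochs=8):
    # Initialize every parameter uniformly in [-0.08, 0.08].
    for p in model.parameters():
        nn.init.uniform_(p, -0.08, 0.08)

    optimizer = torch.optim.SGD(model.parameters(), lr=0.7)

    for epoch in range(num_epochs):
        for batch in batches:                  # each batch holds 128 sentence pairs
            optimizer.zero_grad()
            loss = compute_loss(model, batch)  # placeholder for the seq2seq loss
            loss.backward()
            # Gradient clipping: if ||g||_2 > 5, rescale g to 5 * g / ||g||_2.
            torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=5.0)
            optimizer.step()

        # Keep the learning rate fixed for the first 5 epochs, then halve it.
        if epoch >= 4:
            for group in optimizer.param_groups:
                group["lr"] *= 0.5
```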

IV. Experimental Results

On the one hand, the experiments compare this model directly against other classical models, and also compare different hyperparameter settings of the model itself.

On the other hand, when the LSTM is combined with a statistical machine translation model (for example, by rescoring its output), the score is often higher than using the RNN directly.

In addition, the experiments find that the LSTM performs well even on long sentences.

The experiments also analyze BLEU scores for different sentence lengths.

They further analyze BLEU as a function of the sentences' average word frequency (i.e., how rare their words are).

V. Conclusion

This paper concludes as follows:

  1. A machine translation system based on LSTM RNNs can outperform a traditional statistics-based MT system (SMT).
  2. Reversing the source sentence helps the model considerably. There is no rigorous mathematical explanation, but an intuitive one: after reversal, the beginning of the target sentence is close to the beginning of the source sentence, introducing many short-term dependencies; translating the beginning of the target sentence well then raises the quality of the whole translation.

VI. Other Work

Other researchers have studied related mechanisms:

  1. Encoding with a CNN instead of an RNN; the resulting encoded vector/matrix, however, treats word order differently.
  2. Integrating RNNs into traditional SMT systems.
  3. The attention mechanism. It addresses the concern that the encoder may not compress all of the source sentence's information into one vector: at each decoding step, an attention vector is generated from the encoder's states and combined linearly with the encoded vector as an extra condition (source-sentence information). The benefit is that while generating each output word the network can focus on different words of the source sentence, which improves translation quality (a rough sketch follows this list).
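As a rough sketch of the attention idea (one simple dot-product variant, not the exact formulation used in any particular paper), the decoder can at every step take a weighted linear combination of all encoder hidden states instead of relying on the single fixed vector alone:

```python
import torch
import torch.nn.functional as F

def attend(decoder_state, encoder_states):
    """Return a context vector: encoder states weighted by their relevance.

    decoder_state:  (batch, hidden)          current decoder hidden state
    encoder_states: (batch, src_len, hidden) all encoder hidden states
    """
    # Dot-product score between the decoder state and every source position.
    scores = torch.bmm(encoder_states, decoder_state.unsqueeze(2)).squeeze(2)  # (batch, src_len)
    weights = F.softmax(scores, dim=1)                                         # attention weights
    # Linear combination of encoder states, weighted by attention.
    context = torch.bmm(weights.unsqueeze(1), encoder_states).squeeze(1)       # (batch, hidden)
    return context, weights
```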

"Paper reading" Sequence to Sequence learning with neural Network

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.