The pointer network is a popular recent variant of seq2seq, and it is widely used in deep-learning-based reading comprehension systems.
If you are interested, the original paper is recommended reading, along with this introduction:
https://medium.com/@devnag/pointer-networks-in-tensorflow-with-sample-code-14645063f264
The idea is simple: the decoder's predictions are restricted to positions in the input sequence. This turns out to be useful in many places.
For example, consider machine translation with a large vocabulary: many words are long-tail, so their word vectors are not trained sufficiently, and a plain seq2seq model has difficulty translating them. Yet many of them, such as proper names, can simply be copied to the decoder output.
Text summarization is similar: much of the time the output copies words from the source text, and especially for long-tail proper names, copying is a better strategy than generating.
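The core mechanism behind "predictions restricted to input positions" can be sketched in a few lines of NumPy (this is an illustrative sketch, not code from either repository below): the decoder computes additive attention scores over the encoder states, and instead of feeding those scores into a further output layer, it uses them directly as a probability distribution over input positions.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def pointer_step(enc_states, dec_state, W1, W2, v):
    # Additive (Bahdanau-style) attention scores over the encoder states.
    # In a pointer network these scores ARE the output: a probability
    # distribution over input positions instead of over a vocabulary.
    scores = np.array([v @ np.tanh(W1 @ e + W2 @ dec_state) for e in enc_states])
    return softmax(scores)

rng = np.random.default_rng(0)
d = 4                                        # toy hidden size
enc_states = [rng.standard_normal(d) for _ in range(5)]
dec_state = rng.standard_normal(d)
W1, W2 = rng.standard_normal((d, d)), rng.standard_normal((d, d))
v = rng.standard_normal(d)

probs = pointer_step(enc_states, dec_state, W1, W2, v)
print(len(probs), round(float(probs.sum()), 6))   # 5 1.0
```

Because the output is a distribution over the 5 input positions, the vocabulary problem disappears: the model can only ever "say" something that is already in the input.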
There are several pointer-network implementations available online. A recommended one:
https://github.com/ikostrikov/TensorFlow-Pointer-Networks
This is a good starting point: it uses a simple static RNN, which is easier to understand. A dynamic RNN is of course faster, but from a learning perspective it is better to start with the static one.
A dynamic RNN implementation of the pointer network:
https://github.com/devsisters/pointer-network-tensorflow
I copied the static RNN implementation and made some small changes to fix a few of its problems; see https://github.com/chenghuige/hasky/tree/master/applications/pointer-network/static
This small application takes a sequence of numbers as input and outputs the result of sorting it, expressed as positions in the input.
Constructing the data:
python dataset.py
encoder_inputs: [array([[0.74840968]]), array([[0.70166106]]), array([[0.67414996]]), array([[0.9014052]]), array([[0.72811645]])]
decoder_inputs: [array([[0.]]), array([[0.67414996]]), array([[0.70166106]]), array([[0.72811645]]), array([[0.74840968]]), array([[0.9014052]])]
target_labels: [array([[3]]), array([[2]]), array([[5]]), array([[1]]), array([[4]]), array([[0]])]
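The format of these examples can be reproduced with a minimal sketch (this is a hypothetical reimplementation, not the repo's actual dataset.py): the decoder inputs are the sorted values prefixed with a 0 "GO" token, and each target label is the 1-based position of the corresponding sorted value in the encoder sequence, terminated with 0.

```python
import numpy as np

def make_example(n=5, seed=None):
    """Build one toy sorting example: random encoder inputs, sorted
    decoder inputs (with a leading 0 GO token), and 1-based source
    positions as targets (0 marks the end of the sequence)."""
    rng = np.random.default_rng(seed)
    values = rng.random(n)
    order = np.argsort(values)             # indices that sort the input
    encoder_inputs = list(values)
    decoder_inputs = [0.0] + list(values[order])
    target_labels = list(order + 1) + [0]  # 1-based positions, 0 = end
    return encoder_inputs, decoder_inputs, target_labels

enc, dec, tgt = make_example(seed=42)
print(enc, dec, tgt)
```

Each target label points back into the encoder sequence, so `enc[tgt[i] - 1]` is exactly the i-th sorted value `dec[i + 1]`.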
Evaluation output during training:
2017-06-07 22:35:52 0:28:19 eval_step:111300 eval_metrics:
[' eval_loss:0.070 ', ' correct_predict_ratio:0.844 ']
label--: [2 6 1 4 9 7 10 8 5 3 0]
Predict: [2 6 1 4 9 7 10 8 5 3 0]
label--: [1 6 2 5 8 3 9 4 10 7 0]
Predict: [1 6 2 5 3 3 9 4 10 7 0]
That gives the general picture: the first prediction is exactly right, while the second is not entirely correct.
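The correct_predict_ratio metric in the log above presumably counts the fraction of sequences predicted exactly right; a sketch of such a metric (the function name mirrors the log, but this implementation is mine), using the two sequences from the evaluation output:

```python
import numpy as np

def correct_predict_ratio(labels, predicts):
    """Fraction of sequences whose predicted position sequence
    matches the label sequence exactly (per-sequence exact match)."""
    labels, predicts = np.asarray(labels), np.asarray(predicts)
    exact = np.all(labels == predicts, axis=1)
    return float(exact.mean())

labels   = [[2, 6, 1, 4, 9, 7, 10, 8, 5, 3, 0],
            [1, 6, 2, 5, 8, 3, 9, 4, 10, 7, 0]]
predicts = [[2, 6, 1, 4, 9, 7, 10, 8, 5, 3, 0],
            [1, 6, 2, 5, 3, 3, 9, 4, 10, 7, 0]]
print(correct_predict_ratio(labels, predicts))   # 0.5: only the first matches
```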
The main problem with the original program shows up when feed_prev is set to True: the code builds inp from decoder_input, which is incorrect, because at prediction time there is no decoder_input at all; for prediction, the original code force-copies encoder_input in as the decoder input. That is logically inconsistent. The experiments also show that changing training to likewise use encoder_input to generate inp works much better.
So, regarding feed_prev: at prediction time it must be set to True, because there is no decoder_input, and each output depends on the output of the previous prediction step.
For training, the question is whether to use the decoder_input sequence (feed_prev == False), or to use the results of our own predictions to predict the next step (feed_prev == True).
The TensorFlow website explains it as follows:
In the above invocation, we set feed_previous to False. This means that the decoder will use decoder_inputs tensors as provided. If we set feed_previous to True, the decoder would only use the first element of decoder_inputs. All other tensors from this list would be ignored, and instead the previous output of the decoder would be used. This is used for decoding translations in our translation model, but it can also be used during training, to make the model more robust to its own mistakes, similar to Bengio et al., 2015 (pdf).
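The corrected feed-previous behavior described above can be sketched conceptually (plain Python, not the repo's TF graph code; all names here are mine): with feed_prev == True, the next decoder input should be the encoder input at the previously predicted position, at training time as well as at prediction time.

```python
import numpy as np

def decode_feed_prev(encoder_inputs, step_fn, state, num_steps, go_token=0.0):
    """Greedy feed-previous decoding: each step yields a distribution over
    input positions; the next decoder input is the *encoder* value at the
    predicted position (a true decoder_input does not exist at test time)."""
    inp, outputs = go_token, []
    for _ in range(num_steps):
        probs, state = step_fn(inp, state)
        pos = int(np.argmax(probs))      # predicted position (1-based, 0 = stop)
        outputs.append(pos)
        if pos == 0:
            break
        inp = encoder_inputs[pos - 1]    # feed previous: copy pointed-to input
    return outputs

# Toy model that always points at position 2, just to exercise the loop.
enc = [0.3, 0.9, 0.1]
def dummy_step(inp, state):
    return np.array([0.0, 0.1, 0.8, 0.1]), state

print(decode_feed_prev(enc, dummy_step, None, 3))   # [2, 2, 2]
```

The key line is the last one of the loop body: the pointed-to encoder value is fed back, rather than a ground-truth decoder_input that only exists during training.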
Here, train.sh and train-no-feed-prev.sh were used to run a comparative experiment.
Training with feed_prev == True performs slightly better (the red curve), and in particular is more stable, with smaller variance.
TensorFlow implementation of pointer networks, part 1