Pointer-network's TensorFlow implementation-1


Pointer-network is a recently popular variant of seq2seq, widely used in deep-learning-based reading comprehension systems.

If you are interested, the original paper is worth reading, along with this recommended introduction:

https://medium.com/@devnag/pointer-networks-in-tensorflow-with-sample-code-14645063f264

The idea is simple: the decoder's predictions are constrained to positions in the input sequence. This is useful in many places.
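The core mechanism can be sketched with plain NumPy: the decoder scores each input position with an additive-attention style logit, and the softmax over those logits is the output distribution. The parameter names (W1, W2, v) are illustrative; in a real model they are learned.

```python
import numpy as np

rng = np.random.default_rng(0)

n, hidden = 5, 8                                 # 5 input positions, hidden size 8
enc_states = rng.standard_normal((n, hidden))    # one encoder state per input position
dec_state = rng.standard_normal(hidden)          # current decoder state

# Hypothetical attention parameters (learned in a real model).
W1 = rng.standard_normal((hidden, hidden))
W2 = rng.standard_normal((hidden, hidden))
v = rng.standard_normal(hidden)

# Pointer attention: one logit per *input position*, not per vocabulary word.
logits = np.tanh(enc_states @ W1 + dec_state @ W2) @ v
probs = np.exp(logits) / np.exp(logits).sum()

# The "output" of this decode step is an index into the input sequence.
pointer = int(np.argmax(probs))
```

This is what distinguishes a pointer network from an ordinary seq2seq decoder: the output vocabulary is the set of input positions, so its size varies with the input length.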

For example, consider machine translation with a large vocabulary: many words are long-tail and their word vectors are insufficiently trained, so a plain seq2seq model struggles to translate them. Many of these, such as proper names, can simply be copied to the decoder output.

The same goes for text summarization: much of the time the output copies words from the original text, and for long-tail proper names in particular, copying beats generating.

There are several pointer-network implementations available online. A recommended one:

https://github.com/ikostrikov/TensorFlow-Pointer-Networks

This is a good starting example: it uses a simple static RNN, which is easier to understand. A dynamic RNN is of course faster, but from a learning perspective it is better to start with the static implementation.

A dynamic RNN pointer-network implementation:

https://github.com/devsisters/pointer-network-tensorflow

I made a copy of the static RNN implementation with minor changes, fixing some of its problems. See https://github.com/chenghuige/hasky/tree/master/applications/pointer-network/static

The demo application takes a sequence of numbers as input and outputs its sorted order.

To construct the data:

python dataset.py

encoder_inputs: [array([[0.74840968]]), array([[0.70166106]]), array([[0.67414996]]), array([[0.9014052]]), array([[0.72811645]])]

decoder_inputs: [array([[0.]]), array([[0.67414996]]), array([[0.70166106]]), array([[0.72811645]]), array([[0.74840968]]), array([[0.9014052]])]

target_labels: [array([[3]]), array([[2]]), array([[5]]), array([[1]]), array([[4]]), array([[0]])]
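The structure of one such sample can be sketched as follows. This is not the repo's actual dataset.py, just an illustrative reconstruction: the encoder inputs are the unsorted values, the decoder inputs are a start token (0) followed by the sorted values, and the target labels are the 1-based input positions in sorted order, terminated by 0.

```python
import numpy as np

def make_sample(length=5, seed=0):
    """Build one training sample for the sorting task (illustrative sketch).

    encoder_inputs: the unsorted values, one per input position.
    decoder_inputs: a leading 0 (start token) followed by the sorted values.
    target_labels:  1-based input positions in sorted order, then 0 as the
                    end-of-sequence marker.
    """
    rng = np.random.default_rng(seed)
    values = rng.random(length)
    order = np.argsort(values)               # input positions in sorted order
    encoder_inputs = list(values)
    decoder_inputs = [0.0] + list(values[order])
    target_labels = list(order + 1) + [0]    # 1-based positions, 0 terminates
    return encoder_inputs, decoder_inputs, target_labels
```

Reading the printed sample above through this lens: the smallest value 0.67414996 sits at input position 3, so the first target label is 3, and so on.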

The eval output during training:

2017-06-07 22:35:52 0:28:19 eval_step:111300 eval_metrics:

[' eval_loss:0.070 ', ' correct_predict_ratio:0.844 ']

label--: [2 6 1 4 9 7 10 8 5 3 0]

Predict: [2 6 1 4 9 7 10 8 5 3 0]

label--: [1 6 2 5 8 3 9 4 10 7 0]

Predict: [1 6 2 5 3 3 9 4 10 7 0]

That's roughly it. The first prediction is exactly right; the second is not.

The main problem with the original program shows up when feed_prev is set to True: inp is taken from decoder_input, which is incorrect, because at prediction time there is no decoder_input at all. The original code worked around this at prediction time by force-feeding encoder_input in as the decoder input.

This is logically inconsistent. The experimental results also show that changing training to likewise generate inp from encoder_input works much better.

As for feed_prev: at prediction time it must be set to True, because there is no decoder_input when predicting; each next output depends on the output of the previous step.

The remaining question is for training: do we use the decoder_input sequence (feed_prev == False), or use the model's own previous predictions to predict the next step (feed_prev == True)?

Refer to the TensorFlow documentation:

In the above invocation, we set feed_previous to False. This means the decoder will use the decoder_inputs tensors as provided. If we set feed_previous to True, the decoder would only use the first element of decoder_inputs. All other tensors from this list would be ignored, and instead the previous output of the decoder would be used. This is used for decoding translations in our translation model, but it can also be used during training, to make the model more robust to its own mistakes, similar to Bengio et al., 2015 (pdf).
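The prediction-time feed-previous loop can be sketched as follows. The step_fn interface is hypothetical, standing in for one decoder step of a trained model; the point is that the next decoder input is copied from the *encoder* input at the pointed-to position, since no ground-truth decoder_input exists at prediction time.

```python
import numpy as np

def greedy_decode(encoder_inputs, step_fn, max_steps=None):
    """Greedy pointer decoding without decoder_inputs (illustrative sketch).

    step_fn(prev_input, step) -> logits over input positions; a stand-in
    for one decoder step of the trained model (hypothetical interface).
    At each step the next decoder input is copied from the encoder input
    at the predicted position -- which is why feed_prev must be True
    at prediction time.
    """
    max_steps = max_steps or len(encoder_inputs)
    prev = 0.0                       # start token
    pointers = []
    for t in range(max_steps):
        logits = step_fn(prev, t)
        p = int(np.argmax(logits))
        pointers.append(p)
        prev = encoder_inputs[p]     # feed the pointed-to input back in
    return pointers
```

Training with feed_prev == True runs this same loop, so the model sees its own (possibly wrong) previous outputs, matching the prediction-time behavior.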

Here, train.sh and train-no-feed-prev.sh were used to run a comparative experiment.

Training with feed_prev == True performs slightly better (the red curve), and in particular is more stable, with smaller variance.

