The pointer network is a popular recent variant of seq2seq, and it is widely used in deep-learning-based reading comprehension systems.
If you are interested, the original paper is recommended reading, along with this introduction:
https://medium.com/@devnag/pointer-networks-in-tensorflow-with-sample-code-14645063f264
The idea is simple: the decoder's predictions are restricted to positions in the input sequence. This turns out to be useful in many places.
For example, consider machine translation with a large vocabulary: many words are long-tail, so their word vectors are not trained sufficiently, and a plain seq2seq model has difficulty translating them. Yet many of them, such as proper names, can simply be copied to the decoder output.
Text summarization is similar: much of the time the output copies words from the source text, and especially for long-tail proper names, copying is a better strategy than generating.
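The core mechanism behind "predictions restricted to input positions" can be sketched in a few lines of NumPy (this is an illustrative sketch, not code from either repository below): the decoder computes additive attention scores over the encoder states, and instead of feeding those scores into a further output layer, it uses them directly as a probability distribution over input positions.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def pointer_step(enc_states, dec_state, W1, W2, v):
    # Additive (Bahdanau-style) attention scores over the encoder states.
    # In a pointer network these scores ARE the output: a probability
    # distribution over input positions instead of over a vocabulary.
    scores = np.array([v @ np.tanh(W1 @ e + W2 @ dec_state) for e in enc_states])
    return softmax(scores)

rng = np.random.default_rng(0)
d = 4                                        # toy hidden size
enc_states = [rng.standard_normal(d) for _ in range(5)]
dec_state = rng.standard_normal(d)
W1, W2 = rng.standard_normal((d, d)), rng.standard_normal((d, d))
v = rng.standard_normal(d)

probs = pointer_step(enc_states, dec_state, W1, W2, v)
print(len(probs), round(float(probs.sum()), 6))   # 5 1.0
```

Because the output is a distribution over the 5 input positions, the vocabulary problem disappears: the model can only ever "say" something that is already in the input.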
There are several pointer-network implementations available online. A recommended one:
https://github.com/ikostrikov/TensorFlow-Pointer-Networks
This is a good starting point: it uses a simple static RNN, which is easier to understand. A dynamic RNN is of course faster, but from a learning perspective it is better to start with the static one.
A dynamic RNN implementation of the pointer network:
https://github.com/devsisters/pointer-network-tensorflow
I copied the static RNN implementation and made some small changes to fix a few of its problems; see https://github.com/chenghuige/hasky/tree/master/applications/pointer-network/static
This small application takes a sequence of numbers as input and outputs the result of sorting it, expressed as positions in the input.
Constructing the data:
python dataset.py
encoder_inputs: [array([[0.74840968]]), array([[0.70166106]]), array([[0.67414996]]), array([[0.9014052]]), array([[0.72811645]])]
decoder_inputs: [array([[0.]]), array([[0.67414996]]), array([[0.70166106]]), array([[0.72811645]]), array([[0.74840968]]), array([[0.9014052]])]
target_labels: [array([[3]]), array([[2]]), array([[5]]), array([[1]]), array([[4]]), array([[0]])]
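The format of these examples can be reproduced with a minimal sketch (this is a hypothetical reimplementation, not the repo's actual dataset.py): the decoder inputs are the sorted values prefixed with a 0 "GO" token, and each target label is the 1-based position of the corresponding sorted value in the encoder sequence, terminated with 0.

```python
import numpy as np

def make_example(n=5, seed=None):
    """Build one toy sorting example: random encoder inputs, sorted
    decoder inputs (with a leading 0 GO token), and 1-based source
    positions as targets (0 marks the end of the sequence)."""
    rng = np.random.default_rng(seed)
    values = rng.random(n)
    order = np.argsort(values)             # indices that sort the input
    encoder_inputs = list(values)
    decoder_inputs = [0.0] + list(values[order])
    target_labels = list(order + 1) + [0]  # 1-based positions, 0 = end
    return encoder_inputs, decoder_inputs, target_labels

enc, dec, tgt = make_example(seed=42)
print(enc, dec, tgt)
```

Each target label points back into the encoder sequence, so `enc[tgt[i] - 1]` is exactly the i-th sorted value `dec[i + 1]`.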
Evaluation output during training:
2017-06-07 22:35:52 0:28:19 eval_step:111300 eval_metrics:
[' eval_loss:0.070 ', ' correct_predict_ratio:0.844 ']
label--: [2 6 1 4 9 7 10 8 5 3 0]
Predict: [2 6 1 4 9 7 10 8 5 3 0]
label--: [1 6 2 5 8 3 9 4 10 7 0]
Predict: [1 6 2 5 3 3 9 4 10 7 0]
That gives the general picture: the first prediction is exactly right, while the second is not entirely correct.
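The correct_predict_ratio metric in the log above presumably counts the fraction of sequences predicted exactly right; a sketch of such a metric (the function name mirrors the log, but this implementation is mine), using the two sequences from the evaluation output:

```python
import numpy as np

def correct_predict_ratio(labels, predicts):
    """Fraction of sequences whose predicted position sequence
    matches the label sequence exactly (per-sequence exact match)."""
    labels, predicts = np.asarray(labels), np.asarray(predicts)
    exact = np.all(labels == predicts, axis=1)
    return float(exact.mean())

labels   = [[2, 6, 1, 4, 9, 7, 10, 8, 5, 3, 0],
            [1, 6, 2, 5, 8, 3, 9, 4, 10, 7, 0]]
predicts = [[2, 6, 1, 4, 9, 7, 10, 8, 5, 3, 0],
            [1, 6, 2, 5, 3, 3, 9, 4, 10, 7, 0]]
print(correct_predict_ratio(labels, predicts))   # 0.5: only the first matches
```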
The main problem with the original program shows up when feed_prev is set to True: the code builds inp from decoder_input, which is incorrect, because at prediction time there is no decoder_input at all; for prediction, the original code force-copies encoder_input in as the decoder input. That is logically inconsistent. The experiments also show that changing training to likewise use encoder_input to generate inp works much better.
So, regarding feed_prev: at prediction time it must be set to True, because there is no decoder_input, and each output depends on the output of the previous prediction step.
For training, the question is whether to use the decoder_input sequence (feed_prev == False), or to use the results of our own predictions to predict the next step (feed_prev == True).
The TensorFlow website explains it as follows:
In the above invocation, we set feed_previous to False. This means that the decoder will use decoder_inputs tensors as provided. If we set feed_previous to True, the decoder would only use the first element of decoder_inputs. All other tensors from this list would be ignored, and instead the previous output of the decoder would be used. This is used for decoding translations in our translation model, but it can also be used during training, to make the model more robust to its own mistakes, similar to Bengio et al., 2015 (pdf).
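The corrected feed-previous behavior described above can be sketched conceptually (plain Python, not the repo's TF graph code; all names here are mine): with feed_prev == True, the next decoder input should be the encoder input at the previously predicted position, at training time as well as at prediction time.

```python
import numpy as np

def decode_feed_prev(encoder_inputs, step_fn, state, num_steps, go_token=0.0):
    """Greedy feed-previous decoding: each step yields a distribution over
    input positions; the next decoder input is the *encoder* value at the
    predicted position (a true decoder_input does not exist at test time)."""
    inp, outputs = go_token, []
    for _ in range(num_steps):
        probs, state = step_fn(inp, state)
        pos = int(np.argmax(probs))      # predicted position (1-based, 0 = stop)
        outputs.append(pos)
        if pos == 0:
            break
        inp = encoder_inputs[pos - 1]    # feed previous: copy pointed-to input
    return outputs

# Toy model that always points at position 2, just to exercise the loop.
enc = [0.3, 0.9, 0.1]
def dummy_step(inp, state):
    return np.array([0.0, 0.1, 0.8, 0.1]), state

print(decode_feed_prev(enc, dummy_step, None, 3))   # [2, 2, 2]
```

The key line is the last one of the loop body: the pointed-to encoder value is fed back, rather than a ground-truth decoder_input that only exists during training.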
Here, train.sh and train-no-feed-prev.sh were used to run a comparative experiment.
Training with feed_prev == True performs slightly better (the red curve), and in particular is more stable, with smaller variance.
TensorFlow implementation of pointer networks, part 1