ACL 2016 | CopyNet and Pointer Softmax
Original, 2016-08-17, by Xiao S, Chengxuyuan Richang (The Programmer Girl's Daily)
In a previous installment of the ACL 2016 series, we recommended a paper that solves a small but real problem in MT: "Modeling Coverage for Neural Machine Translation." If that paper tackles the problem of translations that drop text or "ignore" already-translated text, then the two papers recommended today address how to keep certain information "intact" in sequence-to-sequence tasks (not just machine translation): information that should be carried over as-is rather than translated.
The two papers recommended today are again from ACL 2016:
[1] Jiatao Gu, Zhengdong Lu, Hang Li, et al. "Incorporating Copying Mechanism in Sequence-to-Sequence Learning". ACL 2016.
[2] Caglar Gulcehre, Sungjin Ahn, Ramesh Nallapati, et al. "Pointing the Unknown Words". ACL 2016.
There are many similarities between these two papers, and of course some small differences. It is encouraging that, although the two papers share essentially the same idea, both were accepted to ACL 2016 at the same time, which also shows that the work in each is very solid.
First, paper [1]. The network proposed in this paper is called CopyNet, and the proposed mechanism is called the copying mechanism. For example, in a dialogue one side might say, "Hello, my name is Xiao S", and the other side replies, "Very happy to meet you, Xiao S." Here "Xiao S", like entity or date information in general, is "copied" from the input of the conversation to the output. Such information should be kept intact and copied from the sequence-to-sequence input to the output side. Existing end-to-end sequence-to-sequence models, even with an attention mechanism, find this hard to achieve. The authors of [1] argue that the difficulty is twofold: first, deciding which pieces of information (sub-sequences) in the input should be "copied", and second, deciding where to "paste" that information on the output side. To this end, they propose the CopyNet network.
The CopyNet architecture is shown above. The main structure is still the attention-based encoder-decoder framework proposed by Bahdanau et al. (2014). However, CopyNet makes several improvements in the decoder: (1) because the decoder must decide whether to "paste" or to "generate", the output probability is modeled as a mixture of a generate-mode and a copy-mode; thanks to the copy-mode, the decoder can produce OOV words by copying them directly from the input; (2) possibly inspired by the Neural Turing Machine (NTM), the decoder state update also reads content from specific locations of the encoder memory, which helps the model attend to the input positions holding the information to be copied; (3) if the improvement in (2) is viewed as a selective read, and the attention of the attention-based encoder-decoder itself is an attentive read, then these two mechanisms need to be blended and coordinated. As a result, CopyNet can copy a complete sub-sequence rather than just isolated fragments. A minimal sketch of the generate/copy mixture is given below.
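To make the generate/copy mixture concrete, here is a minimal numpy sketch, not the authors' code: the score functions, variable names, and the OOV handling are simplifying assumptions. It shows the core idea that both modes are normalized by one joint softmax, and that copy probability mass is added onto the vocabulary slot of each source token (including extended ids for OOV source words).

```python
import numpy as np

def copynet_output_distribution(gen_scores, copy_scores, src_token_ids, vocab_size):
    """Toy sketch of CopyNet's mixed output distribution.

    gen_scores:    unnormalized generate-mode scores over the vocabulary, shape (vocab_size,)
    copy_scores:   unnormalized copy-mode scores over the source positions, shape (src_len,)
    src_token_ids: vocabulary id of each source token (OOV tokens get ids >= vocab_size)
    """
    # Both modes share a single normalization (one softmax over vocab + source positions)
    all_scores = np.concatenate([gen_scores, copy_scores])
    probs = np.exp(all_scores - all_scores.max())
    probs /= probs.sum()
    gen_probs, copy_probs = probs[:vocab_size], probs[vocab_size:]

    # Copy mass for a source token is added to that token's slot in an extended vocabulary
    extended_size = max(vocab_size, max(src_token_ids) + 1)
    out = np.zeros(extended_size)
    out[:vocab_size] = gen_probs
    for pos, tok_id in enumerate(src_token_ids):
        out[tok_id] += copy_probs[pos]
    return out

# Example: a 5-word vocabulary, a 3-token source whose last token is OOV (extended id 5)
dist = copynet_output_distribution(
    gen_scores=np.random.randn(5),
    copy_scores=np.random.randn(3),
    src_token_ids=[2, 4, 5],
    vocab_size=5,
)
print(dist, dist.sum())  # a valid distribution over the extended vocabulary
```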
In the experimental section, CopyNet focuses on the task of text summarization. The results are very good, and the case study provides a lot of analysis, for example showing how well CopyNet handles the OOV problem. On the one hand, CopyNet can keep important information by copying it from the input (the original article); on the other hand, it can still generate abstractive text in the output. CopyNet can therefore be seen as a combination of extractive and abstractive summarization.
However, the work in [1] currently has one relatively large limitation: because it copies input information "intact", CopyNet cannot be applied to tasks such as MT where the input and output are in different languages. This is also a significant difference between [1] and [2]. Now let's turn to [2].
The work in [2] looks a bit more complicated at first glance, but it may offer more flexibility. If CopyNet in [1] addresses what to copy and where to paste, then paper [2] addresses learning when to point and where to point. Both problems are solved by their Pointer Softmax (PS) model.
As pictured above, PS has two softmax output layers: the shortlist softmax is the traditional softmax over the output vocabulary, while the location softmax is the more important innovation, indicating the position of a word at the input end. These two softmaxes roughly correspond to the generate-mode and copy-mode in CopyNet [1]. That is, when PS decides to take the shortlist softmax, it "generates" a word from the shortlist at the decoder; otherwise it obtains a position from the location softmax and "copies" the input word at that position. The decision of which softmax to take is made by a switching network, which is in fact an MLP; a minimal sketch of this switching is given below.
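Here is a minimal numpy sketch of one decoding step, again with illustrative function names and inputs rather than the authors' code. It shows how the scalar output of the switching MLP splits probability mass between the shortlist softmax and the location softmax.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def pointer_softmax_step(shortlist_scores, location_scores, switch_logit):
    """Toy sketch of one Pointer Softmax decoding step.

    shortlist_scores: scores over the shortlist vocabulary
    location_scores:  scores over the source positions (the location softmax)
    switch_logit:     scalar output of the switching MLP; its sigmoid is the
                      probability of using the shortlist softmax
    """
    p_shortlist = 1.0 / (1.0 + np.exp(-switch_logit))              # switch network output
    vocab_dist = p_shortlist * softmax(shortlist_scores)           # generate from the shortlist
    location_dist = (1.0 - p_shortlist) * softmax(location_scores) # point to an input position
    return vocab_dist, location_dist                               # together they sum to 1

vocab_dist, loc_dist = pointer_softmax_step(
    shortlist_scores=np.random.randn(5),
    location_scores=np.random.randn(3),
    switch_logit=0.3,
)
print(vocab_dist.sum() + loc_dist.sum())  # ~1.0
```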
Because PS does not directly copy the content of the input, but instead points to the location of the relevant content in the input, it can also handle tasks such as machine translation where the input and output languages differ. In the experiments, paper [2] is evaluated on predicting rare words, on MT, and on summarization.
The above is today's ACL 2016 series recommendation. Related reading:
ACL 2016 | Modeling Coverage for Neural Machine Translation
ACL 2016 | Ten excellent papers: Multimodal Pivots for Image Caption Translation
ACL 2016 | Ten excellent papers: Globally Normalized Transition-Based NN
ACL 2016 | Ten excellent papers: Improving Hypernymy Detection, by Yoav Goldberg
ACL 2016 | Ten excellent papers: Learning Language Games through Interaction
ACL 2016 | Ten excellent papers: Harnessing DNN with Logic Rules
ACL 2016 | Abandoning human annotations, presenting a better explanatory evaluation
Dry Goods | A First Glimpse of Multimodal Deep Learning