Deep Learning Notes: A Sentence Pair Matching Method Based on Bidirectional RNNs (LSTM, GRU) and Attention Models

This post mainly introduces sentence pair matching methods based on bidirectional RNNs (LSTM, GRU) and attention models, as opposed to matching sentences with Word2vec or Doc2vec vectors or with traditional machine learning methods.

First, let's look at what sentence pair matching is:

The sentence pair matching (Sentence Pair Matching) problem is very common in NLP. In so-called sentence pair matching, we are given two sentences S1 and S2, and the task is to determine whether the two sentences stand in some particular relationship. Formally, the problem can be defined as follows:
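(The original post presents the definition as a figure, which is not reproduced here; a standard way to write it is F(S1, S2) → y, where y is a label drawn from the task's label set Y.)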

That is, given two sentences, we need to learn a mapping function whose input is the sentence pair and whose output, after the mapping, is a label from the task's classification label set.
A typical example is the paraphrase task, which asks whether two sentences are semantically equivalent; its label set is therefore the binary set {equivalent, not equivalent}. Many other tasks also belong to sentence pair matching, such as question matching and answer selection in question answering systems.


Now let's look at some deep learning models for sentence pair matching:


Sentence Pair Matching Model (I)


Model I joins the two sentences S and T into a single sequence, separated by a special token EOS. Here EOS does not mark the end of a sentence; it is the separator between the two sentences. This concatenated sequence forms the RNN's input layer. Above it we can stack a bidirectional, deep network structure, and on top of the output of the highest RNN layer we add an attention model layer. The attention layer here is actually a static attention model (AM): it computes an attention weight for each BiLSTM node, multiplies each node's output by its attention weight, and sums the results to obtain a single vector representation.

Then, above the attention model, we add a softmax layer to produce the final classification.
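Neither the post nor its figures include code, so here is a minimal PyTorch sketch of Model I. The class name, dimensions, and the assumption that the input is already the EOS-joined token sequence are all mine, not the author's.

```python
# Minimal sketch of Model I: [S ; EOS ; T] -> deep BiLSTM -> static attention -> softmax.
import torch
import torch.nn as nn

class ConcatBiLSTMMatcher(nn.Module):
    def __init__(self, vocab_size, embed_dim=128, hidden_dim=128, num_labels=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.bilstm = nn.LSTM(embed_dim, hidden_dim, num_layers=2,
                              bidirectional=True, batch_first=True)
        # static attention: one learned score per BiLSTM output position
        self.attn = nn.Linear(2 * hidden_dim, 1)
        self.out = nn.Linear(2 * hidden_dim, num_labels)

    def forward(self, token_ids):
        # token_ids: (batch, seq_len), already of the form [S ; EOS ; T]
        h, _ = self.bilstm(self.embed(token_ids))   # (batch, seq_len, 2*hidden)
        w = torch.softmax(self.attn(h), dim=1)      # attention weight per node
        sent_vec = (w * h).sum(dim=1)               # weighted sum -> one vector
        return self.out(sent_vec)                   # logits; softmax applied in the loss
```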


Sentence Pair Matching Model (II)


For the two sentences, Model II sets up one encoder each: an RNN, a deep LSTM, a bidirectional deep LSTM, and so on. Each RNN's purpose is to extract features of its sentence. The features extracted from the two sentences are then fed into the input layer of an MLP (multilayer perceptron); the MLP's hidden layers capture the nonlinear mapping between the two sentences, and a softmax classification layer gives the final result.
In this way, two RNNs can decide whether the two sentences stand in the target relationship: the network parameters are learned from training data, and the trained network can then classify real task inputs.
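A minimal sketch of Model II, again under my own assumptions (PyTorch, one BiLSTM per sentence, mean pooling over the RNN outputs as the "extracted features"):

```python
# Minimal sketch of Model II: two independent encoders -> concat -> MLP -> softmax.
import torch
import torch.nn as nn

class TwoEncoderMatcher(nn.Module):
    def __init__(self, vocab_size, embed_dim=128, hidden_dim=128, num_labels=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.enc_s = nn.LSTM(embed_dim, hidden_dim, bidirectional=True, batch_first=True)
        self.enc_t = nn.LSTM(embed_dim, hidden_dim, bidirectional=True, batch_first=True)
        self.mlp = nn.Sequential(
            nn.Linear(4 * hidden_dim, hidden_dim),  # features of both sentences
            nn.Tanh(),
            nn.Linear(hidden_dim, num_labels),
        )

    def encode(self, enc, ids):
        h, _ = enc(self.embed(ids))
        return h.mean(dim=1)       # simple pooling of the RNN outputs into one feature vector

    def forward(self, s_ids, t_ids):
        fs = self.encode(self.enc_s, s_ids)
        ft = self.encode(self.enc_t, t_ids)
        return self.mlp(torch.cat([fs, ft], dim=-1))   # logits for softmax / cross-entropy
```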


Next, we add an attention model layer before the MLP layer, giving Sentence Pair Matching Model (III).

Sentence Pair Matching Model (III)



The attention model layer first produces a vector representation for each of the two sentences; the resulting vectors are then concatenated as the input to the MLP, which finally classifies via softmax.
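A minimal sketch of Model III; for brevity this version shares one BiLSTM between the two sentences, whereas the figure draws one encoder per sentence.

```python
# Minimal sketch of Model III: deep BiLSTM + static attention per sentence -> concat -> MLP.
import torch
import torch.nn as nn

class StaticAttnMatcher(nn.Module):
    def __init__(self, vocab_size, embed_dim=128, hidden_dim=128, num_labels=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.bilstm = nn.LSTM(embed_dim, hidden_dim, num_layers=2,
                              bidirectional=True, batch_first=True)
        self.attn = nn.Linear(2 * hidden_dim, 1)
        self.mlp = nn.Sequential(
            nn.Linear(4 * hidden_dim, hidden_dim), nn.Tanh(),
            nn.Linear(hidden_dim, num_labels),
        )

    def attend(self, ids):
        h, _ = self.bilstm(self.embed(ids))
        w = torch.softmax(self.attn(h), dim=1)   # static attention weights over positions
        return (w * h).sum(dim=1)                # one vector per sentence

    def forward(self, s_ids, t_ids):
        return self.mlp(torch.cat([self.attend(s_ids), self.attend(t_ids)], dim=-1))
```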


Sentence Pair Matching Model (IV)


The difference between Model IV and Model III is this: Model III runs each sentence through a deep BiLSTM plus a static attention model to obtain one vector per sentence, and then concatenates the two sentence vectors. Model IV instead uses a soft attention model to compute mutual attention vectors between the two sentences, mean-pools these vectors, feeds the result into the MLP, and finally obtains the classification through softmax.
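A minimal sketch of Model IV's soft attention, under the assumption that the mutual attention vectors come from a dot-product alignment between the two sentences' BiLSTM states (the post does not pin down the exact scoring function):

```python
# Minimal sketch of Model IV: mutual soft attention -> mean pooling -> MLP -> softmax.
import torch
import torch.nn as nn

class CrossAttnMatcher(nn.Module):
    def __init__(self, vocab_size, embed_dim=128, hidden_dim=128, num_labels=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.bilstm = nn.LSTM(embed_dim, hidden_dim, bidirectional=True, batch_first=True)
        self.mlp = nn.Sequential(
            nn.Linear(4 * hidden_dim, hidden_dim), nn.Tanh(),
            nn.Linear(hidden_dim, num_labels),
        )

    def forward(self, s_ids, t_ids):
        hs, _ = self.bilstm(self.embed(s_ids))   # (batch, len_s, 2*hidden)
        ht, _ = self.bilstm(self.embed(t_ids))   # (batch, len_t, 2*hidden)
        # soft attention: alignment scores between every pair of positions
        e = torch.bmm(hs, ht.transpose(1, 2))    # (batch, len_s, len_t)
        s_attn = torch.bmm(torch.softmax(e, dim=2), ht)                  # S attends to T
        t_attn = torch.bmm(torch.softmax(e, dim=1).transpose(1, 2), hs)  # T attends to S
        # mean-pool the mutual attention vectors, then classify
        pooled = torch.cat([s_attn.mean(dim=1), t_attn.mean(dim=1)], dim=-1)
        return self.mlp(pooled)
```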


Finally: the BiLSTM in the figures can also be replaced with a BiGRU.
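In the PyTorch sketches above, for example, that swap is a one-line change:

```python
# replace the nn.LSTM encoder with a GRU; the rest of the model is unchanged
self.bilstm = nn.GRU(embed_dim, hidden_dim, bidirectional=True, batch_first=True)
```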

This article is mainly a summary of combining RNNs with attention models for sentence pair matching; I will post the concrete implementation and experiments in follow-ups.
