Convolutional Neural Network for Paraphrase Identification


Yin's paper proposes an architecture called Bi-CNN-MI, where Bi-CNN denotes two CNNs arranged in a Siamese framework and MI denotes multi-granularity interaction features. Bi-CNN-MI consists of three parts:

    • Sentence model (CNN-SM)

This component applies the Kalchbrenner model mentioned above (the 2014 DCNN) to each sentence, extracting feature representations at four granularities: words, short ngrams, long ngrams, and the sentence as a whole. Multi-granularity representation both improves the model's performance and enhances its robustness; a minimal sketch follows this list.

    • Sentence interaction model (CNN-IM)

This component is based on the RAE model proposed by Socher in 2011, with a simplification: only pairwise comparisons between features of the same granularity are made.

    • An LR or softmax output layer that adapts the network to the task
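
To make the CNN-SM idea concrete, here is a minimal sketch in PyTorch of multi-granularity feature extraction. The class name, layer sizes, and kernel widths are illustrative assumptions of mine, not the paper's exact DCNN configuration (which additionally uses dynamic k-max pooling and folding); the point is only how stacked convolutions yield word, short-ngram, long-ngram, and sentence-level features.

```python
# Hypothetical sketch of multi-granularity feature extraction (not the
# paper's exact DCNN): stacked convolutions over word embeddings.
import torch
import torch.nn as nn

class MultiGranularitySentenceModel(nn.Module):
    def __init__(self, vocab_size=10000, embed_dim=50, n_filters=50):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # Two stacked wide convolutions: the first layer yields
        # short-ngram features, the second yields long-ngram features.
        self.conv_short = nn.Conv1d(embed_dim, n_filters, kernel_size=3, padding=2)
        self.conv_long = nn.Conv1d(n_filters, n_filters, kernel_size=5, padding=4)

    def forward(self, token_ids):            # token_ids: (batch, seq_len)
        words = self.embed(token_ids).transpose(1, 2)   # (batch, dim, len)
        short_ngrams = torch.tanh(self.conv_short(words))
        long_ngrams = torch.tanh(self.conv_long(short_ngrams))
        # Max-pooling over time collapses the top feature map into one
        # fixed-size sentence vector.
        sentence = long_ngrams.max(dim=2).values
        # Four granularities: word, short ngram, long ngram, sentence.
        return words, short_ngrams, long_ngrams, sentence

model = MultiGranularitySentenceModel()
ids = torch.randint(0, 10000, (2, 12))       # toy batch of two sentences
for feat in model(ids):
    print(feat.shape)
```
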
Model structure

The model presented in this paper is mainly a combination of the Kalchbrenner DCNN and Socher's RAE model, as shown in the paper's architecture diagram (not reproduced here).

The model diagram makes the main idea clear: on one side, the DCNN-style sentence model extracts features at multiple granularities; on the other, following the RAE approach, pairwise similarities between same-granularity features are computed, and dynamic pooling condenses each similarity matrix into a small, fixed number of features. The pooled results from all granularity levels are then concatenated into a single vector and fully connected to the LR (or softmax) layer, adapting the network to the paraphrase detection task. A minimal sketch of this interaction step follows.
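
Below is a hedged sketch of that interaction step for a single granularity. Cosine similarity, the 4x4 pooling grid, and the helper name interaction_features are illustrative choices of mine; adaptive max-pooling stands in for the paper's dynamic pooling, guaranteeing a fixed number of features regardless of sentence lengths.

```python
# Illustrative sketch: pairwise similarity between two sentences'
# feature maps of one granularity, reduced by dynamic pooling.
import torch
import torch.nn.functional as F

def interaction_features(feat_a, feat_b, pooled_size=4):
    """feat_a: (dim, len_a), feat_b: (dim, len_b) for one granularity."""
    # Cosine similarity between every position of sentence A and every
    # position of sentence B -> a (len_a, len_b) similarity matrix.
    a = F.normalize(feat_a, dim=0)
    b = F.normalize(feat_b, dim=0)
    sim = a.t() @ b
    # Dynamic pooling: adaptive max-pooling maps the variable-size
    # matrix onto a fixed pooled_size x pooled_size grid.
    pooled = F.adaptive_max_pool2d(sim.unsqueeze(0).unsqueeze(0),
                                   (pooled_size, pooled_size))
    return pooled.flatten()                  # pooled_size**2 features

fa, fb = torch.randn(50, 9), torch.randn(50, 13)   # toy feature maps
print(interaction_features(fa, fb).shape)          # torch.Size([16])
```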

The detailed calculations of the model are not covered here; interested readers can go directly to the paper. Beyond the model structure itself, one highlight of the paper is the use of a language-model-like network, CNN-LM, to pre-train the CNN components described above, so that the model's parameters are determined in advance. The CNN-LM network structure is likewise shown in the paper (diagram not reproduced here).

CNN-LM is trained on the final experimental dataset, MSRP; because MSRP is small, the authors add roughly 100,000 additional English sentences to the corpus. From CNN-LM the model obtains word embeddings, weights, and other parameters. Note that these parameters are not fixed: they continue to be updated during the subsequent sentence matching task. As the later experiments show, the effect of CNN-LM pre-training is significant. A minimal sketch of this pre-train/fine-tune flow follows.
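
The following sketch shows the pre-train-then-fine-tune flow in PyTorch. The toy encoder, heads, objectives, and single-step "training loops" are schematic assumptions, not the paper's CNN-LM; the one detail taken from the text is that the pre-trained parameters are handed to the fine-tuning optimizer rather than frozen.

```python
# Schematic sketch: language-model-style pre-training, then continued
# updating of the same parameters during paraphrase identification.
import torch
import torch.nn as nn

encoder = nn.Sequential(                     # stand-in for the CNN tower
    nn.Embedding(10000, 50),
    nn.Flatten(start_dim=1),
)
lm_head = nn.Linear(50 * 8, 10000)           # predicts a held-out word
clf_head = nn.Linear(2 * 50 * 8, 2)          # paraphrase / not paraphrase

# Phase 1: language-model-like pre-training on (toy) unlabeled sentences.
pretrain_opt = torch.optim.Adam(
    list(encoder.parameters()) + list(lm_head.parameters()), lr=1e-3)
ids = torch.randint(0, 10000, (4, 8))        # toy sentence batch
targets = torch.randint(0, 10000, (4,))      # toy held-out-word targets
pretrain_opt.zero_grad()
nn.functional.cross_entropy(lm_head(encoder(ids)), targets).backward()
pretrain_opt.step()

# Phase 2: paraphrase identification. Crucially, the encoder's
# parameters are passed to the new optimizer too, so the embeddings and
# weights keep updating during the matching task instead of being fixed.
finetune_opt = torch.optim.Adam(
    list(encoder.parameters()) + list(clf_head.parameters()), lr=1e-4)
s1 = torch.randint(0, 10000, (4, 8))
s2 = torch.randint(0, 10000, (4, 8))
labels = torch.randint(0, 2, (4,))
finetune_opt.zero_grad()
logits = clf_head(torch.cat([encoder(s1), encoder(s2)], dim=1))
nn.functional.cross_entropy(logits, labels).backward()
finetune_opt.step()
```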

Experimental results

The paper uses only one dataset, MSRP, the standard benchmark for the PI (paraphrase identification) task. The experimental results (table not reproduced here) can be summarized as follows.

As can be seen, CNN-LM pre-training has a significant effect: the pre-trained model performs very strongly (though still slightly worse than the full model proposed in the paper).

