Yin's paper proposes an architecture called Bi-CNN-MI. "Bi-CNN" indicates two CNN models arranged in a Siamese framework, and "MI" stands for multi-granularity interaction features. Bi-CNN-MI consists of three parts:
- Sentence analysis model (CNN-SM)
This part of the model mainly uses Kalchbrenner's 2014 model mentioned above to extract feature representations of the sentence itself at four granularities: word, short n-gram, long n-gram, and sentence. Multi-granularity feature representation improves the performance of the model on the one hand and enhances its robustness on the other (a minimal sketch of this part follows the list below).
- Sentence interaction calculation model (CNN-IM)
This part of the model is based on the RAE model proposed by Socher in 2011, with some simplification: pairwise comparison of the extracted features is only performed within the same granularity.
- An LR or Softmax network layer to fit the task
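Below is a minimal sketch of the CNN-SM idea, assuming PyTorch; the `SentenceModel` class, layer sizes, and kernel widths are illustrative stand-ins, not the authors' exact configuration (the real model follows Kalchbrenner's wide convolution and dynamic k-max pooling more closely).

```python
# Minimal sketch of CNN-SM: multi-granularity sentence features.
# Assumes PyTorch; dimensions and kernel widths are illustrative only.
import torch
import torch.nn as nn


class SentenceModel(nn.Module):
    def __init__(self, vocab_size=10000, emb_dim=48, n_filters=48):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)                    # word granularity
        self.conv_short = nn.Conv1d(emb_dim, n_filters, 3, padding=2)   # short n-grams
        self.conv_long = nn.Conv1d(n_filters, n_filters, 5, padding=4)  # long n-grams

    @staticmethod
    def kmax(x, k):
        # k-max pooling: keep the k largest activations per feature map,
        # preserving their original left-to-right order
        k = min(k, x.size(2))
        idx = x.topk(k, dim=2).indices.sort(dim=2).values
        return x.gather(2, idx)

    def forward(self, tokens):
        words = self.emb(tokens).transpose(1, 2)    # (batch, dim, len): word features
        short = torch.tanh(self.conv_short(words))  # short n-gram features
        short = self.kmax(short, k=6)
        long = torch.tanh(self.conv_long(short))    # long n-gram features
        sent = long.max(dim=2).values               # sentence-level feature vector
        return words, short, long, sent


tokens = torch.randint(0, 10000, (2, 12))           # a batch of two 12-token sentences
word_f, short_f, long_f, sent_f = SentenceModel()(tokens)
```

Each returned tensor corresponds to one granularity, which is what the interaction part below compares across the two sentences.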
Model structure
The model presented in this paper is mainly built on Kalchbrenner's model and Socher's RAE model, as shown in the model diagram. From the diagram we can see the main idea: on the one hand, Kalchbrenner's model is used to extract features at multiple granularities; on the other hand, following the idea of the RAE model, pairwise similarities between the extracted features are computed, and dynamic pooling is applied to the results to further extract a small number of features. The pooled results from the various levels are then concatenated into a single feature vector, which is fully connected to the LR (or Softmax) layer, adapting the model to the paraphrase detection task itself.
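As a rough illustration of this interaction step, the sketch below computes pairwise cosine-similarity matrices between the two sentences' features at each shared granularity, reduces each matrix to a fixed grid with adaptive pooling as a stand-in for dynamic pooling, and feeds the concatenated vector to a linear (LR/Softmax) layer. It reuses the hypothetical `SentenceModel` from the earlier sketch; the grid size and the choice of cosine similarity are assumptions, not the paper's exact setup.

```python
# Sketch of CNN-IM-style interaction: same-granularity pairwise similarity,
# reduced to fixed size by adaptive pooling as a stand-in for dynamic pooling.
import torch
import torch.nn as nn
import torch.nn.functional as F


def interaction_features(feats_a, feats_b, grid=4):
    pooled = []
    for fa, fb in zip(feats_a, feats_b):       # compare within the same granularity only
        # fa: (batch, dim, len_a), fb: (batch, dim, len_b) -> cosine similarity matrix
        sim = torch.einsum('bdi,bdj->bij',
                           F.normalize(fa, dim=1), F.normalize(fb, dim=1))
        # pool every variable-size similarity matrix down to a fixed grid x grid block
        pooled.append(F.adaptive_max_pool2d(sim.unsqueeze(1), grid).flatten(1))
    return torch.cat(pooled, dim=1)            # one flat interaction feature vector


model = SentenceModel()                        # from the sketch above
classifier = nn.Linear(3 * 4 * 4, 2)           # 3 granularities x 4x4 pooled grid -> 2 classes

toks_a = torch.randint(0, 10000, (2, 12))
toks_b = torch.randint(0, 10000, (2, 15))
wa, sa, la, _ = model(toks_a)
wb, sb, lb, _ = model(toks_b)
logits = classifier(interaction_features([wa, sa, la], [wb, sb, lb]))
```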
The detailed calculations of the model are not repeated here; interested readers can go directly to the paper. Besides the model structure itself, one of the highlights of the paper is the use of a language-model-like network, CNN-LM, to pre-train the CNN components described above and thereby determine the model parameters in advance. The CNN-LM network structure is shown in the figure below.
Training the CNN-LM requires a corpus; the authors use the final experimental data set, MSRP, and, because MSRP is small, add roughly 100,000 additional English sentences. From the CNN-LM the model obtains word embeddings, model weights, and other parameters. It is important to note that these parameters are not fixed: they continue to be updated during the subsequent sentence-matching task. As the experimental results below show, the effect of CNN-LM pre-training is significant.
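The following is a minimal sketch of this pre-train-then-fine-tune flow, again reusing the hypothetical `SentenceModel`; the language-model head (predicting a held-out word from the sentence vector) and the training-step function are illustrative assumptions, not the paper's exact CNN-LM.

```python
# Sketch of CNN-LM-style pre-training followed by fine-tuning on the matching task.
# The LM objective used here (predict each sentence's last word) is illustrative only.
import torch
import torch.nn as nn

lm_body = SentenceModel()                      # CNN stack to be pre-trained
lm_head = nn.Linear(48, 10000)                 # predicts a word id from the sentence vector
optimizer = torch.optim.Adam(list(lm_body.parameters()) + list(lm_head.parameters()))
loss_fn = nn.CrossEntropyLoss()


def lm_step(tokens):
    """One LM-style pre-training step: hide each sentence's last word and predict it."""
    context, target = tokens[:, :-1], tokens[:, -1]
    _, _, _, sent = lm_body(context)
    loss = loss_fn(lm_head(sent), target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()


# After pre-training on MSRP sentences plus the extra corpus, the matching model
# starts from the learned weights; they stay trainable and keep updating later.
matcher_body = SentenceModel()
matcher_body.load_state_dict(lm_body.state_dict())
for p in matcher_body.parameters():
    p.requires_grad = True                     # not frozen: fine-tuned during sentence matching
```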
Experimental results
The paper uses only one data set, MSRP, a well-known data set for the PI (paraphrase identification) task. The experimental results are as follows:
As can be seen, the effect of CNN-LM pre-training is significant, and the pre-trained model performs very strongly (although its result is slightly worse than the full model the author proposes).
Paper: Convolutional Neural Network for Paraphrase Identification