"Convolutional neural Network architectures for Matching Natural Language sentences"


Model Structure and Principle

1. CNN-Based Sentence Modeling

This paper focuses on the problem of sentence matching, but the underlying problem is still sentence modeling. The paper first proposes a CNN-based sentence modeling network, shown in the figure below:

[Figure: the CNN-based sentence model]

The gray parts of the figure indicate zero padding: for a short sentence, the positions beyond its length are filled with zeros. In other words, the model handles variable-length input by fixing a maximum sentence length and zero-padding any sentence shorter than that. The convolution in the figure is the standard CNN convolution, and the pooling is max-pooling.
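The following is a minimal sketch of this sentence model in PyTorch (an assumption: the paper does not prescribe a framework, the layer sizes here are illustrative, and the paper stacks several convolution/pooling layers while this sketch uses a single one):

```python
import torch
import torch.nn as nn

class SentenceCNN(nn.Module):
    """Minimal sketch of the CNN sentence model: embed, zero-pad to a
    fixed length, convolve with several feature maps, then max-pool."""

    def __init__(self, vocab_size, emb_dim=50, num_maps=200, window=3):
        super().__init__()
        # padding_idx=0 keeps zero-padded positions at the zero vector
        self.embed = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        # one-dimensional convolution over word positions (3-word window)
        self.conv = nn.Conv1d(emb_dim, num_maps, kernel_size=window)

    def forward(self, token_ids):        # token_ids: (batch, max_len), 0 = pad
        x = self.embed(token_ids)        # (batch, max_len, emb_dim)
        x = x.transpose(1, 2)            # Conv1d expects (batch, channels, len)
        x = torch.relu(self.conv(x))     # (batch, num_maps, max_len - window + 1)
        return x.max(dim=2).values       # max-pooling over time: (batch, num_maps)
```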

    • Analysis of convolution structure

The figure schematically illustrates the role of the convolution structure. The author's view is that convolution extracts local semantic combinations from the sentence, with the many feature maps extracting such combinations from different angles, ensuring diversity among the extracted semantic combinations; pooling then selects among these combinations, filtering out those with low confidence (which may be semantically meaningless).

2. CNN-Based Sentence Matching Models

Based on the sentence model above, the paper presents two models for matching two sentences.

2.1 Structure I

The model structure is as follows:

[Figure: Structure I]

Simply put, the two sentences are first modeled separately (using the sentence model above), yielding two fixed-length vectors of the same dimension that represent the abstract features of each sentence; these two vectors are then fed into a multilayer perceptron (MLP), which computes the final matching score.
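A sketch of Structure I under the same assumptions, reusing the SentenceCNN sketch above as the encoder (layer sizes illustrative):

```python
import torch
import torch.nn as nn

class ArcI(nn.Module):
    """Structure I sketch: two independently encoded sentences,
    concatenated and scored by an MLP."""

    def __init__(self, encoder, feat_dim=200, hidden=128):
        super().__init__()
        self.encoder = encoder           # e.g. the SentenceCNN sketch above
        self.mlp = nn.Sequential(
            nn.Linear(2 * feat_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),        # scalar matching score
        )

    def forward(self, sent_x, sent_y):
        hx = self.encoder(sent_x)        # (batch, feat_dim)
        hy = self.encoder(sent_y)        # note: no interaction until here
        return self.mlp(torch.cat([hx, hy], dim=1)).squeeze(1)
```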

This model is relatively simple, but it has a major drawback: the two sentences are modeled completely independently, with no interaction until their abstract vector representations are finally combined as input to the MLP. As a result, many semantic details are lost during abstraction, and the opportunity to model semantic interaction between the sentences is forfeited prematurely. Hence the second model structure.

2.2 Structure II

The model structure is as follows:

[Figure: Structure II]

As the figure shows, this structure moves the interaction between the two sentences to an earlier stage.

    • First convolutional layer

In the first layer, a fixed convolution window of size k1 is taken on each sentence, and the convolution traverses all pairwise combinations of windows from Sx and Sy; each window pair yields one value of the two-dimensional feature map that forms Layer-2. (The paper describes this as one-dimensional convolution, because it is in fact a convolution over the concatenated vectors of all the words in the two windows.) The formal mathematical expression follows:
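The formula did not survive extraction; the following is a reconstruction from the description above, using assumed notation (sigma is the activation, f indexes feature maps, and x_{i:i+k1-1} denotes the concatenation of the k1 word vectors of Sx starting at position i, likewise y for Sy):

```latex
z^{(1,f)}_{i,j} = \sigma\!\left( \mathbf{w}^{(1,f)} \cdot
    \left[\, \mathbf{x}_{i:i+k_1-1} \,;\, \mathbf{y}_{j:j+k_1-1} \,\right] + b^{(1,f)} \right)
```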

    • Max-pooling layer after the first convolutional layer

Layer-2 then undergoes 2x2 max-pooling:
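The expression for this step is also missing; a standard non-overlapping 2x2 max-pooling in the same assumed notation would be:

```latex
z^{(2,f)}_{i,j} = \max\left( z^{(1,f)}_{2i-1,\,2j-1},\; z^{(1,f)}_{2i-1,\,2j},\; z^{(1,f)}_{2i,\,2j-1},\; z^{(1,f)}_{2i,\,2j} \right)
```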

    • Subsequent convolution layers

The subsequent convolution layers are traditional two-dimensional convolution operations; the formal expression is as follows:
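The expression is again missing; a generic two-dimensional convolution layer in the same assumed notation, where \hat{z}^{(l-1)}_{i,j} denotes the k_l x k_l patch of layer l-1 feature maps at position (i, j):

```latex
z^{(l,f)}_{i,j} = \sigma\!\left( \mathbf{W}^{(l,f)} \cdot \hat{\mathbf{z}}^{(l-1)}_{i,j} + b^{(l,f)} \right)
```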

    • Pooling layer after two-dimensional convolution results

Unlike the simple max-pooling after the first layer, the pooling of the subsequent convolution layers is a dynamic pooling method, which derives from reference [1].
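A sketch of the idea behind dynamic pooling (pooling windows that scale with the input so the output grid is fixed), using PyTorch's AdaptiveMaxPool2d as a stand-in; the 4x4 output size is an illustrative choice, not taken from the paper:

```python
import torch
import torch.nn as nn

# Pool a variable-sized feature map down to a fixed 4x4 grid; the pooling
# windows are sized dynamically from the input dimensions.
dynamic_pool = nn.AdaptiveMaxPool2d(output_size=(4, 4))

fmap = torch.randn(1, 200, 13, 9)   # (batch, feature maps, H, W); H, W vary
fixed = dynamic_pool(fmap)          # always (1, 200, 4, 4)
```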

    • Properties of Structure II
    1. It preserves word-order information;
    2. It is more general: Structure I is in fact a special case of Structure II (obtained by setting certain weight parameters to zero).

Experimental Section

1. Model Training and Parameters

    • A ranking-based loss function is used for training (a minimal sketch follows this list);
    • Backpropagation with stochastic gradient descent;
    • Mini-batch size of 100-200, with parallelization;
    • To prevent overfitting, early stopping is used for the medium and large datasets, and dropout for the small datasets;
    • 50-dimensional word2vec embeddings; the English corpus is Wikipedia (~1B words), the Chinese corpus is Weibo data (~300M words);
    • ReLU is used as the activation function;
    • The convolution window spans 3 words;
    • Fine-tuning is used;
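A minimal sketch of the ranking-based loss, assuming the common max-margin triplet form (the matched pair y+ should outscore a sampled mismatch y- by a margin; the margin value and sampling scheme are assumptions):

```python
import torch

def ranking_loss(score_pos, score_neg, margin=1.0):
    """Max-margin ranking loss: hinge on the score gap between the
    matched pair and the mismatched pair."""
    return torch.clamp(margin + score_neg - score_pos, min=0).mean()

# Usage with the ArcI sketch above, where y_neg is a random mismatch:
# loss = ranking_loss(model(x, y_pos), model(x, y_neg))
```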

2. Experimental Results

Three experiments were conducted: (1) an automatic sentence completion task, (2) matching tweets with responses, and (3) synonymous sentence (paraphrase) identification. The results are as follows:

In fact, the results of Structure I and Structure II differ little, with Structure II slightly better; both hold a clear advantage over the other models.

"Convolutional neural Network architectures for Matching Natural Language sentences"

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.