Kalchbrenner N, Grefenstette E, Blunsom P. "A Convolutional Neural Network for Modelling Sentences"

Last Update:2018-05-26 Source: Internet

Author: User


Kalchbrenner's Paper

Kalchbrenner's paper is highly cited. It proposes a network model called DCNN (Dynamic Convolutional Neural Network), and the experimental results section of the previously discussed paper (Kim's paper) also confirmed the effectiveness of this model. The subtlety of the model lies in its pooling method, which is called dynamic pooling.

The figure shows how the model builds up the semantics of a sentence: the bottom layer combines information from adjacent words and passes it upward, and the upper layers in turn combine the resulting phrase information, so that even words far apart in the sentence interact (that is, have some semantic connection). Intuitively, the model extracts the important semantic information in a sentence through combinations of words (selected by pooling). In a sense, the hierarchical feature graph functions like a parse tree.

DCNN can handle variable-length input. The network consists of two types of layers: one-dimensional convolution layers and dynamic k-max pooling layers. Dynamic k-max pooling is a more general form of max pooling. LeCun previously defined the max pooling operation of a CNN as a non-linear subsampling method that returns the maximum of a set of values. In the original words:

The max pooling operator is a non-linear subsampling function that returns the maximum of a set of values (LeCun et al., 1998).

The k-max pooling in this paper generalizes max pooling in two ways:

The pooling result is not a single maximum value but the k largest values, which form a subsequence of the original input;
The pooling parameter k can be a dynamic function whose value depends on the input or on other parameters of the network.

Model Structure and principle

The network structure of DCNN is as follows:

The convolutional layers in the network use a method called wide convolution, and each is followed by a dynamic k-max pooling layer. The size of the feature maps output by the intermediate convolution layers varies with the length of the input sentence. The specifics of these operations are as follows:

1. Wide convolution

The feature map output by a wide convolution is wider than that of a traditional (narrow) convolution, because the convolution window does not have to lie entirely over input values: it may cover only part of the input, with the remaining positions treated as 0 (that is, zero padding). As shown in the following:

The graph on the right shows how the wide convolution is computed: when the first node $s_1$ is computed, it is as if $s_1$ were preceded by four nodes with input value 0 that also take part in the convolution (the convolution window is 5). Clearly, the output of the narrow convolution is a subset of the output of the wide convolution.
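
The relationship between the two convolutions can be checked with a minimal sketch in plain Python. The function names, the filter weights, and the input values are illustrative assumptions, not taken from the paper. The sketch treats one row of the sentence matrix as a list of numbers and implements the wide convolution as a narrow convolution over a zero-padded input.

```python
def narrow_conv1d(weights, inputs):
    """Narrow convolution: the window always lies fully inside the input."""
    m, s = len(weights), len(inputs)
    # Output j combines inputs[j .. j + m - 1]; the filter is applied in
    # reverse order, following the usual definition of convolution.
    return [
        sum(weights[i] * inputs[j + m - 1 - i] for i in range(m))
        for j in range(s - m + 1)
    ]


def wide_conv1d(weights, inputs):
    """Wide convolution: pad m - 1 zeros on each side, then convolve."""
    m = len(weights)
    padded = [0.0] * (m - 1) + list(inputs) + [0.0] * (m - 1)
    return narrow_conv1d(weights, padded)


weights = [1.0, 2.0, 3.0, 4.0, 5.0]              # window width m = 5 (made-up values)
sentence = [1.0, 0.0, 2.0, 0.0, 3.0, 0.0, 1.0]   # s = 7 values of one embedding row

narrow = narrow_conv1d(weights, sentence)   # length s - m + 1 = 3
wide = wide_conv1d(weights, sentence)       # length s + m - 1 = 11
```

With a window of 5, the first wide output sees four padded zeros plus the first input value, and the narrow output reappears unchanged as the middle slice `wide[4:7]`.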

2. K-max Pooling

Formally, given a value $k$ and a sequence $p \in \mathbb{R}^p$ (where $p \ge k$), k-max pooling selects the $k$ largest values in the sequence $p$, and these values keep the order they had in the original sequence (so the result is in fact a subsequence of the original sequence).

The advantage of k-max pooling is that it extracts several of the most important pieces of information in the sentence (more than one) while preserving their order (relative position). Also, since only $k$ values need to be extracted after the final convolution layer, the method allows inputs of different lengths (the input length should be greater than $k$). For the intermediate convolution layers, however, the pooling parameter $k$ is not fixed. How it is chosen is described below.
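
As a minimal sketch of the definition above, k-max pooling over a plain Python list could look like the following. The function name and the tie-breaking rule (earlier positions win among equal values) are assumptions made for this illustration.

```python
def k_max_pooling(sequence, k):
    """Return the k largest values of `sequence` in their original order."""
    if not 0 < k <= len(sequence):
        raise ValueError("k must satisfy 0 < k <= len(sequence)")
    # Rank positions by value (ties go to the earlier position, an assumption),
    # keep the top k positions, then restore left-to-right order.
    ranked = sorted(range(len(sequence)), key=lambda i: (-sequence[i], i))
    top_positions = sorted(ranked[:k])
    return [sequence[i] for i in top_positions]


pooled = k_max_pooling([3, 9, 1, 7, 2, 8], k=3)   # [9, 7, 8]
```

With `k=1` the function reduces to ordinary max pooling, which is the sense in which k-max pooling is the more general operation.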

3. Dynamic K-max Pooling

Dynamic k-max pooling is a k-max pooling operation in which $k$ is a function of two parameters, the length of the input sentence and the depth of the network:

$$k_l = \max\left(k_{top}, \left\lceil \frac{L - l}{L}\, s \right\rceil\right)$$

Here $l$ is the index of the current convolution layer (that is, which convolution layer it is), $L$ is the total number of convolution layers in the network, and $s$ is the length of the input sentence. $k_{top}$ is the pooling parameter $k$ of the topmost convolution layer, which is a fixed value. For example, suppose a network has three convolutional layers, $k_{top} = 3$, and the input sentence length is 18. Then the pooling parameter after the first convolution layer is $k_1 = 12$, after the second convolution layer it is $k_2 = 6$, and $k_3 = k_{top} = 3$.
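
The worked example can be reproduced with a short sketch. The function and argument names are hypothetical, and integer arithmetic is used for the ceiling to avoid floating-point rounding.

```python
def dynamic_k(layer, total_layers, sentence_length, k_top):
    """Pooling parameter k_l = max(k_top, ceil((L - l) / L * s))."""
    # Integer ceiling of (L - l) * s / L, written as negated floor division.
    scaled = -((-(total_layers - layer) * sentence_length) // total_layers)
    return max(k_top, scaled)


# Three convolution layers, k_top = 3, sentence length 18 (the example above).
ks = [dynamic_k(l, total_layers=3, sentence_length=18, k_top=3) for l in (1, 2, 3)]
# ks == [12, 6, 3]
```

For short sentences the `max` keeps every layer at no fewer than `k_top` values, so the top layer always receives a fixed-size input.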

The significance of dynamic k-max pooling is that it extracts a corresponding number of semantic features from sentences of different lengths, which keeps the input to the subsequent convolution layers consistent.

4. Nonlinear characteristic function

As in a traditional CNN, between the pooling layer and the next convolution layer the pooled values are combined with weight parameters and a bias parameter, and a nonlinear function is applied to the result.

5. Multiple feature maps

As in a traditional CNN, multiple feature maps are used to ensure the diversity of the extracted features.

6. Folding operation (folding)

The wide convolution described earlier operates on each row of the $d \times s$ input matrix separately, where $d$ is the dimension of the word vectors and $s$ is the number of words in the input sentence. The folding operation takes into account the link between two adjacent rows. It is very simple: the two row vectors are added together. The operation adds no parameters, yet it lets the model consider dependencies between rows of the feature matrix earlier (before the final fully connected layer).
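
A minimal sketch of folding on a feature map stored as a list of rows follows. It assumes that rows are summed in non-overlapping pairs (rows 0 and 1, rows 2 and 3, and so on), so a map with d rows becomes one with d / 2 rows. The text above only says that adjacent rows are added, so this pairing and the function name are assumptions of the example.

```python
def fold(feature_map):
    """Sum each non-overlapping pair of adjacent rows component-wise."""
    if len(feature_map) % 2 != 0:
        raise ValueError("folding needs an even number of rows")
    return [
        [a + b for a, b in zip(feature_map[r], feature_map[r + 1])]
        for r in range(0, len(feature_map), 2)
    ]


# d = 4 rows (embedding dimensions), s = 3 columns (positions); made-up values.
feature_map = [
    [1, 2, 3],
    [10, 20, 30],
    [4, 5, 6],
    [40, 50, 60],
]
folded = fold(feature_map)   # [[11, 22, 33], [44, 55, 66]]
```

The function contains no weights, which matches the point that folding introduces no additional parameters.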

Characteristics of the Model

It preserves the order of the words in the sentence and their relative positions;
Wide convolution extends the traditional convolution and, in a sense, also extends n-grams;
The model requires no prior knowledge, such as a syntactic dependency tree, and it takes into account the semantic relations between words that are far apart in the sentence;

Experimental section

1. Model Training and parameters

The output layer is a probability distribution over categories (i.e. softmax), connected to the second-to-last layer;
The cost function is the cross entropy, and the training goal is to minimize it;
L2 regularization is used;
Optimization method: mini-batches with gradient-based optimization (using the Adagrad update rule, Duchi et al., 2011).
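
The pieces listed above can be illustrated on a toy problem. The sketch below is not the paper's training code: the model is a hypothetical three-class classifier whose weights are used directly as class scores, the learning rate and L2 coefficient are made-up values, and mini-batching is omitted. It only shows how a softmax output, a cross-entropy cost with an L2 penalty, and the Adagrad update fit together.

```python
import math

L2 = 1e-4   # L2 regularization strength (assumed value)


def softmax(scores):
    peak = max(scores)                        # subtract the max for stability
    exps = [math.exp(x - peak) for x in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cost(scores, label, weights):
    """Cross entropy of the true class plus an L2 penalty on the weights."""
    probs = softmax(scores)
    return -math.log(probs[label]) + L2 * sum(w * w for w in weights)


def adagrad_step(weights, grads, accum, lr=0.1, eps=1e-8):
    """One in-place Adagrad update; `accum` sums the squared gradients."""
    for i, g in enumerate(grads):
        accum[i] += g * g
        weights[i] -= lr * g / (math.sqrt(accum[i]) + eps)


weights = [0.0, 0.0, 0.0]   # toy model: the weights are the class scores
accum = [0.0, 0.0, 0.0]
label = 0                   # index of the correct class
initial_cost = cost(weights, label, weights)
for _ in range(50):
    probs = softmax(weights)
    # Gradient of cross entropy w.r.t. the scores, plus the L2 term.
    grads = [
        p - (1.0 if c == label else 0.0) + 2 * L2 * w
        for c, (p, w) in enumerate(zip(probs, weights))
    ]
    adagrad_step(weights, grads, accum)
final_cost = cost(weights, label, weights)
```

Because Adagrad divides each step by the root of the accumulated squared gradients, parameters with a history of large gradients take smaller steps over time.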

2. Experimental results

Experiments were conducted on three datasets: (1) sentiment prediction on a movie review dataset, (2) TREC question classification, and (3) sentiment prediction on a Twitter dataset. The results are as follows:

As can be seen, DCNN performs very well and is hardly inferior to traditional models. Moreover, the advantage of DCNN is that it needs no prior information as input and does not require constructing very complex hand-crafted features.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
