Comparison of ConvLSTM in two papers


"This is an analysis of the changed network model, the other writing is not comprehensive" 1, "deep learning approach for sentiment analyses of short texts"

Learning long-term dependencies with gradient descent is difficult in a neural network language model because of the vanishing gradient problem.

In our experiments, ConvLSTM exploits an LSTM as a substitute for the pooling layer in a CNN, to reduce the loss of detailed local information and to capture long-term dependencies in sequences of sentences.

Datasets: IMDB and the Stanford Sentiment Treebank (SSTB).

Experimental results:
Empirical results show that ConvLSTM achieves comparable performance with fewer parameters on sentiment analysis tasks.

General Practice:
Recent work by [10] consists of multiple layers of CNN and max pooling, similar to the architecture proposed by [1] in computer vision.

The output is flattened to form a vector, which feeds into a small number of fully connected layers followed by a classification layer.
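For contrast with the ConvLSTM idea discussed below, here is a minimal sketch of this "general practice" pipeline (stacked convolution and max pooling, flattened into fully connected layers and a classifier). All sizes, and the two-conv-layer depth, are illustrative assumptions rather than values from either paper.

```python
import torch
import torch.nn as nn

class CnnMaxPoolClassifier(nn.Module):
    """Conventional CNN text classifier: conv -> max pool -> flatten -> FC -> logits.
    All sizes here are illustrative assumptions."""
    def __init__(self, vocab_size=20000, embed_dim=300, num_filters=128,
                 seq_len=400, num_classes=2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.conv1 = nn.Conv1d(embed_dim, num_filters, kernel_size=3, padding=1)
        self.conv2 = nn.Conv1d(num_filters, num_filters, kernel_size=3, padding=1)
        self.pool = nn.MaxPool1d(kernel_size=2)
        # after two pool-by-2 stages the sequence length shrinks by a factor of 4
        self.fc = nn.Linear(num_filters * (seq_len // 4), num_classes)

    def forward(self, tokens):               # tokens: (batch, seq_len)
        x = self.embedding(tokens)           # (batch, seq_len, embed_dim)
        x = x.transpose(1, 2)                # Conv1d expects (batch, channels, seq_len)
        x = self.pool(torch.relu(self.conv1(x)))
        x = self.pool(torch.relu(self.conv2(x)))
        x = x.flatten(1)                     # flattened to form a vector per example
        return self.fc(x)                    # classification layer (logits)

# Usage (illustrative): logits = CnnMaxPoolClassifier()(torch.randint(0, 20000, (8, 400)))
```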

Deficiencies of the general approach:

We observed that employing convolution to capture long-term dependencies requires many layers, because of the locality of the convolution and pooling operations.

As the length of the input grows, this becomes crucial.

It was confirmed that an RNN is able to capture long-term dependencies even with a single layer.

Our work was also inspired by the fact that recurrent layers are able to capture long-term dependencies with a single layer [14].

The role of the model is:
In our model, we utilize a recurrent LSTM layer as a substitute for the pooling layer in order to reduce the loss of detailed local information and to capture long-term dependencies. Surprisingly, our model achieved comparable results on two sentiment analysis benchmarks with a smaller number of parameters. We show that it is possible to use a much smaller model to achieve the same level of classification performance when a recurrent layer is combined with a convolutional layer.
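Taken literally, the wiring is a convolutional layer whose output feeds an LSTM where the pooling layer would normally sit. A minimal sketch of that idea, with illustrative sizes (the filter widths and dimensions the notes actually report appear later):

```python
import torch
import torch.nn as nn

class ConvLSTMClassifier(nn.Module):
    """Convolution extracts local features; an LSTM sits where pooling would be,
    keeping the full feature-map sequence. Sizes are illustrative assumptions."""
    def __init__(self, vocab_size=20000, embed_dim=300,
                 num_filters=256, hidden_dim=128, num_classes=2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.conv = nn.Conv1d(embed_dim, num_filters, kernel_size=3, padding=1)
        # LSTM replaces the pooling layer: it reads the whole feature-map
        # sequence instead of subsampling it.
        self.lstm = nn.LSTM(num_filters, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, num_classes)

    def forward(self, tokens):                       # tokens: (batch, seq_len)
        x = self.embedding(tokens).transpose(1, 2)   # (batch, embed_dim, seq_len)
        x = torch.relu(self.conv(x))                 # (batch, num_filters, seq_len)
        x = x.transpose(1, 2)                        # (batch, seq_len, num_filters)
        _, (h_n, _) = self.lstm(x)                   # h_n: (1, batch, hidden_dim)
        return self.fc(h_n[-1])                      # logits from the last hidden state

# Usage (illustrative): logits = ConvLSTMClassifier()(torch.randint(0, 20000, (8, 60)))
```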

Network structure:

The convolutional layer can extract high-level features from the input sequence efficiently.

The recurrent LSTM layer has the ability to remember important information across long stretches of time and is able to capture long-term dependencies.

Our work was inspired by [3, 10, 12].

Then we employ a recurrent layer as an alternative to pooling layers to efficiently capture long-term dependencies for text classification tasks. Our model achieved competitive results on multiple benchmarks with a smaller number of parameters.

A recurrent model is biased, because recent inputs carry more weight than earlier ones, while the key components may appear anywhere in the sequence, not necessarily in the first few positions and not just at the end. The LSTM model was introduced to overcome these difficulties.

The LSTM is a more complex unit whose gates control the flow of information, prevent gradients from vanishing, and allow the recurrent layer to capture long-term dependencies more easily.
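For reference, the standard LSTM gating equations in common textbook notation (not copied from either paper): the input, forget, and output gates decide what enters the cell state, what is kept, and what is exposed as the hidden state.

```latex
\begin{aligned}
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i)        &&\text{(input gate)}\\
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f)        &&\text{(forget gate)}\\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o)        &&\text{(output gate)}\\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) &&\text{(candidate cell state)}\\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t  &&\text{(cell state update)}\\
h_t &= o_t \odot \tanh(c_t)                       &&\text{(hidden state)}
\end{aligned}
```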

Feature maps are formed with the CNN and then fed into a layer of LSTM.

We devoted extra time to tuning the learning rate, the dropout rate, and the number of units in the convolutional layer, since these hyper-parameters have a large impact on prediction performance.

The number of epochs varies between 5 and 10 for both datasets.

We believe that adding a recurrent layer as a substitute for the pooling layer can effectively reduce the number of convolutional layers needed in the model in order to capture long-term dependencies.

Network structure:
Therefore, we combine a convolutional layer and a recurrent layer in one single model, ConvLSTM, with multiple filter widths (3, 4, 5) and feature maps = [unspecified]. For the activation function in the convolutional layer we used rectified linear units (ReLUs) for nonlinearity, and padding is set to zero.
All elements that would fall outside the matrix are taken to be zero.
To reduce overfitting, we applied dropout of 0.5 only before the recurrent layer.
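One possible realization of this configuration (parallel filter widths 3, 4, and 5 with ReLU, zero padding, and dropout before the recurrent layer) is sketched below. The number of feature maps per width is an assumed value, since it is left blank above.

```python
import torch
import torch.nn as nn

class MultiWidthConvFrontEnd(nn.Module):
    """Parallel convolutions with filter widths 3, 4, 5, ReLU and zero padding,
    followed by dropout before the recurrent layer (as stated above).
    feature_maps is an assumed value; the notes leave it unspecified."""
    def __init__(self, embed_dim=300, feature_maps=128, widths=(3, 4, 5), dropout=0.5):
        super().__init__()
        # zero padding of w // 2 keeps the sequence length for odd widths;
        # width 4 yields one extra step, which is trimmed below.
        self.convs = nn.ModuleList(
            [nn.Conv1d(embed_dim, feature_maps, kernel_size=w, padding=w // 2)
             for w in widths]
        )
        self.dropout = nn.Dropout(dropout)

    def forward(self, x):                     # x: (batch, seq_len, embed_dim)
        seq_len = x.size(1)
        x = x.transpose(1, 2)                 # (batch, embed_dim, seq_len)
        feats = [torch.relu(conv(x))[:, :, :seq_len] for conv in self.convs]
        out = torch.cat(feats, dim=1)         # (batch, 3 * feature_maps, seq_len)
        # dropout before the recurrent layer; output is ready for an LSTM
        return self.dropout(out.transpose(1, 2))
```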

The main contribution of our work is to exploit recurrent layers as substitutes for the pooling layer.
A key aspect of CNNs is the pooling layer, which is typically applied after the convolutional layers to subsample their input; a max operation is the most common pooling operation.

Our LSTM has input, forget, and output gates; the hidden state dimension is set to 128.

In our model we set the number of filters in the convolutional layer to twice the dimension of the hidden states in the recurrent layer, which adds 3%-6% relative performance.
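Under that reading, the two hyper-parameters are tied together as in the hypothetical snippet below; whether the 2x count is per filter width or in total is not stated, so a single convolution is used for simplicity, and the word-vector dimension is an assumption.

```python
import torch.nn as nn

hidden_dim = 128              # LSTM hidden state size stated above
num_filters = 2 * hidden_dim  # convolutional filters set to 2x the hidden dimension (256)

# The LSTM input size has to match the convolutional output channels,
# so the 2x ratio fixes both sides of the interface (300 = assumed word-vector dim).
conv = nn.Conv1d(in_channels=300, out_channels=num_filters, kernel_size=3, padding=1)
lstm = nn.LSTM(input_size=num_filters, hidden_size=hidden_dim, batch_first=True)
```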

Dropout is an effective way to regularize deep neural networks [3]; it prevents co-adaptation of hidden units by randomly dropping units out during training.
We observed that applying dropout both before and after the recurrent layer decreases the performance of the model by 2%-5%; therefore, we only apply dropout after the recurrent layer, and we set the dropout rate to 0.5.


We train the model by minimizing the negative log-likelihood (cross-entropy) loss.

The gradients of the cost function are computed with backpropagation through time (BPTT).
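A training step that minimizes the cross-entropy loss and lets autograd handle BPTT through the recurrent layer might look like the sketch below; the optimizer and learning rate are assumptions, since the notes only say they were tuned.

```python
import torch
import torch.nn as nn

def train_epoch(model, loader, optimizer, device="cpu"):
    """One epoch of training: cross-entropy (negative log-likelihood over softmax
    outputs) minimized with backpropagation; for the LSTM part the gradients flow
    back through time (BPTT), which autograd handles automatically."""
    criterion = nn.CrossEntropyLoss()
    model.train()
    for tokens, labels in loader:          # tokens: (batch, seq_len), labels: (batch,)
        tokens, labels = tokens.to(device), labels.to(device)
        optimizer.zero_grad()
        logits = model(tokens)
        loss = criterion(logits, labels)
        loss.backward()                    # gradients via BPTT through the recurrent layer
        optimizer.step()

# Usage (illustrative): optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
```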

The accuracy of the model does not increase with the number of convolutional layers; one layer is enough to reach the model's peak [3], and more pooling layers mostly lead to the loss of long-term dependencies [12].
Thus, in our model we dropped the pooling layer in the convolutional network and replaced it with a single LSTM layer to reduce the loss of local information; a single recurrent layer is enough to capture long-term dependencies in the model.

We perform several experiments to offer a fair comparison with recently presented deep learning and traditional methods, as shown in Tables II and III. For the IMDB dataset, the previous baselines are bag-of-words [41] and Paragraph Vectors [39].

Our ConvLSTM model achieves comparable performance with significantly fewer parameters. We achieved better results compared with convolution-only models, which likely lose detailed local features because of the number of pooling layers. We assume the proposed model is more compact because of its small number of parameters and less disposed to overfitting; hence, it generalizes better when the training set is limited. It is possible to increase the number of filters in the convolutional layer without changing the dimension of the recurrent layer, which potentially increases performance by 2%-4% without sacrificing the number of parameters. We observed that many factors affect the performance of deep learning methods, such as dataset size and vanishing and exploding gradients; choosing the best feature extractors and classifiers is still an open problem. However, there is no specific model that fits all types of datasets.

In this paper, we proposed a neural language model to overcome the shortcomings of traditional and deep learning methods. We propose combining the convolutional and recurrent layers into a single model on top of pre-trained word vectors, to capture long-term dependencies in short texts more efficiently. We validated the proposed model on the SSTB and IMDB datasets.
We achieved comparable results with fewer convolutional layers compared to the convolution-only architecture, and our results confirm that unsupervised pre-training of word vectors is a significant feature in deep learning for NLP. Using LSTM as an alternative to the pooling layers in CNN also enhances the model's ability to capture long-term dependencies.
It would be worthwhile in the future to apply this architecture to other NLP applications such as spam filtering and web search. Using other variants of recurrent neural networks as substitutes for pooling layers is also an area worth exploring.
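The "on top of pre-trained word vectors" part could be wired up as in the sketch below; `pretrained` stands in for whatever embedding matrix (e.g., word2vec or GloVe) is actually used and is a random placeholder here.

```python
import torch
import torch.nn as nn

# pretrained: a (vocab_size, embed_dim) float tensor of word vectors loaded
# elsewhere (e.g., word2vec or GloVe); random values used as a placeholder.
pretrained = torch.randn(20000, 300)

# freeze=False lets the vectors be fine-tuned during training;
# freeze=True would keep them fixed.
embedding = nn.Embedding.from_pretrained(pretrained, freeze=False)

tokens = torch.randint(0, 20000, (8, 50))   # a batch of 8 short texts, 50 tokens each
vectors = embedding(tokens)                 # (8, 50, 300), fed to the convolutional layer
```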
2. "the"
