Paper "Recurrent convolutional neural Networks for Text Classification" summary

Source: Internet
Author: User
Tags: svm
"Recurrent convolutional neural Networks for Text classification" Paper Source: Lai, S., Xu, L., Liu, K., & Zhao, J. (2015, January). Recurrent convolutional neural Networks for Text classification. In Aaai (vol. 333, pp. 2267-2273).

Original link: http://blog.csdn.net/rxt2012kc/article/details/73742362

1. Abstract

Text categorization is an important foundational task in NLP. Traditional text categorization requires manual feature engineering, whereas deep learning can extract features automatically without human participation. In this paper, a recurrent structure captures context with less noise than a convolutional neural network, and a max-pooling layer selects the most important features of the sentence.

2. Introduction

Text categorization is a very important component of many applications, such as web search, information filtering, and sentiment analysis.

Feature representation: bag-of-words, where unigrams, bigrams, n-grams, or some exquisitely designed patterns are commonly extracted as features. Several feature selection methods exist: frequency, MI (mutual information), pLSA, LDA. Traditional feature representation methods often ignore contextual information, word order, and semantic information. Higher-order n-grams and tree kernels have been applied in feature representation, but they suffer from data sparsity, which hurts accuracy. Word embeddings: word2vec can capture more syntactic and semantic features.
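The bag-of-words representation and frequency-based feature selection described above can be sketched in a few lines of pure Python (the example document and the frequency threshold of 2 are made-up illustrative values):

```python
from collections import Counter

def ngram_features(tokens, n=1):
    """Count unigram (n=1) or bigram (n=2) features for one document."""
    grams = [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return Counter(grams)

doc = "the movie was good and the acting was good".split()
unigrams = ngram_features(doc, n=1)
bigrams = ngram_features(doc, n=2)

# Simple frequency-based feature selection: keep features seen at least twice.
selected = {g: c for g, c in unigrams.items() if c >= 2}
```

Note how the representation discards word order entirely: "the acting was good" and "good was the acting" produce identical unigram counts, which is exactly the limitation the paper's recurrent structure addresses.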

Recursive neural network: its effectiveness depends entirely on the construction of the textual tree, and building that tree takes O(n^2) time. Moreover, the relationship between two sentences can hardly be represented by a tree. It is therefore unsuitable for long sentences or documents.

Recurrent neural network
Advantage: captures contextual information. Disadvantage: it is a biased model, in which later words occupy greater importance than earlier words. This is problematic because any word can be the key word: the bias could reduce effectiveness when capturing the semantics of a whole document, since key components can appear anywhere in a document rather than only at the end.

Convolutional neural network (CNN)
Advantage: it is an unbiased model that can obtain the most important features through max-pooling. Thus, a CNN may capture the semantics of texts better than recursive or recurrent neural networks. Time complexity: O(n). Disadvantage: the convolution window size is fixed; if it is chosen too small, information is easily lost, and if it is chosen too large, the parameter space becomes huge. This raises the question: can we learn more contextual information than conventional window-based neural networks and represent the semantics of texts more precisely for text classification?

To address the shortcomings of the models above, the paper presents the Recurrent Convolutional Neural Network (RCNN).

Bi-directional recurrent structure: it introduces less noise than a traditional window-based neural network and captures contextual information to the greatest extent possible.

We apply a bi-directional recurrent structure, which may introduce considerably less noise compared to a traditional window-based neural network, to capture the contextual information to the greatest extent possible when learning word representations. Moreover, the model can reserve a larger range of the word ordering when learning representations of texts.

Max-pooling layer: automatically decides which features play the more important role.

We employ a max-pooling layer that automatically judges which features play key roles in text classification, to capture the key components in the texts. Time complexity: O(n). The model is compared with the current best models and achieves remarkable results in testing.

3. Related work
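The max-pooling step is just an element-wise maximum over the per-word latent vectors, which yields a fixed-size representation regardless of document length. A minimal sketch, with made-up three-word, four-dimensional values:

```python
def max_pool(vectors):
    """Element-wise max over per-word latent vectors: variable-length
    input, fixed-size output."""
    return [max(col) for col in zip(*vectors)]

# Three words, each with a 4-dimensional latent representation y2_i.
y2 = [
    [0.1, 0.9, -0.3, 0.2],
    [0.5, 0.1,  0.4, -0.8],
    [-0.2, 0.3, 0.7, 0.6],
]
y3 = max_pool(y2)  # [0.5, 0.9, 0.7, 0.6]
```

Each output dimension keeps the strongest activation across all word positions, which is why a key component can be picked up wherever it appears in the document.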

Text classification: traditional text categorization focuses on three topics: feature engineering, feature selection, and the choice of machine learning model.

Feature engineering: the most widely used feature is bag-of-words.

For feature engineering, the most widely used feature is the bag-of-words feature. In addition, some more complex features have been designed, such as part-of-speech tags, noun phrases (Lewis 1992), and tree kernels (Post and Bergsma 2013).

Feature selection: removing noisy features, for example by removing stop words, or by using information gain or L1 regularization.

Feature selection aims at deleting noisy features and improving classification performance. The most common feature selection method is removing the stop words. Advanced approaches use information gain, mutual information (Cover and Thomas), or L1 regularization (Ng) to select useful features.

Machine learning models: LR, naive Bayes, SVM.

Machine learning approaches often use classifiers such as logistic regression (LR), naive Bayes (NB), and support vector machines (SVM). However, these methods suffer from the data sparsity problem.

Research on deep learning networks and word vectors has alleviated the data sparsity problem.

Word vectors allow us to measure the similarity of two vectors in order to characterize the similarity between the two corresponding words.

With pre-trained word embeddings, neural networks demonstrate great performance in many NLP tasks. Socher et al. (2011b) use semi-supervised recursive autoencoders to predict the sentiment of a sentence. Socher et al. (2011a) proposed a model for paraphrase detection, also based on a recursive neural network. Socher et al. (2013) introduced the recursive neural tensor network to analyse the sentiment of phrases and sentences. Mikolov uses recurrent neural networks to build language models. Kalchbrenner and Blunsom (2013) proposed a novel recurrent network for dialogue act classification. Collobert et al. introduce a convolutional neural network for semantic role labeling.

4. The model


As shown in Figure 1, the input first passes through a one-layer bi-directional recurrent structure: scanning forward over the words to the left of each word yields its left-context vector, and scanning backward over the words to its right yields its right-context vector. Concatenating these with the word's own embedding produces a 1 × 3k vector for each word.
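Using the paper's notation, each word is represented as x_i = [c_l(w_i); e(w_i); c_r(w_i)]. The sketch below is a toy, pure-Python version: the fixed 0.5-weighted update stands in for the paper's learned recurrent weights (W_l, W_sl and their right-side counterparts), and the embeddings are made-up values.

```python
def left_contexts(embeddings, k):
    """c_l(w_i): a running summary of all words to the left of w_i.
    The 0.5-weighted update is a stand-in for the learned recurrence."""
    c = [0.0] * k  # the first word has an empty left context
    out = []
    for e in embeddings:
        out.append(c)
        c = [0.5 * ci + 0.5 * ei for ci, ei in zip(c, e)]
    return out

def right_contexts(embeddings, k):
    """c_r(w_i): same recurrence run over the words to the right."""
    return list(reversed(left_contexts(list(reversed(embeddings)), k)))

k = 2
emb = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy embeddings e(w_i)
x = [cl + e + cr
     for cl, e, cr in zip(left_contexts(emb, k), emb, right_contexts(emb, k))]
# each x_i = [c_l(w_i); e(w_i); c_r(w_i)] has dimension 3k
```

Because the contexts are built by scanning the whole sequence, each x_i can depend on words arbitrarily far away, unlike a fixed-size convolution window.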

This vector then passes through a fully connected layer with tanh as the non-linear activation, yielding y2.
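This step computes y2_i = tanh(W2 · x_i + b2) for each word. A toy sketch with made-up weights, projecting a 3k = 6 dimensional x_i down to two latent features:

```python
import math

def dense_tanh(x, W, b):
    """y2_i = tanh(W2 . x_i + b2): per-word latent semantic vector."""
    return [math.tanh(sum(wj * xj for wj, xj in zip(row, x)) + bi)
            for row, bi in zip(W, b)]

# Illustrative weights and bias (in the paper these are learned).
W = [[0.1, 0.0, 0.2, 0.0, 0.1, 0.0],
     [0.0, 0.3, 0.0, 0.1, 0.0, 0.2]]
b = [0.0, 0.1]
x_i = [0.0, 0.0, 1.0, 0.0, 0.5, 0.5]  # one word's [c_l; e; c_r] vector
y2_i = dense_tanh(x_i, W, b)
```

Every component of y2_i lies in (-1, 1) because of the tanh squashing, which keeps the subsequent max-pooling well behaved.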

A max-pooling layer then takes the element-wise maximum over all the y2 vectors, producing the fixed-size vector y3.

Finally, another fully connected layer with a softmax function produces the multi-class prediction.

5. Experimental data sets
20Newsgroups1: this dataset contains messages from twenty newsgroups. We use the bydate version and select four major categories (comp, politics, rec, and religion), following Hingmire et al. (2013).

Fudan set2: the Fudan University document classification set is a Chinese document classification set consisting of several classes, including art, education, and energy.

ACL Anthology Network3: this dataset contains scientific documents published by the ACL and related organizations. It is annotated by Post and Bergsma (2013) with the five most common native languages of the authors: English, Japanese, German, Chinese, and French.

Stanford Sentiment Treebank4: this dataset contains movie reviews parsed and labeled by Socher et al. (2013). The labels are Very Negative, Negative, Neutral, Positive, and Very Positive.

Experimental results
Conclusions

Our model captures contextual information better than the other models, and therefore performs better.
