The Application of Deep Learning in Text Classification

Introduction

Text classification is a very common and widely applied task in NLP, with a large body of existing work: widely used rule-based classifiers, SVM classifiers, the naïve Bayes method, maximum entropy classifiers, classification methods built on conditional random fields or on dependency trees, and of course the common BP neural network classifier. In the traditional bag-of-words model, converting text into a text vector produces very high-dimensional representations, and some classification methods therefore compress the dimensionality. However, because all of the above methods lose word-order information during training, their results on text classification are not always satisfactory. This article gives a brief summary and review of the methods in several deep learning papers on text classification [1,2,3,4,5] and the blog post [6].

Background

The deep learning content covered in this post mainly involves RNNs and CNNs. Papers [1,2,3,4,5] are chiefly concerned with using CNNs for text modeling and classification, while papers [3] and [4] also use RNNs to train text vectors. For simplicity of description, I use "deep learning" to refer collectively to the classification methods discussed in this article.
The main reason CNNs can be widely used in text classification is simple: a CNN is similar to an n-gram model, and the filter window in a CNN can in fact be seen as a kind of n-gram. But because a CNN uses convolution layers and pooling layers, it can both reduce the number of training parameters and extract higher-level information from the text. RNNs, by contrast, are used more for text modeling and machine translation, and do not seem to appear very often as direct text classifiers.

CNN for Text Classification

Here we use papers [1,2] to illustrate the application of CNNs in text classification. The two papers were published very close to each other, both in 2014.

First, take a look at the specific method of paper [1] (Convolutional Neural Networks for Sentence Classification).

Look at the author's CNN structure:

An explanation of the figure above:
The leftmost layer is the input layer, which contains two channels; each channel is a two-dimensional matrix. The number of rows of the matrix equals the length of the sentence (that is, the number of words; every sentence to be classified is padded to the same length), and each row vector is the vector form of one word. The author initializes these vectors with the word2vec tool, which means every word already has an embedding. The two channels are identical at initialization; two are used because they serve different purposes. One channel is static: once the embeddings are given, their values never change. The other is non-static: its embedding vectors are parameters of the model and are also updated during backpropagation. The motivation for using two channels is twofold: first, if only the static channel is used, the corpus used to train word2vec may be inconsistent with the experimental corpus, resulting in embedding bias; second, if only the non-static channel is used, its initialization affects both the result and the speed of convergence. Using the mixed channels therefore balances out both problems.
After the input layer comes the convolution layer. In the figure above, the topmost filter has shape 3*6; for the sentence "wait for the video and do n't rent it", this filter performs one convolution operation over every three consecutive words. For this sentence of length 9, the filter produces a 7*1 output (9 - 3 + 1 = 7). Of course, the number of filters in the convolution layer and their shapes can be varied; the principle stays the same.
The next layer is the pooling layer. This paper uses max-pooling: the 7*1 convolution output above is pooled into a single 1*1 value, so n filters produce n 1*1 values, which are then passed to the subsequent fully connected layer.
This is followed by a fully connected output layer whose number of outputs equals the number of text categories; the n pooling outputs are all connected to it, and the output layer uses the softmax activation function. A code sketch of the whole architecture is given below.
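To make the structure concrete, here is a minimal sketch of this architecture using the Keras API of TensorFlow. It is an illustration, not the paper's exact implementation: the vocabulary size, embedding dimension, sentence length, and filter counts are assumed values, and the embeddings are randomly initialized here rather than loaded from word2vec.

```python
import tensorflow as tf
from tensorflow.keras import layers

vocab_size = 20000   # assumed vocabulary size
embed_dim = 128      # assumed embedding dimension
seq_len = 56         # all sentences padded to the same assumed length
num_classes = 2

inputs = layers.Input(shape=(seq_len,), dtype="int32")

# Two channels, identical in shape. In the paper both are initialized from
# word2vec; here they are randomly initialized for simplicity.
static_ch = layers.Embedding(vocab_size, embed_dim, trainable=False)(inputs)    # static: frozen
nonstatic_ch = layers.Embedding(vocab_size, embed_dim, trainable=True)(inputs)  # non-static: updated by backprop

x = layers.Concatenate(axis=-1)([static_ch, nonstatic_ch])  # stack the two channels

# Convolution + max-pooling: each filter spans `width` consecutive words and
# its feature map is max-pooled down to a single value.
pooled = []
for width in (3, 4, 5):
    conv = layers.Conv1D(100, width, activation="relu")(x)
    pooled.append(layers.GlobalMaxPooling1D()(conv))

features = layers.Dropout(0.5)(layers.Concatenate()(pooled))
outputs = layers.Dense(num_classes, activation="softmax")(features)  # softmax output layer

model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```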

As you can see from the description above, the idea of using a CNN for classification is very clear, and it is not difficult to implement. I will not go into parameter training here; the code and results are given in the Experiment section below. Now let us look at paper [2] (Effective Use of Word Order for Text Categorization with Convolutional Neural Networks).

Discussion of the CNN Classification Method in Paper [2]
With the foundation above, understanding the viewpoint of paper [2] becomes easy. The text vector preprocessing in paper [2] is somewhat rougher: it directly uses the one-hot model, but with some improvements. The main difference lies in the representation of the word vectors. The author first uses one-hot word vectors directly, in what is called the seq-CNN model, which obviously brings a dramatic increase in dimensionality. The author then proposes an improved variant, the bow-CNN model, which merges several neighboring consecutive words into a single vector. The difference is illustrated below:

SEQ-CNN model

BOW-CNN model
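Since the original figures are not reproduced here, the following toy sketch (my own illustration with an assumed four-word vocabulary, not the paper's code) shows how the two region representations differ:

```python
import numpy as np

vocab = ["do", "it", "n't", "rent"]          # assumed toy vocabulary
V = len(vocab)
index = {w: i for i, w in enumerate(vocab)}

def one_hot(word):
    v = np.zeros(V)
    v[index[word]] = 1.0
    return v

region = ["do", "n't", "rent"]               # a region of 3 consecutive words

# seq-CNN: concatenate the one-hot vectors, preserving word order (dim 3|V|)
seq_region = np.concatenate([one_hot(w) for w in region])

# bow-CNN: merge the same one-hot vectors into one bag-of-words vector (dim |V|)
bow_region = np.clip(sum(one_hot(w) for w in region), 0, 1)

print(seq_region)   # [1 0 0 0  0 0 1 0  0 0 0 1] -> order preserved
print(bow_region)   # [1 0 1 1] -> order lost, dimensionality reduced
```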
The rest of the training process is similar to paper [1], so it is not repeated here.

Using CNN and RNN for Text Vector Training
The thesis "4" The view is quite unique, the author does not use CNN or RNN to do the classification model, but uses the CNN and the RNN to train the text the vector, finally is uses the ordinary Ann as the classifier, here mainly says the author's process which produces the text vector
First, let's look at how the CNN model generates text vectors.
For a sentence of length L in which each word is an m-dimensional word vector, a single filter operates as follows:

In the figure above the filter has shape 3*m; the convolution layer yields the values c_1, c_2, ..., c_(L-2), and the max-pooling operation then reduces them to a single value.
Repeating the above operation with n filters, we obtain an n-dimensional vector s, which is the text vector we want. A shape-level sketch is given below.
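The following NumPy sketch (with assumed toy dimensions and random values, purely to show the shapes involved) traces this computation: n filters of shape 3*m, each convolved over the sentence and max-pooled to one value.

```python
import numpy as np

L, m, n = 10, 8, 16                 # sentence length, word dim, filter count (assumed)
sentence = np.random.randn(L, m)    # stand-in for the embedded sentence
filters = np.random.randn(n, 3, m)  # n filters, each spanning 3 words

s = np.empty(n)
for j in range(n):
    # c_1 ... c_(L-2): one value per window of 3 consecutive words
    c = np.array([np.sum(sentence[i:i + 3] * filters[j]) for i in range(L - 2)])
    s[j] = c.max()                  # max-pooling over the feature map

print(s.shape)                      # (16,) -> the n-dimensional text vector s
```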
Then look at how the RNN model produces the text vector.
The author uses the LSTM variant of the RNN, whose structure is as follows:

Here x_1 ... x_L are again m-dimensional word vectors, h_1 ... h_L are hidden vectors of dimension n, and the pooling layer at the end uses max-pooling or mean-pooling. A minimal sketch follows.
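Here is a minimal Keras sketch of this text-vector model (the dimensions are assumed, and only the max-pooling variant is shown):

```python
import tensorflow as tf
from tensorflow.keras import layers

L, m, n = 10, 8, 16                            # sentence length, word dim, hidden dim (assumed)

x = layers.Input(shape=(L, m))                 # x_1 ... x_L, each an m-dim word vector
h = layers.LSTM(n, return_sequences=True)(x)   # h_1 ... h_L, each n-dimensional
s = layers.GlobalMaxPooling1D()(h)             # max-pooling; GlobalAveragePooling1D()
                                               # would give the mean-pooling variant

encoder = tf.keras.Model(x, s)                 # maps a sentence to its text vector s
```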
The resulting text vector can then be fed into an ANN classifier for classification training; the training process is not covered here.

Using a Mixed CNN and RNN Model
The paper "3" (a c-lstm neural network for text classification) mentions a new model, that is, to mix CNN and RNN As a text classifier, the paper is 2015, I think the view is still relatively fresh, So come out and talk about it.
The model is as follows:

The convolution layer at the front is the same as in the papers discussed above: for each filter, a convolution operation over the sentence embedding matrix produces a feature map. The key step comes between the feature map layer and the window feature sequence layer: values of the same color are collected into one sequence and arranged in order. It is easy to see that each sequence in the window feature sequence layer corresponds to the original word sequence of the sentence and keeps its original relative order; there is merely a convolution operation in between.

The sequence vectors of the window feature sequence layer are then fed as inputs into the LSTM network in the next layer, and the hidden-state output h of the last step is used as the output of the network. What remains is the problem of training the LSTM parameters. A simplified sketch of the whole model follows.
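Below is a simplified Keras sketch of C-LSTM. It assumes a single filter width, so the window feature sequence is simply the Conv1D output in its original order; the paper instead rearranges feature maps from several filter widths into the window feature sequence. All sizes are assumed values.

```python
import tensorflow as tf
from tensorflow.keras import layers

vocab_size, embed_dim = 20000, 128               # assumed sizes
seq_len, n_filters, num_classes = 56, 100, 2

inputs = layers.Input(shape=(seq_len,), dtype="int32")
emb = layers.Embedding(vocab_size, embed_dim)(inputs)

# Convolution layer: each position of the output sequence is a window feature
# built from 3 consecutive words, preserving the original relative order.
windows = layers.Conv1D(n_filters, 3, activation="relu")(emb)

# LSTM over the window feature sequence; the last hidden state h is the output.
h_last = layers.LSTM(100)(windows)
outputs = layers.Dense(num_classes, activation="softmax")(h_last)

model = tf.keras.Model(inputs, outputs)
```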

Experiment

The experiment in this article follows the blog post "Implementing a CNN for Text Classification in TensorFlow" [6] and reproduces its experiment, which targets paper [1].

Experimental data: Movie Review data from Rotten Tomatoes
Data description: movie reviews, 5331 positive comments and 5331 negative comments
Experimental tool: Google's TensorFlow framework
Test set: 1000 examples
Training set: the rest of the data
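A sketch of loading and splitting the data, assuming the Movie Review data has been downloaded as the usual rt-polarity.pos / rt-polarity.neg files (the paths and random seed here are my assumptions; adjust them to your local copy):

```python
import numpy as np

def load_split(pos_path="rt-polarity.pos", neg_path="rt-polarity.neg",
               test_size=1000, seed=10):
    with open(pos_path, encoding="latin-1") as f:
        pos = [line.strip() for line in f]
    with open(neg_path, encoding="latin-1") as f:
        neg = [line.strip() for line in f]
    texts = np.array(pos + neg, dtype=object)
    labels = np.array([1] * len(pos) + [0] * len(neg))
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(texts))        # shuffle the 5331 + 5331 examples
    texts, labels = texts[order], labels[order]
    # the last 1000 examples form the test set, the rest the training set
    return (texts[:-test_size], labels[:-test_size],
            texts[-test_size:], labels[-test_size:])
```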

Experimental results:

Description: the red line is the training set and the blue line the test set. The best test-set accuracy is about 76%, which is almost the same as the figure reported in the paper.

Deep learning has been especially hot in recent years, and in the field of NLP it is also leading the way, particularly in machine translation, speech recognition, and other areas. Here I have summarized a few papers on text classification, and along the way learned Google's deep learning framework TensorFlow, which was a small bonus.

References

[1] Kim Y. Convolutional Neural Networks for Sentence Classification. arXiv preprint, 2014.
[2] Johnson R, Zhang T. Effective Use of Word Order for Text Categorization with Convolutional Neural Networks. arXiv preprint, 2014.
[3] Zhou C, Sun C, Liu Z, et al. A C-LSTM Neural Network for Text Classification. Computer Science, 2015.
[4] Lee J Y, Dernoncourt F. Sequential Short-Text Classification with Recurrent and Convolutional Neural Networks. 2016.
[5] Kalchbrenner N, Grefenstette E, Blunsom P. A Convolutional Neural Network for Modelling Sentences. arXiv preprint, 2014.
[6] Implementing a CNN for Text Classification in TensorFlow (blog post).
