Paper "Aspect level sentiment classification with Deep Memory Network" summary

Paper source: Tang, D., Qin, B., & Liu, T. (2016). Aspect Level Sentiment Classification with Deep Memory Network. arXiv preprint arXiv:1605.08900.

Original link: http://blog.csdn.net/rxt2012kc/article/details/73770408

Advantages

Neural models are of growing interest for their capacity to learn text representations from data without careful feature engineering, and to capture semantic relations between aspect and context words in a more scalable way than feature-based SVM.

Disadvantages

Despite these advantages, conventional neural models like long short-term memory (LSTM) (Tang et al., 2015a) capture context information in an implicit way and are incapable of explicitly exhibiting the important context clues.

Standard LSTM works in a sequential way and manipulates each context word with the same operation, so it cannot explicitly reveal the importance of each context word.

Cross-entropy

As every component is differentiable, the entire model can be efficiently trained end-to-end with gradient descent, where the loss function is the cross-entropy error of sentiment classification. If the aspect is a single word, its embedding is used directly; multi-word aspects are processed as follows:

For the case where the aspect is a multi-word expression like "battery life", the aspect representation is an average of its constituent word vectors (Sun et al., 2015).
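A minimal sketch of this averaging with toy values (the embedding dictionary below is illustrative, not from the paper; in practice the embeddings are pretrained word vectors):

    import numpy as np

    # Toy 4-dimensional embeddings for the two aspect words.
    embedding = {
        "battery": np.array([0.1, 0.3, -0.2, 0.5]),
        "life":    np.array([0.4, -0.1, 0.0, 0.2]),
    }

    # The aspect vector is the element-wise average of the word vectors.
    v_aspect = np.mean([embedding[w] for w in ["battery", "life"]], axis=0)
    print(v_aspect)  # [ 0.25  0.1  -0.1   0.35]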

Dataset: Laptop and Restaurant datasets

We apply the proposed approach to the laptop and restaurant datasets from SemEval 2014 (Pontiki et al., 2014).

Steps

Input

Given a sentence s = {w1, w2, ..., wi, ..., wn} and the aspect word wi, we map each word to its embedding vector. These word vectors are separated into two parts, the aspect representation and the context representation. If the aspect is a single word like "food" or "service", the aspect representation is the embedding of the aspect word.

The context word vectors {e1, e2, ..., ei−1, ei+1, ..., en} are stacked and regarded as the external memory m ∈ R^(d×(n−1)), where n is the sentence length.
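A short sketch of this input step with random toy embeddings (the names and values are illustrative; real embeddings would be pretrained word vectors):

    import numpy as np

    rng = np.random.default_rng(0)
    d = 6                                                    # toy embedding dimension
    words = ["great", "food", "but", "the", "service", "was", "dreadful"]
    embedding = {w: rng.standard_normal(d) for w in words}   # toy lookup table

    # Drop the aspect word ("food") and stack the remaining context vectors
    # column-wise to form the external memory m of shape (d, n - 1).
    aspect_index = 1
    context = [embedding[w] for j, w in enumerate(words) if j != aspect_index]
    m = np.stack(context, axis=1)
    print(m.shape)  # (6, 6), i.e. (d, n - 1) with n = 7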

Step 1

In the first computational layer (Hop 1), we regard the aspect vector as the input and adaptively select important evidence from memory m through an attention layer.

The output of the attention layer and the linear transformation of the aspect vector are summed, and the result is considered as the input to the next layer (Hop 2).
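A sketch of one hop under these definitions (all names and the random parameters below are illustrative; uniform weights stand in here for the attention scores, which are spelled out in the Attention Model section):

    import numpy as np

    rng = np.random.default_rng(1)
    d, n_ctx = 6, 6
    m = rng.standard_normal((d, n_ctx))     # external memory, shape (d, n - 1)
    v_aspect = rng.standard_normal(d)       # aspect vector, input to Hop 1
    W_linear = rng.standard_normal((d, d))  # linear layer applied to the hop input

    # Placeholder attention: uniform weights over the memory columns.
    alpha = np.full(n_ctx, 1.0 / n_ctx)
    attention_output = m @ alpha            # weighted sum of memory vectors

    # Sum of attention output and linearly transformed input = input to Hop 2.
    hop2_input = attention_output + W_linear @ v_aspect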

Note that the parameters of the attention and linear layers are shared across different hops. Therefore, the model with one layer and the model with nine layers have the same number of parameters.

Attention Model

The basic idea of the attention mechanism is that it assigns a weight/importance to each lower position when computing an upper-level representation (Bahdanau et al., 2015).

In this work, we use the attention model to compute the representation of a sentence with regard to an aspect.

Furthermore, the importance of a word should differ for different aspects. Let us again take the example "great food but the service was dreadful!". The context word "great" is more important than "dreadful" for the aspect "food". On the contrary, "dreadful" is more important than "great" for the aspect "service".

To calculate the weight of each word, a score g_i = tanh(W_att [m_i; v_aspect] + b_att) is computed, giving a 1×1 value per context word and thus a 1×k vector [g_1, g_2, ..., g_k]. A softmax over these scores yields the weight of each word; each memory vector is then multiplied by its weight and the results are summed, producing a d×1 vector as the output of the attention model.
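A sketch of this computation with toy, randomly initialised parameters (W_att, b_att and the shapes follow the description above; all values are illustrative):

    import numpy as np

    rng = np.random.default_rng(2)
    d, k = 6, 6
    m = rng.standard_normal((d, k))          # memory columns m_1 ... m_k
    v_aspect = rng.standard_normal(d)        # aspect vector
    W_att = rng.standard_normal((1, 2 * d))  # attention weights
    b_att = rng.standard_normal(1)           # attention bias

    # g_i = tanh(W_att [m_i; v_aspect] + b_att): one 1x1 score per context word.
    g = np.array([np.tanh(W_att @ np.concatenate([m[:, i], v_aspect]) + b_att)[0]
                  for i in range(k)])

    # Softmax over the k scores gives the weight of each word.
    alpha = np.exp(g) / np.exp(g).sum()

    # Weighted sum of the memory columns: the d x 1 output of the attention model.
    output = m @ alpha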

Location Attention

Such location information is helpful for the attention model because, intuitively, a context word closer to the aspect should be more important than a farther one.

In this work, we define the location of a context word as its absolute distance from the aspect in the original sentence sequence.

The location weight is v_i = 1 − l_i / n, where l_i is the location of the i-th context word (its distance from the aspect, as defined above) and n is the length of the sentence.
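A small sketch of this location weighting for the running example (positions and names are illustrative):

    # Sentence: "great food but the service was dreadful", aspect "food" at position 1.
    n = 7                                   # sentence length
    aspect_position = 1
    context_positions = [p for p in range(n) if p != aspect_position]

    # l_i: absolute distance of each context word from the aspect.
    l = [abs(p - aspect_position) for p in context_positions]

    # v_i = 1 - l_i / n: closer words get weights nearer to 1.
    v = [1.0 - li / n for li in l]
    print([round(x, 2) for x in v])  # [0.86, 0.86, 0.71, 0.57, 0.43, 0.29]

Each memory column m_i can then be scaled by (or combined with) its v_i before the content attention is applied.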

The need for multiple hops

Multiple computational layers allow the deep memory network to learn representations of text with multiple levels of abstraction. Each layer/hop retrieves important context words and transforms the representation from the previous level into a representation at a higher, slightly more abstract level. With the composition of enough such transformations, very complex functions of the sentence representation with regard to an aspect can be learned.
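A sketch of this stacking with the shared parameters made explicit (the helper attend and all parameter values are illustrative):

    import numpy as np

    rng = np.random.default_rng(3)
    d, k, hops = 6, 6, 3
    m = rng.standard_normal((d, k))                      # external memory
    W_att, b_att = rng.standard_normal((1, 2 * d)), rng.standard_normal(1)
    W_linear = rng.standard_normal((d, d))

    def attend(memory, x):
        # Content attention from the previous section, reused in every hop.
        g = np.array([np.tanh(W_att @ np.concatenate([memory[:, i], x]) + b_att)[0]
                      for i in range(memory.shape[1])])
        alpha = np.exp(g) / np.exp(g).sum()
        return memory @ alpha

    x = rng.standard_normal(d)              # Hop 1 input: the aspect vector
    for _ in range(hops):                   # W_att, b_att, W_linear shared across hops
        x = attend(m, x) + W_linear @ x     # output of one hop is the next hop's input
    # x is the final sentence representation with regard to the aspect.

Because the same W_att, b_att and W_linear are reused in every iteration, adding hops adds levels of abstraction without adding parameters, matching the parameter-sharing point above.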

Cross Entropy

The model is trained in a supervised manner by minimizing the cross-entropy error of sentiment classification.
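A sketch of this objective for a single example (the softmax layer W_s, b_s and all values are illustrative):

    import numpy as np

    rng = np.random.default_rng(4)
    d, num_classes = 6, 3                 # e.g. positive / neutral / negative
    x = rng.standard_normal(d)            # output of the last hop
    W_s = rng.standard_normal((num_classes, d))
    b_s = rng.standard_normal(num_classes)

    # Softmax over the sentiment classes.
    logits = W_s @ x + b_s
    probs = np.exp(logits) / np.exp(logits).sum()

    # Cross-entropy error for this example: negative log-probability of the gold class.
    gold = 0
    loss = -np.log(probs[gold])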

Share the same parameters

The parameters of the attention and linear layers are shared across different hops; therefore, the model with one layer and the model with nine layers have the same number of parameters.
