Aspect Level Sentiment Classification with Deep Memory Network
Paper source: Tang, D., Qin, B., & Liu, T. (2016). Aspect Level Sentiment Classification with Deep Memory Network. arXiv preprint arXiv:1605.08900.
Original link: http://blog.csdn.net/rxt2012kc/article/details/73770408

Advantages
Neural models are of growing interest for their capacity to learn text representations from data without careful engineering of features, and to capture semantic relations between aspect and context words in a more scalable way than feature-based SVM.

Disadvantages
Despite these advantages, conventional neural models like long short-term memory (LSTM) (Tang et al., 2015a) capture context information in an implicit way, and are incapable of explicitly exhibiting the important context clues of an aspect.
Standard LSTM works in a sequential way and manipulates each context word with the same operation, so that it cannot explicitly reveal the importance of each context word.

Cross-entropy
As every component is differentiable, the entire model can be efficiently trained end-to-end with gradient descent, where the loss function is the cross-entropy error of sentiment classification. If the aspect is a single word, its embedding is used directly; multi-word aspects are processed as follows.
For the case where the aspect is a multi-word expression like "battery life", the aspect representation is the average of its constituent word vectors (Sun et al., 2015).

Dataset: Laptop and Restaurant datasets
We apply the proposed approach to the laptop and restaurant datasets from SemEval 2014 (Pontiki et al., 2014).

Steps

Input
Given a sentence s = {w1, w2, ..., wi, ..., wn} and the aspect word wi, we map each word to its embedding vector. These word vectors are separated into two parts: the aspect representation and the context representation. If the aspect is a single word like "food" or "service", the aspect representation is the embedding of that aspect word.
The context word vectors {e1, e2, ..., ei−1, ei+1, ..., en} are stacked and regarded as the external memory m ∈ R^(d×(n−1)), where n is the sentence length.
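A minimal NumPy sketch of this input step (the embedding lookup dictionary, the build_input helper, and the toy sentence are illustrative assumptions, not the authors' code):

```python
import numpy as np

def build_input(tokens, aspect_span, embed):
    """Split a tokenized sentence into an aspect vector and an external memory.

    tokens      : list of words in the sentence
    aspect_span : (start, end) indices of the aspect word(s) in `tokens`
    embed       : dict mapping word -> d-dimensional vector
    """
    start, end = aspect_span
    # Aspect representation: the embedding itself for a single word,
    # or the average of the constituent word vectors for a multi-word aspect.
    v_aspect = np.mean([embed[w] for w in tokens[start:end]], axis=0)   # shape (d,)

    # Context word vectors (everything except the aspect) are stacked
    # column-wise as the external memory m in R^{d x (n-1)} for a single-word aspect.
    context = tokens[:start] + tokens[end:]
    m = np.stack([embed[w] for w in context], axis=1)                   # shape (d, n_context)
    return v_aspect, m

# Toy usage with random embeddings (illustrative only).
d = 4
tokens = "great food but the service was dreadful !".split()
embed = {w: np.random.randn(d) for w in tokens}
v_aspect, m = build_input(tokens, (1, 2), embed)   # aspect = "food"
print(v_aspect.shape, m.shape)                     # (4,) (4, 7)
```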
Step 1

In the first computational layer (hop 1), we regard the aspect vector as the input and adaptively select important evidence from memory m through an attention layer.
The output of the attention layer and the linear transformation of the aspect vector are summed, and the result is considered as the input to the next layer (hop 2).
It is helpful to note that the parameters of the attention and linear layers are shared across different hops. Therefore, the model with one layer and the model with nine layers have the same number of parameters.
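A runnable NumPy sketch of this hop recurrence, written as a rough illustration rather than the authors' implementation; the attention scoring anticipates the g_i formula given below in the Attention Model section, and the dimensions and names are assumptions:

```python
import numpy as np

def hop(x, m, W_att, b_att, W_lin):
    """One computational layer: attention over memory m queried by x,
    plus a linear transformation of x."""
    scores = np.array([np.tanh(W_att @ np.concatenate([m[:, i], x]) + b_att)[0]
                       for i in range(m.shape[1])])
    alpha = np.exp(scores - scores.max())
    alpha /= alpha.sum()                    # softmax over context words
    return m @ alpha + W_lin @ x            # attention output + linear(query)

d, n_ctx, hops = 4, 7, 3
rng = np.random.default_rng(0)
m = rng.standard_normal((d, n_ctx))         # external memory (context word vectors)
x = rng.standard_normal(d)                  # hop-1 query: the aspect vector
# One shared parameter set, reused in every hop: adding hops adds no parameters.
W_att, b_att = rng.standard_normal((1, 2 * d)), rng.standard_normal(1)
W_lin = rng.standard_normal((d, d))
for _ in range(hops):
    x = hop(x, m, W_att, b_att, W_lin)      # output of hop t becomes the query of hop t+1
print(x.shape)                              # (4,) -- final representation fed to the classifier
```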
Attention Model

The basic idea of the attention mechanism is that it assigns a weight/importance to each lower position when computing an upper-level representation (Bahdanau et al., 2015).
In this work, we use the attention model to compute the representation of a sentence with regard to an aspect.
Furthermore, the importance of a word should differ when we consider different aspects. Let us again take the example "great food but the service was dreadful!". The context word "great" is more important than "dreadful" for the aspect "food". On the contrary, "dreadful" is more important than "great" for the aspect "service".
The weight of each word is computed as g_i = tanh(W_att [m_i ; v_aspect] + b_att), where each g_i is a 1×1 score. Collecting the scores gives a 1×k vector [g_1, g_2, ..., g_k], which is then passed through a softmax to obtain the weight of each word. Each memory column is multiplied by its weight and the results are summed, finally yielding a d×1 vector as the output of the attention model.
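A small NumPy walk-through of this scoring step, assuming [m_i ; v_aspect] denotes concatenation of a memory column with the aspect vector; the printed shapes match the 1×1, 1×k, and d×1 sizes described above (all variable names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
d, k = 4, 5                                  # embedding dim, number of context words
memory = rng.standard_normal((d, k))         # columns m_1 ... m_k
v_aspect = rng.standard_normal(d)
W_att = rng.standard_normal((1, 2 * d))      # attention weight matrix
b_att = rng.standard_normal(1)               # attention bias

# g_i = tanh(W_att [m_i ; v_aspect] + b_att): one 1x1 score per context word.
g = np.array([np.tanh(W_att @ np.concatenate([memory[:, i], v_aspect]) + b_att)[0]
              for i in range(k)])            # shape (k,): the 1xk score vector
alpha = np.exp(g - g.max()); alpha /= alpha.sum()   # softmax -> importance weights
output = memory @ alpha                      # weighted sum of memories: the d x 1 output
print(g.shape, alpha.shape, output.shape)    # (5,) (5,) (4,)
```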
Location Attention

Such location information is helpful for the attention model because, intuitively, a context word closer to the aspect should be more important than a farther one.
In this work, we define the location of a context word as its absolute distance from the aspect in the original sentence sequence.
v_i = 1 − l_i / n, where l_i represents the location of the i-th context word and n represents the length of the sentence.
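For example, with a sentence of length n = 10, a context word adjacent to the aspect (l_i = 1) gets weight 1 − 1/10 = 0.9, while one nine positions away gets 0.1. A minimal sketch of computing these weights and scaling the memory columns with them; using the weights as a per-column scale is one possible application and, like the names below, is an assumption for illustration:

```python
import numpy as np

def location_weights(distances, n):
    """v_i = 1 - l_i / n, where l_i is the distance of context word i
    from the aspect and n is the sentence length."""
    return 1.0 - np.asarray(distances, dtype=float) / n

# "great food but the service was dreadful !" with aspect "food" (position 1):
distances = [1, 1, 2, 3, 4, 5, 6]            # |position - aspect position| per context word
v = location_weights(distances, n=8)
print(v)                                      # closer words get larger weights

# Scale each memory column by its location weight before attention.
d = 4
memory = np.random.randn(d, len(distances))
memory = memory * v                           # broadcasts over the d dimension
```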
The need for multiple hops

Multiple computational layers allow the deep memory network to learn representations of text with multiple levels of abstraction. Each layer/hop retrieves important context words and transforms the representation from the previous level into a representation at a higher, slightly more abstract level. With the composition of enough such transformations, very complex functions of the sentence representation with regard to an aspect can be learned.

Cross Entropy
The model is trained in a supervised manner by minimizing the cross-entropy error of sentiment classification.
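A minimal sketch of this objective, assuming the output of the last hop is fed to a softmax layer over the sentiment classes; the softmax parameters W_s, b_s and the three-class setup are assumptions for illustration:

```python
import numpy as np

def cross_entropy_loss(final_repr, W_s, b_s, gold):
    """Softmax over sentiment classes, then the cross-entropy error
    that the (fully differentiable) model is trained to minimize."""
    logits = W_s @ final_repr + b_s          # shape (num_classes,)
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return -np.log(probs[gold])              # negative log-likelihood of the gold label

d, num_classes = 4, 3                        # e.g. negative / neutral / positive
rng = np.random.default_rng(2)
final_repr = rng.standard_normal(d)          # output of the last hop
W_s, b_s = rng.standard_normal((num_classes, d)), rng.standard_normal(num_classes)
print(cross_entropy_loss(final_repr, W_s, b_s, gold=2))
```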