When does the deep learning model in NLP need a tree structure?
Some time ago I read the EMNLP 2015 paper by Jiwei Li et al. [1], "When Are Tree Structures Necessary for Deep Learning of Representations?". The paper compares tree-structured recursive neural networks against sequence-based recurrent neural networks, running experiments on four kinds of NLP tasks to explore when a deep learning model needs a tree structure. In this post I walk through the paper and some related material to discuss when we need tree-structured knowledge.
1 Syntactic Parse Trees
Depending on the treebank annotation scheme, syntactic parse trees come in two main forms: 1) constituency trees (phrase-structure trees) and 2) dependency trees. Here is a simple example. Parsing the sentence "My dog likes eating sausage." with the Stanford parsing tools gives the following results:
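The post does not show the exact commands, so here is a minimal sketch of how to obtain both trees with the stanza package (Stanford NLP's Python pipeline); the package choice and pipeline options are my assumption, not something specified in the original:

```python
import stanza

# one-time model download for English
stanza.download("en")

# constituency requires tokenize+pos; depparse requires tokenize+pos+lemma
nlp = stanza.Pipeline("en", processors="tokenize,pos,lemma,constituency,depparse")

doc = nlp("My dog likes eating sausage.")
sent = doc.sentences[0]

print(sent.constituency)      # phrase-structure (constituency) tree
for word in sent.words:       # dependency triples: dependent <-deprel- head
    head = sent.words[word.head - 1].text if word.head > 0 else "ROOT"
    print(f"{word.text} <-{word.deprel}- {head}")
```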
Visualized, the constituency tree and the dependency tree look like this:
The tree structure referred to in paper [1] is exactly this kind of syntactic tree obtained from parsing.
2 Models Compared
The paper compares two groups of models in its experiments, specifically:
- Standard tree models (standard recursive neural networks), standard sequence models (standard recurrent neural networks), and standard bi-directional sequence models (bi-directional recurrent neural networks).
- Tree-structured LSTM models [2], sequence LSTM models, and bi-directional sequence LSTM models.
Each group thus contains three models: a tree model, a one-way sequence model, and a two-way sequence model. For the details of each model, please consult the literature; below I only give the structure diagrams of the standard models.
Standard recursive (tree) models
Standard recurrent (sequence) models
Bidirectional recurrent models
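To make the contrast concrete, here is a toy NumPy sketch of the two standard composition rules; the dimensions, random vectors, and parse bracketing are illustrative assumptions, not the paper's trained models:

```python
import numpy as np

d = 4                                        # toy hidden size
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(d, 2 * d))   # composition weights, shared everywhere
b = np.zeros(d)

def compose(left, right):
    """Standard recursive (tree) step: merge two children into one parent."""
    return np.tanh(W @ np.concatenate([left, right]) + b)

def recurrent(xs):
    """Standard recurrent (sequence) step: fold the same merge left to right."""
    h = np.zeros(d)
    for x in xs:
        h = np.tanh(W @ np.concatenate([h, x]) + b)
    return h

# toy word vectors for "my dog likes eating sausage"
my, dog, likes, eating, sausage = (rng.normal(size=d) for _ in range(5))

# tree model follows the parse: ((my dog) (likes (eating sausage)))
tree_vec = compose(compose(my, dog), compose(likes, compose(eating, sausage)))

# sequence model ignores the parse and reads left to right
seq_vec = recurrent([my, dog, likes, eating, sausage])
print(tree_vec, seq_vec, sep="\n")
```

The point of the contrast: the tree model composes children according to the parse, while the sequence model folds the very same operation strictly left to right.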
3 Experimental Data
The paper runs experiments on five tasks from four types of NLP problems. The detailed numbers can be found in the paper; here I mainly analyze the characteristics of each task and the final results:
- Sentiment classification on the Stanford Sentiment Treebank
This is a fine-grained sentiment classification problem. In the Stanford Sentiment Treebank every tree node carries a sentiment label, so the experiments are run at both the sentence level and the phrase level. The results show the tree structure helps a little at the sentence level and makes no difference at the phrase level.
- Binary sentiment classification
This is also a sentiment classification problem, but unlike the above it is binary, labels exist only at the sentence level, and each sentence is relatively long. In the experiments the tree structure plays no role at all, possibly because the sentences are long and lack rich phrase-level annotation, so the sentiment information learned over long distances gets lost.
- Question-answer Matching
This is a question-answering task: the input is a description, generally composed of 4~6 sentences, and the model must produce a phrase-level answer, such as a place name or a person name. The tree structure plays no role in this task either.
- Semantic Relation Classification
The task gives two nouns in a sentence and asks what the semantic relation between them is. The tree-structured approach brings a significant improvement on this task (a dependency-path sketch follows this list).
- Discourse parsing
This is a classification task whose input units are short; here the tree structure has no effect.
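A plausible reason trees help with relation classification is that the dependency path between two nouns is much shorter than their surface distance. Here is a small sketch, again assuming stanza and using a sentence in the style of the SemEval relation data; the helper name and word indices are mine:

```python
from collections import deque

import stanza

# assumes the English models are already downloaded (see the parsing sketch above)
nlp = stanza.Pipeline("en", processors="tokenize,pos,lemma,depparse")

def dependency_path(sentence, i, j):
    """Shortest path between word ids i and j (1-based) in the dependency tree,
    found by BFS over the undirected head/child edges."""
    # adjacency list: every word is connected to its head (head == 0 is the root)
    adj = {w.id: set() for w in sentence.words}
    for w in sentence.words:
        if w.head != 0:
            adj[w.id].add(w.head)
            adj[w.head].add(w.id)
    # breadth-first search from i towards j
    queue, parent = deque([i]), {i: None}
    while queue:
        cur = queue.popleft()
        if cur == j:
            break
        for nxt in adj[cur]:
            if nxt not in parent:
                parent[nxt] = cur
                queue.append(nxt)
    # walk the parent pointers back from j and return the word texts
    path, cur = [], j
    while cur is not None:
        path.append(cur)
        cur = parent[cur]
    return [sentence.words[k - 1].text for k in reversed(path)]

sent = nlp("The burst has been caused by water hammer pressure.").sentences[0]
print(dependency_path(sent, 2, 9))  # path between "burst" (2) and "pressure" (9)
```

For this sentence the surface distance between "burst" and "pressure" is seven words, but the dependency path is typically just burst → caused → pressure.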
4 Conclusion
From the above experiments, the authors draw the following conclusions.
Tree structure required:
- Tasks that need long-distance semantic dependency information, such as the semantic relation classification task above.
- Complex tasks whose input is a long sequence and that have enough annotation on the fragments (such as the sentence-level Stanford Sentiment Treebank classification task). Even here there is a caveat: if the input is first segmented at punctuation, each sub-fragment is encoded with a bidirectional sequence model, and the fragment vectors are then fed to a one-way sequence model, the result beats the tree structure (see the sketch after these two lists).
No tree structure required:
- Tasks with long sequences and without enough fragment-level labels (such as the binary sentiment classification and question-answer matching tasks above).
- Simple tasks (such as the phrase-level sentiment classification and discourse parsing tasks), where each input fragment is short and syntactic parsing barely changes the order of the input.
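Here is a minimal PyTorch sketch of the punctuation-segmentation trick mentioned above; the class, the naive splitter, and all sizes are illustrative assumptions rather than the paper's exact setup:

```python
import re

import torch
import torch.nn as nn

class ClauseBiLSTMClassifier(nn.Module):
    """Split the input at punctuation, encode each clause with a bidirectional
    LSTM, then run a one-way LSTM over the clause vectors."""
    def __init__(self, vocab_size, emb_dim=50, hidden=50, n_classes=5):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.clause_enc = nn.LSTM(emb_dim, hidden, bidirectional=True, batch_first=True)
        self.sent_enc = nn.LSTM(2 * hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, n_classes)

    def forward(self, clause_ids):
        # clause_ids: list of 1-D LongTensors, one per punctuation-delimited clause
        clause_vecs = []
        for ids in clause_ids:
            _, (h, _) = self.clause_enc(self.emb(ids).unsqueeze(0))
            # concatenate the final forward and backward hidden states
            clause_vecs.append(torch.cat([h[0], h[1]], dim=-1))
        seq = torch.stack(clause_vecs, dim=1)   # (1, n_clauses, 2 * hidden)
        _, (h, _) = self.sent_enc(seq)          # one-way LSTM over clause vectors
        return self.out(h[-1])                  # class logits

def split_clauses(text):
    """Naive punctuation-based segmentation (commas, semicolons, periods)."""
    return [c.strip() for c in re.split(r"[,;.]", text) if c.strip()]

# usage sketch: the ids below would come from a real vocabulary lookup
model = ClauseBiLSTMClassifier(vocab_size=10000)
clauses = [torch.randint(0, 10000, (n,)) for n in (6, 4, 8)]  # three fake clauses
logits = model(clauses)
```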
In addition, Che Wanxiang of HIT (Harbin Institute of Technology) published a post on his public WeChat account, "Does the deep learning model of natural language processing depend on tree structure?" [3], which argues that even in the face of complex problems, as long as we can get enough training data, we can do without the tree structure.
Going through this paper, Prof. Che's post, and some related material, the question of whether a syntactic tree structure is needed deserves our attention: we should judge based on our own task and on the pros and cons of syntactic parsing. My own summary is as follows:
What can syntactic analysis bring to us?
- Long-distance semantic dependency relations
- Sequence fragments that carry linguistic knowledge
- Easier extraction of the core of a complex sentence
Disadvantages of syntactic analysis
- Parsing errors introduce noise
- It can complicate simple tasks
- Parsing takes a long time
Main References
[1] J. Li, M.-T. Luong, D. Jurafsky, E. Hovy, When Are Tree Structures Necessary for Deep Learning of Representations?, EMNLP 2015, 2304–2314.
[2] K. S. Tai, R. Socher, C. D. Manning, Improved Semantic Representations from Tree-Structured Long Short-Term Memory Networks, ACL 2015, 1556–1566.
[3] Che Wanxiang (HIT): Does the deep learning model in natural language processing depend on tree structure?