Reprint: http://www.cnblogs.com/DjangoBlog/p/6782872.html
The term "joint learning" is not new. In natural language processing, researchers have long used joint models based on traditional machine learning to learn closely related NLP tasks together, for example joint learning of entity recognition and entity normalization, or joint learning of word segmentation and POS tagging. Recently, researchers have studied neural-network-based approaches to joint entity recognition and relation extraction, and I have read some of the related work to share with you. (This article draws on a slide presentation by Suncong Zheng, an author of several of the cited papers.)

1 Introduction
The task discussed in this article is to extract, from unstructured text, entities and the relations between them (entity 1, relation, entity 2 triples), where the relations come from a set of predefined relation types, as in the figure below.
At present there are two kinds of approaches. One is the pipelined method: given a sentence, first perform named entity recognition, then form all pairwise combinations of the identified entities, run relation classification on each pair, and finally output the entity pairs that hold a relation as triples. The drawbacks of the pipelined method are: 1) error propagation: errors from the entity recognition module degrade the downstream relation classification performance; 2) it ignores the dependency between the two subtasks: in the example in the figure, if a country-president relation holds, then the first entity must be of type location and the second of type person, but the pipelined method cannot exploit such information; 3) it produces unnecessary redundancy: because the identified entities are paired exhaustively before relation classification, the many entity pairs without any relation introduce noise and raise the error rate.
The ideal joint learning setup is as follows: input a sentence and, through a joint model of entity recognition and relation extraction, directly obtain the relational triples. This overcomes the drawbacks of the pipelined approach above, though it may require a more complex model structure.
2 Joint Learning
My main concern here is neural-network-based joint learning, and I divide the current work into two major categories: 1) parameter sharing and 2) tagging scheme. The related work below falls under these two headings.
2.1 Parameter Sharing
In the paper "Joint Entity and Relation Extraction Based on a Hybrid Neural Network", Zheng et al. perform joint learning by sharing the network's low-level representations. Specifically, the input sentence is encoded by a shared word embedding layer followed by a bidirectional LSTM layer. An LSTM then decodes for named entity recognition (NER), and a CNN performs relation classification (RC). Compared with the mainstream BiLSTM-CRF model for NER, they embed the previously predicted tag and feed it into the decoder in place of the CRF layer, to handle tag dependencies in NER. For relation classification, the entities predicted by NER are first paired, and a CNN then classifies the text between each pair of entities. The model thus shares parameters in its lower layers: training on either task updates the shared parameters via backpropagation, which captures the dependency between the two subtasks.
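The shared-bottom idea can be sketched as follows. This is a minimal illustration in PyTorch, not the authors' exact architecture: all layer sizes, the pooling step, and the head details are simplifying assumptions. The point is that both task heads read from the same embedding + BiLSTM encoder, so both losses backpropagate into the shared layers.

```python
# A minimal parameter-sharing sketch (hypothetical sizes, not the paper's
# exact model): a shared embedding + BiLSTM encoder feeds two task heads.
import torch
import torch.nn as nn

class JointModel(nn.Module):
    def __init__(self, vocab_size=1000, emb_dim=50, hidden=64,
                 n_tags=9, n_relations=5):
        super().__init__()
        # Shared layers: word embedding + bidirectional LSTM encoder
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.encoder = nn.LSTM(emb_dim, hidden, batch_first=True,
                               bidirectional=True)
        # NER head: an LSTM decoder over the shared states
        self.ner_lstm = nn.LSTM(2 * hidden, hidden, batch_first=True)
        self.ner_out = nn.Linear(hidden, n_tags)
        # RC head: a CNN over the shared states, max-pooled over time
        self.rc_conv = nn.Conv1d(2 * hidden, hidden, kernel_size=3, padding=1)
        self.rc_out = nn.Linear(hidden, n_relations)

    def forward(self, tokens):
        h, _ = self.encoder(self.embed(tokens))        # (B, T, 2*hidden)
        ner_h, _ = self.ner_lstm(h)
        tag_scores = self.ner_out(ner_h)               # (B, T, n_tags)
        c = torch.relu(self.rc_conv(h.transpose(1, 2)))
        rel_scores = self.rc_out(c.max(dim=2).values)  # (B, n_relations)
        return tag_scores, rel_scores

model = JointModel()
tags, rels = model(torch.randint(0, 1000, (2, 7)))
print(tags.shape, rels.shape)  # torch.Size([2, 7, 9]) torch.Size([2, 5])
```

In practice the RC head would run once per candidate entity pair over the span between the entities; here a single sentence-level classification stands in for that step.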
The paper "End-to-end Relation Extraction using LSTMs on Sequences and Tree Structures" follows a similar idea of joint learning through parameter sharing; it differs only in the decoding models used for NER and RC. Here Miwa et al. likewise share parameters: NER is decoded with an NN, while RC incorporates dependency information, using a BiLSTM over the shortest path in the dependency tree for relation classification.
According to the experiments in these two papers, joint learning via parameter sharing outperforms the pipelined method, with the F-score improving by about 1% on their tasks; it is a simple and widely applicable approach. The paper "A Neural Joint Model for Entity and Relation Extraction from Biomedical Text" applies the same idea to entity-relation extraction from biomedical text.

2.2 Tagging Scheme
However, the parameter-sharing method still consists of two subtasks, which interact only through the shared parameters. At training time one still performs NER and then pairs the predicted entities for relation classification, so the redundant information from unrelated entity pairs remains. Motivated by this, Zheng et al., in the paper "Joint Extraction of Entities and Relations Based on a Novel Tagging Scheme", proposed a new tagging scheme for relation extraction; the paper was published at ACL 2017 and selected as an outstanding paper.
With this new tagging scheme, they transform relation extraction, which originally involved both a sequence labeling task and a classification task, entirely into a single sequence labeling problem. The relational triples are then obtained directly by an end-to-end neural network model.
The tagging scheme they propose is composed of three parts: 1) the word's position within the entity {B (entity begin), I (entity inside), E (entity end), S (single-word entity)}; 2) the relation type information {drawn from the predefined relation types}; and 3) the entity's role {1 (entity 1), 2 (entity 2)}. Note that any word not belonging to an entity in some relational triple is tagged "O".
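To make the scheme concrete, here is an illustrative tagging of a hypothetical sentence (the sentence, the relation abbreviation "CP" for a Country-President type, and the tags are my own illustration, not taken from the paper): each non-"O" tag concatenates position, relation type, and role.

```python
# An illustrative encoding under the tagging scheme above (hypothetical
# sentence and tags): tag = position-relationType-role, or "O".
tokens = ["The", "United", "States", "president", "Trump",
          "will", "visit", "Beijing"]
tags   = ["O", "B-CP-1", "E-CP-1", "O", "S-CP-2",
          "O", "O", "O"]  # CP = Country-President (a predefined type)

for tok, tag in zip(tokens, tags):
    print(f"{tok}\t{tag}")
```

"United States" is a two-word entity (B then E) playing role 1, "Trump" is a single-word entity (S) playing role 2, and all other words are "O".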
Based on the tag sequence, entities with the same relation type are combined into a triple as the final result; if a sentence contains more than one relation of the same type, the nearest-distance principle is used for pairing. Currently this tagging scheme does not support overlapping entity relations.
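The decoding step can be sketched as follows. This is a simplified illustration of the idea, assuming hyphen-free relation type names; the function and its details are my own, not the paper's code.

```python
# A sketch of decoding a tag sequence into triples: entities of the same
# relation type are paired, matching role-1 entities to the nearest role-2
# entity (the nearest-distance principle described above).
def decode(tokens, tags):
    # Collect entities as (relation_type, role, start_index, text)
    entities, i = [], 0
    while i < len(tags):
        tag = tags[i]
        if tag == "O":
            i += 1
            continue
        pos, rel, role = tag.split("-")
        start = i
        if pos == "S":
            i += 1
        else:  # B (I ...) E span: scan forward to the E tag
            while i < len(tags) and not tags[i].startswith(("E-", "S-")):
                i += 1
            i += 1
        entities.append((rel, role, start, " ".join(tokens[start:i])))
    # Pair each role-1 entity with the nearest same-type role-2 entity
    triples = []
    for rel, role, start, text in entities:
        if role != "1":
            continue
        cands = [e for e in entities if e[0] == rel and e[1] == "2"]
        if cands:
            nearest = min(cands, key=lambda e: abs(e[2] - start))
            triples.append((text, rel, nearest[3]))
    return triples

tokens = ["The", "United", "States", "president", "Trump", "will", "visit"]
tags   = ["O", "B-CP-1", "E-CP-1", "O", "S-CP-2", "O", "O"]
print(decode(tokens, tags))  # [('United States', 'CP', 'Trump')]
```

Note how the overlap limitation shows up here: each word carries exactly one tag, so an entity participating in two triples cannot be represented.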
The task thus becomes a sequence labeling problem; the overall model is shown below. A BiLSTM is first used for encoding, and decoding uses the LSTM decoder mentioned in the parameter-sharing section.
Unlike the classic model, they use a biased objective function. When the gold tag is "O", the term is the normal objective; when the tag is not "O", i.e. it is a tag of an entity involved in a relation, its contribution is weighted by α. Their experiments show that this biased objective predicts the related entity pairs more accurately.
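The bias can be illustrated with a toy per-token cross-entropy (a sketch of the weighting idea only; the value of α and the function shape are illustrative assumptions, not the paper's exact formulation):

```python
# A sketch of the biased objective: the cross-entropy term for a non-"O"
# tag is scaled by alpha, so mistakes on relational tags cost more.
import math

def biased_loss(probs, gold_tags, alpha=10.0):
    """probs[t] is the predicted probability of the gold tag at step t."""
    loss = 0.0
    for p, tag in zip(probs, gold_tags):
        weight = 1.0 if tag == "O" else alpha
        loss += -weight * math.log(p)
    return loss

# Same predicted probabilities, but the second token's gold tag differs:
print(biased_loss([0.9, 0.5], ["O", "O"]))       # ~0.799
print(biased_loss([0.9, 0.5], ["O", "S-CP-2"]))  # ~7.037
```

An uncertain prediction (p = 0.5) on a relational tag is penalized α times as heavily as the same uncertainty on an "O" tag, pushing the model to get the entity tags that form triples right.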
3 Summary
Neural-network-based joint learning of entity recognition and relation extraction mainly comprises the two kinds of methods above. Parameter sharing is simple to implement and widely used in multi-task learning. The new tagging scheme proposed by Zheng et al. still has some problems (such as the inability to recognize overlapping entity relations), but it offers a new way of thinking: it genuinely merges the two subtasks into a single sequence labeling problem, and further improvements and extensions of this tagging scheme could advance the end-to-end relation extraction task.
References
[1] S. Zheng, Y. Hao, D. Lu, H. Bao, J. Xu, H. Hao, et al., Joint Entity and Relation Extraction Based on a Hybrid Neural Network, Neurocomputing (2017) 1–8.
[2] M. Miwa, M. Bansal, End-to-end Relation Extraction using LSTMs on Sequences and Tree Structures, ACL (2016).
[3] F. Li, M. Zhang, G. Fu, D. Ji, A Neural Joint Model for Entity and Relation Extraction from Biomedical Text, BMC Bioinformatics 18 (2017).
[4] S. Zheng, F. Wang, H. Bao, Y. Hao, P. Zhou, B. Xu, Joint Extraction of Entities and Relations Based on a Novel Tagging Scheme, ACL (2017).