7.5 Named Entity Recognition (NER)
The goal of named entity recognition is to identify every mention of a named entity in a text.
It can be divided into two subtasks: determining the NE boundary and determining its type.
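NLTK provides a pre-trained classifier that recognizes named entities, available through the function nltk.ne_chunk(). If the parameter binary=True is set, each named entity is simply labeled NE and gets no type label; otherwise the classifier also assigns a category label such as PERSON, ORGANIZATION, or GPE. The following code shows both calls: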
>>> import nltk
>>> sent = nltk.corpus.treebank.tagged_sents()[22]
>>> print(nltk.ne_chunk(sent, binary=True))  # entities labeled only as NE
>>> print(nltk.ne_chunk(sent))               # entities labeled with a type
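The two subtasks described above, finding the boundary of an entity and assigning its type, can be illustrated without the trained classifier. The following is a minimal sketch, assuming the entities are already marked with IOB tags (B- begins an entity, I- continues it, O is outside any entity). The function name iob_to_spans and the sample sentence with its tags are made up for illustration and are not part of NLTK.

```python
def iob_to_spans(tagged):
    """Collect (start, end, type) spans from (token, IOB-tag) pairs.

    `end` is exclusive. Tags are assumed to be 'O', 'B-TYPE' or 'I-TYPE'.
    """
    spans = []
    start, etype = None, None
    for i, (_, tag) in enumerate(tagged):
        # An open entity continues only on an I- tag of the same type.
        continues = tag.startswith("I-") and tag[2:] == etype
        if start is not None and not continues:
            spans.append((start, i, etype))      # boundary found: close it
            start, etype = None, None
        # A B- tag (or a stray I- tag with nothing open) starts a new entity.
        if tag.startswith("B-") or (tag.startswith("I-") and start is None):
            start, etype = i, tag[2:]
    if start is not None:                        # entity running to the end
        spans.append((start, len(tagged), etype))
    return spans


# Hand-tagged, hypothetical example sentence.
tagged = [("Brooke", "B-PERSON"), ("T.", "I-PERSON"), ("Mossman", "I-PERSON"),
          ("teaches", "O"), ("in", "O"), ("Philadelphia", "B-GPE")]

for start, end, etype in iob_to_spans(tagged):
    words = " ".join(tok for tok, _ in tagged[start:end])
    print(etype, words)
```

Each span carries both pieces of information: the start and end indices give the boundary, and the third element gives the type. Dropping the third element corresponds to the binary=True behavior, where only the boundary is reported.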
7.6 Relation Extraction
Once the named entities in the text have been recognized, we can extract the relationships between them.
One way to approach this task is to first find all triples of the form (X, α, Y), where X and Y are named entities of the required types and α is the string of words that occurs between them. We can then use regular expressions to pick out just those instances of α that express the relation we are looking for. The following example searches for strings that contain the word in.
The special regular expression (?!\b.+ing\b) is a negative lookahead assertion. It lets us ignore strings such as success in supervising the transition, where in is followed by a gerund.
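>>> import re
>>> import nltk
>>> IN = re.compile(r'.*\bin\b(?!\b.+ing\b)')
>>> # File ID and entity types as in the NLTK book's version of this example
>>> for doc in nltk.corpus.ieer.parsed_docs('NYT_19980315'):
...     for rel in nltk.sem.extract_rels('ORG', 'LOC', doc,
...                                      corpus='ieer', pattern=IN):
...         print(nltk.sem.rtuple(rel))
Each line of output pairs an organization with a location, in the form [ORG: ...] ... [LOC: ...].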
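To see what the (X, α, Y) triples and the lookahead filter do without loading the IEER corpus, here is a small sketch using only the standard library. It assumes a simplified input in which entities are (type, text) tuples and the words between them are plain strings. The triples() helper and the sample sentence are hypothetical stand-ins for a chunk tree and for nltk.sem.extract_rels(), not a description of how NLTK implements it.

```python
import re

# The word "in", not followed by a word ending in "ing".
IN = re.compile(r'.*\bin\b(?!\b.+ing\b)')


def triples(chunks, x_type, y_type):
    """Yield (X, alpha, Y) for consecutive entity pairs of the given types.

    `chunks` mixes plain strings (filler words) with (type, text) tuples
    (entities); alpha is the filler between two consecutive entities.
    """
    prev, filler = None, []
    for item in chunks:
        if isinstance(item, tuple):
            if prev is not None and prev[0] == x_type and item[0] == y_type:
                yield prev[1], " ".join(filler), item[1]
            prev, filler = item, []
        else:
            filler.append(item)


# Hypothetical, hand-chunked sentence.
chunks = [("ORG", "Acme Corp"), "has", "an", "office", "in",
          ("LOC", "Springfield"), "and",
          ("ORG", "Beta Ltd"), "succeeded", "in", "supervising",
          "the", "move", "to", ("LOC", "Shelbyville")]

all_triples = list(triples(chunks, "ORG", "LOC"))

# Keep only triples whose alpha expresses the "in" relation.
kept = [t for t in all_triples if IN.match(t[1])]
print(kept)
```

Both ORG/LOC pairs produce a triple, but only the first survives the filter: in the second, in is followed by supervising, so the negative lookahead rejects it.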
As shown earlier, the Dutch section of the CoNLL 2002 Named Entity Corpus contains not only named entity annotation but also part-of-speech tags. This lets us design patterns that are sensitive to these tags, as in the following example. The show_clause() method (renamed clause() in NLTK 3) prints each relation in clausal form, with the binary relation symbol given as the value of the parameter relsym.
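>>> import re
>>> import nltk
>>> from nltk.corpus import conll2002
>>> # Pattern, file ID and entity types as in the NLTK book's version of this example
>>> vnv = """
... (
... is/V|    # 3rd sing present and
... was/V|   # past forms of the verb zijn ('be')
... werd/V|  # and also present
... wordt/V  # past of worden ('become')
... )
... .*       # followed by anything
... van/Prep # followed by van ('of')
... """
>>> VAN = re.compile(vnv, re.VERBOSE)
>>> for doc in conll2002.chunked_sents('ned.train'):
...     for r in nltk.sem.extract_rels('PER', 'ORG', doc,
...                                    corpus='conll2002', pattern=VAN):
...         print(nltk.sem.clause(r, relsym="VAN"))  # show_clause in NLTK 2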
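The benefit of a tag-sensitive pattern is easier to see on a small case. The sketch below assumes that the words between two entities are rendered as word/TAG text, so a regular expression can constrain the tag as well as the word. The pattern, the tagged_text() helper, and the two Dutch fragments with their tags are invented for illustration. Only the standard re module is used.

```python
import re

# Verbose pattern over "word/TAG" text: a verb form of "be", then anything,
# then "van" tagged as a preposition.
VAN_DEMO = re.compile(r"""
    (is/V | was/V)   # verb forms, matched together with their tag
    .*               # any intervening material
    van/Prep         # "van" only when it is tagged as a preposition
""", re.VERBOSE)


def tagged_text(pairs):
    """Render (word, tag) pairs as a single 'word/TAG word/TAG ...' string."""
    return " ".join(f"{word}/{tag}" for word, tag in pairs)


# Hypothetical fillers: in the second, "van" is part of a name, tagged N.
relation = [("is", "V"), ("voorzitter", "N"), ("van", "Prep")]
name_part = [("is", "V"), ("de", "Art"), ("heer", "N"), ("van", "N")]

print(bool(VAN_DEMO.match(tagged_text(relation))))
print(bool(VAN_DEMO.match(tagged_text(name_part))))
```

A pattern over the bare words would accept both fragments, since each contains is followed by van. Matching on the tags as well rules out the second one.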