Environment: Win 7 + python 3.5.2 + nltk 3.2.1
Chinese Word Segmentation
Pre-Preparation
Download stanford-segmenter-2015-12-09 (the 2016 version of the Stanford Segmenter is incompatible with the NLTK interface) and unzip it. Copy stanford-segmenter-3.6.0.jar, slf4j-api.jar, and the data folder from the root directory into one folder; I put them under E:/stanford_jar.
On Windows, the NLTK interface needs a small modification.
Change line 63 of your_python_path\lib\site-packages\nltk\tokenize\stanford_segmenter.py from

self._stanford_jar = ':'.join(

to

self._stanford_jar = os.pathsep.join(
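The reason for this change: Java classpath entries are joined with ";" on Windows but ":" on Linux/macOS, and os.pathsep picks the right separator for the current platform. A minimal sketch (the jar paths are just placeholders for illustration):

```python
import os

# Hypothetical jar locations; os.pathsep is ";" on Windows and ":" elsewhere.
jars = ["E:/stanford_jar/stanford-segmenter-3.6.0.jar",
        "E:/stanford_jar/slf4j-api.jar"]

# Hard-coding ":" would produce an invalid classpath on Windows;
# os.pathsep.join() works on every platform.
classpath = os.pathsep.join(jars)
print(classpath)
```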
Test Code
from nltk.tokenize import StanfordSegmenter

if __name__ == "__main__":
    segmenter = StanfordSegmenter(
        path_to_jar="E:/stanford_jar/stanford-segmenter-3.6.0.jar",
        path_to_slf4j="E:/stanford_jar/slf4j-api.jar",
        path_to_sihan_corpora_dict="E:/stanford_jar/data",
        path_to_model="E:/stanford_jar/data/pku.gz",
        path_to_dict="E:/stanford_jar/data/dict-chris6.ser.gz")
    result = segmenter.segment("你叫什么名字")  # "What's your name"
    print(result)  # result is a str with the words separated by spaces
Run Results
你 叫 什么 名字
The Stanford Segmenter runs slowly; personally I find Jieba better.
Building on the part-of-speech analysis of individual words, syntactic parsing tries to analyze the relationships between words and use those relationships to express the structure of a sentence. Syntactic structure can be divided into two types: phrase structure and dependency structure. The former derives the syntactic structure from the constituent order of the sentence, while the latter derives it from the syntactic relations between words.
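To make the contrast concrete, here is a minimal pure-Python sketch of the two representations for the example sentence 你叫什么名字 — nested (label, children) tuples for phrase structure, and (head, relation, dependent) triples for dependency structure. These are illustrative data structures of my own, not the NLTK classes used later:

```python
# Phrase structure: nested constituents as (label, children) tuples.
phrase_tree = ("IP",
               [("NP", [("PN", ["你"])]),
                ("VP", [("VV", ["叫"]),
                        ("NP", [("DP", [("DT", ["什么"])]),
                                ("NP", [("NN", ["名字"])])])])])

# Dependency structure: (head, relation, dependent) triples between words.
dependencies = [("叫", "nsubj", "你"),
                ("叫", "dobj", "名字"),
                ("名字", "det", "什么")]

def leaves(node):
    """Collect the terminal words of a (label, children) phrase tree, in order."""
    label, children = node
    words = []
    for child in children:
        if isinstance(child, str):
            words.append(child)
        else:
            words.extend(leaves(child))
    return words

print(" ".join(leaves(phrase_tree)))  # 你 叫 什么 名字
```

Note that the phrase tree only mentions words at its leaves, while every element of the dependency list is a direct word-to-word relation; that is exactly the difference between the two analyses described above.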
Analysis Based on Phrase Structure
Pre-Preparation
Download stanford-parser-full-2016-10-31 and unzip it. Extract stanford-parser-3.7.0-models.jar from the root directory to get a stanford-parser-3.7.0-models folder, go into stanford-parser-3.7.0-models\edu\stanford\nlp\models\lexparser, and copy chinesePCFG.ser.gz to a folder; I put it in E:/stanford_jar. I also put stanford-parser.jar and stanford-parser-3.7.0-models.jar from the root directory in E:/stanford_jar.
Test Code
import os
from nltk.parse import stanford

if __name__ == "__main__":
    os.environ['STANFORD_PARSER'] = 'E:/stanford_jar/stanford-parser.jar'
    os.environ['STANFORD_MODELS'] = 'E:/stanford_jar/stanford-parser-3.7.0-models.jar'
    parser = stanford.StanfordParser(
        model_path="E:/stanford_jar/chinesePCFG.ser.gz", encoding="gb2312")
    # The sentence must be segmented into words before parsing.
    result = parser.parse("你 叫 什么 名字".split())
    print(list(result))
Run Results
[Tree('ROOT', [Tree('IP', [Tree('NP', [Tree('PN', ['你'])]), Tree('VP', [Tree('VV', ['叫']), Tree('NP', [Tree('DP', [Tree('DT', ['什么'])]), Tree('NP', [Tree('NN', ['名字'])])])])])])]
Analysis Based on Dependency Relations
Test Code
import os
from nltk.parse.stanford import StanfordDependencyParser

if __name__ == "__main__":
    os.environ['STANFORD_PARSER'] = 'E:/stanford_jar/stanford-parser.jar'
    os.environ['STANFORD_MODELS'] = 'E:/stanford_jar/stanford-parser-3.7.0-models.jar'
    eng_parser = StanfordDependencyParser(
        model_path="E:/stanford_jar/chinesePCFG.ser.gz", encoding="gb2312")
    res = list(eng_parser.parse("你 叫 什么 名字".split()))
    for row in res[0].triples():
        print(row)
Results
(('叫', 'VV'), 'nsubj', ('你', 'PN'))
(('叫', 'VV'), 'dobj', ('名字', 'NN'))
(('名字', 'NN'), 'det', ('什么', 'DT'))
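Triples in this ((head, tag), relation, (dependent, tag)) shape are easy to post-process in plain Python, e.g. to pull out the subject or object of the sentence. A small sketch (find_relation is a hypothetical helper of my own, not an NLTK API):

```python
# Triples as produced by DependencyGraph.triples():
# ((head_word, head_tag), relation, (dep_word, dep_tag)).
triples = [(("叫", "VV"), "nsubj", ("你", "PN")),
           (("叫", "VV"), "dobj", ("名字", "NN")),
           (("名字", "NN"), "det", ("什么", "DT"))]

def find_relation(triples, relation):
    """Return (head, dependent) word pairs for a given relation label."""
    return [(head[0], dep[0]) for head, rel, dep in triples if rel == relation]

print(find_relation(triples, "nsubj"))  # [('叫', '你')]
print(find_relation(triples, "dobj"))   # [('叫', '名字')]
```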