"Segmentation & Parsing & Dependency parsing" NLTK Invoke Stanford NLP Toolkit

Environment: Windows 7 + Python 3.5.2 + NLTK 3.2.1

Chinese word segmentation

Preparation
Download stanford-segmenter-2015-12-09 (the 2016 release of the Stanford Segmenter is incompatible with the NLTK interface) and unzip it. Copy stanford-segmenter-3.6.0.jar, slf4j-api.jar, and the data folder from the root directory into a folder of your choice; I put them under E:/stanford_jar.

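Modifying the NLTK interface on Windows
In your_python_path\lib\site-packages\nltk\tokenize\stanford_segmenter.py, change line 63 from
self._stanford_jar = ":".join(
to
self._stanford_jar = os.pathsep.join(
Java expects classpath entries to be separated by ";" on Windows rather than ":", and os.pathsep supplies the correct separator for the current platform.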
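
To see why this one-line change matters, the sketch below joins the two jar paths used in this article with os.pathsep and contrasts that with a hard-coded colon. The helper name build_classpath is hypothetical and is not part of NLTK; it only mirrors what the patched line does.

```python
import os


def build_classpath(jars):
    """Join jar paths with the platform's path-list separator.

    Hypothetical helper, not part of NLTK. os.pathsep is ';' on Windows
    and ':' on POSIX systems.
    """
    return os.pathsep.join(jars)


jars = [
    "E:/stanford_jar/stanford-segmenter-3.6.0.jar",
    "E:/stanford_jar/slf4j-api.jar",
]

# A hard-coded ':' never produces the ';' that Java splits on under Windows,
# so the two jars are seen as a single, invalid classpath entry.
colon_joined = ":".join(jars)
entries_seen_on_windows = colon_joined.split(";")

if __name__ == "__main__":
    print(build_classpath(jars))
    print(len(entries_seen_on_windows))
```

On Windows the first print shows the two paths separated by ";", which is the form the java command accepts.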

Test code

from nltk.tokenize import StanfordSegmenter

if __name__ == "__main__":
    segmenter = StanfordSegmenter(
        path_to_jar="E:/stanford_jar/stanford-segmenter-3.6.0.jar",
        path_to_slf4j="E:/stanford_jar/slf4j-api.jar",
        path_to_sihan_corpora_dict="E:/stanford_jar/data",
        path_to_model="E:/stanford_jar/data/pku.gz",
        path_to_dict="E:/stanford_jar/data/dict-chris6.ser.gz")
    # Pass the unsegmented Chinese sentence here.
    result = segmenter.segment("What's Your Name")
    print(result)  # result is a str with the words separated by spaces

Output
What's your name?

The Stanford segmenter runs slowly; personally, I find jieba more pleasant to use.
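
Because segment() returns one space-separated string while the parsers used later in this article take a list of tokens, a small conversion step sits between the two. The sketch below shows it on a hard-coded sample; the function name to_tokens and the sample string are assumptions made for illustration, not NLTK API.

```python
def to_tokens(segmented):
    """Split a space-separated segmentation result into a token list.

    str.split() with no argument also discards a trailing newline and
    any repeated whitespace.
    """
    return segmented.split()


# Stand-in for a segmenter result (an assumption for illustration only).
sample = "you called what name\n"
tokens = to_tokens(sample)

if __name__ == "__main__":
    print(tokens)
```

The resulting list can be passed directly to a parser's parse() method, which is what the .split() calls in the test code further down do.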

Building on the part-of-speech analysis of individual words, syntactic parsing analyzes the relationships between words and uses those relationships to represent the structure of a sentence. Syntactic structure comes in two kinds: phrase structure and dependency structure. The former derives the structure from how neighboring words group into nested phrases in sentence order, while the latter derives it from the grammatical relations between pairs of words.
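
As a rough illustration of the two kinds of structure, the sketch below encodes one four-word sentence both ways using plain Python data, with no parser involved. The labels follow the outputs shown later in this article; the nested-tuple encoding and the helpers leaves and words_in are assumptions made for the example, not how NLTK stores trees.

```python
# Phrase structure: nested (label, children...) tuples; leaves are strings.
phrase_tree = (
    "ROOT",
    ("IP",
        ("NP", ("PN", "you")),
        ("VP",
            ("VV", "called"),
            ("NP",
                ("DP", ("DT", "what")),
                ("NP", ("NN", "name"))))),
)

# Dependency structure: (head, relation, dependent) triples between words.
dependencies = [
    ("called", "nsubj", "you"),
    ("called", "dobj", "name"),
    ("name", "det", "what"),
]


def leaves(node):
    """Return the words of a nested-tuple tree in sentence order."""
    if isinstance(node, str):
        return [node]
    words = []
    for child in node[1:]:  # node[0] is the phrase or POS label
        words.extend(leaves(child))
    return words


def words_in(triples):
    """Collect every word that appears as a head or as a dependent."""
    return {word for head, _, dep in triples for word in (head, dep)}


if __name__ == "__main__":
    print(leaves(phrase_tree))
    print(sorted(words_in(dependencies)))
```

Both encodings cover the same four words: the tree records how they nest into phrases, while the triples record which word governs which.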

Phrase-structure parsing

Preparation
Download stanford-parser-full-2016-10-31 and unzip it. Extract stanford-parser-3.7.0-models.jar in the root directory to get a stanford-parser-3.7.0-models folder, go into stanford-parser-3.7.0-models\edu\stanford\nlp\models\lexparser, and copy chinesePCFG.ser.gz to a folder of your choice; I put it in E:/stanford_jar. I also copied stanford-parser.jar and stanford-parser-3.7.0-models.jar from the root directory into E:/stanford_jar.
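
The unzip-and-copy step can also be scripted, since a .jar file is an ordinary zip archive. The sketch below pulls one member out of an archive into a target folder; the function name extract_member is hypothetical, and the member path is an assumption based on the folder names given above (adjust it if your archive is laid out differently).

```python
import os
import shutil
import zipfile


def extract_member(jar_path, member, dest_dir):
    """Copy a single file out of a jar (zip) archive into dest_dir.

    Returns the path of the extracted file. The archive's internal
    directories are not recreated; only the file itself is written.
    """
    os.makedirs(dest_dir, exist_ok=True)
    target = os.path.join(dest_dir, os.path.basename(member))
    with zipfile.ZipFile(jar_path) as jar:
        with jar.open(member) as src, open(target, "wb") as dst:
            shutil.copyfileobj(src, dst)
    return target


# Paths as used in this article; the member path inside the jar is an
# assumption based on the directory names mentioned above.
JAR = "E:/stanford_jar/stanford-parser-3.7.0-models.jar"
MEMBER = "edu/stanford/nlp/models/lexparser/chinesePCFG.ser.gz"

# Only run the extraction when the jar is actually present.
if __name__ == "__main__" and os.path.isfile(JAR):
    print(extract_member(JAR, MEMBER, "E:/stanford_jar"))
```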

Test code

import os
from nltk.parse import stanford

if __name__ == "__main__":
    # NLTK locates the parser and model jars through these environment variables.
    os.environ['STANFORD_PARSER'] = 'E:/stanford_jar/stanford-parser.jar'
    os.environ['STANFORD_MODELS'] = 'E:/stanford_jar/stanford-parser-3.7.0-models.jar'
    parser = stanford.StanfordParser(model_path="E:/stanford_jar/chinesePCFG.ser.gz", encoding="gb2312")
    result = parser.parse("You call What's the name".split())  # the sentence to parse must be segmented into words first
    print(list(result))

Output

[Tree('ROOT', [Tree('IP', [Tree('NP', [Tree('PN', ['you'])]), Tree('VP', [Tree('VV', ['called']), Tree('NP', [Tree('DP', [Tree('DT', ['What'])]), Tree('NP', [Tree('NN', ['name'])])])])])])]

Dependency parsing
Test code

import os
from nltk.parse.stanford import StanfordDependencyParser

if __name__ == "__main__":
    # Same jars and model as in the phrase-structure example.
    os.environ['STANFORD_PARSER'] = 'E:/stanford_jar/stanford-parser.jar'
    os.environ['STANFORD_MODELS'] = 'E:/stanford_jar/stanford-parser-3.7.0-models.jar'
    eng_parser = StanfordDependencyParser(model_path="E:/stanford_jar/chinesePCFG.ser.gz", encoding="gb2312")
    res = list(eng_parser.parse("Your Name".split()))
    # Each triple is ((head, tag), relation, (dependent, tag)).
    for row in res[0].triples():
        print(row)

Output

(('called', 'VV'), 'nsubj', ('You', 'PN'))
(('called', 'VV'), 'dobj', ('name', 'NN'))
(('name', 'NN'), 'det', ('What', 'DT'))
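
Each triple has the form ((head, tag), relation, (dependent, tag)), so the output is easy to post-process. The sketch below works on a hard-coded list based on the triples printed above rather than calling the parser, and the helpers dependents_of and find_root are hypothetical names introduced for illustration.

```python
from collections import defaultdict

# Hard-coded list based on the triples printed above (no parser is called).
triples = [
    (("called", "VV"), "nsubj", ("You", "PN")),
    (("called", "VV"), "dobj", ("name", "NN")),
    (("name", "NN"), "det", ("What", "DT")),
]


def dependents_of(triples):
    """Map each head word to a list of (relation, dependent word) pairs."""
    table = defaultdict(list)
    for (head, _), rel, (dep, _) in triples:
        table[head].append((rel, dep))
    return dict(table)


def find_root(triples):
    """Return the head words that never appear as a dependent."""
    heads = {head for (head, _), _, _ in triples}
    deps = {dep for _, _, (dep, _) in triples}
    return sorted(heads - deps)


if __name__ == "__main__":
    for head, children in dependents_of(triples).items():
        print(head, children)
    print(find_root(triples))
```

Keying by the word itself is enough for this short sentence; a sentence that repeats a word would need token positions to tell the occurrences apart.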
