Environment: Win 7 + python 3.5.2 + nltk 3.2.1
Chinese Word Segmentation
Pre-Preparation
Download stanford-segmenter-2015-12-09 (the 2016 version of the Stanford Segmenter is incompatible with the NLTK interface) and unzip it. Copy stanford-segmenter-3.6.0.jar, slf4j-api.jar, and the data folder from the root directory into one folder; I put them under E:/stanford_jar.
On Windows, the NLTK interface needs a small modification.
Change line 63 of your_python_path\lib\site-packages\nltk\tokenize\stanford_segmenter.py from

self._stanford_jar = ':'.join(

to

self._stanford_jar = os.pathsep.join(
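The reason for this change: Java classpath entries are joined with ";" on Windows but ":" on Linux/macOS, and os.pathsep picks the right separator for the current platform. A minimal sketch (the jar paths are just placeholders for illustration):

```python
import os

# Hypothetical jar locations; os.pathsep is ";" on Windows and ":" elsewhere.
jars = ["E:/stanford_jar/stanford-segmenter-3.6.0.jar",
        "E:/stanford_jar/slf4j-api.jar"]

# Hard-coding ":" would produce an invalid classpath on Windows;
# os.pathsep.join() works on every platform.
classpath = os.pathsep.join(jars)
print(classpath)
```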
Test Code
from nltk.tokenize import StanfordSegmenter

if __name__ == "__main__":
    segmenter = StanfordSegmenter(
        path_to_jar="E:/stanford_jar/stanford-segmenter-3.6.0.jar",
        path_to_slf4j="E:/stanford_jar/slf4j-api.jar",
        path_to_sihan_corpora_dict="E:/stanford_jar/data",
        path_to_model="E:/stanford_jar/data/pku.gz",
        path_to_dict="E:/stanford_jar/data/dict-chris6.ser.gz")
    result = segmenter.segment("你叫什么名字")  # "What's your name"
    print(result)  # result is a str with the words separated by spaces
Run Results
你 叫 什么 名字
The Stanford Segmenter runs slowly; personally I find Jieba better.
Building on the part-of-speech analysis of individual words, syntactic parsing tries to analyze the relationships between words and use those relationships to express the structure of a sentence. Syntactic structure can be divided into two types: phrase structure and dependency structure. The former derives the syntactic structure from the constituent order of the sentence, while the latter derives it from the syntactic relations between words.
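To make the contrast concrete, here is a minimal pure-Python sketch of the two representations for the example sentence 你叫什么名字 — nested (label, children) tuples for phrase structure, and (head, relation, dependent) triples for dependency structure. These are illustrative data structures of my own, not the NLTK classes used later:

```python
# Phrase structure: nested constituents as (label, children) tuples.
phrase_tree = ("IP",
               [("NP", [("PN", ["你"])]),
                ("VP", [("VV", ["叫"]),
                        ("NP", [("DP", [("DT", ["什么"])]),
                                ("NP", [("NN", ["名字"])])])])])

# Dependency structure: (head, relation, dependent) triples between words.
dependencies = [("叫", "nsubj", "你"),
                ("叫", "dobj", "名字"),
                ("名字", "det", "什么")]

def leaves(node):
    """Collect the terminal words of a (label, children) phrase tree, in order."""
    label, children = node
    words = []
    for child in children:
        if isinstance(child, str):
            words.append(child)
        else:
            words.extend(leaves(child))
    return words

print(" ".join(leaves(phrase_tree)))  # 你 叫 什么 名字
```

Note that the phrase tree only mentions words at its leaves, while every element of the dependency list is a direct word-to-word relation; that is exactly the difference between the two analyses described above.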
Analysis Based on Phrase Structure
Pre-Preparation
Download stanford-parser-full-2016-10-31 and unzip it. Extract stanford-parser-3.7.0-models.jar from the root directory to get a stanford-parser-3.7.0-models folder, go into stanford-parser-3.7.0-models\edu\stanford\nlp\models\lexparser, and copy chinesePCFG.ser.gz to a folder; I put it in E:/stanford_jar. I also put stanford-parser.jar and stanford-parser-3.7.0-models.jar from the root directory in E:/stanford_jar.
Test Code
import os
from nltk.parse import stanford

if __name__ == "__main__":
    os.environ['STANFORD_PARSER'] = 'E:/stanford_jar/stanford-parser.jar'
    os.environ['STANFORD_MODELS'] = 'E:/stanford_jar/stanford-parser-3.7.0-models.jar'
    parser = stanford.StanfordParser(
        model_path="E:/stanford_jar/chinesePCFG.ser.gz", encoding="gb2312")
    # The sentence must be segmented into words before parsing.
    result = parser.parse("你 叫 什么 名字".split())
    print(list(result))
Run Results
[Tree('ROOT', [Tree('IP', [Tree('NP', [Tree('PN', ['你'])]), Tree('VP', [Tree('VV', ['叫']), Tree('NP', [Tree('DP', [Tree('DT', ['什么'])]), Tree('NP', [Tree('NN', ['名字'])])])])])])]
Analysis Based on Dependency Relations
Test Code
import os
from nltk.parse.stanford import StanfordDependencyParser

if __name__ == "__main__":
    os.environ['STANFORD_PARSER'] = 'E:/stanford_jar/stanford-parser.jar'
    os.environ['STANFORD_MODELS'] = 'E:/stanford_jar/stanford-parser-3.7.0-models.jar'
    eng_parser = StanfordDependencyParser(
        model_path="E:/stanford_jar/chinesePCFG.ser.gz", encoding="gb2312")
    res = list(eng_parser.parse("你 叫 什么 名字".split()))
    for row in res[0].triples():
        print(row)
Results
(('叫', 'VV'), 'nsubj', ('你', 'PN'))
(('叫', 'VV'), 'dobj', ('名字', 'NN'))
(('名字', 'NN'), 'det', ('什么', 'DT'))
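Triples in this ((head, tag), relation, (dependent, tag)) shape are easy to post-process in plain Python, e.g. to pull out the subject or object of the sentence. A small sketch (find_relation is a hypothetical helper of my own, not an NLTK API):

```python
# Triples as produced by DependencyGraph.triples():
# ((head_word, head_tag), relation, (dep_word, dep_tag)).
triples = [(("叫", "VV"), "nsubj", ("你", "PN")),
           (("叫", "VV"), "dobj", ("名字", "NN")),
           (("名字", "NN"), "det", ("什么", "DT"))]

def find_relation(triples, relation):
    """Return (head, dependent) word pairs for a given relation label."""
    return [(head[0], dep[0]) for head, rel, dep in triples if rel == relation]

print(find_relation(triples, "nsubj"))  # [('叫', '你')]
print(find_relation(triples, "dobj"))   # [('叫', '名字')]
```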