I recently read through some NLTK material on natural language processing and summarize it here.
Originally published at: http://www.pythontip.com/blog/post/10012/
------------------------------------Talk-------------------------------------------------
NLTK is a powerful third-party Python library that can easily accomplish many natural language processing (NLP) tasks, including tokenization (word segmentation), POS tagging, named entity recognition (NER), and syntactic parsing.
NLTK Installation Tutorial: www.pythontip.com/blog/post/10011/
Here is how to use NLTK to quickly complete basic NLP tasks.
First, NLTK for tokenization (word segmentation)
Functions to use:
nltk.sent_tokenize(text)   # split the text into sentences
nltk.word_tokenize(sent)   # split a sentence into word tokens
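A minimal sketch of both calls (the sample text is an assumption; the Punkt tokenizer models must be downloaded once with nltk.download('punkt')):

import nltk

# nltk.download('punkt')  # run once to fetch the sentence/word tokenizer models

text = "NLTK is a leading platform for building Python programs. It works with human language data."
sentences = nltk.sent_tokenize(text)       # split the text into a list of sentences
tokens = nltk.word_tokenize(sentences[0])  # split the first sentence into word tokens
print(sentences)
print(tokens)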
Second, NLTK for POS tagging
Functions to use:
nltk.pos_tag(tokens)   # tokens is the word list from tokenizing a sentence; tagging is also done at the sentence level
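A short sketch continuing from the tokenization step (the sentence is an assumption; the default tagger model can be fetched once with nltk.download('averaged_perceptron_tagger')):

import nltk

# nltk.download('averaged_perceptron_tagger')  # run once to fetch the default POS tagger

sent = "NLTK can tag parts of speech."
tokens = nltk.word_tokenize(sent)  # tokenize the sentence first
tags = nltk.pos_tag(tokens)        # list of (word, POS tag) tuples
print(tags)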
Third, NLTK for named entity recognition (NER)
Functions to use:
nltk.ne_chunk(tags)   # tags is the POS-tagged result of a sentence; NER is also done at the sentence level
In the example, two named entities are found: Xi, which should be tagged as a person (PERSON) but is incorrectly identified as GPE, and China, which is correctly identified as GPE.
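A minimal end-to-end sketch; the input sentence is an assumption (the original example output is not reproduced here), so the labels NLTK assigns may differ from those described above. The chunker model and word list can be fetched once with nltk.download('maxent_ne_chunker') and nltk.download('words').

import nltk

# nltk.download('maxent_ne_chunker')  # run once to fetch the NE chunker model
# nltk.download('words')              # word list used by the chunker

sent = "Xi met business leaders in China last week."  # assumed example sentence
tokens = nltk.word_tokenize(sent)
tags = nltk.pos_tag(tokens)
tree = nltk.ne_chunk(tags)  # returns an nltk.Tree; named entities appear as labeled subtrees (e.g. PERSON, GPE)
print(tree)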
Fourth, NLTK for syntactic parsing
NLTK does not come with a good parser; the Stanford Parser is recommended instead. However, NLTK does have a good Tree class, implemented on top of Python lists, which can be used to build a Python syntax tree from the Stanford Parser's output.
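A small sketch of the Tree class, building a tree from a bracketed parse string of the kind the Stanford Parser produces (the parse string here is an assumed example):

from nltk.tree import Tree

# A bracketed parse in Penn Treebank style, as the Stanford Parser would output (assumed example)
parse_str = "(S (NP (NNP NLTK)) (VP (VBZ is) (ADJP (JJ useful))))"

tree = Tree.fromstring(parse_str)  # build an nltk Tree from the bracketed string
print(tree.label())                # 'S' -- label of the root node
print(tree.leaves())               # ['NLTK', 'is', 'useful']
tree.pretty_print()                # draw the tree as ASCII art

Because Tree is built on nested lists, subtrees can be indexed like tree[0] or tree[1][0] to walk the parse.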