NLP: SnowNLP Practical Use Cases

Source: Internet
Author: User

SnowNLP is a Python library for processing Chinese text. It supports Chinese word segmentation, part-of-speech (POS) tagging, sentiment analysis, text classification, keyword extraction and text similarity calculation, among other tasks.
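The SnowNLP README describes its keyword extraction as based on TextRank. The sketch below is a generic TextRank over an already-segmented token list, not SnowNLP's own code. It links tokens that occur close together, then runs PageRank-style iterations so that words connected to many well-connected words rank highest. The helper name `textrank_keywords`, the window size, the damping factor of 0.85 and the iteration count are illustrative assumptions. Any preprocessing SnowNLP may apply before ranking, such as stop-word filtering, is not modeled.

```python
from collections import defaultdict


def textrank_keywords(tokens, limit=3, window=3, d=0.85, iterations=50):
    """Return the `limit` highest-ranked tokens using a basic TextRank.

    Hypothetical helper: window, d (damping) and iterations are illustrative
    defaults, not SnowNLP's settings.
    """
    # Build an undirected co-occurrence graph: link each token to the
    # next (window - 1) tokens, ignoring self-links.
    neighbors = defaultdict(set)
    for i, word in enumerate(tokens):
        for other in tokens[i + 1:i + window]:
            if other != word:
                neighbors[word].add(other)
                neighbors[other].add(word)

    # PageRank update: each word passes its score, split evenly, to its
    # neighbors; (1 - d) is the baseline score every word keeps.
    scores = {word: 1.0 for word in neighbors}
    for _ in range(iterations):
        scores = {
            word: (1 - d) + d * sum(scores[n] / len(neighbors[n]) for n in neighbors[word])
            for word in neighbors
        }

    return sorted(scores, key=scores.get, reverse=True)[:limit]


tokens = ['natural', 'language', 'processing', 'language',
          'computer', 'language', 'science']
print(textrank_keywords(tokens))
```

Here "language" co-occurs with every other word, so it comes out on top; SnowNLP would run the same kind of ranking over its own segmentation of Chinese text.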

# -*- coding: utf-8 -*-
from snownlp import SnowNLP

# "This thing is really awesome"
s = SnowNLP('这个东西真心很赞')

print('Word segmentation:')
print(s.words)         # ['这个', '东西', '真心', '很', '赞']
print()

print('POS tagging:')
# Under Python 3, s.tags is a zip object, so convert it to a list before printing
print(list(s.tags))    # [('这个', 'r'), ('东西', 'n'), ('真心', 'd'), ('很', 'd'), ('赞', 'Vg')]
print()

print('Sentiment analysis:')
print(s.sentiments)    # probability that the text is positive, about 0.977
print()

print('Pinyin:')
print(s.pinyin)        # ['zhe', 'ge', 'dong', 'xi', 'zhen', 'xin', 'hen', 'zan']
print()

print('Traditional to Simplified:')
s = SnowNLP('「繁體字」「繁體中文」的叫法在臺灣亦很常見。')
print(s.han)           # 「繁体字」「繁体中文」的叫法在台湾亦很常见。
print()

# A short Chinese passage about natural language processing
text = '''
自然语言处理是计算机科学领域与人工智能领域中的一个重要方向。
它研究能实现人与计算机之间用自然语言进行有效通信的各种理论和方法。
自然语言处理是一门融语言学、计算机科学、数学于一体的科学。
因此，这一领域的研究将涉及自然语言，即人们日常使用的语言，
所以它与语言学的研究有着密切的联系，但又有重要的区别。
自然语言处理并不是一般地研究自然语言，
而在于研制能有效地实现自然语言通信的计算机系统，
特别是其中的软件系统。因而它是计算机科学的一部分。
'''
s = SnowNLP(text)

print('Keywords:')
print(s.keywords(3))   # top 3 keywords
print()

print('Summary:')
print(s.summary(3))    # top 3 summary sentences
print()

print('Sentences:')
print(s.sentences)
print()

# tf, idf and sim operate on a pre-segmented corpus: a list of documents,
# each given as a list of tokens. The original tokens were Chinese; English
# translations are used here, which does not change the numbers.
s = SnowNLP([['this article', 'article', 'true', 'good'],
             ['that article', 'paper'],
             ['this']])

print('Term frequency:')
print(s.tf)
print()

print('Inverse document frequency:')
print(s.idf)
print()

print('Similarity:')
print(s.sim(['article']))          # similarity of the query to each document
print(s.sim(['article', 'true']))

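Output (Chinese tokens and sentences are shown in English translation):

Word segmentation:
['this', 'thing', 'truly', 'very', 'great']

POS tagging:
[('this', 'r'), ('thing', 'n'), ('truly', 'd'), ('very', 'd'), ('great', 'Vg')]
(Under Python 3, print(s.tags) shows only <zip object at 0x12638b388>; wrap s.tags in list() to see the pairs.)

Sentiment analysis:
0.9769551298267365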

Pinyin:
['zhe', 'ge', 'dong', 'xi', 'zhen', 'xin', 'hen', 'zan']

Traditional to Simplified:
The terms "Traditional characters" and "Traditional Chinese" are also common in Taiwan. (printed in Simplified characters)

Keywords:
['language', 'natural', 'computer']

Summary:
['Thus it is part of computer science', 'Natural language processing is an important direction in the fields of computer science and artificial intelligence', 'Natural language processing is a science that integrates linguistics, computer science and mathematics']

Sentences:
['Natural language processing is an important direction in the fields of computer science and artificial intelligence', 'It studies the various theories and methods that enable effective communication between humans and computers in natural language', 'Natural language processing is a science that integrates linguistics, computer science and mathematics', 'Therefore', 'research in this field will involve natural language', 'that is, the language people use every day', 'so it is closely related to the study of linguistics', 'but there are important differences', 'natural language processing is not a general study of natural language', 'but the development of computer systems that can effectively realize natural language communication', 'especially the software systems among them', 'thus it is part of computer science']

Term frequency:
[{'this article': 1, 'article': 1, 'true': 1, 'good': 1}, {'that article': 1, 'paper': 1}, {'this': 1}]

Inverse document frequency:
{'this article': 0.5108256237659907, 'article': 0.5108256237659907, 'true': 0.5108256237659907, 'good': 0.5108256237659907, 'that article': 0.5108256237659907, 'paper': 0.5108256237659907, 'this': 0.5108256237659907}

Similarity:
[0.38657074230939836, 0, 0]
[0.7731414846187967, 0, 0]
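The SnowNLP README lists BM25 for text similarity, and the term frequency, inverse document frequency and similarity figures above can be reproduced with the Okapi BM25 formulas. The following is a minimal stand-alone sketch, not SnowNLP's source: the helper name `bm25` is hypothetical, and the parameters k1 = 1.5 and b = 0.75 are assumptions (typical BM25 values that happen to reproduce the printed scores). It also shows why every idf value above is identical: each token appears in exactly one of the three documents, so each gets log(3 - 1 + 0.5) - log(1 + 0.5) = log(5/3), about 0.5108.

```python
import math


def bm25(corpus, k1=1.5, b=0.75):
    """Return (tf, idf, sim) for a pre-segmented corpus using Okapi BM25.

    Hypothetical helper; k1 and b are assumed parameter values.
    """
    n_docs = len(corpus)
    avgdl = sum(len(doc) for doc in corpus) / n_docs  # average document length

    # Term frequency per document
    tf = []
    for doc in corpus:
        counts = {}
        for word in doc:
            counts[word] = counts.get(word, 0) + 1
        tf.append(counts)

    # Document frequency, then BM25 idf: log(N - df + 0.5) - log(df + 0.5)
    df = {}
    for counts in tf:
        for word in counts:
            df[word] = df.get(word, 0) + 1
    idf = {w: math.log(n_docs - n + 0.5) - math.log(n + 0.5) for w, n in df.items()}

    def sim(query):
        """Score the query tokens against every document in the corpus."""
        scores = []
        for doc, counts in zip(corpus, tf):
            # Length normalisation: longer-than-average documents are penalised
            norm = k1 * (1 - b + b * len(doc) / avgdl)
            score = 0
            for word in query:
                if word in counts:
                    f = counts[word]
                    score += idf[word] * f * (k1 + 1) / (f + norm)
            scores.append(score)
        return scores

    return tf, idf, sim


corpus = [['this article', 'article', 'true', 'good'],
          ['that article', 'paper'],
          ['this']]
tf, idf, sim = bm25(corpus)
print(tf)
print(idf['paper'])              # about 0.5108
print(sim(['article']))          # about [0.3866, 0, 0]
print(sim(['article', 'true']))  # about [0.7731, 0, 0]
```

Only the first document contains "article", so the other two score 0; adding "true", which has the same frequency and idf in that document, doubles the score.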

Training your own models (word segmentation, POS tagging, sentiment analysis):

# Word segmentation: train on data.txt and save the model
from snownlp import seg
seg.train('data.txt')
seg.save('seg.marshal')
# POS tagging (uncomment to use)
# from snownlp import tag
# tag.train('199801.txt')
# tag.save('tag.marshal')
# Sentiment analysis from negative and positive samples (uncomment to use)
# from snownlp import sentiment
# sentiment.train('neg.txt', 'pos.txt')
# sentiment.save('sentiment.marshal')
PS: The trained segmentation model is saved as seg.marshal. To use it, change data_path in snownlp/seg/__init__.py so that it points to the newly trained file,
or to wherever you stored your own trained model.
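`sentiment.train` above learns from a file of negative and a file of positive samples, and `sentiments` reports the probability that a text is positive. The SnowNLP README describes its text classification as Naive Bayes; assuming the sentiment model is a classifier of the same kind, the sketch below illustrates the idea with a generic two-class multinomial Naive Bayes using add-one smoothing. It is not SnowNLP's implementation: the function names `train_bayes` and `positive_probability` are hypothetical, and the documents are assumed to be segmented into token lists already (English tokens are used for readability).

```python
import math
from collections import Counter


def train_bayes(neg_docs, pos_docs):
    """Count words per class; each doc is a list of tokens (hypothetical helper)."""
    model = {}
    for label, docs in (('neg', neg_docs), ('pos', pos_docs)):
        counts = Counter(word for doc in docs for word in doc)
        # (number of docs, word counts, total words) for this class
        model[label] = (len(docs), counts, sum(counts.values()))
    vocab = set(model['neg'][1]) | set(model['pos'][1])
    return model, vocab


def positive_probability(model, vocab, doc):
    """Return P(pos | doc) using log priors and add-one (Laplace) smoothing."""
    total_docs = model['neg'][0] + model['pos'][0]
    log_scores = {}
    for label, (n_docs, counts, n_words) in model.items():
        score = math.log(n_docs / total_docs)  # log prior of the class
        for word in doc:
            # Counter returns 0 for unseen words; the +1 avoids log(0)
            score += math.log((counts[word] + 1) / (n_words + len(vocab)))
        log_scores[label] = score
    # Turn the two log scores into probabilities without overflow
    top = max(log_scores.values())
    pos = math.exp(log_scores['pos'] - top)
    neg = math.exp(log_scores['neg'] - top)
    return pos / (pos + neg)


neg_samples = [['bad', 'quality'], ['very', 'bad']]
pos_samples = [['very', 'good'], ['good', 'quality']]
model, vocab = train_bayes(neg_samples, pos_samples)
print(positive_probability(model, vocab, ['good']))  # about 0.75
print(positive_probability(model, vocab, ['bad']))   # about 0.25
```

A word seen only in positive samples pushes the probability above 0.5, and one seen equally often in both classes (such as "quality" here) leaves it at 0.5.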

