This article is from Vander's CSDN blog. If you need to reprint it, please credit the source and respect the original. Thank you. Blog address: http://blog.csdn.net/l540675759/article/details/61236376
What is a word cloud?
A word cloud, also called a text cloud, visually highlights the high-frequency "keywords" in a body of text, rendering them as a cloud-like colored picture, so that the main meaning of the text can be grasped at a glance.
The dictionary data type (in other programming languages it may be called an associative array or hash map)
Indexed lists vs. dictionaries (omitted)
Python dictionary
# Initializing an empty dictionary
pos = {}
# Other dictionary usage: pos.keys(), pos.values(), pos.items()
# Define a non-empty dictionary
>>> pos = {'colorless': 'ADJ', 'ideas': 'N', 'sleep': 'V', 'furiously': 'ADV'}
>>> pos = dict(colorless='ADJ', ideas='N', sleep='V', furiously='ADV')
The first method is usually used. Note that a dictionary key must be immutable: a string or a tuple works, but a mutable object such as a list does not.
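To make the operations above concrete, here is a short runnable sketch; the part-of-speech values follow the example dictionary above and are only illustrative.

pos = {'colorless': 'ADJ', 'ideas': 'N'}
pos['sleep'] = 'V'                      # add or update an entry
print(pos.keys())                       # dict_keys(['colorless', 'ideas', 'sleep'])
print(pos.values(), pos.items())
print(pos.get('furiously', 'UNK'))      # safe lookup with a default -> 'UNK'
# pos[['a', 'list']] = 'X'              # would raise TypeError: unhashable type: 'list'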
import jieba

seg_list = jieba.cut("中华人民共和国万岁!")    # default exact mode; this returns a generator
seg_list = jieba.lcut("中华人民共和国万岁!")   # lcut returns a list directly
print(seg_list)
seg_list = jieba.lcut_for_search("中华人民共和国万岁!")   # search-engine mode
print(seg_list)

Results:
['中华人民共和国', '万岁', '!']
['中华', '华人', '人民', '共和', '共和国', '中华人民共和国', '万岁', '!']

Custom tokenizer
Crawling the QQ Zone posts with BeautifulSoup: pressing F12 to inspect the page, you can see that the posts live inside the feed_wrap element; within its array of tags, the actual text of each post sits in the tag with class="bd". At this point the QQ posts have been crawled and saved into the qq_word file (a minimal sketch of this extraction step follows). Next, create the word cloud. Use the wordcloud package to generate it: pip install wordcloud. You could also apply jieba segmentation here; I did not.
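Here is a minimal sketch of the extraction step just described; the saved-page file name and the exact markup are assumptions, not the author's original code.

from bs4 import BeautifulSoup

with open('saved_page.html', encoding='utf-8') as f:   # hypothetical saved copy of the page
    soup = BeautifulSoup(f.read(), 'html.parser')

feed = soup.find(class_='feed_wrap')                   # the container that holds all posts
posts = [tag.get_text(strip=True) for tag in feed.find_all(class_='bd')]

with open('qq_word', 'w', encoding='utf-8') as out:    # save one post per line
    out.write('\n'.join(posts))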
Question No. 0004: given any plain-text file in English, count the number of occurrences of each word. Idea: match the words with a regular expression, have collections.Counter tally the word frequencies, and then use the most_common method to return a list of (word, count) tuples.
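A sketch of that idea; the file name and the exact regular expression are assumptions.

import re
from collections import Counter

with open('english.txt') as f:                       # hypothetical input file
    words = re.findall(r'[A-Za-z0-9]+', f.read().lower())

print(Counter(words).most_common(10))                # e.g. [('the', 123), ('of', 87), ...]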
Continue writing code to segment the text; the implementation is as shown below. 10. After the program finishes, you get the moment_outputs.txt file; its contents clearly show the segmentation results, and the red part is the program's run log. 11. Continue writing code to tally the word frequencies; the implementation is as shown.
Select an English text below and count the number of occurrences of each word, returning each word's count. See what one line of Python code can accomplish; don't use two lines.
# coding=utf-8
import collections

with open('Str.txt') as file1:           # open the text file
    str1 = file1.read().split(' ')       # split the article on spaces
print("original text:\n%s" % str1)
print("\nthe number of occurrences of each word:\n%s" % collections.Counter(str1))
import os

import jieba
import numpy
from PIL import Image

cur_path = os.path.dirname(os.path.abspath(__file__))   # directory of this script

def chinese_jieba(txt):
    wordlist_jieba = jieba.cut(txt)        # segment the text; returns a generator of words
    txt_jieba = " ".join(wordlist_jieba)   # join the words into a space-separated string
    return txt_jieba

stopwords = {'these': 0, 'those who': 0, 'because': 0, 'so': 0}   # noise words (translated from the original Chinese stop words)
mask_pic = numpy.array(Image.open(os.path.join(cur_path, 'love.jpg')))

with open(os.path.join(cur_path, 'Select the day note.txt')) as fp:
    txt = fp.read()
txt = chinese_jieba(txt)
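The snippet above stops before actually building the cloud. Here is a minimal continuation sketch using the wordcloud package; the font path and parameter choices are assumptions (Chinese text needs a CJK-capable font), not the author's original code.

from wordcloud import WordCloud

wc = WordCloud(
    font_path='simhei.ttf',       # hypothetical path to a font that supports Chinese
    background_color='white',
    mask=mask_pic,                # shape the cloud like the love.jpg image
    stopwords=set(stopwords),     # drop the noise words defined above
)
wc.generate(txt)                  # txt is the space-separated jieba output
wc.to_file('wordcloud.png')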
seg = jieba.cut("我来到北京清华大学", cut_all=False)   # exact mode
print(u"exact mode: " + "/".join(seg))
seg = jieba.cut("他来到了网易杭研大厦")                 # the default is exact mode
print(", ".join(seg))
seg = jieba.cut_for_search("小明硕士毕业于中国科学院计算所，后在日本京都大学深造")   # search-engine mode
print(", ".join(seg))

Adding a custom dictionary
Usage: jieba.load_userdict(file_name), where file_name is a file-like object or the path to a custom dictionary.
Dictionary format: one word per line; each line has three parts separated by spaces: the word, its frequency (optional), and its part-of-speech tag (optional).
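A small sketch of loading such a dictionary; the entry words (云计算, 创新办, 石墨烯) follow jieba's documented examples and are assumptions here, not from the original post.

import jieba

# userdict.txt, one entry per line: word [frequency] [POS tag], e.g.
#   云计算 5
#   创新办 3 i
jieba.load_userdict('userdict.txt')   # register the custom words
jieba.add_word('石墨烯')              # or add a single word at runtime
print('/'.join(jieba.cut('小明是创新办主任也相信云计算')))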
A trie, also known as a dictionary tree or prefix tree, can be used for predictive text and autocompletion, and also for counting frequencies (updating or adding a word's frequency while inserting it into the trie). In computer science, a trie is an ordered tree used to store an associative array, where the keys are usually strings.
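To make the idea concrete, here is a minimal dict-based trie that updates word frequency on insert, as the paragraph describes; it is an illustrative sketch, not from the original post.

class TrieNode:
    def __init__(self):
        self.children = {}   # char -> TrieNode
        self.count = 0       # frequency of the word ending at this node

class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.count += 1      # update the frequency while inserting

    def frequency(self, word):
        node = self.root
        for ch in word:
            if ch not in node.children:
                return 0
            node = node.children[ch]
        return node.count

t = Trie()
for w in ["word", "word", "world"]:
    t.insert(w)
print(t.frequency("word"))   # 2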
The segmenters I use most often are jieba (the "stutter" segmenter), NLPIR, and so on.
Recently I have been using jieba; a small recommendation: it is quite good.
First, an introduction to jieba
jieba's Chinese word segmentation rests on three basic techniques:
Efficient word-graph scanning based on a trie structure, generating a directed acyclic graph (DAG) of all the possible word combinations in a sentence; dynamic programming to find the maximum-probability path based on word frequencies; and, for unknown words, an HMM model with the Viterbi algorithm.
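You can see the word-graph scan at work in full mode, which emits every dictionary word found in the sentence; the demo sentence and output follow jieba's README and are an assumption here.

import jieba

seg = jieba.cut("我来到北京清华大学", cut_all=True)   # full mode: every word the scan finds
print("/".join(seg))
# per jieba's README: 我/来到/北京/清华/清华大学/华大/大学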
Final review has kept me busy, so the write-up on using the Scrapy framework will have to wait; today I describe how to use Python to generate a word cloud. There are many word-cloud generation tools on the web, but writing your own in Python is more fulfilling.
What we will generate today is a word cloud of inspirational song lyrics.
12. When the program runs, you get a TXT file and an Excel file containing the word-frequency statistics, as shown. The red part is the program's run output, and there are no errors.
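The author's code for step 12 is only shown as screenshots; as a hedged sketch, one common way to produce such TXT and Excel outputs is pandas (the file names here are assumptions).

from collections import Counter

import pandas as pd

with open('moment_outputs.txt', encoding='utf-8') as f:
    freq = Counter(f.read().split())                    # the file is already space-separated words

df = pd.DataFrame(freq.most_common(), columns=['word', 'count'])
df.to_csv('word_freq.txt', sep='\t', index=False)       # the TXT summary
df.to_excel('word_freq.xlsx', index=False)              # the Excel summary (needs openpyxl)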
A cool feature of Python is that it's easy to make word clouds. There is open-source code for this on GitHub: https://github.com/amueller/word_cloud. Note that the wordcloud folder should be deleted when you run the example routines. The word-cloud functionality is partly based on NLP and partly on image processing; take the code in the GitHub word_cloud repo as an example.
Generating your own customized word cloud is more complicated.
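For contrast, a plain word cloud needs only a few lines; here is a minimal sketch with the installed package (the input file name is an assumption).

from wordcloud import WordCloud

text = open('sample.txt', encoding='utf-8').read()           # any plain-text file
WordCloud(width=800, height=400).generate(text).to_file('cloud.png')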