I have been busy with exams lately and have had no time to write code; over the past month I looked at code on only a few days. Recently I came across word cloud visualizations. There are plenty of such tools online,
but none of them is perfect: some do not support Chinese, some produce inexplicable Chinese word frequency statistics, some do not support custom shapes, and none of them lets you customize the colors.
So after searching online I decided to draw the word cloud with Python. The main library is wordcloud, and installing it only takes pip install wordcloud.
The data is a set of hotel reviews. The code is as follows:
# -*- coding: utf-8 -*-
import matplotlib.pyplot as plt
import pickle
from wordcloud import WordCloud, STOPWORDS, ImageColorGenerator
import jieba

# import codecs
# fin = codecs.open('HotelComments.txt', mode='r', encoding='utf-8')
# print(fin.read())

# The first time you run the program, segment the text and save the result to a file.
# text = ''
# with open('HotelComments.txt') as fin:
#     for line in fin.readlines():
#         line = line.strip('\n')
#         text += ' '.join(jieba.cut(line))
#         text += ' '
# fout = open('Text.txt', 'wb')
# pickle.dump(text, fout)
# fout.close()

# On later runs, read the segmented text directly from the file
with open('Text.txt', 'rb') as fr:
    text = pickle.load(fr)

backgroud_image = plt.imread('girl.jpg')
wc = WordCloud(
    background_color='white',  # set the background color
    mask=backgroud_image,  # set the background (mask) picture
    max_words=2000,  # set the maximum number of words to display
    stopwords=STOPWORDS,  # set the stop words
    font_path='C:/users/windows/fonts/msyh.ttf',  # set the font; Chinese is not displayed without it
    max_font_size=50,  # set the maximum font size
    random_state=30,  # seed for the random generator, so the layout and colors are reproducible
)
wc.generate(text)
image_colors = ImageColorGenerator(backgroud_image)
wc.recolor(color_func=image_colors)
plt.imshow(wc)
plt.axis('off')
plt.show()
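The "segment once, then load from the pickle file" step in the code above can be wrapped in a small helper, so that nothing has to be commented in and out between runs. This is only a sketch under a few assumptions: load_or_build_text, the file names, and the segment parameter are hypothetical names that are not in the original code, and str.split stands in for jieba.cut so the demo runs without third-party packages.

```python
import os
import pickle
import tempfile


def load_or_build_text(source_path, cache_path, segment):
    """Return space-joined segmented text, reusing the pickle cache if present."""
    if os.path.exists(cache_path):
        # Later runs: skip segmentation and load the cached string.
        with open(cache_path, 'rb') as cache_file:
            return pickle.load(cache_file)

    # First run: segment every non-empty line and join the tokens with spaces.
    pieces = []
    with open(source_path, encoding='utf-8') as source_file:
        for line in source_file:
            line = line.strip()
            if line:
                pieces.append(' '.join(segment(line)))
    text = ' '.join(pieces)

    with open(cache_path, 'wb') as cache_file:
        pickle.dump(text, cache_file)
    return text


# Demo with a throwaway file. str.split is a stand-in segmenter (assumption);
# for Chinese reviews you would pass jieba.cut instead.
with tempfile.TemporaryDirectory() as tmp_dir:
    source = os.path.join(tmp_dir, 'comments.txt')
    cache = os.path.join(tmp_dir, 'text.pkl')
    with open(source, 'w', encoding='utf-8') as f:
        f.write('nice room\n\nfriendly staff\n')
    first = load_or_build_text(source, cache, str.split)   # builds the cache
    second = load_or_build_text(source, cache, str.split)  # reads the cache
```

With the real data, the call would look like load_or_build_text('HotelComments.txt', 'Text.txt', jieba.cut). If the reviews change, delete the cache file so the text is segmented again.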
Custom Word Cloud shapes:
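In the wordcloud library the shape comes from the mask array: pure white pixels are treated as masked out, and words are drawn only on the remaining pixels. A picture is not strictly required, since a mask can also be generated in code. The sketch below builds a circular mask in plain Python; circle_mask and its parameters are hypothetical names used only for illustration, and the conversion to a NumPy array in the trailing comment is an assumed usage rather than part of the original code.

```python
def circle_mask(size, radius):
    """Build a size x size grid: 0 inside the circle, 255 (white) outside."""
    center = size // 2
    return [
        [0 if (x - center) ** 2 + (y - center) ** 2 <= radius ** 2 else 255
         for x in range(size)]
        for y in range(size)
    ]


mask_rows = circle_mask(size=300, radius=130)

# Assumed usage (needs numpy and wordcloud installed):
#   mask = numpy.array(mask_rows, dtype=numpy.uint8)
#   wc = WordCloud(background_color='white', mask=mask)
```

The same rule explains why a picture with a white background works well as a mask: the white area stays empty and the words fill the subject.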
Reference blogs:
http://blog.csdn.net/tanzuozhev/article/details/50789226
http://blog.csdn.net/qq_16912257/article/details/52458515