Step 1: Import the required libraries:
# coding: utf-8
__author__ = 'Administrator'
import jieba  # word segmentation package
import numpy  # numerical computation package
import codecs  # codecs.open lets you specify the file encoding; text is decoded to Unicode on read
import pandas
import matplotlib.pyplot as plt
%matplotlib inline
from wordcloud import WordCloud  # word cloud package
Step 2: Load the Journey to the West TXT file and segment it into words:
file = codecs.open(u"journey to the.txt", 'r', 'utf-8')
content = file.read()
file.close()
jieba.load_userdict(u"red mansions participle.txt")  # custom user dictionary for the segmenter
segment = []
segs = jieba.cut(content)
for seg in segs:
    # keep tokens longer than one character and skip line breaks
    if len(seg) > 1 and seg != '\r\n':
        segment.append(seg)
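To make the read-and-filter logic concrete, here is a minimal sketch that runs without jieba. The sample text, the temporary file, and the hand-written token list are all hypothetical stand-ins (the token list plays the role of `jieba.cut(content)`); only `codecs.open` and the filter condition come from the step above.

```python
import codecs
import os
import tempfile

# Hypothetical sample text standing in for the novel, written to a temporary file.
sample = u"美猴王\r\n孙悟空"
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with codecs.open(path, "w", "utf-8") as f:
    f.write(sample)

# codecs.open decodes the UTF-8 bytes to Unicode text on read.
with codecs.open(path, "r", "utf-8") as f:
    content = f.read()

# Hypothetical tokens, standing in for the output of jieba.cut(content).
segs = [u"美猴王", u"\r\n", u"孙", u"悟空"]

# Same filter as in the step above: drop single characters and line breaks.
segment = [seg for seg in segs if len(seg) > 1 and seg != "\r\n"]
print(segment)
```

Note that `'\r\n'` is two characters long, so the length check alone would let it through; that is why the filter also excludes it explicitly.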
Step 3: Put the segmentation results in a DataFrame and remove the stop words:
segmentDF = pandas.DataFrame({'segment': segment})
segmentDF.head()
stopwords = pandas.read_csv("Stopwords.txt", index_col=False, quoting=3, sep="\t", names=['stopword'])  # quoting=3: quote characters are treated as ordinary text
stopwords.head()
segmentDF = segmentDF[~segmentDF.segment.isin(stopwords.stopword)]
# additional stop words, as rendered in this translated listing
wyStopWords = pandas.Series([
    'of', 'its', 'or', 'also', 'Square', 'in', 'that', 'both', 'because', 'still',
    'therefore', 'still', '?', 'the', 'of the', 'the', 'a', 'No', ' is', 'Yes',
    '?', ' .', 'Ah', 'put', 'Let', 'to', 'towards', 'is a', 'in the', 'more',
    'again', 'more', 'than', 'very', 'Partial', "Don't", 'Good', 'can be', 'will',
    'just', 'but', 'son', 'and', 'also', 'All', "I'm", 'his', 'come to', '" "'])
segmentDF = segmentDF[~segmentDF.segment.isin(wyStopWords)]
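The `quoting=3` argument is easier to understand once you know that 3 is the value of `csv.QUOTE_NONE` in the standard library. The sketch below swaps pandas for the standard `csv` module to show the effect on its own: a bare quote character in the stop-word file is read as an ordinary entry. The file content and token list are hypothetical.

```python
import csv
import io

# Hypothetical stop-word file: one word per line; the second entry is a bare quote character.
raw = 'the\n"\nof\n'

# QUOTE_NONE (numeric value 3) tells the parser not to give quote characters any special meaning.
reader = csv.reader(io.StringIO(raw), delimiter="\t", quoting=csv.QUOTE_NONE)
stopwords = {row[0] for row in reader if row}

# Hypothetical segmentation result.
tokens = ["the", "monkey", '"', "king", "of", "monkey"]

# Equivalent in spirit to segmentDF[~segmentDF.segment.isin(...)]: keep only non-stop-words.
kept = [t for t in tokens if t not in stopwords]
print(kept)
```

A set gives constant-time membership tests here; in the pandas version, `isin` plays the same role and `~` negates the resulting boolean mask.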
Step 4: Count word frequencies:
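# named aggregation and sort_values replace the dict-style agg and DataFrame.sort calls, which newer pandas no longer accepts
segStat = segmentDF.groupby(by=['segment'])['segment'].agg(count='size')
segStat = segStat.reset_index().sort_values(by="count", ascending=False)
segStat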
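As a cross-check on the group-and-sort step, the same frequency table can be sketched with `collections.Counter` from the standard library. This is an alternative to the pandas approach, not what the tutorial uses, and the token list is hypothetical.

```python
from collections import Counter

# Hypothetical tokens left after stop-word removal.
tokens = ["monkey", "king", "monkey", "monk", "monkey", "king"]

# Count occurrences of each distinct token.
counts = Counter(tokens)

# most_common() returns (word, count) pairs sorted by count, highest first.
ranked = counts.most_common()

# A word -> count mapping for the two most frequent words.
top_two = dict(counts.most_common(2))
print(ranked)
print(top_two)
```

The `ranked` list corresponds to the sorted `segment`/`count` table, and a mapping like `top_two` is the kind of word-to-frequency structure a word cloud is built from.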
Step 5: Display the word cloud:
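wordcloud = WordCloud(font_path="simhei.ttf", background_color="black")
# recent wordcloud releases expect a dict of word -> frequency
# 1000 is an assumed cutoff; the number is unreadable in this listing
wordcloud = wordcloud.fit_words(dict(segStat.head(1000).itertuples(index=False)))
plt.imshow(wordcloud)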
Step 6: Customize the word cloud shape:
from PIL import Image  # scipy.misc.imread has been removed from SciPy, so the image is loaded with Pillow
import matplotlib.pyplot as plt
from wordcloud import WordCloud, ImageColorGenerator
bimg = numpy.array(Image.open('3.jpg'))  # image that defines the shape and colors
wordcloud = WordCloud(background_color="white", mask=bimg, font_path=r'C:\Windows\Fonts\simhei.ttf')
wordcloud = wordcloud.fit_words(dict(segStat.head(39769).itertuples(index=False)))
bimgColors = ImageColorGenerator(bimg)  # take the word colors from the image
plt.axis("off")
plt.imshow(wordcloud.recolor(color_func=bimgColors))
plt.show()
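If no suitable picture is at hand, a mask can also be built directly as an array. The sketch below assumes the behaviour described in the wordcloud documentation, where pure white (255) entries of the mask are left empty and all other entries are available for drawing. The canvas size, centre, and radius are hypothetical values chosen for illustration.

```python
import numpy

# Hypothetical 300x300 canvas with a circle of radius 130 centred at (150, 150).
x, y = numpy.ogrid[:300, :300]

# True for every pixel that lies outside the circle.
outside = (x - 150) ** 2 + (y - 150) ** 2 > 130 ** 2

# 255 (white) outside the circle, 0 inside it.
mask = (255 * outside).astype(numpy.uint8)
print(mask.shape, mask[150, 150], mask[0, 0])
```

Passing this array as `mask=mask` when constructing `WordCloud` should confine the words to the circle, in the same way `bimg` confines them to the shape of the picture.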
Playing with word clouds in Python