Preface
In the previous article, we covered installing Anaconda on Ubuntu and made a simple English word cloud.
Some readers may have tried swapping the English article for a Chinese one to make a Chinese word cloud. I expect you got a result like the one pictured.
Chinese and English text are handled differently. In an English article the words are separated by spaces, so the English word cloud needed no extra step,
but Chinese is written without spaces between words, which is why you get the picture above. So how do we segment Chinese text into words? We need a tool: jieba (the name means "stuttering").
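To make the difference concrete, here is a small sketch using only Python's built-in str.split(). The two sample strings are my own illustration and are not taken from Dream.txt.

```python
# Whitespace splitting finds the words in English text,
# but returns a Chinese sentence as one unbroken token.
english = "I have a dream"
chinese = "我有一个梦想"  # the same sentence in Chinese, written without spaces

english_tokens = english.split()
chinese_tokens = chinese.split()

print(len(english_tokens), english_tokens)
print(len(chinese_tokens), chinese_tokens)
```

The English sentence splits into four words, while the Chinese sentence stays as a single piece. That is the gap a segmentation tool fills.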
Preparatory Work
1. Text data, the object of analysis. This is a must. This time I chose text related to last time's:
the Chinese version of "I Have a Dream". Save it as Dream.txt and keep it in the same directory as the code.
2. The Anaconda tool set. The previous article covered how to install and use it, so I won't repeat that here.
3. wordcloud, the Python extension package for making word clouds.
4. jieba, the extension package for Chinese word segmentation.
5. simsun.ttf, a Chinese font file needed to display Chinese characters.
First Step
Open the terminal and enter the following command to install the jieba package:
pip install jieba  # installation is simple, there's nothing more to say
Continue typing in the terminal:
jupyter notebook  # open the code editor, then switch to the directory where Dream.txt is stored
If you followed the word cloud exercise in the previous article, you can reuse that directory. In the code editor, enter the following code:
file = open('Dream.txt')
text = file.read()
text
If the text is displayed, there is no problem with the text data and the file can be opened normally.
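If the text shows up garbled or the read fails with a decoding error, the file encoding is a likely cause. The sketch below is one way to read the file with an explicit encoding. The helper name read_text is hypothetical, and UTF-8 is an assumption about how Dream.txt was saved.

```python
def read_text(path, encoding="utf-8"):
    """Return the whole file as one string, decoded with the given encoding.

    The "utf-8" default is an assumption; pass another encoding
    (for example "gbk") if the file was saved differently.
    """
    # The with-block closes the file even if reading raises an error.
    with open(path, encoding=encoding) as f:
        return f.read()
```

In the notebook this would be called as text = read_text('Dream.txt').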
Word Segmentation
Between the second and third lines of the code above, insert the following code to segment the text:
import jieba  # import the jieba segmenter
text = ' '.join(jieba.cut(text))  # segment the Chinese text and join the words with spaces
If you see output like the following picture, the segmentation was successful.
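The segmentation step can also be wrapped in a small function, sketched below. jieba.cut returns a generator of tokens, and joining them with spaces gives text in the same space-separated shape as the English article. The function name segment and its cut parameter are hypothetical additions for this sketch. The parameter only exists so the joining logic can be tried without jieba installed.

```python
def segment(text, cut=None):
    """Join the tokens produced by `cut` with single spaces.

    `cut` is a hypothetical hook for this sketch. When it is omitted,
    jieba.cut is used, as in the article.
    """
    if cut is None:
        import jieba  # third-party package: pip install jieba
        cut = jieba.cut
    # jieba.cut yields tokens lazily; join() consumes the generator once.
    return " ".join(cut(text))
```

In the notebook, text = segment(text) does the same job as the one-line version above.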
Word Cloud Generation
Comment out the final text line of the code so that its output does not get in the way. Then continue typing in the editor:
from wordcloud import WordCloud
wordcloud = WordCloud().generate(text)
At this point, if there is no error, there is also no output. Does that mean the word cloud analysis is complete?
Not quite. This time is not the same as the English case: because we want to render a Chinese word cloud, we need to
prepare the simsun.ttf font file, put it in the same directory as the code, and then enter the following code in the code editor:
from wordcloud import WordCloud
wordcloud = WordCloud(font_path="simsun.ttf").generate(text)  # font_path points at the Chinese font file
There is still no output, but we are not far from success.
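A misspelled or missing font file is an easy mistake at this step, so it can help to check the path before building the word cloud. The sketch below uses only the standard library. The helper name require_font is hypothetical and is not part of the wordcloud package.

```python
import os

def require_font(font_path):
    """Return the absolute path of font_path, or raise if the file is missing."""
    if not os.path.isfile(font_path):
        raise FileNotFoundError(
            f"Font file not found: {font_path!r}. "
            "Copy a Chinese-capable .ttf file into the notebook's directory."
        )
    return os.path.abspath(font_path)
```

It could be used as WordCloud(font_path=require_font("simsun.ttf")). Note that file names are case-sensitive on Ubuntu, so the name passed here has to match the file on disk exactly.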
Word Cloud Output
Enter the following code in the Code Editor:
%pylab inline
import matplotlib.pyplot as plt
plt.imshow(wordcloud, interpolation='bilinear')  # draw the word cloud image
plt.axis("off")  # hide the axes
You will see the following result. Please disregard the warning.
A simple Chinese word cloud is done ...
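As a rough sanity check on the picture, you can count the most frequent tokens in the segmented text yourself; the largest words in the cloud should broadly match the top of that list. This is a sketch with collections.Counter, not wordcloud's own algorithm. The function name top_words and the min_length filter are my assumptions. As far as I know, wordcloud's default tokenizer also skips single-character tokens, and it applies its own stopword handling, so the counts will not match exactly.

```python
from collections import Counter

def top_words(segmented_text, n=10, min_length=2):
    """Return the n most common space-separated tokens as (token, count) pairs.

    Tokens shorter than min_length are dropped, which removes
    single-character words and most punctuation marks.
    """
    tokens = segmented_text.split()
    counts = Counter(t for t in tokens if len(t) >= min_length)
    return counts.most_common(n)
```

In the notebook, top_words(text) would be run on the text produced by the segmentation step.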