How do you feel after seeing these? Would you like to make something like them yourself?
If your answer is yes, let's not delay: today we will make a word cloud step by step, from scratch. Of course, a basic word cloud is certainly not as cool as those two infographics. But that doesn't matter; a good start is half the battle. Once you've had a taste, you can keep upgrading your skills and set off on your own road to success.
There are many tutorials on the web that teach you how to make infographics. Many of them use special-purpose tools. These tools are good, convenient, and powerful, but their functions are too specialized and their range of application is limited. What we will do today is use a general-purpose programming language, Python, to make the word cloud.
Python is a popular programming language these days. You can use it not only for data analysis and visualization, but also to build websites, scrape data, solve math problems, and write scripts that do the tedious work for you ...
Do you know Douban? It was written in Python from the beginning.
In current rankings of programming language popularity, Python is ranked fourth (of course, many people disagree, which is why there are so many programming language leaderboards, you know). But we should look at the question with an eye to the future. With the growth of data science, Python looks set to take off. It's good to catch the wave early.
If you have no programming background, that's okay. "From scratch" means I'll teach you how to install the Python runtime environment and then walk through making the word cloud step by step. I hope you won't just read along, but try it yourself. By the final step, you will not only have made your first word cloud; it will also be your first useful piece of programming work.
Tempted? Let's get started.
Installation
First, we need to install the Python runtime environment.
If you're using macOS, Python is actually already installed on your system.
However, we want to use the functionality of many extension packages, so it's a good idea to install a Python distribution. You only need to install it once, and most of the functionality you will need later is already included, so you won't have to install new packages piecemeal every time you need a new feature.
There are many Python distributions; the one I recommend is Anaconda. After more than 4 years of trying and comparing, I find that this package is more convenient to install, and that its coverage of extension packages and its structure are more sensible.
Please download the Anaconda package from this website. Scroll down the page to find the download section. Choose the appropriate version for your operating system.
Because my system is macOS, the site recommends the macOS version to me directly. If you're using Windows or Linux, switch to the appropriate tab.
Whichever operating system you use, note the two buttons on the right, which correspond to the Python 2.x and 3.x versions. Some of you must be puzzled: since there is a new version, why would I use the old one?
It's not that simple. Until 2020, the two versions of Python will coexist. The Python developers really want you to upgrade to 3.x. Unfortunately, at present 3.x is compatible with fewer extension packages than 2.x, especially when it comes to data science packages. So if you are a beginner, I suggest you download the 2.x (currently 2.7) version, so that you run into fewer problems later on. Once you are proficient, it won't be too late to move to 3.x. Trust me, by then you will adapt to the new version quickly.
After the download finishes, just run the installer.
Installation time varies with the speed of your computer. Be patient and simply wait.
Once the installation is complete, please install a "modern" browser. If you are using macOS, the Safari that comes with the system is very good. Other options include Firefox and Google Chrome.
Please install one of the above browsers and set it as the default browser for your system.
OK, now switch to the command line.
On macOS and Linux, you need to open a terminal.
On Windows, open Start, then Accessories, then Command Prompt.
Type the following commands:
mkdir demo
cd demo
Well, you now have a dedicated working directory called demo. Open the macOS Finder or "My Computer" in Windows, find this directory, and open it.
Back in the terminal, macOS or Linux users should type the following command:
pip install wordcloud
macOS will prompt you to install the Xcode command-line tools; you can follow the default settings step by step. Note, however, that you should do this on WiFi. If you're on 4G mobile data, it will cost you a fortune.
If you are using Windows, getting the word cloud package to work is a little more trouble: you need to download the file wordcloud-1.3.1-cp27-cp27m-win32.whl from here. Once it is downloaded, drag it into your demo directory.
At the command line, first execute:
pip install wheel
Then, execute:
pip install wordcloud-1.3.1-cp27-cp27m-win32.whl
All right, the Python runtime environment we need is finally all set up.
Be sure to follow the steps above and make sure each one completes successfully. If any step is skipped, the program will raise errors when it runs.
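One way to confirm that the steps above succeeded is to ask Python itself. The following is a minimal sketch, not part of the original walkthrough: the helper name check_packages is made up for illustration, and it assumes that matplotlib came with Anaconda and that wordcloud was installed with pip as described. It only tries to import each package and reports the outcome.

```python
import importlib
import sys


def check_packages(names):
    """Return a dict mapping each package name to True if it can be imported."""
    status = {}
    for name in names:
        try:
            importlib.import_module(name)
            status[name] = True
        except ImportError:
            status[name] = False
    return status


if __name__ == "__main__":
    # sys.version starts with the version number, e.g. "2.7.x" or "3.x.y".
    print("Python version: %s" % sys.version.split()[0])
    # The package list below is an assumption based on the steps in this tutorial.
    for name, ok in sorted(check_packages(["wordcloud", "matplotlib"]).items()):
        print("%s: %s" % (name, "OK" if ok else "MISSING"))
```

You can save this as a .py file in the demo directory and run it with python, or paste it into a notebook cell later on. A MISSING line suggests that the corresponding installation step should be repeated.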
Data
The object of the word cloud analysis is text.
In theory, the text can be in any language: English, Chinese, French, Arabic ...
To keep things simple, we use English text as the example here. You can find an English article online to analyze. I am especially fond of the British TV series "Yes, Minister", so I looked up the show's introduction on Wikipedia.
I copied the body text and saved it as a text file called Yes-minister.txt.
Move this file into our working directory, demo.
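Before starting the notebook, it can help to confirm that the file really is where Python will look for it. This is a minimal sketch, assuming the file name Yes-minister.txt used above and that Python is started from the demo directory; describe_text_file is a hypothetical helper written for this illustration.

```python
import os


def describe_text_file(path):
    """Return (exists, size_in_bytes) for path; size is 0 when the file is missing."""
    if not os.path.isfile(path):
        return (False, 0)
    return (True, os.path.getsize(path))


# The file name is the one used in this tutorial; change it if yours differs.
exists, size = describe_text_file("Yes-minister.txt")
print("found: %s, size: %d bytes" % (exists, size))
print("working directory: %s" % os.getcwd())
```

If the file is reported as not found, compare the printed working directory with the folder you opened in Finder or "My Computer".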
Well, the text data is ready. Let's enter the magical world of programming!
Code
At the command line, execute:
jupyter notebook
The browser opens automatically and displays the following screen.
This is the result of the work we have just done, installing the runtime environment. We have not written any program yet, so the directory contains only the text file we just created.
Open the file and browse through the contents.
Go back to the main page of the Jupyter notebook. Click the New button to create a new notebook. Under Notebooks, select the Python 2 option.
We are prompted to enter a name for the notebook. You can name the code file anything you like, but I suggest choosing a meaningful name, which will make it easier to find later. Because we are trying out word clouds, let's just call it wordcloud.
Now we have a blank notebook to use. Enter the following 3 statements in the only code cell on the page. Be sure to type them exactly as in the sample code; the number of spaces must not differ. Pay particular attention to the third line, which starts with 4 spaces or 1 tab. After typing, press Shift+Enter to execute.
filename = "Yes-minister.txt"
with open(filename) as f:
    mytext = f.read()
Nothing is displayed.
That's right: because there is no output action here, the program just opens your Yes-minister.txt text file, reads its contents, and stores them in a variable called mytext.
Next we try to display the contents of mytext. After you enter the following statement, you still have to press Shift+Enter before the system actually executes it.
mytext
In the steps that follow, don't forget to confirm execution this way.
The result is as shown.
Well, it seems that the text stored in the mytext variable is the text we took from the web. So far, everything is fine.
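Text copied from Wikipedia often contains characters outside plain ASCII, such as curly quotes or accented letters. If the displayed text looks garbled, or reading the file raises a decoding error, a variant with an explicit encoding may help. This is a sketch under the assumption that the file was saved as UTF-8; read_text is an illustrative helper, not part of the original notebook.

```python
import io


def read_text(path, encoding="utf-8"):
    """Read a whole text file and return its contents as a Unicode string."""
    # io.open accepts an encoding argument on both Python 2.7 and 3.x.
    # The with-block closes the file even if reading fails.
    with io.open(path, encoding=encoding) as f:
        return f.read()


# Example use in the notebook (assumes the file from this tutorial exists):
# mytext = read_text("Yes-minister.txt")
```

The with statement is the same pattern as in the three statements above: the indented line runs while the file is open, and the file is closed automatically afterwards.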
Next we import the word cloud package and use the text content stored in mytext to make the word cloud.
from wordcloud import WordCloud
wordcloud = WordCloud().generate(mytext)
The program may then raise a warning. Don't worry: a warning does not affect the normal operation of the program.
The word cloud analysis is now complete. That's right, the core step of making a word cloud takes only these 2 statements, and the first one just calls in outside help from the extension package. But the program doesn't show us anything.
Where is the word cloud? After all this fuss there is nothing to see. Are you having me on?!
Take it easy. After you enter the following 4 statements, it is time to witness the miracle.
%pylab inline
import matplotlib.pyplot as plt
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis("off")
Result:
Don't get too excited.
You can right-click the word cloud image and use the "Save Picture As" function to export it.
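The image can also be written to disk from code instead of through the browser menu. The sketch below relies on the to_file method of the WordCloud object; the function name save_word_cloud, the output file name, and the width, height, and background_color values are illustrative choices, not settings from the original notebook.

```python
def save_word_cloud(text, out_path):
    """Build a word cloud from text and save it as an image file.

    Returns True on success, or False if the wordcloud package is not installed.
    """
    try:
        from wordcloud import WordCloud
    except ImportError:
        return False
    # Size and background color are example values chosen for this sketch.
    cloud = WordCloud(width=800, height=400, background_color="white").generate(text)
    cloud.to_file(out_path)  # the image format follows the file extension
    return True


# Example use (the file name is an assumption):
# save_word_cloud(mytext, "wordcloud.png")
```

If you already hold a generated WordCloud object in the notebook, calling its to_file method directly should be enough.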
Through this word cloud, we can see how the frequencies of different words and phrases differ. High-frequency words are noticeably larger, and their colors are eye-catching. Note that the most conspicuous word, Hacker, does not mean a computer hacker; it is the name of one of the protagonists of the series, the minister Jim Hacker.
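To see the numbers behind the picture, the frequencies can be counted directly. The following is a rough sketch of the idea only, not the wordcloud package's actual implementation: it lowercases the text, splits it into words, drops a few common stop words, and counts what is left. The STOP_WORDS list and the top_words helper are assumptions made for this illustration.

```python
import re
from collections import Counter

# A tiny illustrative stop-word list (an assumption for this sketch;
# the wordcloud package ships its own, much larger list).
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "was", "by"}


def top_words(text, n=5):
    """Return the n most frequent words in text, ignoring case and stop words."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    return counts.most_common(n)


# Example use in the notebook:
# top_words(mytext, 10)
```

The words at the top of the returned list should roughly correspond to the largest words in the cloud, although the package's own tokenization and stop-word handling may give slightly different counts.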
I have also shared the IPYNB file containing the full code of the program; you can download it from here.
I hope everything went smoothly for you. Are you satisfied with the word cloud you generated? If not, that's fine; you can explore the other advanced features of the wordcloud package. See if you can make a word cloud like this one. If you run into problems while learning, or want learning resources, message me privately.