Generating a word cloud with Python


This article comes from Vander's blog on CSDN. If you reprint it, please credit the source and respect the original work. Thank you. Blog address: http://blog.csdn.net/l540675759/article/details/61236376

What is a word cloud?

A word cloud, also called a text cloud, visually emphasizes the keywords that appear most frequently in a body of text. The rendered keywords form a colored, cloud-like picture, so the main message of the text can be grasped at a glance.

There are also many ready-made tools for making word clouds:
1. Wordle is a tool for generating word cloud images from text.
2. Tagxedo makes personalized word clouds online.
3. Tagul is a web service that can also create attractive word clouds.
4. TagCrowd can also take a web URL and generate a word cloud for that page directly.

In essence, a word cloud is a count of word frequencies in the text, with each word displayed at a size proportional to its frequency.

How a word cloud is produced
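As a rough illustration of the idea that size follows frequency, the sketch below counts words with the standard library and scales a font size linearly with relative frequency. The helper name font_sizes, the stop-word set, and the size range are assumptions made for this example only; wordcloud applies its own scaling rules.

```python
from collections import Counter

# Hypothetical stop-word list, used for this example only.
STOP_WORDS = {"the", "a", "is", "of"}


def font_sizes(text, min_size=10, max_size=60):
    """Map each word to a font size proportional to its frequency."""
    words = [w for w in text.lower().split() if w not in STOP_WORDS]
    counts = Counter(words)
    if not counts:
        return {}
    top = max(counts.values())
    # The most frequent word gets max_size; the rest scale down linearly,
    # but never below min_size.
    return {
        word: max(min_size, round(max_size * n / top))
        for word, n in counts.items()
    }


sizes = font_sizes("the cloud is a cloud of word word word data")
print(sizes)
```

Here "word" appears three times, "cloud" twice, and "data" once, so their sizes come out in a 3:2:1 ratio.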

1. Segment the text data into words. This is also the first step in much of NLP text processing. In wordcloud, the process_text() method mainly handles stop words.

2. Count how often each word appears in the text and build a hash table. Frequency counting is the same as WordCount, the first example on most distributed computing platforms, where it has the same status as the Hello World program has in programming languages.

3. Generate a picture layout in proportion to the word frequencies. The IntegralOccupancyMap class implements the layout algorithm and is the core of the word cloud visualization.

4. Draw each word into the layout according to its frequency. The core method is generate_from_frequencies; both generate() and generate_from_text() end up calling generate_from_frequencies.

5. Color the words in the cloud. By default the colors are random.

The 6C principles behind the word cloud

* Connect: Select data from a variety of data sources, each of which comes with its own APIs, input formats, data acquisition rates, and provider constraints.
* Correct: Focus on transforming data for further processing while maintaining data quality and consistency.
* Collect: Decide where data is stored and in what format, to ease assembly and consumption in later stages.
* Compose: Focus on how to mix the collected data sets and enrich the information to build a compelling data-driven product.
* Consume: Focus on how the data is used and rendered, and on how to deliver the right data at the right time to achieve the right results.
* Control: This sixth, additional step becomes necessary as the data, the organization, and the number of participants grow; it ensures that the data stays under control.

The 10 lines of code in this article build a word cloud along these lines. The text was not fetched through an API from the public account (wireless_com); simplification and abstraction are typical engineering practice, so here it was simply copied and pasted, and the correct step was skipped altogether. The data is stored directly in a plain text file, segmented with jieba (compose), and rendered as a word cloud image for consumption (consume). Organizing the generated clouds into separate file directories for easy retrieval counts as a rudimentary form of control.

Word segmentation library:
https://github.com/fxsjy/jieba
Word cloud library:
https://github.com/amueller/word_cloud

First install wordcloud and jieba:

pip install wordcloud
pip install jieba

Core Python code

import matplotlib.pyplot as plt
from wordcloud import WordCloud
import jieba

text_from_file_with_apath = open("/Users/vander/Desktop/dada", encoding="UTF-8").read()

wordlist_after_jieba = jieba.cut(text_from_file_with_apath, cut_all=True)  # full-mode segmentation

wl_space_split = " ".join(wordlist_after_jieba)  # words separated by spaces

my_wordcloud = WordCloud(font_path="/Library/Fonts/Songti.ttc").generate(wl_space_split)  # font with Chinese glyphs

plt.imshow(my_wordcloud)
plt.axis("off")
plt.show()

Explanation:
Lines 1-3 import the plotting library, the word cloud library, and jieba.

Line 4 reads the text from a local file.

Lines 5-6 segment the text with jieba and join the resulting words with spaces.

Line 7 generates the word cloud from the segmented text.

Lines 8-10 display the word cloud with pyplot.
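
To see why the segmentation and space-joining in lines 5-6 matter, consider how a regex-based tokenizer sees Chinese text. The sketch below assumes a word pattern of \w[\w']+ (one word character followed by at least one more), which is taken here to resemble the default in wordcloud; check the library source for the exact pattern. The sample sentence and its hand-written segmentation are illustrative and are not actual jieba output.

```python
import re

# Assumed tokenizer pattern: one word character followed by at least one more.
WORD_PATTERN = re.compile(r"\w[\w']+")

raw = "我爱自然语言处理"
# Hand-written segmentation standing in for jieba output.
segmented = ["我", "爱", "自然", "语言", "处理"]

# Without spaces, the whole sentence is one unbroken run of word characters.
tokens_raw = WORD_PATTERN.findall(raw)

# Joining with a space gives the pattern boundaries to split on.
tokens_joined = WORD_PATTERN.findall(" ".join(segmented))

print(tokens_raw)
print(tokens_joined)
```

Under this pattern, the unsegmented sentence comes back as a single token, while the space-joined version yields separate words. Single-character tokens are dropped because the pattern requires at least two characters, and joining with an empty string would simply hand the unsegmented sentence back to the tokenizer.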

Thanks

This article is a reorganized write-up of the material Teacher Cao shared in the CSDN Python class.
