The annual lovers' holiday that torments us singles has just passed, and WeChat Moments was full of showing off: showing off gifts, showing off food, showing off affection. What do programmers show off? Working overtime. Still, a gift is indispensable, so what to give? As a programmer, I prepared a special one: a "heart" generated from past Weibo data. I figured she would be moved to tears. Haha.
Preparatory work
With the idea settled, it was time to act. Python was naturally the first choice. The general plan: crawl the Weibo data, clean and process it, run word segmentation, then hand the processed text to a word cloud tool and use scientific computing and plotting tools to produce the image. The toolkits involved:
requests for the network requests that crawl the Weibo data, jieba for Chinese word segmentation, wordcloud for word cloud generation, Pillow for image processing, NumPy for scientific computing, and Matplotlib, a MATLAB-like 2D plotting library.
Tool installation
Installing these toolkits may raise different errors on different platforms. wordcloud, requests, and jieba can be installed online with the usual pip commands:
pip install wordcloud
pip install requests
pip install jieba
Installing Pillow, numpy, and matplotlib on Windows directly with pip online can cause all sorts of problems. One recommended approach is to download prebuilt .whl files from the third-party site "Python Extension Packages for Windows" and install those. Choose according to your environment: cp27 corresponds to Python 2.7, amd64 to a 64-bit system. After downloading locally, install:
pip install Pillow-4.0.0-cp27-cp27m-win_amd64.whl
pip install scipy-0.18.0-cp27-cp27m-win_amd64.whl
pip install numpy-1.11.3+mkl-cp27-cp27m-win_amd64.whl
pip install matplotlib-1.5.3-cp27-cp27m-win_amd64.whl
On other platforms, errors can usually be resolved with a quick Google search. Alternatively, develop directly on Anaconda, a Python distribution with many scientific computing and machine learning modules built in.
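For reference, a minimal sketch of the Anaconda route (assuming the conda tool is installed and on the PATH): the scientific packages ship with the distribution, and the rest install via pip:

conda install numpy scipy matplotlib pillow
pip install wordcloud requests jieba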
Get Data
The official Sina Weibo API is hopeless: it only returns a user's 5 most recent posts. So, settling for second best, I turned to a crawler. Before crawling, I assessed the difficulty and checked whether someone had already written a good one. A stroll around GitHub turned up nothing that met the need, but it gave me some ideas, so I decided to write my own crawler, targeting the mobile site http://m.weibo.cn. It turns out the endpoint http://m.weibo.cn/index/my?format=cards&page=1 returns Weibo data page by page, in JSON format, which makes things much easier. The endpoint does require the login cookie, though: after logging in to your account, you can find the cookie information in your Chrome browser.
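The article doesn't show how the cookie is wired up. As a minimal sketch (the key names below are placeholders; use whatever fields your logged-in session actually contains, copied from Chrome's developer tools):

import requests

# Placeholder cookie fields captured from Chrome after logging in to m.weibo.cn;
# your session will contain different names and values.
cookies = {
    "SUB": "<value-from-chrome>",
    "SUHB": "<value-from-chrome>",
}

resp = requests.get("http://m.weibo.cn/index/my?format=cards&page=1",
                    cookies=cookies)
print(resp.status_code)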
Implementation code:
# -*- coding: utf-8 -*-
import requests

def fetch_weibo():
    api = "http://m.weibo.cn/index/my?format=cards&page=%s"
    for i in range(1, 102):  # 101 pages in total
        # cookies is the dict captured from the logged-in browser session
        response = requests.get(url=api % i, cookies=cookies)
        data = response.json()[0]
        groups = data.get("card_group") or []
        for group in groups:
            text = group.get("mblog").get("text")
            text = text.encode("utf-8")
            text = cleanring(text).strip()  # strip punctuation, HTML tags, repost boilerplate
            yield text
Checking Weibo showed 101 pages in total. Since returning everything as one list would use too much memory, the function uses yield to return a generator. The text is also cleaned along the way: punctuation, HTML tags, and boilerplate words such as "转发微博" ("repost") are removed.
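The cleanring function itself isn't shown in the article. A minimal sketch of such a cleanup step (the regexes here are my assumption) could look like this:

# -*- coding: utf-8 -*-
import re

def cleanring(text):
    # Strip HTML tags that Weibo embeds in the text (links, emoticon spans).
    text = re.sub(r"<[^>]+>", "", text)
    # Drop the boilerplate word added to reposts.
    text = text.replace("转发微博", "")
    # Collapse any runs of whitespace left behind.
    return re.sub(r"\s+", " ", text)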
Save data
Once the data is fetched, save it offline for reuse so it doesn't have to be crawled again. I use CSV format, saved to the file weibo.csv. The file may look garbled when opened (in Excel, for example); that's fine, it displays correctly in Notepad++.
import codecs
import csv

def write_csv(texts):
    with codecs.open('weibo.csv', 'w') as f:
        writer = csv.DictWriter(f, fieldnames=["text"])
        writer.writeheader()
        for text in texts:
            writer.writerow({"text": text})

def read_csv():
    with codecs.open('weibo.csv', 'r') as f:
        reader = csv.DictReader(f)
        for row in reader:
            yield row['text']
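Chaining the two steps together, the crawl only needs to run once. A sketch of the glue code, assuming fetch_weibo and write_csv as defined above:

if __name__ == '__main__':
    # Crawl all pages once and persist them for later runs.
    write_csv(fetch_weibo())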
Word segmentation
Each weibo read back from weibo.csv is segmented and then handed to wordcloud to generate the word cloud. jieba handles most Chinese segmentation scenarios well, and the stop-word list stopwords.txt filters out useless words (for example: "then", "because", and so on).
import jieba.analyse

def word_segment(texts):
    jieba.analyse.set_stop_words("stopwords.txt")
    for text in texts:
        # Keep the 20 highest-weighted keywords of each weibo.
        tags = jieba.analyse.extract_tags(text, topK=20)
        yield " ".join(tags)
Create a picture
After segmentation, the data can be handed to wordcloud, which sizes each keyword's font according to how often the word appears, that is, its weight. By default this produces a plain square image, which has no beauty to it; this is, after all, something to be shown off. So instead we use an artistic image as a mask template and trace out a better-looking picture: wordcloud leaves the white background of the mask blank and draws the words only inside the dark shape. I found a "heart" pattern online:
The image generation code:
import matplotlib.pyplot as plt
from scipy.misc import imread
from wordcloud import WordCloud

def generate_img(texts):
    data = " ".join(text for text in texts)
    mask_img = imread('./heart-mask.jpg', flatten=True)  # grayscale mask image
    wordcloud = WordCloud(
        font_path='msyh.ttc',          # a Chinese font, otherwise CJK text renders as boxes
        background_color='white',
        mask=mask_img
    ).generate(data)
    plt.imshow(wordcloud)
    plt.axis('off')
    plt.savefig('./heart.jpg', dpi=600)
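Putting the offline half of the pipeline together, a sketch of the driver code, assuming the functions defined above:

if __name__ == '__main__':
    texts = read_csv()            # weibo texts saved earlier in weibo.csv
    tags = word_segment(texts)    # space-joined keywords per weibo
    generate_img(tags)            # render the heart-shaped word cloud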
Note that you need to specify a Chinese font for matplotlib, otherwise the Chinese text comes out garbled. Find the font in the fonts folder C:\Windows\Fonts (Microsoft YaHei UI) and copy it into matplotlib's font directory, e.g. C:\Python27\Lib\site-packages\matplotlib\mpl-data\fonts\ttf.
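An alternative to copying the font file, if you'd rather not touch the matplotlib install directory, is to point matplotlib at a Chinese font in code. A sketch:

import matplotlib.pyplot as plt

# Tell matplotlib to fall back to Microsoft YaHei for CJK glyphs.
plt.rcParams['font.sans-serif'] = ['Microsoft YaHei']
plt.rcParams['axes.unicode_minus'] = False  # keep the minus sign renderable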
That's almost it.
When I proudly sent the picture to her, this conversation followed:
Her: What is it?
Me: A heart. I made it with my own hands.
Her: So professional, I'm so touched. But your eyes hold only Python, not me (laughs).
Me: There is Python inside the heart.
I seem to have said something wrong. Hahaha.
The full code can be downloaded by replying "H" in the public account.
This article was first published on the WeChat public account "a programmer's micro-station" (id: vttalk), which shares substantive Python content with warmth.
Blog Address: https://foofish.net/python-heart.html
Create a "heart" with Python based on Weibo data