Create a "heart" with Python based on Weibo data

Source: Internet
Author: User
Tags: image processing, wordcloud, matplotlib

Valentine's Day has just passed, and my WeChat Moments were full of friends showing off gifts, dinners, and declarations of love. What do programmers show off? Programmers show off working overtime. Still, a gift is indispensable, so what should I send? As a programmer, I prepared a special gift: a "heart" built from past Weibo data. I figured she would be moved to tears. Ha ha.

Preparatory work

With the idea settled, I started to act. Python was naturally the first choice. The general plan: crawl the Weibo data, clean it, segment it into words, hand the processed words to a word-cloud tool, and generate the final image with scientific-computing and plotting tools. The toolkits involved are:

Requests for making HTTP requests to crawl the Weibo data; jieba for Chinese word segmentation; wordcloud for word-cloud generation; Pillow for image processing; NumPy for scientific computing; and Matplotlib, a MATLAB-like 2D plotting library.

Tool installation

Installing these toolkits can fail in different ways on different platforms. wordcloud, requests, and jieba install cleanly with the usual pip method:

pip install wordcloud
pip install requests
pip install jieba

On Windows, installing Pillow, numpy, and matplotlib directly with pip can cause various problems. One recommended workaround is to download prebuilt .whl files from the third-party site "Python Extension Packages for Windows" and install those. Choose the files matching your environment: cp27 corresponds to Python 2.7, amd64 to a 64-bit system. After downloading, install locally:

pip install Pillow-4.0.0-cp27-cp27m-win_amd64.whl
pip install scipy-0.18.0-cp27-cp27m-win_amd64.whl
pip install numpy-1.11.3+mkl-cp27-cp27m-win_amd64.whl
pip install matplotlib-1.5.3-cp27-cp27m-win_amd64.whl

On other platforms, errors can usually be resolved by searching for the message. Alternatively, develop directly on Anaconda, a Python distribution with many scientific-computing and machine-learning modules built in.

Get Data

Sina Weibo's official API is poor: it only returns a user's 5 most recent posts. So, plan B: crawl the data myself. Before crawling I assessed the difficulty and browsed GitHub to see whether someone had already written what I needed. Nothing quite fit, but it gave me some ideas, so I decided to write my own crawler against the mobile site http://m.weibo.cn/. It turns out the endpoint http://m.weibo.cn/index/my?format=cards&page=1 returns Weibo data page by page in JSON format, which makes things much easier. The endpoint does require login cookies, though; after logging in to your account, you can copy the cookie values from the Chrome browser.
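As an aside, the raw Cookie header copied from Chrome can be turned into the dict that requests expects via its cookies= parameter. A minimal sketch with placeholder values (the real cookie names and values come from your own logged-in session):

```python
# Hypothetical raw "Cookie:" header copied from Chrome's developer
# tools; the names and values below are placeholders, not real ones.
raw_cookie = "SUB=abc123; SUHB=def456; _T_WM=789"

# Split the "name=value" pairs into the dict accepted by
# requests.get(..., cookies=cookies)
cookies = dict(pair.split("=", 1) for pair in raw_cookie.split("; "))
```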

Implementation code:

def fetch_weibo():
    api = "http://m.weibo.cn/index/my?format=cards&page=%s"
    for i in range(1, 102):
        response = requests.get(url=api % i, cookies=cookies)
        data = response.json()[0]
        groups = data.get("card_group") or []
        for group in groups:
            text = group.get("mblog").get("text")
            text = text.encode("utf-8")
            text = cleanring(text).strip()
            yield text

Weibo shows 101 pages in total. Since returning everything as one list would use too much memory, the function uses yield to return a generator. The text is also cleaned along the way: punctuation, HTML tags, and boilerplate such as "forwarded Weibo" are removed.
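The cleaning function itself isn't shown in the snippet above; here is a rough sketch of what such a function might look like (the regex patterns and the repost phrase are my assumptions, not the author's exact code):

```python
import re

def clean_text(text):
    # Strip HTML tags such as <a ...>...</a>
    text = re.sub(r"<[^>]+>", "", text)
    # Drop bare URLs
    text = re.sub(r"https?://\S+", "", text)
    # Remove the boilerplate repost phrase (Chinese for "forward Weibo")
    text = text.replace(u"\u8f6c\u53d1\u5fae\u535a", "")
    # Drop punctuation, keep word characters and whitespace
    text = re.sub(r"[^\w\s]", "", text, flags=re.UNICODE)
    return text.strip()
```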

Save data

After the data is fetched, we save it offline so it can be reused later without repeated crawling. Save it to the file weibo.csv in CSV format for the next step. The CSV file may look garbled when opened in Excel; that's fine, it displays correctly in Notepad++.

def write_csv(texts):
    with codecs.open('weibo.csv', 'w') as f:
        writer = csv.DictWriter(f, fieldnames=["text"])
        writer.writeheader()
        for text in texts:
            writer.writerow({"text": text})

def read_csv():
    with codecs.open('weibo.csv', 'r') as f:
        reader = csv.DictReader(f)
        for row in reader:
            yield row['text']
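The garbled display mentioned earlier comes from Excel not detecting the UTF-8 encoding. Writing with a byte-order mark avoids it; here is a sketch in Python 3 style (the original runs on Python 2.7, so treat this as an adaptation, not the author's code):

```python
import csv
import io

def write_csv_bom(texts, path="weibo.csv"):
    # "utf-8-sig" prepends a BOM so Excel detects the encoding
    with io.open(path, "w", encoding="utf-8-sig", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["text"])
        for text in texts:
            writer.writerow([text])
```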

Word segmentation

Each Weibo post read from weibo.csv is segmented and then handed to wordcloud to generate the word cloud. jieba's segmentation works well for most Chinese text. A stop-word list, stopwords.txt, filters out words that carry no information (for example: "then", "because", and similar function words).

def word_segment(texts):
    jieba.analyse.set_stop_words("stopwords.txt")
    for text in texts:
        tags = jieba.analyse.extract_tags(text, topK=20)
        yield " ".join(tags)

Create a picture

After segmentation, the data can be handed to wordcloud, which sizes each keyword's font according to how frequently the word appears in the data. The first attempt produced a plain square image.
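To see concretely what "font size by frequency" means, here is a small illustration using the standard library (wordcloud can also consume such a word-to-count mapping directly through its generate_from_frequencies method):

```python
from collections import Counter

# Count word occurrences; higher counts get larger fonts in the cloud
words = "python love code love python love".split()
freq = Counter(words)
top = freq.most_common(2)  # [('love', 3), ('python', 2)]
```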

But the resulting picture isn't much to look at, and after all this is meant to be shown off. So let's use a more artistic image as a mask template and generate a prettier picture. I found a "heart" pattern online:

Generate Picture code:

def generate_img(texts):
    data = " ".join(text for text in texts)
    # imread comes from scipy.misc; flatten=True loads as grayscale
    mask_img = imread('./heart-mask.jpg', flatten=True)
    wordcloud = WordCloud(
        font_path='msyh.ttc',
        background_color='white',
        mask=mask_img
    ).generate(data)
    plt.imshow(wordcloud)
    plt.axis('off')
    plt.savefig('./heart.jpg', dpi=600)

Note that you need to give matplotlib a Chinese font when rendering, otherwise the text comes out garbled. Find the font in the fonts folder C:\Windows\Fonts (Microsoft YaHei UI) and copy the font file into matplotlib's installation directory: C:\Python27\Lib\site-packages\matplotlib\mpl-data\fonts\ttf.
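Instead of copying font files into matplotlib's install directory, you can usually just point matplotlib at an installed Chinese-capable font via rcParams (the font name below assumes Microsoft YaHei is installed on your system):

```python
import matplotlib

# Prefer a Chinese-capable font for all text rendering
matplotlib.rcParams["font.sans-serif"] = ["Microsoft YaHei"]
# Render the minus sign correctly when using a CJK font
matplotlib.rcParams["axes.unicode_minus"] = False
```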

That's almost it.

When I was proud to send this picture to her, there was this conversation:

Her: What is it?
Me: A heart, made with my own hands.
Her: So professional, I'm so moved. But your eyes only have Python in them, not me (laughs).
Me: There's Python in the heart.

I seem to have said something wrong, hahaha.

The full code can be downloaded by replying "H" to the public account.

This article was first published on the public account "a programmer's micro-station" (id: vttalk), which shares practical Python content with a human touch.
Blog Address: https://foofish.net/python-heart.html
