Crawling QQ Zone posts (shuoshuo) with Python and generating a word cloud


[Figure: the generated word cloud]

My environment: macOS, Anaconda, and Python 2.7, plus a number of Python libraries. Let's start with Anaconda.

Anaconda is a Python distribution for scientific computing. It supports Linux, macOS, and Windows and ships with the commonly used scientific computing packages. It solves two big pain points of the official Python distribution. First, it provides package management, which fixes the frequent failures when installing third-party packages on Windows. Second, it provides environment management, similar to virtualenv, which handles running multiple Python versions side by side and switching between them.

Conda is Anaconda's package and environment management tool; functionally it is similar to pip and virtualenv combined. After a successful installation, conda is added to the PATH environment variable by default, so you can run conda commands directly in a command-line window.

Conda's environment management works in basically the same way as virtualenv.

```shell
# View help
conda -h
# Create an environment named python36 based on Python 3.6
conda create --name python36 python=3.6
# Activate this environment (Linux/Mac; newer conda versions use: conda activate python36)
source activate python36
# Check the Python version again; it now shows 3.6
python -V
# Exit the current environment (newer conda versions use: conda deactivate)
source deactivate
# Delete the environment
conda remove -n python36 --all
# List all installed environments
conda info -e
```
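
As a quick cross-check from inside Python (similar in spirit to python -V above), the sketch below reports the running interpreter's version, its prefix directory, and the name of the active conda environment. It relies on the CONDA_DEFAULT_ENV variable that conda's activate scripts export; describe_environment is a hypothetical helper, not part of conda.

```python
import os
import sys


def describe_environment():
    """Hypothetical helper: report the running interpreter and the active conda env."""
    return {
        # e.g. "3.6.x" when running inside the python36 environment
        "python_version": "{}.{}.{}".format(*sys.version_info[:3]),
        # Root directory of the environment this interpreter belongs to
        "prefix": sys.prefix,
        # Assumption: conda's activate scripts export CONDA_DEFAULT_ENV; None otherwise
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print("{}: {}".format(key, value))
```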

Conda's package management does the same job as pip; of course, installing packages with pip instead is no problem either.

```shell
# Install matplotlib
conda install matplotlib
# List installed packages
conda list
# Update a package
conda update matplotlib
# Remove a package
conda remove matplotlib
```
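
Before running the crawler, it can help to check which of the libraries used later in this article are importable in the active environment. This is a minimal Python 3 sketch: missing_packages is a hypothetical helper, and the list of import names is simply the set of libraries that appear in this article's code.

```python
import importlib.util


def missing_packages(module_names):
    """Hypothetical helper: return the module names that cannot be imported here."""
    # find_spec returns None when no installed module matches the name
    return [name for name in module_names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Import names of the libraries used in this article (assumed checklist)
    needed = ["selenium", "lxml", "jieba", "wordcloud", "matplotlib"]
    for name in missing_packages(needed):
        print("missing: {}".format(name))
```

Anything reported as missing can then be installed with conda install or pip install as described above.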

Everything in conda is a package: conda itself, the Python interpreter, and Anaconda can all be treated as packages, so besides ordinary third-party packages, these three can be updated the same way (for example, conda update conda). Anaconda's default mirror is hosted outside China, so installing packages with conda from within China can be very slow; one domestic mirror available at the time of writing is Tsinghua University's. To use it, edit ~/.condarc (Linux/Mac):

```yaml
channels:
  - https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free/
  - defaults
show_channel_urls: true
```

If installing packages with conda is still very slow, consider installing them with pip instead, and likewise point pip at a domestic mirror; the Douban mirror is faster. Edit ~/.pip/pip.conf (Linux/Mac):

```ini
[global]
trusted-host = pypi.douban.com
index-url = http://pypi.douban.com/simple
```
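
If you would rather generate the pip configuration than type it, Python's standard configparser module can render the same [global] section. This is a minimal Python 3 sketch; pip_conf_text is a hypothetical helper, and it only prints the text so you can review it before saving it as ~/.pip/pip.conf.

```python
import configparser
import io


def pip_conf_text(index_url, trusted_host):
    """Hypothetical helper: render a pip.conf [global] section for a mirror index."""
    config = configparser.ConfigParser()
    config["global"] = {
        "index-url": index_url,
        # trusted-host lets pip use this host even though the index is plain HTTP
        "trusted-host": trusted_host,
    }
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


if __name__ == "__main__":
    print(pip_conf_text("http://pypi.douban.com/simple", "pypi.douban.com"))
```

Redirecting this output into ~/.pip/pip.conf replaces any existing file, so merge by hand if you already have one.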

Once the environment is set up, you can get down to the data analysis. Crawling dynamic content works like this: because the page loads its content dynamically, we need to keep scrolling down so that more content loads. Then switch to the frame that holds the content (the content may also not be inside a frame at all, so check the specific page), get the page source, parse it with XPath, and read out the posts.
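
Scrolling a fixed number of times can either stop before all posts have loaded or keep waiting after they have. A common alternative, sketched here under the assumption that the page grows taller as new posts load, is to scroll to the bottom until document.body.scrollHeight stops changing. This is not the approach used in the author's code; scroll_until_loaded and its parameters are hypothetical names, and driver is assumed to be a Selenium WebDriver (any object with an execute_script method works).

```python
import time


def scroll_until_loaded(driver, pause=4.0, max_rounds=20):
    """Hypothetical helper: scroll to the bottom until the page height stops growing.

    Assumes `driver` is a Selenium WebDriver. Returns the number of scrolls made.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    rounds = 0
    while rounds < max_rounds:
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        rounds += 1
        time.sleep(pause)  # give the page time to fetch and render more posts
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break  # the height did not change, so nothing new was loaded
        last_height = new_height
    return rounds
```

The max_rounds cap keeps the loop from running forever on a feed that never stops growing.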

```python
import time

from lxml import etree

# `driver` is a Selenium WebDriver that has already logged in and opened the
# QQ Zone posts page (setup not shown here)

# Scroll down so that the browser loads the dynamically loaded content.
# Here i runs from 1 to 5, i.e. 5 scrolls, each loading more posts
for i in range(1, 6):
    height = 20000 * i  # scroll down by 20000*i pixels (a larger step each time)
    str_word = "window.scrollBy(0," + str(height) + ")"
    driver.execute_script(str_word)
    time.sleep(4)  # wait for the newly loaded posts to render

# Web pages are often made up of several <frame> or <iframe> elements, and WebDriver
# starts in the outermost frame, so switch to the frame that holds the posts;
# otherwise the page elements below cannot be found
driver.switch_to.frame("app_canvas_frame")
selector = etree.HTML(driver.page_source)
divs = selector.xpath('//*[@id="msgList"]/li/div[3]')
```
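
The word cloud step reads its text from a file named qq_word.txt, so the post texts in divs have to be written to that file at some point. How the full project does this is not shown here; the following Python 3 sketch is one possibility. save_posts is a hypothetical helper that collapses whitespace, drops empty and duplicate posts, and writes one post per line in UTF-8.

```python
import io


def save_posts(texts, filename="qq_word.txt"):
    """Hypothetical helper: write cleaned, de-duplicated posts to a UTF-8 file.

    Returns the number of posts written.
    """
    seen = set()
    posts = []
    for text in texts:
        cleaned = " ".join(text.split())  # collapse newlines and runs of whitespace
        if cleaned and cleaned not in seen:
            seen.add(cleaned)
            posts.append(cleaned)
    with io.open(filename, "w", encoding="utf-8") as f:
        f.write("\n".join(posts))
    return len(posts)
```

With lxml elements, the text of each post can be collected with XPath's string() function, for example texts = [div.xpath("string(.)") for div in divs], and then passed to save_posts(texts).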

Generate Word Cloud

Generating the word cloud requires three libraries: wordcloud to generate the word cloud, matplotlib to display the word cloud image, and jieba to segment the Chinese text into words.

```python
# coding: utf-8
import io

import jieba
import matplotlib.pyplot as plt
from wordcloud import WordCloud


def create_word_cloud(filename):
    """Generate a word cloud from <filename>.txt and save it as py_book.png."""
    with io.open("{}.txt".format(filename), encoding="utf-8") as f:
        text = f.read()
    # Segment the Chinese text with jieba (full mode)
    wordlist = jieba.cut(text, cut_all=True)
    # WordCloud cannot segment Chinese itself, so join the words with spaces
    wl = " ".join(wordlist)
    # Configure the word cloud
    wc = WordCloud(
        # Background color
        background_color="white",
        # Maximum number of words to display
        max_words=2000,
        # A font with Chinese glyphs; this is the system font path on the author's Mac
        font_path="/System/Library/Fonts/PingFang.ttc",
        height=1200,
        width=1600,
        # Maximum font size
        max_font_size=100,
        # Seed for the random generator, so the layout and colors are reproducible
        random_state=30,
    )
    myword = wc.generate(wl)  # Generate the word cloud
    # Display the word cloud image
    plt.imshow(myword)
    plt.axis("off")
    plt.show()
    wc.to_file("py_book.png")  # Save the word cloud image to a file


if __name__ == "__main__":
    create_word_cloud("qq_word")
```
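
Full-mode segmentation tends to produce many single characters and filler words that crowd out the meaningful ones. One possible refinement, not part of the author's code, is to count the words yourself, drop stopwords and very short tokens, and pass the resulting dictionary to WordCloud.generate_from_frequencies instead of calling generate. The sketch below covers only the counting step; word_frequencies, its defaults, and any stopword list you pass in are assumptions.

```python
from collections import Counter


def word_frequencies(tokens, stopwords=(), min_length=2, top_n=200):
    """Hypothetical helper: count tokens, skipping stopwords and short tokens.

    Returns a {word: count} dict of the top_n most common words.
    """
    stop = set(stopwords)
    counts = Counter(
        token.strip()
        for token in tokens
        if len(token.strip()) >= min_length and token.strip() not in stop
    )
    return dict(counts.most_common(top_n))
```

For example, freqs = word_frequencies(jieba.cut(text), stopwords={u"我们", u"一个"}) (the stopwords here are only illustrative) followed by wc.generate_from_frequencies(freqs). Calling jieba.cut without cut_all=True uses precise mode, which avoids the overlapping words that full mode emits.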

The complete code is on GitHub: https://github.com/Jimmy9876/QZone_spider

http://www.aibbt.com/a/22275.html

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
