Crawling QQ Zone (Qzone) posts with Python and generating a word cloud

Source: Internet
Author: User

The following is the generated word cloud

My environment: Mac, Anaconda, Python 2.7. The various Python libraries are installed through Anaconda, so Anaconda comes first.

Anaconda is a Python distribution for scientific computing. It supports Linux, Mac, and Windows and ships with the commonly used scientific computing packages. It addresses two major pain points of the official Python distribution. First, it provides package management, which fixes the frequent failures when installing third-party packages on Windows. Second, it provides environment management, similar to virtualenv, which solves the problem of keeping several Python versions side by side and switching between them.

conda is the package and environment management tool that comes with Anaconda; functionally it resembles a combination of pip and virtualenv. After a successful installation, conda is added to the environment variables by default, so you can run the conda command directly in a terminal window.

conda's environment management works much like virtualenv:

# Show help
conda -h
# Create an environment named python36 based on Python 3.6
conda create --name python36 python=3.6
# Activate the environment (Linux/Mac)
source activate python36
# Check the Python version again; it now shows 3.6
python -V
# Leave the current environment
source deactivate
# Delete the environment
conda remove -n python36 --all
# List all installed environments
conda info -e
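After activating an environment, it can be handy to confirm from inside Python which interpreter is actually running. The following is a minimal sketch using only the standard library; the helper name describe_interpreter is made up for illustration and is not part of conda.

```python
import sys


def describe_interpreter():
    """Return the running Python version and the environment prefix."""
    version = "{}.{}.{}".format(*sys.version_info[:3])
    # sys.prefix points at the root of the active environment,
    # for example a directory under conda's "envs" folder.
    return version, sys.prefix


if __name__ == "__main__":
    version, prefix = describe_interpreter()
    print("Python {} running from {}".format(version, prefix))
```

If the printed prefix is not the environment you just activated, the shell is probably still picking up a different python on the PATH.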

conda's package management works much like pip. Of course, installing packages with pip is fine too.

# Install matplotlib
conda install matplotlib
# List installed packages
conda list
# Update a package
conda update matplotlib
# Remove a package
conda remove matplotlib

In conda, everything is a package. conda itself, the Python interpreter, and Anaconda can each be treated as a package, so in addition to ordinary third-party packages, these three can also be updated through conda. For example: conda update conda, conda update anaconda, and conda update python.

By default the Anaconda package servers are located overseas, so installing packages with conda can be very slow. A domestic mirror is currently available from Tsinghua University. Edit ~/.condarc (Linux/Mac):

channels:
  - https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free/
  - defaults
show_channel_urls: true

If installing packages with conda is still slow, consider using pip instead, and likewise point pip at a domestic mirror; the Douban mirror is fairly fast. Edit ~/.pip/pip.conf (Linux/Mac):

[global]
trusted-host = pypi.douban.com
index-url = http://pypi.douban.com/simple
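If you would rather generate this file than type it by hand, the sketch below renders the same settings with the standard library. It assumes Python 3 (the module is named ConfigParser on Python 2), and build_pip_conf is a hypothetical helper name. It only returns the text, so nothing on disk is overwritten.

```python
import configparser
import io


def build_pip_conf(host="pypi.douban.com"):
    """Render pip.conf text that points pip at the given mirror host."""
    config = configparser.ConfigParser()
    config["global"] = {
        "trusted-host": host,
        "index-url": "http://{}/simple".format(host),
    }
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


if __name__ == "__main__":
    # Print the text instead of overwriting ~/.pip/pip.conf directly.
    print(build_pip_conf())
```

Review the printed text before saving it to ~/.pip/pip.conf yourself.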

Once the environment is set up, you can happily start on the data analysis.

Crawl dynamic content

Because the content of a dynamic page is loaded on demand, we need to keep scrolling so that more of the page loads. Then switch to the frame that holds the content (some pages do not use a frame, so check the specific page), get the page source, parse it, and read the data with XPath.

import time
from lxml import etree

# driver is a Selenium WebDriver that has already opened the page.
# Scroll down so that the browser loads the dynamically loaded content.
# Here i runs from 1 to 5, so the page is scrolled five times.
for i in range(1, 6):
    height = 20000 * i  # scroll distance in pixels for this step
    strWord = "window.scrollBy(0," + str(height) + ")"
    driver.execute_script(strWord)
    time.sleep(4)

# A page is often made up of several <frame> or <iframe> elements, and
# WebDriver starts in the outermost one, so switch to the frame that holds
# the content; otherwise the elements below cannot be found.
driver.switch_to.frame("app_canvas_frame")
selector = etree.HTML(driver.page_source)
divs = selector.xpath('//*[@id="msgList"]/li/div[3]')
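To see what an XPath of this shape selects without opening a browser, here is a small offline sketch. It swaps lxml for the standard library's xml.etree.ElementTree, which supports only a subset of XPath and needs well-formed markup. The SAMPLE markup, its id, and the extract_posts helper are invented for illustration; the real page is larger and messier.

```python
import xml.etree.ElementTree as ET

# Invented, well-formed stand-in for the real page source.
SAMPLE = (
    '<html><body><ol id="msgList">'
    "<li><div>avatar</div><div>nickname</div><div>first post</div></li>"
    "<li><div>avatar</div><div>nickname</div><div>second post</div></li>"
    "</ol></body></html>"
)


def extract_posts(markup):
    """Return the text of the third <div> inside every list item."""
    root = ET.fromstring(markup)
    # Same shape as the crawler's XPath: element by id, then li, then div[3].
    divs = root.findall(".//*[@id='msgList']/li/div[3]")
    return ["".join(div.itertext()).strip() for div in divs]


if __name__ == "__main__":
    for post in extract_posts(SAMPLE):
        print(post)
```

With lxml, the text of each matched element can be collected in a similar way and written to the text file that the word cloud step reads.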

Generate word cloud

Generating the word cloud uses three libraries: wordcloud, which builds the word cloud; matplotlib, which displays the word cloud image; and jieba, which segments Chinese text so that Chinese words can be shown.

# coding: utf-8
from wordcloud import WordCloud
import matplotlib.pyplot as plt
import jieba


# Generate the word cloud
def create_word_cloud(filename):
    text = open("{}.txt".format(filename)).read()
    # Segment the text with jieba
    wordlist = jieba.cut(text, cut_all=True)
    wl = " ".join(wordlist)

    # Configure the word cloud
    wc = WordCloud(
        # Background color
        background_color="white",
        # Maximum number of words to display
        max_words=2000,
        # Path to a font installed on this machine (needed for Chinese text)
        font_path='/System/Library/Fonts/PingFang.ttc',
        height=1200,
        width=1600,
        # Maximum font size
        max_font_size=100,
        # Seed for the random layout and color choices
        random_state=30,
    )

    myword = wc.generate(wl)  # generate the word cloud
    # Display the word cloud image
    plt.imshow(myword)
    plt.axis("off")
    plt.show()
    wc.to_file('py_book.png')  # save the word cloud image


if __name__ == '__main__':
    create_word_cloud('qq_word')
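The word cloud sizes each word by how often it occurs in the text passed to generate(). To sanity-check the segmented text before rendering, counting the tokens is usually enough. The sketch below uses only the standard library; top_words is a hypothetical helper, and WordCloud applies its own stopword filtering and normalization, so its internal counts may differ.

```python
from collections import Counter


def top_words(segmented_text, limit=3):
    """Count space-separated tokens and return the most frequent ones."""
    tokens = [tok for tok in segmented_text.split() if tok]
    return Counter(tokens).most_common(limit)


if __name__ == "__main__":
    sample = "python cloud python word cloud python"
    print(top_words(sample))
```

Passing the space-joined jieba output to this function shows which words are likely to dominate the image.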

The complete code is on GitHub.

GitHub address: https://github.com/Jimmy9876/QZone_spider

http://www.aibbt.com/a/22275.html
