Crawling QQ Space Posts (Shuoshuo) with a Python Crawler and Generating a Word Cloud: Full of Memories!

Source: Internet
Author: User
Tags: virtual environment

Python (pronounced /ˈpaɪθən/ in British English, /ˈpaɪθɑːn/ in American English) is an object-oriented, interpreted programming language and a powerful general-purpose language. It has been under development since the early 1990s and is mature and stable. It ships with a comprehensive standard library that is easy to understand and makes many common tasks simple. Its syntax is simple and clear and, unlike most other programming languages, it uses indentation to delimit blocks of statements.

Python supports many programming paradigms, including imperative, object-oriented, functional, aspect-oriented, and generic programming. Like Scheme, Ruby, Perl, Tcl, and other dynamic languages, Python has garbage collection and manages memory automatically. It is often used as a scripting language for system administration tasks and network programming, but it is also well suited to higher-level tasks. The Python virtual machine runs on almost all operating systems. With tools such as py2exe or PyInstaller, you can package Python source code into a program that runs without a separately installed Python interpreter.

I have been teaching myself Python for a while. I built a website with Django and have also crawled a few simple websites with Requests + BeautifulSoup. Over the weekend I studied some more and decided to crawl my QQ Space posts, save the content to a TXT file, and read it back to generate a word cloud.

I have not logged in to QQ for a long time, and I have not touched QQ Space for years. It is full of memories of my school days; reading through it I started smiling, and then laughing out loud... Ha ha~~

No picture, no proof: back then I was still in my prime, humorous and funny...

This time I use Selenium to simulate the login, BeautifulSoup4 to crawl the data, and wordcloud to create the word cloud.

BeautifulSoup Installation

pip install beautifulsoup4

See the official BeautifulSoup4 documentation for details.

A parser is also needed; I chose the html5lib parser:

pip install html5lib

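The main parsers are Python's built-in html.parser, lxml, and html5lib; they differ in speed, in how leniently they handle broken markup, and in whether they need a separate install.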
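As a small sketch of how the parser is chosen, BeautifulSoup takes the parser name as its second argument. The make_soup helper below is an illustrative name, not part of the library; it tries html5lib first and falls back to html.parser if html5lib is not installed.

```python
from bs4 import BeautifulSoup, FeatureNotFound


def make_soup(markup, preferred="html5lib"):
    """Parse markup with the preferred parser, falling back to html.parser."""
    try:
        return BeautifulSoup(markup, preferred)
    except FeatureNotFound:
        # html.parser is part of the standard library, so it is always available.
        return BeautifulSoup(markup, "html.parser")


soup = make_soup("<p>Hello, <b>QQ Space</b></p>")
print(soup.get_text())
```

The fallback keeps the rest of a script working on a machine where only beautifulsoup4 is installed, at the cost of slightly different handling of malformed pages.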

Simulated Login with Selenium

Use Selenium to log in to QQ Space. Install it with:

pip install selenium

I am using the Chrome browser, so I call webdriver.Chrome() to get a driver for Chrome.

You also need to download and install the driver that matches your browser; otherwise, running the script fails with the error

'chromedriver' executable needs to be in PATH

I am on a Mac and found an article online about downloading the driver.

Windows is similar: download the corresponding driver, unzip it, and put the extracted .exe file in the Python installation directory, such as D:\python. You also need to add the Python installation directory to the PATH system environment variable.
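As a rough way to check for this problem before launching the browser, the sketch below uses the standard library's shutil.which to see whether the driver executable is on PATH. The helper names find_driver and driver_hint and the wording of the hint are illustrative assumptions, not part of Selenium.

```python
import shutil


def find_driver(name="chromedriver"):
    """Return the full path of the driver executable if it is on PATH, else None."""
    return shutil.which(name)


def driver_hint(name="chromedriver"):
    """Build a short message saying whether the driver can be found."""
    path = find_driver(name)
    if path is None:
        return f"{name} was not found on PATH; download it and add its folder to PATH."
    return f"{name} found at {path}"


print(driver_hint())
```

Recent Selenium releases can often locate or download a matching driver on their own, so the manual step may not be needed there; the check above is mainly useful with older setups like the one described here.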

The Python learning route is divided into four main stages: basics, advanced, frameworks, and hands-on projects.

Basics. Stage one: understanding basic Python. Stage two: object-oriented programming (with an emphasis on programming ability). Stage three: object-oriented design ideas such as encapsulation and inheritance. Stage four: advanced Python topics.

Advanced. First: Linux fundamentals. Second: Python web tools. Third: Python deployment tools. Fourth: relational databases. Fifth: the underlying principles of Python web frameworks.

Frameworks (Python web development). First: web.py basics. Second: Django basics. Third: Flask basics. Fourth: Tornado basics.

Hands-on projects: a personal blog system, an enterprise OA system, and a network disk system.

The QQ login page is http://i.qq.com. Use webdriver to open the QQ Space login page:

driver = webdriver.Chrome()
driver.get("http://i.qq.com")

After the page opens, right-click and inspect the page elements. The account and password login form is inside the login_frame frame, so first switch to that frame with driver.switch_to.frame("login_frame"). Then automatically click the account/password login button, enter the account and password, log in, and open the posts page.

At this point you can see that the QQ Space posts page has opened. Note that for some accounts a prompt box appears after the page opens; you need to simulate a click event to close it.

Damn, I actually used to have a Yellow Diamond membership, how scary~~ And my Space avatar looks so young, so mainstream...

Also, because the content is loaded dynamically, you need to drag the scroll bar automatically to load all of the content, and then simulate a click on the next page to load more.
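One way to pull the scroll bar automatically is to keep scrolling until the page height stops growing. The sketch below does that with driver.execute_script and then moves on page by page. The function names are illustrative, and go_to_next_page is a hypothetical callback you supply (for example, one that clicks the next-page link and returns False when there is none), since the text does not name that element.

```python
import time


def scroll_to_bottom(driver, pause=1.0, max_rounds=20):
    """Scroll down repeatedly until the document height stops changing."""
    last_height = driver.execute_script("return document.body.scrollHeight")
    for _ in range(max_rounds):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give the dynamically loaded content time to arrive
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break
        last_height = new_height
    return last_height


def collect_pages(driver, go_to_next_page, pause=1.0, max_pages=50):
    """Return the HTML of each page after it has been fully scrolled."""
    pages = []
    for _ in range(max_pages):
        scroll_to_bottom(driver, pause=pause)
        pages.append(driver.page_source)
        if not go_to_next_page(driver):
            break
    return pages
```

The max_rounds and max_pages limits are there so that a page which never stops growing cannot keep the script running forever.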

Crawling the Posts with BeautifulSoup

Press F12 to inspect the page. The posts are found in the feed_wrap element; inside it is an array of tags, and the actual content of each post is in the tag with class="bd".

At this point the QQ Space posts have been crawled and saved in the qq_word file.
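The extraction and saving step could be sketched as follows. It assumes that feed_wrap is an element id (with a class lookup as a fallback), that each post's text sits inside an element with class bd, and that the output file is named qq_word.txt; the function names are illustrative.

```python
from bs4 import BeautifulSoup


def extract_posts(page_source, parser="html5lib"):
    """Return the text of every class="bd" element inside feed_wrap."""
    soup = BeautifulSoup(page_source, parser)
    # Assumption: feed_wrap is an id; fall back to treating it as a class.
    container = soup.find(id="feed_wrap") or soup.find(class_="feed_wrap")
    if container is None:
        return []
    return [node.get_text(strip=True) for node in container.find_all(class_="bd")]


def save_posts(posts, path="qq_word.txt"):
    """Write one post per line, using UTF-8 so Chinese text is preserved."""
    with open(path, "w", encoding="utf-8") as fh:
        for post in posts:
            fh.write(post + "\n")
```

In a full script, extract_posts would be called on driver.page_source for each loaded page and the combined list passed to save_posts.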

Next, generate the word cloud.

Word Cloud

Use the wordcloud package to generate the word cloud:

pip install wordcloud

You could also use jieba for word segmentation here. I did not, because I feel the posts only carry their feeling when read as whole sentences; that is a personal preference. With jieba segmentation you can see which words appear most frequently in the posts.

Next, set some WordCloud properties. Note that you must set the font_path property here, otherwise the Chinese characters will appear garbled.

One more reminder: if you are using a virtual environment, do not run the word cloud script inside it, or you may get this error:

RuntimeError: Python is not installed as a framework. The Mac OS X backend will not be able to function correctly if Python is not installed as a framework. See the Python documentation for more information on installing Python as a framework on Mac OS X. Please either reinstall Python as a framework, or try one of the other backends. If you are using (Ana)Conda please install python.app and replace the use of 'python' with 'pythonw'. See 'Working with Matplotlib on OSX' in the Matplotlib FAQ for more information.

I ran into exactly this situation; I ran deactivate to leave the virtual environment and then ran the script.
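The error message itself suggests trying one of the other backends. As a sketch of that option, the snippet below selects Matplotlib's non-interactive Agg backend and writes the figure to a file instead of opening a window. Whether this is preferable to leaving the virtual environment depends on your setup, and the save_image helper and file name are illustrative.

```python
import matplotlib

matplotlib.use("Agg")  # select the file-only backend before pyplot is imported

import matplotlib.pyplot as plt


def save_image(image, out_path="qq_word.png"):
    """Draw an image (for example a generated word cloud) and save it to disk."""
    fig, ax = plt.subplots(figsize=(8, 6))
    ax.imshow(image, interpolation="bilinear")
    ax.axis("off")
    fig.savefig(out_path, bbox_inches="tight")
    plt.close(fig)
    return out_path
```

Because Agg never opens a window, the picture has to be viewed by opening the saved file afterwards.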

At this point, the QQ Space posts have been crawled and the word cloud has been generated.

What can Python do?

Web development and crawlers are well suited to complete beginners.

Automated operations and maintenance development and automated testing suit people who already work in operations or testing.

Big data and data analysis require fairly strong professional expertise.

Scientific computing is generally used by researchers.

Machine learning and AI first demand a high level of education and then strong advanced mathematics, so the difficulty is very high.

I run an official account where I often share material about Python. If you like what I share, search for "Python language learning" to follow it.

You are also welcome to join our Q&A group of over a thousand people: 588+090+942
