Even Python beginners can crawl Weibo data in minutes and generate a personalized word cloud. Have you tried it?

Source: Internet
Author: User

Python (UK pronunciation /ˈpaɪθən/, US /ˈpaɪθɑːn/) is an object-oriented, interpreted programming language and a powerful general-purpose language. First released in 1991, it is mature and stable. It ships with a comprehensive standard library that is easy to understand and handles many common tasks with little effort. Its syntax is simple and clear, and unlike most other programming languages it uses indentation to delimit blocks of statements.

Python supports many programming paradigms, including imperative, object-oriented, functional, aspect-oriented, and generic programming. Like Scheme, Ruby, Perl, Tcl, and other dynamic languages, Python has garbage collection and manages memory automatically. It is often used as a scripting language for system administration and network programming, but it is also well suited to larger, more demanding tasks. The Python virtual machine runs on almost all operating systems. With tools such as py2exe or PyInstaller, you can package Python source code into a program that runs without a separately installed Python interpreter.

I previously wrote an article on how to make a word cloud image from Weibo data. That version was incomplete and only worked with your own data. I have now reorganized it so that it works with any Weibo user's data, which seems fitting for today's occasion.



This article shows how to use Python to quickly create a heartfelt word cloud. Even a Python beginner can do it in minutes.

Preparatory work

The environment here is based on Python 3; in theory Python 2.7 also works. First install the necessary third-party dependencies:



The requirement.txt file lists the dependencies above. If installation with pip fails, installing through Anaconda is recommended.

pip install -r requirement.txt

Step one: Analyze the URL

Open the Weibo mobile site, find your goddess's Weibo ID, go to her Weibo homepage, and analyze the requests the browser sends.



Open Chrome's developer tools, select the Network panel, and find the interface that returns the Weibo data. Its URL is followed by a series of parameters; some vary by user and some are fixed. Extract these first.



Next, analyze what the interface returns. The returned data is a JSON dictionary: total is the number of Weibo posts, each post is wrapped in the cards array, and the actual content is in the text field inside. A lot of irrelevant information has been hidden here.
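A minimal sketch of reading such a structure, assuming a hand-written sample rather than real Weibo output. The field names total, cards, and text come from the description above; the sample values, the `extract_texts` helper, and the assumption that text sits directly on each card are illustrative only, and a real response may nest it deeper.

```python
# Hypothetical sample shaped like the response described above.
# A real payload carries many more fields, and its exact nesting may differ.
sample_response = {
    "total": 2,
    "cards": [
        {"text": "First post"},
        {"note": "a card without post text"},
        {"text": "Second post"},
    ],
}


def extract_texts(payload):
    """Return the text of every card that carries a non-empty text field."""
    texts = []
    for card in payload.get("cards", []):
        text = card.get("text")
        if text:
            texts.append(text)
    return texts


total = sample_response["total"]
texts = extract_texts(sample_response)
print(total, texts)
```

Cards without a text field are skipped rather than raising an error, which keeps the loop tolerant of the extra entries such an interface may return.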



Step two: Build the request header and query parameters

With the page analyzed, we can use requests to imitate the browser and build a crawler that fetches the data. Fetching a user's posts does not require logging in to Weibo, so there is no need to construct cookie information; the basic request headers are enough. Which headers are needed can be read from the browser. First construct the required request information, which consists of the request headers and the query parameters.





· uid is the ID of the Weibo user

· containerid has no obvious meaning, but it is also a parameter tied to a specific user

· page is the paging parameter
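The request headers and query parameters described above could be assembled along these lines. This is a sketch under assumptions: `API_URL` is a placeholder (the real interface URL is whatever the Network panel shows), the header values and the lowercase key names are illustrative, and the uid and containerid values are made up.

```python
from urllib.parse import urlencode

# Placeholder endpoint, not the real interface; copy the actual URL
# from the browser's Network panel.
API_URL = "https://example.invalid/path/to/interface"


def build_headers():
    """Basic browser-like headers; no cookie is included (assumed values)."""
    return {
        "User-Agent": (
            "Mozilla/5.0 (iPhone; CPU iPhone OS 16_0 like Mac OS X) "
            "AppleWebKit/605.1.15 (KHTML, like Gecko) "
            "Version/16.0 Mobile/15E148 Safari/604.1"
        ),
        "Accept": "application/json, text/plain, */*",
    }


def build_params(uid, containerid, page=1):
    """Query parameters: two user-specific values plus the page number."""
    return {"uid": uid, "containerid": containerid, "page": page}


headers = build_headers()
# Made-up IDs for illustration only.
params = build_params(uid="1234567890", containerid="0000001234567890")
print(API_URL + "?" + urlencode(params))
```

With requests, the same dictionaries would be passed as `requests.get(API_URL, headers=headers, params=params)`, which encodes the query string itself.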



The Python learning route is divided into four main stages: basics, advanced, frameworks, and project practice.

Basics: stage one is an understanding of basic Python; stage two is object-oriented programming (with emphasis on programming ability); stage three is object-oriented design ideas such as encapsulation and inheritance; stage four is advanced Python topics.

Advanced: first, Linux fundamentals; second, Python web tools; third, Python deployment tools; fourth, relational databases; fifth, the basic principles of Python web frameworks.

Frameworks: first, Python web development with web.py; second, Django basics; third, Flask basics; fourth, Tornado basics.

Project practice: a personal blog system, an enterprise OA system, and a network disk system.

Step three: Construct a simple crawler

The returned data tells us total, the number of Weibo posts. While crawling, use the method requests provides to convert the JSON data into a Python dictionary, extract the value of every text field, and put it in the blogs list. Before storing each text, apply a simple filter to remove useless information. The data is also written to a file so that the next conversion does not need to crawl again.
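The crawl loop described above might look roughly like the following sketch. It is not the author's code: the `crawl`, `clean_text`, `save_blogs`, and `load_blogs` names are hypothetical, the filter is assumed to mean stripping HTML tags, and the page fetch is passed in as a function so the loop can be tried without network access. In real use `fetch_page` could wrap `requests.get(url, headers=headers, params=params).json()`.

```python
import json
import re

TAG_RE = re.compile(r"<[^>]+>")


def clean_text(raw):
    """Simple filter: strip HTML tags and surrounding whitespace."""
    return TAG_RE.sub("", raw).strip()


def crawl(fetch_page, max_pages=50):
    """Collect cleaned post texts page by page.

    fetch_page(page) must return the decoded JSON dictionary for that page.
    """
    blogs = []
    total = None
    seen = 0
    for page in range(1, max_pages + 1):
        payload = fetch_page(page)
        cards = payload.get("cards", [])
        if not cards:
            break  # an empty page means nothing is left to fetch
        if total is None:
            total = payload.get("total")
        seen += len(cards)
        for card in cards:
            text = clean_text(card.get("text", ""))
            if text:
                blogs.append(text)
        if total is not None and seen >= total:
            break  # every post reported by total has been visited
    return blogs


def save_blogs(blogs, path):
    """Write the texts to a file so later runs can skip crawling."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(blogs, f, ensure_ascii=False)


def load_blogs(path):
    """Read back texts saved by save_blogs."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def fake_fetch(page):
    """Stand-in for the real HTTP call, with made-up data."""
    pages = {
        1: {"total": 3, "cards": [{"text": "<a href='#'>Hello</a> world"}, {"text": "  "}]},
        2: {"total": 3, "cards": [{"text": "Second <br/>post"}]},
    }
    return pages.get(page, {"cards": []})


blogs = crawl(fake_fetch)
print(blogs)
```

Passing the fetch function in keeps the paging logic separate from the HTTP details, and `max_pages` bounds the loop if the interface never returns an empty page.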





Step four: Word segmentation and building the word cloud

Once all the data has been crawled, segment it into words first. This uses jieba ("stutter") segmentation, which splits sentences into words according to Chinese context; stop words are filtered out during this process. After that, find a reference image, and the words are assembled into a picture that follows the shape of that image.
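The segmentation and word cloud step above can be sketched as follows, assuming the jieba and wordcloud packages (plus numpy and Pillow for the reference image). The function names, the tiny stop-word set, and the `min_length` cutoff are hypothetical choices for illustration. The third-party imports are placed inside the functions so that the counting helper can be tried on its own.

```python
from collections import Counter

# Tiny illustrative stop-word set; a real run would load a fuller list.
STOP_WORDS = {"the", "and", "的", "了", "是"}


def count_words(tokens, stop_words=STOP_WORDS, min_length=2):
    """Count tokens, skipping stop words and tokens shorter than min_length."""
    counter = Counter()
    for token in tokens:
        word = token.strip()
        if len(word) >= min_length and word not in stop_words:
            counter[word] += 1
    return counter


def segment(blogs):
    """Yield the words of every post, segmented with jieba (third-party)."""
    import jieba

    for blog in blogs:
        yield from jieba.cut(blog)


def build_word_cloud(frequencies, mask_path, font_path, out_path):
    """Render word frequencies in the shape of the reference image."""
    import numpy as np
    from PIL import Image
    from wordcloud import WordCloud

    # Pure white areas of the reference image are left empty by wordcloud.
    mask = np.array(Image.open(mask_path))
    cloud = WordCloud(font_path=font_path, background_color="white", mask=mask)
    cloud.generate_from_frequencies(dict(frequencies))
    cloud.to_file(out_path)


demo = count_words(["python", "的", "python", "a", "cloud", " "])
print(demo.most_common())
```

A full run would chain the pieces, for example `build_word_cloud(count_words(segment(blogs)), "reference.jpg", "simhei.ttf", "cloud.png")`, where the three file names are placeholders. A font that contains Chinese glyphs is needed for `font_path`, otherwise Chinese words render as empty boxes.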



The final result:





Who is best suited to learning Python?

1. Programming beginners: people who like programming and want to work in the field, but have no background and do not know which language to choose. Python is actually the most suitable choice for them.

2. Website front-end developers: they usually focus only on page technologies such as div+css, yet often need to work with back-end developers.

3. SEO personnel: many people doing SEO cannot program, so when a problem involves code they cannot solve it and are limited to simple page optimization. After learning Python, you can write programs that query indexing status and rankings or automatically generate sitemaps, and so solve tricky SEO problems.

4. Students: those who want a practical skill, and self-taught programming enthusiasts who want to get started quickly with fewer detours, can choose Python.


