Crawling 100,000 Bilibili users with Python to uncover the site's most popular uploaders

Source: Internet
Author: User
Tags: sqlite, database

Python (pronounced /ˈpaɪθən/ in British English and /ˈpaɪθɑːn/ in American English) is an object-oriented, interpreted programming language. It is also a powerful general-purpose language that is mature and stable after decades of development. It ships with a comprehensive, easy-to-understand standard library that handles many common tasks. Its syntax is simple and clear and, unlike most other programming languages, it uses indentation to delimit blocks of statements.

Python supports multiple programming paradigms, including imperative, object-oriented, functional, aspect-oriented, and generic programming. Like Scheme, Ruby, Perl, Tcl, and other dynamic languages, Python has garbage collection and manages memory automatically. It is often used as a scripting language for system administration and network programming, but it is also well suited to a wide range of more advanced tasks. The Python virtual machine runs on almost every operating system. Tools such as py2exe and PyInstaller can package Python source code into a program that runs without a separately installed Python interpreter.

Author's note

Bilibili (often called "B station") will be familiar to anyone who enjoys anime and follows the site's most creative uploaders ("UP masters"). I have been learning Python for a while, so why not use it to crawl the accounts I follow on Bilibili, then the accounts they follow, and see which uploaders are the most popular across the whole site?

Key points:

- Crawl data for 100,000 users

- Store the data

- Analyze the data with a word cloud

1. Preparation phase

Before writing any code, think the problem through: since I want to crawl each user's following list, I need to store the relationships between users and record who is the followed user and who is the follower.

A database is the most convenient way to store these relationships, and it also makes the later analysis easier. I chose SQLite because Python ships with the sqlite3 module, which makes it very convenient to use.

The database needs two tables: one stores who follows whom, and the other stores each user's basic information. In Bilibili's user system, every user's mid number is unique.

I also need a list of the users that have already been crawled, to prevent repeated crawls, since users often follow each other. Storing each user's mid number in the list is enough.
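The bookkeeping described above can be sketched roughly as follows. This is only an illustration, not the article's original code: the function name crawl_order, the get_followings callback, and the tiny fake graph are all made up for the example, and a set is used in place of a list because membership checks on a set are faster.

```python
from collections import deque


def crawl_order(start_mid, get_followings, limit):
    """Breadth-first walk over the follow graph, visiting each mid once."""
    seen = {start_mid}          # mids already queued or crawled
    queue = deque([start_mid])  # mids waiting to be crawled
    order = []
    while queue and len(order) < limit:
        mid = queue.popleft()
        order.append(mid)
        for followed in get_followings(mid):
            if followed not in seen:  # skip users we have met before
                seen.add(followed)
                queue.append(followed)
    return order


# A tiny fake follow graph with a mutual follow (1 <-> 2), for illustration.
fake_graph = {1: [2, 3], 2: [1, 3], 3: [4], 4: []}
print(crawl_order(1, lambda mid: fake_graph.get(mid, []), limit=10))
# prints [1, 2, 3, 4]
```

Even though users 1 and 2 follow each other, each mid is visited only once. In a real crawl, get_followings would be the function that calls the Bilibili API.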

2. Create a new database

First, write the database code, which creates two tables in the database: a user table and a relation table.
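The original listing is not reproduced here, so the following is a minimal sketch of what such a schema might look like with Python's built-in sqlite3 module. The table names (users, relations) and column names are assumptions chosen for the example; only the mid, uname, and sign fields come from the article.

```python
import sqlite3


def create_tables(conn):
    """Create the user table and the follow-relation table if missing."""
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS users (
            mid   INTEGER PRIMARY KEY,  -- Bilibili user id, unique per user
            uname TEXT,
            sign  TEXT
        );
        CREATE TABLE IF NOT EXISTS relations (
            follower_mid INTEGER NOT NULL,  -- the user who follows
            followed_mid INTEGER NOT NULL,  -- the user being followed
            PRIMARY KEY (follower_mid, followed_mid)
        );
        """
    )
    conn.commit()


# ":memory:" keeps the demo self-contained; pass a file path to persist data.
conn = sqlite3.connect(":memory:")
create_tables(conn)
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
print(tables)  # prints ['relations', 'users']
```

Making mid the primary key and the (follower, followed) pair a composite key lets the database itself reject duplicate rows.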

3. Crawl the first 5 pages of user data

Next I need the JSON interface behind a Bilibili user's following list. It was quick to find; the address is:

https://api.bilibili.com/x/relation/followings?vmid=2&pn=1&ps=20&order=desc&jsonp=jsonp&callback=__jp7

Here the vmid parameter is the user's mid number.

pn=1 selects the first page of the user's following list, and each page shows 20 users.

Because of Bilibili's privacy settings, you can only crawl the first 5 pages of another user's following list, 100 users in total.

The crawling logic is fairly simple: set up the request headers, call the API with the Requests library, and read back the list of users being followed.

We crawl the first 5 pages and do some simple processing on each one: convert the response into a dictionary, extract three fields (mid, uname, and sign), and finally call the save() function to write them to the database.
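A rough sketch of this step is shown below. Several details are assumptions rather than facts from the article: the header values, the helper names page_params, parse_page, and fetch_followings, and the idea that the user list sits under data.list in the response. The sketch also leaves out the jsonp and callback parameters on the assumption that the API then returns plain JSON.

```python
API_URL = "https://api.bilibili.com/x/relation/followings"
MAX_PAGES = 5   # only the first 5 pages are visible for other users
PAGE_SIZE = 20  # 20 followed users per page

# Hypothetical headers; the article does not show which ones it sent.
HEADERS = {"User-Agent": "Mozilla/5.0"}


def page_params(mid, page):
    """Query parameters for one page of a user's following list."""
    return {"vmid": mid, "pn": page, "ps": PAGE_SIZE, "order": "desc"}


def parse_page(payload):
    """Extract (mid, uname, sign) tuples from one decoded JSON page.

    Assumes the users are listed under payload["data"]["list"].
    """
    data = payload.get("data") or {}
    items = data.get("list") or []
    return [(u["mid"], u.get("uname", ""), u.get("sign", "")) for u in items]


def fetch_followings(mid):
    """Fetch up to 5 pages (100 users) of the accounts that `mid` follows."""
    import requests  # imported here so the helpers above work without it

    users = []
    for page in range(1, MAX_PAGES + 1):
        resp = requests.get(API_URL, params=page_params(mid, page),
                            headers=HEADERS, timeout=10)
        resp.raise_for_status()
        batch = parse_page(resp.json())
        if not batch:  # nothing more to read for this user
            break
        users.extend(batch)
    return users


# Offline demo of the parsing step with a made-up response.
sample = {"code": 0, "data": {"list": [
    {"mid": 2, "uname": "example", "sign": "hello"}]}}
print(parse_page(sample))  # prints [(2, 'example', 'hello')]
```

Keeping the parsing separate from the network call makes it possible to test the field extraction without touching the live API.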

4. Save to the database

The database has two tables in total: one stores every user's information, and the other stores who follows whom.
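A save() function along these lines might look like the sketch below. The table and column names are the same hypothetical ones used for illustration throughout, not the article's actual schema, and INSERT OR IGNORE is one possible way to skip rows that are already stored.

```python
import sqlite3


def save(conn, follower_mid, followings):
    """Store followed users and the follower -> followed links.

    `followings` is a list of (mid, uname, sign) tuples.
    """
    conn.executemany(
        "INSERT OR IGNORE INTO users (mid, uname, sign) VALUES (?, ?, ?)",
        followings,
    )
    conn.executemany(
        "INSERT OR IGNORE INTO relations (follower_mid, followed_mid) "
        "VALUES (?, ?)",
        [(follower_mid, mid) for mid, _uname, _sign in followings],
    )
    conn.commit()


# Self-contained demo on an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (mid INTEGER PRIMARY KEY, uname TEXT, sign TEXT)")
conn.execute(
    "CREATE TABLE relations (follower_mid INTEGER, followed_mid INTEGER, "
    "PRIMARY KEY (follower_mid, followed_mid))")

save(conn, 1, [(2, "alice", "hi"), (3, "bob", "")])
save(conn, 4, [(2, "alice", "hi")])  # user 2 is stored only once
user_count = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
link_count = conn.execute("SELECT COUNT(*) FROM relations").fetchone()[0]
print(user_count, link_count)  # prints 2 3
```

The user table holds each account once, while the relation table keeps one row per follower and followed pair, which is what the later popularity count relies on.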

5. Find the popular uploaders

The goal is to generate a word cloud from the data already crawled to the local database and see which accounts are followed by the most of the 100,000 users.

The code reads the followed users' names from the database; the more often a name repeats, the more users follow that account. I then use a Fate picture as the word cloud's mask image and generate the final word cloud picture.
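The counting and rendering steps could be sketched as follows. The article does not name the library it used, so the use of the wordcloud package (together with numpy and Pillow) is an assumption, as are the function names, the table names, and the file path arguments. A font that covers Chinese characters would be needed for the names to render.

```python
import sqlite3


def follower_counts(conn, top_n=200):
    """Return {uname: number of crawled users who follow that account}."""
    rows = conn.execute(
        """
        SELECT u.uname, COUNT(*) AS followers
        FROM relations AS r
        JOIN users AS u ON u.mid = r.followed_mid
        GROUP BY u.mid
        ORDER BY followers DESC
        LIMIT ?
        """,
        (top_n,),
    )
    return dict(rows)


def render_cloud(freqs, mask_path, font_path, out_path):
    """Draw a word cloud shaped by a mask image (assumes wordcloud is installed)."""
    # Third-party imports stay local so the counting step runs without them.
    import numpy as np
    from PIL import Image
    from wordcloud import WordCloud

    mask = np.array(Image.open(mask_path))
    cloud = WordCloud(font_path=font_path, mask=mask, background_color="white")
    cloud.generate_from_frequencies(freqs)
    cloud.to_file(out_path)


# Offline demo of the counting step with made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (mid INTEGER PRIMARY KEY, uname TEXT, sign TEXT)")
conn.execute(
    "CREATE TABLE relations (follower_mid INTEGER, followed_mid INTEGER)")
conn.executemany("INSERT INTO users VALUES (?, ?, ?)",
                 [(10, "alice", ""), (11, "bob", "")])
conn.executemany("INSERT INTO relations VALUES (?, ?)",
                 [(1, 10), (2, 10), (3, 10), (1, 11)])
freqs = follower_counts(conn)
print(freqs)  # prints {'alice': 3, 'bob': 1}
```

Passing the counts to generate_from_frequencies sizes each name by how many crawled users follow it, which matches the idea that a name repeated more often belongs to a more popular account.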

Finally, take a look at the word cloud.

The cloud shows that big Bilibili uploaders such as Lace, Violent Walk Comic, Fish Water Heart, Penetrate the C June, and Papi Sauce attract the most followers.

What can Python do?

Web development and crawlers are good starting points for complete beginners.

Automated operations development and automated testing suit people who already work in operations or testing.

Big data and data analysis demand relatively strong specialist knowledge.

Scientific computing is generally used by researchers.

Machine learning and AI are very difficult: they call for a strong academic background and solid advanced mathematics.


