Python crawler: looping inserts into a MySQL database

Source: Internet
Author: User
Tags: mysql, insert

1. Development environment

Operating system: Windows 10
Python version: 3.5.2
MySQL version: 5.5.53

2. Modules used

If a module is not installed, install it with pip: pip install xxx, where xxx is the name of the module to install.

3. Analyzing the links (cnblogs home page: https://www.cnblogs.com/)

Here we briefly analyze the home page section.

Analysis shows that the only part of the link that changes from page to page is the final number, so the link can be written in the following pattern; a loop can then visit every page in turn:
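As a minimal sketch of that loop: the URL pattern below is an assumption for illustration (the original pattern is not reproduced in this text), and page_urls is a hypothetical helper name. The point is only that the trailing page number is the loop variable.

```python
# Assumed URL pattern: only the trailing page number changes between pages.
# Replace it with the pattern you actually observe in the browser.
PAGE_URL = "https://www.cnblogs.com/sitehome/p/{page}"


def page_urls(first_page, last_page):
    """Build the list of page links from first_page to last_page inclusive."""
    return [PAGE_URL.format(page=n) for n in range(first_page, last_page + 1)]


# Visit the pages one by one; here we only print the links.
for url in page_urls(1, 3):
    print(url)
```

Each link produced this way would then be downloaded and parsed inside the loop.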

4. Analyze the content of the page

The information we need from the page is the list of posts published by bloggers, such as:

To be precise, we need to extract each post's title, summary, publication time, and link.

Open the page and press F12 to inspect its elements.

Click the arrow (element picker) icon, then move the mouse over the page content to find the element we are looking for; the corresponding HTML is highlighted in the code panel below:

Right-click, choose Copy element, and paste the copied markup into a text document so that the following part of the code is saved for reference:

This fragment contains all the information for one blog post. Next, we use regular expressions to extract the content we need.

5. Regular expressions

title = re.compile('<a class="titlelnk.*?>(.*?)</a>', re.S)

title1 = re.findall(title, html)

Here html is the full source of the web page; these two lines collect the titles of all blog posts on the page into the list title1.

The pattern <a class="titlelnk.*?>(.*?)</a> matches every <a> tag whose class is titlelnk, and the group (.*?) is the part we extract.
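To see the pattern at work without downloading anything, here is a small sketch run against a hand-written HTML fragment. The fragment, the link pattern, and the whitespace clean-up are assumptions for illustration; only the title pattern comes from the text above.

```python
import re

# Hand-written sample standing in for the real page source (an assumption).
SAMPLE_HTML = '''
<div class="post_item">
<h3><a class="titlelnk" href="https://www.cnblogs.com/example/p/1.html" target="_blank">First post</a></h3>
</div>
<div class="post_item">
<h3><a class="titlelnk" href="https://www.cnblogs.com/example/p/2.html" target="_blank">Second
post</a></h3>
</div>
'''

# re.S lets "." match newlines, so titles that wrap across lines still match.
title_re = re.compile(r'<a class="titlelnk.*?>(.*?)</a>', re.S)
# A similar pattern (hypothetical, not from the article) for the post link.
link_re = re.compile(r'<a class="titlelnk" href="(.*?)"', re.S)

# Collapse internal whitespace so wrapped titles become a single line.
titles = [" ".join(t.split()) for t in title_re.findall(SAMPLE_HTML)]
links = link_re.findall(SAMPLE_HTML)

for title, link in zip(titles, links):
    print(title, link)
```

The non-greedy .*? is what keeps each match inside a single tag; a greedy .* would run from the first opening tag to the last closing one.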

6. Connecting to the database

db = pymysql.connect(host="127.0.0.1", user="root", password="root", database="crawler", charset="utf8")  # open the database connection

The host, user, password, and database arguments to pymysql.connect() are self-explanatory. Passing charset="utf8" ensures that text is stored with the correct encoding; without it, some data cannot be inserted.

cursor = db.cursor()  # create a cursor object using the cursor() method

7. MySQL INSERT statement
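The statement itself is not reproduced in this text, so the following is only a sketch of one way to write it. The table name blog_posts and its columns are hypothetical, chosen to mirror the four extracted fields, and insert_posts is an illustrative helper. It takes the db and cursor objects created in the previous section and uses %s placeholders, so that the driver escapes the values instead of the SQL being built by string concatenation.

```python
# Hypothetical table and column names mirroring the four extracted fields.
INSERT_SQL = (
    "INSERT INTO blog_posts (title, summary, posted_at, link) "
    "VALUES (%s, %s, %s, %s)"
)


def insert_posts(db, cursor, posts):
    """Insert a list of post dicts and commit; roll back if anything fails."""
    rows = [(p["title"], p["summary"], p["posted_at"], p["link"]) for p in posts]
    try:
        # The driver substitutes and escapes each %s placeholder.
        cursor.executemany(INSERT_SQL, rows)
        db.commit()
    except Exception:
        db.rollback()
        raise
    return len(rows)
```

Calling insert_posts(db, cursor, posts) once per crawled page keeps each page's rows in a single transaction.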

8. Putting the code together
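The complete script is not reproduced in this text. As a rough sketch of how the pieces could fit together, the code below loops over the pages, extracts the title and link of each post, and inserts them. The URL pattern, the table blog_posts, and the function names are all assumptions for illustration, and only two of the four fields are handled to keep it short.

```python
import re
import time
import urllib.request

# Hypothetical URL template and table layout; adjust to the real site and schema.
PAGE_URL = "https://www.cnblogs.com/sitehome/p/{page}"
INSERT_SQL = "INSERT INTO blog_posts (title, link) VALUES (%s, %s)"

# Capture the link and the title of every <a class="titlelnk" ...> tag.
POST_RE = re.compile(r'<a class="titlelnk" href="(.*?)".*?>(.*?)</a>', re.S)


def fetch_html(url):
    """Download one page and decode it as UTF-8."""
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def parse_posts(html):
    """Return (title, link) tuples for every post found in the HTML."""
    return [(" ".join(title.split()), link) for link, title in POST_RE.findall(html)]


def crawl(cursor, first_page, last_page, fetch=fetch_html, delay=0.0):
    """Fetch each page in turn, insert its posts, and return the row count."""
    total = 0
    for page in range(first_page, last_page + 1):
        rows = parse_posts(fetch(PAGE_URL.format(page=page)))
        if rows:
            cursor.executemany(INSERT_SQL, rows)
            total += len(rows)
        time.sleep(delay)  # optional pause between requests
    return total
```

With a real connection, this would be called as crawl(cursor, 1, 10, delay=1.0) followed by db.commit(). The fetch parameter exists so the loop can be tried with canned HTML before pointing it at the live site.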

That covers both the principle and the code. To extract other content, analyze the target site in the same way. Not every site can be crawled, though: some sites have anti-crawling measures, and dealing with them requires further knowledge (access frequency control, proxy IP pools, and so on).
