Writing a web crawler in Python (8): A crawler for Qiushibaike, the Embarrassing Encyclopedia (v0.2), with source and analysis


Project description:

A web crawler for Qiushibaike (the "Encyclopedia of Embarrassing Things"), written in Python.

How to use:

Create a new file named bug.py, paste the code into it, and double-click the file to run it.

Program function:

Browse the embarrassing encyclopedia from the command prompt.
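Browsing from the command prompt comes down to a small loop: print one joke, wait for the user, then repeat. Below is a minimal sketch. It assumes the jokes are already available as (post time, text) pairs. The controls are also an assumption, not taken from the original code: pressing Enter shows the next joke and typing quit exits. `browse` is a hypothetical name. Its read and write callables default to input and print, so they can be swapped out for testing.

```python
from typing import Callable, Iterable, Tuple

Joke = Tuple[str, str]  # (post time, joke text) -- assumed shape


def browse(jokes: Iterable[Joke],
           read: Callable[[str], str] = input,
           write: Callable[[str], None] = print) -> int:
    """Show one joke at a time; typing 'quit' stops early.

    Hypothetical helper: returns how many jokes were shown.
    """
    shown = 0
    for post_time, text in jokes:
        write(f"[{post_time}]\n{text}")
        shown += 1
        answer = read("Press Enter for the next joke, or type quit to exit: ")
        if answer.strip().lower() == "quit":
            break
    return shown
```

For example, `browse([("10:00", "first joke"), ("11:00", "second joke")])` prints the first joke and waits for input before showing the second. Because the loop accepts any iterable, a generator that downloads pages lazily could feed it without changing the loop.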

Explanation of the principle:

First, take a look at the Qiushibaike home page: http://www.qiushibaike.com/hot/page/1

As you can see, the number after page/ in the link is the page number. Keep this in mind, because we will need it later.
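Because the page number is simply the last segment of the path, page URLs can be generated instead of hard-coded. Below is a minimal sketch. It assumes the lowercase form of the example link and that pages are numbered from 1. `page_url` and `BASE_URL` are hypothetical names, not from the original code.

```python
# Assumed from the example link; the page number is appended to this prefix.
BASE_URL = "http://www.qiushibaike.com/hot/page/"


def page_url(page: int) -> str:
    """Return the hot-list URL for a 1-based page number (hypothetical helper)."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    return f"{BASE_URL}{page}"
```

For example, `page_url(2)` gives the URL of the second page. A crawler can walk through the list by incrementing the number.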

Then, right-click the page and view its source:
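A crawler needs the same source text the browser shows, so it has to download the page itself. Below is a minimal sketch using only the standard library's urllib.request. It makes three assumptions: the site accepts a browser-like User-Agent header, the page declares its charset (UTF-8 is used as a fallback), and a 10-second timeout is reasonable. `build_request` and `fetch_source` are hypothetical names.

```python
import urllib.request

# Browser-like User-Agent; some sites reject Python's default one (assumption).
HEADERS = {"User-Agent": "Mozilla/5.0 (compatible; qiushi-demo/0.2)"}


def build_request(url: str) -> urllib.request.Request:
    """Create a GET request for the page with the headers above attached."""
    return urllib.request.Request(url, headers=HEADERS)


def fetch_source(url: str, timeout: float = 10.0) -> str:
    """Download the page and decode it to text.

    Uses the charset from the Content-Type header, falling back to UTF-8
    when none is declared; undecodable bytes are replaced, not fatal.
    """
    with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")
```

Calling `fetch_source("http://www.qiushibaike.com/hot/page/1")` requires network access and returns the page's HTML as a single string. The regular expression can then be run against that string.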

Looking at the source, you can see that each joke is wrapped in a center tag whose class is always content, and whose title attribute holds the post time. All we need is a regular expression to pick these out.
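Below is a minimal sketch of that extraction. It assumes markup of the form `<center class="content" title="POST TIME">JOKE</center>` as described above, with class appearing before title inside the opening tag. The tag name is a parameter in case the real markup uses a different element. `extract_jokes` is a hypothetical helper, not the original program's code.

```python
import html
import re
from typing import List, Tuple


def extract_jokes(page: str, tag: str = "center") -> List[Tuple[str, str]]:
    """Return (post time, text) pairs for each <tag class="content" title="..."> block.

    Assumes class precedes title in the opening tag. re.S lets the
    non-greedy (.*?) body span several lines without running past the
    first closing tag.
    """
    pattern = re.compile(
        r'<{0}\b[^>]*class="content"[^>]*title="([^"]*)"[^>]*>(.*?)</{0}>'.format(re.escape(tag)),
        re.S,
    )
    jokes = []
    for post_time, body in pattern.findall(page):
        text = re.sub(r"<br\s*/?>", "\n", body)  # turn <br> into line breaks
        text = re.sub(r"<[^>]+>", "", text)      # drop any other inline tags
        jokes.append((html.unescape(post_time), html.unescape(text).strip()))
    return jokes
```

Regular expressions are fragile against HTML. They are enough here only because the markup is regular. If the page structure changes, the pattern must be updated by hand.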

Once you understand the principle, all that remains is writing the regular expression. For that, you can refer to this blog post:

http://blog.csdn.net/wxg694175346/article/details/8929576
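Two regular-expression details matter most when pulling blocks out of HTML like this: the non-greedy quantifier `.*?` and the `re.S` (DOTALL) flag. The small demonstration below shows both on a made-up two-element snippet.

```python
import re

SAMPLE = "<p>a</p><p>b\nc</p>"

# Greedy .* runs to the LAST closing tag, swallowing everything in between.
greedy = re.findall(r"<p>(.*)</p>", SAMPLE, re.S)
# Non-greedy .*? stops at the FIRST closing tag, giving one match per element.
lazy = re.findall(r"<p>(.*?)</p>", SAMPLE, re.S)
# Without re.S, '.' does not match '\n', so the multi-line element is skipped.
lazy_no_dotall = re.findall(r"<p>(.*?)</p>", SAMPLE)

for name, result in [("greedy", greedy), ("lazy", lazy), ("lazy, no re.S", lazy_no_dotall)]:
    print(f"{name:>14}: {result!r}")
```

The results are `['a</p><p>b\nc']`, `['a', 'b\nc']` and `['a']` respectively. Scraping usually needs the non-greedy form together with `re.S`.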

