Project content:
A web crawler for Qiushibaike (the "Encyclopedia of Embarrassing Things"), written in Python.
How to use:
Create a new file named bug.py, copy the code into it, and double-click the file to run it.
Program function:
Browse the jokes from the embarrassing encyclopedia in the command prompt.
How it works:
First, take a look at the home page of the embarrassing encyclopedia: http://www.qiushibaike.com/hot/page/1
As you can see, the number after page/ in the link is the page number. Keep this in mind, because it is needed later.
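The page-number observation above can be sketched roughly as follows. This is only an illustration, not the project's actual code: the names BASE_URL, build_page_url and fetch_page are made up for this example, and the User-Agent header, the UTF-8 decoding and the idea that pages start at 1 are assumptions about the site.

```python
from urllib.request import Request, urlopen

# Base address taken from the home page link above; the page number is appended.
BASE_URL = "http://www.qiushibaike.com/hot/page/"


def build_page_url(page):
    """Return the URL for the given page number (assumed to start at 1)."""
    if page < 1:
        raise ValueError("page must be >= 1")
    return BASE_URL + str(page)


def fetch_page(page, timeout=10):
    """Download one page and return its HTML as text.

    Assumptions: the site answers a plain GET request, accepts this
    User-Agent, and serves UTF-8. Adjust these if the live site differs.
    """
    request = Request(build_page_url(page), headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


# Only the URLs are printed here, so no network access is needed.
print(build_page_url(1))
print(build_page_url(2))
```

Calling fetch_page(1) would then return the source of the first page, provided the site is reachable and behaves as assumed.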
Next, right-click the page and view its source:
Inspecting the source shows that each joke is wrapped in a center tag whose class is always content and whose title attribute holds the post time, so a regular expression is all we need to "pull" each joke out.
With the principle understood, what remains is writing the regular expression. This blog post is a useful reference:
http://blog.csdn.net/wxg694175346/article/details/8929576
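As a rough sketch of the regular-expression step described above, the snippet below pulls the post time and the text out of each joke. The sample HTML, its dates and the names SAMPLE_HTML, JOKE_PATTERN and extract_jokes are hypothetical; the tag name, class and title attribute follow the description above and may need adjusting to the real page source.

```python
import re

# Hypothetical markup modelled on the description above: each joke sits in a
# center tag with class="content", and the title attribute is the post time.
SAMPLE_HTML = """
<center class="content" title="2013-05-15 10:00:00">
First joke text
</center>
<center class="content" title="2013-05-15 11:30:00">
Second joke<br/>on two lines
</center>
"""

# re.S lets "." match newlines, because a joke can span several lines.
# Group 1 captures the title (post time), group 2 the joke body.
JOKE_PATTERN = re.compile(
    r'<center[^>]*?class="content"[^>]*?title="([^"]*)"[^>]*>(.*?)</center>',
    re.S,
)


def extract_jokes(html):
    """Return a list of (post_time, text) tuples found in the HTML."""
    jokes = []
    for post_time, body in JOKE_PATTERN.findall(html):
        # Turn HTML line breaks into real newlines and trim the whitespace.
        text = body.replace("<br/>", "\n").strip()
        jokes.append((post_time, text))
    return jokes


for post_time, text in extract_jokes(SAMPLE_HTML):
    print(post_time)
    print(text)
    print("-" * 20)
```

The non-greedy (.*?) group stops at the first closing tag, which keeps one joke from swallowing the next.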
Sample run: