Scraping the first page of jokes from Qiushibaike with a Python crawler
While teaching myself Python web crawling, I found that requests is more convenient than urllib, so I use requests together with BeautifulSoup to scrape the jokes on the Qiushibaike homepage.
BeautifulSoup can extract HTML elements with find, find_all, or regular expressions; select (CSS selectors) is also a good choice.
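To make the difference between these methods concrete, here is a small self-contained comparison on an inline HTML snippet (the markup is invented for illustration, not taken from the real site):

```python
from bs4 import BeautifulSoup

# Hypothetical snippet mimicking the structure the crawler targets.
html = """
<div class="content">first joke</div>
<div class="content">second joke</div>
<div class="author">someone</div>
"""

soup = BeautifulSoup(html, 'html.parser')

first = soup.find('div', class_='content')         # first match only
all_divs = soup.find_all('div', class_='content')  # list of all matches
selected = soup.select('.content')                 # CSS selector, also a list

print(first.text)                     # the first joke's text
print(len(all_divs))                  # number of matched divs
print([e.text for e in selected])     # text of every selected element
```

find returns a single element, while find_all and select both return lists, which is why the main script below iterates over the result of select.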
The following is a first draft of the code, to be improved later.
# coding=utf-8
import requests
from bs4 import BeautifulSoup

page = 1
url = 'http://www.qiushibaike.com/hot/page/' + str(page)

try:
    res = requests.get(url)
    # On success, the downloaded page is stored as a string in res.text,
    # which is why res.text is used below.
except Exception as e:
    print('An exception occurred while opening the webpage:', e)

try:
    soup = BeautifulSoup(res.text, 'html.parser')
    elms = soup.select('.content')  # select() returns a list
    for elm in elms:
        print(elm.text)
except Exception as e:
    print('Parsing exception:', e)
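The script above hard-codes page = 1, but the title mentions crawling many pages. A minimal sketch of how the URL could be built per page (the URL pattern is taken from the code above; no actual requests are made here, and the request/parse logic would go inside the loop):

```python
# Base listing URL, as used in the script above.
BASE = 'http://www.qiushibaike.com/hot/page/'

def page_url(page):
    """Return the listing URL for a given page number."""
    return BASE + str(page)

# Iterate over the first few pages; in the real crawler,
# requests.get(page_url(page)) and the BeautifulSoup parsing
# would replace this print.
for page in range(1, 4):
    print(page_url(page))
```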