While teaching myself to write web crawlers in Python, I found requests somewhat easier to use than urllib, so I used requests and BeautifulSoup to crawl the jokes on the Qiushibaike home page.
BeautifulSoup can pick out the relevant parts of the HTML with find and find_all, optionally combined with regular expressions. select, which takes a CSS selector, is also a good choice.
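To make the difference between these lookups concrete, here is a small sketch that runs them against an inline HTML string. The markup, class names, and ids are made up for illustration and are not taken from the real site.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical sample markup, invented for this example only.
html = """
<div id="content-left">
  <div class="article" id="qiushi_tag_101">
    <div class="content">First joke</div>
  </div>
  <div class="article" id="qiushi_tag_102">
    <div class="content">Second joke</div>
  </div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# find returns only the first matching tag (or None if nothing matches).
first = soup.find("div", class_="content")
first_text = first.get_text(strip=True)

# find_all returns a list of every matching tag.
find_all_texts = [d.get_text(strip=True)
                  for d in soup.find_all("div", class_="content")]

# A compiled regular expression can be used as an attribute filter.
tagged = soup.find_all("div", id=re.compile(r"^qiushi_tag_\d+$"))
tagged_ids = [d["id"] for d in tagged]

# select takes a CSS selector and also returns a list.
select_texts = [d.get_text(strip=True)
                for d in soup.select("div.article > div.content")]

print(first_text)
print(find_all_texts)
print(tagged_ids)
print(select_texts)
```

In this sample, find_all and select return the same two elements. select tends to read better when the match depends on nesting, while find_all with a regular expression is handy when an attribute value follows a pattern.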
The following is provisional code that I will continue to improve.
# coding=utf-8
import requests
from bs4 import BeautifulSoup

page = 1
url = 'http://www.qiushibaike.com/hot/page/' + str(page)
res = None
try:
    res = requests.get(url)
    # print(res.text)  # If the request succeeds, the downloaded page is stored as a string in res.text, which is why res.text is used below.
except Exception as e:
    print('An exception occurred opening the web page:', e)

if res is not None:  # only parse when the download actually succeeded
    try:
        soup = BeautifulSoup(res.text, 'html.parser')
        elms = soup.select('.content')  # select returns a list here
        for elm in elms:
            print(elm.text)
    except Exception as e:
        print('An exception occurred while parsing:', e)
Python crawls the first page of jokes
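One direction the provisional code could take is to split downloading from parsing and loop over several pages. The sketch below is only a suggestion under assumptions: the helper names page_url, parse_jokes, and crawl are hypothetical, the page range and timeout are arbitrary, and it assumes the '.content' selector from the code above still matches the site's markup.

```python
import requests
from bs4 import BeautifulSoup

# Base URL taken from the code above; the page number is appended to it.
BASE_URL = "http://www.qiushibaike.com/hot/page/"


def page_url(page):
    """Build the URL for one page of the hot list (hypothetical helper)."""
    return BASE_URL + str(page)


def parse_jokes(html):
    """Return the stripped text of every element with class 'content'."""
    soup = BeautifulSoup(html, "html.parser")
    return [elm.get_text(strip=True) for elm in soup.select(".content")]


def crawl(first_page=1, last_page=3, timeout=10):
    """Download a range of pages and collect their jokes (a sketch)."""
    jokes = []
    for page in range(first_page, last_page + 1):
        try:
            res = requests.get(page_url(page), timeout=timeout)
            res.raise_for_status()  # treat HTTP error codes as failures
        except requests.RequestException as e:
            print("An exception occurred opening page", page, ":", e)
            continue  # skip this page and try the next one
        jokes.extend(parse_jokes(res.text))
    return jokes
```

Keeping parse_jokes free of network calls means it can be checked against a saved or hand-written HTML string without touching the site, and calling crawl() with the default arguments would request pages 1 to 3.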