The principle is the same as in the previous chapter; only the content being parsed has changed.
Code:
#-*-coding:utf-8-*-
import urllib2
import re

def getpagecontent(page_url, heads):
    try:
        req = urllib2.Request(page_url, headers=heads)
        resp = urllib2.urlopen(req)
        return resp.read().decode('utf8')
    except Exception, e:
        print "Request [%s] error." % (page_url), e
        return ""

def gettopnotes(cont):
    strre = '.*?<li>.*?data-user-slug="(.*?)"'
    strre += '.*?
    [...]

Output:
C:\Python27\python.exe f:/srccode/python/getnewlyjokes/jianshuspider.py
4c4231dc6796
/p/0aabe4120b78
The secret of the sewer
4820
===================================
564d899d4d3c
/p/8af1ad733670
Cicada summer I want to meet you
11771
===================================
a36e18ccb59d
/p/f9e60eb98a28
Goodbye, loved ones
846
===================================
bcfca792018f
/p/9fa6b6e58fd0
We met, thought on the sad (35)
1927
===================================
2870cb3c6f77
/p/8329df311356
Best lover
39288
===================================
dc22650a4033
/p/f7f39b72fdb2
"Serial" Not Touch Goddess (10)
3121
===================================
The fields are, in order: author ID, article link, article title, number of comments, number of likes received.
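As an illustration of how one regular expression can pull those fields out of each list entry, here is a minimal sketch in Python 3. It is not the original script. The sample markup, the class names (title, comments, likes), the pattern NOTE_RE and the sample values are all assumptions made up for this example; only the data-user-slug attribute and the row of 35 "=" characters between records come from the listing above. The real page markup may differ, in which case the pattern has to be adapted.

```python
import re

# Hypothetical markup: a simplified stand-in for two entries of the article list.
# Apart from data-user-slug, the tag and class names are assumptions.
SAMPLE_HTML = """
<ul>
  <li>
    <a class="avatar" data-user-slug="user000001" href="/users/user000001"></a>
    <h4 class="title"><a href="/p/note000001">First sample title</a></h4>
    <span class="comments">3</span>
    <span class="likes">14</span>
  </li>
  <li>
    <a class="avatar" data-user-slug="user000002" href="/users/user000002"></a>
    <h4 class="title"><a href="/p/note000002">Second sample title</a></h4>
    <span class="comments">0</span>
    <span class="likes">7</span>
  </li>
</ul>
"""

# One capture group per field, in the same order as the printed output.
# re.S lets "." match newlines, so a single pattern can span a whole <li> entry.
NOTE_RE = re.compile(
    r'<li>.*?data-user-slug="(.*?)"'                  # author ID
    r'.*?<h4 class="title"><a href="(.*?)">(.*?)</a>'  # article link, title
    r'.*?class="comments">(\d+)<'                      # number of comments
    r'.*?class="likes">(\d+)<',                        # number of likes
    re.S,
)


def get_top_notes(cont):
    """Return a list of (author_id, link, title, comments, likes) string tuples."""
    return NOTE_RE.findall(cont)


def print_notes(notes):
    """Print each field on its own line, followed by a separator row."""
    for note in notes:
        for field in note:
            print(field)
        print("=" * 35)


if __name__ == "__main__":
    print_notes(get_top_notes(SAMPLE_HTML))
```

The non-greedy ".*?" between the groups keeps each match inside a single entry, so findall returns one tuple per article. For anything beyond a quick script, an HTML parser is usually more robust than a regular expression against markup changes.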
Crawling popular articles on Jianshu with Python