Goal: use multiple processes to download every edition (old, revised, and newly revised) of each novel on Jin Yong Net (jinyongwang.com).
The code is as follows:
# -*- coding: utf-8 -*-
# Python 2 script: one worker process per book, one text file per book.
import os
import sys

import requests
from lxml import etree
from multiprocessing import Pool

reload(sys)
sys.setdefaultencoding('utf-8')

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:47.0) Gecko/20100101 Firefox/47.0'}
base_url = 'http://www.jinyongwang.com'


def download(title, url, filename):
    # Fetch one chapter and append its title and paragraphs to the book file.
    response = requests.get(url, headers=headers).text
    html = etree.HTML(response)
    pages = html.xpath('//div//p/text()')[2:]
    with open(filename, 'a') as f:
        f.write(title + '\n')
        for page in pages:
            f.write(page + '\n')


def main(url):
    # url is a book's relative link taken from the home page.
    start_url = base_url + url
    sname = start_url.split('/')[-2]
    # The first letter of the short name selects the edition folder.
    if sname.startswith('o'):
        folder = 'old/'
    elif sname.startswith('n'):
        folder = 'new/'
    else:
        folder = 'now/'
    if not os.path.exists(folder):
        try:
            os.makedirs(folder)
        except OSError:
            # Another worker process may have created it first.
            if not os.path.isdir(folder):
                raise
    filename = folder + sname + '.txt'
    response = requests.get(start_url, headers=headers).text
    html = etree.HTML(response)
    urls = html.xpath('//ul[@class="mlist"]/li/a/@href')
    titles = html.xpath('//ul[@class="mlist"]/li//text()')
    for index, url in enumerate(urls):
        full_url = base_url + url
        title = titles[index]
        download(title, full_url, filename)


if __name__ == '__main__':
    url01 = 'http://www.jinyongwang.com/'
    response = requests.get(url01, headers=headers).text
    html = etree.HTML(response)
    # Collect the link to every edition of every book on the home page.
    urls = html.xpath('//li[@class="book_li"]/p[3]//a/@href')
    pool = Pool()
    pool.map(main, urls)
    pool.close()
    pool.join()
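The two ideas the script relies on, picking a folder from the first letter of a book's short name and handing one book per task to a process pool, can be separated from the network code. The following is only a minimal sketch written for Python 3, not the original script: the helper names `folder_for`, `output_path`, and `run_in_pool` are hypothetical, and the path `/ofei/` in the docstring is a made-up example rather than a link confirmed to exist on the site.

```python
import os
from multiprocessing import Pool

BASE_URL = 'http://www.jinyongwang.com'


def folder_for(sname):
    """Map a book's short name to an edition folder (hypothetical helper)."""
    if sname.startswith('o'):
        return 'old'
    if sname.startswith('n'):
        return 'new'
    return 'now'


def output_path(path):
    """Turn a relative link such as '/ofei/' (made-up example) into a file path."""
    # The short name is the second-to-last piece of the full URL.
    sname = (BASE_URL + path).split('/')[-2]
    return os.path.join(folder_for(sname), sname + '.txt')


def run_in_pool(worker, paths, processes=None):
    """Apply worker to every path across worker processes, one book per task."""
    pool = Pool(processes)
    try:
        # map() blocks until every task has finished.
        return pool.map(worker, paths)
    finally:
        pool.close()
        pool.join()
```

Because each task writes to its own file, the workers never append to the same file at once. The worker passed to `run_in_pool` has to be a function defined at module level, and the call belongs under an `if __name__ == '__main__':` guard, as in the script above.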
Result:
[Figure: Python crawler example (6): multi-process download of Jin Yong Net novels]