Using a thread pool for the crawler

import requests  # pip3 install requests
import re
import hashlib
import time
from concurrent.futures import ThreadPoolExecutor

pool = ThreadPoolExecutor(50)   # shared thread pool, up to 50 workers
movie_path = r'C:\mp4'          # directory the downloaded .mp4 files are saved to

def get_page(url):
    """Download a page and return its text (None on failure)."""
    try:
        response = requests.get(url)
        if response.status_code == 200:
            return response.text
    except Exception:
        pass

def parse_index(index_page):
    """Callback for an index page: extract detail-page links and schedule them."""
    index_page = index_page.result()
    if index_page is None:      # download failed; nothing to parse
        return
    urls = re.findall('class="items".*?href="(.*?)"', index_page, re.S)
    for detail_url in urls:
        if not detail_url.startswith('http'):
            detail_url = 'http://www.xiaohuar.com' + detail_url
        pool.submit(get_page, detail_url).add_done_callback(parse_detail)

def parse_detail(detail_page):
    """Callback for a detail page: extract the video URL and schedule the download."""
    detail_page = detail_page.result()
    if detail_page is None:     # download failed; nothing to parse
        return
    l = re.findall('id="media".*?src="(.*?)"', detail_page, re.S)
    if l:
        movie_url = l[0]
        if movie_url.endswith('mp4'):
            pool.submit(get_movie, movie_url)

def get_movie(url):
    """Download the video and save it under an MD5-based file name."""
    try:
        response = requests.get(url)
        if response.status_code == 200:
            m = hashlib.md5()
            m.update(str(time.time()).encode('utf-8'))
            m.update(url.encode('utf-8'))
            filepath = '%s\\%s.mp4' % (movie_path, m.hexdigest())
            with open(filepath, 'wb') as f:
                f.write(response.content)
            print('%s downloaded successfully' % url)
    except Exception:
        pass

def main():
    base_url = 'http://www.xiaohuar.com/list-3-{page_num}.html'
    for i in range(5):
        url = base_url.format(page_num=i)
        pool.submit(get_page, url).add_done_callback(parse_index)

if __name__ == '__main__':
    main()
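The core idea above is chaining pool.submit(...) with add_done_callback(...): each finished download immediately schedules the next stage (index -> detail -> movie) on the same pool instead of waiting for results in the main thread. The following is a minimal sketch of that pattern on its own, using a dummy fetch function and example.com URLs that are placeholders for illustration, not part of the crawler above; it only assumes the standard-library concurrent.futures module.

from concurrent.futures import ThreadPoolExecutor

pool = ThreadPoolExecutor(4)

def fetch(url):
    # Stand-in for a real HTTP request; just returns fake page text.
    return 'page content of %s' % url

def on_done(future):
    # Callbacks receive a Future, not the return value; call .result()
    # to get the value (or re-raise whatever exception the worker hit).
    page = future.result()
    print(page)

if __name__ == '__main__':
    for i in range(3):
        pool.submit(fetch, 'http://example.com/%d' % i).add_done_callback(on_done)
    # Block until all queued work has drained before the script exits.
    pool.shutdown(wait=True)

Because add_done_callback hands the callback a Future rather than the result itself, parse_index and parse_detail in the crawler both call .result() first; the explicit pool.shutdown(wait=True) here is optional in the crawler, since the interpreter joins the pool's worker threads at exit anyway.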