Crawl some pages from a xx website and measure how much the speed actually improves. I won't post the site link...
First, the plain single-threaded version (the original passed `headers` positionally and parsed the page twice; both fixed here):

```python
import requests
from bs4 import BeautifulSoup
import time

url = 'http://www.xxxx.com/html/part/index27_'
url_list = []
start = time.time()

for i in range(2, 47):
    print('Get page ' + str(i))
    headers = {
        'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 '
                      '(KHTML, like Gecko) Chrome/60.0.3112.78 Safari/537.36'
    }
    res = requests.get(url + str(i) + '.html', headers=headers)
    res.encoding = 'gb2312'
    soup = BeautifulSoup(res.text, 'lxml')
    div = soup.find('div', class_="box list channel")
    for li in div.find_all('li'):
        urls = 'http://www.xxxx.com' + li.a.get('href')
        url_list.append(urls)
        print(urls)

print(url_list)
print(time.time() - start)
```
The full crawl takes 111.7 s.
Now let's try coroutines:
```python
from gevent import monkey
monkey.patch_all()  # patch blocking I/O before the other imports

import time

import requests
from bs4 import BeautifulSoup
import gevent

url = 'http://www.231ka.com/html/part/index27_'
url_list = []
for i in range(2, 47):
    url_list.append(url + str(i) + '.html')

def get(page_url):
    print('get data from: ' + page_url)
    headers = {
        'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 '
                      '(KHTML, like Gecko) Chrome/60.0.3112.78 Safari/537.36'
    }
    res = requests.get(page_url, headers=headers)
    res.encoding = 'gb2312'
    soup = BeautifulSoup(res.text, 'lxml')
    div = soup.find('div', class_="box list channel")
    for li in div.find_all('li'):
        ur = 'http://www.231ka.com' + li.a.get('href')
        print(ur)

start = time.time()
task = []
for u in url_list:
    task.append(gevent.spawn(get, u))
gevent.joinall(task)
print(time.time() - start)
```
The result: 55.6 s.
That is, on the same single thread, coroutines cut the time roughly in half, and all it took was one third-party Python library.
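If you just want to see why overlapping I/O waits shrinks the total time, you don't even need gevent: the standard-library asyncio demonstrates the same idea. This is only an illustrative sketch, not the crawler above; the 0.2 s sleep stands in for one network request:

```python
import asyncio
import time

async def fetch(i):
    # Stand-in for one HTTP request: a 0.2 s non-blocking wait.
    await asyncio.sleep(0.2)
    return i

async def crawl():
    # Start all five "requests" at once; the waits overlap, so the
    # total wall time is ~0.2 s instead of 5 * 0.2 = 1.0 s.
    return await asyncio.gather(*(fetch(i) for i in range(5)))

start = time.time()
results = asyncio.run(crawl())
elapsed = time.time() - start
print(results, round(elapsed, 2))
```

The sequential crawler pays the full network latency for every page in turn; with coroutines each page's wait is spent servicing the others, which is exactly where the 111.7 s → 55.6 s drop comes from.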
It's awesome. That's how powerful Python coroutines can be.