Efficiency comparison: single-threaded crawler vs. multithreaded crawler

Source: Internet
Author: User

Single-threaded crawler:

```python
import re
import time

import requests

# Search-result pages of the three Amazon sellers being compared.
url_eb = ('http://www.amazon.com/gp/search/other/ref=sr_sa_p_4?me=A22XNR713HGDVG'
          '&rh=n%3a9063592011%2ck%3aprojector&bbn=9063592011&keywords=projector'
          '&pickertolist=brandtextbin&ie=utf8&qid=1461902521')
url_aml = ('https://www.amazon.com/gp/search/other/ref=sr_sa_p_4?me=A3UJI9WWE6PRP5'
           '&rh=i%3amerchant-items&pickertolist=brandtextbin&ie=utf8&qid=1461899728')
url_dl = ('https://www.amazon.com/gp/search/other/ref=sr_sa_p_4?me=AS7ZU4MN0FPOY'
          '&rh=i%3amerchant-items&pickertolist=brandtextbin&ie=utf8&qid=1461901862')

# The same browser User-Agent is sent for all three stores.
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_4) '
                         'AppleWebKit/537.36 (KHTML, like Gecko) '
                         'Chrome/50.0.2661.86 Safari/537.36'}

name = {'A': 'Exclusivebulbs', 'B': 'Amazing Lamps', 'C': 'Dynamic Lamps'}

# listing_count = re.findall('<span class="narrowValue">(.*?)</span', data.text)
# f = dict(map(lambda x, y: [x, y], store_name, listing_count))
# for k, v in f.items():
#     print(k, v)


def foo_one(url, headers, name):
    print('--------------------------Start crawling {0} at {1}---------------------------'
          .format(name, time.ctime()))
    response = requests.get(url, headers=headers)
    # Each match is a (brand, listing count) tuple from the brand refinement list.
    store_name = re.findall('<span class="refinementLink">(.*?)</span>'
                            '<span class="narrowValue">(.*?)</span', response.text)
    for i in store_name:
        print(i)
    print('--------------------------Crawl finished at {0}---------------------------'
          .format(time.ctime()))
    time.sleep(1)


if __name__ == '__main__':
    # Crawl the three stores one after another.
    foo_one(url_eb, headers, name['A'])
    foo_one(url_aml, headers, name['B'])
    foo_one(url_dl, headers, name['C'])
```
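The `re.findall` call in `foo_one` has two capture groups, so it returns a list of `(brand, count)` tuples, which is what each `print(i)` line in the output shows. A minimal offline sketch of that step; the HTML fragment is a hypothetical stand-in shaped like the spans the pattern targets, not markup captured from Amazon:

```python
import re

# Hypothetical fragment shaped like the brand refinement list the crawler targets.
SAMPLE_HTML = (
    '<span class="refinementLink">Aurabeam</span><span class="narrowValue">(33,084)</span>'
    '<span class="refinementLink">Epson</span><span class="narrowValue">(1,456)</span>'
)

PATTERN = ('<span class="refinementLink">(.*?)</span>'
           '<span class="narrowValue">(.*?)</span')


def extract_brands(html):
    """Return one (brand, count) tuple per refinementLink/narrowValue span pair."""
    return re.findall(PATTERN, html)


for pair in extract_brands(SAMPLE_HTML):
    print(pair)  # ('Aurabeam', '(33,084)') then ('Epson', '(1,456)')
```

The lazy `(.*?)` groups stop at the first closing `</span`, so each pair of spans yields exactly one tuple.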

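Output: started at 00:25:33 and finished at 00:26:02, taking 29 seconds. Per the timestamps in the log, the first store (Exclusivebulbs) accounts for about 24 seconds of that (00:25:33 to 00:25:57), and the script sleeps for 1 second after each store.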

```
--------------------------Start crawling Exclusivebulbs at Sat Apr 30 00:25:33 2016---------------------------
('A.shine', '(97)')
('Ampacelectronics', '(1,644)')
('Aurabeam', '(33,084)')
('AWO', '(1,206)')
('Battery1inc', '(694)')
('Comoze Lamps', '(6,172)')
('Compatible Lamp', '(317)')
('Corgi Lamps', '(2,124)')
('Ctlamp', '(3,499)')
('Dell', '(191)')
('Diamond Lamps', '(966)')
('Dynamic', '(4)')
('Eiki', '(460)')
('Epharos', '(2,592)')
('Epson', '(1,456)')
('eReplacement', '(?)')
('eReplacements', '(814)')
('Ewos', '(?)')
('Eworldlamp', '(354)')
('FI Lamps', '(5,707)')
('FL Projector Lamp for Mitsubishi', '(1)')
('for Epson', '(3)')
('Generic', '(9,769)')
('Good Lamp', '(819)')
('Hcdz', '(2,746)')
('Hitachi', '(935)')
('IET Lamps', '(2,144)')
('InFocus', '(?)')
('JVC', '(326)')
('KCL', '(3,781)')
('Lampedia', '(618)')
('Lutema', '(1,956)')
('Mitsubishi', '(1,006)')
('Mogobe', '(1,335)')
('Myprojectorlamps', '(473)')
('NEC', '(446)')
('NEC Computers', '(13)')
('Optoma', '(956)')
('Osram Sylvania', '(?)')
('Panasonic', '(820)')
('Philips', '(7,502)')
('Powerwarehouse', '(9,971)')
('Projector Lamps World', '(112)')
('Pureglare', '(369)')
('Samsung', '(1,078)')
('Sharp', '(426)')
('Shopforbattery', '(2,510)')
('SMART BOARD', '(66)')
('Sony', '(990)')
('tvlampsforless', '(?)')
('Unknown', '(722)')
--------------------------Crawl finished at Sat Apr 30 00:25:57 2016---------------------------
--------------------------Start crawling Amazing Lamps at Sat Apr 30 00:25:58 2016---------------------------
('AWO', '(1)')
('Comoze Lamps', '(2)')
('Dngo', '(8)')
('Electrified', '(9)')
('Electrified', '(10)')
('Electrified Discounters', '(5)')
('Electrified Lamps', '(1,177)')
('Electrified Printhead', '(?)')
('Electrified Printheads', '(2)')
('FI Lamps', '(2)')
('Generic', '(?)')
('Glowatt', '(1)')
('KCL', '(1)')
('OEM', '(1)')
('Powerwarehouse', '(7)')
('SKU', '(5)')
('Top Lamp', '(1)')
('Unknown', '(1)')
('Usom', '(3)')
--------------------------Crawl finished at Sat Apr 30 00:26:00 2016---------------------------
--------------------------Start crawling Dynamic Lamps at Sat Apr 30 00:26:01 2016---------------------------
('Battery1inc', '(85)')
('BenQ', '(237)')
('Buslink', '(?)')
('Calumet', '(2)')
('Comoze Lamps', '(405)')
('Ctlamp', '(615)')
('Dell', '(82)')
('Divine Lighting', '(?)')
('Dngo', '(?)')
('Dynamic', '(4)')
('Eiko', '(?)')
('Electrified', '(2)')
('Electrified Lamps', '(?)')
('Electronix Xpress', '(418)')
('Epharos', '(502)')
('Epson', '(631)')
('eReplacements', '(119)')
('FI Lamps', '(505)')
('FL Projector Lamp for Mitsubishi', '(1)')
('G-lamps', '(?)')
('GE', '(248)')
('GE Lighting', '(?)')
('General Electric', '(?)')
('Generic', '(1,671)')
('Genie', '(101)')
('Glamps', '(2)')
('Impact', '(7)')
('Industrial Lighting Solutions', '(9)')
('KCL', '(280)')
('Kodak', '(1)')
('Lampedia', '(?)')
('M-wave', '(830)')
('Mitsubishi', '(406)')
('Mitsubishi DLP TV bulbs', '(?)')
('Mocpinc', '(?)')
('Myprojectorlamps', '(344)')
('Nec', '(?)')
('Optoma', '(161)')
('Osram', '(1,295)')
('Panasonic', '(245)')
('Philips', '(988)')
('Powerwarehouse', '(239)')
('Projector Lamps World', '(?)')
('Pureglare', '(107)')
('Samsung', '(323)')
('Shopjimmy', '(3)')
('Sony', '(141)')
('Sylvania', '(?)')
('Technical Precision', '(?)')
('Unknown', '(167)')
('Welch Allyn Compatible', '(1)')
--------------------------Crawl finished at Sat Apr 30 00:26:02 2016---------------------------
```
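The commented-out lines in the script sketch pairing each store name with its listing count in a dict. A minimal version of that idea, assuming the count strings look like `'(1,644)'` as in the log; `to_counts` is a hypothetical helper, not part of the original script:

```python
import re


def to_counts(pairs):
    """Map each brand to its listing count as an int, e.g. '(1,644)' -> 1644.

    Hypothetical helper: pairs whose count contains no digits are skipped,
    and a repeated brand name keeps the last count seen.
    """
    counts = {}
    for brand, raw in pairs:
        digits = re.sub(r'\D', '', raw)  # drop '(', ')', ',' and any whitespace
        if digits:
            counts[brand.strip()] = int(digits)
    return counts


# A few pairs copied from the Exclusivebulbs output above.
pairs = [('A.shine', '(97)'), ('Ampacelectronics', '(1,644)'), ('Aurabeam', '(33,084)')]
counts = to_counts(pairs)
print(counts)
print(sum(counts.values()))  # → 34825
```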

Multithreaded crawler: started at 00:32:37 and finished at 00:32:39, taking 2 seconds.

```python
import re
import threading
import time
from time import ctime

import requests

# Search-result pages of the three Amazon sellers being compared.
url_eb = ('http://www.amazon.com/gp/search/other/ref=sr_sa_p_4?me=A22XNR713HGDVG'
          '&rh=n%3a9063592011%2ck%3aprojector&bbn=9063592011&keywords=projector'
          '&pickertolist=brandtextbin&ie=utf8&qid=1461902521')
url_aml = ('https://www.amazon.com/gp/search/other/ref=sr_sa_p_4?me=A3UJI9WWE6PRP5'
           '&rh=i%3amerchant-items&pickertolist=brandtextbin&ie=utf8&qid=1461899728')
url_dl = ('https://www.amazon.com/gp/search/other/ref=sr_sa_p_4?me=AS7ZU4MN0FPOY'
          '&rh=i%3amerchant-items&pickertolist=brandtextbin&ie=utf8&qid=1461901862')

# The same browser User-Agent is sent for all three stores.
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_4) '
                         'AppleWebKit/537.36 (KHTML, like Gecko) '
                         'Chrome/50.0.2661.86 Safari/537.36'}

name = {'A': 'Exclusivebulbs', 'B': 'Amazing Lamps', 'C': 'Dynamic Lamps'}

# listing_count = re.findall('<span class="narrowValue">(.*?)</span', data.text)
# f = dict(map(lambda x, y: [x, y], store_name, listing_count))
# for k, v in f.items():
#     print(k, v)


def foo_one(url, headers, name):
    print('--------------------------Start crawling {0} at {1}---------------------------'
          .format(name, time.ctime()))
    response = requests.get(url, headers=headers)
    # Each match is a (brand, listing count) tuple from the brand refinement list.
    store_name = re.findall('<span class="refinementLink">(.*?)</span>'
                            '<span class="narrowValue">(.*?)</span', response.text)
    for i in store_name:
        print(i)
    print('--------------------------Finished crawling {0} at {1}---------------------------'
          .format(name, time.ctime()))


# One thread per store; each thread runs foo_one independently.
threads = []
t1 = threading.Thread(target=foo_one, args=(url_eb, headers, name['A']))
threads.append(t1)
t2 = threading.Thread(target=foo_one, args=(url_aml, headers, name['B']))
threads.append(t2)
t3 = threading.Thread(target=foo_one, args=(url_dl, headers, name['C']))
threads.append(t3)

if __name__ == '__main__':
    for t in threads:
        t.daemon = True  # same effect as the deprecated t.setDaemon(True)
        t.start()
    # Wait for every thread, not only the last one started; otherwise the main
    # thread can exit and kill daemon threads whose requests are still running.
    for t in threads:
        t.join()
    print("All over %s" % ctime())
```
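The daemon flag and `join()` interact: the main thread waits only for the threads it joins, and daemon threads are killed when the main thread exits. A minimal sketch with simulated work (`fake_crawl`, `run`, and the delays are made up; `time.sleep` stands in for the HTTP request) showing that joining only the last thread started does not wait for the others:

```python
import threading
import time


def fake_crawl(delay, done):
    time.sleep(delay)  # stands in for the network request
    done.append(delay)


def run(join_all):
    done = []
    threads = [threading.Thread(target=fake_crawl, args=(d, done), daemon=True)
               for d in (0.3, 0.5, 0.1)]
    for t in threads:
        t.start()
    if join_all:
        for t in threads:
            t.join()
    else:
        t.join()  # `t` is the loop's last thread (the 0.1 s one); only it is waited for
    return sorted(done)  # snapshot of which "crawls" finished so far


print(run(join_all=False))  # usually [0.1]: the slower threads are still running
print(run(join_all=True))   # [0.1, 0.3, 0.5]
```

If the program ended right after the partial join, the two slower daemon threads would be killed mid-sleep, just as an unfinished crawl would be.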

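Output (no listings for Amazing Lamps appear in this run: the threads were daemons and apparently only the last one was joined, so the program most likely exited before the Amazing Lamps request finished; the 2-second figure therefore covers two of the three stores):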

```
--------------------------Start crawling Exclusivebulbs at Sat Apr 30 00:32:37 2016---------------------------
--------------------------Start crawling Amazing Lamps at Sat Apr 30 00:32:37 2016---------------------------
--------------------------Start crawling Dynamic Lamps at Sat Apr 30 00:32:37 2016---------------------------
('A.shine', '(?)')
('Ampacelectronics', '(1,645)')
('Aurabeam', '(33,088)')
('AWO', '(1,209)')
('Battery1inc', '(694)')
('Comoze Lamps', '(6,172)')
('Compatible Lamp', '(317)')
('Corgi Lamps', '(2,123)')
('Ctlamp', '(3,501)')
('Dell', '(191)')
('Diamond Lamps', '(966)')
('Dynamic', '(4)')
('Eiki', '(457)')
('Epharos', '(2,592)')
('Epson', '(1,456)')
('eReplacement', '(?)')
('eReplacements', '(813)')
('Ewos', '(?)')
('Eworldlamp', '(354)')
('FI Lamps', '(5,710)')
('FL Projector Lamp for Mitsubishi', '(1)')
('for Epson', '(3)')
('Generic', '(9,771)')
('Good Lamp', '(819)')
('Hcdz', '(2,748)')
('Hitachi', '(935)')
('IET Lamps', '(2,137)')
('InFocus', '(?)')
('JVC', '(326)')
('KCL', '(3,783)')
('Lampedia', '(618)')
('Lutema', '(1,955)')
('Mitsubishi', '(1,006)')
('Mogobe', '(1,336)')
('Myprojectorlamps', '(473)')
('NEC', '(?)')
('NEC Computers', '(?)')
('Optoma', '(956)')
('Osram Sylvania', '(?)')
('Panasonic', '(820)')
('Philips', '(7,502)')
('Powerwarehouse', '(9,972)')
('Projector Lamps World', '(?)')
('Pureglare', '(369)')
('Samsung', '(1,078)')
('Sharp', '(426)')
('Shopforbattery', '(2,511)')
('SMART BOARD', '(?)')
('Sony', '(990)')
('tvlampsforless', '(14)')
('Unknown', '(722)')
--------------------------Finished crawling Exclusivebulbs at Sat Apr 30 00:32:38 2016---------------------------
('Battery1inc', '(?)')
('BenQ', '(237)')
('Buslink', '(?)')
('Calumet', '(2)')
('Comoze Lamps', '(405)')
('Ctlamp', '(615)')
('Dell', '(?)')
('Divine Lighting', '(?)')
('Dngo', '(?)')
('Dynamic', '(4)')
('Eiko', '(?)')
('Electrified', '(2)')
('Electrified Lamps', '(?)')
('Electronix Xpress', '(418)')
('Epharos', '(502)')
('Epson', '(631)')
('eReplacements', '(119)')
('FI Lamps', '(505)')
('FL Projector Lamp for Mitsubishi', '(1)')
('G-lamps', '(43)')
('GE', '(248)')
('GE Lighting', '(152)')
('General Electric', '(53)')
('Generic', '(1,671)')
('Genie', '(101)')
('Glamps', '(2)')
('Impact', '(7)')
('Industrial Lighting Solutions', '(9)')
('KCL', '(280)')
('Kodak', '(1)')
('Lampedia', '(63)')
('M-wave', '(830)')
('Mitsubishi', '(406)')
('Mitsubishi DLP TV bulbs', '(29)')
('Mocpinc', '(10)')
('Myprojectorlamps', '(344)')
('Nec', '(19)')
('Optoma', '(161)')
('Osram', '(1,295)')
('Panasonic', '(245)')
('Philips', '(988)')
('Powerwarehouse', '(239)')
('Projector Lamps World', '(45)')
('Pureglare', '(107)')
('Samsung', '(323)')
('Shopjimmy', '(3)')
('Sony', '(141)')
('Sylvania', '(115)')
('Technical Precision', '(10)')
('Unknown', '(167)')
('Welch Allyn Compatible', '(1)')
--------------------------Finished crawling Dynamic Lamps at Sat Apr 30 00:32:39 2016---------------------------
All over Sat Apr 30 00:32:39 2016
```
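Threads help here because a crawler spends almost all its time waiting on the network, and CPython releases the GIL while a thread blocks on I/O, so the waits overlap. A minimal sketch of the effect, assuming a fixed 0.2 s delay per store (an arbitrary stand-in for a request, simulated with `time.sleep`); `fake_fetch`, `sequential`, and `threaded` are hypothetical names, and `concurrent.futures.ThreadPoolExecutor` is swapped in for the manual `threading.Thread` bookkeeping:

```python
import time
from concurrent.futures import ThreadPoolExecutor

STORES = ['Exclusivebulbs', 'Amazing Lamps', 'Dynamic Lamps']


def fake_fetch(store, delay=0.2):
    """Pretend to download one store page; sleeping blocks like network I/O."""
    time.sleep(delay)
    return store


def sequential(stores):
    return [fake_fetch(s) for s in stores]


def threaded(stores):
    # map() yields results in input order; leaving the with-block waits for every task.
    with ThreadPoolExecutor(max_workers=len(stores)) as pool:
        return list(pool.map(fake_fetch, stores))


for func in (sequential, threaded):
    start = time.perf_counter()
    result = func(STORES)
    elapsed = time.perf_counter() - start
    print(f'{func.__name__}: {elapsed:.2f} s, {result}')
```

The sequential run takes roughly the sum of the delays, the threaded run roughly the longest single delay. Unlike a bare `join()` on the last thread, the executor's `with` block waits for all submitted tasks.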

  


