Single-threaded crawler:
import re
import time

import requests

# Brand-filter pages of the three stores to crawl.
url_eb = 'http://www.amazon.com/gp/search/other/ref=sr_sa_p_4?me=A22XNR713HGDVG&rh=n%3a9063592011%2ck%3aprojector&bbn=9063592011&keywords=projector&pickertolist=brandtextbin&ie=utf8&qid=1461902521'
url_aml = 'https://www.amazon.com/gp/search/other/ref=sr_sa_p_4?me=a3uji9wwe6prp5&rh=i%3amerchant-items&pickertolist=brandtextbin&ie=utf8&qid=1461899728'
url_dl = 'https://www.amazon.com/gp/search/other/ref=sr_sa_p_4?me=as7zu4mn0fpoy&rh=i%3amerchant-items&pickertolist=brandtextbin&ie=utf8&qid=1461901862'

# Every request sends the same browser User-Agent.
headers_eb = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.86 Safari/537.36'}
headers_aml = dict(headers_eb)
headers_dl = dict(headers_eb)

name = {'A': 'Exclusivebulbs', 'B': 'Amazing Lamps', 'C': 'Dynamic Lamps'}

# listing_count = re.findall('<span class="narrowValue">(.*?)</span', data.text)
# f = dict(map(lambda x, y: [x, y], store_name, listing_count))
#
# for k, v in f.items():
#     print(k, v)


def foo_one(url, headers, name):
    """Fetch one store page and print every (brand, listing count) pair."""
    print('--------------------------start crawling {0} at {1}--------------------------'.format(name, time.ctime()))
    response = requests.get(url, headers=headers)
    # Two capture groups, so findall returns a list of (brand, count) tuples.
    store_name = re.findall('<span class="refinementLink">(.*?)</span><span class="narrowValue">(.*?)</span', response.text)
    for i in store_name:
        print(i)
    print('--------------------------crawl over at {}--------------------------'.format(time.ctime()))
    time.sleep(1)


if __name__ == '__main__':
    # The stores are crawled one after another.
    foo_one(url_eb, headers_eb, name['A'])
    foo_one(url_aml, headers_aml, name['B'])
    foo_one(url_dl, headers_dl, name['C'])
Output: started at 00:25:33, finished at 00:26:02, 29 seconds in total.
--------------------------start crawling Exclusivebulbs at Sat Apr 30 00:25:33 2016--------------------------
('A.shine', '(97)')
('Ampacelectronics', '(1,644)')
('Aurabeam', '(33,084)')
('AWO', '(1,206)')
('Battery1inc', '(694)')
('comoze lamps', '(6,172)')
('Compatible Lamp', '(317)')
('corgi Lamps', '(2,124)')
('Ctlamp', '(3,499)')
('Dell', '(191)')
('Diamond lamps', '(966)')
('Dynamic', '(4)')
('Eiki', '(460)')
('Epharos', '(2,592)')
('Epson', '(1,456)')
('ereplacement', '(…)')
('ereplacements', '(814)')
("Ewo's", '(…)')
('Eworldlamp', '(354)')
('FI lamps', '(5,707)')
('FL projector Lamp for Mitsubishi', '(1)')
('for Epson', '(3)')
('Generic', '(9,769)')
('Good Lamp', '(819)')
('Hcdz', '(2,746)')
('Hitachi', '(935)')
('IET lamps', '(2,144)')
('InFocus', '(…)')
('JVC', '(326)')
('KCL', '(3,781)')
('Lampedia', '(618)')
('Lutema', '(1,956)')
('Mitsubishi', '(1,006)')
('Mogobe', '(1,335)')
('Myprojectorlamps', '(473)')
('NEC', '(446)')
('NEC Computers', '(13)')
('Optoma', '(956)')
('Osram Sylvania', '(…)')
('Panasonic', '(820)')
('Philips', '(7,502)')
('Powerwarehouse', '(9,971)')
('Projector Lamps World', '(112)')
('Pureglare', '(369)')
('Samsung', '(1,078)')
('sharp', '(426)')
('Shopforbattery', '(2,510)')
('SMART BOARD', '(66)')
('Sony', '(990)')
('tvlampsforless', '(…)')
('Unknown', '(722)')
--------------------------crawl over at Sat Apr 30 00:25:57 2016--------------------------
--------------------------start crawling Amazing Lamps at Sat Apr 30 00:25:58 2016--------------------------
('AWO', '(1)')
('comoze lamps', '(2)')
('Dngo', '(8)')
('Electrified', '(9)')
('Electrified', '(10)')
('Electrified discounters', '(5)')
('electrified lamps', '(1,177)')
('Electrified printhead', '(…)')
('electrified printheads', '(2)')
('FI lamps', '(2)')
('Generic', '(…)')
('Glowatt', '(1)')
('KCL', '(1)')
('OEM', '(1)')
('Powerwarehouse', '(7)')
('SKU', '(5)')
('Top Lamp', '(1)')
('Unknown', '(1)')
('Usom', '(3)')
--------------------------crawl over at Sat Apr 30 00:26:?? 2016--------------------------
--------------------------start crawling Dynamic Lamps at Sat Apr 30 00:26:01 2016--------------------------
('Battery1inc', '(85)')
('BenQ', '(237)')
('Buslink', '(…)')
('Calumet', '(2)')
('comoze lamps', '(405)')
('Ctlamp', '(615)')
('Dell', '(82)')
('Divine Lighting', '(…)')
('Dngo', '(…)')
('Dynamic', '(4)')
('Eiko', '(…)')
('Electrified', '(2)')
('electrified lamps', '(…)')
('Electronix Xpress', '(418)')
('Epharos', '(502)')
('Epson', '(631)')
('ereplacements', '(119)')
('FI lamps', '(505)')
('FL projector Lamp for Mitsubishi', '(1)')
('G-lamps', '(…)')
('ge', '(248)')
('ge Lighting', '(…)')
('General Electric', '(…)')
('Generic', '(1,671)')
('Genie', '(101)')
('Glamps', '(2)')
('Impact', '(7)')
('Industrial Lighting Solutions', '(9)')
('KCL', '(280)')
('Kodak', '(1)')
('Lampedia', '(…)')
('M-wave', '(830)')
('Mitsubishi', '(406)')
('Mitsubishi DLP TV bulbs', '(…)')
('Mocpinc', '(…)')
('Myprojectorlamps', '(344)')
('Nec', '(…)')
('Optoma', '(161)')
('Osram', '(1,295)')
('Panasonic', '(245)')
('Philips', '(988)')
('Powerwarehouse', '(239)')
('Projector Lamps World', '(…)')
('Pureglare', '(107)')
('Samsung', '(323)')
('Shopjimmy', '(3)')
('Sony', '(141)')
('Sylvania', '(…)')
('Technical Precision', '(…)')
('Unknown', '(167)')
('Welch Allyn Compatible', '(1)')
--------------------------crawl over at Sat Apr 30 00:26:02 2016--------------------------
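Each line of the output above is one tuple returned by re.findall: the pattern has two capture groups, so every match becomes a (brand, count) pair. The sketch below runs the same kind of pattern offline and turns the pairs into a dictionary, which is roughly what the commented-out dict(map(...)) lines in the crawler were aiming at. The HTML fragment and the helper name parse_brand_counts are made up for illustration, and the class names refinementLink and narrowValue are an assumption about the page markup.

```python
import re

# Hypothetical fragment imitating the brand-filter markup (not real page HTML).
SAMPLE_HTML = (
    '<span class="refinementLink">Epson</span><span class="narrowValue">(1,456)</span>'
    '<span class="refinementLink">Sony</span><span class="narrowValue">(990)</span>'
)

# Same idea as the crawler's pattern: group 1 is the brand, group 2 the count.
PATTERN = re.compile(
    r'<span class="refinementLink">(.*?)</span>'
    r'<span class="narrowValue">(.*?)</span>'
)


def parse_brand_counts(html):
    """Return {brand: listing count as int} for every match in html."""
    counts = {}
    for brand, raw in PATTERN.findall(html):
        digits = re.sub(r'[^\d]', '', raw)  # drop parentheses and thousands separators
        if digits:
            counts[brand.strip()] = int(digits)
    return counts


if __name__ == '__main__':
    print(PATTERN.findall(SAMPLE_HTML))
    print(parse_brand_counts(SAMPLE_HTML))
```

Converting the counts to integers makes it possible to sort or sum them later, at the cost of discarding the original display string.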
Multithreaded crawler: started at 00:32:37, finished at 00:32:39, 2 seconds in total.
import re
import threading
import time
from time import ctime

import requests

# Brand-filter pages of the three stores to crawl.
url_eb = 'http://www.amazon.com/gp/search/other/ref=sr_sa_p_4?me=A22XNR713HGDVG&rh=n%3a9063592011%2ck%3aprojector&bbn=9063592011&keywords=projector&pickertolist=brandtextbin&ie=utf8&qid=1461902521'
url_aml = 'https://www.amazon.com/gp/search/other/ref=sr_sa_p_4?me=a3uji9wwe6prp5&rh=i%3amerchant-items&pickertolist=brandtextbin&ie=utf8&qid=1461899728'
url_dl = 'https://www.amazon.com/gp/search/other/ref=sr_sa_p_4?me=as7zu4mn0fpoy&rh=i%3amerchant-items&pickertolist=brandtextbin&ie=utf8&qid=1461901862'

# Every request sends the same browser User-Agent.
headers_eb = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.86 Safari/537.36'}
headers_aml = dict(headers_eb)
headers_dl = dict(headers_eb)

name = {'A': 'Exclusivebulbs', 'B': 'Amazing Lamps', 'C': 'Dynamic Lamps'}

# listing_count = re.findall('<span class="narrowValue">(.*?)</span', data.text)
# f = dict(map(lambda x, y: [x, y], store_name, listing_count))
#
# for k, v in f.items():
#     print(k, v)


def foo_one(url, headers, name):
    """Fetch one store page and print every (brand, listing count) pair."""
    print('--------------------------start crawling {0} at {1}--------------------------'.format(name, time.ctime()))
    response = requests.get(url, headers=headers)
    # Two capture groups, so findall returns a list of (brand, count) tuples.
    store_name = re.findall('<span class="refinementLink">(.*?)</span><span class="narrowValue">(.*?)</span', response.text)
    for i in store_name:
        print(i)
    print('--------------------------finished crawling {0} at {1}--------------------------'.format(name, time.ctime()))


# One thread per store.
threads = []
t1 = threading.Thread(target=foo_one, args=(url_eb, headers_eb, name['A']))
threads.append(t1)
t2 = threading.Thread(target=foo_one, args=(url_aml, headers_aml, name['B']))
threads.append(t2)
t3 = threading.Thread(target=foo_one, args=(url_dl, headers_dl, name['C']))
threads.append(t3)

if __name__ == '__main__':
    for t in threads:
        t.daemon = True  # replaces the deprecated t.setDaemon(True)
        t.start()
    # Join every thread, not only the last one started; otherwise the main
    # thread can exit while a daemon thread is still downloading.
    for t in threads:
        t.join()
    print('all over %s' % ctime())
Output:
--------------------------start crawling Exclusivebulbs at Sat Apr 30 00:32:37 2016--------------------------
--------------------------start crawling Amazing Lamps at Sat Apr 30 00:32:37 2016--------------------------
--------------------------start crawling Dynamic Lamps at Sat Apr 30 00:32:37 2016--------------------------
('A.shine', '(…)')
('Ampacelectronics', '(1,645)')
('Aurabeam', '(33,088)')
('AWO', '(1,209)')
('Battery1inc', '(694)')
('comoze lamps', '(6,172)')
('Compatible Lamp', '(317)')
('Corgi Lamps', '(2,123)')
('Ctlamp', '(3,501)')
('Dell', '(191)')
('Diamond lamps', '(966)')
('Dynamic', '(4)')
('Eiki', '(457)')
('Epharos', '(2,592)')
('Epson', '(1,456)')
('ereplacement', '(…)')
('ereplacements', '(813)')
("Ewo's", '(…)')
('Eworldlamp', '(354)')
('FI lamps', '(5,710)')
('FL projector Lamp for Mitsubishi', '(1)')
('for Epson', '(3)')
('Generic', '(9,771)')
('Good Lamp', '(819)')
('Hcdz', '(2,748)')
('Hitachi', '(935)')
('IET lamps', '(2,137)')
('InFocus', '(…)')
('JVC', '(326)')
('KCL', '(3,783)')
('Lampedia', '(618)')
('Lutema', '(1,955)')
('Mitsubishi', '(1,006)')
('Mogobe', '(1,336)')
('Myprojectorlamps', '(473)')
('NEC', '(…)')
('NEC Computers', '(…)')
('Optoma', '(956)')
('Osram Sylvania', '(…)')
('Panasonic', '(820)')
('Philips', '(7,502)')
('Powerwarehouse', '(9,972)')
('Projector Lamps World', '(…)')
('Pureglare', '(369)')
('Samsung', '(1,078)')
('Sharp', '(426)')
('Shopforbattery', '(2,511)')
('SMART BOARD', '(…)')
('Sony', '(990)')
('tvlampsforless', '(14)')
('Unknown', '(722)')
--------------------------finished crawling Exclusivebulbs at Sat Apr 30 00:32:38 2016--------------------------
('Battery1inc', '(…)')
('BenQ', '(237)')
('Buslink', '(…)')
('Calumet', '(2)')
('comoze lamps', '(405)')
('Ctlamp', '(615)')
('Dell', '(…)')
('Divine Lighting', '(…)')
('Dngo', '(…)')
('Dynamic', '(4)')
('Eiko', '(…)')
('electrified', '(2)')
('electrified lamps', '(…)')
('Electronix Xpress', '(418)')
('Epharos', '(502)')
('Epson', '(631)')
('ereplacements', '(119)')
('FI lamps', '(505)')
('FL projector Lamp for Mitsubishi', '(1)')
('G-lamps', '(43)')
('GE', '(248)')
('GE Lighting', '(152)')
('General Electric', '(53)')
('Generic', '(1,671)')
('Genie', '(101)')
('Glamps', '(2)')
('Impact', '(7)')
('Industrial Lighting Solutions', '(9)')
('KCL', '(280)')
('Kodak', '(1)')
('Lampedia', '(63)')
('M-wave', '(830)')
('Mitsubishi', '(406)')
('Mitsubishi DLP TV bulbs', '(29)')
('Mocpinc', '(10)')
('Myprojectorlamps', '(344)')
('Nec', '(19)')
('Optoma', '(161)')
('Osram', '(1,295)')
('Panasonic', '(245)')
('Philips', '(988)')
('Powerwarehouse', '(239)')
('Projector Lamps World', '(45)')
('Pureglare', '(107)')
('Samsung', '(323)')
('Shopjimmy', '(3)')
('Sony', '(141)')
('Sylvania', '(115)')
('Technical Precision', '(10)')
('Unknown', '(167)')
('Welch Allyn Compatible', '(1)')
--------------------------finished crawling Dynamic Lamps at Sat Apr 30 00:32:39 2016--------------------------
all over Sat Apr 30 00:32:39 2016
Efficiency comparison of single-threaded crawler vs multithreaded crawler
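The gap between 29 seconds and 2 seconds comes from waiting on the network: a thread blocked on a response does no work, so several requests can wait at the same time. The sketch below reproduces that effect offline by replacing requests.get with time.sleep. The names fake_fetch, run_sequential and run_threaded and the 0.2 second delay are assumptions made for illustration, and the sketch uses concurrent.futures.ThreadPoolExecutor from the standard library rather than the hand-built list of threading.Thread objects shown earlier.

```python
import time
from concurrent.futures import ThreadPoolExecutor

DELAY = 0.2  # simulated network latency per request, in seconds (assumption)


def fake_fetch(name):
    """Stand-in for requests.get: it only waits, like a thread blocked on I/O."""
    time.sleep(DELAY)
    return name


def run_sequential(names):
    """Fetch one store after another; total time is about len(names) * DELAY."""
    start = time.perf_counter()
    results = [fake_fetch(n) for n in names]
    return results, time.perf_counter() - start


def run_threaded(names):
    """Fetch all stores at once; total time is about one DELAY."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=len(names)) as pool:
        results = list(pool.map(fake_fetch, names))
    return results, time.perf_counter() - start


if __name__ == '__main__':
    stores = ['Exclusivebulbs', 'Amazing Lamps', 'Dynamic Lamps']
    _, seq = run_sequential(stores)
    _, par = run_threaded(stores)
    print('sequential: {:.2f}s, threaded: {:.2f}s'.format(seq, par))
```

Leaving the with block waits for every worker, so no thread is left unjoined, and pool.map returns results in input order even when the workers finish in a different order.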