A very simple reptile crawling around the middle of the hotel reviews information.
#-*-coding:utf-8-*-ImportRequestsImportReImport TimedefPlacesplider (name, Star, URL): Time.sleep (5) Res= Requests.get ('http://www.dianping.com'+URL) Text=Res.text Longinfo="<p class=\ "desc j-desc\" > (. *?) </p>"Longinfo_re=Re.compile (Longinfo, re. Dotall) Longinfos=Longinfo_re.findall (text) info="sml-rank-stars sml-str (. *?) \ ". *?<p class=\" Desc\ "> (. *?) </p>"Info_re=Re.compile (info, re.) Dotall) Results=info_re.findall (text)#Print Result #print '%d results '%len (results) ifLen (results) = = 0orLen (Results[0]) < 2orResults[0][1].count (U'people reviews') >0:PrintU'no reviews \ n' returnFOut= Open ('D:\\%s.txt'%name,'W') Fout.write ('Place star%s\n'%star) forResultinchResults:star=Result[0] Info= Result[1] ifInfo.count ('<span') > 0orInfo.count (U'For Sale Only') >0:#to advertise Print "' Break Else: ifInfo[-6:] = = u"......":#Replace short comment for corresponding long commentinfo = info[:-6] forIinchLonginfos:ifI.count (Info) >0:info=I BreakInfo= Info.replace ("<br/>","') Info= Info.replace ("<br>","') Info= Info.replace (" ","') PrintStar, Info Fout.write ('%s\n'%star) Fout.write ('%s\n'%info.encode ('U8') ) Fout.close () forPageinchRange (1, 6): Res= Requests.get ('http://www.dianping.com/search/keyword/206/0_%E4%B8%AD%E5%B1%B1%E5%A4%A7%E5%AD%A6/p'+Str (page)) text=res.text href="data-hippo-type=\ "Shop\" title=\ "(. *?) \ "Target=\" _blank\ "href=\" (. *?) \ ". *?sml-rank-stars sml-str (. *?) \""Href_re=re.compile (href, re. Dotall) Result=href_re.findall (text) forPlaceinchResult:name=place[0] URL= Place[1] Star= Place[2] Printname, star, url placesplider (name, Star, URL) time.sleep (5)
Simple Volkswagen Reviews Crawler