Two versions were written:
1. Procedural version:
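```python
import requests
from pyquery import PyQuery as pq

url = 'https://movie.douban.com/top250'
# Douban may reject the default python-requests User-Agent,
# so send a browser-like one
headers = {'User-Agent': 'Mozilla/5.0'}
movies = []

def sec(item):
    # Sort key: the score, converted to a number
    return float(item[1])

# 10 pages of 25 movies each: start=0, 25, ..., 225
for i in range(0, 250, 25):
    content = requests.get(url, params={'start': i}, headers=headers)  # e.g. ?start=25
    for movie in pq(content.text).find('.item'):
        movies.append([pq(movie).find('.title').html(),
                       pq(movie).find('.rating_num').html()])

movies.sort(key=sec, reverse=True)
for movie in movies:
    print(movie[0], movie[1])
```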
2. Object-oriented version:
```python
import requests
from pyquery import PyQuery as pq


class Douban:
    def __init__(self):
        self.movies = []
        # Douban may reject the default python-requests User-Agent
        self.headers = {'User-Agent': 'Mozilla/5.0'}

    def geturl(self):
        # Build the 10 page URLs: start=0, 25, ..., 225
        url = 'https://movie.douban.com/top250?start=%s'
        return [url % i for i in range(0, 250, 25)]

    def downloader(self, url):
        # Fetch one page and return its HTML
        r = requests.get(url, headers=self.headers)
        return r.text

    def html_parser(self, page):
        # Each movie on the page is a .item block
        for movie in pq(page).find('.item'):
            title = pq(movie).find('.title').html()
            score = pq(movie).find('.rating_num').html()
            self.movies.append({'title': title, 'score': score})

    def output(self):
        # Sort numerically by score, highest first
        self.movies.sort(key=lambda x: float(x['score']), reverse=True)
        for movie in self.movies:
            print(movie['title'], movie['score'])

    def start(self):
        for url in self.geturl():
            page = self.downloader(url)
            self.html_parser(page)
        # Print once, after all pages have been parsed
        self.output()


if __name__ == '__main__':
    dou = Douban()
    dou.start()
```
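Both scripts need a live connection to Douban, which makes the parsing and sorting steps awkward to check on their own. The following is a minimal offline sketch of just those two steps plus the page offsets. It deliberately swaps pyquery for the standard library's `html.parser` so it runs without third-party packages, and it assumes only the `item`, `title` and `rating_num` class names that the selectors above rely on. The `SAMPLE` markup with its titles and scores, and the helper names `Top250Parser`, `page_offsets` and `sort_by_score`, are hypothetical, not real Douban data or an existing API.

```python
from html.parser import HTMLParser


class Top250Parser(HTMLParser):
    """Hypothetical helper: collects {'title', 'score'} dicts from
    Top 250-style markup. Assumes each movie is an element with class
    "item" containing span.title (only the first is kept) and
    span.rating_num, mirroring the '.item', '.title' and '.rating_num'
    selectors used with pyquery."""

    def __init__(self):
        super().__init__()
        self.movies = []
        self._current = None  # dict for the .item currently being read
        self._field = None    # which field the next text belongs to

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get('class') or '').split()
        if 'item' in classes:
            self._current = {'title': None, 'score': None}
            self.movies.append(self._current)
        elif self._current is not None:
            if 'title' in classes and self._current['title'] is None:
                self._field = 'title'
            elif 'rating_num' in classes:
                self._field = 'score'

    def handle_data(self, data):
        # Store the first non-blank text after a .title/.rating_num tag
        if self._field is not None and data.strip():
            self._current[self._field] = data.strip()
            self._field = None


def page_offsets(total=250, page_size=25):
    # start=0, 25, ..., 225: ten pages, no empty page at start=250
    return list(range(0, total, page_size))


def sort_by_score(movies):
    # float() makes the ordering numeric rather than lexicographic
    return sorted(movies, key=lambda m: float(m['score']), reverse=True)


# Hypothetical, simplified markup; not copied from Douban
SAMPLE = """
<ol>
  <li><div class="item">
    <span class="title">Movie A</span><span class="title">&nbsp;/&nbsp;Alt A</span>
    <span class="rating_num" property="v:average">9.2</span>
  </div></li>
  <li><div class="item">
    <span class="title">Movie B</span>
    <span class="rating_num" property="v:average">9.7</span>
  </div></li>
</ol>
"""

if __name__ == '__main__':
    parser = Top250Parser()
    parser.feed(SAMPLE)
    parser.close()
    for movie in sort_by_score(parser.movies):
        print(movie['title'], movie['score'])
    print(page_offsets())
```

Converting the score to `float` before sorting keeps the order numeric. Comparing the raw strings only happens to work while every score has exactly one digit before the decimal point; a hypothetical `'10.0'` would sort below `'9.5'`.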
Python crawler: scraping the titles and ratings of the Douban Movie Top 250 (requests + pyquery)