I love reading, I love watching movies, and I am learning Python 3, so I tried a bit of web scraping. Here is the code.
import urllib.request
from bs4 import BeautifulSoup


# Fetch one page and return the <div id="wrapper"> element that holds the list
def get_html(url):
    web = urllib.request.urlopen(url)
    soup = BeautifulSoup(web, "html.parser")
    data = soup.find("div", id="wrapper")
    return data


# Extract the fields of every book on the page and append them to the output file
def get_all(data):
    data = data.find_all("table")
    for link in data:
        name = link.find("div", class_="pl2").find("a").get_text().replace(' ', '').replace('\n', '')
        author = link.find("p", class_="pl").get_text().split('/')[0].replace(' ', '')
        score = link.find("span", class_="rating_nums").get_text().replace(' ', '')
        peoplenum = link.find("span", class_="pl").get_text().replace(' ', '').replace('(', '').replace(')', '').replace('\n', '')
        try:
            remark = link.find("p", class_="quote").get_text().replace(' ', '').replace('\n', '')
        except AttributeError:
            # find() returned None: this book has no quote
            remark = 'No reviews'
        with open('F://book.txt', 'a+', encoding='UTF-8') as f:
            f.write(name + ' ' + author + ' ' + score + ' ' + peoplenum + ' ' + remark + '\r\n')


if __name__ == '__main__':
    url = 'https://book.douban.com/top250?start='
    # Write the header row once
    with open('F://book.txt', 'a+', encoding='UTF-8') as f:
        f.write('Book name' + ' ' + 'Author' + ' ' + 'Ratings' + ' ' + 'Number of reviews' + ' ' + 'Reviews' + '\r\n')
    # 10 pages of 25 books each
    for i in range(10):
        url1 = url + str(i * 25)
        get_all(get_html(url1))
The code above is for books.
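The script above walks through the Top 250 list by appending a multiple of 25 to the `start=` parameter, and it strips whitespace with chained `.replace()` calls. As a rough sketch of those two steps in isolation, the helpers below do the same thing without touching the network. Note that `page_urls` and `clean` are hypothetical names that are not in the original script, and `clean` removes every kind of whitespace (tabs included), which is slightly broader than the original `.replace()` chain.

```python
def page_urls(base, pages=10, page_size=25, suffix=""):
    """Return the URL of every results page (hypothetical helper)."""
    return [f"{base}{page * page_size}{suffix}" for page in range(pages)]


def clean(text):
    """Remove all whitespace, similar to the chained .replace() calls."""
    return "".join(text.split())


urls = page_urls("https://book.douban.com/top250?start=")
print(urls[0])    # first page
print(urls[-1])   # last page
print(clean("  The Kite \n Runner "))
```

The optional `suffix` argument is only there so the same helper could cover a URL that needs something appended after the offset, such as `&filter=`.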
The code below is for movies.
import io
import sys
import urllib.request
from bs4 import BeautifulSoup


# Get the web page; a browser User-Agent header is sent with the request
def get_html(url):
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.101 Safari/537.36'}
    req = urllib.request.Request(url=url, headers=headers)
    res = urllib.request.urlopen(req)
    html = res.read()
    soup = BeautifulSoup(html, 'html.parser')
    data = soup.find("ol").find_all("li")
    return data


# Extract the name, rating and number of reviews of every movie on the page
def get_all(data):
    for info in data:
        names = info.find("span")
        name = names.get_text()
        scores = info.find_all("span", {"class": "rating_num"})
        score = scores[0].get_text()
        nums = info.find("div", class_="star").find_next().find_next().find_next().find_next().get_text()
        with open('F://movie.txt', 'a+', encoding='UTF-8') as f:
            f.write(name + ' ' + score + ' ' + nums + '\r\n')


if __name__ == '__main__':
    url = 'https://movie.douban.com/top250?start='
    # Write the header row once
    with open('F://movie.txt', 'a+', encoding='UTF-8') as f:
        f.write('Movie Name' + ' ' + 'Ratings' + ' ' + 'Number of reviews' + '\r\n')
    # 10 pages of 25 movies each
    for i in range(10):
        url1 = url + str(i * 25) + '&filter='
        get_all(get_html(url1))
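Both scripts reopen the output file for every single row and join the fields with spaces by hand. One possible alternative, shown here only as a sketch, is to open the file once and let the standard `csv` module write comma-separated rows. This swaps the original space-separated text format for CSV. The `write_rows` helper and the sample rows are made up for illustration, and an in-memory buffer stands in for the real file.

```python
import csv
import io


def write_rows(handle, header, rows):
    """Write a header and all rows to an already-open text file (hypothetical helper)."""
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)


# Made-up sample rows, standing in for what the scraper would collect.
sample = [("MovieA", "9.0", "1000"), ("MovieB", "8.5", "500")]

# StringIO stands in for a file opened once, e.g. with open(path, 'w', encoding='UTF-8', newline='')
buffer = io.StringIO()
write_rows(buffer, ("Movie Name", "Ratings", "Number of reviews"), sample)
print(buffer.getvalue())
```

To use this with the scraper, `get_all` would need to return or yield its rows instead of writing them itself.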
Crawling Douban with Python 3