I enjoy reading, so I keep a catalogue of the books I have read:
The catalogue is an XLSX file.
The following code looks up each of the books above and downloads its cover. Two things are worth noting:
1. The books are looked up on Douban Reading (m.douban.com).
2. The Chinese title of the book is embedded directly in the request URL. A browser percent-encodes Chinese characters before sending such a URL, so the code does the same with quote from urllib.parse.
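To make the second point concrete, here is a minimal sketch of how the search URL is assembled. The helper name build_search_url is hypothetical (it is not part of the original script); the endpoint and the "&type=book" suffix are the ones the script uses.

```python
from urllib.parse import quote

# Search endpoint used by the script in this post.
SEARCH_URL = "https://m.douban.com/search/?query="


def build_search_url(bookname):
    """Return the search URL with the title percent-encoded as UTF-8.

    build_search_url is a hypothetical helper used only for illustration.
    """
    return SEARCH_URL + quote(bookname) + "&type=book"


if __name__ == "__main__":
    # Each Chinese character becomes three %XX escapes (its UTF-8 bytes).
    print(build_search_url("红楼梦"))
    # https://m.douban.com/search/?query=%E7%BA%A2%E6%A5%BC%E6%A2%A6&type=book
```

Passing the raw title to urlopen without quote would typically fail, because urlopen does not encode non-ASCII characters in the URL for you.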
The result is as shown:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# --author: xiangguosun
# --2016.12.10
import re
import time
from urllib.error import HTTPError
from urllib.parse import quote
from urllib.request import urlopen, urlretrieve

import pandas as pd
from bs4 import BeautifulSoup


def getTitle(url):
    """Return the cover <img> tags found on the search page, or None on failure."""
    try:
        html = urlopen(url)
    except HTTPError as e:
        return None
    try:
        bsObj = BeautifulSoup(html.read(), "lxml")
        # Cover images are served from img3.doubanio.com/lpic/
        title = bsObj.find_all(
            "img", {"src": re.compile(r"https://img3\.doubanio\.com/lpic/.*")})
    except AttributeError as e:
        return None
    return title


def get_book_picture(bookname):
    """Search for the book and save the first cover found as <bookname>.jpg."""
    seed_url = "https://m.douban.com/search/?query="
    book = quote(bookname)  # percent-encode the Chinese title
    url = seed_url + book + "&type=book"
    print(url)
    titlelist = getTitle(url)
    # Raises if the search failed or returned no cover; the caller logs it.
    img_url = titlelist[0]["src"]
    urlretrieve(img_url, 'e:/books/' + '%s' % bookname + '.jpg')
    print(bookname, "Save done!")


data = pd.read_excel("./books.xlsx")
for bookname in data['BookName']:
    print("Start to search book:", bookname)
    try:
        get_book_picture(bookname)
        time.sleep(5)  # pause between requests
    except Exception:
        # Record titles whose cover could not be downloaded.
        with open("./photos.txt", 'a', encoding="utf-8") as f:
            f.write(bookname + '\n')
        time.sleep(5)
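One thing the script does not handle is a title containing characters that Windows forbids in file names (for example ":" or "?"); saving under e:/books/ would then fail and the title would end up in photos.txt. A minimal sketch of one way to guard against that follows. The helper cover_path and the choice of "_" as the replacement character are assumptions for illustration, not part of the original script.

```python
import re
from pathlib import Path

# Characters that Windows does not allow in file names.
_ILLEGAL = re.compile(r'[<>:"/\\|?*]')


def cover_path(bookname, folder="e:/books"):
    """Build the .jpg path for a title (hypothetical helper).

    Illegal characters are replaced with "_". str() is applied first
    because a spreadsheet cell may hold a number rather than text.
    """
    safe = _ILLEGAL.sub("_", str(bookname)).strip()
    return Path(folder) / (safe + ".jpg")


if __name__ == "__main__":
    print(cover_path("C++: A Primer?", "books").name)
```

In the script, the call would then become something like urlretrieve(img_url, str(cover_path(bookname))).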