Python Crawling Path of a Salted Fish (3): Crawling Web Images
After learning the Requests and BeautifulSoup libraries, today we'll put them into practice by crawling images from a web page. As mentioned before, this approach can only crawl images that appear in the HTML page itself, not images generated by JavaScript.
So I found this website: http://www.ivsky.com.
The site hosts many galleries; here we'll crawl the wallpaper gallery for Your Name:
http://www.ivsky.com/bizhi/yourname_v39947/ — check the source code of this page:
We can see that the image information we want is inside a <ul> element, and there is more than one <ul> in the page, so we need to locate the one with class 'pli'; each image address then sits in the src attribute of an <img> tag. We can use the BeautifulSoup library to parse the page and extract this information:
soup = BeautifulSoup(html, 'html.parser')
all_img = soup.find('ul', class_='pli').find_all('img')
for img in all_img:
    src = img['src']
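To see these four lines in action, here is a minimal, self-contained sketch; the HTML fragment below is invented for illustration and only mimics the gallery page's structure:

```python
from bs4 import BeautifulSoup

# An invented HTML fragment mimicking the gallery page's structure
html = """
<ul class="pli">
  <li><img src="http://img.ivsky.com/img/bizhi/t/1.jpg"></li>
  <li><img src="http://img.ivsky.com/img/bizhi/t/2.jpg"></li>
</ul>
<ul class="other">
  <li><img src="http://example.com/ignored.jpg"></li>
</ul>
"""

soup = BeautifulSoup(html, 'html.parser')
# find() with class_='pli' matches only the gallery <ul>,
# so the 'other' list is skipped
all_img = soup.find('ul', class_='pli').find_all('img')
srcs = [img['src'] for img in all_img]
print(srcs)
```

Only the two images inside the 'pli' list are collected; the third <img> never appears in the result.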
We use the requests library to fetch the page:
def getHtmlurl(url):  # fetch the page
    try:
        r = requests.get(url)
        r.raise_for_status()
        r.encoding = r.apparent_encoding
        return r.text
    except:
        return ""
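The bare except above hides the cause of a failure, and the request has no timeout, so an unresponsive server would hang the crawler forever. A slightly more defensive sketch (the timeout value and the exception class are my own additions, not from the original):

```python
import requests

def getHtmlurl(url):
    """Fetch a page and return its text, or '' on any request failure."""
    try:
        # timeout keeps the crawler from hanging on a dead server
        r = requests.get(url, timeout=10)
        r.raise_for_status()
        r.encoding = r.apparent_encoding
        return r.text
    except requests.RequestException:
        return ""

# A malformed URL fails fast without touching the network
print(getHtmlurl('not-a-url'))  # prints an empty string
```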
Then we download each image and save it locally:
try:  # create the directory if needed, then download the image if it doesn't exist yet
    if not os.path.exists(root):
        os.mkdir(root)
    if not os.path.exists(path):
        r = requests.get(img_url)
        with open(path, 'wb') as f:
            f.write(r.content)
        print("file saved successfully")
    else:
        print("file already exists")
except:
    print("failed to crawl")
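One gotcha worth flagging: if a page's <img> tags carry relative paths, requests.get(img_url) will fail on them. The standard library's urljoin resolves a relative src against the page URL and leaves absolute ones untouched (the paths below are made up for illustration):

```python
from urllib.parse import urljoin

page_url = 'http://www.ivsky.com/bizhi/yourname_v39947/'

# A relative src is resolved against the page's site root
rel = urljoin(page_url, '/img/bizhi/t/pic.jpg')
# An absolute src passes through unchanged
abs_src = urljoin(page_url, 'http://img.ivsky.com/img/bizhi/t/pic.jpg')

print(rel)
print(abs_src)
```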
The overall crawler framework and approach:
import requests
from bs4 import BeautifulSoup
import os

def getHtmlurl(url):  # fetch the page
    pass

def getpic(html):  # extract the image addresses and download them
    pass

def main():  # main function
    pass
The complete code:
import requests
from bs4 import BeautifulSoup
import os

def getHtmlurl(url):  # fetch the page
    try:
        r = requests.get(url)
        r.raise_for_status()
        r.encoding = r.apparent_encoding
        return r.text
    except:
        return ""

def getpic(html):  # extract the image addresses and download them
    soup = BeautifulSoup(html, 'html.parser')
    all_img = soup.find('ul', class_='pli').find_all('img')
    for img in all_img:
        src = img['src']
        img_url = src
        print(img_url)
        root = 'D:/pic/'
        path = root + img_url.split('/')[-1]
        try:  # create the directory if needed, then download the image if it doesn't exist yet
            if not os.path.exists(root):
                os.mkdir(root)
            if not os.path.exists(path):
                r = requests.get(img_url)
                with open(path, 'wb') as f:
                    f.write(r.content)
                print("file saved successfully")
            else:
                print("file already exists")
        except:
            print("failed to crawl")

def main():
    url = 'http://www.ivsky.com/bizhi/yourname_v39947/'
    html = getHtmlurl(url)
    getpic(html)

main()
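If the site ever starts rejecting requests' default User-Agent (many image hosts do), a browser-like header can be attached through a Session. This is a sketch of the idea only; the header string is an arbitrary example, and nothing is actually sent here:

```python
import requests

# An arbitrary browser-like User-Agent string (example only)
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)'}

s = requests.Session()
s.headers.update(headers)

# Prepare (but don't send) a request to inspect the headers it would carry
req = requests.Request('GET', 'http://www.ivsky.com/bizhi/yourname_v39947/')
prepped = s.prepare_request(req)
print(prepped.headers['User-Agent'])
```

Every request made through the session (s.get, s.post, ...) would then carry this header automatically.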
Run the code:
We can see the images saved locally. This is a simple example; feel free to try it yourself.