Python Crawling Path of a Salted Fish (3): Crawling Web Images


After learning the Requests and BeautifulSoup libraries, it's time to put them into practice by crawling images from a web page. As mentioned before, this approach can only grab images that appear in the HTML itself, not images generated by JavaScript.
So I picked this website: http://www.ivsky.com.

 

There are a lot of galleries on this site; here we'll crawl the yourname wallpaper gallery.

 

Open http://www.ivsky.com/bizhi/yourname_v39947/ and check the source code of this page:

We can see that the image information we want is inside a <ul>, but there is more than one ul on the page, so we also need to match the ul's class; the image address itself sits in the src attribute of each img tag. We can use the BeautifulSoup library to parse the page and extract the image information:

soup = BeautifulSoup(html, 'html.parser')
all_img = soup.find('ul', class_='pli').find_all('img')
for img in all_img:
    src = img['src']
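
To make this concrete, here is a minimal, self-contained sketch; the HTML snippet below is made up, but it mirrors the structure of the gallery page (one ul with class pli holding several img tags):

from bs4 import BeautifulSoup

# Made-up HTML that mirrors the gallery page's structure
html = '''
<ul class="pli">
  <li><img src="/img/t/yourname-001.jpg"></li>
  <li><img src="/img/t/yourname-002.jpg"></li>
</ul>
'''

soup = BeautifulSoup(html, 'html.parser')
for img in soup.find('ul', class_='pli').find_all('img'):
    print(img['src'])  # prints each image address found in the list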


We use the requests library to fetch the page HTML:

def getHtmlurl(url):  # fetch the page HTML
    try:
        r = requests.get(url)
        r.raise_for_status()
        r.encoding = r.apparent_encoding
        return r.text
    except:
        return ""
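
As a quick sanity check (assuming requests is imported and the site is reachable from your machine), you can call the function directly:

html = getHtmlurl('http://www.ivsky.com/bizhi/yourname_v39947/')
print(len(html))  # a non-zero length means the page was downloaded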

Then we download each image and save it locally:

try:  # create the folder if needed, then download the image if it does not already exist
    if not os.path.exists(root):
        os.mkdir(root)
    if not os.path.exists(path):
        r = requests.get(img_url)
        with open(path, 'wb') as f:
            f.write(r.content)
        print("file saved successfully")
    else:
        print("file already exists")
except:
    print("failed to crawl")
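
Here img_url is the src value extracted above, and root/path are built from it; in the complete code below they are set like this (d:/pic/ is just the folder used in this article, so change it to suit your system):

root = 'd:/pic/'                      # local folder to save images into
path = root + img_url.split('/')[-1]  # keep the original file name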


The entire crawler framework and ideas:

import requests
from bs4 import BeautifulSoup
import os

def getHtmlurl(url):  # fetch the page HTML
    pass

def getpic(html):  # get the image addresses and download the images
    pass

def main():  # main function
    pass


Here is the complete code:

import requests
from bs4 import BeautifulSoup
import os

def getHtmlurl(url):  # fetch the page HTML
    try:
        r = requests.get(url)
        r.raise_for_status()
        r.encoding = r.apparent_encoding
        return r.text
    except:
        return ""

def getpic(html):  # get the image addresses and download the images
    soup = BeautifulSoup(html, 'html.parser')
    all_img = soup.find('ul', class_='pli').find_all('img')
    for img in all_img:
        src = img['src']
        img_url = src
        print(img_url)
        root = 'd:/pic/'
        path = root + img_url.split('/')[-1]
        try:  # create the folder if needed, then download the image if it does not already exist
            if not os.path.exists(root):
                os.mkdir(root)
            if not os.path.exists(path):
                r = requests.get(img_url)
                with open(path, 'wb') as f:
                    f.write(r.content)
                print("file saved successfully")
            else:
                print("file already exists")
        except:
            print("failed to crawl")

def main():
    url = 'http://www.ivsky.com/bizhi/yourname_v39947/'
    html = getHtmlurl(url)
    print(getpic(html))

main()


Run the code:

We can see that the images have been saved locally. This is a simple example; you can try it yourself.

 
