Python Crawling Path of a Salted Fish (3): Crawling Web Images
After learning the Requests and BeautifulSoup libraries, today we'll put them into practice by crawling images from a web page. As mentioned before, this approach can only crawl images that appear in the HTML page itself, not images generated by JavaScript.
So I found this website: http://www.ivsky.com.
The site hosts many galleries; here we'll crawl the wallpaper gallery for Your Name:
http://www.ivsky.com/bizhi/yourname_v39947/ — check the source code of this page:
We can see that the image information we want is inside a <ul> element, and there is more than one <ul> in the page, so we need to locate the one with class 'pli'; each image address then sits in the src attribute of an <img> tag. We can use the BeautifulSoup library to parse the page and extract this information:
soup = BeautifulSoup(html, 'html.parser')
all_img = soup.find('ul', class_='pli').find_all('img')
for img in all_img:
    src = img['src']
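To see these four lines in action, here is a minimal, self-contained sketch; the HTML fragment below is invented for illustration and only mimics the gallery page's structure:

```python
from bs4 import BeautifulSoup

# An invented HTML fragment mimicking the gallery page's structure
html = """
<ul class="pli">
  <li><img src="http://img.ivsky.com/img/bizhi/t/1.jpg"></li>
  <li><img src="http://img.ivsky.com/img/bizhi/t/2.jpg"></li>
</ul>
<ul class="other">
  <li><img src="http://example.com/ignored.jpg"></li>
</ul>
"""

soup = BeautifulSoup(html, 'html.parser')
# find() with class_='pli' matches only the gallery <ul>,
# so the 'other' list is skipped
all_img = soup.find('ul', class_='pli').find_all('img')
srcs = [img['src'] for img in all_img]
print(srcs)
```

Only the two images inside the 'pli' list are collected; the third <img> never appears in the result.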
We use the requests library to fetch the page:
def getHtmlurl(url):  # fetch the page
    try:
        r = requests.get(url)
        r.raise_for_status()
        r.encoding = r.apparent_encoding
        return r.text
    except:
        return ""
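The bare except above hides the cause of a failure, and the request has no timeout, so an unresponsive server would hang the crawler forever. A slightly more defensive sketch (the timeout value and the exception class are my own additions, not from the original):

```python
import requests

def getHtmlurl(url):
    """Fetch a page and return its text, or '' on any request failure."""
    try:
        # timeout keeps the crawler from hanging on a dead server
        r = requests.get(url, timeout=10)
        r.raise_for_status()
        r.encoding = r.apparent_encoding
        return r.text
    except requests.RequestException:
        return ""

# A malformed URL fails fast without touching the network
print(getHtmlurl('not-a-url'))  # prints an empty string
```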
Then we download each image and save it locally:
try:  # create the directory if needed, then download the image if it doesn't exist yet
    if not os.path.exists(root):
        os.mkdir(root)
    if not os.path.exists(path):
        r = requests.get(img_url)
        with open(path, 'wb') as f:
            f.write(r.content)
        print("file saved successfully")
    else:
        print("file already exists")
except:
    print("failed to crawl")
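One gotcha worth flagging: if a page's <img> tags carry relative paths, requests.get(img_url) will fail on them. The standard library's urljoin resolves a relative src against the page URL and leaves absolute ones untouched (the paths below are made up for illustration):

```python
from urllib.parse import urljoin

page_url = 'http://www.ivsky.com/bizhi/yourname_v39947/'

# A relative src is resolved against the page's site root
rel = urljoin(page_url, '/img/bizhi/t/pic.jpg')
# An absolute src passes through unchanged
abs_src = urljoin(page_url, 'http://img.ivsky.com/img/bizhi/t/pic.jpg')

print(rel)
print(abs_src)
```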
The overall crawler framework and approach:
import requests
from bs4 import BeautifulSoup
import os

def getHtmlurl(url):  # fetch the page
    pass

def getpic(html):  # extract the image addresses and download them
    pass

def main():  # main function
    pass
The complete code:
import requests
from bs4 import BeautifulSoup
import os

def getHtmlurl(url):  # fetch the page
    try:
        r = requests.get(url)
        r.raise_for_status()
        r.encoding = r.apparent_encoding
        return r.text
    except:
        return ""

def getpic(html):  # extract the image addresses and download them
    soup = BeautifulSoup(html, 'html.parser')
    all_img = soup.find('ul', class_='pli').find_all('img')
    for img in all_img:
        src = img['src']
        img_url = src
        print(img_url)
        root = 'D:/pic/'
        path = root + img_url.split('/')[-1]
        try:  # create the directory if needed, then download the image if it doesn't exist yet
            if not os.path.exists(root):
                os.mkdir(root)
            if not os.path.exists(path):
                r = requests.get(img_url)
                with open(path, 'wb') as f:
                    f.write(r.content)
                print("file saved successfully")
            else:
                print("file already exists")
        except:
            print("failed to crawl")

def main():
    url = 'http://www.ivsky.com/bizhi/yourname_v39947/'
    html = getHtmlurl(url)
    getpic(html)

main()
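If the site ever starts rejecting requests' default User-Agent (many image hosts do), a browser-like header can be attached through a Session. This is a sketch of the idea only; the header string is an arbitrary example, and nothing is actually sent here:

```python
import requests

# An arbitrary browser-like User-Agent string (example only)
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)'}

s = requests.Session()
s.headers.update(headers)

# Prepare (but don't send) a request to inspect the headers it would carry
req = requests.Request('GET', 'http://www.ivsky.com/bizhi/yourname_v39947/')
prepped = s.prepare_request(req)
print(prepped.headers['User-Agent'])
```

Every request made through the session (s.get, s.post, ...) would then carry this header automatically.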
Run the code:
We can see the images saved locally. This is a simple example; feel free to try it yourself.