Python crawler 5--Crawl and download images of a specified size from a web page

Source: Internet
Author: User

After reading the previous document, we have a basic understanding of regular expressions. In fact, the most effective way to learn is to work with a concrete question and purpose, so here we set ourselves a goal: get the link addresses of images of a specified size on a page, and download them to the local disk.

First, the implementation steps:

1. Open a webpage in the browser, for example: http://tieba.baidu.com/p/4691693167

2. Suppose we want to download the few large images on the page. For that we need each image's URL, which actually takes two steps: first, find the URL of an image itself; second, look at the HTML content of the current page to find the markup that contains this URL, so that we can filter for it with a regular expression:

Getting the image address is simple: right-click the image and select Properties to see it:

Copy this address, close the Properties dialog, and press F12 to view the HTML content of the current page. Search for the URL you just copied, and you will find it:
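If you prefer to do this check from Python rather than the browser's developer tools, a minimal sketch (run under the same Python 2 environment the rest of this article uses, and assuming the page is reachable) is:

import urllib

# fetch the page source and print every line that mentions a .jpg,
# to see what markup surrounds the image URL we copied
html = urllib.urlopen("http://tieba.baidu.com/p/4691693167").read()
for line in html.split('\n'):
    if '.jpg' in line:
        print(line)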

3. The regular expression is designed as r'src="(.+?\.jpg)" width', where width is extra context used to filter out image URLs that do not match the desired size; it acts as additional filtering information.
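For example (the HTML fragment below is a made-up illustration, not the actual page content), applying this pattern keeps only the .jpg links that are immediately followed by a width attribute:

import re

reg = r'src="(.+?\.jpg)" width'
sample = ('<img src="http://example.com/a.jpg" width="560">'
          '<img src="http://example.com/b.png" width="560">'
          '<img src="http://example.com/c.jpg" height="30">')
# only a.jpg matches: b.png has the wrong extension, c.jpg is not followed by a width attribute
print(re.findall(reg, sample))    # ['http://example.com/a.jpg']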

Second, download the images and save them locally:

In fact, the urllib library already includes such a method, urllib.urlretrieve(), which saves remote data directly to the local disk, for example:

urllib.urlretrieve(imgurl, '%s.jpg' % name)

imgurl is the URL of the target image, and name is the file name under which the image is saved locally.
Since more than one image URL may be retrieved, call urllib.urlretrieve() in a loop to download every image that matches the specification.

Third, the implementation code:

# encoding: utf-8
import urllib
import re

def gethtml(url):
    response = urllib.urlopen(url)
    html = response.read()
    return html

# get the HTML content of the destination URL
html = gethtml("http://tieba.baidu.com/p/4691693167")

# get the URLs of the pictures and download them locally
def getimg(html):
    reg = r'src="(.+?\.jpg)" width'
    imgre = re.compile(reg)
    imglist = re.findall(imgre, html)
    x = 0
    # download each matching picture in a loop
    for imgurl in imglist:
        urllib.urlretrieve(imgurl, '%s.jpg' % x)
        x += 1

# start downloading the pictures
getimg(html)

It is not difficult to see that the key point is still the design of the regular expression that filters the target information. When the script above is run, the target images are saved to the directory where the script is located:
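The script above is written for Python 2, where urlopen() and urlretrieve() live directly in the urllib module. As a hedged sketch, an equivalent under Python 3 (where both moved into urllib.request) might look like this; it keeps the same regular expression and naming scheme:

# encoding: utf-8
import re
import urllib.request

def gethtml(url):
    # fetch the page and decode the response bytes into text
    response = urllib.request.urlopen(url)
    return response.read().decode('utf-8', errors='ignore')

def getimg(html):
    # keep only .jpg links that are followed by a width attribute
    reg = r'src="(.+?\.jpg)" width'
    imglist = re.findall(reg, html)
    for x, imgurl in enumerate(imglist):
        # save the images as 0.jpg, 1.jpg, ... in the current directory
        urllib.request.urlretrieve(imgurl, '%s.jpg' % x)

html = gethtml("http://tieba.baidu.com/p/4691693167")
getimg(html)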
