Batch-downloading web page images with Python: a detailed tutorial

Source: Internet
Author: User

Many of my friends have been searching online for ways to download images in batches ~ What they found was messy. So here is how to batch-download images with Python.

Official Python 32bit Installation

Http://www.6686.com/soft/19115.html

Latest official version of Python 64-bit

Http://www.6686.com/soft/19116.html


Objective: crawl many pages of a website. Each page holds many links, and each link leads to many images. Each page gets its own folder, and that folder contains one subfolder per link.
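The folder layout in the objective could be built along these lines. This is only a minimal Python 3 sketch: the function name `link_folder` and the `page_NNN` / `link_NNN` naming scheme are assumptions made for illustration, since the original does not say how the folders are named.

```python
import os


def link_folder(root, page_no, link_no):
    """Create and return the folder for one link on one page.

    Hypothetical layout: <root>/page_001/link_001
    """
    folder = os.path.join(root, "page_%03d" % page_no, "link_%03d" % link_no)
    os.makedirs(folder, exist_ok=True)  # also creates the page folder if missing
    return folder
```

Calling `link_folder("images", 1, 2)` creates `images/page_001/link_002` if it does not exist yet and returns that path, so the images found behind the link can be saved there.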

Step 1: collect all the links on a page, visit each link, and collect the image addresses found there.

Step 2: download each image from its address.
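Step 1 can be sketched with the standard library's HTML parser. The original does not show how the links and image addresses were extracted, so everything here is an assumption: the code is Python 3 (the tutorial's own snippets use Python 2's urllib2), and the names `LinkAndImageParser` and `extract` are made up for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkAndImageParser(HTMLParser):
    """Collect absolute <a href> and <img src> URLs from one HTML page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self.images = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            # urljoin turns relative addresses into absolute ones
            self.links.append(urljoin(self.base_url, attrs["href"]))
        elif tag == "img" and attrs.get("src"):
            self.images.append(urljoin(self.base_url, attrs["src"]))


def extract(html, base_url):
    """Return (links, images) found in the given HTML text."""
    parser = LinkAndImageParser(base_url)
    parser.feed(html)
    return parser.links, parser.images
```

Run `extract` once on the listing page to get the links, then once on each linked page to get the image addresses that Step 2 downloads.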

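Downloading a JPG image is straightforward.

    import urllib2  # Python 2 standard library

    # Fetch the image bytes, then write them to disk in binary mode.
    response = urllib2.urlopen(url)
    data = response.read()
    with open(path, "wb") as jpg:
        jpg.write(data)
    response.close()

Here url is the image address and path is the local path the image is saved to.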

After completing this step, you can simply download images in batches.
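A batch loop over many image addresses might look like the sketch below. It is written for Python 3 (`urllib.request` is the Python 3 counterpart of urllib2), and the names `filename_from_url`, `download_all`, and the `fetch` parameter are assumptions introduced for this example, not part of the original program.

```python
import os
from urllib.parse import urlsplit
from urllib.request import urlopen


def filename_from_url(url, index):
    """Derive a local file name from the URL path, falling back to the index."""
    name = os.path.basename(urlsplit(url).path)
    return name or "image_%d.jpg" % index


def download_all(urls, folder, fetch=None):
    """Save every URL into folder and return the list of saved paths.

    fetch is a hypothetical hook (url -> bytes) so the loop can be tried
    without network access; by default it reads the URL with urlopen.
    """
    if fetch is None:
        def fetch(url):
            with urlopen(url) as response:
                return response.read()

    saved = []
    for index, url in enumerate(urls):
        path = os.path.join(folder, filename_from_url(url, index))
        with open(path, "wb") as f:
            f.write(fetch(url))
        saved.append(path)
    return saved
```

The file name comes from the last segment of the URL path, so two URLs ending in the same name would overwrite each other; a real run would need a rule for that case.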

However, a couple of problems came up during the download.

1. The download is slow.

When we open the website in a browser, the images load reasonably fast. Downloading an image this way, however, sometimes takes a long time and sometimes is very fast.

2. The download hangs partway through.

It just sits there, and there is no telling when, or whether, an error will be raised.

The improvements are as follows.

    import socket
    import time
    import urllib2

    # Set the timeout in seconds; place this at the beginning of the program.
    timeout = 60
    socket.setdefaulttimeout(timeout)

    # When downloading an image:
    time.sleep(10)  # sleep first, then read the data
    # The variable is named response so that it does not shadow the socket module.
    response = urllib2.urlopen(urllib2.Request(imgurl))
    data = response.read()
    response.close()
    ...
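For readers on Python 3, the same idea can be sketched with a per-request timeout and basic exception handling. The original only mentions that exception handling was added later and does not show it, so the function name `fetch_image` and the choice of which exceptions to catch are assumptions.

```python
import socket
import urllib.error
import urllib.request


def fetch_image(imgurl, timeout=60):
    """Return the image bytes, or None if the request fails or times out."""
    try:
        # This timeout applies to this request only,
        # unlike socket.setdefaulttimeout, which is process-wide.
        with urllib.request.urlopen(imgurl, timeout=timeout) as response:
            return response.read()
    except (urllib.error.URLError, socket.timeout, ValueError) as exc:
        # URLError: unreachable or missing resource; ValueError: malformed URL
        print("failed: %s (%s)" % (imgurl, exc))
        return None
```

Returning None instead of raising lets a batch loop skip a bad address and carry on with the next one.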

In practice, this change made little noticeable difference at the time, so I added one more thing: multithreading.

Python offers several ways to implement multithreading. You can read about them in this blog.

Here I subclass threading.Thread to implement multithreading.

I override the run method and start one thread for every image download (which is probably not the best approach ......).

    # Download is the threading.Thread subclass described above.
    thread = Download()
    thread.imgurl = imgurl
    thread.path = path
    thread.start()
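The Download class itself is not shown in the original. One possible shape for it, written for Python 3, is sketched below. The body of `run`, the `error` attribute, and the helper `download_in_threads` are assumptions; only the class name and the `imgurl` and `path` attributes come from the text.

```python
import threading
import urllib.request


class Download(threading.Thread):
    """One thread per image, as in the text; imgurl and path are set by the caller."""

    imgurl = None
    path = None
    error = None  # hypothetical attribute: holds the exception if the download fails

    def run(self):
        try:
            with urllib.request.urlopen(self.imgurl, timeout=60) as response:
                data = response.read()
            with open(self.path, "wb") as jpg:
                jpg.write(data)
        except Exception as exc:  # keep one bad URL from killing the batch
            self.error = exc


def download_in_threads(jobs):
    """Start one Download thread per (imgurl, path) pair and wait for all of them."""
    threads = []
    for imgurl, path in jobs:
        thread = Download()
        thread.imgurl = imgurl
        thread.path = path
        thread.start()
        threads.append(thread)
    for thread in threads:
        thread.join()
    return threads
```

Starting one thread per image is simple but unbounded. For a large crawl, capping the number of threads running at once would be a gentler choice, which matches the author's own doubt about this approach.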

With multithreading in place, the whole program sped up dramatically. I downloaded more than 100 MB of images in a short time!

I did have one concern at the beginning: does a sleeping thread still quietly consume system time? Take a look at the experiment in this article.

In other words, when 10 threads each sleep for 10 seconds, the whole run takes about 10 seconds in total, not 100.
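The claim that sleeping threads overlap rather than add up is easy to check. The sketch below is not the experiment from the referenced article, just a small Python 3 timing test; the function name `timed_sleepers` is an assumption.

```python
import threading
import time


def timed_sleepers(count, seconds):
    """Run count threads that each sleep for seconds; return total elapsed time."""
    threads = [
        threading.Thread(target=time.sleep, args=(seconds,))
        for _ in range(count)
    ]
    start = time.monotonic()
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return time.monotonic() - start
```

Calling `timed_sleepers(10, 10)` reproduces the case described above. If the sleeps overlap, the returned value should be close to 10 seconds rather than 100.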

The images downloaded very quickly, even though some malformed URLs turned up along the way. (I added some exception handling later.)

Soon, though, exceptions started appearing and a large number of images failed to download.

After a long investigation, I found the cause: the disk had run out of space ......

So I moved to an idle partition with 10 GB free, restarted the download, and improved the exception handling.
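A free-space check before each batch would surface this kind of failure early. This is a minimal Python 3 sketch using `shutil.disk_usage`; the function name `has_free_space` and the idea of checking against a byte threshold are assumptions, not something the original program did.

```python
import shutil


def has_free_space(folder, needed_bytes):
    """Return True if the disk holding folder has at least needed_bytes free."""
    return shutil.disk_usage(folder).free >= needed_bytes
```

For example, `has_free_space(".", 500 * 1024 * 1024)` tells whether the current partition still has 500 MB free. The threshold is arbitrary and would need tuning to the size of the crawl.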

By the end, 8 GB had been downloaded. I don't know whether the traffic was too heavy, but today the network keeps dropping ......

I also tried downloading videos. That feature still needs more exploration.

That's the end of the tutorial ~ I hope it helps ~
