This goes back quite a while, to when I had just started playing with Ubuntu. The stock Ubuntu wallpapers are not very good, so I went looking for something prettier to replace them. I also wanted the wallpaper to rotate like a slideshow, but Ubuntu, unlike Windows, has no such feature built in, so I had to search the internet for a solution. I finally came across Variety on the Ubuntu forums, and it is really cool: you can edit its image sources, and it ships with a few by default, one of which was Wallhaven (the site went by a different name back then and was later renamed Wallhaven). So I went over to browse Wallhaven, and my eyes lit up: it completely satisfied my greedy eyes. It is just beautiful, with all kinds of nature photography and design art. A real feast.
Later on, whenever I got tired of my wallpaper under Windows, I would go to Wallhaven, pick something new, and cure the aesthetic fatigue. But doing it by hand every time was a hassle: download, set, repeat. Was there no way to do it once and for all? That is when the crawler came to mind. A crawler needs a target, and the target was obviously that beautiful paradise, Wallhaven. My earlier downloads had taught me that Wallhaven stores its images under sequential numbers in its database, which makes everything much simpler: to download all the pictures there is no HTML to parse, the URLs are almost identical, and only the ***.jpg part changes, so a simple for loop does the job. Enough talk; to borrow Linus's words, "Talk is cheap. Show me the code." Let us look at the code directly:
import urllib2
from urllib2 import HTTPError, URLError
import sys
import time

print "The start time: " + time.ctime()
x = input("Start value: ")
y = input("Stop value: ")
k = max(y - x, 1)  # avoid division by zero when x == y
for i in range(x, y + 1):
    url = 'https://wallpapers.wallhaven.cc/wallpapers/full/wallhaven-' + str(i) + '.jpg'
    req = urllib2.Request(url)
    # Pretend to be a browser; see the User-Agent discussion below.
    req.add_header("User-Agent", "Mozilla 5.10")
    name = 'h:\\pic\\' + str(i) + '.jpg'
    try:
        conn = urllib2.urlopen(req)
        f = open(name, 'wb')
        f.write(conn.read())
        f.close()
        if k < 100:
            # Few images: one bar character per image fits on the line.
            sys.stdout.write("Computing: [%s%s] %.2f%%\r"
                             % ('#' * (i - x), '-' * (y - i), (i - x) * 100.0 / k))
            sys.stdout.flush()
            time.sleep(0.01)
        else:
            # Many images: scale the bar down to about 50 characters.
            sys.stdout.write("Computing: [%s%s] %.2f%%\r"
                             % ('#' * ((i - x) / (k / 50)), '-' * ((y - i) / (k / 50)),
                                (i - x) * 100.0 / k))
            sys.stdout.flush()
            time.sleep(0.01)
    except HTTPError, e:
        continue  # this image is missing on the server; skip to the next one
    except URLError, e:
        print "Something is wrong with the server!"
        break
print "\nThe finished time: " + time.ctime()
print 'Pic saved'
The download itself uses the Python module urllib2. I actually started with urllib, but suddenly hit a problem: access to the site was forbidden. It took me a long time to track this down. When I noticed the images had stopped downloading, I assumed the module itself was broken, so I searched Baidu for a random image and tested against its URL, and that one downloaded fine. Now I was dumbfounded. Why? The browser could open the Wallhaven URL with no trouble at all. With no better idea, I printed the response from urllib.urlopen(url), and there was the answer, written right in it: "The site's owner banned your access to this website." So that was it: the site was blocking me while still letting browsers in. This comes down to the User-Agent, the string that identifies the client; the site only allows visitors that arrive through a browser, so connecting with a bare urlopen was never going to work. The fix is simple: disguise ourselves as a browser. But for that we cannot use urllib, because it has no Request object; it only accepts a URL, which means there is no way to set the User-Agent with it. So we have to use urllib2, whose Request lets us set HTTP headers, spoof the User-Agent, and fool the site. Sure enough, it worked.
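For reference, the header trick on its own is tiny. A minimal sketch, assuming Python 2 and a placeholder URL:

import urllib2

# A bare urlopen announces itself as "Python-urllib", which some sites reject.
# Setting the User-Agent header makes the request look like it came from a browser.
req = urllib2.Request('https://example.com/some-image.jpg')  # placeholder URL
req.add_header('User-Agent', 'Mozilla/5.0')
data = urllib2.urlopen(req).read()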
What is the k < 100 check in the code above for? It is really a matter of line width in the terminal. When k is less than 100, the progress string, at one character per image, fits within the width of the line. When k is greater, we have to rescale: otherwise the string grows as wide as k, the current line runs out of room, the text wraps onto the next line, and the \r, which only rewinds to the start of the current line, keeps overwriting the wrong one, so the output looks a mess. The else branch therefore squeezes the bar down to roughly 50 characters. It is purely for looks, nothing more.
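As a standalone illustration of the scaled version, here is a small sketch with a made-up total of 500 items (Python 2; the numbers are hypothetical):

import sys
import time

k = 500      # hypothetical total number of items
WIDTH = 50   # fixed bar width, so long runs never wrap the line

for i in range(k + 1):
    done = i * WIDTH / k  # Python 2 integer division: how many '#' to draw
    sys.stdout.write("Computing: [%s%s] %.2f%%\r"
                     % ('#' * done, '-' * (WIDTH - done), i * 100.0 / k))
    sys.stdout.flush()
    time.sleep(0.01)
print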
The code above handles two kinds of exceptions: HTTP errors and URL errors. While downloading I found that an HTTP request would occasionally fail, and the program would quit over that single error. That is far too fragile: restarting the program by hand after every exception is not something a programmer should be doing; the program should sort these things out for us. Python treats exceptions as objects you can catch, so handling them is a piece of cake.
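The split between the two handlers matters: HTTPError means this one image failed (say, a 404 for a deleted wallpaper), while URLError means the connection itself is broken. Note that HTTPError is a subclass of URLError, so it has to be caught first. A hypothetical helper (fetch is my name for it, not part of the script above) makes the policy explicit:

import urllib2
from urllib2 import HTTPError, URLError

def fetch(url):
    # Hypothetical helper sketch, Python 2.
    req = urllib2.Request(url)
    req.add_header('User-Agent', 'Mozilla/5.0')
    try:
        return urllib2.urlopen(req).read()
    except HTTPError:
        return None  # just this image is gone: skip it and keep crawling
    except URLError:
        raise        # the network or server is down: let the caller stop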
That is the whole program: just over 30 lines, and it meets our needs. If you do not care how it looks, the code can be streamlined even further; a rough condensation follows, and after that, a look at the downloaded images.
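Stripped of the progress bar, the essentials fit in about a dozen lines. This condensation is mine, not from the original script, and it swallows errors wholesale just to keep going:

import urllib2

x = input("Start value: ")
y = input("Stop value: ")
for i in range(x, y + 1):
    url = 'https://wallpapers.wallhaven.cc/wallpapers/full/wallhaven-' + str(i) + '.jpg'
    req = urllib2.Request(url)
    req.add_header("User-Agent", "Mozilla 5.10")
    try:
        open('h:\\pic\\' + str(i) + '.jpg', 'wb').write(urllib2.urlopen(req).read())
    except Exception:
        continue  # skip anything that fails and move on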
By the time I wrote this post I had downloaded more than 13,000 images, but some of them are a bit too explicit, so I want to delete those. There are far too many to do it by hand, though, so I plan to batch-process them with image recognition. As for how, see my next post.
Copyright notice: this is an original post; please do not reproduce it without the author's permission.
Download Wallhaven images in bulk with Python