Batch downloading wallpapers with Python: a code example


Project address: https://github.com/jrainlau/wallpaper-downloader

Preface

I haven't written anything for a while, because I have recently been adapting to a new job and learning Python in my spare time. This article is a summary of my latest stage of learning Python: I developed a crawler that batch-downloads high-definition wallpapers from a website.

Note: the project in this article is for Python learning only and must not be used for any other purpose!

Initialize a project

The project uses virtualenv to create a virtual environment and avoid polluting the global environment. Install it directly with pip3:

pip3 install virtualenv

Create a wallpaper-downloader directory, then use virtualenv to create a virtual environment named venv and activate it:

virtualenv venv
. venv/bin/activate

Next, create the dependency file (pip expects one requirement per line):

printf 'bs4\nlxml\nrequests\n' > requirements.txt

Finally, download and install the dependencies:

pip3 install -r requirements.txt

Analysis of crawler steps

For simplicity, start from the wallpaper list page for the "aero" category: http://wallpaperswide.com/aer ....

We can see that there are 10 wallpapers available for download on this page. However, these are only thumbnails, whose resolution is far from sufficient for a wallpaper. So we need to go to the wallpaper details page and find the HD download link there. Clicking the first wallpaper opens its details page.

Because my machine has a Retina screen, I plan to download the largest size directly to guarantee high definition (the size circled in red in the screenshot).

Once the steps are clear, use the developer tools to locate the corresponding DOM nodes and extract the corresponding URLs. I won't walk through that process here; you can try it yourself. Let's move on to the coding.

Access page

Create a download.py file and import two libraries:

from bs4 import BeautifulSoup
import requests

Next, write a function specifically used to access the url and then return the html of the page:

def visit_page(url):
    headers = {
        'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.108 Safari/537.36'
    }
    r = requests.get(url, headers=headers)
    r.encoding = 'utf-8'
    soup = BeautifulSoup(r.text, 'lxml')
    return soup

To avoid triggering the site's anti-crawling mechanism, we add a User-Agent header to disguise the crawler as a normal browser, then specify UTF-8 encoding, and finally return the parsed page as a BeautifulSoup object.

Extract Link

After obtaining the page's HTML, we need to extract the URLs of the wallpapers listed on it:

def get_paper_link(page):
    links = page.select('#content > div > ul > li > div > div a')
    collect = []
    for link in links:
        collect.append(link.get('href'))
    return collect

This function extracts the URLs of all the wallpaper details on the list page.
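
For reference, here is a minimal usage sketch of how the two functions fit together. The URL is an assumption that follows the pattern of the constants defined later in the article:

# Hypothetical usage sketch: fetch the first "aero" list page and
# print the links to the wallpaper details pages found on it.
page = visit_page('http://wallpaperswide.com/aero-desktop-wallpapers/page/1')
for href in get_paper_link(page):
    print(href)  # relative links to the wallpaper details pages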

Download Wallpaper

With the details page's address, we can pick a suitable size. Analyzing the page's DOM structure shows that each size corresponds to a link.

So the first step is to extract the links corresponding to these sizes:

wallpaper_source = visit_page(link)
wallpaper_size_links = wallpaper_source.select('#wallpaper-resolutions > a')
size_list = []
for link in wallpaper_size_links:
    href = link.get('href')
    size_list.append({
        'size': eval(link.get_text().replace('x', '*')),
        'name': href.replace('/download/', ''),
        'url': href
    })

size_list is a collection of these links. To make it easy to pick the highest-definition (largest) wallpaper in the next step, for size I used eval to compute the resolution: for example, '5120x3200' becomes the expression 5120*3200, whose value is stored as size.
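
Using eval on scraped text works here, but it executes arbitrary strings from the page. A safer equivalent (my suggestion, not part of the original code) parses the resolution explicitly:

# Safer, eval-free alternative: parse a resolution string such as
# '5120x3200' into the product 5120 * 3200.
def parse_size(text):
    width, height = text.strip().split('x')
    return int(width) * int(height)

# e.g. 'size': parse_size(link.get_text())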

After collecting all of them, use max() to select the highest-definition one:

biggest_one = max(size_list, key = lambda item: item['size'])

The url field of this biggest_one is the download link for that size. Download the linked resource with the requests library:

result = requests.get(PAGE_DOMAIN + biggest_one['url'])
if result.status_code == 200:
    open('wallpapers/' + biggest_one['name'], 'wb').write(result.content)

Note: you need to create the wallpapers directory first, otherwise the script raises an error at runtime.
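
Alternatively, the script can create the directory itself at startup; a small sketch using the standard library:

import os

# Create the output directory up front so the open(..., 'wb') call
# cannot fail because the folder is missing.
os.makedirs('wallpapers', exist_ok=True)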

The complete download_wallpaper function looks like this:

def download_wallpaper(link):
    wallpaper_source = visit_page(PAGE_DOMAIN + link)
    wallpaper_size_links = wallpaper_source.select('#wallpaper-resolutions > a')
    size_list = []
    for link in wallpaper_size_links:
        href = link.get('href')
        size_list.append({
            'size': eval(link.get_text().replace('x', '*')),
            'name': href.replace('/download/', ''),
            'url': href
        })
    biggest_one = max(size_list, key=lambda item: item['size'])
    result = requests.get(PAGE_DOMAIN + biggest_one['url'])
    if result.status_code == 200:
        open('wallpapers/' + biggest_one['name'], 'wb').write(result.content)

Batch run

The above steps can only download the first wallpaper of the first list page. To download all the wallpapers of multiple list pages, we need to call these methods in a loop. First, define a few constants:

import sys

if len(sys.argv) != 4:
    print('3 arguments were required but only ' + str(len(sys.argv) - 1) + ' were found!')
    exit()

category = sys.argv[1]

try:
    page_start = [int(sys.argv[2])]
    page_end = int(sys.argv[3])
except ValueError:
    print('The second and third arguments must be numbers, not strings!')
    exit()

Here three constants are read from the command-line parameters: category, page_start, and page_end, corresponding to the wallpaper category, the start page, and the end page respectively. Note that page_start is stored as a one-element list so that the start() function defined below can update it between pages.
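
As an aside, the same argument handling could also be written with the standard library's argparse. A minimal sketch, offered as an alternative rather than part of the original project:

import argparse

# Hypothetical argparse equivalent of the sys.argv handling above.
parser = argparse.ArgumentParser(description='Batch-download wallpapers.')
parser.add_argument('category', help='wallpaper category, e.g. aero')
parser.add_argument('page_start', type=int, help='first list page to fetch')
parser.add_argument('page_end', type=int, help='last list page to fetch')
args = parser.parse_args()  # exits with a usage message on bad input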

For convenience, define two url-related constants:

PAGE_DOMAIN = 'http://wallpaperswide.com'
PAGE_URL = 'http://wallpaperswide.com/' + category + '-desktop-wallpapers/page/'

Now we can happily run the batch operation. Before that, define a start() entry function:

def start():
    if page_start[0] <= page_end:
        print('Preparing to download the ' + str(page_start[0]) + ' page of all the "' + category + '" wallpapers...')
        PAGE_SOURCE = visit_page(PAGE_URL + str(page_start[0]))
        WALLPAPER_LINKS = get_paper_link(PAGE_SOURCE)
        page_start[0] = page_start[0] + 1
        for index, link in enumerate(WALLPAPER_LINKS):
            download_wallpaper(link, index, len(WALLPAPER_LINKS), start)

Then rewrite download_wallpaper to accept the wallpaper's index, the total count, and a callback. After the last wallpaper on a page has been downloaded, it invokes the callback (start itself), which moves on to the next page:

def download_wallpaper(link, index, total, callback):
    wallpaper_source = visit_page(PAGE_DOMAIN + link)
    wallpaper_size_links = wallpaper_source.select('#wallpaper-resolutions > a')
    size_list = []
    for link in wallpaper_size_links:
        href = link.get('href')
        size_list.append({
            'size': eval(link.get_text().replace('x', '*')),
            'name': href.replace('/download/', ''),
            'url': href
        })
    biggest_one = max(size_list, key=lambda item: item['size'])
    print('Downloading the ' + str(index + 1) + '/' + str(total) + ' wallpaper: ' + biggest_one['name'])
    result = requests.get(PAGE_DOMAIN + biggest_one['url'])
    if result.status_code == 200:
        open('wallpapers/' + biggest_one['name'], 'wb').write(result.content)
    if index + 1 == total:
        print('Download completed!\n\n')
        callback()
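
One caveat about this design: start() and download_wallpaper() invoke each other once per page, so a very large page range could in theory hit Python's recursion limit. A plain loop (my variation, not the original code) avoids that:

def start_iterative():
    # Iterate over the pages directly instead of recursing through
    # the callback; a no-op lambda stands in for the callback.
    for page in range(page_start[0], page_end + 1):
        print('Preparing to download the ' + str(page) + ' page of all the "' + category + '" wallpapers...')
        page_source = visit_page(PAGE_URL + str(page))
        wallpaper_links = get_paper_link(page_source)
        for index, link in enumerate(wallpaper_links):
            download_wallpaper(link, index, len(wallpaper_links), lambda: None)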

Finally, specify the startup rule:

if __name__ == '__main__':
    start()

Run the project

Enter the following command to test it:

python3 download.py aero 1 2

You should see output reporting the download progress. Capturing the traffic with Charles also shows the script running smoothly.

With the download script finished, you no longer have to worry about running out of wallpapers!

That is all the content of this article. If you have any questions, feel free to discuss them in the comments below.
