Write a Python crawler to grab GIF images from Baozou comics

This article shows how to write a crawler that downloads the GIF images from Baozou comics (baozoumanhua.com) for offline viewing. The crawler is developed with Python 3.3 and mainly uses the urllib.request, os, and bs4 (BeautifulSoup) modules.

The urllib module provides a high-level interface for retrieving data from the World Wide Web. Opening a URL with urlopen() is much like opening a file with Python's built-in open(). The difference is that urlopen() takes a URL rather than a local file name, and the stream it returns cannot seek (under the hood the read happens over a socket, and sockets do not support seek operations).
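A minimal sketch of that difference follows; the URL http://example.com/ and the file name page.html are placeholders for this illustration, not part of the crawler:

import urllib.request

# urlopen() takes a URL and returns a file-like object backed by a socket
response = urllib.request.urlopen("http://example.com/")
data = response.read()          # read it like a file; returns bytes
print(response.seekable())      # False: a socket stream cannot seek

# the built-in open() takes a local file name, and that stream can seek
with open("page.html", "wb") as f:
    f.write(data)
with open("page.html", "rb") as f:
    print(f.seekable())         # True
    f.seek(0)                   # allowed on a local file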

Python's BeautifulSoup module helps you parse HTML and XML.

Writing a web crawler means first fetching the HTML source of a page and then analyzing it to extract the content you want. For pages with simple content, a little matching with the regular-expression re module is usually enough. Once the HTML gets complicated, though, a pure-re approach becomes difficult or even impossible to write. Letting BeautifulSoup analyze the HTML source makes the process far simpler and greatly improves efficiency.
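As a quick illustration, here is a small sketch of that convenience; the HTML snippet is made up for the example, while the real crawler below fetches its HTML from the site:

import bs4

# a made-up snippet standing in for a fetched page
html = '<img src="/a.gif" alt="one" style="width:460px"><img src="/b.png" alt="two">'

soup = bs4.BeautifulSoup(html, 'html.parser')   # build a parse tree
for img in soup.findAll('img', attrs={'style': 'width:460px'}):
    print(img['src'], img['alt'])               # no regular expression needed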
Note: BeautifulSoup is a third-party library; I use bs4. In Python 3, urllib2 was folded into urllib.request. The Python documentation puts it this way:
Note: The urllib2 module has been split across several modules in Python 3 named urllib.request and urllib.error.
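In practice the split mostly changes the import lines. A small sketch, using the same site this crawler targets (which may well be unreachable today, making the error branch easy to trigger):

import urllib.request
import urllib.error

# urllib2.urlopen() and urllib2.URLError now live in two separate modules
try:
    urllib.request.urlopen("http://baozoumanhua.com/gif/year1")
except urllib.error.URLError as e:
    print("request failed:", e.reason)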
The crawler source code is as follows:

# -*- coding: utf-8 -*-
import urllib.request
import bs4, os

page_sum = 1  # set the number of pages to download

path = os.getcwd()
path = os.path.join(path, 'runaway GIF')
if not os.path.exists(path):
    os.mkdir(path)  # create the folder

url = "http://baozoumanhua.com/gif/year"  # url address
headers = {  # pretend to be a browser
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) '
                  'Chrome/32.0.1700.76 Safari/100'
}

for count in range(page_sum):
    req = urllib.request.Request(url=url + str(count + 1), headers=headers)
    print(req.full_url)
    content = urllib.request.urlopen(req).read()
    soup = bs4.BeautifulSoup(content)  # parse the page with BeautifulSoup
    img_content = soup.findAll('img', attrs={'style': 'width:460px'})
    url_list = [img['src'] for img in img_content]    # list comprehension: image urls
    title_list = [img['alt'] for img in img_content]  # image names
    for i in range(len(url_list)):
        imgurl = url_list[i]
        filename = path + os.sep + title_list[i] + ".gif"
        print(filename + ": " + imgurl)               # print the download info
        urllib.request.urlretrieve(imgurl, filename)  # download the image
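One possible refinement, not part of the original script: if a single image URL is dead, urlretrieve() raises an exception and aborts the whole run. A sketch of a more forgiving inner loop, reusing url_list, title_list, and path from the script above:

import urllib.error  # add this import alongside urllib.request

for i in range(len(url_list)):
    filename = path + os.sep + title_list[i] + ".gif"
    try:
        urllib.request.urlretrieve(url_list[i], filename)  # download one image
    except urllib.error.URLError as e:                     # network or HTTP failure
        print("skipped " + url_list[i] + ": " + str(e))
    except OSError as e:                                   # e.g. an illegal file name
        print("skipped " + title_list[i] + ": " + str(e))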

You can change the number of pages to download by editing the page_sum value near the top of the script. Save the file as baozougif.py and run it with the command python baozougif.py; a folder named "runaway GIF" is created in the same directory, and all the images are downloaded into it automatically.
