Explaining example code for a Python crawler that downloads GIF images from a comic site

Source: Internet
Author: User
This article explains example code for a Python crawler that downloads GIF images from a comic site. The sample code is written for Python 3 and uses the urllib.request, os, and BeautifulSoup modules; readers who need it can refer to it.

The crawler introduced in this article downloads funny GIF images from the Baozou comic site for convenient offline viewing. It was developed with Python 3.3 and mainly uses the urllib.request, os, and BeautifulSoup modules.

The urllib module provides a high-level interface for fetching data from the World Wide Web. Opening a URL with urllib.request.urlopen() is much like opening a file with Python's built-in open(). The differences are that urlopen() takes a URL rather than a local file name, and that there is no way to seek() on the stream it returns (under the hood it is a socket, which does not support seeking).
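The seek() difference can be sketched with a small, self-contained check (the function name local_file_is_seekable is mine, for illustration; the HTTP side is described in the docstring rather than exercised, to avoid needing network access):

```python
import os
import tempfile

def local_file_is_seekable():
    """open() returns a seekable stream; by contrast, the socket-backed
    response object returned by urllib.request.urlopen() is read-once
    (its seekable() method reports False)."""
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(b"GIF89a...")
        name = f.name
    try:
        with open(name, "rb") as fh:
            fh.seek(3)           # rewinding/seeking works on a local file
            return fh.seekable()
    finally:
        os.remove(name)

print(local_file_is_seekable())  # True
```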

Python's BeautifulSoup module helps you parse HTML and XML.
First of all, writing a web crawler generally means fetching a page's HTML source and then analyzing it to extract the content you want.
For parsing HTML, the ordinary regular-expression re module is basically sufficient when the page content is simple.
But for complex HTML, you will find that parsing with the re module alone is impossible, or at least very hard to get right.
With the BeautifulSoup module doing the HTML parsing for you, the work becomes much simpler and your efficiency at analyzing HTML source improves greatly.
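As a minimal sketch of this (assuming the third-party beautifulsoup4 package is installed, and using a made-up HTML snippet), BeautifulSoup turns the markup into a tree, so an attribute filter replaces a fragile regular expression — this is exactly the kind of extraction the crawler below performs:

```python
import bs4

html = """
<div>
  <img style="width:460px" src="http://example.com/a.gif" alt="funny one">
  <img style="width:460px" src="http://example.com/b.gif" alt="funny two">
</div>
"""

# Parse the markup into a tree, then filter <img> tags by their style attribute.
soup = bs4.BeautifulSoup(html, "html.parser")
imgs = soup.find_all("img", attrs={"style": "width:460px"})
srcs = [img["src"] for img in imgs]
alts = [img["alt"] for img in imgs]
print(srcs)  # ['http://example.com/a.gif', 'http://example.com/b.gif']
print(alts)  # ['funny one', 'funny two']
```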
Note: BeautifulSoup is a third-party library; I am using bs4. urllib2's functionality moved into urllib.request in Python 3; the official documentation puts it this way:
Note: the urllib2 module has been split across several modules in Python 3 named urllib.request and urllib.error.
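The renaming can be verified directly from Python 3's standard library (a quick sketch; in Python 2 one would have written `import urllib2` instead):

```python
# Python 2: import urllib2; urllib2.urlopen(url)
# Python 3: the same functionality lives in urllib.request and urllib.error.
import urllib.error
import urllib.request

print(hasattr(urllib.request, "urlopen"))          # True
print(issubclass(urllib.error.URLError, OSError))  # True (since Python 3.3)
```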
The crawler source code is as follows:

# -*- coding: utf-8 -*-
import os
import urllib.request

import bs4

page_sum = 1  # set the number of pages to download

path = os.getcwd()
path = os.path.join(path, 'baozou gif')
if not os.path.exists(path):
    os.mkdir(path)  # create the download folder

url = "http://baozoumanhua.com/gif/year"  # url address
headers = {  # pretend to be a browser
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 '
                  '(KHTML, like Gecko) Chrome/32.0.1700.76 Safari/537.36'
}

for count in range(page_sum):
    req = urllib.request.Request(url=url + str(count + 1), headers=headers)
    print(req.full_url)
    content = urllib.request.urlopen(req).read()
    soup = bs4.BeautifulSoup(content, 'html.parser')  # parse the page
    img_content = soup.find_all('img', attrs={'style': 'width:460px'})
    url_list = [img['src'] for img in img_content]    # list comprehension: image urls
    title_list = [img['alt'] for img in img_content]  # image names
    for i in range(len(url_list)):
        imgurl = url_list[i]
        filename = path + os.sep + title_list[i] + ".gif"
        print(filename + ":" + imgurl)                # print download info
        urllib.request.urlretrieve(imgurl, filename)  # download the image

You can modify page_sum near the top of the script to change the number of pages downloaded. Save the file as baozougif.py and run it with the command python baozougif.py; a "baozou gif" folder is created in the same directory and all the images are downloaded into it automatically.
