Implementing a crawler in Python to download pretty pictures

Source: Internet
Author: User
Tags: xpath

This crawler scrapes Baidu Tieba's 美女 (pretty girls) forum, to give the many male compatriots out there a little inspiration.

Before crawling, first log in to your Baidu Tieba account in the browser. Alternatively, you can submit the login form with a POST request in the code, or attach the login cookies to each request, as in the sketch below.
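For example, a minimal sketch (not from the original article) of sending the login cookie with requests; the BDUSS cookie name and 'YOUR_BDUSS_VALUE' are assumptions, and you would copy the real value from your own logged-in browser session:

# -*- coding: utf-8 -*-
# Sketch only: fetch a Tieba page while sending a login cookie copied from the browser.
import requests

cookies = {'BDUSS': 'YOUR_BDUSS_VALUE'}    # assumed name of Baidu's login cookie; value is a placeholder
headers = {'User-Agent': 'Mozilla/5.0'}    # a plain browser-like User-Agent

resp = requests.get('http://tieba.baidu.com/f?kw=%E7%BE%8E%E5%A5%B3&ie=utf-8&pn=0',
                    cookies=cookies, headers=headers)
print(resp.status_code)                    # 200 means the page was fetched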

Crawling Address: http://tieba.baidu.com/f?kw=%E7%BE%8E%E5%A5%B3&ie=utf-8&pn=0
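In that address, kw is the URL-encoded name of the forum (美女) and pn is the page offset. A small illustrative sketch, not part of the original script, of how such an address can be built:

# -*- coding: utf-8 -*-
# Illustrative only: assemble the crawl address above from its parts.
import urllib

params = [('kw', u'美女'.encode('utf-8')), ('ie', 'utf-8'), ('pn', 0)]
url = 'http://tieba.baidu.com/f?' + urllib.urlencode(params)
print(url)   # http://tieba.baidu.com/f?kw=%E7%BE%8E%E5%A5%B3&ie=utf-8&pn=0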

The full script is below. It does not use regular expressions; the extraction is done with XPath, which readers who find regex difficult can try instead. It is recommended to write this kind of thing with the basic libraries first, so you learn more along the way.

# -*- coding: utf-8 -*-
import urllib2
import re           # imported in the original but not used below
import requests     # imported in the original but not used below
from lxml import etree

links = []     # post URLs found on each list page
k = 1          # k is the file name; it goes up by 1 for every downloaded image

print u'Please enter the last page:'
endpage = int(raw_input())   # final page number, entered manually to avoid grabbing too much content
# (r'\d+(?=\s*页)' would be a more general regex for reading the total page count from the page itself)

for j in range(0, endpage):
    url = 'http://tieba.baidu.com/f?kw=%E7%BE%8E%E5%A5%B3&ie=utf-8&pn=' + str(j)   # URL of the list page
    html = urllib2.urlopen(url).read()   # read the list page
    selector = etree.HTML(html)          # parse into an element tree so XPath can be used
    # Grab the URLs of all posts on the current page.
    # You can use the browser's built-in source viewer to find the right element; it is faster.
    links = selector.xpath('//div/a[@class="j_th_tit"]/@href')
    for i in links:
        url1 = 'http://tieba.baidu.com' + i    # the crawled address is relative, so prepend Baidu's domain
        html2 = urllib2.urlopen(url1).read()   # read the post page
        selector = etree.HTML(html2)           # parse it the same way
        link = selector.xpath('//img[@class="BDE_Image"]/@src')   # grab the images; change this to regex or whatever else you want
        # download every image found in the post
        for each in link:
            # print each
            print u'Downloading %d' % k
            fp = open('image/' + str(k) + '.bmp', 'wb')   # saved into the image folder under the current directory, as .bmp
            image1 = urllib2.urlopen(each).read()         # read the image content
            fp.write(image1)                              # write the image to disk
            fp.close()
            k += 1

print u'Download done!'
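One small gotcha: the script writes into an image folder under the current directory but never creates it, so you may want to create it first. A minimal addition of my own, not in the original:

import os

# Create the output folder the script writes into, if it does not exist yet.
if not os.path.isdir('image'):
    os.makedirs('image')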


If you want to crawl content from other sites, you can refer to http://www.aichengxu.com/view/60418


Copyright notice: This is the blogger's original article and may not be reproduced without the blogger's permission.
