Implementing a crawler in Python to download beautiful pictures. This script crawls the "beauty" forum on Baidu Tieba; may it inspire my many male compatriots.
Before crawling, first log in to your Baidu Tieba account in the browser. Alternatively, you can submit the login with a POST request in code, or attach the login cookies to your requests.
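Attaching the login cookie is the simplest of the three options. A minimal sketch (the original post uses Python 2's urllib2, which became urllib.request in Python 3; the cookie name BDUSS is Baidu's login cookie, and the value is a placeholder you copy from your browser's developer tools):

```python
import urllib.request

# Build an opener that sends the login cookie and a browser-like
# User-Agent with every request.
opener = urllib.request.build_opener()
opener.addheaders = [
    ('User-Agent', 'Mozilla/5.0'),
    # Placeholder: paste the real cookie value from a logged-in browser session.
    ('Cookie', 'BDUSS=PASTE_YOUR_COOKIE_HERE'),
]
# Pages are then fetched with opener.open(url).read()
# instead of a bare urlopen(url).read().
```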
Crawling Address: http://tieba.baidu.com/f?kw=%E7%BE%8E%E5%A5%B3&ie=utf-8&pn=0
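In that address, `kw` is the URL-encoded forum name and `pn` is the paging offset. Based on observed Tieba URLs, each listing page holds 50 threads, so `pn` advances in steps of 50 rather than 1; a sketch of building the per-page URLs:

```python
# pn is an offset into the thread list, not a page number:
# page j of the listing starts at offset j * 50.
base = 'http://tieba.baidu.com/f?kw=%E7%BE%8E%E5%A5%B3&ie=utf-8&pn='
urls = [base + str(j * 50) for j in range(3)]  # first three listing pages
```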
The script parses pages with lxml's XPath rather than regular expressions; if regexes give you trouble, try this approach. I also recommend writing your first version with only the basic libraries, since you will learn more that way. The final page number is entered by hand to avoid downloading too much content (`r'\d+(?=\s*页)'` is a more generic regex for grabbing the total page count). Use your browser's built-in element inspector to find the target elements quickly.

```python
# -*- coding: utf-8 -*-
import urllib2
from lxml import etree

k = 1                               # k is the file name; incremented after each download
print u'Please enter the last page:'
endpage = int(raw_input())          # final number of pages, entered manually

for j in range(0, endpage):
    # pn advances in steps of 50: each Tieba listing page holds 50 threads
    url = 'http://tieba.baidu.com/f?kw=%E7%BE%8E%E5%A5%B3&ie=utf-8&pn=' + str(j * 50)
    html = urllib2.urlopen(url).read()       # read the listing page
    selector = etree.HTML(html)              # parse into an element tree for XPath
    # grab the URL of every thread on the current page
    links = selector.xpath('//div/a[@class="j_th_tit"]/@href')
    for i in links:
        # the scraped hrefs are relative, so prepend Baidu's domain
        url1 = 'http://tieba.baidu.com' + i
        html2 = urllib2.urlopen(url1).read() # read the thread page
        selector = etree.HTML(html2)
        # grab the images; you could switch to a regex, or to any other content you want
        link = selector.xpath('//img[@class="BDE_Image"]/@src')
        for each in link:
            print u'Downloading %d' % k
            # save into the image/ folder under the current directory, in BMP format
            fp = open('image/' + str(k) + '.bmp', 'wb')
            image1 = urllib2.urlopen(each).read()  # read the image bytes
            fp.write(image1)                       # write out the image
            fp.close()
            k += 1
print u'Download done!'
```
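The generic page-count regex mentioned above uses a lookahead to grab the digits immediately before the character 页 ("page"). A sketch against a made-up sample string (real Tieba markup will differ, so treat the sample as an assumption):

```python
import re

# Made-up sample of a pager footer such as '共25页' ("25 pages in total").
sample = u'上一页 1 2 3 共25页 下一页'

# \d+(?=\s*页) matches digits only when they are followed by optional
# whitespace and the character 页, so it skips the plain page links.
m = re.search(u'\\d+(?=\\s*\u9875)', sample)
total_pages = int(m.group()) if m else 1
```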
If you want to crawl content from other sites, you can refer to:
http://www.aichengxu.com/view/60418
Copyright notice: this post is the blogger's original work; do not reproduce it without the blogger's permission.