The main purpose of this crawler is to grab sister pictures, and the "Meizitu" site has no annoying anti-crawler mechanism, which is why I picked it as a good site to practice on. Besides, ever since I wrote this code I've lost two pounds. Would I go around spouting nonsense?
Experimental goal: have the crawler flip from page 5200 through page 5205 and grab the pictures. (Page 5200 is my beloved beauty Tianlina!!)
from bs4 import BeautifulSoup  # the main capture method in this experiment is bs4
import requests

i = 0
for a in range(5200, 5206):  # set to flip from page 5200 through page 5205
    url = "http://www.meizitu.com/a/" + str(a) + ".html"  # fairly blunt pagination
    html = requests.get(url)
    start = '<p><div id="picture">'
    content = html.text.partition(start)[2]  # keep everything after the marker
    end = '<div class="boxinfo">'
    body = content.partition(end)[0]         # keep everything before the marker
    # The title attributes of the img nodes in the page source are not all the
    # same, so the page is cut with partition(); XPath would handle this better
    # (see the sketch after the code).
    soup = BeautifulSoup(body, "html.parser")
    pictures = soup.find_all("img")
    for picture in pictures:
        # print(picture["src"])
        print("Now downloading: " + str(i))
        pic = requests.get(picture["src"])
        with open("E:/pythonaaa/b/study & test/" + str(i) + ".jpg", "wb") as fp:
            # "wb" binary write paired with .content grabs the whole file
            # (a streaming variant is sketched after the code)
            fp.write(pic.content)
        i = i + 1
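As the comment in the code notes, XPath is a cleaner way to pull the img nodes than cutting the raw HTML with partition(). A minimal sketch using lxml, assuming the same page layout (images living under a div with id="picture"; the single URL here is just one page from the loop above):

from lxml import etree
import requests

url = "http://www.meizitu.com/a/5200.html"  # one page from the range above
html = requests.get(url)
tree = etree.HTML(html.text)  # parse the raw HTML into an element tree
# select the src attribute of every <img> inside <div id="picture">
srcs = tree.xpath('//div[@id="picture"]//img/@src')
for src in srcs:
    print(src)

This skips the fragile string markers entirely: if the surrounding HTML shifts a little, the XPath query still finds the same div.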
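On the "wb plus .content" download step: .content holds the entire image in memory before writing. For larger files a streamed download is safer. A hedged sketch under that assumption (save_image and the example URL and filename are hypothetical, not part of the original script):

import requests

def save_image(src, path):
    resp = requests.get(src, stream=True)  # stream=True defers the body
    resp.raise_for_status()                # fail loudly on HTTP errors
    with open(path, "wb") as fp:           # binary write, auto-closed
        for chunk in resp.iter_content(chunk_size=8192):
            fp.write(chunk)

save_image("http://www.meizitu.com/some.jpg", "0.jpg")  # hypothetical example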
This article is from the "Life is waiting for Gordo" blog; please be sure to keep this source: http://chenx1242.blog.51cto.com/10430133/1731790