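It's hot today, and as usual my bug hunting turned up nothing. I spent the last few days reading up on Python web scraping and learned how to parse pages with XPath,
so I knocked out a bit of code for fun.
Don't bring up optimization or anything like that. I only just finished learning this; it has been running for a while and seems fine, fairly stable.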
# -*- coding:utf-8 -*-
# Xm17
import os
import urllib.error
import urllib.request

import requests
from lxml import etree

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.62 Safari/537.36'
}
url = "http://www.ye1001.com/p06/list_{}.html"
base_url = "http://www.ye1001.com/"


def auto_down(url, filename):
    # Download one file; if the transfer is cut short, try again.
    try:
        urllib.request.urlretrieve(url, filename)
    except urllib.error.ContentTooShortError:
        print('Network conditions are not good. Reloading.')
        auto_down(url, filename)


# Walk list pages 1 to 39 and collect the links to the detail pages.
for i in range(1, 40):
    response = requests.get(url.format(i), headers=headers)
    html = etree.HTML(response.text)
    page = html.xpath("//div[@class='content bord mtop']//a/@href")
    for x in page:
        page_url = base_url + x
        if page_url.endswith("html"):
            # The six characters before ".html" serve as the folder name.
            title = str(page_url[-11:-5])
            responses = requests.get(page_url, headers=headers)
            htmls = etree.HTML(responses.text)
            pages = htmls.xpath("//div[@class='mtop']//img/@src")
            # exist_ok avoids a crash when the folder is left over from an earlier run.
            os.makedirs(title, exist_ok=True)
            # Number the images in order so two files never share a name.
            for n, img_url in enumerate(pages, 1):
                print(img_url)
                auto_down(img_url, "%s/%s_%d.jpg" % (title, title, n))
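One thing worth noting about `auto_down`: the retry has no upper limit, so a link that keeps getting truncated would be retried over and over until Python hits its recursion limit. Below is a minimal sketch of a bounded version, not a drop-in from any library. The name `download_with_retry`, the `retries` parameter, and the `fetch` parameter are all hypothetical, added here only so the loop can be exercised without touching the network.

```python
import urllib.error
import urllib.request


def download_with_retry(url, filename, retries=3, fetch=urllib.request.urlretrieve):
    """Save `url` to `filename`, giving up after `retries` truncated attempts.

    Returns True on success and False if every attempt was cut short.
    `fetch` is a hypothetical hook: any callable taking (url, filename).
    """
    for attempt in range(1, retries + 1):
        try:
            fetch(url, filename)
            return True
        except urllib.error.ContentTooShortError:
            # Same error the original code catches; here we just count attempts.
            print("Attempt %d/%d was truncated: %s" % (attempt, retries, url))
    return False
```

In the scraping loop this could stand in for `auto_down(...)`, with the return value used to log or skip images that never finish downloading.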
That's it for today. Off to take a shower.
Python code for scraping xx images