Python code for scraping xx images


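It's hot today, and as usual I went bug hunting and came up empty. I've spent the last few days reading about Python web scraping and picked up XPath parsing, so I knocked out a bit of code for fun.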
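For anyone who hasn't met XPath yet, here is a minimal sketch of the kind of query the script below relies on, run against an inline HTML string instead of a live page. The sample markup and the `extract_links` helper are made up for illustration; only the `div` class name is taken from the script, and the real pages may well look different.

```python
from lxml import etree

# Hypothetical markup for illustration only; the real site's pages may differ.
SAMPLE_HTML = """
<html><body>
  <div class="content bord mtop">
    <a href="p06/123456.html">first</a>
    <a href="p06/654321.html">second</a>
    <a href="p06/index.php">not a gallery</a>
  </div>
  <div class="footer"><a href="about.html">about</a></div>
</body></html>
"""


def extract_links(html_text):
    """Return hrefs ending in 'html' found inside the content div."""
    tree = etree.HTML(html_text)
    # //div[@class='...'] matches a div whose class attribute equals the string exactly;
    # //a/@href then collects the href attribute of every link nested under it.
    hrefs = tree.xpath("//div[@class='content bord mtop']//a/@href")
    return [h for h in hrefs if h.endswith("html")]


links = extract_links(SAMPLE_HTML)
print(links)
```

Note that `@class='content bord mtop'` compares the whole attribute string, so a div with the same classes in a different order would not match.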

 

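Don't bother telling me about optimization and so on. I've only just learned this. It ran for a while and did fine, pretty stable.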

 

# -*- coding: utf-8 -*-
# Xm17
import os
import random
import urllib.error
import urllib.request

import requests
from lxml import etree

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.62 Safari/537.36'
}
url = "http://www.ye1001.com/p06/list_{}.html"
base_url = "http://www.ye1001.com/"


def auto_down(url, filename):
    # Download one image; if the transfer is cut short, try again.
    try:
        urllib.request.urlretrieve(url, filename)
    except urllib.error.ContentTooShortError:
        print('Network conditions are not good. Reloading.')
        auto_down(url, filename)


# Walk list pages 1 through 39.
for i in range(1, 40):
    response = requests.get(url.format(i), headers=headers)
    html = etree.HTML(response.text)
    page = html.xpath("//div[@class='content bord mtop']//a/@href")
    for x in page:
        page_url = base_url + x
        if page_url.endswith("html"):
            # The six characters before ".html" become the folder name.
            title = str(page_url[-11:-5])
            responses = requests.get(page_url, headers=headers)
            htmls = etree.HTML(responses.text)
            pages = htmls.xpath("//div[@class='mtop']//img/@src")
            # Do not fail if the folder is left over from an earlier run.
            os.makedirs(title, exist_ok=True)
            for img_url in pages:
                print(img_url)
                # Random suffix; two images drawing the same number overwrite each other.
                ddd = random.randint(1, 100)
                auto_down(img_url, title + "/%s" % title + "_" + str(ddd) + ".jpg")
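Two things in the script can bite on a long run: `auto_down` retries forever if the connection stays bad, and the random 1 to 100 suffix can give two images the same filename. The sketch below is one possible way around both, not the author's method. `download_with_retry`, `numbered_names`, and the `fetch` parameter are hypothetical names introduced here, and the default of three attempts is an arbitrary assumption.

```python
import urllib.error
import urllib.request


def download_with_retry(url, filename, retries=3, fetch=urllib.request.urlretrieve):
    """Call fetch(url, filename) up to `retries` times; return True on success."""
    for attempt in range(1, retries + 1):
        try:
            fetch(url, filename)
            return True
        except urllib.error.ContentTooShortError:
            # The transfer ended early; report it and let the loop try again.
            print("Short read on attempt %d of %d: %s" % (attempt, retries, url))
    return False


def numbered_names(title, count):
    """Build `count` distinct paths such as '123456/123456_1.jpg'."""
    return ["%s/%s_%d.jpg" % (title, title, n) for n in range(1, count + 1)]
```

In the inner loop this could be used as `for img_url, name in zip(pages, numbered_names(title, len(pages))): download_with_retry(img_url, name)`. The `fetch` argument only exists so the helper can be tried out with a fake downloader, without touching the network.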

  

 

That's it for today. Off to take a shower.
