Scraping Images with Python


With nothing much to do, and after reading all the praise for Python on Zhihu, I couldn't resist and installed Python 3.4.

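I found some image-crawling code online, modified it to be compatible with 3.4, and successfully downloaded every jpg image from a specified URL. The code snippet is as follows: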

import os
import re
import urllib.request

# Crawl images: downloads are saved next to this script
download_path = os.path.dirname(os.path.abspath(__file__))


class spider(object):
    def __init__(self, url):
        self.url = url

    def parse(self, content):
        # Capture each absolute http:// URL ending in .jpg inside src="...".
        # [^"]*? keeps a match inside one attribute value; the original
        # greedy .* could run across several tags on the same line.
        pattern = r'src="(http://[^"]*?\.jpg)\s*"'
        matchs = re.findall(pattern, content, re.M)
        return matchs

    def downloads(self, urls):
        # Create the target directory on first use
        d_path = download_path + "/test"
        if not os.path.exists(d_path):
            os.mkdir(d_path)
        for url in urls:
            # Use the last path segment of the URL as the local file name
            filename = url.split("/")[-1]
            print(url)
            print("Downloads %s" % (filename))
            output = "%s/%s" % (d_path, filename)
            urllib.request.urlretrieve(url, output)

    def run(self):
        d_url = self.url
        fd = urllib.request.urlopen(d_url)
        try:
            # The page is assumed to be UTF-8 encoded
            content = fd.read()
            content = content.decode("UTF-8")
            urls = self.parse(content)
            self.downloads(urls)
        finally:
            fd.close()


if __name__ == "__main__":
    sp = spider("http://news.cnfol.com/img/20150814/17638.shtml")
    sp.run()
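
The regular expression in parse only catches absolute http:// links, so relative paths and https:// images are skipped. As a rough sketch of one alternative (not part of the code I found), the standard library's html.parser and urllib.parse could collect the src values instead. The names ImgSrcCollector and extract_jpg_urls and the sample HTML are made up for illustration:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class ImgSrcCollector(HTMLParser):
    """Collects the src attribute of every <img> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            for name, value in attrs:
                if name == "src" and value:
                    self.sources.append(value.strip())


def extract_jpg_urls(content, base_url):
    """Return absolute URLs of .jpg images found in content, in page order."""
    collector = ImgSrcCollector()
    collector.feed(content)
    urls = []
    for src in collector.sources:
        # Resolve relative paths against the page URL.
        absolute = urljoin(base_url, src)
        # Check only the path, so a query string does not hide the extension.
        if urlparse(absolute).path.lower().endswith(".jpg"):
            urls.append(absolute)
    return urls


# Made-up sample page, for illustration only.
sample = (
    '<img src="http://example.com/a.jpg"><img src="/img/b.jpg ">'
    '<img src="https://example.com/c.JPG?x=1"><img src="logo.png">'
)
result = extract_jpg_urls(sample, "http://example.com/news/page.shtml")
print(result)
```

The returned list could be passed to downloads in place of the output of parse. Note that a URL with a query string would still need a cleaner file name than url.split("/")[-1] gives.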

 
