Python 3.4 Example: Implementing a Simple Scraping Crawler



This article walks through an example of a simple scraping crawler written in Python 3.4, shared here for reference. The code is as follows:

    import http.cookiejar
    import re
    import time
    import urllib.request


    def getHtml(url):
        # Build an opener that keeps cookies between requests.
        cj = http.cookiejar.CookieJar()
        opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cj))
        # Send a browser-like User-Agent. The Cookie value appears to be a placeholder.
        opener.addheaders = [
            ('User-Agent', 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.101 Safari/537.36'),
            ('Cookie', '4564564564564564565646540'),
        ]
        # Make this opener the default for urllib.request.urlopen().
        urllib.request.install_opener(opener)
        page = urllib.request.urlopen(url)
        html = page.read()  # raw bytes, decoded later in getimg()
        return html


    def getimg(html):
        # Despite its name, this returns every "screen_name" value found in the page.
        html = html.decode('utf-8')
        reg = '"screen_name":"(.*?)"'
        imgre = re.compile(reg)
        src = re.findall(imgre, html)
        return src


    # Weibo user IDs whose pages are fetched in turn.
    uid = ['2808675432', '3888405676', '2628551531', '2808587400']
    for a in uid:
        print(getimg(getHtml("http://weibo.com/" + a)))
        time.sleep(1)  # wait one second between requests
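The extraction step can be tried without any network access. The sketch below is one possible variation on the code above, not the original author's method: the names `make_opener`, `extract_screen_names`, and `SCREEN_NAME_RE` are hypothetical, and the sample bytes are made up to stand in for a fetched page. It assumes you would rather keep the opener local than install it globally, and that undecodable bytes should be replaced instead of raising an error.

```python
import http.cookiejar
import re
import urllib.request

# Compiled once; the non-greedy group stops at the next double quote.
SCREEN_NAME_RE = re.compile(r'"screen_name":"(.*?)"')


def make_opener(user_agent):
    """Build an opener with its own cookie jar, without installing it globally."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    opener.addheaders = [("User-Agent", user_agent)]
    return opener


def extract_screen_names(raw):
    """Decode response bytes and return every screen_name value found."""
    text = raw.decode("utf-8", errors="replace")
    return SCREEN_NAME_RE.findall(text)


# Hypothetical sample bytes standing in for a fetched page.
sample = b'{"screen_name":"alice","id":1} {"screen_name":"bob","id":2}'
names = extract_screen_names(sample)
print(names)  # ['alice', 'bob']
```

To fetch a real page with this variation, something like `make_opener("Mozilla/5.0").open(url, timeout=10).read()` would return the bytes to pass to `extract_screen_names`. Whether the target site still embeds `"screen_name"` in its pages is not something this sketch can guarantee.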
