Logging In to Renren and Scraping the News Feed with Python

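This article walks through an example of logging in to Renren (renren.com) with Python and scraping the news feed ("新鮮事") of the logged-in account. It is shared here for reference.

The listing targets Python 2: it relies on sgmllib, urllib2 and cookielib, none of which exist under those names in Python 3. The scraped output is printed as-is, so its layout is not very pretty.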

# Python 2 only: sgmllib, urllib2 and cookielib are not available in Python 3.
from sgmllib import SGMLParser
import sys
import urllib
import urllib2
import cookielib


class spider(SGMLParser):
    def __init__(self, email, password):
        SGMLParser.__init__(self)
        # Parser state
        self.h3 = False           # currently inside an <h3>
        self.h3_is_ready = False  # an <h3> has been closed; a content <div> may follow
        self.div = False          # currently inside a <div class="content">
        self.h3_and_div = False   # h3 and div is connected
        self.a = False            # currently inside an <a>
        self.depth = 0            # nesting depth of <div> tags inside the content <div>
        self.names = ""           # name collected from <h3><a>
        self.dic = {}             # name -> list of text fragments

        self.email = email
        self.password = password
        self.domain = 'renren.com'

        # Install a global opener that keeps cookies, so the session
        # created by login() is reused by later urllib2 requests.
        cookie = cookielib.CookieJar()
        cookieProc = urllib2.HTTPCookieProcessor(cookie)
        opener = urllib2.build_opener(cookieProc)
        urllib2.install_opener(opener)

    def login(self):
        url = 'http://www.renren.com/PLogin.do'
        postdata = {
            'email': self.email,
            'password': self.password,
            'domain': self.domain,
        }
        req = urllib2.Request(url, urllib.urlencode(postdata))
        # The page returned after login is kept for parsing.
        self.file = urllib2.urlopen(req).read()
        # print self.file

    def start_h3(self, attrs):
        self.h3 = True

    def end_h3(self):
        self.h3 = False
        self.h3_is_ready = True

    def start_a(self, attrs):
        if self.h3 or self.div:
            self.a = True

    def end_a(self):
        self.a = False

    def start_div(self, attrs):
        if not self.h3_is_ready:
            return
        if self.div:
            self.depth += 1
        for k, v in attrs:
            if k == 'class' and v == 'content':
                self.div = True
                self.h3_and_div = True  # h3 and div is connected

    def end_div(self):
        if self.depth == 0:
            # Leaving the outermost <div>: reset the state for the next entry.
            self.div = False
            self.h3_and_div = False
            self.h3_is_ready = False
            self.names = ""
        if self.div:
            self.depth -= 1

    def handle_data(self, text):
        # record the name
        if self.h3 and self.a:
            self.names += text
        # record says
        if self.h3 and not self.a:
            if text:
                self.dic.setdefault(self.names, []).append(text)
            return
        if self.h3_and_div:
            self.dic.setdefault(self.names, []).append(text)

    def show(self):
        encoding = sys.getfilesystemencoding()
        for key in self.dic:
            name = ''.join(key).replace(' ', '')
            text = ''.join(self.dic[key]).replace(' ', '')
            print name.decode('utf-8').encode(encoding), \
                text.decode('utf-8').encode(encoding)


renrenspider = spider('your email', 'your password')
renrenspider.login()
renrenspider.feed(renrenspider.file)
renrenspider.show()
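
The login step above keeps session cookies by installing a cookie-aware opener and then posting the form fields to the login URL. On Python 3 the same idea might look like the sketch below, which uses urllib.request and http.cookiejar in place of urllib2 and cookielib. The function names build_opener_with_cookies and build_login_request are hypothetical helpers introduced only for this illustration. The URL and field names are copied from the listing, and whether the endpoint still accepts this form has not been checked here. The sketch only builds the request and does not send it.

```python
import http.cookiejar
import urllib.parse
import urllib.request

# Login endpoint as it appears in the listing above (not verified to still work).
LOGIN_URL = "http://www.renren.com/PLogin.do"


def build_opener_with_cookies():
    """Return an opener that stores cookies, together with its cookie jar."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def build_login_request(email, password, domain="renren.com"):
    """Build the POST request for the login form without sending it."""
    form = {"email": email, "password": password, "domain": domain}
    # In Python 3 the POST body must be bytes, not str.
    body = urllib.parse.urlencode(form).encode("utf-8")
    return urllib.request.Request(LOGIN_URL, data=body)


if __name__ == "__main__":
    opener, jar = build_opener_with_cookies()
    request = build_login_request("your email", "your password")
    print(request.get_method(), request.full_url)
    # Sending it would look like:
    #   html = opener.open(request).read().decode("utf-8")
```

Returning the opener instead of calling install_opener keeps the cookie jar local to the caller, which is a design choice of this sketch rather than something the original listing does.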

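The tag-tracking logic in the listing depends on sgmllib, which was removed in Python 3. A minimal sketch of the same state machine on top of html.parser.HTMLParser is shown here: a name is collected from the link inside an h3, and the text of the following div with class "content" is stored under that name. The class name FeedParser, its attribute names, and the SAMPLE markup are all assumptions made for illustration. The sample is invented and is not Renren's actual page structure.

```python
from html.parser import HTMLParser


class FeedParser(HTMLParser):
    """Group text by the name found in <h3><a>, as the original spider does."""

    def __init__(self):
        super().__init__()
        self.in_h3 = False       # inside an <h3>
        self.h3_seen = False     # an <h3> was closed; a content <div> may follow
        self.in_content = False  # inside <div class="content">
        self.in_a = False        # inside an <a>
        self.depth = 0           # nested <div> depth inside the content <div>
        self.name = ""
        self.items = {}          # name -> list of text fragments

    def handle_starttag(self, tag, attrs):
        if tag == "h3":
            self.in_h3 = True
        elif tag == "a" and (self.in_h3 or self.in_content):
            self.in_a = True
        elif tag == "div" and self.h3_seen:
            if self.in_content:
                self.depth += 1
            elif dict(attrs).get("class") == "content":
                self.in_content = True

    def handle_endtag(self, tag):
        if tag == "h3":
            self.in_h3 = False
            self.h3_seen = True
        elif tag == "a":
            self.in_a = False
        elif tag == "div":
            if self.depth == 0:
                # Outermost <div> closed: reset for the next entry.
                self.in_content = False
                self.h3_seen = False
                self.name = ""
            else:
                self.depth -= 1

    def handle_data(self, data):
        if self.in_h3 and self.in_a:
            self.name += data
        if self.in_h3 and not self.in_a:
            if data:
                self.items.setdefault(self.name, []).append(data)
            return
        if self.in_content:
            self.items.setdefault(self.name, []).append(data)


# Invented sample markup, used only to exercise the parser.
SAMPLE = (
    '<div class="feed">'
    '<h3><a href="#">Alice</a> posted a status</h3>'
    '<div class="content">Hello <a href="#">world</a></div>'
    '</div>'
)

if __name__ == "__main__":
    parser = FeedParser()
    parser.feed(SAMPLE)
    parser.close()
    for name, parts in parser.items.items():
        print(name, "".join(parts))
```

Because html.parser works on str, the page bytes would need to be decoded (for example as UTF-8) before being passed to feed, and the decode/encode round trip in the original show method is no longer needed.
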
I hope this article is helpful for your Python programming.
