Python crawler learning (2): scraping Qiushibaike jokes and storing them in a MySQL database


The script below fetches the Qiushibaike front page with requests, parses each joke out of the HTML with BeautifulSoup, and writes the title/content pairs into MySQL through pymysql:

import pymysql
import requests
from bs4 import BeautifulSoup

# Connect to MySQL through pymysql and switch to the scraping database.
conn = pymysql.connect(host='127.0.0.1',
                       unix_socket='/tmp/mysql.sock',
                       user='root',
                       password='19950311',
                       db='mysql',
                       charset='utf8mb4')
cur = conn.cursor()
cur.execute("USE scraping")

def store(title, content):
    # Save one joke's title and content; use parameterized placeholders
    # so pymysql handles quoting and escaping.
    cur.execute("INSERT INTO pages (title, content) VALUES (%s, %s)",
                (title, content))
    cur.connection.commit()

class QiuShi(object):
    def __init__(self, start_url):
        self.url = start_url

    def crawl(self):
        # Fetch the page; return empty bytes if the connection fails.
        try:
            html = requests.get(self.url)
            return html.content
        except requests.exceptions.ConnectionError:
            return b''

    def extract(self, html_content):
        if len(html_content) > 0:
            bsobj = BeautifulSoup(html_content, 'lxml')
            # Each joke sits in a div with this class combination.
            jokes = bsobj.find_all('div', {'class': 'article block untagged mb15'})
            for j in jokes:
                title = j.find('h2').text
                content = j.find('div', {'class': 'content'}).string
                if title is not None and content is not None:
                    # The connection charset is utf8mb4, so str values
                    # can be stored directly.
                    store(title, content)
                    print(title, content)
                    print('-' * 78)

    def main(self):
        self.extract(self.crawl())

try:
    qiushi = QiuShi('http://www.qiushibaike.com/')
    qiushi.main()
finally:
    # Close the cursor and the connection, whatever happened above.
    cur.close()
    conn.close()
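
The article never shows the schema behind USE scraping and INSERT INTO pages. Here is a minimal one-time setup sketch that would satisfy the script above, assuming the same credentials and that the scraping database may not exist yet; the table and column definitions are illustrative guesses, not from the source:

import pymysql

# Hypothetical one-time setup: create the scraping database and the pages
# table that the crawler's INSERT expects. Column sizes are illustrative.
conn = pymysql.connect(host='127.0.0.1', user='root',
                       password='19950311', charset='utf8mb4')
cur = conn.cursor()
cur.execute("CREATE DATABASE IF NOT EXISTS scraping "
            "DEFAULT CHARACTER SET utf8mb4")
cur.execute("""
    CREATE TABLE IF NOT EXISTS scraping.pages (
        id INT NOT NULL AUTO_INCREMENT,
        title VARCHAR(255),
        content TEXT,
        PRIMARY KEY (id)
    ) DEFAULT CHARACTER SET utf8mb4
""")
conn.commit()
cur.close()
conn.close()

After a crawl, running SELECT title FROM scraping.pages LIMIT 5; in the mysql client is a quick way to confirm rows were written. Note also that qiushibaike.com has historically rejected requests without a browser-like User-Agent header; if crawl() keeps returning nothing, passing headers={'User-Agent': 'Mozilla/5.0'} to requests.get is worth trying.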

 
