Python for Beginners: A Web Scraper for the Sohu Auto Sales Database

Last updated: 2015-01-25 Source: Internet

Uploaded by: User


I work in the automotive industry, so the first program I set out to write is one that scrapes the Sohu Auto sales database (http://db.auto.sohu.com/cxdata/).

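The database provides monthly car sales from 2007 to the present, and each model has its own XML file. For example, the sales data for the Sagitar (速騰) is at http://db.auto.sohu.com/xml/sales/model/model1004sales.xml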
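The URL pattern described above can be sketched as a small helper. This is a minimal sketch in Python 3 (the original script targets Python 2 and urllib2), and the names BASE_URL, sales_url and fetch_sales_xml are hypothetical, introduced only for illustration. The fetch function returns raw bytes because the source does not state the encoding of the XML.

```python
from urllib.request import urlopen

# URL template taken from the example URL above; {code} is the model code.
BASE_URL = "http://db.auto.sohu.com/xml/sales/model/model{code}sales.xml"


def sales_url(code):
    """Build the sales XML URL for one model code (hypothetical helper)."""
    return BASE_URL.format(code=code)


def fetch_sales_xml(code, timeout=10):
    """Download the raw XML bytes for one model.

    Performs a network request; the bytes are returned undecoded because
    the encoding of the XML is an unknown here.
    """
    with urlopen(sales_url(code), timeout=timeout) as response:
        return response.read()
```

Calling sales_url("1004") reproduces the Sagitar URL quoted above.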

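The task now is to iterate over all models and save each record in the format 'model---date---sales', where the model field is the model code followed by the model name.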

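    #!/usr/bin/python
    # -*- coding: utf-8 -*-
    # Python 2 script (uses urllib2 and the print statement).
    import re
    import time
    import urllib2

    j = 0  # number of model codes processed
    # Raw strings keep the backslashes in the Windows paths literal.
    file = open(r'D:\Program Files\Notepad++Portable\App\Notepad++\databasesohu.txt', 'r').read()
    f = file.split('\n')
    for n in range(0, len(f)):
        if f[n] != "":
            j = j + 1
            # Request the sales XML for this model code
            wb = urllib2.urlopen('http://db.auto.sohu.com/xml/sales/model/model' + str(f[n]) + 'sales.xml').read()
            # Extract the model name from the first name="..." attribute
            code = wb[wb.index('name=') + 6:wb.index('">')]
            model = f[n] + "---" + code
            # print model  # debugging marker
            # Each "." stands in for the quote character around an attribute value
            reg = 'sales date=.(.*?). salesNum=.(.*?)./>'
            records = re.compile(reg).findall(wb)
            # Walk the matches from last to first
            for i in range(len(records), 0, -1):
                lt = records[i - 1]
                lt = lt[0] + "---" + lt[1]
                Mdata = model + "---" + lt
                print Mdata
                # 'a' opens the file in append mode, so earlier lines are kept
                file1 = open(r'D:\Program Files\Notepad++Portable\App\Notepad++\save.txt', 'a')
                file1.write(Mdata + '\n')
                file1.close()
            # Pause between requests
            time.sleep(0.5)
        else:
            # Blank line in the code list
            print 'over'
    print j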

    file = open(r'D:\Program Files\Notepad++Portable\App\Notepad++\databasesohu.txt', 'r').read()
    f = file.split('\n')

Open the file that lists all model codes and split its contents on newline characters.
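As a hedged alternative to the read-and-split step above, the following Python 3 sketch filters out blank lines while loading, so the main loop needs no empty-string check. The function names parse_model_codes and load_model_codes are hypothetical, and the UTF-8 encoding of the code list is an assumption.

```python
def parse_model_codes(text):
    """Return the non-empty model codes, one per line of text."""
    # splitlines() handles both "\n" and "\r\n" line endings.
    return [line.strip() for line in text.splitlines() if line.strip()]


def load_model_codes(path):
    """Read a code list file (assumed UTF-8); the with block closes it."""
    with open(path, "r", encoding="utf-8") as handle:
        return parse_model_codes(handle.read())
```

Stripping each line also protects against a stray carriage return ending up inside the request URL when the list was saved on Windows.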

    wb = urllib2.urlopen('http://db.auto.sohu.com/xml/sales/model/model' + str(f[n]) + 'sales.xml').read()

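Next, iterate over the model codes, request each XML file with urllib2, and extract the car name into model.
Use a regular expression to extract the dates and sales figures (an XML parser could be used here instead).
Save the data to a text file.
One point beginners should note is how Python opens files: here the script calls open(..., 'a'), where 'a' means append mode, so each write is added to the end of the file instead of overwriting it.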
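The parsing and saving steps above can be sketched as follows, including the XML-parser alternative the text mentions. This is a Python 3 sketch under stated assumptions: the layout of SAMPLE_XML is inferred from the regular expression in the script (only the name, date and salesNum attributes are implied by the source), the values in it are made up, and all function names are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample with made-up values. The element layout is an
# assumption; the source only implies the name, date and salesNum attributes.
SAMPLE_XML = (
    '<model id="1004" name="ExampleModel">'
    '<sales date="2014-11-01" salesNum="100"/>'
    '<sales date="2014-12-01" salesNum="200"/>'
    '</model>'
)

SALES_RE = re.compile(r'<sales date="(.*?)" salesNum="(.*?)"\s*/>')


def parse_with_regex(xml_text):
    """Return (date, sales) string pairs using a regular expression."""
    return SALES_RE.findall(xml_text)


def parse_with_etree(xml_text):
    """Return (name, [(date, sales), ...]) using the standard XML parser."""
    root = ET.fromstring(xml_text)
    rows = [(s.get("date"), s.get("salesNum")) for s in root.iter("sales")]
    return root.get("name"), rows


def append_records(path, code, name, rows):
    """Append one 'code---name---date---sales' line per row to path."""
    # Mode "a" adds to the end of the file; mode "w" would truncate it.
    with open(path, "a", encoding="utf-8") as out:
        for date, sales in rows:
            out.write("---".join([code, name, date, sales]) + "\n")
```

On this sample both parsers return the same rows. The XML parser does not depend on attribute order or quoting style, which the regular expression does. Opening the output file once per model inside a with block also avoids reopening it for every record.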


References: http://www.cnblogs.com/allenblogs/archive/2010/09/13/1824842.html
            http://www.liaoxuefeng.com/wiki/001374738125095c955c1e6d8bb493182103fac9270762a000/001374738281887b88350bd21544e6095d55eaf54cac23f000
