Big Data Processing Techniques: Python

Source: Internet
Uploaded by: User

Processing big data with Python. Readers who need it may find this useful as a reference.

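Big data competitions have been very popular lately. I have not been learning Python for long and wanted to try writing something for one. This only implements the data processing step, and it mainly relies on dict, list, and file handling.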

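One more thing: I also implemented this in MATLAB, and that version took almost two minutes to run, while the Python version finishes in seconds. That shows how strong Python is at text processing.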

The data format in the file:

clientid shopingid num date

1111000 3873 2 4月5日
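As a minimal sketch of how one such record can be split into its fields (the names `Record` and `parse_line` are hypothetical and only for illustration; they do not appear in the original script):

```python
from typing import NamedTuple


class Record(NamedTuple):
    """One input row; field names follow the header shown above."""
    clientid: str
    shopingid: str
    num: str
    month: int
    day: int


def parse_line(line: str) -> Record:
    """Parse one whitespace-separated record such as '1111000 3873 2 4月5日'."""
    clientid, shopingid, num, date = line.split()
    # '4月5日' -> drop the trailing '日', then split on '月' -> ['4', '5']
    month, day = date[:-1].split('月')
    return Record(clientid, shopingid, num, int(month), int(day))


print(parse_line('1111000 3873 2 4月5日'))
```

The sketch keeps `num` as a string because the script compares it against the strings '0', '1', and '2'.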

    # Per-client state. The input is assumed to be grouped by clientid.
    clientinfo = []      # holds the clientid currently being accumulated
    shopinginfo = {}     # shopingid -> list of num values (kept as strings)
    month = {}           # shopingid -> list of months (stored, not used in the output)
    day = {}             # shopingid -> list of days (stored, not used in the output)
    tmpclientid = ''


    def count_votes(nums):
        """Score one shop from its list of num values."""
        total = {}
        for i in nums:
            total[i] = total.get(i, 0) + 1
        if '1' in total:
            # Any record with num == '1' resets the score to zero.
            return 0
        vote = 0
        for var in total:
            if var == '0':
                vote += total[var]
            elif var == '2':
                vote += total[var] * 2
            else:
                vote += total[var] * 3
        return vote


    def write_client(output, clientid, shopinginfo):
        """Write clientid, a tab, then the comma-separated shops scoring at least 3."""
        shopinglink = ','.join(k for k, v in shopinginfo.items() if count_votes(v) >= 3)
        if shopinglink:
            output.write(clientid + '\t' + shopinglink + '\n')


    with open('f:/s.txt', 'r') as data_file, open('f:/a.txt', 'a') as output:
        for lineinfo in data_file:
            lineinfo = lineinfo.split()
            if not lineinfo:
                continue  # skip blank lines
            clientid = lineinfo[0]
            shopingid = lineinfo[1]
            num = [lineinfo[2]]
            data = lineinfo[3]
            data = data[:-1]          # drop the trailing '日'
            data = data.split('月')   # '4月5' -> ['4', '5']
            monthvar = [data[0]]
            dayvar = [data[1]]
            if clientid in clientinfo and shopingid in shopinginfo and int(data[0]) >= 6:
                # Known client, known shop, month 6 or later: append the record.
                shopinginfo[shopingid].append(lineinfo[2])
                month[shopingid].append(data[0])
                day[shopingid].append(data[1])
            elif clientid in clientinfo and shopingid not in shopinginfo and int(data[0]) >= 6:
                # Known client, new shop, month 6 or later: start a new list.
                shopinginfo[shopingid] = num
                month[shopingid] = monthvar
                day[shopingid] = dayvar
            elif clientid not in clientinfo:
                # A new client starts: write out the previous one, then reset the state.
                write_client(output, tmpclientid, shopinginfo)
                tmpclientid = clientid
                clientinfo = [clientid]
                # Note: the first record of each client is stored without the month check.
                shopinginfo = {shopingid: num}
                month = {shopingid: monthvar}
                day = {shopingid: dayvar}
        # Write out the last client once the file has been read completely.
        write_client(output, tmpclientid, shopinginfo)
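The same grouping and scoring can be sketched more compactly with `collections.Counter` and `itertools.groupby` from the standard library. This is an illustrative alternative, not the author's code: the names `score_shop` and `summarize`, the parameters `min_month` and `threshold`, and the sample rows are all assumptions made for the example. It also assumes, like the script above, that rows for one client are adjacent in the input.

```python
from collections import Counter
from itertools import groupby


def score_shop(nums):
    """Score one shop: '0' counts 1, '2' counts 2, other values count 3; any '1' gives 0."""
    total = Counter(nums)
    if '1' in total:
        return 0
    weights = {'0': 1, '2': 2}
    return sum(weights.get(value, 3) * n for value, n in total.items())


def summarize(lines, min_month=6, threshold=3):
    """Yield (clientid, [shopingid, ...]) for shops whose score reaches the threshold."""
    rows = (line.split() for line in lines if line.strip())
    # groupby only merges adjacent rows, so the input must already be grouped by clientid.
    for clientid, group in groupby(rows, key=lambda row: row[0]):
        shops = {}
        for i, (_, shopingid, num, date) in enumerate(group):
            month = int(date[:-1].split('月')[0])
            # As in the script above, a client's first record is kept regardless of month.
            if i == 0 or month >= min_month:
                shops.setdefault(shopingid, []).append(num)
        picked = [shop for shop, nums in shops.items() if score_shop(nums) >= threshold]
        if picked:
            yield clientid, picked


# Hypothetical sample rows in the format described earlier.
sample = [
    '1111000 3873 2 6月5日',
    '1111000 3873 2 7月1日',
    '1111000 4001 1 6月9日',
    '2222000 5005 3 8月2日',
]
for clientid, picked in summarize(sample):
    print(clientid + '\t' + ','.join(picked))
```

Because `summarize` accepts any iterable of lines, an open file object can be passed in place of the sample list.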




