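Processing big data with Python. Posted for anyone who might find it useful.
Big data competitions are popular at the moment. I have not been learning Python for long and wanted to give one a try. This script only covers the data processing step, and it mainly relies on dict, list, and file handling.
One more thing: I also implemented the same processing in MATLAB, where a run took almost two minutes, whereas the Python version finished almost instantly. That says a lot about how well suited Python is to text processing.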
Data format in the file (the column names, followed by a sample row):
clientid shopingid num date
1111000 3873 2 4月5日
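As a minimal sketch of how one such row breaks down into fields: the helper name `parse_record` is made up for illustration, and the date is assumed to always have the form month, 月, day, 日.

```python
def parse_record(line):
    """Split one whitespace-separated record into its four fields.

    parse_record is a hypothetical helper, not part of the original script.
    The date is assumed to look like '4月5日' (month, '月', day, '日').
    """
    clientid, shopingid, num, date = line.split()
    month, day = date.rstrip('日').split('月')
    return clientid, shopingid, num, int(month), int(day)


print(parse_record('1111000 3873 2 4月5日'))
# → ('1111000', '3873', '2', 4, 5)
```

The ids and num are kept as strings here, and only the month and day are converted to integers so that they can be compared numerically.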
# Python 3. Rows are assumed to be grouped by clientid, and the input
# file is assumed to be UTF-8 (change the encoding if yours differs).

def vote_for(nums):
    # Score one shop from its list of num values: any '1' gives 0,
    # each '0' adds 1, each '2' adds 2, every other value adds 3.
    if '1' in nums:
        return 0
    vote = 0
    for n in nums:
        if n == '0':
            vote += 1
        elif n == '2':
            vote += 2
        else:
            vote += 3
    return vote

def write_client(output, clientid, shopinginfo):
    # Write "clientid<TAB>shop1,shop2,..." for the shops whose vote is at least 3.
    shopinglink = [k for k, v in shopinginfo.items() if vote_for(v) >= 3]
    if shopinglink:
        output.write(clientid + '\t' + ','.join(shopinglink) + '\n')

tmpclientid = None   # client whose rows are currently being collected
shopinginfo = {}     # shopingid -> list of num values for that client

with open('f:/s.txt', 'r', encoding='utf-8') as data_file, \
        open('f:/a.txt', 'a') as output:
    for line in data_file:
        lineinfo = line.split()
        if len(lineinfo) < 4:
            continue   # skip blank or incomplete lines
        clientid, shopingid, num, date = lineinfo[:4]
        month = int(date.split('月')[0])
        if clientid != tmpclientid:
            # A new client starts: write out the previous one, then start over.
            if tmpclientid is not None:
                write_client(output, tmpclientid, shopinginfo)
            tmpclientid = clientid
            # The first row of a new client is stored regardless of its month.
            shopinginfo = {shopingid: [num]}
        elif month >= 6:
            # Later rows of the same client count only from June onward.
            shopinginfo.setdefault(shopingid, []).append(num)
    # Write out the last client in the file.
    if tmpclientid is not None:
        write_client(output, tmpclientid, shopinginfo)
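The script relies on all rows of one client being adjacent in the file: a change of clientid is what triggers the write for the previous client. Under that same assumption, the grouping step could also be sketched with `itertools.groupby`. The helper name `group_by_client` and the sample rows below are made up for illustration, and the sketch deliberately leaves out the month filter and the scoring.

```python
from itertools import groupby


def group_by_client(lines):
    """Yield (clientid, {shopingid: [num, ...]}) for each run of adjacent
    rows that share a clientid.

    Hypothetical helper; assumes the input is already grouped by client
    and that every non-blank row has exactly four fields.
    """
    rows = (line.split() for line in lines if line.strip())
    for clientid, client_rows in groupby(rows, key=lambda r: r[0]):
        shops = {}
        for _, shopingid, num, _date in client_rows:
            shops.setdefault(shopingid, []).append(num)
        yield clientid, shops


# Made-up sample rows in the same format as the data file.
sample = [
    '1111000 3873 2 6月5日',
    '1111000 3873 0 6月7日',
    '1111000 4100 3 7月1日',
    '2222000 3873 1 6月9日',
]
for clientid, shops in group_by_client(sample):
    print(clientid, shops)
```

Because `groupby` only merges adjacent rows, a client whose rows are scattered through the file would come out as several separate groups, which is the same limitation the script has.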