Processing large data with Python. Readers who need something similar can use this as a reference.
The recent big data competition has been very popular. I have not been learning Python for long, but I wanted to try writing something for it. This post only implements the data processing step, and it mainly uses dict, list, and file operations.
I also implemented the same processing in MATLAB, but that version took almost two minutes to run, while the Python version finishes in seconds. That shows how strong Python is at text processing.
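Data format in the file (fields separated by whitespace; the date is a single token, month then day, for example 4月5日 for April 5):
ClientID shopingid num Date
1111000 3873 2 4月5日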
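As a small illustration of the record layout, here is a sketch that parses one line into named fields. It assumes the date is stored as one token such as 4月5日; the `Record` type and `parse_record` helper are hypothetical names introduced only for this example.

```python
from typing import NamedTuple


class Record(NamedTuple):
    # Hypothetical container for one input line.
    clientid: str
    shopingid: str
    num: str
    month: int
    day: int


def parse_record(line: str) -> Record:
    # Four whitespace-separated fields per line.
    clientid, shopingid, num, date = line.split()
    # Assumption: the date looks like 4月5日, so drop the trailing 日
    # and split the rest on 月 to get month and day.
    month, day = date[:-1].split('月')
    return Record(clientid, shopingid, num, int(month), int(day))


record = parse_record('1111000 3873 2 4月5日')
print(record)
```

Keeping `num` as a string matches how the values are compared later ('0', '1', '2').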
def vote_for(nums):
    # Count how often each num value occurs for one item.
    total = {}
    for i in nums:
        total[i] = total.get(i, 0) + 1
    vote = 0
    for var in total:
        if var == '0':
            vote += total[var]
        elif var == '1':
            return 0  # any record with num '1' resets the vote to zero
        elif var == '2':
            vote += total[var] * 2
        else:
            vote += total[var] * 3
    return vote


def write_client(output, clientid, shopinginfo):
    # Write "clientid<TAB>id1,id2,..." for the items whose vote reaches 3.
    shopinglink = [k for (k, v) in shopinginfo.items() if vote_for(v) >= 3]
    if shopinglink:
        # Text mode already converts '\n' to the platform line ending.
        output.write(clientid + '\t' + ','.join(shopinglink) + '\n')


clientinfo = []
shopinginfo = {}
month = {}  # collected per item, not used in the output
day = {}    # collected per item, not used in the output
tmpclientid = ''

# The encoding is an assumption; change it (for example to 'gbk') to match the file.
with open('f:/s.txt', 'r', encoding='utf-8') as data_file, \
        open('f:/a.txt', 'a', encoding='utf-8') as output:
    for lineinfo in data_file:
        lineinfo = lineinfo.split()
        clientid = lineinfo[0]
        shopingid = lineinfo[1]
        num = lineinfo[2]
        # The date is assumed to be one token such as 4月5日:
        # drop the trailing 日, then split on 月.
        data = lineinfo[3][:-1].split('月')
        if clientid not in clientinfo:
            # A new client starts: write out the previous one and reset.
            write_client(output, tmpclientid, shopinginfo)
            clientinfo = [clientid]
            tmpclientid = clientid
            shopinginfo = {shopingid: [num]}
            month = {shopingid: [data[0]]}
            day = {shopingid: [data[1]]}
        elif int(data[0]) >= 6:
            # Same client: only records from month 6 onward are kept.
            shopinginfo.setdefault(shopingid, []).append(num)
            month.setdefault(shopingid, []).append(data[0])
            day.setdefault(shopingid, []).append(data[1])
    # Write out the last client after the loop ends.
    write_client(output, tmpclientid, shopinginfo)
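To try the grouping and voting rules without the files on f:/, the same logic can be sketched on in-memory lines. This is only a sketch under assumptions: the `recommend` function, its `min_month` and `threshold` parameters, and the sample lines are hypothetical, the date token is assumed to look like 4月5日, and all records of one client are assumed to be adjacent in the input. `collections.Counter` replaces the hand-written counting dict.

```python
from collections import Counter


def vote_for(nums):
    # Weight per num value: '0' counts 1, '2' counts 2, anything else 3.
    # A single '1' forces the vote to zero.
    counts = Counter(nums)
    if '1' in counts:
        return 0
    weights = {'0': 1, '2': 2}
    return sum(weights.get(value, 3) * n for value, n in counts.items())


def recommend(lines, min_month=6, threshold=3):
    # Returns {clientid: [shopingid, ...]} for items whose vote >= threshold.
    grouped = {}  # clientid -> {shopingid: [num, ...]}
    current = None
    for line in lines:
        clientid, shopingid, num, date = line.split()
        month = int(date[:-1].split('月')[0])
        if clientid != current:
            # As in the listing, the first record of a client is kept
            # regardless of its month.
            current = clientid
            grouped[clientid] = {shopingid: [num]}
        elif month >= min_month:
            grouped[clientid].setdefault(shopingid, []).append(num)
    result = {}
    for clientid, items in grouped.items():
        picked = [k for k, v in items.items() if vote_for(v) >= threshold]
        if picked:
            result[clientid] = picked
    return result


lines = [
    '1111000 3873 2 4月5日',  # first record of the client, always kept
    '1111000 3873 2 6月1日',  # month 6, kept
    '1111000 4000 1 7月2日',  # num '1' zeroes the vote for item 4000
    '1111000 5000 0 5月9日',  # month 5, skipped
    '2222000 3873 3 8月3日',  # first record of the next client
]
result = recommend(lines)
print(result)
```

Unlike the file-based version, this keeps every client in memory until the end, which is fine for experiments but not for very large inputs.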