If you need to work with a large file and don't want to rely on a database, splitting the file into smaller pieces is another option. Below is a brief Python implementation that splits the data by month.
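# Import dependencies
import pandas as pd
import datetime
import os

def splitData():
    # Input file (the variable name "filepath" is assumed; it was lost in the original)
    filepath = "./dataset/ijcai-17/dataset/user_pay.txt"
    columnsname = ['user_id', 'shop_id', 'time_stamp']
    starttime = '2015-07-01 00:00:00'
    endtime = '2016-11-01 00:00:00'
    # date_range is a very handy time-handling function in pandas
    daterange = pd.date_range(starttime, endtime, freq='MS')  # split by month
    num = daterange.size
    # Dictionary of output file handles, keyed by month start
    fplist = {}
    for index, date in enumerate(daterange):
        if index == num - 1:
            break
        # Start time
        strsdate = str(daterange[index])
        # End time
        stredate = str(daterange[index + 1])
        # Name the file automatically (the original expression was lost;
        # this start_end.txt naming is a hypothetical stand-in)
        path = strsdate[:10] + '_' + stredate[:10] + '.txt'
        fp = open(path, 'w+')
        fplist[strsdate] = fp
    # Not yet written: read filepath and write each row to its month's file
    for key, fp in fplist.items():
        fp.close()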
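The month boundaries are the heart of splitData(). As a small standalone illustration (same bounds as in the code), this shows what pd.date_range(..., freq='MS') returns and how consecutive boundaries pair up into half-open [start, next start) intervals, one per output file:

```python
import pandas as pd

# Same bounds as in splitData(); 'MS' means "month start" frequency.
daterange = pd.date_range('2015-07-01 00:00:00', '2016-11-01 00:00:00', freq='MS')

# Pair each month start with the next one: [start, next_start) covers one month.
intervals = list(zip(daterange[:-1], daterange[1:]))

print(daterange.size)            # 17 month starts (both bounds are included)
print(len(intervals))            # 16 monthly intervals
start, end = intervals[0]
print(start.date(), end.date())  # 2015-07-01 2015-08-01
```

Zipping the index with a copy of itself shifted by one does the same job as the `index == num - 1` check in the loop: the last month start only ever serves as an end bound.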
That's as far as I've written for now; I'm too sleepy to continue. The main idea is laid out above, and I'll add the rest later. Sorry about that.
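Since the reading-and-writing step is still missing, here is a minimal sketch of one way to finish it. It assumes user_pay.txt is comma-separated with no header row, holds the three columns listed in columnsname, and stores time_stamp values like '2015-07-01 00:00:00'. Instead of pre-creating files from pd.date_range, it swaps in a different technique: it reads the file in chunks with pd.read_csv(chunksize=...) so the whole file never has to fit in memory, labels each row with its year-month, and appends each group to a per-month file. The function name split_by_month and the YYYY-MM.txt file names are hypothetical.

```python
import os
import tempfile

import pandas as pd

COLUMNS = ['user_id', 'shop_id', 'time_stamp']


def split_by_month(src_path, out_dir, chunksize=1_000_000):
    """Split a headerless CSV into one file per calendar month (hypothetical helper).

    Assumes comma-separated rows with the three COLUMNS above and a parseable
    time_stamp. Output files are named YYYY-MM.txt (hypothetical naming scheme).
    Returns the sorted list of month labels that were written.
    """
    os.makedirs(out_dir, exist_ok=True)
    handles = {}  # month label -> open file handle, kept open across chunks
    try:
        # Read the large file piece by piece instead of loading it all at once.
        for chunk in pd.read_csv(src_path, header=None, names=COLUMNS,
                                 chunksize=chunksize):
            # Label every row with its year-month, e.g. '2015-07'.
            months = pd.to_datetime(chunk['time_stamp']).dt.strftime('%Y-%m')
            for month, rows in chunk.groupby(months):
                if month not in handles:
                    path = os.path.join(out_dir, month + '.txt')
                    # newline='' as pandas recommends for file handles passed to to_csv
                    handles[month] = open(path, 'w', newline='')
                # Appends at the handle's current position, so later chunks add rows.
                rows.to_csv(handles[month], header=False, index=False)
    finally:
        for fp in handles.values():
            fp.close()
    return sorted(handles)


if __name__ == '__main__':
    # Tiny demo on made-up rows; point src at the real user_pay.txt instead.
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, 'sample.txt')
        with open(src, 'w') as f:
            f.write('1,10,2015-07-01 08:00:00\n'
                    '2,11,2015-07-31 23:59:59\n'
                    '3,10,2015-08-01 00:00:00\n')
        print(split_by_month(src, os.path.join(tmp, 'out')))  # ['2015-07', '2015-08']
```

Keeping the handles open across chunks means each month's file is opened only once; the date range in splitData() spans 16 months, so only a handful of files are open at any time.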