When I first crawled financial data from Tushare, I just wanted to get it saved, so I took the simplest approach: insert, and on a duplicate-key error, update instead.
That turned out to be clumsy: the exceptions left behind a large number of invalid connections, so I changed it to check-then-write:
    for idx, row in d2.iterrows():
        try:
            rs = db.getData("select f_code, f_time, %s from caiwu where f_code=:1 and f_time=:2" % fldname,
                            row["code"], dat)
            if len(rs) == 0:
                db.doNonQuery("insert into caiwu (f_code, f_time, %s) values (:1, :2, :3)" % fldname,
                              row["code"], dat, row[colname])
            elif rs[0][2] is None:
                db.doNonQuery("update caiwu set %s=:1 where f_code=:2 and f_time=:3" % fldname,
                              row[colname], row["code"], dat)
        except Exception:
            log.errorLogger().exception("Data inbound error!")
This works correctly, but it is far too slow: two years of data is around a million rows, and a whole morning was not enough to store it all. So I looked into optimizing it, and it turns out MySQL actually has dedicated syntax that inserts records and, when a duplicate key is encountered, automatically updates instead:
ON DUPLICATE KEY UPDATE
With it, the whole select/insert/update dance above collapses into a single SQL statement:
INSERT INTO t (a, c) VALUES (1, 3) ON DUPLICATE KEY UPDATE c = c + 1;
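The upsert semantics can be exercised locally without a MySQL server. As a minimal sketch, here is the analogous statement run against SQLite, whose `ON CONFLICT ... DO UPDATE` clause plays the role of MySQL's `ON DUPLICATE KEY UPDATE` (the table `t` and its data are invented for the demo):

```python
import sqlite3

# In-memory database; column a is the unique key.
con = sqlite3.connect(":memory:")
con.execute("create table t (a integer primary key, c integer)")

# First statement: no conflict, so the row (1, 3) is inserted.
con.execute("insert into t (a, c) values (1, 3) "
            "on conflict(a) do update set c = c + 1")
# Second statement: key 1 already exists, so c is incremented instead.
con.execute("insert into t (a, c) values (1, 3) "
            "on conflict(a) do update set c = c + 1")

print(con.execute("select a, c from t").fetchall())  # [(1, 4)]
```

The same statement serves both the "not seen yet" and "already stored" cases, which is exactly what removes the select-then-branch round trip.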
Going one step further, batch storage is also no problem, and each column can still be handled on its own:
INSERT INTO t (a, b, c) VALUES (1, 4, 3), (2, 5, 7), (3, 3, 6), (4, 8, 2) ON DUPLICATE KEY UPDATE b = VALUES(b);
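Again as a runnable sketch with SQLite standing in for MySQL: SQLite's `excluded.b` refers to the row that failed to insert, playing the role of MySQL's `VALUES(b)` (the table and data below are made up):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("create table t (a integer primary key, b integer, c integer)")
con.execute("insert into t (a, b, c) values (1, 0, 0)")  # pre-existing row for key 1

# One statement inserts four rows; only the duplicate key 1 has b refreshed,
# while its other column c keeps its old value.
con.execute(
    "insert into t (a, b, c) values (1, 4, 3), (2, 5, 7), (3, 3, 6), (4, 8, 2) "
    "on conflict(a) do update set b = excluded.b"
)

print(con.execute("select a, b, c from t order by a").fetchall())
# [(1, 4, 0), (2, 5, 7), (3, 3, 6), (4, 8, 2)]
```

Note how the duplicate row ends up as `(1, 4, 0)`: `b` was updated from the new values, `c` was left untouched.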
It could hardly be more convenient:
    # Data storage:
    #   d2:      DataFrame to be stored; first column is the code, second the value
    #   dat:     time
    #   fldname: field name in the database
    from functools import reduce  # reduce is no longer a builtin in Python 3

    def addToDb(d2, dat, fldname):
        i = 0
        while i < len(d2):
            # Fold up to 1000 rows into one "('code', 'time', value), ..." list
            kvs = reduce(lambda x, y: "%s%s('%s', '%s', %s)" % (x, "" if x == "" else ",", y[0], dat, y[1]),
                         d2.values[i:i + 1000], "")
            sqlstr = ("insert into caiwu (f_code, f_time, %s) values %s "
                      "on duplicate key update %s=values(%s)" % (fldname, kvs, fldname, fldname))
            try:
                db.doNonQuery(sqlstr)
            except Exception:
                log.errorLogger().exception("Data inbound error!")
            i += 1000
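Building the VALUES string by hand works, but a parameterized variant avoids quoting and injection issues for the data values. Below is a hedged sketch of the same chunked upsert written against a plain DB-API connection with `executemany` (the `caiwu` schema is from the post; SQLite stands in for MySQL, and `add_to_db` is a hypothetical rewrite, not the author's function):

```python
import sqlite3

def add_to_db(con, pairs, dat, fldname, chunk=1000):
    """Upsert (code, value) pairs in chunks of `chunk` rows.

    pairs:   iterable of (code, value), e.g. d2.values
    dat:     the f_time value shared by all rows
    fldname: target column name (assumed to be a trusted identifier,
             since identifiers cannot be bound as parameters)
    """
    # SQLite upsert syntax; MySQL would use ON DUPLICATE KEY UPDATE %s=VALUES(%s).
    sql = ("insert into caiwu (f_code, f_time, %s) values (?, ?, ?) "
           "on conflict(f_code, f_time) do update set %s = excluded.%s"
           % (fldname, fldname, fldname))
    rows = [(code, dat, val) for code, val in pairs]
    for i in range(0, len(rows), chunk):
        con.executemany(sql, rows[i:i + chunk])
    con.commit()

con = sqlite3.connect(":memory:")
con.execute("create table caiwu (f_code text, f_time text, roe real, "
            "primary key (f_code, f_time))")
add_to_db(con, [("600000", 1.0), ("600001", 2.0)], "2020-12-31", "roe")
add_to_db(con, [("600000", 9.0)], "2020-12-31", "roe")  # re-run updates in place
print(con.execute("select roe from caiwu where f_code='600000'").fetchall())  # [(9.0,)]
```

Only the column name still goes through string formatting, because SQL identifiers cannot be passed as bound parameters; all row data rides in safely via placeholders.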
In testing, the data now goes in almost instantly!
MySQL: efficient insert/update of data