Recently I was working on a long database merge script that merges table data from multiple sub-databases into one primary database. Here are the mistakes I made while writing the data correction script.
Unnecessary queries
Take a look at the following statement:
```python
regiondb = db.HouyiRegionDB()
houyidb = db.HouyiDB(read_only=False)
regiondbret = regiondb.query(vmMacsFromRegiondbSql)
houyidbret = houyidb.query(vmMacsFromHouyidbSql)
if len(regiondbret) == 0:
    return
```
If regiondbret is empty, houyidb.query(vmMacsFromHouyidbSql) becomes an unnecessary query. When that query pulls a lot of data, it wastes real time. Bytes are money! Today's programmers may not be as "stingy" as the previous generation, but they should still be budget-conscious. The fix is simple: reverse the order of the statements so the expensive query only runs after the cheap check passes:
```python
regiondb = db.HouyiRegionDB()
regiondbret = regiondb.query(vmMacsFromRegiondbSql)
if len(regiondbret) == 0:
    regiondb.close()
    return
houyidb = db.HouyiDB(read_only=False)
houyidbret = houyidb.query(vmMacsFromHouyidbSql)
```
Lesson: don't write code on autopilot; think about whether each query is actually needed before issuing it.
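The reordering above can be sketched with a mock database that counts queries. The class names HouyiRegionDB/HouyiDB and the SQL variable names come from the article; MockDB and merge_macs below are hypothetical stand-ins used only to show that the expensive database is never touched when the cheap check fails:

```python
class MockDB:
    """Hypothetical stand-in for db.HouyiRegionDB / db.HouyiDB that
    records how many queries were issued."""
    def __init__(self, rows):
        self.rows = rows
        self.queries = 0
        self.closed = False

    def query(self, sql):
        self.queries += 1
        return self.rows

    def close(self):
        self.closed = True

def merge_macs(regiondb, houyidb_factory):
    """Query (or even open) the expensive houyidb only after the
    cheap region check passes."""
    regiondbret = regiondb.query("vmMacsFromRegiondbSql")
    if len(regiondbret) == 0:
        regiondb.close()
        return None
    houyidb = houyidb_factory()
    return houyidb.query("vmMacsFromHouyidbSql")

# With an empty region result, the second database is never even constructed.
empty_region = MockDB(rows=[])
merge_macs(empty_region, houyidb_factory=lambda: MockDB(rows=[("mac",)]))
print(empty_region.queries)  # 1 -- and no query ever hit houyidb
```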
Lock Timeout
The timeout was tracked down to an INSERT into table X running concurrently with a DELETE FROM X WHERE .... The insert started first, so the delete statement had to wait for the lock. Since the insert wrote a hundred thousand records in a single transaction, it took more than a minute, while innodb_lock_wait_timeout = 50s (check with SHOW VARIABLES LIKE "%timeout%";), so the waiting delete timed out.
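The arithmetic behind the timeout can be sketched as a rough estimate. The insert rate below is an assumption derived from the article's own numbers (roughly 100,000 rows in a bit over a minute, i.e. about 1,500 rows/s), not a measured value:

```python
def transaction_seconds(num_rows, rows_per_second):
    """Estimated duration of a bulk insert done as one transaction."""
    return num_rows / float(rows_per_second)

# MySQL default, as reported by SHOW VARIABLES LIKE '%timeout%'
INNODB_LOCK_WAIT_TIMEOUT = 50

ASSUMED_RATE = 1500  # rows/s, illustrative assumption

one_big_txn = transaction_seconds(100000, ASSUMED_RATE)  # ~67s
one_batch = transaction_seconds(1000, ASSUMED_RATE)      # ~0.7s

print(one_big_txn > INNODB_LOCK_WAIT_TIMEOUT)  # True: the DELETE times out
print(one_batch > INNODB_LOCK_WAIT_TIMEOUT)    # False: a short commit fits
```

Committing in 1000-row batches keeps each lock-holding transaction far below the 50-second limit, which is exactly what the batching code below does.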
The SQL here needs to be optimized. The approach is to split the hundred thousand records into multiple commits, inserting 1000 rows per commit. The code is as follows:
```python
def divideIntoGroups(allTuples, numPerGroup=1000):
    """Divide tuples into groups of tuples; each group has no more
    than numPerGroup tuples. Default numPerGroup is 1000."""
    groups = []
    totalNum = len(allTuples)
    if totalNum <= numPerGroup:
        groups.append(allTuples)
        return groups
    start = 0
    eachEnd = start + numPerGroup
    while start < totalNum:
        groups.append(allTuples[start:eachEnd])
        start += numPerGroup
        eachEnd = start + numPerGroup
        if eachEnd >= totalNum:
            eachEnd = totalNum
    return groups

def insertManyMany(insertSql, allTuples, db):
    """Insert many, many records (usually more than 10000):
    commit 1000 at a time, (len/1000 + 1) times in total."""
    groups = divideIntoGroups(allTuples)
    count = 0
    for groupTuples in groups:
        affectRows = db.executemany(insertSql, groupTuples)
        if affectRows:
            count += affectRows
        db.commit()
    needInsertNum = len(allTuples)
    isPassedMsg = ('OK' if needInsertNum == count else 'SOME ERROR')
    printAndLog("need insert %d records, and actual %d. %s" %
                (needInsertNum, count, isPassedMsg))
```
The calling method is as follows:
```python
insertSql = "insert into student (name, age) values (%s, %s)"
allTuples = [("zhang", 20), ("qian", 25), ("wang", 23), ..., ("liu", 26)]
insertManyMany(insertSql, allTuples, db)
```
The effect is obvious. Inserting 32,000 records originally took 18s and now takes only 2-3s; inserting 129,968 records took 67s and now takes only 12-15s. At the same time, each committed insert transaction is shorter, which reduces lock wait time.
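The same batching pattern can be tried end to end with SQLite, since Python's sqlite3 module exposes the same executemany/commit interface the article uses against MySQL. The table name and data below are made up for the demonstration, and the batch logic is a slightly more compact sketch of the article's grouping code:

```python
import sqlite3

def insert_in_batches(conn, insert_sql, all_tuples, batch_size=1000):
    """Commit after every batch so each transaction (and the locks
    it holds) stays short."""
    cur = conn.cursor()
    total = 0
    for start in range(0, len(all_tuples), batch_size):
        batch = all_tuples[start:start + batch_size]
        cur.executemany(insert_sql, batch)
        total += cur.rowcount
        conn.commit()
    return total

conn = sqlite3.connect(":memory:")
conn.execute("create table student (name text, age integer)")
rows = [("s%d" % i, 20 + i % 10) for i in range(3500)]
inserted = insert_in_batches(conn, "insert into student values (?, ?)", rows)
print(inserted)  # 3500, committed in 4 batches
```

Note that SQLite uses `?` placeholders where MySQLdb uses `%s`; everything else carries over unchanged.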
Database correction script performance optimization, part two: remove unnecessary queries and bulk-insert SQL.