When using JDBC to insert data into a MySQL database, especially a large continuous batch (100,000+ rows), how can we improve efficiency?
In the JDBC API, two methods on PreparedStatement deserve particular attention:
void addBatch() throws SQLException
    Adds a set of parameters to this PreparedStatement object's batch of commands.

int[] executeBatch() throws SQLException
    Submits a batch of commands to the database for execution and if all commands execute successfully, returns an array of update counts. The int elements of the array that is returned are ordered to correspond to the commands in the batch, which are ordered according to the order in which they were added to the batch.
Batch processing is implemented with this pair of methods, addBatch() and executeBatch(). Note, however, that you must first switch the connection to manual commit with connection.setAutoCommit(false), and then call connection.commit() after the statement has executed.
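Stripped of JDBC, the pattern is simply "accumulate locally, flush once". As a rough illustration only, the hypothetical BatchBuffer class below (not a real JDBC type) mimics the addBatch()/executeBatch() flow without needing a database:

```java
import java.util.ArrayList;
import java.util.List;

// Made-up stand-in for the addBatch()/executeBatch() pattern:
// rows are accumulated locally and "sent" in one flush at the end.
public class BatchBuffer {
    private final List<String> pending = new ArrayList<>();
    private int executed = 0;

    public void addBatch(String row) {   // mirrors PreparedStatement.addBatch()
        pending.add(row);
    }

    public int executeBatch() {          // mirrors Statement.executeBatch():
        executed += pending.size();      // one "round trip" covers all pending rows
        pending.clear();
        return executed;
    }

    public static void main(String[] args) {
        BatchBuffer buf = new BatchBuffer();
        for (int i = 0; i < 5; i++) {
            buf.addBatch("row-" + i);    // cheap local accumulation, no I/O
        }
        System.out.println(buf.executeBatch()); // prints 5
    }
}
```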
package cyl.demo.ipsearcher;

import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class DbStoreHelper {

    private String insert_sql;
    private String charset;
    private boolean debug;
    private String connectStr;
    private String username;
    private String password;

    public DbStoreHelper() {
        connectStr = "jdbc:mysql://localhost:3306/db_ip";
        // connectStr += "?useServerPrepStmts=false&rewriteBatchedStatements=true";
        insert_sql = "INSERT INTO tb_ipinfos (iplong1,iplong2,ipstr1,ipstr2,ipdesc) VALUES (?,?,?,?,?)";
        charset = "gbk";
        debug = true;
        username = "root";
        password = "***";
    }

    public void storeToDb(String srcFile) throws IOException {
        BufferedReader bfr = new BufferedReader(new InputStreamReader(
                new FileInputStream(srcFile), charset));
        try {
            doStore(bfr);
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            bfr.close();
        }
    }

    private void doStore(BufferedReader bfr) throws ClassNotFoundException,
            SQLException, IOException {
        Class.forName("com.mysql.jdbc.Driver");
        Connection conn = DriverManager.getConnection(connectStr, username, password);
        conn.setAutoCommit(false); // switch to manual commit
        int count = 0;
        PreparedStatement psts = conn.prepareStatement(insert_sql);
        String line = null;
        while (null != (line = bfr.readLine())) {
            String[] infos = line.split(";");
            if (infos.length < 5) continue;
            if (debug) {
                System.out.println(line);
            }
            psts.setLong(1, Long.valueOf(infos[0]));
            psts.setLong(2, Long.valueOf(infos[1]));
            psts.setString(3, infos[2]);
            psts.setString(4, infos[3]);
            psts.setString(5, infos[4]);
            psts.addBatch(); // add to the batch
            count++;
        }
        psts.executeBatch(); // execute the batch
        conn.commit();       // commit
        System.out.println("All down : " + count);
        conn.close();
    }
}
After it finishes:

All down : 103498
Convert finished. All spend time/s : 47

Just over 100,000 rows took a total of 47 seconds.
This is still not very fast and falls short of the desired result, so further improvement is needed. The MySQL JDBC connection string accepts additional parameters:
rewriteBatchedStatements=true: Connector/J disables batch rewriting by default; this parameter turns it on, allowing the driver to rewrite the batched SQL statements it submits to the database. For details see: http://www.cnblogs.com/chenjianjx/archive/2012/08/14/2637914.html
useServerPrepStmts=false: with server-side prepared statements disabled, com.mysql.jdbc.PreparedStatement assembles the SQL locally, so what reaches the database is the final SQL with the ? placeholders already substituted.
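To see what the rewrite buys, here is a rough, hypothetical illustration of the transformation: N single-row INSERTs collapse into one multi-row INSERT, so the whole batch costs one statement instead of N. The helper below is made up for illustration and only mimics the idea, not Connector/J's actual implementation:

```java
import java.util.List;
import java.util.StringJoiner;

// Made-up helper mimicking the effect of rewriteBatchedStatements=true:
// many "INSERT ... VALUES (a,b)" statements become one multi-row INSERT.
public class BatchRewriteDemo {

    static String rewrite(String insertPrefix, List<String> valueTuples) {
        StringJoiner joined = new StringJoiner(",");
        for (String tuple : valueTuples) {
            joined.add(tuple);               // each tuple is already "(…)"
        }
        return insertPrefix + " VALUES " + joined;
    }

    public static void main(String[] args) {
        String sql = rewrite("INSERT INTO tb_ipinfos (iplong1,iplong2)",
                List.of("(1,2)", "(3,4)", "(5,6)"));
        System.out.println(sql);
        // INSERT INTO tb_ipinfos (iplong1,iplong2) VALUES (1,2),(3,4),(5,6)
    }
}
```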
With this small improvement, append the following to the connection string (i.e., uncomment the corresponding line in the constructor above):
connectStr += "?useServerPrepStmts=false&rewriteBatchedStatements=true";
Running again:

All down : 103498
Convert finished. All spend time/s : 10

The same data volume now takes only 10 seconds; processing efficiency improves dramatically.
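The gain is easy to quantify as a back-of-the-envelope check; the row count and timings below are taken directly from the two printed outputs above:

```java
public class ThroughputCheck {
    public static void main(String[] args) {
        int rows = 103498;                 // rows inserted in both runs
        double plainRate   = rows / 47.0;  // default settings: ~2202 rows/s
        double rewriteRate = rows / 10.0;  // with rewriteBatchedStatements=true: ~10350 rows/s
        System.out.printf("plain: %.0f rows/s, rewritten: %.0f rows/s, speedup: %.1fx%n",
                plainRate, rewriteRate, rewriteRate / plainRate);
    }
}
```

Since the row count is identical in both runs, the speedup is just the ratio of the two wall times, 47/10 = 4.7x.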