Database inserts millions of data

Source: Internet
Author: User
Tags commit ranges oracle database

This is a dig into a database job

First of all, the job title requirements:


Create a table that contains four fields, with the table named test

First column ID, primary key, self increment.

The second column is col1, a random mike,bob,jack,alice,cathy,ann,betty,cindy,mary,jane in the

The third column is col2, randomly for a 5-digit letter, with letters limited to A-E

The third column is col3, a random integer between 1-20

Insert 1 million records as required for the table in step one to record the time of execution

A certain preprocessing of the range of data to be inserted


(1) for col1, create an array of value ranges

private static string[] col1values={"Mike", "Bob", "Jack", "Alice", "Cathy", "Ann", "Betty", "Cindy", "Mary", "Jane"};

When randomly fetched, just call col1values[(int) (Math.random () *10)].

(2) for col2, create an array of value ranges by recursion

private static string[] col2values=new string[3125]; static{point=0 Initcol2value (5,new stringbuffer ("")); private static void Initcol2value (int n,stringbuffer str) {if (n==0) {col2values[point++]=new String (str); NT i=0;i<5;i++) {StringBuffer strtemp = new StringBuffer (str); Initcol2value (N-1,strtemp.append ((char) (' a ' +i));}

When randomly fetched, just call col2values[(int) (Math.random () *3125)].

(3) for col3, random acquisition of the time as long as (int) (Math.random () *20) +1 can be.

inserting data with large amounts of data

(1) The first thought of the method is, of course, a traditional line of insertion method: Get statement through connection, and then invoke the statement object's execute function to execute the SQL statement, insert a row, so that the loop 1 million times, but the time complexity is too high, There is no estimate that an hour is uncertain.

(2) and then think of the SQL statement preprocessing, so to a large extent improve the efficiency. The following is a central part of this code.

public static void InsertData () {try {System.out.println ("Start insert data"); Long begintime = System.currenttimemillis (); Conn.setautocommit (FALSE); PreparedStatement PST = conn. preparestatement ("INSERT into Test (COL1,COL2,COL3) VALUES (?,?,?)"); for (int i = 1; I <= 1000000 i++) {pst.setstring (1, col1values[(int) (Math.random () *)] pst.setstring (2, Col2val ues[(int) (Math.random () * 3125)]); Pst.setint (3, (int) (Math.random () * 20) + 1); Pst.execute (); } conn.commit (); Pst.close (); Long endtime = System.currenttimemillis (); SYSTEM.OUT.PRINTLN ("End insert data"); System.out.println ("Insert Time:" + (Double) (endtime-begintime)/1000 + "s"); System.out.println (); catch (SQLException CE) {System.out.println (CE);}

The test results are as follows:

s Tart Insert Data
End Insert Data
Insert time:110.215 S

(3) for the above results are still not very satisfied, so began to explore.

(a) See a method on the web, using the Addbatch (), ExecuteBatch () method on the PreparedStatement class, which can be used to batch optimize 1000 or even 10,000 SQL inserts as a transaction at a one-time, by batch processing, And the author has tested the time on Oracle's database for less than 10s. So I also tried a bit, found that is still around 107s, so confused.

(b) This time to see another article on the web, explains why MySQL's JDBC driver does not support bulk operations, the original MySQL does not support Addbatch (), ExecuteBatch () and other methods of batch optimization, and Oracle database support, And can insert 1 million records at the time of 360 ms

URL: http://elf8848.iteye.com/blog/770032

(c) After seeing the log of Mr Ge, he used python to insert 1 million data in SQLite for only 4 seconds, because Python optimized all 1 million INSERT statements and put all inserts into the same transaction. This greatly reduces the time to open and cancel transactions, and it is this part of the operation that consumes a lot of time.

URL: http://aegiryy.net/?p=380

(d) So I was inspired and I learned that when it comes to MySQL database operations, you can insert multiple rows of data into one SQL INSERT statement. So I tried to construct a larger SQL statement by StringBuffer, and each statement could insert 10,000 rows of data (if 100,000 or 1 million would exceed the heap memory limit), so the loop could be done 100 times. The following is the core code for this method:

public static void InsertData () {try {System.out.println ("Start insert data"); Long begintime = System.currenttimemillis (); Statement st = Conn.createstatement (); for (int i = 0; i < i++) {StringBuffer sqlbuffer = new StringBuffer ("INSERT into Test (COL1,COL2,COL3) values"); Sqlbuffer.append ("(/" + col1values[(int) (Math.random () *)] + "/",/"" + col2values[(int) (Math.random () * 3125)] + " /"," + (int) (Math.random () * 20) + 1) + ")"); for (int j = 2; J <= 10000 + +) {sqlbuffer.append (", (/" + col1values[(int) (Math.random () *)] + "/",/"" + Col2va lues[(int) (Math.random () * 3125)] + "/", "+ ((int) (Math.random () * 20) + 1) +"); } sqlbuffer.append (";"); String sql = new string (sqlbuffer); St.execute (SQL); Long endtime = System.currenttimemillis (); SYSTEM.OUT.PRINTLN ("End insert data"); System.out.println ("Insert Time:" + (Double) (endtime-begintime)/1000 + "s"); System.out.println (); catch (SQLException CE) {System.out.println (CE);}

The test results are as follows:

Start Insert Data
End Insert Data
Insert time:15.083 s

(e) Finally, I thought of the optimization of this method, the use of preprocessing, in the code readability and efficiency are improved, although the efficiency is not much improved. Here is the core code for this method:

public static void InsertData () {try {Conn.setautocommit (false); StringBuffer sqlbuffer = new StringBuffer ("INSERT into Test (COL1,COL2,COL3) values"); Sqlbuffer.append ("(?,?,?)"); for (int j = 2; J <= 10000 + +) {sqlbuffer.append (", (?,?,?)");} sqlbuffer.append (";"); String sql = new string (sqlbuffer); PreparedStatement PST = conn.preparestatement (SQL); System.out.println ("Start insert data"); Long begintime = System.currenttimemillis (); for (int i = 0; i < n i++) {for (int j = 0; J < 10000; J + +) {pst.setstring (3 * j + 1, col1values[(int) (Math.ra Ndom () * 10)]); Pst.setstring (3 * j + 2, col2values[(int) (Math.random () * 3125)]); Pst.setint (3 * j + 3, (int) (Math.random () * 20) + 1); } pst.execute (); } conn.commit (); Pst.close (); Long endtime = System.currenttimemillis (); SYSTEM.OUT.PRINTLN ("End insert data"); System.out.println ("Insert Time:" + (Double) (endtime-begintime)/1000 + "s"); System.out.println (); catch (SQLException CE) {System.out.println (CE);} }

The test results are as follows:

Start Insert Data
End Insert Data
Insert time:14.47 s

Finally, all the code for the Final solution is posted:

Package Godfrey.nju; Import java.sql.Connection; Import Java.sql.DriverManager; Import java.sql.PreparedStatement; Import Java.sql.ResultSet; Import java.sql.SQLException; Import java.sql.Statement; public class TestDB2 {private static string dbclassname = "Com.mysql.jdbc.Driver"; private static string Dburl = "Jdbc:my Sql://localhost:3306/db_test "; private static String Dbuser = "root"; private static String dbpwd = "123"; private static Connection conn = null; private static string[] Col1values = {"Mike", "Bob", "Jack", "Alice", "Cathy", "Ann", "Betty", "Cindy", "Mary", "Jane"}; private static string[] Col2values = new string[3125]; private static int point; public static void Main (String args[]) {insertdata ();//Query1 ();//ClearData ();} public static void InsertData () {TR Y {Conn.setautocommit (false); StringBuffer sqlbuffer = new StringBuffer ("INSERT into Test (COL1,COL2,COL3) values"); Sqlbuffer.append ("(?,?,?)"); for (int j = 2; J <= 10000 + +) {sqlbuffer.append (), (?,?,?)"); } Sqlbuffer.append (";"); String sql = new string (sqlbuffer); PreparedStatement PST = conn.preparestatement (SQL); System.out.println ("Start insert data"); Long begintime = System.currenttimemillis (); for (int i = 0; i < n i++) {for (int j = 0; J < 10000; J + +) {pst.setstring (3 * j + 1, col1values[(int) (Math.ra Ndom () * 10)]); Pst.setstring (3 * j + 2, col2values[(int) (Math.random () * 3125)]); Pst.setint (3 * j + 3, (int) (Math.random () * 20) + 1); } pst.execute (); } conn.commit (); Pst.close (); Long endtime = System.currenttimemillis (); SYSTEM.OUT.PRINTLN ("End insert data"); System.out.println ("Insert Time:" + (Double) (endtime-begintime)/1000 + "s"); System.out.println (); catch (SQLException CE) {System.out.println (CE);} public static void Query1 () {try {System.out. println (' Start Query1: ' Select COUNT (*) from test GROUP by col1 ORDER by C Ount (*); ' "); Long begintime = System.currenttimemillis (); Statement st = Conn.createstatement (); String sql = "Select"Count (*) from test GROUP by col1 ORDER by Count (*); "; ResultSet rs = st.executequery (SQL); Long endtime = System.currenttimemillis (); SYSTEM.OUT.PRINTLN ("Result:"); while (Rs.next ()) {System.out.println (Rs.getint (1));} System.out.println ("Query1 Time:" + (Double) (endtime-begintime)/1000 + "s"); St.close (); Conn.close (); catch (Exception e) {e.printstacktrace ();}} public static void ClearData () {try {System.out.println (' Start delete all data '); Long begintime = System.currenttimemillis (); Statement st = Conn.createstatement (); String sql = "Delete from test"; St.execute (SQL); St.close (); Conn.close (); Long endtime = System.currenttimemillis (); SYSTEM.OUT.PRINTLN ("End Delete all data"); System.out.println ("Delete time:" + (Double) (endtime-begintime)/1000 + "s");} catch (Exception e) {e.printstacktrace ();}} static {try {Class.forName (dbclassname). newinstance (); conn = Drivermanager.getconnection (Dburl, Dbuser, dbpwd); CATC H (Exception e) {e.printstacktrace (); Point = 0; Initcol2value (5, New StringBuffer (""));} private static void Initcol2value (int n, stringbuffer str) {if (n = = 0) {col2values[point++] = new String (str); for (int i = 0; i < 5; i++) {StringBuffer strtemp = new StringBuffer (str); Initcol2value (N-1, Strtemp.append (char ) (' A ' + i)); } } }

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.