Java MySQL large data volume insertion and streaming read analysis

Source: Internet
Author: User
Tags: bulk insert


This is a summary of work this week helping a customer solve problems with report generation through the MySQL driver, together with the solutions. The report logic reads a large amount of data from the database, processes it in memory, generates a large amount of summary data, and then writes that data back to the database. The basic flow is read, process, write.

1 Streaming read issue. The first problem we ran into was on the read side: when the SQL query returned a large amount of data, the result was basically unreadable. At first we suspected that server-side processing was too slow, yet the same query returned data immediately on the console. Capturing packets on the application side showed that the server also started returning data immediately after the SQL was sent, but the application still blocked on the call that reads the ResultSet. Checking the documentation and the driver code revealed the cause: the MySQL driver's default behavior is to read the entire result set into memory before allowing the application to read any of it. This is clearly inconsistent with the expected behavior, which is to read in streaming mode, reading and processing rows as soon as the MySQL server returns them, so that the application does not need a lot of memory to hold the whole result set. Example of correct streaming read code:

PreparedStatement ps = connection.prepareStatement("SELECT ... FROM ...",
        ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY);
// Integer.MIN_VALUE as the fetch size tells the MySQL driver to stream rows.
ps.setFetchSize(Integer.MIN_VALUE);
ResultSet rs = ps.executeQuery();
while (rs.next()) {
    System.out.println(rs.getString("fieldName"));
}

You can also set the defaultFetchSize parameter in the JDBC URL so that query results are read as a stream by default.
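For reference, a minimal sketch of the URL-based alternative, assuming MySQL Connector/J; the host, database name, and credentials are placeholders, and -2147483648 is the decimal value of Integer.MIN_VALUE:

// Hypothetical host/database/credentials; DriverManager is from java.sql.
// defaultFetchSize applies this fetch size to every statement created on the
// connection, which should enable streaming reads without calling setFetchSize().
Connection connection = DriverManager.getConnection(
        "jdbc:mysql://host:3306/test?defaultFetchSize=-2147483648", "user", "password");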


Code analysis: the following is the MySQL driver's method for deciding whether to enable streaming reads. There are three conditions: the result set must be forward-only and read-only, and the fetch size must be Integer.MIN_VALUE.

/**
 * We only stream result sets when they are forward-only, read-only, and the
 * fetch size has been set to Integer.MIN_VALUE.
 *
 * @return true if this result set should be streamed row at-a-time, rather
 *         than read all at once.
 */
protected boolean createStreamingResultSet() {
    try {
        synchronized (checkClosed().getConnectionMutex()) {
            return ((this.resultSetType == java.sql.ResultSet.TYPE_FORWARD_ONLY)
                    && (this.resultSetConcurrency == java.sql.ResultSet.CONCUR_READ_ONLY)
                    && (this.fetchSize == Integer.MIN_VALUE));
        }
    } catch (SQLException e) {
        // We can't break the interface, having this be a no-op in case of an error is OK
        return false;
    }
}
         

2 Batch write issue. Initially the application executed one INSERT at a time to write the report results, and writing was also slow. The main reason is that single-row writes force a large number of request-response round trips between the application and the DB, and each request is a separate transaction commit. In a scenario with high network latency, these many requests spend a large amount of time on network round trips. The second reason is that for every transaction the DB flushes the transaction log to disk to guarantee durability. Disk IO utilization stays low because each transaction writes only one row; since disk IO works block by block, writing a lot of data contiguously is much more efficient. The fix is therefore to reduce both the number of requests and the number of transactions by switching to bulk inserts. The following is an example of a bulk insert; note that rewriteBatchedStatements=true must also be added to the JDBC connection string.

int batchSize = 1000;
PreparedStatement ps = connection.prepareStatement("INSERT INTO tb1 (c1, c2, c3, ...) VALUES (?, ?, ?, ...)");
for (int i = 0; i < list.size(); i++) {
    ps.setXxx(1, list.get(i).getC1());
    ps.setYyy(2, list.get(i).getC2());
    ps.setZzz(3, list.get(i).getC3());
    ps.addBatch();
    if ((i + 1) % batchSize == 0) {
        ps.executeBatch();
    }
}
if (list.size() % batchSize != 0) {
    ps.executeBatch();
}
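The connection string itself, as a sketch with a hypothetical host, database name, and credentials:

// Hypothetical host/database/credentials; without rewriteBatchedStatements=true,
// the driver still sends the batched inserts one statement at a time.
Connection connection = DriverManager.getConnection(
        "jdbc:mysql://host:3306/test?rewriteBatchedStatements=true", "user", "password");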

The above code sends one request per 1000 rows. Inside the MySQL driver, the parameters of multiple addBatch() calls on the application side are combined into a single multi-value INSERT statement that is sent to the DB for execution, for example:

insert into tb1 (c1,c2,c3) values (v1,v2,v3), (v4,v5,v6), (v7,v8,v9) ...

This significantly reduces the number of requests compared with one request per insert, cutting network latency and disk IO time and thereby increasing TPS.
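As a rough sanity check on the latency argument: with the 35 ms round-trip time of the test environment at the end of this article, 1000 single-row inserts spend about 1000 × 35 ms = 35 s on network round trips alone, while one batched request pays that latency only once.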

Code analysis: from the driver code below, we can see:

1 If rewriteBatchedStatements=true, the INSERT is a parameterized statement, and it is not an INSERT ... SELECT or an INSERT ... ON DUPLICATE KEY UPDATE with an id=LAST_INSERT_ID(...) clause, then executeBatchedInserts is executed, i.e. the multi-value path

2 If rewriteBatchedStatements=true, the statements are parameterized (not added via addBatch(sql)), the MySQL server version is at least 4.1, and there are more than three batched statements, then executePreparedBatchAsMultiStatement is executed. It submits multiple SQL statements in one request, separated by ';', for example "INSERT INTO tb1 (c1,c2,c3) VALUES (v1,v2,v3); INSERT INTO tb1 (c1,c2,c3) VALUES (v1,v2,v3); ..."

3 Everything else falls through to executeBatchSerially, which still processes the batch one statement at a time

public void addBatch(String sql) throws SQLException {
    synchronized (checkClosed().getConnectionMutex()) {
        this.batchHasPlainStatements = true;
        super.addBatch(sql);
    }
}

public int[] executeBatch() throws SQLException {
    // ...
    if (!this.batchHasPlainStatements && this.connection.getRewriteBatchedStatements()) {
        if (canRewriteAsMultiValueInsertAtSqlLevel()) {
            return executeBatchedInserts(batchTimeout);
        }
        if (this.connection.versionMeetsMinimum(4, 1, 0)
                && !this.batchHasPlainStatements
                && this.batchedArgs != null
                && this.batchedArgs.size() > 3 /* cost of option setting rt-wise */) {
            return executePreparedBatchAsMultiStatement(batchTimeout);
        }
    }
    return executeBatchSerially(batchTimeout);
    // ...
}


executeBatchedInserts is more efficient than executePreparedBatchAsMultiStatement, because in one request the INSERT INTO tb1 (c1,c2,c3) prefix appears only once rather than once per row.

The MySQL server has a limit on the maximum length of a request packet. If the batch size is so large that the request packet exceeds this limit, the MySQL driver internally splits the batch into multiple packets according to the maximum packet limit, and the number of commits is not actually reduced. So to really reduce the number of commits, also check max_allowed_packet on the MySQL server; otherwise a larger batch size is useless.

mysql> show variables like '%max_allowed_packet%';
+--------------------+-----------+
| Variable_name      | Value     |
+--------------------+-----------+
| max_allowed_packet | 167772160 |
+--------------------+-----------+
1 row in set (0.00 sec)
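If the value is too small for the intended batch size, it can be raised at runtime. A sketch, assuming an account with the required privilege (134217728 = 128 MB is an arbitrary example; already-open connections keep the old value):

mysql> set global max_allowed_packet = 134217728;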

There are three ways to verify that the driver sent the SQL you expect:

1 Capture packets, e.g. use Wireshark on the application side to capture the MySQL protocol messages

2 Enable the general log on the MySQL server side to see all SQL received by MySQL (example below)

3 Add the parameter traceProtocol=true to the JDBC URL
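For option 2, a minimal sketch of switching the general log on at runtime (assuming sufficient privileges; remember to switch it off afterwards, since it records every statement the server receives):

mysql> set global general_log = 'ON';
mysql> show variables like 'general_log_file';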

Performance Test Comparison

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

import com.alibaba.druid.pool.DruidDataSource;

public class BatchInsert {

    public static void main(String[] args) throws SQLException {
        int batchSize = 1000;
        int insertCount = 1000;
        testDefault(batchSize, insertCount);
        testRewriteBatchedStatements(batchSize, insertCount);
    }

    private static void testDefault(int batchSize, int insertCount) throws SQLException {
        long start = System.currentTimeMillis();
        doBatchedInsert(batchSize, insertCount, "");
        long end = System.currentTimeMillis();
        System.out.println("default: " + (end - start) + " ms");
    }

    private static void testRewriteBatchedStatements(int batchSize, int insertCount) throws SQLException {
        long start = System.currentTimeMillis();
        doBatchedInsert(batchSize, insertCount, "rewriteBatchedStatements=true");
        long end = System.currentTimeMillis();
        System.out.println("rewriteBatchedStatements: " + (end - start) + " ms");
    }

    private static void doBatchedInsert(int batchSize, int insertCount, String mysqlProperties) throws SQLException {
        DruidDataSource dataSource = new DruidDataSource();
        dataSource.setUrl("jdbc:mysql://ip:3306/test?" + mysqlProperties);
        dataSource.setUsername("name");
        dataSource.setPassword("password");
        dataSource.init();

        Connection connection = dataSource.getConnection();
        PreparedStatement preparedStatement = connection.prepareStatement(
                "INSERT INTO test (name, gmt_created, gmt_modified) VALUES (?, now(), now())");
        for (int i = 0; i < insertCount; i++) {
            preparedStatement.setString(1, i + "");
            preparedStatement.addBatch();
            if ((i + 1) % batchSize == 0) {
                preparedStatement.executeBatch();
            }
        }
        preparedStatement.executeBatch();

        connection.close();
        dataSource.close();
    }
}

The ping latency of the test network environment is 35 ms. Test results:

default: 75525 ms
rewriteBatchedStatements: 914 ms
