Java MySQL Large-Data-Volume Batch Insert and Streaming Read Analysis

This week I helped a customer troubleshoot some problems with the MySQL JDBC driver in their report-generation jobs. The report logic reads a large volume of data from the database, processes it in memory to produce summary data, and then writes the summary back to the database. The basic flow is read -> process -> write.

1. Streaming reads for large result sets

When the SQL query returned a large result set, the application appeared unable to read any data. At first we suspected the server side was slow, but the same query returned results immediately in the console client. Capturing packets on the application side showed that the server started returning data immediately after the SQL statement was sent, yet the ResultSet's next() method still blocked. The cause: by default, the MySQL driver reads the entire result set into memory before letting the application consume it. This is not the expected behavior. The expected behavior is a streaming read: rows are read and processed as soon as they arrive from the MySQL server, so the application does not need enough memory to hold the whole result set.

Example of the correct streaming read:

```java
PreparedStatement ps = connection.prepareStatement(
        "select .. from ..",
        ResultSet.TYPE_FORWARD_ONLY,
        ResultSet.CONCUR_READ_ONLY);
// TYPE_FORWARD_ONLY and CONCUR_READ_ONLY are the MySQL driver's defaults,
// so they can also be left unspecified:
// PreparedStatement ps = connection.prepareStatement("select .. from ..");
ps.setFetchSize(Integer.MIN_VALUE);
// Alternatively, set the defaultFetchSize parameter in the JDBC URL so that
// all results are read in streaming mode by default.
ResultSet rs = ps.executeQuery();
while (rs.next()) {
    System.out.println(rs.getString("fieldName"));
}
```

Analysis: the following is the driver method that decides whether to stream the results.
There are three conditions: forward-only, read-only, and a fetch size of Integer.MIN_VALUE.

```java
/**
 * We only stream result sets when they are forward-only, read-only, and the
 * fetch size has been set to Integer.MIN_VALUE
 *
 * @return true if this result set should be streamed row at-a-time, rather
 *         than read all at once.
 */
protected boolean createStreamingResultSet() {
    try {
        synchronized (checkClosed().getConnectionMutex()) {
            return ((this.resultSetType == java.sql.ResultSet.TYPE_FORWARD_ONLY)
                    && (this.resultSetConcurrency == java.sql.ResultSet.CONCUR_READ_ONLY)
                    && (this.fetchSize == Integer.MIN_VALUE));
        }
    } catch (SQLException e) {
        // we can't break the interface, having this be no-op in case of error is OK
        return false;
    }
}
```

2. Batch writes

Initially the application wrote the report results with one INSERT per row, which was also slow. The first reason is that each single-row write requires its own request/response round trip between the application and the database, and each request commits a separate transaction; when network latency is high, the accumulated round trips cost a large amount of time. The second reason is that for each transaction the database must flush the transaction log to disk to guarantee durability. Since each transaction writes only one row, disk I/O utilization is poor: disk I/O is block-based, and writing a large amount of data sequentially is much more efficient. Therefore the writes must be changed to batch inserts to reduce the number of requests and transactions.

The following is a batch insert example. The JDBC connection string must include rewriteBatchedStatements=true:

```java
int batchSize = 1000;
PreparedStatement ps = connection.prepareStatement(
        "insert into tb1 (c1, c2, c3 ...) values (?, ?, ? ...)");
for (int i = 0; i < list.size(); i++) {
    ps.setXXX(list.get(i).getC1());
    ps.setYYY(list.get(i).getC2());
    ps.setZZZ(list.get(i).getC3());
    ps.addBatch();
    if ((i + 1) % batchSize == 0) {
        ps.executeBatch();
    }
}
if (list.size() % batchSize != 0) {
    ps.executeBatch();
}
```

The sample code above sends one request per 1000 rows. On the application side, the MySQL driver combines the parameters of multiple addBatch() calls into a single multi-value INSERT statement and sends it to the database for execution, for example:

insert into tb1 (c1, c2, c3) values (v1, v2, v3), (v4, v5, v6), (v7, v8, v9) ...

This requires far fewer requests than executing each INSERT separately, which reduces both network latency and disk I/O time, and therefore improves TPS.

Code analysis: from the driver code we can see that:

1. With rewriteBatchedStatements=true, if the statement is a parameterized INSERT (not insert ... select, and not insert ... on duplicate key update with an id = last_insert_id(...) clause), executeBatchedInserts is executed, i.e. the multi-value approach.
2. With rewriteBatchedStatements=true, if the statement is parameterized (and was not added via addBatch(sql)) and the MySQL server version is 4.1 or later, executePreparedBatchAsMultiStatement is executed, which joins multiple statements with ';' and submits them in a single request, for example: "insert into tb1 (c1, c2, c3) values (v1, v2, v3); insert into tb1 (c1, c2, c3) values (v1, v2, v3); ..."
3. Otherwise, executeBatchSerially is executed, i.e. the statements are sent one at a time.

```java
public void addBatch(String sql) throws SQLException {
    synchronized (checkClosed().getConnectionMutex()) {
        this.batchHasPlainStatements = true;
        super.addBatch(sql);
    }
}

public int[] executeBatch() throws SQLException {
    // ...
    if (!this.batchHasPlainStatements
            && this.connection.getRewriteBatchedStatements()) {
        if (canRewriteAsMultiValueInsertAtSqlLevel()) {
            return executeBatchedInserts(batchTimeout);
        }
        if (this.connection.versionMeetsMinimum(4, 1, 0)
                && !this.batchHasPlainStatements
                && this.batchedArgs != null
                && this.batchedArgs.size() > 3 /* cost of option setting rt-wise */) {
            return executePreparedBatchAsMultiStatement(batchTimeout);
        }
    }
    return executeBatchSerially(batchTimeout);
    // ...
}
```

executeBatchedInserts transmits more efficiently than executePreparedBatchAsMultiStatement, because the "insert into tb1 (c1, c2, c3)" prefix is sent only once per request instead of once per row.

The length of a request packet is limited. If the batch size is too large and the request exceeds the limit, the MySQL driver internally splits it into multiple packets according to the maximum packet size. Therefore, to actually reduce the number of submissions, check max_allowed_packet on the MySQL server; otherwise a large batch size is useless.

```
mysql> show variables like '%max_allowed_packet%';
+--------------------+-----------+
| Variable_name      | Value     |
+--------------------+-----------+
| max_allowed_packet | 167772160 |
+--------------------+-----------+
1 row in set (0.00 sec)
```

To verify that the driver sends the expected SQL, there are three methods:

1. Packet capture: use wireshark to capture the MySQL packets on the application side.
2. Enable the general log on the MySQL server side, which records all SQL the server receives.
3. Add the traceProtocol=true parameter to the JDBC URL.

Performance test comparison:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

import com.alibaba.druid.pool.DruidDataSource;

public class BatchInsert {

    public static void main(String[] args) throws SQLException {
        int batchSize = 1000;
        int insertCount = 1000;
        testDefault(batchSize, insertCount);
        testRewriteBatchedStatements(batchSize, insertCount);
    }

    private static void testDefault(int batchSize, int insertCount) throws SQLException {
        long start = System.currentTimeMillis();
        doBatchedInsert(batchSize, insertCount, "");
        long end = System.currentTimeMillis();
        System.out.println("default: " + (end - start) + " ms");
    }

    private static void testRewriteBatchedStatements(int batchSize, int insertCount) throws SQLException {
        long start = System.currentTimeMillis();
        doBatchedInsert(batchSize, insertCount, "rewriteBatchedStatements=true");
        long end = System.currentTimeMillis();
        System.out.println("rewriteBatchedStatements: " + (end - start) + " ms");
    }

    private static void doBatchedInsert(int batchSize, int insertCount, String mysqlProperties) throws SQLException {
        DruidDataSource dataSource = new DruidDataSource();
        dataSource.setUrl("jdbc:mysql://ip:3306/test?" + mysqlProperties);
        dataSource.setUsername("name");
        dataSource.setPassword("password");
        dataSource.init();

        Connection connection = dataSource.getConnection();
        PreparedStatement preparedStatement = connection.prepareStatement(
                "insert into Test (name, gmt_created, gmt_modified) values (?, now(), now())");
        for (int i = 0; i < insertCount; i++) {
            preparedStatement.setString(1, i + "");
            preparedStatement.addBatch();
            if ((i + 1) % batchSize == 0) {
                preparedStatement.executeBatch();
            }
        }
        preparedStatement.executeBatch();
        connection.close();
        dataSource.close();
    }
}
```

The ping latency of the test network environment is 35 ms. Test result:

default: 75525 ms
rewriteBatchedStatements: 914 ms
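The gap between the two results is mostly round-trip latency. Here is a back-of-the-envelope sketch (my own estimate, using the 35 ms ping and the counts from the test above; it deliberately ignores server-side processing and per-transaction log flushes, which is why the measured numbers are larger):

```java
public class RoundTripEstimate {

    // Number of requests the driver sends when batches are rewritten into
    // multi-value inserts: ceil(insertCount / batchSize).
    static int requests(int insertCount, int batchSize) {
        return (insertCount + batchSize - 1) / batchSize;
    }

    public static void main(String[] args) {
        int insertCount = 1000;
        int batchSize = 1000;
        double latencyMs = 35.0; // measured ping latency in the test environment

        // Without rewriteBatchedStatements, each of the 1000 batched inserts is
        // still sent as its own request: latency alone costs ~35000 ms, the same
        // order of magnitude as the measured 75525 ms.
        System.out.println(insertCount * latencyMs);

        // With rewriteBatchedStatements=true, the whole batch collapses into
        // ceil(1000 / 1000) = 1 multi-value insert request: ~35 ms of latency,
        // consistent with the measured 914 ms.
        System.out.println(requests(insertCount, batchSize) * latencyMs);
    }
}
```

In other words, the speedup comes almost entirely from cutting ~1000 round trips down to 1.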

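To make the multi-value rewrite concrete, here is a simplified illustrative sketch of the transformation the driver applies when rewriteBatchedStatements=true. This is not Connector/J's actual code (the real rewrite also handles value escaping, ON DUPLICATE KEY clauses, and max_allowed_packet splitting); it only shows how batched parameter rows collapse into a single statement:

```java
import java.util.List;

public class MultiValueRewrite {

    // Build one multi-value INSERT from a parameterized prefix and batched
    // rows of already-rendered values. Purely illustrative.
    static String rewrite(String prefix, List<List<String>> rows) {
        StringBuilder sql = new StringBuilder(prefix);
        for (int i = 0; i < rows.size(); i++) {
            sql.append(i == 0 ? " " : ", ");
            sql.append("(").append(String.join(", ", rows.get(i))).append(")");
        }
        return sql.toString();
    }

    public static void main(String[] args) {
        String sql = rewrite("insert into tb1 (c1, c2, c3) values",
                List.of(List.of("v1", "v2", "v3"),
                        List.of("v4", "v5", "v6")));
        System.out.println(sql);
        // insert into tb1 (c1, c2, c3) values (v1, v2, v3), (v4, v5, v6)
    }
}
```

One request now carries all the rows, which is exactly why fewer round trips and fewer transaction commits are needed.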