Recently, a deadlock problem was frequently encountered on the database. The problem with performance is
1. There is a query for:
1) A complex select Investigates a large set of data
2) Use transactions to update the state of this set of data
In order to make the lock time shorter, I cut this whole big transaction into several small transactions, that is, each query and update 1W data into each query and update 100 data, but need to query update 100 times, the total data volume or 1W
This creates a problem because the select is a complex query, where 100 select makes the whole operation time rise sharply.
So I think the best thing to do is to query the 1W data with a SELECT, and then update the operation group, for example, each update 1000, if the transaction update fails, the update failed to remove the data from the select result set
2. There is a large number of result set updates, such as 5W, originally my approach is to cut into 10 small transactions, each update 5,000, sequential update
In fact there is no need for transactions here, because I have no consistency and atomicity requirements for them. The benefit of using transactions is increased efficiency because MySQL defaults to autocommit = True, which is equivalent to not having a single statement as a transaction. However, the disadvantage of using a transaction is that the lock holding time of the update is longer, which can lead to deadlock.
The final solution is to use multi-threaded execution at the same time and not for transactions, but with PreparedStatement (batch).
Finally look at my test data, tested 10W of data
1. Use batch to complete a single execution, a total cost of 55s
2. Use multi-threaded concurrent execution, each thread batch quantity is 5000, the thread pool size is CPU core number * 2, a total cost of 11s
3. Do not use batch, but with multi-threaded execution, each thread is responsible for 5000 data insertion, a total of 24s
4. Use transactions, but do not use batch, using multi-threaded execution, each thread responsible for 5000 data, a total cost of 14s
5. Use transactions while using batch, using multiple threads, 5000 data per thread, consuming 8s
The use of transactions is visible, while batch is the most efficient, but the use of transactions increases the holding time of locks, which requires attention