Background of Appearance:
The company does a Redis-related project where MySQL stores a lot of statistics. such as client-reported data, redis instance data, application data, machine data and so on. Every day in the report, collection, due to the deletion of the regular, data accumulation. About a year or so of data, a table of data has reached billions of levels. In this case, the data for a table is at least dozens of GB. Therefore, it is necessary to delete the outdated data and temporarily retain the statistics for nearly three months.
Solution:
Basic each table has a field called create_time or Collect_time field, just delete this field three months before the data is OK
Delete from table_name where Create_time < ' 2017-04-06 '
Just execute this sentence of SQL should be able to
Problems encountered:
The total number of locks exceeds the lock table size in MySQL
Because the data needs to be deleted is too large, MySQL to the buffer seems to be only about 8MB (online search)
Find the DBA to help look, ask this table built index no
Show index from TABLE_NAME
By looking at the index, we are indexed on Create_time and Collect_time, and the index type is BTREE,ASC. The MySQL engine we're using here is InnoDB.
Delete from table_name where Create_time < ' 2017-07-06 ' ORDER by create_time ASC limit 10000
Next, I want to use order by + limit to implement the deletion, or the above error occurred
After the dba prompted me to say, why not use ID Delete, say by ID Delete, speed and by index column Delete, not an order of magnitude
Then I thought of splitting it.
Final Solution:
Find the maximum IDs of Create_time and collect_time that match the criteria
Select Max (ID) from table_name where Create_time < ' 2017-04-06 '
The data around here takes about more than 10 seconds.
Then delete by ID, delete 10k at a time, loop delete
Delete from table_name where ID < MAXID limit 10000
Until the expiration time is deleted.
Here I do not have the MSYQL server permissions, removed through the Java Client connection, using the spring JdbcTemplate this interface
Also, one reason to remove 10k here is that the transaction is too large to affect the operation of other services
The technique used is to use the thread pool to perform the SQL deletion and implement the asynchronous deletion. and colleagues at the time of dinner, colleagues also provide a solution, each time the deletion of a second data, so again and again deleted. Look at the data, a second of the data is basically hundreds of thousands of, about, so it is not very good to control the size of the data volume. or through the primary key ID + limit 10k here secure a bit.
Another thing is, in order to be afraid of the MySQL server, here when the thread pool deleted back to sleep (1000), blocking 1s and then delete, reduce the pressure of MySQL server
Today we got a little bit of data removal, and it feels like MySQL is very deep, like the execution of a select count (*), the process of the SELECT from table_name order by ID limit, the index, the various connections, how the engine works. The time to go is still a little out of tune, tomorrow should be able to handle these.
Delete data from MySQL in bulk