A MySQL 5.7 partition table performance degradation Case Analysis Original: http://www.talkwithtrend.com/Article/216803 preface
It is hoped that this article will enable MySQL5.7.18 users to know the traps existing in the use of partitioned tables and avoid the pit on this version. At the same time, through the sharing of the source code, upgrade the performance of the partition table MySQL5.7.18 the root cause, to the MySQL source enthusiasts to show the implementation of the partition table lock application.
Problem description
In MySQL version 5.7, there are many performance-related improvements. Includes performance improvements related to temporal tables, optimization of connection establishment speed, performance improvements related to replication distribution, and more. Basically do not need to make configuration changes, only need to upgrade to 5.7 version, you can bring a lot of performance improvements.
We are testing the environment, upgrading the database to version 5.7.18, verifying that the MySQL 5.7.18 version meets our expectations. Observation has been running for some time, with development feedback, the database performance is lower than the previous 5.6.21 version. The main performance characteristic is to encounter a more lock timeout situation. Development of additional feedback, performance degradation related tables are partitioned tables. The update goes all the way to the primary key. This feedback has aroused our attention. We made the following attempt:
- The database version is 5.7.18, the partition table is preserved, and performance is degraded.
- The database version is 5.7.18, the table is adjusted to a non-partitioned table, the performance is normal.
- Fallback the version of the database to version 5.6.21, preserve partition table, performance is also normal
Through the above tests, we generally decided that this performance degradation is related to the MySQL5.7 version upgrade.
Problem recurrence
The database table structure of the test environment is much more, and the call relationship is more complex. In order to further analyze and locate the problem, we cobwebs the following simple repro process.
// 创建一个测试分区表t2:
CREATE TABLE `t2`(
`id` INT(11) NOT NULL,
`dt` DATETIME NOT NULL,
`data` VARCHAR(10) DEFAULT NULL,
PRIMARYKEY (`id`,`dt`),
KEY`idx_dt`(`dt`)
) ENGINE=INNODB DEFAULTCHARSET=latin1
/*!50100 PARTITION BY RANGE (to_days(dt))
(PARTITION p20170218 VALUES LESS THAN (736744)ENGINE = InnoDB,
PARTITIONp20170219 VALUES LESS THAN (736745) ENGINE = InnoDB,
PARTITIONpMax VALUES LESS THAN MAXVALUE ENGINE = InnoDB) */
// 插入测试数据
INSERT INTO t2 VALUES (1, NOW(), ‘1‘);
INSERT INTO t2 VALUES (2, NOW(), ‘2‘);
INSERT INTO t2 VALUES (3, NOW(), ‘3‘);
// SESSION 1 对id = 1的 记录 做一个更新操作,事务先不提交。
BEGIN;UPDATE t2 SET DATA = ‘12‘ WHERE id = 1;
// SESSION 2 对id = 2 的记录做一个更新。
BEGIN;UPDATE t2 SET DATA = ‘21‘ WHERE id = 2;
At session 2, we found that this update operation has been waiting. The ID is the primary key, as the reason, the record of the primary key ID = 1 is updated without affecting the record update of the primary key ID = 2.
Query Information_schema under Innodb_locks this table. This table is used to record locks that the InnoDB transaction attempted to request but not yet acquired, as well as locks owned by the firm that blocked other transactions. There are two records:
Observing the Innodb_locks table at this point, the transaction id=40021 locks down the 2nd row of page 3rd, causing the transaction id=40022 to fail.
If we roll back the database to version 5.6.21, the above scenario cannot be reproduced.
Further analysis
Based on the information provided in the Innodb_locks table, we know that the problem is that InnoDB locks up inappropriate rows. The table is the memory storage engine. We set breakpoints in the memory storage engine's insertion interface and get the following stack information. Determines that the lock information is written to the Innodb_locks table in the Red box section.
It is confirmed in the function fill_innodb_locks_from_cache that every time a row is written, the data is obtained from the cache object in the following code.
We know that the cache holds the transaction lock information, so we need to find out how the data in the cache is added. Find the function Add_lock_to_cache by searching for the location where the cache object appears in the InnoDB code. After this function sets a breakpoint for debugging, it finds its content consistent with the data that fills the Innodb_locks table. Determine the lock object used by the function, which is what we are looking for.
Troubleshoot where the lock_t type is used. After filtering and debugging, the resulting row lock is added to the transaction list where the lock is found in the function Reclock::lock_add.
The Reclock::lock_add function can introduce the cause of row lock generation. Therefore, by setting the breakpoint on the function, look at the function stack, and in the following stack, navigate to the function of the red box position:
The following code for Partition_helper::handle_ordered_index_scan tracking, according to the analysis of the code, M_part_spec.end_part determines the maximum number of rows to lock, This is why the non-normal row lock is generated.
The final problem boils down to the reason why the M_part_spec.end_part is generated. By troubleshooting End_part where it is used, the initial setting value of the variable before use is eventually located in the Get_partition_set function. As can be seen from the code, each time a single record of the update operation, when the index scan is locked, the number of partitions table the same amount of rows locked. This is the root cause.
Verification Conclusion
According to the previous analysis, each time a single record of the update operation, the number of partitions table the same amount of rows are locked. We tried to verify our findings.
新增如下两条记录:
INSERT INTO t2 VALUES (4, NOW(), ‘4‘);
INSERT INTO t2 VALUES (5, NOW(), ‘5‘);
// SESSION 1 对id = 1的 记录 做一个更新操作,事务先不提交。
BEGIN;UPDATE t2 SET DATA = ‘12‘ WHERE id = 1;
// SESSION 2 现在对id = 4 的记录做一个更新。
BEGIN;UPDATE t2 SET DATA = ‘44‘ WHERE id = 4;
We found that an update to ID = 4 would work as expected. Will not be affected by the update of id = 1. This is because the id=4 record, which exceeds the number of partitions in the test case, is not locked. In real-world applications, partitioned tables will not have as many partitions as 3 in the test case, but dozens of or even hundreds of. This locking result will exacerbate the lock conflict in the update situation, causing the transaction to be in a lock-wait state. As shown, each transaction has an n row lock, so the possibility of these locking records covering each other is greatly improved, resulting in a decrease in concurrency and reduced efficiency.
Conclusion
With the above analysis, we are very sure that this should be a regression version of MySQL 5.7. We have submitted a bug to the open source community. Oracle acknowledgement is a problem that needs further analysis to investigate this bug.
(RPM) A case study of the degraded performance of a MySQL 5.7 partition table