1. Overview
yesterday at a classmate's The MySQL machine found such a problem,MySQL two table to do a left join , the execution plan shows a table using a full table scan, scan the entire table near - All-in-a- row record, large concurrency of such SQL come over the database has become almost unusable , today and everyone to share the reasons for this problem and solutions, hope can help you better learn MySQL database , come together to see it . The MySQL version is official 5.7.12.
2. Problem Recurrence
First, the table structure and table records are as follows:
mysql> Show CREATE TABLE T1\g
*************************** 1. Row ***************************
Table:t1
Create table:create Table ' T1 ' (
' id ' int (one) not NULL auto_increment,
' name ' varchar (DEFAULT NULL),
' code ' varchar DEFAULT NULL,
PRIMARY KEY (' id '),
KEY ' idx_code ' (' Code '),
KEY ' idx_name ' (' name ')
) Engine=innodb auto_increment=6 DEFAULT Charset=utf8
1 row in Set (0.00 sec)
mysql> Show CREATE TABLE T2\g
*************************** 1. Row ***************************
Table:t2
Create table:create Table ' T2 ' (
' id ' int (one) not NULL auto_increment,
' name ' varchar (DEFAULT NULL),
' code ' varchar DEFAULT NULL,
PRIMARY KEY (' id '),
KEY ' idx_code ' (' Code '),
KEY ' idx_name ' (' name ')
) Engine=innodb auto_increment=6 DEFAULT charset=utf8mb4
1 row in Set (0.00 sec)
mysql> SELECT * from T1;
+--+--+ ———————————-+
| ID | name | Code |
+--+--+ ———————————-+
| 1 | AAAA | 0752b0e3c72d4f5c701728db8ea8a3f9 |
| 2 | bbbb | 36d8147db18d55e64c8b5ea8679328b7 |
| 3 | CCCC | dc3bab5197eeb6b315204f0af563c961 |
| 4 | dddd | 1bb4dc313a54e4c0ee04644d2a1fe900 |
| 5 | eeee | f33180d7745079d2dfaaace2fdd74b2a |
+--+--+ ———————————-+
5 rows in Set (0.00 sec)
mysql> select * from T2;
+--+--+ ———————————-+
| ID | name | Code |
+--+--+ ———————————-+
| 1 | AAAA | bca3bc1eb999136d6e6f877d9accc918 |
| 2 | bbbb | 77dd5d07ea1c458afd76c8a6d953cf0a |
| 3 | CCCC | 3AC617D1857444E5383F074C60AF7EFD |
| 4 | dddd | 8a77a32a7e0825f7c8634226105c42e5 |
| 5 | eeee | 0c7fc18b8995e9e31ca774b1312be035 |
+--+--+ ———————————-+
5 rows in Set (0.00 sec)
2 tables The execution plan for the left join IS as follows:
mysql> desc SELECT * from T2 left join T1 on t1.code = t2.code where t2.name = ' dddd ' \g
*************************** 1. Row ***************************
id:1
Select_type:simple
Table:t2
Partitions:null
Type:ref
Possible_keys:idx_name
Key:idx_name
key_len:83
Ref:const
rows:1
filtered:100.00
Extra:null
*************************** 2. Row ***************************
id:1
Select_type:simple
Table:t1
Partitions:null
Type:all
Possible_keys:null
Key:null
Key_len:null
Ref:null
Rows:5
filtered:100.00
extra:using where; Using Join buffer (Block Nested Loop)
2 rows in set, 1 warning (0.01 sec)
as you can see,t2.name = ' dddd ' uses the index, and t1.code = t2.code This association condition is not used to T1.code The above index, at first Scott also baffled, but the machine will not deceive. Scott uses show warnings to view the rewritten execution plan as follows:
mysql> Show warnings;
+-------+------+-------------------------------------------------------------------------------------- --------------------------------------------------------------------------------------------------------------- --------------------------------------------------------------------------------------------------------------- -----------------------------------------------+
| Level | Code | Message |
+-------+------+-------------------------------------------------------------------------------------- --------------------------------------------------------------------------------------------------------------- --------------------------------------------------------------------------------------------------------------- -----------------------------------------------+
| Note | 1003 | /* select#1 */select ' TestDB '. ' T2 '. ' id ' as ' id ', ' testdb '. ' T2 '. ' Name ' as ' name ', ' TestDB '. ' T2 '. ' Code ' as ' Code ', ' TestDB '. ' T1 '. ' id ' as ' id ', ' testdb '. ' t1 '. ' Name ' as ' name ', ' TestDB '. ' T1 '. ' Code ' as ' code ' from ' testdb '. ' T2 ' left join ' Te Stdb '. ' T1 ' on (convert (' TestDB '. ' T1 '. ' Code ' using utf8mb4) = ' testdb '. ' T2 '. ' Code ')) where (' TestDB '. ' T2 '. ' Name ' = ' dddd ') |
+-------+------+-------------------------------------------------------------------------------------- --------------------------------------------------------------------------------------------------------------- --------------------------------------------------------------------------------------------------------------- -----------------------------------------------+
1 row in Set (0.00 sec)
After the discovery of CONVERT (Testdb.t1.code using UTF8MB4) ,Scott discovered that the character set for 2 tables was different. T1 for UTF8,T2 for utf8mb4. But why is the table character set different (the actual field character set is not the same) will cause t1 full table scan? Below to do the analysis.
(1) first T2 left join T1 determines T2 is the driver table, this step is equivalent to executing the SELECT * from t2 where t2.name = ' dddd ', remove The value of the Code field, here is ' 8a77a32a7e0825f7c8634226105c42e5 ';
(2) then take T2 to find the value of code according to the join condition to go T1 inside the search, this step is equivalent to executing the select * from t1 where t1.code = ' 8a77a32a7e0825f7c8634226105c42e5 ' ;
(3) However, the code field taken out of the T2 table in step (1) is utf8mb4 Character Set, and The code inside the T1 table is the UTF8 character set, which requires character set conversion, Character set conversions Follow the principle of small to large, because utf8mb4 is a superset of UTF8, so here UTF8 Convert to utf8mb4, that is , convert t1.code to utf8mb4 character set, after conversion, because t1.code The index above is still the UTF8 character set, so the index is ignored by the execution plan and then T1 Table can only select Full table scan. Worse, if The T2 filter out more than 1 Records , then T1 will be scanned by the whole table multiple times, the performance of the difference can be imagined.
3. Problem Solving
Since the reason is clear, how to solve it? Of course, change the character set, change the T1 and T2 The same or change the T2 to T1 can, here choose to T1 turn into utf8mb4. How do you turn the character set?
Some students will say with ALTER TABLE T1 charset UTF8MB4; But this is wrong, it just changed the default character set of the table, that is, the new field will use the UTF8MB4 , the field that already exists is still UTF8 .
mysql> ALTER TABLE T1 charset UTF8MB4;
Query OK, 0 rows affected (0.01 sec)
records:0 duplicates:0 warnings:0
mysql> Show CREATE TABLE T1\g
*************************** 1. Row ***************************
Table:t1
Create table:create Table ' T1 ' (
' id ' int (one) not NULL auto_increment,
' name ' varchar (CHARACTER) SET UTF8 DEFAULT NULL,
' code ' varchar (CHARACTER) SET UTF8 DEFAULT NULL,
PRIMARY KEY (' id '),
KEY ' idx_code ' (' Code '),
KEY ' idx_name ' (' name ')
) Engine=innodb auto_increment=6 DEFAULT charset=utf8mb4
1 row in Set (0.00 sec)
only with ALTER TABLE T1 convert to CharSet utf8mb4; is the right one.
However, it is important to note thatALTER TABLE's action to change the character set is blocked (with lock = node error) so do not operate during peak business hours, even during low-peak business periods. The operation of large tables is still recommended to Modify the character set online using Pt-online-schema-change.
mysql> ALTER TABLE T1 convert to CharSet utf8mb4, Lock=none;
ERROR 1846 (0a000): Lock=none is not supported. Reason:cannot Change Column type INPLACE. Try lock=gkfx.
mysql> ALTER TABLE T1 convert to CharSet utf8mb4, lock=gkfx;
Query OK, 5 rows affected (0.04 sec)
records:5 duplicates:0 warnings:0
mysql> Show CREATE TABLE T1\g
*************************** 1. Row ***************************
Table:t1
Create table:create Table ' T1 ' (
' id ' int (one) not NULL auto_increment,
' name ' varchar (DEFAULT NULL),
' code ' varchar DEFAULT NULL,
PRIMARY KEY (' id '),
KEY ' idx_code ' (' Code '),
KEY ' idx_name ' (' name ')
) Engine=innodb auto_increment=6 DEFAULT charset=utf8mb4
1 row in Set (0.00 sec)
Now to see the execution plan, you can see that there is no problem.
mysql> desc SELECT * FROM t2 join T1 on t1.code = t2.code where t2.name = ' dddd ' \g
*************************** 1. Row ***************************
id:1
Select_type:simple
Table:t2
Partitions:null
Type:ref
Possible_keys:idx_code,idx_name
Key:idx_name
key_len:83
Ref:const
rows:1
filtered:100.00
extra:using where
*************************** 2. Row ***************************
id:1
Select_type:simple
Table:t1
Partitions:null
Type:ref
Possible_keys:idx_code
Key:idx_code
key_len:203
Ref:testdb.t2.code
rows:1
filtered:100.00
Extra:null
2 rows in set, 1 Warning (0.00 sec)
4. Attention points
(1) Table character set is not the same, may cause join SQL use not to index, cause serious performance problem;
(2)SQL Review before the launch to do a good job in SQL, as far as possible in the same environment as the production environment Review ;
(3) Changing the character set of the ALTER TABLE operation will block write, as far as possible in the business low peak operation, it is recommended to use Pt-online-schema-change ;
(4) The table structure of the character set to be consistent, the release should be good audit work;
(5) If you want to modify the character set of the table in large batches, do the Review work of SQL , and modify the character set of the associated table together.
5. discussion of issues
finally ask a question, assuming now that the T1 and T2 table's character set has not been modified, if the above problem SQL is replaced by the following (that is, the T2 left If the join T1 is replaced with T1 left join T2), will there be an index invalidation problem? Why?
SELECT * from t1 join t2 on t1.code = t2.code where t1.name = ' dddd '
Article Source: Insidemysql
Index invalidation problem caused by different character set of MySQL table field