Analysis of slow SQL queries by a user, reason and description of optimization problems
A user reports that the execution time of the previous SQL statement is too slow to accept. The SQL statement looks simple (the table name and field name are modified in this description): SELECT count (*) FROMaJOINb ONa.'s '= B.'s 'Where. 'L'> '2017-03-30 00:55:00 'AND. 'L' <'2017-03-30 01:00:00 '; the required fields are indexed. the table structure is as follows: create table 'A' ('L' timestamp not null default '2017-01-01 00:00:00 ',' I 'varchar (32) CHARACTER SET utf8 COLLATE utf8_bin DEFAULT NULL, 'A' varchar (32) character set utf8 COLLATE utf8_bin default null, 'S' varchar (64) character set utf8 COL LATE utf8_bin default null, 'F' tinyint (4) default null, 'V' varchar (256) character set utf8 COLLATE utf8_bin DEFAULT '', 'n' varchar (64) character set utf8 COLLATE utf8_bin default null, KEY 'ix _ L' ('L'), KEY 'ix _ I '(' I '), KEY 'ix _ s' ('s ') ENGINE = InnoDB default charset = utf8; create table 'B' ('R' timestamp NOT NULL DEFAULT '2017-01-01 00:00:00 ', 'V' varchar (32) DEFAULT NULL, 'u' varchar (32) default null, 'C' varchar (16) default null, 'S' varchar (64) default null, 'I' varchar (64) DEFAULT NULL, 'E' bigint (32) DEFAULT NULL, 'els' varchar (128) default null, KEY 'ix _ r' ('r'), KEY 'ix _ C' ('C '), KEY 'ix _ s' ('s ') ENGINE = InnoDB default charset = utf8; from the statement perspective, this query plan is natural and should use a as the driving table, a. L and B. s. In fact, the explain result is: + ---- + ------------- + ------- + --------------- + ------ + --------- + ---------- + --------- + ------------- + | id | select_type | table | type | keys | key_len | ref | rows | extra | + ---- + ------------- + ------- + --------------- + ------ + --------- + ---------- + --------- + ------------- + | 1 | SIMPLE | B | index | IX_S | 195 | NULL | 1038165 | Using index | 1 | SIMPLE | a | ref | IX_L, IX_S | 195 | test. b. S | 1 | Using where | + ---- + ------------- + ------- + ------------- + ------ + --------- + ---------- + --------- + ------------- +
Analysis
According to the explain results, B is used as the driving table for the query. In the previous article, we introduced that the jion sequence for MySQL is the method to select the minimum cost after analyzing the costs of various join orders respectively. This join involves only two tables and is naturally independent of optimizer_search_depth. The question is, why is the join order we expected not selected? MySQL Tips: MySQL provides the straight_join syntax to forcibly set the connection sequence. Explain SELECT count (*) FROMastraight_joinb ONa.'s '= B.'s 'Where. 'L'> '2017-03-30 00:55:00 'AND. 'L' <'2017-03-30 01:00:00 '; + ---- + ------------- + ------- + ------------- + ------ + --------- + keys + | id | select_type | table | type | possible_keys | key | key_len | ref | rows | extra | + ---- + ------------- + ------- + --------------- + ------ + --------- + hour + | 1 | SIMPLE | a | range | IX_L, IX_S | IX_L | 4 | NULL | 63 | Using where | 1 | SIMPLE | B | index | IX_S | 195 | NULL | 1038165 | Using where; Using index; using join buffer | + ---- + ------------- + ------- + --------------- + ------ + --------- + ------------------------------------------- +
MySQL Tips: In the explain results, the join query costs can be estimated by sequential concatenation rows.
The join order is correct. The simple analysis query cost is: normal join is 1038165*1, and straight_join is 63*1038165. it seems that MySQL is correct. But there must be something wrong!
Exception detected
Return to our original ideas. We expect Table a as the driving table because we think that Table B can use IX_S indexes, but in fact it is actually used in staight_join, but this result is different from what we expected. We know that the filtering of indexes determines whether an index will be selected in a query. is B. S poor?
MySQL Tips: show index from tbname the Cardinality value in the returned results can indicate the filtering of an index.
The show index results are too many and can be retrieved from the information_schema table. Mysql> select * from information_schema.STATISTICS where table_name = 'B' and index_name = 'ix _ s'/G ****************** * ******** 1. row ************************** TABLE_CATALOG: def TABLE_SCHEMA: testTABLE_NAME: bNON_UNIQUE: 1 INDEX_SCHEMA: testINDEX_NAME: IX_S SEQ_IN_INDEX: 1COLUMN_NAME: s collation: ACARDINALITY: 1038165SUB_PART: null packed: NULLNULLABLE: YESINDEX_TYPE: BTREECOMMENT: INDEX_COMMENT: Yes CARDINALITY: 1038165 for this index, which is already very large. What is the estimated row of this table. Show table status like 'B'/G **************************** 1. row ************************* Name: bEngine: InnoDBVersion: 10Row_format: CompactRows: 1038165 Avg_row_length: 114 Data_length: Weight: 0Index_length: 109953024Data_free: 5242880 Auto_increment: NULL Create_time: 00:24:25 Update_time: NULLCheck_time: NULLCollation: limit Checksum: NULL Create_options: Comment: 1 row in set (0.00 sec) from Rows: 1038165, we can see that the distinction between IX_S indexes is considered very good, and it is already similar to a unique index. MySQL Tips: the Rows displayed in the show table status result indicates the current row number of the table. This is an exact value for the MyISAM table, but an estimate for InnoDB. Although it is an estimate value, the optimizer is guided by this. that is to say, the data in an explain above does not meet the expectation at all: the rows in the second row in the result of staight_join.
So Far
We found that the logic of the entire error is as follows: the execution plan of a-driven table, due to index B. S rows is estimated to be 1038165, which leads the optimizer to think that the cost is greater than that of the drive Table B. In fact, the differentiation of this index is 1. (Of course, the students who are familiar with the explain results find that the type field in the second row is different from the Extra field.) that is to say, when each row obtained by straight_join is queried in B, all go through the full table scan. In MySQL, type conversion is the most common case. For example, a string field contains full numbers, but the input is not in the string format during query. In this case, both are strings. Therefore, it is related to character sets. Return to the two table structures and find that the declaration difference of the S field is COLLATE utf8_bin -- this is the root cause of this case: the S value obtained in Table a is utf8_bin, and the optimizer considers the type to be different, index B cannot be used directly. IX_S filter. As for why indexes are still used, this is because overwriting indexes creates a "misunderstanding ".
MySQL Tips: If all the query results can be obtained from an index, the traversal index will be used instead of the traversal data.
For verification, mysql> explain SELECT * FROMastraight_JOINb ONbinary.'s '= B.'s 'Where. 'L'> '2017-03-30 00:55:00 'AND. 'L' <'2017-03-30 01:00:00 ';
+ -- + ----- + --- + ----- + -- + --- + -------------- +
| Id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+ -- + ----- + --- + ----- + -- + --- + -------------- +
| 1 | SIMPLE | a | range | IX_L | 4 | NULL | 63 | Using where |
| 1 | SIMPLE | B | ALL | IX_S | NULL | 1038165 | Range checked for each record (index map: 0x4) |
+ -- + ----- + --- + ----- + -- + --- + -------------- +
Because the result is select * and the index cannot be overwritten, the key in the second row is displayed as NULL ).
Optimization
Of course, the most direct idea is to modify the S field definitions of the two tables and change them to the same one. This method can avoid modifying the business code, but the DDL cost is slightly higher. Here we provide two optimization methods for SQL statements. 1. select count (*) fromb join (select s fromaWHERE. 'L'> '2017-03-30 00:55:00 'AND. 'L' <'2017-03-30 01:00:00 ') ta on B. S = ta. s; this method is more intuitive. pay attention to B. S and ta. S order 2, SELECT count (*) FROMaJOINb ONbinary.'s '= B.'s 'Where. 'L'> '2017-03-30 00:55:00 'AND. 'L' <'2017-03-30 01:00:00 '; from the previous analysis, it is known that B. S is defined as utf8_bin.MySQL Tips: In the MySQL character set naming rules, the difference between XXX_bin and XXX is case sensitive. Here we add binary limitation for all A.S, first convert it to lowercase, that is, convert the temporary result set to utf8_bin, and then use the index directly when B. s is matched. In fact, the two rewriting methods are essentially the same, but the difference is that writing 1 is implicit conversion. In theory, writing 2 is faster.
Summary
The join fields are designed to be of the same type as possible.