Problem description
One user reflects the first line one SQL statement execution time is too slow to accept. The SQL statement looks simple (the table name and field name are modified in the description):
SELECT Count (*) from a JOIN b in a. ' s ' = B. ' WHERE A. ' l ' > ' 2014-03-30 00:55:00 ' and a ' l ' < ' 2014-03-30 01:00:00 ' ;
And the fields that the query requires are indexed, and the table structure is as follows:
CREATE TABLE ' a ' (' L ' timestamp not NULL DEFAULT ' 2000-01-01 00:00:00 ', ' I ' varchar (32) CHARACTER set UTF8 COLLATE utf8_bin default null, ' A ' varchar (+) CHARACTER SET UTF8 COLLATE utf8_bin DEFAULT null, ' S ' VA Rchar ($) CHARACTER set UTF8 COLLATE utf8_bin default null, ' F ' tinyint (4) DEFAULT null, ' V ' varchar (256) CHARACTER set UT F8 COLLATE utf8_bin default ', ' N ' varchar ($) CHARACTER SET UTF8 COLLATE utf8_bin DEFAULT NULL, key ' ix_l ' (' L '), key ' I
X_i ' (' I '), KEY ' ix_s ' (' S ')) Engine=innodb DEFAULT Charset=utf8; CREATE TABLE ' B ' (' R ' timestamp not NULL DEFAULT ' 2000-01-01 00:00:00 ', ' V ' varchar () DEFAULT NULL, ' U ' varchar () DEF Ault null, ' C ' varchar default NULL, ' S ' varchar ($) default null, ' I ' varchar (+) default NULL, ' E ' bigint (s) defaul T null, ' ES ' varchar (128) DEFAULT NULL, key ' Ix_r ' (' R '), key ' Ix_c ' (' C '), key ' ix_s ' (' S ')) Engine=innodb default CHARS
Et=utf8;
From the statement, this query plan is very natural, it should be to use a as a driver table, successively using A.L and B.S two indexes. In fact, explain's results are:
+----+-------------+-------+-------+---------------+------+---------+----------+---------+-------------+
| id | Select_type | Table | Type | Possible_keys | Key | Key_len | Ref | Rows | Extra |
+----+-------------+-------+-------+---------------+------+---------+----------+---------+-------------+
| 1 | Simple | B | Index | ix_s | ix_s | 195 | NULL | 1038165 | Using Index |
| 1 | Simple | A | Ref | ix_l,ix_s | ix_s | 195 | TEST.B.S | 1 | Using where |
+----+-------------+-------+-------+---------------+------+---------+----------+---------+-------------+
Analysis
From the results of explain, the query uses B as the driving table.
In the previous article, we introduced that the MySQL choice jion order is to analyze the cost of each join order, choose the least cost method.
This join involves only two tables, and naturally has nothing to do with optimizer_search_depth. So our question is, why didn't we expect the join sequence to be chosen?
The MySQL Tips:mysql provides straight_join syntax, which forces the connection order to be set.
Explain SELECT count (*) from a straight_join B on a. ' s ' = B. ' WHERE A. ' l ' > ' 2014-03-30 00:55:00 ' and A. ' L ' < ' 201 4-03-30 01:00:00 ';
+----+-------------+-------+-------+---------------+------+---------+------+---------+------------------------- --------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+------+---------+------+---------+------------------------- --------------------+
| 1 | Simple | A | Range | ix_l,ix_s | ix_l | 4 | NULL | 63 | Using where |
| 1 | Simple | B | Index | ix_s | ix_s | 195 | NULL | 1038165 | The Using where; Using index; Using Join Buffer |
+----+-------------+-------+-------+---------------+------+---------+------+---------+------------------------- --------------------+
In the MySQL tips:explain result, the query cost of join can be estimated by rows.
The join sequence is right, simple analysis query cost: Ordinary join IS 1038165*1, Straight_join is 63*1038165. There seems to be nothing wrong with MySQL. But there must be something wrong!
Exception found
Back to our original idea. We expect table A to be the driver table because we think that Table B can be used on ix_s index, and actually staight_join it, but the result is different from what we expected.
We know that the filtering of indexes is an important factor in determining whether an index will be selected in a query, and then is not B. is the filtration of s not good?
The value of cardinality in the MySQL tips:show index from Tbname return results can indicate the filtration of an index.
Show index too many results, can also be taken from the Information_schema table.
Mysql> SELECT * from INFORMATION_SCHEMA. STATISTICS where table_name= ' B ' and index_name= ' ix_s ' \g
*************************** 1 row **********************
table_catalog:def
table_schema:test
table_name:b
non_unique:1
index_schema:test
index_name:ix_s
seq_in_index:1
column_name:s
collation:a
cardinality:1038165
sub_part: NULL
packed:null
nullable:yes
index_type:btree
COMMENT:
index_comment:
Can this index of cardinality:1038165, already very big. So how much is the estimate for this watch?
Show table status like ' B ' \g
*************************** 1 row ***************************
name:b
Engine : InnoDB
version:10
row_format:compact
rows:1038165
avg_row_length:114
data_length: 119160832
max_data_length:0
index_length:109953024
data_free:5242880
auto_increment:null
create_time:2014-05-23 00:24:25
Update_time:null
check_time:null
collation:utf8_general_ci
checksum:null
create_options:
Comment:
1 row in Set (0.00 sec)
From the rows:1038165, the ix_s of this index is considered very good and is already approximate to a unique index.
MySQL Tips: The rows you see in Show Table status results are used to represent the current number of rows in a table. This is an exact value for the MyISAM table, but for InnoDB this is an estimate.
Although it is an estimate, the optimizer is guided by this, that is, the data in one of the above explain is completely inconsistent with expectations: rows in the second row of the staight_join result.
Stage conclusion
We found that the whole logic of the error was this: The execution plan with a as the driver table, due to index B. The rows estimate of S of 1038165 causes the optimizer to assume that the cost is greater than the B-driven table. In fact, the index is 1. (Of course, the Explan results are more familiar to the students will find that the second row of the Type field and the extra field is weird together)
That is to say, every line that Straight_join gets goes to the query in B, all of them go through the whole table scan. The most common occurrence of this situation in MySQL is type conversion. For example, a string field, although it contains a full number, is not in string format when the query is passed.
In this case, two are strings. Therefore, the character set is related.
Back to the two table structure, found that the S-field declaration difference is COLLATE Utf8_bin-this is the root cause of this case: a table to get the S value is Utf8_bin, the optimizer thinks the type is different, can not directly use index B. ix_s Filtration.
As for why the index is still used, this is because of the "misunderstanding" of overwriting the index.
MySQL Tips: If all the results of a query can be fully obtained from an index, then the traversal index will be preferred instead of traversing the data.
As validation,
Mysql> explain SELECT * from a straight_join b on binary a. ' s ' = B. ' WHERE A. ' l ' > ' 2014-03-30 00:55:00 ' and A. ' L ' < ' 2014-03-30 01:00:00 ';
+--+ ————-+ ——-+ ——-+ ————— +--+ ——— +--+ ——— + ———————————————— +
| id | select_type | table | type | possible_keys | key | key_ Len | Ref | Rows | Extra |
+--+ ————-+ ——-+ ——-+ ————— +--+ ——— +--+ ——— + ———————————————— +
| 1 | Simple | A | Range | ix_l | ix_l | 4 | NULL | 63 | Using where |
| 1 | Simple | B | All | ix_s | NULL | NULL | NULL | 1038165 | Range checked for each record (index map:0x4) |
+--+ ————-+ ——-+ ——-+ ————— +--+ ——— +--+ ——— + ———————————————— +
Because the result is a select *, the overlay index cannot be used, so the key for the second row is displayed as null. (The author tears: If early out of this result can be more convenient to find)
Optimization
Of course the most direct idea is to modify the two table of the S-field definition, change to the same. This method avoids modifying the business code, but the DDL cost is slightly greater. Here are two optimizations for SQL statements.
1. Select COUNT (*) from B Join (select S from a WHERE A. ' l ' > ' 2014-03-30 00:55:00 ' and A. ') ' < ' 2014-03-30 01:00:00 ' ) Ta on B.s=ta.s;
This writing is more intuitive, need to pay attention to the last B. Order of S and TA.S
2. SELECT Count (*) from a JOIN b on binary a. ' s ' = B. ' s ' WHERE A. ' l ' > ' 2014-03-30 00:55:00 ' and A. ' L ' < ' 2014-03-30 01:00:00 ';
From the analysis of the preceding know is because of B. S is defined as Utf8_bin.
In the MySQL tips:mysql character set naming rules, the difference between Xxx_bin and XXX is case sensitive.
Here we will add the A.S all binary to the lowercase, which is to convert the temporary result set to Utf8_bin, then use B. The index can be used directly when s matches.
In fact, the essence of the two rewriting methods is the same, and the difference is that writing 1 is an implicit conversion. In theory, 2 is faster.
Summary
The fields that do the join are designed to be of exactly the same type.