As we all know, InnoDB uses IoT (Index organization table), the so-called Index organization table, and the leaf node holds all the data, which means that the data is always stored in some order. So the question is, if it is such a statement, what should be done? Statement as follows:
Select COUNT (Distinct a) from table1;
There is an index on column A, so how do you scan it in terms of simple ideas? Very simple, one by one scan, in fact, did an index full scan, the efficiency is very poor. This scanning method scans many, many duplicate indexes, so the optimization approach is easy to imagine: skipping duplicate indexes. So an optimized approach can be found on the Web:
Select COUNT (*) from (select Distinct a from table1) t;
From the data that has been searched, the extra in such execution plan becomes the using index for group-by from using index.
But, but, fortunately, we are now not using the 5.1 version, we are basically more than 5.5, these modern versions, has achieved the loose index scan:
Very well, no need to use this Chine to optimize SQL.
The document about group by here is somewhat interesting, said that the most popular way is to do a full table scan and create a temporary table, so that the execution plan will be ugly, must have all and using temporary table.
After 5.0, group by may be used to loose index scan under certain conditions,
CREATE TABLE log_table (
ID INT not null PRIMARY KEY,
log_machine VARCHAR not NULL,
log_time DATETIME not NULL
) Engine=innodb DEFAULT Charset=utf8;
CREATE INDEX ix_log_machine_time on log_table (Log_machine, log_time);
1
Select Max (log_time) from log_table,
select Max (log_time) from Log_table WHERE log_machine in (' Machine 1 ');
These two SQL can be returned only once by the index seek, from the ordered ordering of the indexes, the optimizer realizes that the Min/max is located at the left/right block, thus avoiding the range scanning;
Extra displays select tables optimized away;
2
Copy Code code as follows:
SELECT MAX (log_time) from Log_table WHERE log_machine in (' Machine 1 ', ' Machine 2 ', ' Machine 3 ', ' Machine 4 ');
The execution plan type is range (extra displays the using where; using index), that is, performing an index range scan, reading all the records that satisfy the log_machine constraint, and then identifying max value by traversing it;
Improved
Copy Code code as follows:
SELECT MAX (log_time) from Log_table WHERE log_machine to (' Machine 1 ', ' Machine 2 ', ' Machine 3 ', ' Machine 4 ') GROUP by log_m Achine ORDER BY 1 desc LIMIT 1;
This satisfies the requirements of the group By Selection Loose index scan, the execution plan extra displays the using index for group-by, and the execution effect is equivalent to
Select Max (log_time) from Log_table WHERE log_machine to (' Machine 1 ')
Union
select Max (log_time) from Log_table W Here Log_machine in (' Machine 2 ') ...
That is, the execution of loose index scan,rows for each log_machine drops from the original 82636 to 16 (a total of 1,000,000 records for the table).
Group by when to use loose index scan?
Applicable conditions:
1 for a single table operation
2 Group by using the leftmost prefix column of the index
3 only supports the aggregate function min ()/max ()
4 The columns that appear in the Where condition must be =constant operations, and the indexed columns that do not appear in group by must use the constant
5 does not support a prefix index, that is, a partial column index, such as index (C1 (10))
The extra of the execution plan should display the using index for group-by
Assume that the table T1 has an index idx (C1,C2,C3)
SELECT C1, c2 from T1 GROUP by C1, C2;
SELECT DISTINCT C1, c2 from T1;
SELECT C1, MIN (C2) from T1 GROUP by C1;
SELECT C1, c2 from T1 WHERE C1 < const GROUP by C1, C2;
SELECT MAX (C3), MIN (C3), C1, c2 from T1 where C2 > Const GROUP by c1, C2;
SELECT C2 from T1 WHERE C1 < const GROUP by C1, C2;
Select C1, c2 from T1 WHERE c3 = Const GROUP by c1, C2
Select C1, C3 to T1 GROUP by C1, c2;--cannot use loose indexing
The Select C1, C3 from T1 where c3= the const GROUP by C1, C2;
Compact Index Scan Tight index scan
Group by in the inability to use loose index scan, you can also choose tight, if both are not optional, you can only use temporary tables;
When the index is scanned, all index keys that satisfy the condition must be read, either a full index scan or a range index scan;
The index column of Group by is discontinuous, or it does not start with the leftmost prefix, but the left column appears in the Where condition;
SELECT C1, C2, C3 from t1 WHERE c2 = ' A ' GROUP by C1, C3;
SELECT C1, C2, C3 from T1 WHERE c1 = ' A ' GROUP by C2, C3;
5.6 of improvements
In fact, 5.6 of the index condition push down can compensate for the loss of performance of loose index scan loss.
KEY (Age,zip)
Mysql> explain SELECT name from people WHERE age BETWEEN to and zip in (12345,12346, 12347);
+----+-------------+--------+-------+---------------+------+---------+------+-------+-------------+
| id | Select_type | Table | Type | Possible_keys | Key | Key_len | Ref | Rows | Extra |
+----+-------------+--------+-------+---------------+------+---------+------+-------+-------------+
| 1 | Simple | people | range | Age | age | 4 | NULL | 90556 | Using where |
+----+-------------+--------+-------+---------------+------+---------+------+-------+-------------+
1 row In Set (0.01 sec)
According to Key_len=4, it is possible to infer that SQL uses only the first column of the index, that is, to identify the row records that satisfy the Age (18,20) by index, and then filter the rows that satisfy the zip constraint from the server layer;
pre-5.6, for a composite index, only when the boot column uses "=" Does it have an opportunity to use the following indexed column in the index scan.
Mysql> explain SELECT name from people WHERE age=18 and zip in (12345,12346, 12347);
+----+-------------+--------+-------+---------------+------+---------+------+------+-------------+
| id | Select_type | Table | Type | Possible_keys | Key | Key_len | Ref | Rows | Extra |
+----+-------------+--------+-------+---------------+------+---------+------+------+-------------+
| 1 | Simple | people | range | Age | age | 8 | NULL | 3 | Using where |
+----+-------------+--------+-------+---------------+------+---------+------+------+-------------+
1 row in Set (0.00 sec)
Compare query efficiency
Mysql> SELECT Sql_no_cache name from people WHERE age=19 and zip in (12345,12346, 12347);
+----------------------------------+
| name |
+----------------------------------+
| 888ba838661aff00bbbce114a2a22423 |
+----------------------------------+
1 row in Set (0.06 sec)
mysql> SELECT sql_no_cache name from people WHERE Age BETWEEN and zip in (12345,12346, 12347);
+----------------------------------+
| name |
+----------------------------------+
| ed4481336eb9adca222fd404fa15658e |
| 888ba838661aff00bbbce114a2a22423 |
+----------------------------------+
2 rows in Set (1 min 56.09 sec)
For the second SQL, you can rewrite with union,
Mysql> select name from people WHERE age=18 and zip in (12345,12346, 12347)
-> UNION all
-> SELECT name F ROM people where age=19 and zip in (12345,12346, 12347)
-> UNION all
-> SELECT name from people WHERE age=20 and zip in (12345,12346, 12347)
-> UNION all
-> SELECT name to people WHERE age=21 and zip in (12345,1234 6, 12347)
-> UNION all
-> SELECT name from people WHERE age=22 and zip in (12345,12346, 12347);
The mysql5.6 introduces the index condition Pushdown and solves such problems from the optimizer level.