The most common way to satisfy a GROUP BY clause is to scan the entire table and create a new temporary table in which all rows of each group should be contiguous, and then use the temporary table to locate the group and apply the cumulative function (if any). In some cases, MySQL can do better by indexing access without creating a temporary table.
The most important prerequisite for using an index for group by is that all the group by columns reference the properties of the same index, and the index saves its keywords sequentially. The use of index access instead of temporary tables also depends on which part of the index is used in the query, the conditions specified for that section, and the cumulative function that is selected.
Because group by actually does the same sort operation, group by is mostly just a grouping operation after sorting. Of course, if other aggregate functions are used when grouping, then some aggregation function calculations are also required. As a result, the index can also be used as an order by in the implementation of Group by. In MySQL, the implementation of group by is also available in several (three) ways, with two ways to use existing index information to complete group by, and the other to be used in scenarios where the index is completely unusable. Below we do an analysis of these three implementations.
1. Use loose index Scan (Loose Index Scan) to implement GROUP by
The definition of "loose index Scan", I read a lot of online introduction, are not very clear. This logos is listed as follows:
Definition 1: Loosely indexed scans, in effect, when MySQL fully utilizes the index scan to implement GROUP by, it does not need to scan all the key to meet the criteria to complete the operation results.
Definition 2: The most effective way to optimize group by IS to use the index directly to get the field that requires the group completely. When using this access method, MySQL uses the type of index that the keyword sorts (for example, the Btree index). This allows the fields in the index to be used for group not to fully cover the key in the Where Condition index. Because it contains only a portion of the keywords in the index, it is called a loose index scan.
The field that is used for group in the index does not necessarily contain all the fields of the multiple-column index. For example, there is an index idx (C1,C2,C3), so C1 or C1 and C2 are only part of the index IDX for group by C1 and group by C1,C2. It is important to note that the fields used in the index for group must conform to the "leftmost prefix" principle of the index. The group by C1,C3 is not scanned with a loose index
For example:
Explain
SELECT group_id,gmt_create
From Group_message
WHERE user_id>1
GROUP by Group_id,gmt_create;
I understand the example of "Definition 2" to illustrate
There is an index idx (C1,C2,C3)
SELECT C1, c2 from T1 WHERE C1 < const GROUP by C1, C2;
Fields used in the index for group C1,C2
You do not have to fully cover the key in the Where Condition index (in the Where Condition index, that is, the c1;c1 corresponding key, that is IDX)
The fields used in the index for group (C1,C2) contain only part of the keyword (C1,C2,C3) in the index, so they are called loose index scans.
To use a loose index scan to implement group by, you need to meet at least the following conditions:
Query for a single table
The GROUP by condition field must be in a contiguous position at the front of the same index;
Group by includes the 1th contiguous portion of the index (if the query has a distinct clause for GROUP BY, all distinct properties point to the beginning of the index).
When you use GROUP BY, if you have aggregate functions, you can use only the two aggregate functions, MAX and MIN, and they all point to the same column.
If the reference (in the Where condition) is to a field condition other than the group by condition in the index, it must exist as a constant, but the min () or Max () function has an exception to the argument;
Or, any other part of the index (other than those referenced in the query) must be a constant (that is, they must be referenced by the number of constants), but the min () or Max () function has an exception to the arguments.
Add: If there is a where statement in SQL, and the select contains field conditions outside the group by condition in the index, the fields in the where are to be in constant form.
If there is a where condition in the query, the condition must be an index and cannot contain a field that is not indexed
Loose Index Scan
Explain
SELECT group_id,user_id
From Group_message
WHERE group_id between 1 and 4
GROUP by group_id,user_id;
Loose Index Scan
Explain
SELECT group_id,user_id
From Group_message
WHERE User_id>1 and Group_id=1
GROUP by group_id,user_id;
Non-loosely indexed scan
Explain
SELECT group_id,user_id
From Group_message
WHERE abc=1
GROUP by group_id,user_id;
Non-loosely indexed scan
Explain
SELECT group_id,user_id
From Group_message
WHERE User_id>1 and Abc=1
GROUP by group_id,user_id;
Loose index scans, where the explain output of such queries shows the using index for group-by of the extra column
The following query provides several examples of the class, assuming that the table T1 (C1,C2,C3,C4) has an index idx (C1,C2,C3):
SELECT C1, c2 from T1 GROUP by C1, C2;
SELECT DISTINCT C1, c2 from T1;
SELECT C1, MIN (C2) from T1 GROUP by C1;
SELECT C1, c2 from T1 WHERE C1 < const GROUP by C1, C2;
SELECT MAX (C3), MIN (C3), C1, c2 from T1 where C2 > Const GROUP by c1, C2;
SELECT C2 from T1 WHERE C1 < const GROUP by C1, C2;
SELECT C1, c2 from T1 WHERE c3 = Const GROUP by c1, C2;
For these reasons, you cannot use the Quick selection method to execute the following query:
1, in addition to Min () or Max (), there are other cumulative functions, such as:
SELECT C1, SUM (C2) from the T1 GROUP by C1;
2. The field in the GROUP by clause does not refer to the beginning of the index, as follows:
SELECT c1,c2 from T1 GROUP by C2, C3;
3. The query references a part of the keyword that follows the group by section, and there is no equation equal to the constant, for example:
SELECT c1,c3 from T1 GROUP by C1, C2;
In this example, the reference is to C3 (C3 must be one of the combined indexes) because there is no C3 in group by. And there is no equation equal to the constant. So you can't use loose index scans
You can change this: SELECT c1,c3 from T1 where c3= ' a ' GROUP by C1, C2
The following example cannot use a loose index scan
SELECT c1,c3 from T1 where c3= ' a ' GROUP by C1, C2
Why is loose index scanning highly efficient?
A: Because in the absence of a WHERE clause, which means that a full index scan is required, the number of key values that a loose index scan needs to read is as large as the number of grouped groups, that is, a lot less than the number of key values that actually exist. When the WHERE clause contains a range or equivalent expression, a loose index scan looks for the 1th keyword for each group that satisfies the range criteria, and reads the minimum number of keywords again.
2. Use compact index Scan (tight index scan) to implement GROUP by
The difference between a compact index scan and a GROUP by and a loose index scan is mainly:
A compact index scan needs to read all the key keys that meet the criteria while scanning the index, and then complete the GROUP by operation based on the data being read.
There is no "Using index for Group-by" in the Extra information of the execution plan at this time, but not that the MySQL group by operation is not done through the index, but only if you need access to all the key information that is qualified by the WHERE condition. Fruit. This is the implementation plan output information for GROUP by through a compact index scan.
In MySQL, MySQL Query Optimizer first chooses to attempt a group by operation through a loose index scan, and attempts to achieve it through a compact index scan when it is found that certain situations cannot satisfy the requirements of group by with loose index scans.
The MySQL Query Optimizer cannot use a loose index scan when the GROUP by condition field is not contiguous or is not part of the index prefix.
Then check to see if the conditional field in the where has the prefix portion of the index, if there is an earlier prefix, and the part is a constant, and it is combined with the field after group by to become a contiguous index. Then press the compact Index scan.
SELECT Max (gmt_create)
From Group_message
WHERE group_id = 2
GROUP by user_id
You need to read all the data from the group_id=2 and then complete the group by operation in the read data. (The Group By field is not a contiguous index, where group_id exactly compensates for the missing key and is exactly a constant, so use a compact index scan)
GROUP_ID user_id This order is that the index can be used. If the order of the connections does not conform to the "leftmost prefix" principle of the index, a compact index scan is not used.
The following example uses a compact index scan
There is a gap in GROUP by, but has been overwritten by the condition user_id = 1.
Explain
SELECT group_id,gmt_create
From Group_message
WHERE user_id = 1 GROUP by group_id,gmt_create
GROUP by does not start with the 1th element of the keyword, but there is a condition that provides a constant for that element
Explain
SELECT group_id,gmt_create
From Group_message
WHERE group_id = 1 GROUP BY user_id,gmt_create
The following examples do not use a compact index scan
The user_id,gmt_create connection does not conform to the index "leftmost prefix" principle
Explain
SELECT group_id,gmt_create
From Group_message
WHERE user_id = 1 GROUP by gmt_create
The group_id,gmt_create connection does not conform to the index "leftmost prefix" principle
Explain
SELECT gmt_create
From Group_message
WHERE group_id=1 GROUP by Gmt_create;
3. Use temporary table to implement GROUP by
MySQL Query Optimizer found that just because the index scan did not directly result in the group by, he had to choose to implement the group by using a temporary table and then sorting. This is the case in such an example. GROUP_ID is not a constant condition, but a range, and the GROUP by field is user_id. As a result, MySQL cannot help the group by implementation based on the order of the indexes, only to get the required data through the index range, and then save the data in a temporary table before sorting and grouping operations to complete group by.
Explain
SELECT group_id
from Group_message
WHERE group_id between 1 and 4
Group by USER_ID;
Sample Database File
------------------------------------------------------------host:127.0.0.1--Server version:5.1. 57-community-mysql Community Server (GPL)--Server Os:win32--heidisql version:7.0.0.4156--Date/tim e:2012-08-20 16:52:10----------------------------------------------------------/*!40101 SET @OLD_CHARACTER_S
et_client=@ @CHARACTER_SET_CLIENT * *;
/*!40101 SET NAMES UTF8 * *;
/*!40014 SET foreign_key_checks=0 * *;
--Dumping structure for table Test.group_message DROP table IF EXISTS ' group_message '; CREATE TABLE IF not EXISTS ' group_message ' (' id ' int (a) unsigned not NULL auto_increment, ' group_id ' int (a) unsigned Default NULL, ' user_id ' int (a) unsigned default null, ' Gmt_create ' timestamp not NULL default Current_timestamp, ' abc '
Int (one) not NULL DEFAULT ' 0 ', PRIMARY key (' ID '), key ' Group_id_user_id_gmt_create ' (' group_id ', ' user_id ', ' gmt_create ')
) Engine=myisam auto_increment=27 DEFAULT Charset=utf8; --Dumping data foR table test.group_message:0 rows DELETE from ' group_message ';
/*!40000 ALTER TABLE ' group_message ' DISABLE KEYS * * *; INSERT into ' group_message ' (' id ', ' group_id ', ' user_id ', ' gmt_create ', ' abc ') VALUES (1, 1, 1, ' 2012-08-20 09:25:35 ', 1) , (2, 2, 1, ' 2012-08-20 09:25:39 ', 1), (3, 2, 2, ' 2012-08-20 09:25:47 ', 1), (4, 3, 1, ' 2012-08-20 09:25:50 ', 2), (5, 3 , 2, ' 2012-08-20 09:25:52 ', 2), (6, 3, 3, ' 2012-08-20 09:25:54 ', 0), (7, 4, 1, ' 2012-08-20 09:25:57 ', 0), (8, 4, 2, ' 20 12-08-20 09:26:00 ', 0), (9, 4, 3, ' 2012-08-20 09:26:02 ', 0), (10, 4, 4, ' 2012-08-20 09:26:06 ', 0), (11, 5, 1, ' 2012-08- 20 09:26:09 ', 0), (12, 5, 2, ' 2012-08-20 09:26:12 ', 0), (13, 5, 3, ' 2012-08-20 09:26:13 ', 0), (14, 5, 4, ' 2012-08-20 09 : 26:15 ', 0), (15, 5, 5, ' 2012-08-20 09:26:17 ', 0), (16, 6, 1, ' 2012-08-20 09:26:20 ', 0), (17, 7, 1, ' 2012-08-20 09:26:2 3 ', 0), (18, 7, 2, ' 2012-08-20 09:26:28 ', 0), (19, 8, 1, ' 2012-08-20 09:26:32 ', 0), (20, 8, 2, ' 2012-08-20 09:26:35 ', 0 ), (21, 9, 1, ' 2012-08-20 09:26:37 ', 0), (22, 9, 2, ' 2012-08-20 09:26:40 ', 0), (23, 10, 1, ' 2012-08-20 09:26:42 ', 0), (24, 10, 2, ' 2012-08-20 09:2
6:44 ', 0), (25, 10, 3, ' 2012-08-20 09:26:51 ', 0), (26, 11, 1, ' 2012-08-20 09:26:54 ', 0);
/*!40000 ALTER TABLE ' group_message ' ENABLE KEYS * * *;
/*!40014 SET Foreign_key_checks=1 * *; /*!40101 SET character_set_client= @OLD_CHARACTER_SET_CLIENT * *;
References
1, how MySQL optimizes group by
2, detailed MySQL grouped query group by implementation
3, loose index Scan (Loose index Scan)
4, MySQL learning notes