Implementation principle of MySQL Group query Group By: mysqlgroup
Because group by actually performs sorting operations, and compared with order by, group by mainly only performs grouping operations after sorting. Of course, if other Aggregate functions are used during grouping, Some Aggregate functions are required for calculation. Therefore, in the implementation of group by, indexes can also be used like order.
In MySQL, the implementation of group by also has multiple (three) methods, two of which use the existing index information to complete group, another scenario is that indexes cannot be used completely. The following is an analysis of the three implementation methods.
1. Use Loose index scanning to implement GROUP
What is loose index scan to implement group? In fact, when MySQL uses index scan to implement group by, it does not need to scan all the index keys that meet the conditions to complete the operation.
In the following example, we use a loose index scan to implement group by. Before this example, we need to adjust the index of the group_message table and add the gmt_create field to the index of the group_id and user_id fields:
sky@localhost: example 08:49:45> create index idx_gid_uid_gc -> on group_message(group_id,user_id,gmt_create); Query OK, rows affected (0.03 sec) Records: 96 Duplicates: 0 Warnings: 0sky@localhost: example 09:07:30> drop index idx_group_message_gid_uid -> on group_message; Query OK, 96 rows affected (0.02 sec) Records: 96 Duplicates: 0 Warnings: 0
Then let's look at the execution plan of the following Query:
sky@localhost: example 09:26:15> EXPLAIN -> SELECT user_id,max(gmt_create) -> FROM group_message -> WHERE group_id < 10 -> GROUP BY group_id,user_id\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: group_message type: range possible_keys: idx_gid_uid_gc key: idx_gid_uid_gc key_len: 8 ref: NULL rows: 4 Extra: Using where; Using index for group-by
We can see that "Using index for group-by" is displayed in the Extra information of the Execution Plan. In fact, this is to tell us, mySQL Query Optimizer uses loose index scanning to implement the group by operation we need.
The image below depicts the approximate Implementation of the scanning process:
To use a loose index scan to implement group by, you must meet at least the following conditions:
◆ The group by condition field must be in the first consecutive position in the same index;
◆ When using group by, only the MAX and MIN Aggregate functions can be used;
◆ If a field condition other than the group by condition in the index is referenced, it must exist as a constant;
Why is loose index scanning very efficient?
Because there is no WHERE clause, that is, when the full index scan is required, the number of key values to be read by the loose index scan is as large as the number of groups in the group, that is to say, it is much less than the actual number of key values. When the WHERE clause contains a range limit or an equivalent expression, the loose index scans 1st keywords in each group that meet the range conditions and reads as few keywords as possible again.
2. Use a compact (Tight) index scan to implement GROUP
The difference between a compact index scan and a loose index scan is that it needs to read all the matching index keys when scanning the index, then, the group by operation is completed based on the read of the evil data to obtain the corresponding result.
sky@localhost : example 08:55:14> EXPLAIN -> SELECT max(gmt_create) -> FROM group_message -> WHERE group_id = 2 -> GROUP BY user_id\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: group_message type: ref possible_keys: idx_group_message_gid_uid,idx_gid_uid_gc key: idx_gid_uid_gc key_len: 4 ref: const rows: 4 Extra: Using where; Using index 1 row in set (0.01 sec)
At this time, the Extra information of the execution plan does not contain "Using index for group-by", but it does not mean that the MySQL group by operation is not completed through the index, the result is obtained only after you access all the index key information specified by the WHERE condition. This is the execution plan output information of group by through compact index scanning.
The following figure shows the overall execution process:
In MySQL, MySQL Query Optimizer first tries to implement the group by operation through loose index scanning. When it is found that some situations cannot meet the requirements of the loose index scan for group, will try to achieve through compact index scanning.
When the group by condition field is not consecutive or is not part of the index prefix, MySQL Query Optimizer cannot use loose index scanning, and the setting cannot directly complete the group by operation through the index, the missing index key information cannot be obtained. However, if the Query statement contains a constant value to reference the missing index key, you can use compact index scan to complete the group by operation, because the constant fills the "gap" in the search keyword ", you can create a complete index prefix. These index prefixes can be used for index search. MySQL can also avoid additional sorting operations if you want to sort group by results and search keywords with index prefixes, because all the keywords have been searched in order by using the prefix of the ordered index.
3. Use a temporary table to implement GROUP
MySQL must store all the group by fields in the same index to make full use of the group by operation, the index is an ordered index (for example, the Hash index cannot meet the requirements ). In addition, it is not just that whether the index can be used to implement group by is also related to the aggregate function used.
The preceding two group by implementation methods are used when available indexes are available. When MySQL Query Optimizer cannot find an appropriate index that can be used, you have to read the required data first, and then use a temporary table to complete the group by operation.
sky@localhost : example 09:02:40> EXPLAIN -> SELECT max(gmt_create) -> FROM group_message -> WHERE group_id > 1 and group_id < 10 -> GROUP BY user_id\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: group_message type: range possible_keys: idx_group_message_gid_uid,idx_gid_uid_gc key: idx_gid_uid_gc key_len: 4 ref: NULL rows: 32 Extra: Using where; Using index; Using temporary; Using filesort
This execution plan clearly tells us that MySQL finds the data we need through the index, creates a temporary table, and sorts the table to obtain the group by result we need. The entire execution process is shown in the following figure:
After MySQL Query Optimizer finds that the result of group by cannot be directly obtained only through index scanning, it has to use a temporary table and then sort it to implement group.
This is the case in this example. Group_id is not a constant condition, but a range, and the group by field is user_id. Therefore, MySQL cannot help the implementation of group by according to the index sequence. It can only scan the index range to obtain the required data, and then store the data into a temporary table, then, sort and GROUP the group by statement.