SQL GROUP BY underlying principle: essentially a sort, so an index can pre-order the data


Reposted from: http://blog.csdn.net/caomiao2006/article/details/52140993

GROUP BY actually sorts the data. Compared with ORDER BY, it mostly just adds a grouping step on top of the sort, and if aggregate functions are used during grouping, those aggregates must also be computed. Therefore GROUP BY can use an index in the same way ORDER BY can.
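To make the "GROUP BY is essentially a sort" idea concrete, here is a minimal Python sketch (an illustration, not MySQL code): rows are sorted on the grouping key, adjacent rows with equal keys are collapsed into one group, and an aggregate (MAX here) is computed per group. The function name `group_by_sort` and the sample rows are hypothetical.

```python
from itertools import groupby
from operator import itemgetter

def group_by_sort(rows, key_index, value_index):
    """Emulate GROUP BY key with MAX(value): sort on the key, then aggregate each run."""
    # Step 1: sort on the grouping key (the step an ordered index lets MySQL skip).
    ordered = sorted(rows, key=itemgetter(key_index))
    # Step 2: adjacent rows with equal keys form one group; compute MAX() per group.
    return [
        (key, max(row[value_index] for row in run))
        for key, run in groupby(ordered, key=itemgetter(key_index))
    ]

# Hypothetical (user_id, gmt_create) rows, deliberately unsorted.
rows = [(3, 15), (1, 10), (3, 12), (2, 7), (1, 11)]
print(group_by_sort(rows, 0, 1))  # [(1, 11), (2, 7), (3, 15)]
```

If the rows already arrive in key order, as they do when read through a suitable ordered index, the `sorted()` step does no useful work, which is why an index can stand in for the sort.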

In MySQL, GROUP BY can be implemented in three ways: two of them use existing index information to complete the GROUP BY, and the third handles cases where no index can be used at all. Let's analyze each of these three implementations in turn.

1. Using a loose index scan to implement GROUP BY

What is a loose index scan implementation of GROUP BY? When MySQL takes full advantage of an index scan to implement GROUP BY, it does not need to read every index key that satisfies the conditions in order to complete the operation.

The following example illustrates a loose index scan implementation of GROUP BY. Before running it, we first adjust the indexes on the group_message table by adding the gmt_create column to the index on the group_id and user_id columns:

mysql> CREATE INDEX idx_gid_uid_gc ON group_message(group_id, user_id, gmt_create);
Query OK, rows affected (0.03 sec)
Records: Duplicates: 0 Warnings: 0

mysql> DROP INDEX idx_group_message_gid_uid ON group_message;
Query OK, rows affected (0.02 sec)
Records: Duplicates: 0 Warnings: 0

Then look at the execution plan of the following query:

mysql> EXPLAIN
    -> SELECT user_id, max(gmt_create)
    -> FROM group_message
    -> WHERE group_id < 10
    -> GROUP BY group_id, user_id\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: group_message
         type: range
possible_keys: idx_gid_uid_gc
          key: idx_gid_uid_gc
      key_len: 8
          ref: NULL
         rows: 4
        Extra: Using where; Using index for group-by

The Extra column of the execution plan shows "Using index for group-by", which tells us that the MySQL Query Optimizer performs the GROUP BY we asked for by means of a loose index scan.

The following figure depicts the approximate scanning process:

To implement GROUP BY with a loose index scan, at least the following conditions must be met:

The GROUP BY columns must form a contiguous leftmost prefix of the same index;
Only the MAX() and MIN() aggregate functions may be used together with the GROUP BY;
If the query references any column of the index other than the GROUP BY columns, that column must be compared with a constant.
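The three conditions above can be written as a small checker. This is a simplified Python sketch, not the optimizer's actual logic: it assumes the index is described by its column names in order, and the function name `can_use_loose_index_scan` and its parameters are hypothetical.

```python
def can_use_loose_index_scan(index_columns, group_by_columns, aggregates, other_index_refs):
    """Rough check of the loose-index-scan conditions (simplified sketch).

    index_columns:    columns of one ordered (e.g. B-tree) index, in index order
    group_by_columns: GROUP BY columns, in query order
    aggregates:       aggregate function names used, e.g. {"MAX"}
    other_index_refs: {column: is_compared_with_constant} for other index columns
                      the query references outside GROUP BY and MIN()/MAX() arguments
    """
    # 1. GROUP BY columns must be a contiguous leftmost prefix of the index.
    n = len(group_by_columns)
    if list(index_columns[:n]) != list(group_by_columns):
        return False
    # 2. Only MIN() and MAX() are allowed as aggregates.
    if not set(aggregates) <= {"MIN", "MAX"}:
        return False
    # 3. Every other referenced index column must be compared with a constant.
    return all(other_index_refs.values())

idx = ["group_id", "user_id", "gmt_create"]
print(can_use_loose_index_scan(idx, ["group_id", "user_id"], {"MAX"}, {}))  # True
print(can_use_loose_index_scan(idx, ["user_id"], {"MAX"}, {"group_id": True}))  # False
```

Applied to this article's queries, GROUP BY group_id, user_id with MAX(gmt_create) passes, while GROUP BY user_id alone fails the prefix rule even though group_id is fixed by a constant.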

Why is a loose index scan efficient?

When there is no WHERE clause, so that a full index scan would otherwise be required, a loose index scan reads only as many keys as there are groups, which is far fewer than the number of keys that actually exist. When the WHERE clause contains a range or equality predicate, the loose index scan looks up the first key of each group that satisfies the range condition, and again reads only the smallest possible number of keys.
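The following Python sketch simulates why a loose index scan reads so few keys. It assumes the composite index idx_gid_uid_gc can be modelled as a sorted list of (group_id, user_id, gmt_create) tuples and uses `bisect` to stand in for a B-tree seek; the function name and sample data are hypothetical, and this is an illustration rather than MySQL's implementation.

```python
import math
from bisect import bisect_right

def loose_index_scan_max(index, upper_group_id):
    """Simulate, on a sorted list of (group_id, user_id, gmt_create) tuples:
    SELECT group_id, user_id, MAX(gmt_create) ... WHERE group_id < upper_group_id
    GROUP BY group_id, user_id
    Returns (results, keys_read)."""
    results, keys_read = [], 0
    pos = 0
    while pos < len(index) and index[pos][0] < upper_group_id:
        group_id, user_id, _ = index[pos]  # first key of the next group
        # One seek (modelled with bisect) jumps to the end of this group
        # instead of stepping through every key in it.
        end = bisect_right(index, (group_id, user_id, math.inf))
        # Within a (group_id, user_id) prefix the index is ordered by gmt_create,
        # so the group's last key already holds MAX(gmt_create).
        results.append((group_id, user_id, index[end - 1][2]))
        keys_read += 1 if end - 1 == pos else 2  # first key and last key only
        pos = end
    return results, keys_read

# Hypothetical data: 4 groups with group_id < 10 (100 keys each), plus keys the WHERE excludes.
index = sorted((g, u, t) for g in (1, 2, 12) for u in (1, 2) for t in range(100))
results, keys_read = loose_index_scan_max(index, 10)
print(results, keys_read)  # [(1, 1, 99), (1, 2, 99), (2, 1, 99), (2, 2, 99)] 8
```

Of the 400 keys that satisfy group_id < 10, the sketch inspects only 8, at most two per group, which is the effect the paragraph above describes.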

2. Using a tight (compact) index scan to implement GROUP BY

A tight index scan differs from a loose index scan in that it must read every index key that satisfies the conditions while scanning the index, and only then completes the GROUP BY operation based on the data it has read.

mysql> EXPLAIN
    -> SELECT max(gmt_create)
    -> FROM group_message
    -> WHERE group_id = 2
    -> GROUP BY user_id\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: group_message
         type: ref
possible_keys: idx_group_message_gid_uid,idx_gid_uid_gc
          key: idx_gid_uid_gc
      key_len: 4
          ref: const
         rows: 4
        Extra: Using where; Using index
1 row in set (0.01 sec)

This time the Extra column of the execution plan does not show "Using index for group-by". That does not mean MySQL's GROUP BY is not performed through the index; it only means that MySQL must access all the index keys qualified by the WHERE condition before it can produce the result. This is the execution plan output for a GROUP BY implemented with a tight index scan.
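A tight index scan for the query above (WHERE group_id = 2 GROUP BY user_id) can be sketched in Python under the same assumption as before: the index is modelled as a sorted list of (group_id, user_id, gmt_create) tuples. The function name is hypothetical, and the sketch also returns user_id alongside MAX(gmt_create) so the groups are visible.

```python
import math
from bisect import bisect_left, bisect_right
from itertools import groupby

def tight_index_scan_max(index, group_id):
    """Simulate, on a sorted list of (group_id, user_id, gmt_create) tuples:
    SELECT user_id, MAX(gmt_create) ... WHERE group_id = <constant> GROUP BY user_id
    Returns (results, keys_read)."""
    # The constant group_id fills the gap in front of user_id, so (group_id,)
    # is a usable prefix: find the contiguous run of keys that start with it.
    start = bisect_left(index, (group_id,))
    end = bisect_right(index, (group_id, math.inf))
    matching = index[start:end]
    # Every key in the run is read, but the run is already ordered by user_id,
    # so adjacent keys are grouped on the fly with no temporary table or sort.
    results = [
        (user_id, max(key[2] for key in run))
        for user_id, run in groupby(matching, key=lambda key: key[1])
    ]
    return results, len(matching)

# Hypothetical data, same shape as the loose-scan sketch.
index = sorted((g, u, t) for g in (1, 2, 12) for u in (1, 2) for t in range(100))
print(tight_index_scan_max(index, 2))  # ([(1, 99), (2, 99)], 200)
```

Unlike the loose scan, every one of the 200 keys with group_id = 2 is read, yet no extra sort is needed because the keys already come out in user_id order.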
The following figure shows the approximate execution process:

The MySQL Query Optimizer first tries to perform the GROUP BY with a loose index scan; when it finds that the query does not meet the requirements for a loose index scan, it then tries to implement the GROUP BY with a tight index scan.

When the GROUP BY columns are not contiguous or are not a prefix of the index, the MySQL Query Optimizer cannot use a loose index scan, and it cannot complete the GROUP BY directly through the index either, because the key information for the missing index columns is unavailable. However, if the query compares the missing index column with a constant, the GROUP BY can be performed with a tight index scan: the constant fills the "gap" in the search key to form a complete index prefix, and that prefix can be used for index lookups. If the GROUP BY result must also be sorted and the search key forms an index prefix, MySQL can avoid an extra sort as well, because searching an ordered index by prefix retrieves all matching keys in order.
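The fallback order described above (loose scan first, then tight scan when a constant fills the gap, otherwise a temporary table) can be approximated in Python. This is a deliberately simplified sketch with a hypothetical function name: it considers a single index, treats only equality comparisons with constants as gap fillers, and ignores many conditions the real optimizer checks.

```python
def choose_group_by_strategy(index_columns, group_by_columns, equality_constants, aggregates):
    """Simplified sketch of the fallback order: loose scan, tight scan, temporary table.

    equality_constants: index columns the WHERE clause compares with '=' to a constant.
    """
    remaining = set(group_by_columns)
    gap_filled_by_constant = False
    # Walk the index from the left until every GROUP BY column is covered.
    for column in index_columns:
        if not remaining:
            break
        if column in remaining:
            remaining.discard(column)
        elif column in equality_constants:
            gap_filled_by_constant = True  # a constant fills this gap in the prefix
        else:
            return "temporary table"  # a gap no constant can fill
    if remaining:
        return "temporary table"  # some GROUP BY column is not in this index
    if not gap_filled_by_constant and set(aggregates) <= {"MIN", "MAX"}:
        return "loose index scan"
    return "tight index scan"

idx = ["group_id", "user_id", "gmt_create"]
print(choose_group_by_strategy(idx, ["group_id", "user_id"], set(), {"MAX"}))  # loose index scan
print(choose_group_by_strategy(idx, ["user_id"], {"group_id"}, {"MAX"}))  # tight index scan
print(choose_group_by_strategy(idx, ["user_id"], set(), {"MAX"}))  # temporary table
```

The three calls correspond to this article's three example queries: WHERE group_id < 10 GROUP BY group_id, user_id; WHERE group_id = 2 GROUP BY user_id; and a range on group_id with GROUP BY user_id.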

3. Using a temporary table to implement GROUP BY

For MySQL to perform a GROUP BY through an index, all of the GROUP BY columns must be stored in the same index, and that index must be ordered (a hash index, for example, does not qualify). Beyond that, whether an index can be used to implement the GROUP BY also depends on which aggregate functions are used.

The first two GROUP BY implementations apply when a usable index exists. When the MySQL Query Optimizer cannot find a suitable index, it has to read the required data first and then complete the GROUP BY with a temporary table.

mysql> EXPLAIN
    -> SELECT max(gmt_create)
    -> FROM group_message
    -> WHERE group_id > 1 AND group_id < 10
    -> GROUP BY user_id\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: group_message
         type: range
possible_keys: idx_group_message_gid_uid,idx_gid_uid_gc
          key: idx_gid_uid_gc
      key_len: 4
          ref: NULL
         rows:
        Extra: Using where; Using index; Using temporary; Using filesort

The execution plan makes this clear: MySQL finds the data it needs through the index, then creates a temporary table and sorts it to obtain the GROUP BY result. The overall execution process is roughly as shown in the figure:

When the MySQL Query Optimizer finds that an index scan alone cannot produce the GROUP BY result directly, it has to fall back to a temporary table followed by a sort.

That is exactly the situation in this example: group_id is restricted by a range rather than a constant, and the GROUP BY column is user_id. MySQL therefore cannot rely on the index order to implement the GROUP BY; it can only use an index range scan to read the required data, store it in a temporary table, and then sort and group it to complete the GROUP BY.
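The temporary-table path for this query can be sketched in Python, again assuming the index is a sorted list of (group_id, user_id, gmt_create) tuples. A dict stands in for the temporary table and `sorted()` for the filesort step; the function name and data are hypothetical, and the sketch returns user_id with each maximum so the groups are visible.

```python
import math
from bisect import bisect_left, bisect_right

def temp_table_group_by_max(index, low, high):
    """Simulate, on a sorted list of (group_id, user_id, gmt_create) tuples:
    SELECT user_id, MAX(gmt_create) ... WHERE group_id > low AND group_id < high
    GROUP BY user_id"""
    # 1. Index range scan on group_id: keys arrive ordered by group_id, so
    #    user_id values come out interleaved and cannot be grouped on the fly.
    start = bisect_right(index, (low, math.inf))
    end = bisect_left(index, (high,))
    # 2. "Temporary table": one entry per user_id, updated as keys stream in.
    temp_table = {}
    for _, user_id, gmt_create in index[start:end]:
        temp_table[user_id] = max(gmt_create, temp_table.get(user_id, gmt_create))
    # 3. Sort the temporary table on the GROUP BY column (the "Using filesort" step).
    return sorted(temp_table.items())

index = sorted([(1, 5, 10), (2, 7, 30), (2, 5, 20), (3, 5, 15), (3, 6, 40), (10, 6, 99)])
print(temp_table_group_by_max(index, 1, 10))  # [(5, 20), (6, 40), (7, 30)]
```

Within the range, user_id values arrive as 5, 7, 5, 6, which is why neither kind of index scan can group them directly and an extra table plus a sort is required.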

