MySQL optimized group by-Loose index scan with compact index scan

Source: Internet
Author: User
Tags compact sorts

The most general way to satisfy the GROUP BY clause is to scan the entire table and create a new temporary table, all rows of each group in the table should be contiguous, and then use the temporary table to locate the group and apply the cumulative function, if any. In some cases, MySQL can do a better job of accessing the index without creating a temporary table.

The most important precondition for using indexes for group by is that all group by columns reference properties of the same index, and the index saves its keywords sequentially. The use of index access instead of temporary tables also depends on which part of the index is used in the query, the conditions specified for that part, and the cumulative function that is selected.

Because group by actually sorts, and the group by is mostly just a sort of grouping operation, compared to order by. Of course, if you use some other aggregate functions when grouping, you also need to calculate some aggregate functions. Therefore, in the implementation of group by, the index can be used as well as the ORDER by. In MySQL, the implementation of group by also has several (three) ways, two of which use existing index information to complete group by, and another for scenarios where the index is completely unusable. Let's do an analysis of these three implementations separately.

1. Use a loose index scan (Loose Index Scan) to implement GROUP by

The definition of "loose index Scan", I read a lot of online introduction, are not very clear. The following are the logical columns:

Definition 1: Loose index scanning, in fact, when MySQL is fully using the index scan to implement GROUP by, do not need to scan all the conditions of the index key to complete the operation results.

Definition 2: The most effective way to optimize group by IS when you can use the index directly to get the fields that require the group completely. With this access method, MySQL uses the type of index that the keyword sorts (such as the Btree Index). This makes the fields used for group in the index do not have to completely cover the key corresponding to the index in the Where condition. This is called a loose index scan because it contains only a subset of the keywords in the index.
This means that the fields for group in the index do not need to contain all the fields of the multi-column index. For example: There is an index idx (C1,C2,C3), then group by C1, group by C1,C2 such C1 or C1, C2 are only part of the index IDX. Note that the field used for group in the index must conform to the index's "leftmost prefix" principle. Group BY C1,C3 does not use a loose index scan

For example:

Explain
SELECT group_id,gmt_create
From Group_message
WHERE user_id>1
GROUP by Group_id,gmt_create;

I understand that the "definition 2" example illustrates

There is an index of IDX (C1,C2,C3)

SELECT C1, c2 from T1 WHERE C1 < const GROUP by C1, C2;

The field used in the index for group C1,C2
You do not have to completely cover the key in the Where condition (the index in the where condition, that is, the C1;C1 corresponding key, which is the IDX)
The field used for group in the Index (C1,C2) contains only part of the keyword (C1,C2,C3) in the index, so it is called a loose index scan.

To implement group by with a loose index scan, you need to meet at least the following conditions:

Query for a single table

The GROUP by condition field must be the first consecutive position in the same index;
The group by includes a 1th contiguous portion of the index (if the query has a distinct clause for GROUP BY, all distinct properties point to the beginning of the index).

With GROUP BY, if you have an aggregate function, you can use only the two aggregate functions, MAX and MIN, and they all point to the same column.

If a reference (in a Where condition) is to a field condition other than the group by condition in the index, it must exist as a constant, except for the parameters of the min () or Max () function;
Alternatively, any other part of the index (except those from the group by referenced in the query) must be constant (that is, they must be referenced by the number of constants), except for the parameters of the min () or Max () function.

Add: If there is a where statement in SQL, and select references a field condition other than the group by condition in the index, the fields in the Where are in constant form.

If there is a where condition in the query, the condition must be an index and cannot contain non-indexed fields

Loose index Scanning
Explain
SELECT group_id,user_id
From Group_message
WHERE group_id between 1 and 4
GROUP by group_id,user_id;

Loose index Scanning
Explain
SELECT group_id,user_id
From Group_message
WHERE User_id>1 and Group_id=1
GROUP by group_id,user_id;

Non-loose index scan
Explain
SELECT group_id,user_id
From Group_message
WHERE abc=1
GROUP by group_id,user_id;

Non-loose index scan
Explain
SELECT group_id,user_id
From Group_message
WHERE User_id>1 and Abc=1
GROUP by group_id,user_id;

A loosely indexed scan, the explain output of such a query shows the extra column using the index for group-by

The following query provides several examples of this class, assuming that table T1 (C1,C2,C3,C4) has an index of IDX (C1,C2,C3):

SELECT C1, c2 from T1 GROUP by C1, C2;

SELECT DISTINCT C1, c2 from T1;

SELECT C1, MIN (C2) from T1 GROUP by C1;

SELECT C1, c2 from T1 WHERE C1 < const GROUP by C1, C2;

SELECT MAX (C3), MIN (C3), C1, c2 from t1 WHERE c2 > Const GROUP by c1, C2;

SELECT C2 from T1 WHERE C1 < const GROUP by C1, C2;

SELECT C1, c2 from T1 WHERE c3 = Const GROUP by c1, C2;

For these reasons, you cannot use this quick selection method to execute the following query:

1, in addition to Min () or Max (), there are other cumulative functions, such as:

SELECT C1, SUM (C2) from T1 GROUP by C1;

2. The domain in the GROUP by clause does not reference the beginning of the index, as follows:

SELECT c1,c2 from T1 GROUP by C2, C3;

3. The query references a part of the keyword following the group by section, and there is no equality equal to the constant, for example:

SELECT c1,c3 from T1 GROUP by C1, C2;

In this example, the reference to C3 (C3 must be one of the combined indexes) because there is no C3 in group by. And there is no equality equal to the constant. So you can't use a loose index scan

This can be changed: SELECT c1,c3 from T1 where c3= ' a ' GROUP by C1, C2

The following example cannot be scanned with a loose index

SELECT c1,c3 from T1 where c3= ' a ' GROUP by C1, C2

Why is the efficiency of a loose index scan high?

A: Because there is no WHERE clause, that is, a full index scan is required, the number of key values that a loose index scan needs to read is as many as the number of groups grouped, that is, much less than the number of key values that actually exist. When the WHERE clause contains a range-judged or equivalent expression, the loosely indexed scan finds the 1th keyword for each group that satisfies the scope criteria, and reads the minimum number of keywords again.

2. Use compact index Scan (tight index scan) to implement GROUP by

The difference between a compact index scan and a GROUP by and a loose index scan is mainly:

A compact index scan needs to read all the index keys that meet the criteria when scanning the index, and then complete the GROUP by operation based on the data read.

There is no "Using index for Group-by" in the Extra information for the execution plan at this time, but it does not mean that the group by operation of MySQL is not done by index, except that it is necessary to access all the index key information that is qualified by the WHERE condition before the knot is reached. Fruit. This is achieved through a compact index scan to implement the GROUP by's execution plan output information.

In MySQL, MySQL Query Optimizer first chooses to attempt a group by operation with a loose index scan, and then attempts to implement it through a compact index scan when it finds that some cases do not meet the requirements of the group by for a loose index scan.

MySQL Query Optimizer cannot use a loose index scan when the GROUP by condition field is not contiguous or is not part of the index prefix.

At this point, the conditional field in where is checked for the prefix part of the index, if there is an earlier part, and the part is a constant, and the field after the group by IS combined into a contiguous index. A compact index scan is now in order.

SELECT Max (gmt_create)
From Group_message
WHERE group_id = 2
GROUP by user_id

You need to read all the data from the group_id=2 and then complete the group by operation in the read data to get the result. (The Group By field is not a contiguous index, just where group_id is exactly the missing index key and exactly a constant, so use a compact index scan)

group_id user_id This is the order in which the index can be used. If the order of the connections does not conform to the "leftmost prefix" principle of the index, compact index scanning is not used.

The following example uses a compact index scan

There is a gap in GROUP by, but it has been covered by the conditional user_id = 1.

Explain
SELECT group_id,gmt_create
From Group_message
WHERE user_id = 1 GROUP by group_id,gmt_create


GROUP by does not start with the 1th element of the keyword, but there is a condition that provides a constant for the element
Explain
SELECT group_id,gmt_create
From Group_message
WHERE group_id = 1 GROUP BY user_id,gmt_create


The following examples do not use compact index scans

User_id,gmt_create connection does not conform to the index "leftmost prefix" principle
Explain
SELECT group_id,gmt_create
From Group_message
WHERE user_id = 1 GROUP by gmt_create

Group_id,gmt_create connection does not conform to the index "leftmost prefix" principle

Explain
SELECT gmt_create
From Group_message
WHERE group_id=1 GROUP by Gmt_create;

3. Using temporary tables to implement GROUP by

After MySQL Query Optimizer found that only the index scan was not able to directly get the result of group by, he had to choose to implement group by using temporary tables and then sorting. In this example, this is the case. GROUP_ID is not a constant condition, but a range, and the GROUP by field is user_id. So MySQL cannot help with the implementation of the Group by in the order of the indexes, it can only scan the required data by the index range, then save the data to the staging table, then sort and group the operations to complete group by.

Explain
SELECT group_id
From Group_message
WHERE group_id between 1 and 4
GROUP by user_id;

Sample database files [SQL]View Plaincopy
    1. -- --------------------------------------------------------
    2. --host:127.0.0.1
    3. --Server Version:5.1.57-community-mysql Community Server (GPL)
    4. --Server Os:win32
    5. --Heidisql version:7.0.0.4156
    6. --Date/time:2012-08-20 16:52:10
    7. -- --------------------------------------------------------
    8. /*!40101 SET @[email protected] @CHARACTER_SET_CLIENT */;
    9. /*!40101 SET NAMES UTF8 */;
    10. /*!40014 SET foreign_key_checks=0 */;
    11. --Dumping structure for table Test.group_message
    12. DROP TABLE IF EXISTS ' group_message ';
    13. CREATE TABLE IF not EXISTS ' group_message ' (
    14. ' ID ' int (ten) unsigned not NULL auto_increment,
    15. ' group_id ' int (ten) unsigned DEFAULT NULL,
    16. ' user_id ' int (ten) unsigned DEFAULT NULL,
    17. ' Gmt_create ' timestamp not NULL DEFAULT current_timestamp,
    18. ' ABC ' int (one) not NULL DEFAULT ' 0 ',
    19. PRIMARY KEY (' id '),
    20. KEY ' group_id_user_id_gmt_create ' (' group_id ', ' user_id ', ' gmt_create ')
    21. ) Engine=myisam auto_increment=27 DEFAULT Charset=utf8;
    22. --Dumping data for table test.group_message:0 rows
    23. DELETE from ' group_message ';
    24. /*!40000 ALTER TABLE ' group_message ' DISABLE KEYS * *;
    25. INSERT into ' group_message ' (' id ', ' group_id ', ' user_id ', ' gmt_create ', ' abc ') VALUES
    26. (1, 1, 1, ' 2012-08-20 09:25:35 ', 1),
    27. (2, 2, 1, ' 2012-08-20 09:25:39 ', 1),
    28. (3, 2, 2, ' 2012-08-20 09:25:47 ', 1),
    29. (4, 3, 1, ' 2012-08-20 09:25:50 ', 2),
    30. (5, 3, 2, ' 2012-08-20 09:25:52 ', 2),
    31. (6, 3, 3, ' 2012-08-20 09:25:54 ', 0),
    32. (7, 4, 1, ' 2012-08-20 09:25:57 ', 0),
    33. (8, 4, 2, ' 2012-08-20 09:26:00 ', 0),
    34. (9, 4, 3, ' 2012-08-20 09:26:02 ', 0),
    35. (Ten, 4, 4, ' 2012-08-20 09:26:06 ', 0),
    36. (One, 5, 1, ' 2012-08-20 09:26:09 ', 0),
    37. (5, 2, ' 2012-08-20 09:26:12 ', 0),
    38. (5, 3, ' 2012-08-20 09:26:13 ', 0),
    39. (5, 4, ' 2012-08-20 09:26:15 ', 0),
    40. (5, 5, ' 2012-08-20 09:26:17 ', 0),
    41. (6, 1, ' 2012-08-20 09:26:20 ', 0),
    42. (7, 1, ' 2012-08-20 09:26:23 ', 0),
    43. (7, 2, ' 2012-08-20 09:26:28 ', 0),
    44. (8, 1, ' 2012-08-20 09:26:32 ', 0),
    45. (8, 2, ' 2012-08-20 09:26:35 ', 0),
    46. (9, 1, ' 2012-08-20 09:26:37 ', 0),
    47. (9, 2, ' 2012-08-20 09:26:40 ', 0),
    48. (max, 1, ' 2012-08-20 09:26:42 ', 0),
    49. (2, ' 2012-08-20 09:26:44 ', 0),
    50. (+, 3, ' 2012-08-20 09:26:51 ', 0),
    51. (0, 1, ' 2012-08-20 09:26:54 ');
    52. /*!40000 ALTER TABLE ' group_message ' ENABLE KEYS */;
    53. /*!40014 SET Foreign_key_checks=1 */;
    54. /*!40101 SET [email protected]_character_set_c

MySQL optimized group by-Loose index scan with compact index scan

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.