MySQL optimizes group by (loose index scan and compact index scan) and mysqlgroup

Source: Internet
Author: User

MySQL optimizes group by (loose index scan and compact index scan) and mysqlgroup

The most common method to satisfy the group by clause is to scan the entire table and create a new temporary table. All rows in each GROUP in the table should be continuous, then use the temporary table to locate the group and apply the accumulate function (if any ). In some cases, MySQL can do better, that is, access through indexes without creating temporary tables.
The most important prerequisite for using an index for group by is that all group by columns reference the same index attribute and the index stores its keywords in order. Whether to use index access instead of temporary tables depends on which part of the index is used in the query, the conditions specified for this part, and the selected cumulative function.
Because group by actually performs sorting operations, and compared with order by, group by mainly only performs grouping operations after sorting. Of course, if other Aggregate functions are used during grouping, Some Aggregate functions are required for calculation. Therefore, in the implementation of group by, indexes can also be used like order. In MySQL, the implementation of group by also has multiple (three) methods, two of which use the existing index information to complete group, another scenario is that indexes cannot be used completely. The following is an analysis of the three implementation methods.

1. Use Loose index scan to implement GROUP

I have read a lot about the definition of "loose index scanning" and I don't quite understand it. The following columns are listed here:
Definition 1: loose index scanning, in fact, is that when MySQL fully uses index scanning to implement group by, it does not need to scan all the index keys that meet the conditions to complete the operation to obtain the results.
Definition 2: the most effective way to optimize Group By is to directly use indexes to completely obtain the fields that require group. When this access method is used, MySQL uses the index type (such as the BTREE index) that sorts the keywords ). This makes it unnecessary for the group field in the index to completely cover the key corresponding to the index in the WHERE condition. Because it only contains part of the keywords in the index, it is called a loose index scan.
It indicates the group field used in the index. It does not need to contain all fields of multiple column indexes. For example, if there is an index idx (c1, c2, c3), then group by c1, group by c1, c2, c1, c1, and c2 are only part of the index idx. Note that the fields used for group in the index must comply with the "leftmost prefix" principle of the index. Group by c1, c3 won't use loose index Scanning
For example:
Explain
SELECT group_id, gmt_create
FROM group_message
WHERE user_id> 1
Group by group_id, gmt_create;
I understand the example of Definition 2.
There is an index idx (c1, c2, c3)
SELECT c1, c2 FROM t1 WHERE c1 <const group by c1, c2;
The fields used for group in the index are c1 and c2.
You do not need to completely cover the key corresponding to the index in the WHERE condition (the index in the where condition is c1; the key corresponding to c1 is idx)
The field (c1, c2) used for group in the index only contains part of the keywords (c1, c2, c3) in the index. Therefore, it is called loose index scanning.
To use a loose index scan to implement group by, you must meet at least the following conditions:
◆ Query a single table
◆ The group by condition field must be in the first consecutive position in the same index;
Group by includes 1st consecutive parts of the index. (If a group by clause is queried, all attributes of DISTINCT point to the beginning of the index ).
◆ When using group by, if there is an aggregate function, you can only use the MAX and MIN Aggregate functions, and they all point to the same column.
◆ If the referenced field condition (in the where condition) exists outside the group by condition in the index, it must exist as a constant, but MIN () or MAX () function parameter exceptions;
Or any other part of the index (except those from group by referenced in the query) must be a constant (that is, they must be referenced BY the number of constants), but MIN () or the parameter exception of the MAX () function.
Supplement: If the SQL statement contains a where statement and the select statement references a field condition other than the GROUP BY condition in the index, the fields in the where statement must exist as constants.
◆ If the query contains a where condition, the condition must be an index and cannot contain non-indexed fields.

Loose index Scanning
Explain
SELECT group_id, user_id
FROM group_message
WHERE group_id between 1 and 4
Group by group_id, user_id;
Loose index Scanning
Explain
SELECT group_id, user_id
FROM group_message
WHERE user_id> 1 and group_id = 1
Group by group_id, user_id;
Non-loose index Scanning
Explain
SELECT group_id, user_id
FROM group_message
WHERE abc = 1
Group by group_id, user_id;
Non-loose index Scanning
Explain
SELECT group_id, user_id
FROM group_message
WHERE user_id> 1 and abc = 1
Group by group_id, user_id;
Loose index scan. The EXPLAIN output of this type of query shows the Using index for group-by of the Extra column.

The following query provides several examples of this class. Assume that the table t1 (c1, c2, c3, c4) has an index idx (c1, c2, c3 ):

SELECT c1, c2 FROM t1 group by c1, c2;
Select distinct c1, c2 FROM t1;
SELECT c1, MIN (c2) FROM t1 group by c1;
SELECT c1, c2 FROM t1 WHERE c1 <const group by c1, c2;
Select max (c3), MIN (c3), c1, c2 FROM t1 WHERE c2> const group by c1, c2;
SELECT c2 FROM t1 WHERE c1 <const group by c1, c2;
SELECT c1, c2 FROM t1 WHERE c3 = const group by c1, c2;

For the above reason, you cannot use this quick selection method to execute the following query:

1. In addition to MIN () or MAX (), there are other cumulative functions, such:
SELECT c1, SUM (c2) FROM t1 group by c1;
2. The fields in the group by clause do not reference the beginning of the index, as shown below:
SELECT c1, c2 FROM t1 group by c2, c3;
3. the query references a part of the keyword after the group by part, and there is no equality equal to the constant, for example:
SELECT c1, c3 FROM t1 group by c1, c2;
In this example, c3 is referenced (c3 must be one of the composite indexes), because group by does not contain c3. And there is no equality equal to a constant. Therefore, you cannot use loose index scanning.
You can change it like this: SELECT c1, c3 FROM t1 where c3 = 'A' group by c1, c2
The following example does not support loose index scanning.
SELECT c1, c3 FROM t1 where c3 = 'A' group by c1, c2
Why is loose index scanning very efficient?
A: When there is no WHERE clause, that is, the number of keys to be read by the loose index scan must be the same as the number of groups in the group, that is to say, it is much less than the actual number of key values. When the WHERE clause contains a range limit or an equivalent expression, the loose index scans 1st keywords in each group that meet the range conditions and reads as few keywords as possible again.

2. Use Tight index scan to implement GROUP

The difference between group by and loose index scanning for compact index scanning is as follows:
The compact index scan requires that you read all the index keys that meet the conditions when scanning the index, and then complete the group by operation based on the data retrieved.
At this time, the Extra information of the execution plan does not contain "Using index for group-by", but it does not mean that the MySQL group by operation is not completed through the index, the result is obtained only after you access all the index key information specified by the WHERE condition. This is the execution plan output information of group by through compact index scanning.
In MySQL, MySQL Query Optimizer first tries to implement the group by operation through loose index scanning. When it is found that some situations cannot meet the requirements of the loose index scan for group, will try to achieve through compact index scanning.
When the group by condition fields are not consecutive or are not part of the index prefix, MySQL Query Optimizer cannot use loose index scanning.
In this case, check whether the condition field in where has the prefix of the index. If the prefix is available and this part is a constant, it is combined with the field after group by to form a continuous index. Scan by compact index.

SELECT max (gmt_create)
FROM group_message
WHERE group_id = 2
Group by user_id

You need to read all the data whose group_id is 2, and then complete the group by operation on the data to get the result. (Here, the group by field is not a continuous index. In the where clause, group_id exactly makes up for the missing index key and is a constant. Therefore, compact index scanning is used)
This index can be used in the order group_id user_id. If the connection sequence does not conform to the index's "leftmost prefix" principle, compact index scanning is not used.

The following example uses compact index Scanning

There is a gap in group by, but it is already overwritten BY the condition user_id = 1.
Explain
SELECT group_id, gmt_create
FROM group_message
WHERE user_id = 1 group by group_id, gmt_create

Group by does not start with the 1st elements of a keyword, but has a condition that provides constants for this element.
Explain
SELECT group_id, gmt_create
FROM group_message
WHERE group_id = 1 group by user_id, gmt_create

The following examples do not use compact index scanning.
The connection between user_id and gmt_create does not conform to the "leftmost prefix" principle of the index.
Explain
SELECT group_id, gmt_create
FROM group_message
WHERE user_id = 1 group by gmt_create
Group_id and gmt_create do not conform to the "leftmost prefix" principle of the index.
Explain
SELECT gmt_create
FROM group_message
WHERE group_id = 1 group by gmt_create;

3. Use a temporary table to implement GROUP

MySQL Query Optimizer finds that the result of group by cannot be directly obtained only through index scanning, and then he has to use a temporary table and then sort it to implement group. This is the case in this example. Group_id is not a constant condition, but a range, and the group by field is user_id. Therefore, MySQL cannot help the implementation of group by according to the index sequence. It can only scan the index range to obtain the required data, and then store the data into a temporary table, then, sort and GROUP the group by statement.
Explain
SELECT group_id
FROM group_message
WHERE group_id between 1 and 4
Group by user_id;
Sample Database File

-- ---------------------------------------------------------- Host:             127.0.0.1-- Server version:        5.1.57-community - MySQL Community Server (GPL)-- Server OS:          Win32-- HeidiSQL version:       7.0.0.4156-- Date/time:          2012-08-20 16:52:10-- --------------------------------------------------------/*!40101 SET @OLD_CHARACTER_SET_CLIENT=@@CHARACTER_SET_CLIENT */;/*!40101 SET NAMES utf8 */;/*!40014 SET FOREIGN_KEY_CHECKS=0 */;-- Dumping structure for table test.group_messageDROP TABLE IF EXISTS `group_message`;CREATE TABLE IF NOT EXISTS `group_message` ( `id` int(10) unsigned NOT NULL AUTO_INCREMENT, `group_id` int(10) unsigned DEFAULT NULL, `user_id` int(10) unsigned DEFAULT NULL, `gmt_create` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP, `abc` int(11) NOT NULL DEFAULT '0', PRIMARY KEY (`id`), KEY `group_id_user_id_gmt_create` (`group_id`,`user_id`,`gmt_create`)) ENGINE=MyISAM AUTO_INCREMENT=27 DEFAULT CHARSET=utf8;-- Dumping data for table test.group_message: 0 rowsDELETE FROM `group_message`;/*!40000 ALTER TABLE `group_message` DISABLE KEYS */;INSERT INTO `group_message` (`id`, `group_id`, `user_id`, `gmt_create`, `abc`) VALUES(1, 1, 1, '2012-08-20 09:25:35', 1),(2, 2, 1, '2012-08-20 09:25:39', 1),(3, 2, 2, '2012-08-20 09:25:47', 1),(4, 3, 1, '2012-08-20 09:25:50', 2),(5, 3, 2, '2012-08-20 09:25:52', 2),(6, 3, 3, '2012-08-20 09:25:54', 0),(7, 4, 1, '2012-08-20 09:25:57', 0),(8, 4, 2, '2012-08-20 09:26:00', 0),(9, 4, 3, '2012-08-20 09:26:02', 0),(10, 4, 4, '2012-08-20 09:26:06', 0),(11, 5, 1, '2012-08-20 09:26:09', 0),(12, 5, 2, '2012-08-20 09:26:12', 0),(13, 5, 3, '2012-08-20 09:26:13', 0),(14, 5, 4, '2012-08-20 09:26:15', 0),(15, 5, 5, '2012-08-20 09:26:17', 0),(16, 6, 1, '2012-08-20 09:26:20', 0),(17, 7, 1, '2012-08-20 09:26:23', 0),(18, 7, 2, '2012-08-20 09:26:28', 0),(19, 8, 1, '2012-08-20 09:26:32', 0),(20, 8, 2, '2012-08-20 09:26:35', 0),(21, 9, 1, '2012-08-20 09:26:37', 0),(22, 9, 2, '2012-08-20 09:26:40', 0),(23, 10, 1, '2012-08-20 09:26:42', 0),(24, 10, 2, '2012-08-20 09:26:44', 0),(25, 10, 3, '2012-08-20 09:26:51', 0),(26, 11, 1, '2012-08-20 09:26:54', 0);/*!40000 ALTER TABLE `group_message` ENABLE KEYS */;/*!40014 SET FOREIGN_KEY_CHECKS=1 */;/*!40101 SET CHARACTER_SET_CLIENT=@OLD_CHARACTER_SET_CLIENT */;

References
1. How does MySQL optimize group?
2. Explain the implementation principle of MySQL Group query Group
3. Loose index scan)
4. MySQL Study Notes

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.