MySQL GROUP BY query examples

Source: Internet
Author: User

1. Using a loose index scan to implement GROUP BY

What does it mean to implement GROUP BY with a loose index scan? It means that when MySQL can satisfy the GROUP BY entirely from an index, it does not have to scan every index key that meets the query conditions; reading only part of them is enough to produce the result.

Let's illustrate this with an example. Before running it, we first adjust the indexes on the group_message table by adding the gmt_create column to the index on the group_id and user_id columns:

sky@localhost:example 08:49:45> CREATE INDEX idx_gid_uid_gc
    -> ON group_message (group_id, user_id, gmt_create);
Query OK, rows affected (0.03 sec)
Records: 96  Duplicates: 0  Warnings: 0

sky@localhost:example 09:07:30> DROP INDEX idx_group_message_gid_uid
    -> ON group_message;
Query OK, rows affected (0.02 sec)
Records: 96  Duplicates: 0  Warnings: 0

Now look at the execution plan of the following query:

sky@localhost:example 09:26:15> EXPLAIN
    -> SELECT user_id, MAX(gmt_create)
    -> FROM group_message
    -> WHERE group_id < 10
    -> GROUP BY group_id, user_id\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: group_message
         type: range
possible_keys: idx_gid_uid_gc
          key: idx_gid_uid_gc
      key_len: 8
          ref: NULL
         rows: 4
        Extra: Using where; Using index for group-by
1 row in set (0.00 sec)

The Extra column of the execution plan shows "Using index for group-by", which tells us that the MySQL Query Optimizer carries out the GROUP BY we asked for by means of a loose index scan.

(Figure omitted: the approximate process of a loose index scan.)

To implement GROUP BY with a loose index scan, at least the following conditions must be met:

The GROUP BY columns must occupy contiguous positions at the front of a single index, that is, they must form a leftmost prefix of it;

Of the aggregate functions, only MAX() and MIN() may be used together with the GROUP BY;

Any index column referenced in the query other than the GROUP BY columns must be compared with a constant (the argument of MAX() or MIN() is exempt).

Why is a loose index scan so efficient?

Because without a WHERE clause, that is, when a full index scan would otherwise be needed, a loose index scan reads only as many keys as there are groups, which is usually far fewer than the number of keys in the index. When the WHERE clause contains range or equality predicates, a loose index scan looks up the first key of each group that satisfies the range conditions, and again reads the smallest possible number of keys.
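The difference in work can be illustrated with a small simulation. The sketch below is not MySQL code: it assumes a sorted Python list of (group_id, user_id, gmt_create) tuples as a stand-in for the idx_gid_uid_gc index, uses integers for gmt_create, and introduces two hypothetical helper functions. It only compares how many positions each strategy visits.

```python
from bisect import bisect_right

# Hypothetical stand-in for the B-tree index idx_gid_uid_gc: a sorted list of
# (group_id, user_id, gmt_create) tuples. gmt_create is a plain integer here.
INDEX = sorted(
    (g, u, t)
    for g in range(1, 4)        # 3 values of group_id
    for u in range(1, 5)        # 4 users per group_id
    for t in range(100, 108)    # 8 messages per user
)


def full_scan_max(keys):
    """Read every index key; the last key seen per group holds the maximum."""
    result, keys_read = {}, 0
    for g, u, t in keys:
        keys_read += 1
        result[(g, u)] = t  # keys are sorted, so later entries are larger
    return result, keys_read


def loose_scan_max(keys):
    """Jump from group to group with a binary search, one seek per group."""
    result, seeks, pos = {}, 0, 0
    while pos < len(keys):
        g, u, _ = keys[pos]
        # Seek to the first entry after group (g, u); the entry just before
        # it carries the largest gmt_create of that group.
        nxt = bisect_right(keys, (g, u, float("inf")), lo=pos)
        result[(g, u)] = keys[nxt - 1][2]
        seeks += 1
        pos = nxt
    return result, seeks


full_result, keys_read = full_scan_max(INDEX)
loose_result, seeks = loose_scan_max(INDEX)
print(full_result == loose_result, keys_read, seeks)  # True 96 12
```

Both functions return the same MAX(gmt_create) per (group_id, user_id), but the full scan touches all 96 keys while the loose scan performs one seek for each of the 12 groups.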

2. Using a tight (compact) index scan to implement GROUP BY

The difference between a tight index scan and a loose index scan is that a tight index scan has to read every index key that satisfies the conditions while scanning the index, and then completes the GROUP BY on the data it has read.

sky@localhost:example 08:55:14> EXPLAIN
    -> SELECT MAX(gmt_create)
    -> FROM group_message
    -> WHERE group_id = 2
    -> GROUP BY user_id\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: group_message
         type: ref
possible_keys: idx_group_message_gid_uid,idx_gid_uid_gc
          key: idx_gid_uid_gc
      key_len: 4
          ref: const
         rows: 4
        Extra: Using where; Using index
1 row in set (0.01 sec)


This time the Extra column does not contain "Using index for group-by". That does not mean MySQL performs the GROUP BY without the index; it means that MySQL has to read all index keys that satisfy the WHERE condition before it can produce the GROUP BY result. This is what the execution plan looks like when GROUP BY is implemented through a tight index scan.

The MySQL Query Optimizer first tries to implement GROUP BY through a loose index scan. Only when it finds that the conditions for a loose index scan are not all met does it try a tight index scan.

When the GROUP BY columns are not contiguous in the index, or do not form a prefix of it, the MySQL Query Optimizer cannot complete the GROUP BY through a loose index scan, because some of the index key parts are missing. However, if the query contains equality conditions with constants on the missing key parts, a tight index scan can still complete the GROUP BY: the constants fill the "gaps" in the search key so that a complete index prefix is formed, and that prefix can be used for index lookups. If the GROUP BY result has to be sorted and the search keys form an index prefix, MySQL also avoids the extra sort, because searching by prefix in an ordered index already retrieves all keys in order.
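How a constant fills the gap in the index prefix can be sketched as follows. This is an illustrative simulation rather than MySQL internals: the sorted list again stands in for idx_gid_uid_gc, the sample values are made up, and tight_scan_max is a hypothetical helper name.

```python
from bisect import bisect_left
from itertools import groupby

# Hypothetical stand-in for idx_gid_uid_gc: sorted (group_id, user_id, gmt_create).
INDEX = sorted([
    (1, 1, 10), (2, 3, 50), (2, 1, 30),
    (2, 1, 40), (2, 2, 20), (3, 1, 60),
])


def tight_scan_max(keys, group_id):
    """GROUP BY user_id for one constant group_id, without any sorting."""
    # The constant group_id completes the index prefix, so one range lookup
    # finds the contiguous slice of matching keys.
    start = bisect_left(keys, (group_id,))
    end = bisect_left(keys, (group_id + 1,))
    matching = keys[start:end]  # every qualifying key is read
    # Inside the slice the keys are already ordered by user_id, so the
    # groups can be formed in a single pass.
    return [
        (user_id, max(t for _, _, t in rows))
        for user_id, rows in groupby(matching, key=lambda k: k[1])
    ]


print(tight_scan_max(INDEX, 2))  # [(1, 40), (2, 20), (3, 50)]
```

Unlike a loose scan, every key with group_id = 2 is read, but because those keys arrive ordered by user_id, no temporary table or sort is needed.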

3. Using a temporary table to implement GROUP BY

For MySQL to use an index for a GROUP BY operation, all GROUP BY columns must belong to the same index, and that index must be an ordered index (a hash index, for example, does not qualify). Moreover, whether an index can be used for GROUP BY also depends on the aggregate functions in the query.

The previous two GROUP BY implementations apply when a usable index exists. When the MySQL Query Optimizer cannot find a suitable index, it has to read the required data first and then complete the GROUP BY through a temporary table.

sky@localhost:example 09:02:40> EXPLAIN
    -> SELECT MAX(gmt_create)
    -> FROM group_message
    -> WHERE group_id > 1 AND group_id < 10
    -> GROUP BY user_id\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: group_message
         type: range
possible_keys: idx_group_message_gid_uid,idx_gid_uid_gc
          key: idx_gid_uid_gc
      key_len: 4
          ref: NULL
         rows: 32
        Extra: Using where; Using index; Using temporary; Using filesort

This execution plan tells us clearly that MySQL found the data we need through the index, then created a temporary table, and then sorted it to produce the GROUP BY result. (Figure omitted: diagram of the overall process.)

When the MySQL Query Optimizer finds that an index scan alone cannot directly produce the GROUP BY result, it has to fall back on a temporary table followed by a sort.

That is what happens in this example. The condition on group_id is a range rather than a constant, and the GROUP BY column is user_id. MySQL therefore cannot rely on the index order to perform the GROUP BY. It can only use the index to fetch the rows in the range, store them in a temporary table, and then sort and group them to complete the GROUP BY.
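The reason the index order does not help here can be seen in a small simulation. The sketch below is only an illustration under stated assumptions: a sorted list with made-up values stands in for idx_gid_uid_gc, a dictionary plays the role of the temporary table, and both function names are hypothetical.

```python
# Hypothetical stand-in for idx_gid_uid_gc: sorted (group_id, user_id, gmt_create).
INDEX = sorted([
    (1, 1, 10), (2, 2, 20), (2, 1, 30),
    (3, 1, 40), (3, 3, 50), (4, 2, 60),
])


def range_scan(keys, low, high):
    """Return the keys with low < group_id < high, in index order."""
    return [k for k in keys if low < k[0] < high]


def group_by_with_temp_table(rows):
    """Aggregate MAX(gmt_create) per user_id, then sort the result."""
    temp_table = {}  # plays the role of the temporary table, keyed by user_id
    for _, user_id, gmt_create in rows:
        previous = temp_table.get(user_id, gmt_create)
        temp_table[user_id] = max(previous, gmt_create)
    return sorted(temp_table.items())  # the final sort step


rows = range_scan(INDEX, 1, 10)
print([user_id for _, user_id, _ in rows])  # [1, 2, 1, 3, 2]
print(group_by_with_temp_table(rows))       # [(1, 40), (2, 60), (3, 50)]
```

Across a range of group_id values the user_id values come back interleaved, so rows of the same user are not adjacent and have to be collected before the groups can be completed.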

The simplest case looks like this.

(Query the column-title table of the DedeCMS program, grouped by column ID.)

SELECT *
FROM `dede_archives`
GROUP BY `typeid`
LIMIT 0, 30

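That is all there is to it. Note that selecting non-aggregated columns that are not in the GROUP BY list, as SELECT * does here, is rejected when the ONLY_FULL_GROUP_BY SQL mode is enabled (the default from MySQL 5.7.5 onward), unless those columns are functionally dependent on the GROUP BY columns.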

Some related GROUP BY examples. The statements below use SQL Server syntax (GO, TOP, NEWID()) rather than MySQL syntax.

-- Fetch the row with the largest (or smallest) value within each group

/*
The data is as follows:
name val memo
a    2   a2 (the second value of a)
a    1   a1--the first value of a
a    3   a3: the third value of a
b    1   b1--the first value of b
b    3   b3: the third value of b
b    2   b2b2b2b2
b    4   b4b4
b    5   b5b5b5b5b5
*/

-- Create the table and insert the data:

create table tb (name varchar(10), val int, memo varchar(40))
insert into tb values ('a', 2, 'a2 (the second value of a)')
insert into tb values ('a', 1, 'a1--the first value of a')
insert into tb values ('a', 3, 'a3: the third value of a')
insert into tb values ('b', 1, 'b1--the first value of b')
insert into tb values ('b', 3, 'b3: the third value of b')
insert into tb values ('b', 2, 'b2b2b2b2')
insert into tb values ('b', 4, 'b4b4')
insert into tb values ('b', 5, 'b5b5b5b5b5')
go

--1. Group by name and fetch the row with the largest val.

--Method 1:
select a.* from tb a where val = (select max(val) from tb where name = a.name) order by a.name
--Method 2:
select a.* from tb a where not exists (select 1 from tb where name = a.name and val > a.val)
--Method 3:
select a.* from tb a, (select name, max(val) val from tb group by name) b where a.name = b.name and a.val = b.val order by a.name
--Method 4:
select a.* from tb a inner join (select name, max(val) val from tb group by name) b on a.name = b.name and a.val = b.val order by a.name
--Method 5:
select a.* from tb a where 1 > (select count(*) from tb where name = a.name and val > a.val) order by a.name
/*
name       val         memo
---------- ----------- --------------------
a          3           a3: the third value of a
b          5           b5b5b5b5b5
*/

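-- I recommend methods 1, 3, and 4. In my results, methods 1, 3, and 4 were the more efficient ones and methods 2 and 5 the less efficient ones. That methods 3 and 4 perform the same is no surprise, since they are the same join written in two ways; method 1 is a different kind of query (a correlated subquery), which is worth thinking about.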
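That the five methods return the same rows can be checked with a short script. The sketch below is an assumption-laden illustration, not a benchmark: it uses Python's built-in sqlite3 module as a convenient stand-in database (not SQL Server or MySQL), abbreviates the memo values, and adds ORDER BY to method 2 so that the results can be compared directly.

```python
import sqlite3

# SQLite is only a stand-in here; table and values follow the sample data,
# with the memo column abbreviated.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tb (name TEXT, val INTEGER, memo TEXT)")
conn.executemany(
    "INSERT INTO tb VALUES (?, ?, ?)",
    [("a", 2, "a2"), ("a", 1, "a1"), ("a", 3, "a3"),
     ("b", 1, "b1"), ("b", 3, "b3"), ("b", 2, "b2"),
     ("b", 4, "b4"), ("b", 5, "b5")],
)

METHODS = {
    1: "SELECT a.* FROM tb a WHERE val = "
       "(SELECT MAX(val) FROM tb WHERE name = a.name) ORDER BY a.name",
    2: "SELECT a.* FROM tb a WHERE NOT EXISTS "
       "(SELECT 1 FROM tb WHERE name = a.name AND val > a.val) "
       "ORDER BY a.name",  # ORDER BY added for a deterministic comparison
    3: "SELECT a.* FROM tb a, "
       "(SELECT name, MAX(val) AS val FROM tb GROUP BY name) b "
       "WHERE a.name = b.name AND a.val = b.val ORDER BY a.name",
    4: "SELECT a.* FROM tb a INNER JOIN "
       "(SELECT name, MAX(val) AS val FROM tb GROUP BY name) b "
       "ON a.name = b.name AND a.val = b.val ORDER BY a.name",
    5: "SELECT a.* FROM tb a WHERE 1 > "
       "(SELECT COUNT(*) FROM tb WHERE name = a.name AND val > a.val) "
       "ORDER BY a.name",
}

results = {n: conn.execute(sql).fetchall() for n, sql in METHODS.items()}
print(results[1])  # [('a', 3, 'a3'), ('b', 5, 'b5')]
print(all(rows == results[1] for rows in results.values()))  # True
```

The script only confirms that the queries are equivalent on this data; relative performance depends on the database engine, the indexes, and the data volume.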

--2. Group by name and fetch the row with the smallest val.

--Method 1:
select a.* from tb a where val = (select min(val) from tb where name = a.name) order by a.name
--Method 2:
select a.* from tb a where not exists (select 1 from tb where name = a.name and val < a.val)
--Method 3:
select a.* from tb a, (select name, min(val) val from tb group by name) b where a.name = b.name and a.val = b.val order by a.name
--Method 4:
select a.* from tb a inner join (select name, min(val) val from tb group by name) b on a.name = b.name and a.val = b.val order by a.name
--Method 5:
select a.* from tb a where 1 > (select count(*) from tb where name = a.name and val < a.val) order by a.name
/*
name       val         memo
---------- ----------- --------------------
a          1           a1--the first value of a
b          1           b1--the first value of b
*/

--3. Group by name and fetch the first row that occurs in each group.

select a.* from tb a where val = (select top 1 val from tb where name = a.name) order by a.name
/*
name       val         memo
---------- ----------- --------------------
a          2           a2 (the second value of a)
b          1           b1--the first value of b
*/

--4. Group by name and fetch one random row from each group.

select a.* from tb a where val = (select top 1 val from tb where name = a.name order by newid()) order by a.name
/*
name       val         memo
---------- ----------- --------------------
a          1           a1--the first value of a
b          5           b5b5b5b5b5
*/

--5. Group by name and fetch the two (or N) smallest val in each group.

select a.* from tb a where 2 > (select count(*) from tb where name = a.name and val < a.val) order by a.name, a.val
select a.* from tb a where val in (select top 2 val from tb where name = a.name order by val) order by a.name, a.val
select a.* from tb a where exists (select count(*) from tb where name = a.name and val < a.val having count(*) < 2) order by a.name
/*
name       val         memo
---------- ----------- --------------------
a          1           a1--the first value of a
a          2           a2 (the second value of a)
b          1           b1--the first value of b
b          2           b2b2b2b2
*/

--6. Group by name and fetch the two (or N) largest val in each group.

select a.* from tb a where 2 > (select count(*) from tb where name = a.name and val > a.val) order by a.name, a.val
select a.* from tb a where val in (select top 2 val from tb where name = a.name order by val desc) order by a.name, a.val
select a.* from tb a where exists (select count(*) from tb where name = a.name and val > a.val having count(*) < 2) order by a.name
/*
name       val         memo
---------- ----------- --------------------
a          2           a2 (the second value of a)
a          3           a3: the third value of a
b          4           b4b4
b          5           b5b5b5b5b5
*/
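The counting technique used in the first query of each pair above (keep a row when fewer than N rows in its group beat it) generalizes to any N. The sketch below illustrates this under assumptions: it runs on Python's built-in sqlite3 module rather than SQL Server, omits the memo column, and top_n_per_group is a hypothetical helper name.

```python
import sqlite3

# SQLite is only a stand-in here; the values follow the sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tb (name TEXT, val INTEGER)")
conn.executemany(
    "INSERT INTO tb VALUES (?, ?)",
    [("a", 2), ("a", 1), ("a", 3),
     ("b", 1), ("b", 3), ("b", 2), ("b", 4), ("b", 5)],
)


def top_n_per_group(conn, n, largest=True):
    """Return the n largest (or smallest) val per name via a counting subquery."""
    op = ">" if largest else "<"
    sql = (
        "SELECT a.name, a.val FROM tb a "
        # A row qualifies when fewer than n rows of its group beat it.
        f"WHERE ? > (SELECT COUNT(*) FROM tb WHERE name = a.name AND val {op} a.val) "
        "ORDER BY a.name, a.val"
    )
    return conn.execute(sql, (n,)).fetchall()


print(top_n_per_group(conn, 2))                 # [('a', 2), ('a', 3), ('b', 4), ('b', 5)]
print(top_n_per_group(conn, 2, largest=False))  # [('a', 1), ('a', 2), ('b', 1), ('b', 2)]
```

With n = 1 the same function reproduces method 5 from the first two sections.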


--7. If entire rows are duplicated, that is, all columns are identical (for example, the 5th and 6th lines of the table below),
--   fetch the two (or N) largest val in each group.

/*
The data is as follows:
name val memo
a    2   a2 (the second value of a)
a    1   a1--the first value of a
a    1   a1--the first value of a
a    3   a3: the third value of a
a    3   a3: the third value of a
b    1   b1--the first value of b
b    3   b3: the third value of b
b    2   b2b2b2b2
b    4   b4b4
b    5   b5b5b5b5b5
*/
