Mysql database GROUP

Source: Internet
Author: User

Mysql database groups are divided into three methods. Temporary tables are used for group by, compact (Tight) index scanning, and group by and Loose index scanning, but I often use

1. Use Loose index scanning to implement GROUP

What is loose index scan to implement group? In fact, when MySQL uses index scan to implement group by, it does not need to scan all the index keys that meet the conditions to complete the operation.

In the following example, we use a loose index scan to implement group by. Before this example, we need to adjust the index of the group_message table and add the gmt_create field to the index of the group_id and user_id fields:

The Code is as follows: Copy code
1 sky @ localhost: example 08:49:45> create index idx_gid_uid_gc
2
3-> on group_message (group_id, user_id, gmt_create );
4
5 Query OK, rows affected (0.03 sec)
6
7 Records: 96 Duplicates: 0 Warnings: 0
8
9 sky @ localhost: example 09:07:30> drop index idx_group_message_gid_uid
10
11-> on group_message;
12
13 Query OK, 96 rows affected (0.02 sec)
14
15 Records: 96 Duplicates: 0 Warnings: 0

Then let's look at the execution plan of the following Query:

The Code is as follows: Copy code

1 sky @ localhost: example 09:26:15> EXPLAIN
2
3-> SELECT user_id, max (gmt_create)
4
5-> FROM group_message
6
7-> WHERE group_id <10
8
9-> group by group_id, user_idG
10
11 **************************** 1. row ***************************
12
13 id: 1
14
15 select_type: SIMPLE
16
17 table: group_message
18
19 type: range
20
21 possible_keys: idx_gid_uid_gc
22
23 key: idx_gid_uid_gc
24
25 key_len: 8
26
27 ref: NULL
28
29 rows: 4
30
31 Extra: Using where; Using index for group-
32
33 1 row in set (0.00 sec)

We can see that "Using index for group-by" is displayed in the Extra information of the Execution Plan. In fact, this is to tell us, mySQL Query Optimizer uses loose index scanning to implement the group by operation we need.

The image below depicts the approximate Implementation of the scanning process:

To use a loose index scan to implement group by, you must meet at least the following conditions:

◆ The group by condition field must be in the first consecutive position in the same index;

◆ When using group by, only the MAX and MIN Aggregate functions can be used;

◆ If a field condition other than the group by condition in the index is referenced, it must exist as a constant;

Why is loose index scanning very efficient?

Because there is no WHERE clause, that is, when the full index scan is required, the number of key values to be read by the loose index scan is as large as the number of groups in the group, that is to say, it is much less than the actual number of key values. When the WHERE clause contains a range limit or an equivalent expression, the loose index scans 1st keywords in each group that meet the range conditions and reads as few keywords as possible again.

2. Use a compact (Tight) index scan to implement GROUP

The difference between a compact index scan and a loose index scan is that it needs to read all the matching index keys when scanning the index, then, the group by operation is completed based on the read of the evil data to obtain the corresponding result.

The Code is as follows: Copy code

1 sky @ localhost: example 08:55:14> EXPLAIN
2
3-> SELECT max (gmt_create)
4
5-> FROM group_message
6
7-> WHERE group_id = 2
8
9-> group by user_idG
10
11 **************************** 1. row ***************************
12
13 id: 1
14
15 select_type: SIMPLE
16
17 table: group_message
18
19 type: ref
20
21 possible_keys: idx_group_message_gid_uid, idx_gid_uid_gc
22
23 key: idx_gid_uid_gc
24
25 key_len: 4
26
27 ref: const
28
29 rows: 4
30
31 Extra: Using where; Using index
32
33 1 row in set (0.01 sec)


At this time, the Extra information of the execution plan does not contain "Using index for group-by", but it does not mean that the MySQL group by operation is not completed through the index, the result is obtained only after you access all the index key information specified by the WHERE condition. This is the execution plan output information of group by through compact index scanning.

In MySQL, MySQL Query Optimizer first tries to implement the group by operation through loose index scanning. When it is found that some situations cannot meet the requirements of the loose index scan for group, will try to achieve through compact index scanning.

When the group by condition field is not consecutive or is not part of the index prefix, MySQL Query Optimizer cannot use loose index scanning, and the setting cannot directly complete the group by operation through the index, the missing index key information cannot be obtained. However, if the Query statement contains a constant value to reference the missing index key, you can use compact index scan to complete the group by operation, because the constant fills the "gap" in the search keyword ", you can create a complete index prefix. These index prefixes can be used for index search. MySQL can also avoid additional sorting operations if you want to sort group by results and search keywords with index prefixes, because all the keywords have been searched in order by using the prefix of the ordered index.

3. Use a temporary table to implement GROUP

MySQL must store all the group by fields in the same index to make full use of the group by operation, the index is an ordered index (for example, the Hash index cannot meet the requirements ). In addition, it is not just that whether the index can be used to implement group by is also related to the aggregate function used.

The preceding two group by implementation methods are used when available indexes are available. When MySQL Query Optimizer cannot find an appropriate index that can be used, you have to read the required data first, and then use a temporary table to complete the group by operation.

The Code is as follows: Copy code

1 sky @ localhost: example 09:02:40> EXPLAIN
2
3-> SELECT max (gmt_create)
4
5-> FROM group_message
6
7-> WHERE group_id> 1 and group_id <10
8
9-> group by user_idG
10
11 **************************** 1. row ***************************
12
13 id: 1
14
15 select_type: SIMPLE
16
17 table: group_message
18
19 type: range
20
21 possible_keys: idx_group_message_gid_uid, idx_gid_uid_gc
22
23 key: idx_gid_uid_gc
24
25 key_len: 4
26
27 ref: NULL
28
29 rows: 32
30
31 Extra: Using where; Using index; Using temporary; Using filesort

This execution plan clearly tells us that MySQL finds the data we need through the index, creates a temporary table, and sorts the table to obtain the group by result we need. The entire execution process is shown in the following figure:

After MySQL Query Optimizer finds that the result of group by cannot be directly obtained only through index scanning, it has to use a temporary table and then sort it to implement group.

This is the case in this example. Group_id is not a constant condition, but a range, and the group by field is user_id. Therefore, MySQL cannot help the implementation of group by according to the index sequence. It can only scan the index range to obtain the required data, and then store the data into a temporary table, then, sort and GROUP the group by statement.

After talking about this, the simplest thing is

(Query the topic title table of dedecms (zhimeng) program and group it by topic id)

The Code is as follows: Copy code

SELECT *
FROM 'dede _ archives'
Group by 'typeid'
LIMIT 0, 30

That's all.

Some related group by instances

The Code is as follows: Copy code


-- Group data by a certain field to get the data of the row with the maximum (small) Value

/*
The data is as follows:
Name val memo
A 2 a2 (the second value of)
A 1 a1 -- the first value of
A 3 a3: The third value of
B 1 b1 -- the first value of B
B 3 b3: The third value of B
B 2 b2b2b2b2
B 4 b4b4
B 5 b5b5b5b5b5
*/

-- Create a table and insert data:

Create table tb (name varchar (10), val int, memo varchar (20 ))
Insert into tb values ('A', 2, 'a2 (second value of )')
Insert into tb values ('A', 1, 'a1 -- the first value of ')
Insert into tb values ('A', 3, 'a3: The third value of ')
Insert into tb values ('B', 1, 'b1 -- the first value of B ')
Insert into tb values ('B', 3, 'b3: The third value of B ')
Insert into tb values ('B', 2, 'b2b2b2b2 ')
Insert into tb values ('B', 4, 'b4b4 ')
Insert into tb values ('B', 5, 'b5b5b5b5 ')
Go

-- 1. Group data by name to get the data of the row with the largest val value.

-- Method 1: select a. * from tb a where val = (select max (val) from tb where name = a. name) order by a. name
-- Method 2:
Select a. * from tb a where not exists (select 1 from tb where name = a. name and val> a. val)
-- Method 3:
Select. * from tb a, (select name, max (val) val from tb group by name) B where. name = B. name and. val = B. val order by. name
-- Method 4:
Select. * from tb a inner join (select name, max (val) val from tb group by name) B on. name = B. name and. val = B. val order by. name
-- Method 5
Select a. * from tb a where 1> (select count (*) from tb where name = a. name and val> a. val) order by a. name
/*
Name val memo
-----------------------------------------
A 3 a3: The third value of
B 5 b5b5b5b5b5

*/

I recommend using 1, 3, 4. The results show that 1, 3, 4 are the same, and 2, 5 are less efficient. However, if I have the same 3, 4 efficiency, there is no doubt that 1 is different.

-- 2. group the data of the row with the smallest val value by name.

-- Method 1: select a. * from tb a where val = (select min (val) from tb where name = a. name) order by a. name
-- Method 2:
Select a. * from tb a where not exists (select 1 from tb where name = a. name and val <a. val)
-- Method 3:
Select. * from tb a, (select name, min (val) val from tb group by name) B where. name = B. name and. val = B. val order by. name
-- Method 4:
Select. * from tb a inner join (select name, min (val) val from tb group by name) B on. name = B. name and. val = B. val order by. name
-- Method 5
Select a. * from tb a where 1> (select count (*) from tb where name = a. name and val <a. val) order by a. name
/*
Name val memo
-----------------------------------------
A 1 a1 -- the first value of
B 1 b1 -- the first value of B

*/

-- 3. Group by name to obtain the data of the row that appears for the first time.

Select a. * from tb a where val = (select top 1 val from tb where name = a. name) order by a. name
/*
Name val memo
-----------------------------------------
A 2 a2 (the second value of)
B 1 b1 -- the first value of B
*/

-- 4. A random data entry is grouped by name.

Select a. * from tb a where val = (select top 1 val from tb where name = a. name order by newid () order by a. name /*
Name val memo
-----------------------------------------
A 1 a1 -- the first value of
B 5 b5b5b5b5b5

*/

-- 5. Group by name to get the smallest two (N) val values

Select. * from tb a where 2> (select count (*) from tb where name =. name and val <. val) order by. name,. valselect. * from tb a where val in (select top 2 val from tb where name =. name order by val) order by. name,. val
Select. * from tb a where exists (select count (*) from tb where name =. name and val <. val having Count (*) <2) order by. name
/*
Name val memo
-----------------------------------------
A 1 a1 -- the first value of
A 2 a2 (the second value of)
B 1 b1 -- the first value of B
B 2 b2b2b2b2

*/

-- 6. Group by name to get the maximum two (N) val values

Select a. * from tb a where 2> (select count (*) from tb where name = a. name and val> a. val) order by a. name, a. val
Select a. * from tb a where val in (select top 2 val from tb where name = a. name order by val desc) order by a. name, a. val
Select. * from tb a where exists (select count (*) from tb where name =. name and val>. val having Count (*) <2) order by. name
/*
Name val memo
-----------------------------------------
A 2 a2 (the second value of)
A 3 a3: The third value of
B 4 b4b4
B 5 b5b5b5b5b5
*/


-- 7. If the data in the entire row is repeated, all columns are the same (for example, the data in rows 5 and 6 in the following table is identical ).
The maximum two (N) val values are grouped by name.

/*
The data is as follows:
Name val memo
A 2 a2 (the second value of)
A 1 a1 -- the first value of
A 1 a1 -- the first value of
A 3 a3: The third value of
A 3 a3: The third value of
B 1 b1 -- the first value of B
B 3 b3: The third value of B
B 2 b2b2b2b2
B 4 b4b4
B 5 b5b5b5b5b5

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.