The basic principle of the DISTINCT statement in MySQL and its comparison with GROUP BY


Last Update:2017-01-19 Source: Internet

Author: User


The implementation of DISTINCT is very similar to that of GROUP BY; the only difference is that after grouping, just one record is taken from each group. DISTINCT can therefore be completed through either a loose index scan or a tight index scan, and when the index alone is not enough, MySQL has to fall back on a temporary table. One difference from GROUP BY is that DISTINCT does not need sorting. That is, for a query that only performs DISTINCT, if the index alone cannot complete the operation, MySQL uses a temporary table to "cache" the data but does not run a filesort on that temporary table. Of course, if DISTINCT is combined with GROUP BY and an aggregate function such as MAX, the filesort cannot be avoided.

Here are a few simple query examples that show how DISTINCT is implemented.

1. First, DISTINCT through a loose index scan:

sky@localhost : example 11:03:41> EXPLAIN SELECT DISTINCT group_id
    -> FROM group_message\G

*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: group_message
         type: range
possible_keys: NULL
          key: idx_gid_uid_gc
      key_len: 4
          ref: NULL
         rows: 10
        Extra: Using index for group-by
1 row in set (0.00 sec)

The Extra column of the execution plan clearly shows "Using index for group-by". What does that mean? Why does the plan say an index is used for GROUP BY when the query has no GROUP BY at all? This comes from how DISTINCT is implemented: MySQL first groups the data and then returns one row from each group to the client. The Extra information here tells us that MySQL completed the operation with a loose index scan. It would be friendlier and easier to understand if the MySQL query optimizer reported "Using index for distinct" here instead.
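The skipping behaviour of a loose index scan can be sketched in Python. This is only a rough model, not MySQL's actual code: the sorted list of (group_id, user_id) tuples stands in for the entries of idx_gid_uid_gc, and the function name and sample data are made up for illustration.

```python
from bisect import bisect_right


def loose_index_scan_distinct(index_entries):
    """Return the distinct leading-column values of a sorted composite index.

    index_entries is a sorted list of (group_id, user_id) tuples (a stand-in
    for index entries, not a real MySQL structure). The scan reads the first
    entry of each group and then seeks past the rest of that group instead
    of visiting every entry.
    """
    distinct_values = []
    entries_read = 0
    pos = 0
    while pos < len(index_entries):
        group_id = index_entries[pos][0]
        distinct_values.append(group_id)
        entries_read += 1
        # Seek to the first entry whose group_id is larger than the current one.
        pos = bisect_right(index_entries, (group_id, float("inf")), lo=pos)
    return distinct_values, entries_read


# Hypothetical index contents: six entries, three distinct group_id values.
index = [(1, 10), (1, 11), (1, 12), (2, 10), (2, 13), (5, 7)]
print(loose_index_scan_distinct(index))  # ([1, 2, 5], 3)
```

Only one entry per group is read, which is why this kind of scan pays off when each group contains many index entries.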

2. Next, an example of a tight index scan:

sky@localhost : example 11:03:53> EXPLAIN SELECT DISTINCT user_id
    -> FROM group_message
    -> WHERE group_id = 2\G

*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: group_message
         type: ref
possible_keys: idx_gid_uid_gc
          key: idx_gid_uid_gc
      key_len: 4
          ref: const
         rows: 4
        Extra: Using where; Using index
1 row in set (0.00 sec)

The output here is exactly the same as for a GROUP BY implemented through a tight index scan. In this query, MySQL has the storage engine scan all index keys with group_id = 2 and read out every user_id. Because the index is already sorted, MySQL keeps one entry each time the user_id key value changes, so the whole DISTINCT operation is finished once all index keys with group_id = 2 have been scanned.
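As a rough Python model of this tight index scan (again a sketch under assumptions, not MySQL internals): the sorted list of (group_id, user_id) tuples plays the role of the index, and the function name and data are hypothetical. Every entry in the group_id range is read, and duplicates are dropped simply by comparing each user_id with the previous one.

```python
from bisect import bisect_left


def tight_index_scan_distinct(index_entries, group_id):
    """Distinct user_id values for one group_id, relying only on index order."""
    result = []
    previous = None
    # Seek to the first entry of the requested group; (group_id,) sorts
    # before every (group_id, user_id) tuple.
    pos = bisect_left(index_entries, (group_id,))
    while pos < len(index_entries) and index_entries[pos][0] == group_id:
        user_id = index_entries[pos][1]
        # Entries are sorted by user_id within the group, so a value is new
        # exactly when it differs from the one just before it.
        if user_id != previous:
            result.append(user_id)
            previous = user_id
        pos += 1
    return result


# Hypothetical index contents; (2, 4) and (2, 7) each appear twice.
index = [(1, 3), (2, 4), (2, 4), (2, 7), (2, 7), (2, 9), (3, 1)]
print(tight_index_scan_distinct(index, 2))  # [4, 7, 9]
```

No temporary table or sort is needed because the equality condition on group_id leaves the remaining entries already ordered by user_id.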

3. Now let's see how DISTINCT is completed when the index alone is not enough:

sky@localhost : example 11:04:40> EXPLAIN SELECT DISTINCT user_id
    -> FROM group_message
    -> WHERE group_id > 1 AND group_id < 10\G

*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: group_message
         type: range
possible_keys: idx_gid_uid_gc
          key: idx_gid_uid_gc
      key_len: 4
          ref: NULL
         rows: 32
        Extra: Using where; Using index; Using temporary
1 row in set (0.00 sec)

When MySQL cannot complete the DISTINCT with the index alone, it has to use a temporary table. But compared with how GROUP BY is handled there is one difference: the filesort is missing. MySQL's grouping algorithm does not actually need sorted input to group rows, as I mentioned in the GROUP BY optimization tips earlier. Here MySQL completes the DISTINCT on the grouped data without sorting, so the filesort step is dropped.
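The idea of de-duplicating through a temporary table without sorting can be modelled in Python as follows. This is an illustrative sketch only: the set standing in for a uniqueness check on the temporary table is an assumption about the general approach, and the function name and data are hypothetical.

```python
def temp_table_distinct(values):
    """De-duplicate values with a lookup structure instead of a sort."""
    seen = set()       # stand-in for a uniqueness check on the temporary table
    temp_table = []    # rows kept in the temporary table, in arrival order
    for value in values:
        if value not in seen:
            seen.add(value)
            temp_table.append(value)
    return temp_table


# A range scan over several group_id values returns user_id values ordered
# within each group but not overall, so adjacent comparison is not enough.
range_scan = [(2, 4), (2, 7), (3, 1), (3, 4), (5, 2), (5, 7)]
user_ids = [user_id for _, user_id in range_scan]
print(temp_table_distinct(user_ids))  # [4, 7, 1, 2]
```

The result comes back in first-seen order rather than sorted order, which is acceptable for DISTINCT because it promises no particular ordering.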

4. Finally, try combining it with GROUP BY:

sky@localhost : example 11:05:06> EXPLAIN SELECT DISTINCT MAX(user_id)
    -> FROM group_message
    -> WHERE group_id > 1 AND group_id <
    -> GROUP BY group_id\G

*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: group_message
         type: range
possible_keys: idx_gid_uid_gc
          key: idx_gid_uid_gc
      key_len: 4
          ref: NULL
         rows: 32
        Extra: Using where; Using index; Using temporary; Using filesort
1 row in set (0.00 sec)

Finally, this example combines DISTINCT with GROUP BY and an aggregate function. Compared with the third example, the plan has an extra filesort, precisely because the MAX function is used: to get the MAX value of each group, the index alone cannot complete the operation, so the rows have to be sorted.

MySQL DISTINCT vs. GROUP BY: which is faster?
1. Pre-test preparation

Prepare a test table:
mysql> CREATE TABLE `test_test` (
    ->   `id` int(11) NOT NULL auto_increment,
    ->   `num` int(11) NOT NULL default '0',
    ->   PRIMARY KEY (`id`)
    -> ) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=1;

Query OK, 0 rows affected (0.05 sec)

Change the mysql command terminator to ||:
mysql> delimiter ||

Create a stored procedure that inserts 100,000 rows into the table:
mysql> CREATE PROCEDURE p_test(pa int)
    -> begin
    ->
    ->   declare max_num int(11) default 100000;
    ->   declare i int default 0;
    ->   declare rand_num int;
    ->
    ->   select count(id) into max_num from test_test;
    ->
    ->   while i < pa do
    ->     if max_num < 100000 then
    ->       select cast(rand()*100 as unsigned) into rand_num;
    ->       insert into test_test(num) values (rand_num);
    ->     end if;
    ->     set i = i + 1;
    ->   end while;
    -> end||

Query OK, 0 rows affected (0.00 sec)

mysql> call p_test(100000)||

Query OK, 1 row affected (5.66 sec)

Change the mysql command terminator back to ;:
mysql> delimiter ;

Confirm that the rows are in:
mysql> select count(id) from test_test;

+-----------+
| count(id) |
+-----------+
|    100000 |
+-----------+
1 row in set (0.00 sec)

Check whether profiling is enabled (it is off by default):
mysql> show variables like "%pro%";

+---------------------------+-------+
| Variable_name             | Value |
+---------------------------+-------+
| profiling                 | OFF   |
| profiling_history_size    |       |
| protocol_version          |       |
| slave_compressed_protocol | OFF   |
+---------------------------+-------+
4 rows in set (0.00 sec)

Turn profiling on:
mysql> set profiling=1;

Query OK, 0 rows affected (0.00 sec)

2. Test

Four rounds of tests were run, each consisting of these two queries:
mysql> select distinct(num) from test_test;
mysql> select num from test_test group by num;

View the results:
mysql> show profiles;

+----------+------------+------------------------------------------+
| Query_ID | Duration   | Query                                    |
+----------+------------+------------------------------------------+
|        1 | 0.07298225 | select distinct(num) from test_test      |
|        2 | 0.07319975 | select num from test_test group by num   |
|        3 | 0.07313525 | select num from test_test group by num   |
|        4 | 0.07317725 | select distinct(num) from test_test      |
|        5 | 0.07275200 | select distinct(num) from test_test      |
|        6 | 0.07298600 | select num from test_test group by num   |
|        7 | 0.07500700 | select num from test_test group by num   |
|        8 | 0.07331325 | select distinct(num) from test_test      |
|        9 | 0.57831575 | create index num_index on test_test(num) |
|       10 | 0.00243550 | select distinct(num) from test_test      |
|       11 | 0.00121975 | select num from test_test group by num   |
|       12 | 0.00116550 | select distinct(num) from test_test      |
|       13 | 0.00107650 | select num from test_test group by num   |
+----------+------------+------------------------------------------+
13 rows in set (0.00 sec)

The index was added at query 9.

Queries 1-8 are four rounds of tests without an index; in these, DISTINCT is slightly faster than GROUP BY.
Queries 10-13 are two rounds of tests with the index; in these, GROUP BY is slightly faster than DISTINCT.
In general, when a table holds a lot of data, the columns being queried will be indexed. In this test, adding the index brought the query time down from about 0.073 seconds to roughly 0.001 to 0.0025 seconds.
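To make the comparison concrete, the durations reported by SHOW PROFILES can be averaged per query type. The short Python sketch below only does arithmetic on the numbers printed in the results table; the variable names are made up for illustration.

```python
from statistics import mean

# Durations in seconds, copied from the SHOW PROFILES output.
timings = {
    ("without index", "distinct"): [0.07298225, 0.07317725, 0.07275200, 0.07331325],
    ("without index", "group_by"): [0.07319975, 0.07313525, 0.07298600, 0.07500700],
    ("with index", "distinct"): [0.00243550, 0.00116550],
    ("with index", "group_by"): [0.00121975, 0.00107650],
}

# Mean duration for each (index state, query type) pair.
averages = {key: mean(durations) for key, durations in timings.items()}
for (label, kind), value in averages.items():
    print(f"{label:14s} {kind:9s} mean = {value:.8f} s")

# How much faster each query type became once the index existed.
for kind in ("distinct", "group_by"):
    speedup = averages[("without index", kind)] / averages[("with index", kind)]
    print(f"{kind:9s} speedup from the index: about {speedup:.0f}x")
```

Without the index the two averages differ by less than one percent, and each group has only two to four runs, so the gap between DISTINCT and GROUP BY may well be within measurement noise; the effect of the index itself is far larger.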

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
