MySQL under distinct and group by difference contrast

Source: Internet
Author: User

The bibliography used in the user verification is recorded in the datasheet, and now I want to take out all the bibliographies, using distinct and group by to get the results I want, but I find that the return results are arranged in a different order, and the distinct will be displayed in the sequence of data, and group By Will do a sort (typically ASC).

DISTINCT is actually very similar to the implementation of a group by operation, except that only one record is taken out of each group by after the groups by. Therefore, the implementation of the DISTINCT and GROUP by implementation is almost the same, there is not much difference, the same can be done through a loose index scan or a compact index scan.

which distinct and group by which efficiency is higher?

The distinct operation only needs to find out all the different values. The group by operation also prepares for other aggregate functions. From this point on, the GROUP by operation should do more work than distinct.
But in fact, GROUP by efficiency will be higher, why? For the distinct operation, it reads all the records, and group by needs to read as many records as there are groups, which means a lot less than the actual number of records.

Example AA Table A B
123 10
123 12
1234 11
1234 14
First group is used to group, not to filter duplicates. Duplicate DELETE statement distinct use this. Select DISTINCT (a) from AA
The result is a.
123
1234

Group BY for grouping

Select a, sum (b) from AA GROUP by a

Sum means the sum. The result is
A b
123 22
1234 25
The purpose of the statement is to use a as a target to know the number of items in column B that have the same name in total

Select A,count (b) from AA GROUP by a

Count meaning line sum result is
A b
123 2
1234 2

The object of the statement is a few lines with the same name

Distinct and group by performance comparisons in MySQL

Test process:

Prepare a test sheet

CREATE TABLE ' Test_test ' (
' id ' int (one) not NULL auto_increment,
' num ' int (one) not NULL default ' 0 ',
PRIMARY KEY (' id ')
) Engine=myisam DEFAULT Charset=utf8 auto_increment=1;

Create a stored procedure to insert 10W data into a table

CREATE PROCEDURE p_test (PA int (11))
Begin

DECLARE max_num int (one) default 100000;
declare i int default 0;
declare rand_num int;

Select COUNT (ID) into the max_num from Test_test;

While I < PA do
If Max_num < 100000 then
Select CAST (rand () *100 as unsigned) into rand_num;
INSERT into test_test (num) values (rand_num);
End If;
Set i = i +1;
End while;
End

Call stored Procedure Insert data

1 call P_test (100000);

Start test: (without index)

SELECT distinct num from test_test;
Select num from test_test GROUP by NUM;

[SQL] SELECT distinct num from test_test;
Affected rows: 0
Time: 0.078ms

Select num from test_test GROUP by NUM;
Affected rows: 0
Time: 0.031ms

Two, the NUM field to create the index

ALTER TABLE ' test_test ' ADD INDEX ' num_index ' (' num ');

Query again

SELECT distinct num from test_test;
Select num from test_test GROUP by NUM;
[SQL] SELECT distinct num from test_test;
Affected rows: 0
Time: 0.000ms

Select num from test_test GROUP by NUM;
Affected rows: 0
Time: 0.000ms


At this point we found that the time is too small for 0.000 seconds can not be accurate.

We go to the command line to test

Mysql> set profiling=1;
Mysql> SELECT DISTINCT (num) from Test_test;
Mysql> Select num from test_test GROUP by NUM;
Mysql> Show Profiles;
+----------+------------+----------------------------------------+
| query_id | Duration | Query |
+----------+------------+----------------------------------------+
| 1 | 0.00072550 | SELECT DISTINCT (num) from Test_test |
| 2 | 0.00071650 | Select num from test_test GROUP by num |
+----------+------------+----------------------------------------+


After indexing, distinct is 107 times times faster than the distinct without index.

GROUP BY IS 43 times times faster than group by without index after indexing.

Compare: Distinct and GROUP by

No index group BY is faster than distinct. Therefore, it is recommended to select group by when using

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.