The bibliography used in the user verification is recorded in the datasheet, and now I want to take out all the bibliographies, using distinct and group by to get the results I want, but I find that the return results are arranged in a different order, and the distinct will be displayed in the sequence of data, and group By Will do a sort (typically ASC).
DISTINCT is actually very similar to the implementation of a group by operation, except that only one record is taken out of each group by after the groups by. Therefore, the implementation of the DISTINCT and GROUP by implementation is almost the same, there is not much difference, the same can be done through a loose index scan or a compact index scan.
which distinct and group by which efficiency is higher?
The distinct operation only needs to find out all the different values. The group by operation also prepares for other aggregate functions. From this point on, the GROUP by operation should do more work than distinct.
But in fact, GROUP by efficiency will be higher, why? For the distinct operation, it reads all the records, and group by needs to read as many records as there are groups, which means a lot less than the actual number of records.
Example AA Table A B
123 10
123 12
1234 11
1234 14
First group is used to group, not to filter duplicates. Duplicate DELETE statement distinct use this. Select DISTINCT (a) from AA
The result is a.
123
1234
Group BY for grouping
Select a, sum (b) from AA GROUP by a
Sum means the sum. The result is
A b
123 22
1234 25
The purpose of the statement is to use a as a target to know the number of items in column B that have the same name in total
Select A,count (b) from AA GROUP by a
Count meaning line sum result is
A b
123 2
1234 2
The object of the statement is a few lines with the same name
Distinct and group by performance comparisons in MySQL
Test process:
Prepare a test sheet
CREATE TABLE ' Test_test ' (
' id ' int (one) not NULL auto_increment,
' num ' int (one) not NULL default ' 0 ',
PRIMARY KEY (' id ')
) Engine=myisam DEFAULT Charset=utf8 auto_increment=1;
Create a stored procedure to insert 10W data into a table
CREATE PROCEDURE p_test (PA int (11))
Begin
DECLARE max_num int (one) default 100000;
declare i int default 0;
declare rand_num int;
Select COUNT (ID) into the max_num from Test_test;
While I < PA do
If Max_num < 100000 then
Select CAST (rand () *100 as unsigned) into rand_num;
INSERT into test_test (num) values (rand_num);
End If;
Set i = i +1;
End while;
End
Call stored Procedure Insert data
1 call P_test (100000);
Start test: (without index)
SELECT distinct num from test_test;
Select num from test_test GROUP by NUM;
[SQL] SELECT distinct num from test_test;
Affected rows: 0
Time: 0.078ms
Select num from test_test GROUP by NUM;
Affected rows: 0
Time: 0.031ms
Two, the NUM field to create the index
ALTER TABLE ' test_test ' ADD INDEX ' num_index ' (' num ');
Query again
SELECT distinct num from test_test;
Select num from test_test GROUP by NUM;
[SQL] SELECT distinct num from test_test;
Affected rows: 0
Time: 0.000ms
Select num from test_test GROUP by NUM;
Affected rows: 0
Time: 0.000ms
At this point we found that the time is too small for 0.000 seconds can not be accurate.
We go to the command line to test
Mysql> set profiling=1;
Mysql> SELECT DISTINCT (num) from Test_test;
Mysql> Select num from test_test GROUP by NUM;
Mysql> Show Profiles;
+----------+------------+----------------------------------------+
| query_id | Duration | Query |
+----------+------------+----------------------------------------+
| 1 | 0.00072550 | SELECT DISTINCT (num) from Test_test |
| 2 | 0.00071650 | Select num from test_test GROUP by num |
+----------+------------+----------------------------------------+
After indexing, distinct is 107 times times faster than the distinct without index.
GROUP BY IS 43 times times faster than group by without index after indexing.
Compare: Distinct and GROUP by
No index group BY is faster than distinct. Therefore, it is recommended to select group by when using