This article introduces how MySQL can use indexes to optimize DISTINCT statements and how DISTINCT behaves when several fields are selected. De-duplication with DISTINCT is basic knowledge for MySQL beginners.

MySQL generally performs a DISTINCT operation the same way it performs GROUP BY, which is in essence a sort. If DISTINCT is combined with ORDER BY, a temporary table is usually needed, and this affects performance. In some cases MySQL can use an index to optimize the DISTINCT operation, but this takes some care to apply correctly. This article also shows an example of a DISTINCT operation that cannot be performed using an index.
Instance 1: Optimizing the DISTINCT operation with an index
The code is as follows:
create table m11 (a int, b int, c int, d int, primary key(a)) engine=INNODB;
insert into m11 values (1,1,1,1),(2,2,2,2),(3,3,3,3),(4,4,4,4),(5,5,5,5),(6,6,6,6),(7,7,7,7),(8,8,8,8);
mysql> explain select distinct(a) from m11;
+----+-------------+-------+------------+-------+---------+---------+------+------+----------+-------------+
| id | select_type | table | partitions | type  | key     | key_len | ref  | rows | filtered | Extra       |
+----+-------------+-------+------------+-------+---------+---------+------+------+----------+-------------+
|  1 | SIMPLE      | m11   | NULL       | index | PRIMARY | 4       | NULL |    1 |   100.00 | Using index |
+----+-------------+-------+------------+-------+---------+---------+------+------+----------+-------------+
Note:
1. Column a has a primary key index, and MySQL can use that index to complete the DISTINCT operation (the value in the key column shows that the primary key index is used).
2. This is a typical example of optimizing a DISTINCT operation with an index.
Instance 2: A DISTINCT operation that an index cannot optimize
The code is as follows:
create table m31 (a int, b int, c int, d int, primary key(a)) engine=MEMORY;
insert into m31 values (1,1,1,1),(2,2,2,2),(3,3,3,3),(4,4,4,4),(5,5,5,5),(6,6,6,6),(7,7,7,7),(8,8,8,8);
mysql> explain select distinct(a) from m31;
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------+
|  1 | SIMPLE      | m31   | NULL       | ALL  | NULL          | NULL | NULL    | NULL |    8 |   100.00 | NULL  |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------+
Note:
1. According to the query execution plan, the index is not used.
2. Compared with instance 1, the table creation statement is identical except for the storage engine.
3. Why does the primary key index not work? Are indexes on the MEMORY storage engine unusable?
Instance 3: Using an index to optimize DISTINCT on a MEMORY table
create table m33 (a int, b int, c int, d int, INDEX USING BTREE (a)) engine=MEMORY;
insert into m33 values (1,1,1,1),(2,2,2,2),(3,3,3,3),(4,4,4,4),(5,5,5,5),(6,6,6,6),(7,7,7,7),(8,8,8,8);
mysql> explain select distinct(a) from m33;
+----+-------------+-------+------------+-------+---------------+------+---------+------+------+----------+-------+
| id | select_type | table | partitions | type  | possible_keys | key  | key_len | ref  | rows | filtered | Extra |
+----+-------------+-------+------------+-------+---------------+------+---------+------+------+----------+-------+
|  1 | SIMPLE      | m33   | NULL       | index | NULL          | a    | 5       | NULL |    8 |   100.00 | NULL  |
+----+-------------+-------+------------+-------+---------------+------+---------+------+------+----------+-------+
Note:
1. Column a has a B-tree index named a, and MySQL can use it to complete the DISTINCT operation (the value in the key column shows that index a is used).
2. Comparing with instance 2, both tables use the MEMORY engine, but instance 3 uses a B-tree index.
3. Instance 2 does not specify the index type, so MySQL uses the default, as described in the MySQL manual:
As indicated by the engine name, MEMORY tables are stored in memory. They use hash indexes by default, which makes them very fast for single-value lookups, and very useful for creating temporary tables.
Conclusion:
1. When checking the impact of indexes on a query, pay attention to the index type.
2. HASH indexes are suitable for equality lookups but not for scenarios that require ordering, while B-tree indexes are suitable for ordered scenarios.
3. If the query execution plan shows that an index is not used, investigate the index type further.
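The difference between the two index types can be illustrated with a small Python analogy. This is only a sketch under stated assumptions, not MySQL's implementation: a dict stands in for a HASH index, a sorted list stands in for a B-tree index, and the sample rows and function names are hypothetical.

```python
from bisect import bisect_left
from itertools import groupby

# Hypothetical rows as (row_id, value of column a); duplicates are intentional.
rows = [(0, 5), (1, 2), (2, 5), (3, 1), (4, 2)]

# Stand-in for a HASH index: value -> row ids, with no notion of key order.
hash_index = {}
for row_id, a in rows:
    hash_index.setdefault(a, []).append(row_id)

# Stand-in for a B-tree index: (value, row_id) entries kept in key order.
ordered_index = sorted((a, row_id) for row_id, a in rows)


def lookup_equal(a):
    """Equality lookup, the case a hash index handles well."""
    return hash_index.get(a, [])


def distinct_by_ordered_scan():
    """Scan the ordered index once; equal keys are adjacent, so each
    distinct value is emitted when the key changes."""
    return [key for key, _ in groupby(ordered_index, key=lambda entry: entry[0])]


def range_scan(low, high):
    """Return keys in [low, high) by seeking to low and walking forward,
    which relies on the entries being stored in key order."""
    start = bisect_left(ordered_index, (low,))
    result = []
    for key, _ in ordered_index[start:]:
        if key >= high:
            break
        result.append(key)
    return result


print(lookup_equal(5))
print(distinct_by_ordered_scan())
print(range_scan(2, 6))
```

The ordered scan and the range scan both depend on neighbouring entries being in key order, which is the property a hash structure does not provide.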
Solution when DISTINCT does not work as expected with multiple fields
In practical applications, we often need to select non-duplicate data from a table in the database. Generally, we use the DISTINCT keyword.
However, DISTINCT only gives the expected result when a single field is selected, for example:
sql = "select DISTINCT title from Table1 where id>0"
When we need to list another column as well, for example:
sql = "select DISTINCT title,posttime from Table1 where id>0"
The result is not what we want, because DISTINCT applies to the whole combination of selected columns: a title appears once for each different posttime. We need another method to solve this problem.
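The behaviour can be reproduced with a few rows. The sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption made so the example runs without a server), and the table contents are hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL; the rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (id INTEGER PRIMARY KEY, title TEXT, posttime TEXT)")
conn.executemany(
    "INSERT INTO Table1 (id, title, posttime) VALUES (?, ?, ?)",
    [
        (1, "news", "2020-01-01"),
        (2, "news", "2020-01-02"),
        (3, "blog", "2020-01-03"),
    ],
)

# DISTINCT on one column: each title appears once.
one_column = conn.execute(
    "SELECT DISTINCT title FROM Table1 WHERE id > 0 ORDER BY title"
).fetchall()

# DISTINCT on two columns: uniqueness is judged on the (title, posttime) pair.
two_columns = conn.execute(
    "SELECT DISTINCT title, posttime FROM Table1 WHERE id > 0 ORDER BY title, posttime"
).fetchall()

print(one_column)
print(two_columns)
```

With these rows, the title "news" is returned once by the first query and twice by the second, because its two rows differ in posttime.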
The following are the SQL statements I have written. I am not sure whether they are good, so feedback is welcome:
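Statement 1 (the parentheses have no effect here, so DISTINCT still applies to both title and posttime):
sql = "Select DISTINCT(title),posttime From Table1 Where id>0"
Statement 2 (returns the same distinct title and posttime pairs as Statement 1):
sql = "Select title,posttime From Table1 Where id>0 group by title,posttime"
Statement 3 (returns one row per title, the row with the smallest id):
sql = "select title,posttime from Table1 where id in (select min(id) from Table1 group by title)"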
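To compare the three statements side by side, here is a minimal sketch that runs them against a few hypothetical rows. It uses Python's built-in sqlite3 module in place of MySQL, which is an assumption made for portability; the helper name run_statements and the sample data are illustrative only.

```python
import sqlite3


def run_statements():
    """Run the three statements on sample data and return sorted result rows."""
    conn = sqlite3.connect(":memory:")  # SQLite stand-in for MySQL
    conn.execute(
        "CREATE TABLE Table1 (id INTEGER PRIMARY KEY, title TEXT, posttime TEXT)"
    )
    conn.executemany(
        "INSERT INTO Table1 (id, title, posttime) VALUES (?, ?, ?)",
        [
            (1, "news", "2020-01-01"),
            (2, "news", "2020-01-01"),  # exact duplicate of the pair in row 1
            (3, "news", "2020-01-02"),  # same title, different posttime
            (4, "blog", "2020-01-03"),
        ],
    )
    statements = {
        1: "SELECT DISTINCT(title), posttime FROM Table1 WHERE id > 0",
        2: "SELECT title, posttime FROM Table1 WHERE id > 0 "
           "GROUP BY title, posttime",
        3: "SELECT title, posttime FROM Table1 WHERE id IN "
           "(SELECT MIN(id) FROM Table1 GROUP BY title)",
    }
    # Sort each result so the comparison does not depend on row order.
    return {n: sorted(conn.execute(sql).fetchall()) for n, sql in statements.items()}


results = run_statements()
for number, result_rows in results.items():
    print(number, result_rows)
```

On this data, Statements 1 and 2 return the same three rows, while only Statement 3 returns a single row for each title.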