Analysis of MySQL's tips for optimizing distinct, mysqldistinct
There is a need: select count (distinct nick) from user_access_xx_xx;
This SQL statement is used to calculate the uv of user access. Because the data volume in a single table is more than 10 Gb, even if the index of nick is added to user_access_xx_xx,
By viewing the execution plan, the full index scan is also performed. During SQL Execution, the whole server is affected by jitter;
root@db 09:00:12>select count(distinct nick) from user_access;+———————-+| count(distinct nick) |+———————-+| 806934 |+———————-+1 row in set (52.78 sec)
It takes 52.78 s to execute an SQL statement, which is already very slow.
Now we need to change the thinking to solve this problem:
We know that the index value is in ascending order of the index field. For example, if we index the two fields (nick, other_column), the index is based on nick, other_column in ascending order:
Our current SQL: select count (distinct nick) from user_access; then it is directly scanned from nick1 until the last nick_n is scanned,
The intermediate process will scan many duplicate nick S. If we can skip the duplicate nick, the performance will be optimized a lot (in oracle, this scanning technology is loose index scan, however, mysql 5.1 does not support this optimization technology directly ):
Therefore, you need to rewrite the SQL statement to achieve the pseudo loose index scan:
root@db 09:41:30>select count(*) from ( select distinct(nick) from user_access)t ;| count(*) |+———-+| 806934 |1 row in set (5.81 sec)
Select different nick in SQL, and set a layer on the outside to get the total distinct value of nick;
The most important thing is that in subqueries: select distinct (nick) implements the pseudo loose index scan. At this time, the optimizer executes the plan as Using index for group-,
It should be noted that mysql optimizes distinct to "group by", which groups data by index first, then scans the index, and only scans the expected nick once;
The execution plans of the two SQL statements are:
Optimized Syntax:
root@db 09:41:10>explain select distinct(nick) from user_access-> ;+—-+————-+——————————+——-+—————+————-| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |+—-+————-+——————————+——-+—————+————-| 1 | SIMPLE | user_access | range | NULL | ind_user_access_nick | 67 | NULL | 2124695 | Using index for group-by |+—-+————-+——————————+——-+—————+————-
Original statement:
root@db 09:42:55>explain select count(distinct nick) from user_access;+—-+————-+——————————+——-+—————+————-| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |+—-+————-+——————————+——-+—————+————-| 1 | SIMPLE | user_access | index | NULL | ind_user_access | 177 | NULL | 19546123 | Using index |