1. LIMIT statement
Paging queries are among the most common scenarios, and also among the most problematic. For the following simple statement, the typical DBA fix is to add a composite index on the type, name, and create_time columns; with it, the filter and sort can use the index, and performance improves immediately.
SELECT *
FROM operation
WHERE type = 'SQLStats'
AND name = 'SlowLog'
ORDER BY create_time
LIMIT 1000, 10;
Perhaps more than 90% of DBAs solve the problem and stop here. But when the LIMIT clause becomes "LIMIT 1000000, 10", programmers still complain: why is it slow when I only fetch 10 records?
The database has no way of knowing where the 1,000,000th record starts, so even with an index it has to read and discard everything before that point. When this kind of performance problem appears, it is usually because the programmer took the easy way out.
In scenarios such as front-end page-by-page browsing or batched export of large data sets, you can instead pass the maximum value from the previous page as the query condition. The SQL is redesigned as follows:
SELECT *
FROM operation
WHERE type = 'SQLStats'
AND name = 'SlowLog'
AND create_time > '2017-03-16 14:00:00'
ORDER BY create_time
LIMIT 10;
Under the new design, query time is essentially constant and does not grow with the volume of data.
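The keyset ("seek") pattern above can be sketched end to end. The example below uses Python's standard sqlite3 module as a stand-in for MySQL; table and column names mirror the article, but the data itself is invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE operation (
    id INTEGER PRIMARY KEY,
    type TEXT, name TEXT, create_time TEXT)""")
conn.execute("CREATE INDEX idx_op ON operation (type, name, create_time)")
rows = [(i, "SQLStats", "SlowLog", f"2017-03-16 14:00:{i:02d}")
        for i in range(60)]
conn.executemany("INSERT INTO operation VALUES (?, ?, ?, ?)", rows)

def next_page(last_seen, size=10):
    # Seek past the last create_time of the previous page instead of
    # asking the database to count and discard an OFFSET.
    return conn.execute(
        """SELECT id, create_time FROM operation
           WHERE type = 'SQLStats' AND name = 'SlowLog'
             AND create_time > ?
           ORDER BY create_time LIMIT ?""",
        (last_seen, size)).fetchall()

page = next_page("2017-03-16 14:00:09")  # previous page ended at :09
print([r[0] for r in page])  # ids 10 through 19
```

One caveat worth noting: if create_time is not unique, seek on the pair (create_time, id) and order by both, otherwise rows sharing a timestamp at a page boundary can be skipped.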
2. Implicit conversion
A mismatch between a query variable and the column's defined type is another common mistake in SQL statements. Take the following statement:
mysql> explain extended SELECT *
> FROM my_balance b
> WHERE b.bpn = 14000000123
> AND b.isverified IS NULL;
mysql> show warnings;
| Warning | 1739 | Cannot use ref access on index 'bpn' due to type or collation conversion on field 'bpn' |
The column bpn is defined as varchar(20), and MySQL's strategy is to convert strings to numbers before comparing them. The implicit conversion acts like a function applied to the table column, so the index becomes unusable.
The situation above may be caused by parameters filled in automatically by the application framework rather than by the programmer's intention. Today's frameworks are complex and convenient, but be careful: they can also dig holes for you.
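The application-level fix is to make sure the bound parameter has the same type as the column: pass the value as a string when the column is a varchar. A minimal sketch using Python's sqlite3 (as a stand-in; SQLite's coercion rules differ from MySQL's, so only the "bind the right type" pattern is shown, and the table contents are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_balance (bpn TEXT, isverified TEXT)")  # bpn is a string column
conn.execute("CREATE INDEX idx_bpn ON my_balance (bpn)")
conn.execute("INSERT INTO my_balance VALUES ('14000000123', NULL)")

bpn = 14000000123  # the framework hands us an int
# Convert explicitly so the comparison is string-to-string; in MySQL,
# binding the int here would disable the index on bpn.
row = conn.execute(
    "SELECT * FROM my_balance WHERE bpn = ? AND isverified IS NULL",
    (str(bpn),)).fetchone()
print(row)  # ('14000000123', None)
```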
3. Correlated update and delete
Although MySQL 5.6 introduced subquery materialization, note that it currently applies only to SELECT queries. For UPDATE or DELETE, the subquery must be rewritten into a JOIN by hand.
For example, in the following UPDATE statement, MySQL actually executes the subquery as a loop (DEPENDENT SUBQUERY), and the execution time can be imagined.
UPDATE operation o
SET status = 'applying'
WHERE o.id IN (SELECT id
               FROM (SELECT o.id,
                            o.status
                     FROM operation o
                     WHERE o.group = 123
                       AND o.status NOT IN ('done')
                     ORDER BY o.parent,
                              o.id
                     LIMIT 1) t);
Execution plan:
+----+--------------------+-------+-------+---------------+---------+---------+-------+------+-----------------------------------------------------+
| id | select_type        | table | type  | possible_keys | key     | key_len | ref   | rows | Extra                                               |
+----+--------------------+-------+-------+---------------+---------+---------+-------+------+-----------------------------------------------------+
|  1 | PRIMARY            | o     | index |               | PRIMARY | 8       |       |   24 | Using where; Using temporary                        |
|  2 | DEPENDENT SUBQUERY |       |       |               |         |         |       |      | Impossible WHERE noticed after reading const tables |
|  3 | DERIVED            | o     | ref   | idx_2,idx_5   | idx_5   | 8       | const |    1 | Using where; Using filesort                         |
+----+--------------------+-------+-------+---------------+---------+---------+-------+------+-----------------------------------------------------+
After rewriting to a JOIN, the select_type of the subquery changes from DEPENDENT SUBQUERY to DERIVED, and execution is greatly accelerated: from 7 seconds down to 2 milliseconds.
UPDATE operation o
JOIN (SELECT o.id,
             o.status
      FROM operation o
      WHERE o.group = 123
        AND o.status NOT IN ('done')
      ORDER BY o.parent,
               o.id
      LIMIT 1) t
  ON o.id = t.id
SET status = 'applying';
The execution plan is simplified to:
+----+-------------+-------+------+---------------+-------+---------+-------+------+-----------------------------------------------------+
| id | select_type | table | type | possible_keys | key   | key_len | ref   | rows | Extra                                               |
+----+-------------+-------+------+---------------+-------+---------+-------+------+-----------------------------------------------------+
|  1 | PRIMARY     |       |      |               |       |         |       |      | Impossible WHERE noticed after reading const tables |
|  2 | DERIVED     | o     | ref  | idx_2,idx_5   | idx_5 | 8       | const |    1 | Using where; Using filesort                         |
+----+-------------+-------+------+---------------+-------+---------+-------+------+-----------------------------------------------------+
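For reference, here is what the statement is meant to do: pick one not-yet-done row of group 123 (ordered by parent, then id) and flip its status. The sketch below reproduces that semantics with Python's sqlite3; since SQLite lacks MySQL's UPDATE ... JOIN syntax, an equivalent scalar subquery is used, and the table contents are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE operation (
    id INTEGER PRIMARY KEY, "group" INTEGER, parent INTEGER, status TEXT)""")
conn.executemany("INSERT INTO operation VALUES (?, ?, ?, ?)",
                 [(1, 123, 0, 'done'),
                  (2, 123, 0, 'pending'),
                  (3, 123, 1, 'pending')])

# Update exactly the single row the inner query selects:
# ordered first by parent, then by id.
conn.execute("""
    UPDATE operation SET status = 'applying'
    WHERE id = (SELECT id FROM operation
                WHERE "group" = 123 AND status NOT IN ('done')
                ORDER BY parent, id LIMIT 1)""")
statuses = dict(conn.execute("SELECT id, status FROM operation"))
print(statuses)  # {1: 'done', 2: 'applying', 3: 'pending'}
```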
Summary
The database compiler generates the execution plan, which determines how the SQL is actually executed. But the compiler can only do its best, and no database compiler is perfect.
Most of the scenarios above cause performance problems in other databases as well. Understanding the characteristics of the database compiler helps you avoid its weak points and write high-performance SQL.
When designing data models and writing SQL statements, programmers should bring an algorithmic mindset to the work.
When writing complex SQL, develop the habit of using WITH statements. Concise, clear SQL also lightens the load on the database.
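As an illustration of that habit, a WITH clause gives the intermediate result a readable name instead of burying it in a nested subquery. A small sketch using Python's sqlite3 (table and names invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE operation (id INTEGER PRIMARY KEY, status TEXT)")
conn.executemany("INSERT INTO operation VALUES (?, ?)",
                 [(1, "done"), (2, "pending"), (3, "pending")])

# The CTE names the intermediate step, so the outer query reads
# almost like the sentence describing it.
count = conn.execute("""
    WITH pending_ops AS (
        SELECT id FROM operation WHERE status NOT IN ('done')
    )
    SELECT COUNT(*) FROM pending_ops""").fetchone()[0]
print(count)  # 2
```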