Problem
You can easily find problem SQL by "show the full processlist" statement as follows:
SELECT post.* from
post
INNER JOIN post_tag on post.id = post_tag.post_id
WHERE post.status = 1 and post_tag.t AG_ID = 123 ORDER by
post.created DESC
LIMIT 100
Note: Because post and tag are many-to-many relationships, there is a relational table Post_tag.
Try using explain to query the SQL Execution plan (limited in length, with the result being truncated):
+----------+---------+-------+-----------------------------+
| Table key | Extra |
+----------+---------+-------+-----------------------------+
| post_tag | tag_id | 71220 | The Using where; Using Filesort |
| Post | PRIMARY | 1 | Using where |
+----------+---------+-------+-----------------------------+
After the optimized SQL is given, the only change is the way the connection is changed to "straight_join":
SELECT post.* from
post
straight_join post_tag on post.id = post_tag.post_id
WHERE post.status = 1 and Post_ta G.TAG_ID = 123 ORDER by
post.created DESC
LIMIT 100
Try using explain to query the SQL Execution plan (limited in length, with the result being truncated):
+----------+----------------+--------+-------------+
| Table key | Extra |
+----------+----------------+--------+-------------+
| post | status_created | 119340 | Using where |
| Post_tag | post_id | 1 | Using where |
+----------+----------------+--------+-------------+
Compared with the results of two times before and after the optimization of the explain, the optimized SQL, although the "rows" is larger, but no "using filesort", overall, performance is still improved.
explain
Why did the MySQL optimizer choose a time-consuming execution scenario for the first SQL? For the second SQL, why did you improve the performance after changing the connection mode to Straight_join?
It all has to do with the way MySQL handles multiple table joins, the first thing to determine is who drives the table, that is, which table is the benchmark, and when dealing with this type of problem, the MySQL optimizer uses a simple, rough workaround: Which table has a small result set and which table is the driving table, which is usually the best choice.
Description: In the explain result, the table that appears in the first row is the driver table.
To continue the Post connection Post_tag example, the MySQL optimizer has the following two choices, respectively:
- With post as the driving table, filtered by status_created index, result set 119340 rows
- With Post_tag as the driving table, filtered by tag_id index, result set 71220 rows
Obviously, the result set of Post_tag filter is smaller, so the MySQL optimizer chooses it as the driving table, sadly, we also need to sort by the created field in the Post table, which means that the sort field is not in the driver sheet, so the inevitable appearance of the "using filesort", which results in a slow query.
Know the ins and outs, optimization is easy. The first thing is to make sure that the sort field is in the driver table, so the post is the driver, so "straight_join" is the answer and it enforces the connection order.
...
But I always feel that "straight_join" this non-standard grammar belongs to the category of Chine, can not be avoided as far as possible, after all, the MySQL optimizer can make the right choice.