1. Execution order of SQL query statements
(7) SELECT
(8) DISTINCT <select_list>
(1) FROM <left_table>
(3) <join_type> JOIN <right_table>
(2) ON <join_condition>
(4) WHERE <where_condition>
(5) GROUP BY <group_by_list>
(6) HAVING
(9) ORDER BY <order_by_condition>
(10) LIMIT <limit_number>
The execution of the SQL statements is not actually executed exactly in the order in which we wrote them, or the actual order of execution is not a small deviation from what we expected. Then the true logical order of its query statements, as shown in the order preceding each line, is actually not a select, but a from.
1.1 From Cartesian product
During the execution of an SQL statement, a virtual table, called VT, is actually generated to temporarily save the execution result。 The virtual table generated by each step will be entered as the next step of execution.
The first step is actually executed first from the statement, first query left table or right table, are not, directly first do cartesian product, this Cartesian product of the result set, is called VT1.
1.2 On filter After the Cartesian product is executed, the results are filtered according to the conditions specified in on, removing those non-conforming data, and then getting a new virtual table, VT2.
1.3 Adding an outer row This step is a step that is performed only if the connection type is an outer join, that is, the OUTER join, such as the left OUTER join, the right OUTER join, and the full OUTER join. Most of the time we omit the keyword outer, but you know that injecting a left join is an outer join.
Because the right connection can actually get the same result by reversing the table order with the left connection, we'll do it separately as a left join. We know that the left connection is based on left table, regardless of whether the condition is met, will be left in the final result, in addition to its own table value, the column portion of its join table is assigned a value of NULL. As we have said in the previous step, the on filter is performed and the non-conforming data is removed, which means that the null is worth filtering out.
So this step, if it is a left JOIN, is to put left table in the second step of the filtered portion, re-added to the VT2, forming a VT3. If there are tables in the FROM clause that need to be connected, then VT3 is used as input, repeat step 1-3, and finally get the new virtual table VT3.
The 1.4 where filter filters the VT3 based on the Where condition, outputting the qualifying record to the virtual table VT4.
Attention:because group by is not implemented at this time, categorical statistics such as aggregation functions cannot be used in filtering conditions.
1.5 Group by group by IS used to group the results after the where filter, resulting in VT5.
In this step, although you can perform some aggregation functions for the post-grouping content, in fact the new result set VT5 only contains one row for each group, be aware.
1.6 Having filter having filter is used with group by, the VT5 virtual table obtained by grouping is conditionally filtered, and the virtual table VT6 is obtained.
1.7 select list Although the select is written on the first line, it is not the first one to execute. Select only extracts our selected columns from the virtual table VT6 to form a new virtual table VT7.
1.8 Execution DISTINCT If you specify the DISTINCT clause in the query, a temporary memory table is created (if the memory doesn't fit, it needs to be stored on the hard disk). The table structure of this temporary table is the same as the virtual table VT7 generated in the previous step, except for the addition of duplicate data by adding a unique index to the column that performed the distinct operation.
1.9 order BY sorts the contents of the virtual table, sorts them by the specified columns, and returns a new virtual table VT8 after meals.
1.10 Limit limits Finally, go to the last step, the VT8 virtual table from the specified location to crawl the specified number of data, the final result is returned.
2, a pit directly with examples to illustrate, if there are two tables as follows, students table (class, name) and score table (name, score), now I need to return to class A all the students, but this class has students lack of test, the students we can not miss it, so here we use the Student table as the left table, the left connection. We tried to do this using the following SQL (for the sake of comparison, the columns of both tables are output):
SELECT
stu.name,
stu.class,
s.name,
s.score
FROM
student AS stu LEFT JOIN score AS s ON stu.name = s.name AND stu.class = ‘A‘
At first glance, there seems to be no problem, but also left the connection and on the condition of the increase in class restrictions, in fact, the results appear in Class B classmate, why? It is because of the SQL execution order problem, in step 1-3 we said that the outer join is also to do the Cartesian product, according to the filter on the condition, notice, and then the reservation table in the filtered rows are re-added back.
So actually the result is this: on condition filtered records + reserved records filtered in the table+So, in order to get exactly what we want, the filter for the final class should be placed in the where to perform the following:
SELECT
stu.name,
stu.class,
s.name,
s.score
FROM
student AS stu LEFT JOIN score AS s ON stu.name = s.name
WHERE
stu.class = ‘A‘
3.Reference Links
- SQL logical Query Statement execution order
- statement execution order for SQL and MySQL
Elementary Introduction to SQL Optimization: 1. Execution order of SQL query statements