1. Create a new test database testdb;
2. Create test tables table1 and table2;
3. Inserting test data
4. Effects
5. Preparing SQL logical Query Test statements
Remember the long list of SQL logic query rules given above? So, which one executes first, which one executes? Now, let me give you the order of execution of a query statement:
(7) SELECT (8) DISTINCT <select_list> (1) from <left_table> (3) <join_type> join <right_ Table> (2) on <join_condition> (4) WHERE <where_condition> (5) GROUP by <group_by_list > (6) has
The execution sequence number is indicated in front of each statement, so don't ask me how I know the order. I also read a variety of "martial arts Cheats" only learned. If you have the time to read MySQL's source code, you will get the result.
Four. SQL Execution sequence analysisFocus:
During the execution of these SQL statements, a virtual table is generated to hold the result of the SQL statement (which is the focus), and we are now going to track the changes of this virtual table and get the final query results to analyze the execution order and process of the entire SQL logical query.
1. Execute FROM statement
The first step is to execute the FROM
statement. We first need to know which table to start with, and that's what FROM
we're told. Now that <left_table>
we have <right_table>
two tables, which table do we start with, or do we start with some sort of connection from the two tables? How do they relate to each other? --Descartes Product
After the FROM statement performs a Cartesian product on two tables, a virtual table is called VT1 (vitual table 1), which reads as follows:
+-------------+----------+----------+-------------+| customer_id | City | order_id | customer_id |+-------------+----------+----------+-------------+| 163 | Hangzhou | 1 | 163 | | 9you | Shanghai | 1 | 163 | | Baidu | Hangzhou | 1 | 163 | | TX | Hangzhou | 1 | 163 | | 163 | Hangzhou | 2 | 163 | | 9you | Shanghai | 2 | 163 | | Baidu | Hangzhou | 2 | 163 | | TX | Hangzhou | 2 | 163 | | 163 | Hangzhou | 3 | 9you | | 9you | Shanghai | 3 | 9you | | Baidu | Hangzhou | 3 | 9you | | TX | Hangzhou | 3 | 9you | | 163 | Hangzhou | 4 | 9you | | 9you | Shanghai | 4 | 9you | | Baidu | Hangzhou | 4 | 9you | | TX | Hangzhou | 4 | 9you | | 163 | Hangzhou | 5 | 9you | | 9you | Shanghai | 5 | 9you | | Baidu | Hangzhou | 5 | 9you | | TX | Hangzhou | 5 | 9you | | 163 | Hangzhou | 6 | TX | | 9you | Shanghai | 6 | TX | | Baidu | Hangzhou | 6 | TX | | TX | Hangzhou | 6 | TX | | 163 | Hangzhou | 7 | NULL | | 9you | Shanghai | 7 | NULL | | Baidu | Hangzhou | 7 | NULL | | TX | Hangzhou | 7 | NULL |+-------------+----------+----------+-------------+
There are a total of (table1) records (number of record strips * table2 records). This is the result of VT1, and the next operation is based on the VT1.
2. Perform on filtering
After the completion of the Cartesian product, then the ON a.customer_id = b.customer_id
conditional filtering, according to the ON
conditions specified in the removal of those non-conforming data, to obtain the VT2 table, the contents are as follows:
+-------------+----------+----------+-------------+| customer_id | City | order_id | customer_id |+-------------+----------+----------+-------------+| 163 | hangzhou | 1 | 163 | | 163 | hangzhou | 2 | 163 | | 9you | shanghai | 3 | 9you | | 9you | shanghai | 4 | 9you | | 9you | shanghai | 5 | 9you | | tx | hangzhou | 6 | Tx |+-------------+----------+----------+-------------+
T2 is the ON
useful data obtained after conditional filtering, and the next operation will continue on the basis of VT2.
3. Add an external row
This step occurs only when the connection type is, OUTER JOIN
such as LEFT OUTER JOIN
, RIGHT OUTER JOIN
. Most of the time, we omit the OUTER
keyword, but that's the concept of the OUTER
outer line.
LEFT OUTER JOIN
The left memento is the reserved table and the result is:
+-------------+----------+----------+-------------+| customer_id | City | order_id | customer_id |+-------------+----------+----------+-------------+| 163 | hangzhou | 1 | 163 | | 163 | hangzhou | 2 | 163 | | 9you | shanghai | 3 | 9you | | 9you | shanghai | 4 | 9you | | 9you | shanghai | 5 | 9you | | tx | hangzhou | 6 | TX | | Baidu | hangzhou | NULL | NULL |+-------------+----------+----------+-------------+
RIGHT OUTER JOIN
The right memento is the reserved table and the results are as follows:
+-------------+----------+----------+-------------+| customer_id | City | order_id | customer_id |+-------------+----------+----------+-------------+| 163 | hangzhou | 1 | 163 | | 163 | hangzhou | 2 | 163 | | 9you | shanghai | 3 | 9you | | 9you | shanghai | 4 | 9you | | 9you | shanghai | 5 | 9you | | tx | hangzhou | 6 | TX | | NULL | NULL | 7 | NULL |+-------------+----------+----------+-------------+
The job of adding an outer row is to add the data that is filtered by the filter in the reserved table based on the VT2 table, the data in the non-reserved table is given a null value, and the virtual table VT3 is generated.
As I used in the prepared test SQL query Logic statement LEFT JOIN
, the following data was filtered out:
| Baidu | hangzhou | NULL | NULL |
Now add this data to the VT2 table and get the VT3 table as follows:
+-------------+----------+----------+-------------+| customer_id | City | order_id | customer_id |+-------------+----------+----------+-------------+| 163 | hangzhou | 1 | 163 | | 163 | hangzhou | 2 | 163 | | 9you | shanghai | 3 | 9you | | 9you | shanghai | 4 | 9you | | 9you | shanghai | 5 | 9you | | tx | hangzhou | 6 | TX | | Baidu | hangzhou | NULL | NULL |+-------------+----------+----------+-------------+
The next operation will be done on the VT3 table.
4. Execute where filter
Where filters are made for VT3 that add external rows, only records that match <where_condition> are exported to the virtual table VT4. When we do WHERE a.city = ‘hangzhou‘
, we get the following, and there are virtual table VT4:
+-------------+----------+----------+-------------+| customer_id | City | order_id | customer_id |+-------------+----------+----------+-------------+| 163 | hangzhou | 1 | 163 | | 163 | hangzhou | 2 | 163 | | tx | hangzhou | 6 | TX | | Baidu | hangzhou | NULL | NULL |+-------------+----------+----------+-------------+
However, when using the WHERE clause, the following two points need to be noted:
- Since the data is not grouped yet, it is not possible to use
where_condition=MIN(col)
this type of filtering for grouping statistics in the where filter.
- Because there is no selection operation for the column, the alias for using the column in select is not allowed, such as:
SELECT city as c FROM t WHERE c=‘shanghai‘;
is not allowed to occur.
5. Performing GROUP BY groupings
GROU BY
The main clause is to WHERE
group the virtual tables that are obtained by using clauses. When we execute the test statement GROUP BY a.customer_id
, we get the following:
+-------------+----------+----------+-------------+| customer_id | City | order_id | customer_id |+-------------+----------+----------+-------------+| 163 | hangzhou | 1 | 163 | | Baidu | hangzhou | NULL | NULL | | tx | hangzhou | 6 | Tx |+-------------+----------+----------+-------------+
The resulting content is stored in the virtual table VT5, at which point we get a VT5 virtual table, and the next operation is done on that table.
6. Performing a having filter
HAVING
Clauses are GROUP BY
used in conjunction with clauses to conditionally filter the VT5 virtual tables that are grouped. When I execute the test statement HAVING count(b.order_id) < 2
, I get the following:
+-------------+----------+----------+-------------+| customer_id | City | order_id | customer_id |+-------------+----------+----------+-------------+| Baidu | hangzhou | NULL | NULL | | tx | hangzhou | 6 | Tx |+-------------+----------+----------+-------------+
This is the virtual table VT6.
7.SELECT List
The clause will not be executed until the SELECT
SELECT
clause is written in the first line, which is the first one to be executed.
We execute the test statement SELECT a.customer_id, COUNT(b.order_id) as total_orders
and choose what we need from the virtual table VT6. We will get the following content:
+-------------+--------------+| customer_id | Total_orders |+-------------+--------------+| Baidu | 0 | | TX | 1 |+-------------+--------------+
No, it's not finished, it's just a virtual table VT7.
8. Execute the DISTINCT clause
If a clause is specified in the query DISTINCT
, a temporary memory table is created (if the memory does not fit, it needs to be stored on the hard disk). The table structure of this temporary table is the same as the virtual table VT7 generated in the previous step, except for the addition of duplicate data by adding a unique index to the column that performed the distinct operation.
Because distinct is not used in my test SQL statement, this step does not generate a virtual table in this query.
9. Execute the ORDER BY clause
The contents of the virtual table are sorted according to the specified column, and then a new virtual table is returned, and when we execute the test SQL statement ORDER BY total_orders DESC
, we get the following:
+-------------+--------------+| customer_id | Total_orders |+-------------+--------------+| TX | 1 | | Baidu | 0 |+-------------+--------------+
You can see that this is the Total_orders column in descending order. The above results are stored in the VT8.
10. Execute the LIMIT clause
LIMIT
Clause selects the specified row data from the specified position from the VT8 virtual table obtained in the previous step. For a limit clause that does not have an order by applied, the result is also unordered, so many times we will see that the limit clause is used with the ORDER BY clause.
The MySQL database limit supports the following forms of selection:
LIMIT N, M
Indicates that the M record is selected starting with the nth record. And many developers like to use this statement to solve paging problems. For small data, using the LIMIT clause has no problem, and when the amount of data is very large, the use LIMIT n, m
is very inefficient. Because the mechanism of limit is to scan from scratch every time, if need to start from the No. 600000 line, read 3 data, you need to scan to locate 600,000 rows before reading, and the scanning process is a very inefficient process. Therefore, for large data processing, it is very necessary to establish a certain caching mechanism in the application layer (seemingly the big data processing, there is a cache oh).
MySQL Fourth--sql logical query statement execution order