Step by step: MySQL Architecture Overview-> query execution process-> SQL parsing Sequence
I always want to know how an SQL statement is executed and how it is executed in sequence. Then I can view the summary of all parties and I will have the following blog post. This article will discuss the knowledge in the MySQL General Architecture ---> query execution process ---> statement execution sequence. I. MySQL Architecture Overview: the architecture is best illustrated with necessary instructions. Based on the original figure in the reference book, I added my own understanding on it. We can see that the entire architecture is divided into two layers. The upper Layer is called the 'SQL Layer' of MySQLD, and the lower Layer is a variety of storage engines that provide interfaces, it is called 'Storage Engine Layer '. For other modules and components, you can simply learn about their functions from their names. 2. the query execution process goes further below. Let me explain what the query execution process is like based on my own understanding: 1. connect to the 1.1 client to initiate a Query request, the 'Connection managementmodule' of the listening client receives the request 1.2 and forwards the request to the 'Connection in/thread' Module 1.3. After the 'user module' is called for authorization check 1.4, the 'Connection-in/thread module' extracts idle cached connection lines from the 'thread connection pool 'and accesses client requests. If the connection fails, a new connection request is created. process 2.1 first Query the cache, check whether the Query statement is completely matched, and then check whether there are permissions. If both are successful, the system directly retrieves data and returns 2.2 If the previous step fails, it is transferred to the 'COMMAND resolution ', after lexical analysis and syntax analysis, parse tree 2.3 is generated. The next step is the preprocessing phase. The parser can not solve the semantics and check permissions, generate a new resolution tree 2.4 and then hand it to the corresponding module for processing 2.5 If the SELECT query is performed, a lot of optimization will be done through the 'query optimizer '. After the execution plan 2.6 module receives the request, check through 'access control module' If the connected user has the permission to access the target table and the target field 2.7, call the 'table management module'. First, check whether the table cache exists. If yes, the connected user directly corresponds to the table and obtains the lock, otherwise, open the table file 2.8 again and obtain the table's storage engine type and other information based on the table's meta data. When the corresponding storage engine is called through the interface to process data changes in the above process, if the log function is enabled, it is recorded in the corresponding binary log file. 3. result 3.1after the query request is complete, return the result set to 'connect to/Threads' 3.2 and return the corresponding status ID, for example, if the connection succeeds or fails, 3.3 'connect to/thread module' will be cleaned up later, and continue to wait for the request or disconnect from the client figure small Summary 3. SQL parsing order next step, let's look at the past and present of an SQL statement. First, let's take a look at the example statement.
SELECT DISTINCT < select_list >FROM < left_table > < join_type >JOIN < right_table > ON < join_condition >WHERE < where_condition >GROUP BY < group_by_list >HAVING < having_condition >ORDER BY < order_by_condition >LIMIT < limit_number >
However, the execution sequence is as follows:
1 FROM <left_table> 2 ON <join_condition> 3 <join_type> JOIN <right_table> 4 WHERE <where_condition> 5 GROUP BY <group_by_list> 6 HAVING Although I did not expect it to be like this, it is still quite natural and harmonious at first glance. Where can I get it? The filtering conditions should be the same or different, and the order should be sorted, the first few items are required. Now, let's look at the details step by step. Preparation 1. create database testQuery 2. create a test table
CREATE TABLE table1( uid VARCHAR(10) NOT NULL, name VARCHAR(10) NOT NULL, PRIMARY KEY(uid))ENGINE=INNODB DEFAULT CHARSET=UTF8;CREATE TABLE table2( oid INT NOT NULL auto_increment, uid VARCHAR(10), PRIMARY KEY(oid))ENGINE=INNODB DEFAULT CHARSET=UTF8;
3. insert data
Insert into table1 (uid, name) VALUES ('aaa', 'Mike '), ('bbb', 'jack'), ('ccc ', 'Mike '), ('ddd ', 'Mike'); insert into table2 (uid) VALUES ('aaa'), ('aaa'), ('bbb '), ('bbb '), ('bbb'), ('ccc '), (NULL); 4. final Result
SELECT a.uid, count(b.oid) AS totalFROM table1 AS aLEFT JOIN table2 AS b ON a.uid = b.uidWHERE a. NAME = 'mike'GROUP BY a.uidHAVING count(b.oid) < 2ORDER BY total DESCLIMIT 1;
! Start SQL parsing now! 1. When FROM involves multiple tables, the output of the Left table is used as the input of the right table, and a virtual table VT1 is generated. (1-J1) Cartesian Product calculates the Cartesian Product (cross join) of two associated tables to generate a virtual table VT1-J1.
mysql> select * from table1,table2;+-----+------+-----+------+| uid | name | oid | uid |+-----+------+-----+------+| aaa | mike | 1 | aaa || bbb | jack | 1 | aaa || ccc | mike | 1 | aaa || ddd | mike | 1 | aaa || aaa | mike | 2 | aaa || bbb | jack | 2 | aaa || ccc | mike | 2 | aaa || ddd | mike | 2 | aaa || aaa | mike | 3 | bbb || bbb | jack | 3 | bbb || ccc | mike | 3 | bbb || ddd | mike | 3 | bbb || aaa | mike | 4 | bbb || bbb | jack | 4 | bbb || ccc | mike | 4 | bbb || ddd | mike | 4 | bbb || aaa | mike | 5 | bbb || bbb | jack | 5 | bbb || ccc | mike | 5 | bbb || ddd | mike | 5 | bbb || aaa | mike | 6 | ccc || bbb | jack | 6 | ccc || ccc | mike | 6 | ccc || ddd | mike | 6 | ccc || aaa | mike | 7 | NULL || bbb | jack | 7 | NULL || ccc | mike | 7 | NULL || ddd | mike | 7 | NULL |+-----+------+-----+------+28 rows in set (0.00 sec)
(1-J2) ON filtering is based ON the virtual table VT1-J1. This virtual table filters out all columns that meet the ON predicate conditions and generates the virtual table VT1-J2. Note: Because of syntax restrictions, 'where' is used to replace the two, and the readers can also feel the subtle relationship between the two;
mysql> SELECT -> * -> FROM -> table1, -> table2 -> WHERE -> table1.uid = table2.uid -> ;+-----+------+-----+------+| uid | name | oid | uid |+-----+------+-----+------+| aaa | mike | 1 | aaa || aaa | mike | 2 | aaa || bbb | jack | 3 | bbb || bbb | jack | 4 | bbb || bbb | jack | 5 | bbb || ccc | mike | 6 | ccc |+-----+------+-----+------+6 rows in set (0.00 sec)
(1-J3) add external columns if an external join (LEFT, RIGHT, FULL) is used, columns that do not meet the ON condition in the primary table (reserved table) will also be added to the VT1-J2, generates a virtual table VT1-J3 as an external row.
mysql> SELECT -> * -> FROM -> table1 AS a -> LEFT OUTER JOIN table2 AS b ON a.uid = b.uid;+-----+------+------+------+| uid | name | oid | uid |+-----+------+------+------+| aaa | mike | 1 | aaa || aaa | mike | 2 | aaa || bbb | jack | 3 | bbb || bbb | jack | 4 | bbb || bbb | jack | 5 | bbb || ccc | mike | 6 | ccc || ddd | mike | NULL | NULL |+-----+------+------+------+7 rows in set (0.00 sec)
The following figure shows an image of 'SQL jobs' on the Internet. If your rights are violated, please let us know how to delete it. Thank you. 2. The WHERE clause filters the temporary tables generated during the VT1 process. columns that meet the WHERE clause are inserted into the VT2 table. Note: In this case, aggregation cannot be used because of grouping, and aliases created in SELECT cannot be used. Differences from ON: if an external Column exists, ON filters associated tables, the primary table (reserved table) returns all columns. If no external columns are added, the results are the same. Application: filter the primary table in the WHERE clause; for associated tables, use ON after conditional query, and WHERE for conditional query;
mysql> SELECT -> * -> FROM -> table1 AS a -> LEFT OUTER JOIN table2 AS b ON a.uid = b.uid -> WHERE -> a. NAME = 'mike';+-----+------+------+------+| uid | name | oid | uid |+-----+------+------+------+| aaa | mike | 1 | aaa || aaa | mike | 2 | aaa || ccc | mike | 6 | ccc || ddd | mike | NULL | NULL |+-----+------+------+------+4 rows in set (0.00 sec)
3. The group by clause groups the tables generated in VT2 BY columns in group. Generate a VT3 table. Note: the statements in the post-processing process, such as SELECT and HAVING, must contain columns in group by. For those that do not exist, an aggregate function is used. The reason is: group by changes the table reference and converts it to a new reference method. The columns that can perform the next-level logical operation on the table will be reduced. My understanding is: According to the grouping field, merge Records with the same group field into one record, because only one record can be returned for each group, unless it is filtered out, and fields not in the group field may have multiple values, multiple values cannot be put into one record. Therefore, you must use an aggregate function to convert these columns with multiple values into single values;
mysql> SELECT -> * -> FROM -> table1 AS a -> LEFT OUTER JOIN table2 AS b ON a.uid = b.uid -> WHERE -> a. NAME = 'mike' -> GROUP BY -> a.uid;+-----+------+------+------+| uid | name | oid | uid |+-----+------+------+------+| aaa | mike | 1 | aaa || ccc | mike | 6 | ccc || ddd | mike | NULL | NULL |+-----+------+------+------+3 rows in set (0.00 sec)
4. The HAVING clause filters different groups in the VT3 table and only applies to the grouped data. clauses that meet the HAVING condition are added to the VT4 table.
mysql> SELECT -> * -> FROM -> table1 AS a -> LEFT OUTER JOIN table2 AS b ON a.uid = b.uid -> WHERE -> a. NAME = 'mike' -> GROUP BY -> a.uid -> HAVING -> count(b.oid) < 2;+-----+------+------+------+| uid | name | oid | uid |+-----+------+------+------+| ccc | mike | 6 | ccc || ddd | mike | NULL | NULL |+-----+------+------+------+2 rows in set (0.00 sec)
5. The SELECT clause processes the elements in the SELECT clause and generates a VT5 table. (5-J1) Calculate the expression in the SELECT clause, generate the VT5-J1 (5-J2) DISTINCT look for duplicate columns in the VT5-1, and delete, generate the VT5-J2 if the DISTINCT clause is specified in the query, A temporary memory table will be created (if the memory cannot be stored, it needs to be stored on the hard disk ). The table structure of this temporary table is the same as that of the virtual table VT5 generated in the previous step. The difference is that a unique index is added to the column for the DISTINCT operation to remove duplicate data.
mysql> SELECT -> a.uid, -> count(b.oid) AS total -> FROM -> table1 AS a -> LEFT OUTER JOIN table2 AS b ON a.uid = b.uid -> WHERE -> a. NAME = 'mike' -> GROUP BY -> a.uid -> HAVING -> count(b.oid) < 2;+-----+-------+| uid | total |+-----+-------+| ccc | 1 || ddd | 0 |+-----+-------+2 rows in set (0.00 sec)
6. order by from the table in the VT5-J2, according to the conditions of the order by clause to sort the results, generate VT6 table. Note: the only place where the alias in SELECT can be used;
mysql> SELECT -> a.uid, -> count(b.oid) AS total -> FROM -> table1 AS a -> LEFT OUTER JOIN table2 AS b ON a.uid = b.uid -> WHERE -> a. NAME = 'mike' -> GROUP BY -> a.uid -> HAVING -> count(b.oid) < 2 -> ORDER BY -> total DESC;+-----+-------+| uid | total |+-----+-------+| ccc | 1 || ddd | 0 |+-----+-------+2 rows in set (0.00 sec)
7. The LIMIT clause selects the specified row data starting from the specified position from the VT6 virtual table obtained in the previous step. Note: the positive and negative effects of offset and rows; the efficiency is very low when the offset is large, you can do this: Use the subquery method to optimize, in the subquery, the maximum id is obtained from the index first, and then in reverse order, and N rows of result sets are obtained using inner join optimization. In the JOIN clause, the ID list is obtained from the index first, then directly associate the query to obtain the final result.
mysql> SELECT -> a.uid, -> count(b.oid) AS total -> FROM -> table1 AS a -> LEFT JOIN table2 AS b ON a.uid = b.uid -> WHERE -> a. NAME = 'mike' -> GROUP BY -> a.uid -> HAVING -> count(b.oid) < 2 -> ORDER BY -> total DESC -> LIMIT 1;+-----+-------+| uid | total |+-----+-------+| ccc | 1 |+-----+-------+1 row in set (0.00 sec)
So far, the SQL parsing journey is over. To sum up: