SQL parsing sequence

SQL parsing sequence _ MySQL

Last Update:2018-04-27 Source: Internet

Author: User

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

I always want to know how an SQL statement is executed and how it is executed in sequence. then I can view the summary of all parties and I will have the following blog post. This article will discuss the knowledge of the MySQL general architecture-query execution process-statement execution sequence. I. preface:

I always wanted to know how an SQL statement was executed and how it was executed in sequence. then I checked the summary of all parties and I had the following blog post.

This article will discuss the knowledge in the MySQL general architecture ---> query execution process ---> statement execution sequence.

I. MySQL architecture overview:

The architecture is best illustrated with necessary explanatory text.

Based on the original figure in the reference book, I added my own understanding on it.

We can see that the entire architecture is divided into two layers. the upper Layer is called the 'SQL layer' of MySQLD, and the lower Layer is a variety of storage engines that provide interfaces, it is called 'Storage Engine layer '. For other modules and components, you can simply learn about their functions from their names.

II. query execution process

Next, let's move forward. Based on my own understanding, let me talk about the query execution process:

1. connection

1.1 The client initiates a Query request and listens to the 'connection management module' of the client to receive the request

1.2 forward requests to 'connect to/thread module'

1.3 call the 'user module' for authorization check

1.4 After the check, the 'connection in/thread module 'extracts idle cached connection lines from the 'thread connection pool' and connects them to client requests. if the check fails, a new connection request is created.

2. processing

2.1 First Query the cache, check whether the Query statement fully matches, and then check whether the Query statement has the permission. If all statements are successful, the data is directly retrieved and returned.

2.2 if the previous step fails, it is transferred to the 'command resolution'. after lexical analysis and syntax analysis, the parsing tree is generated.

2.3 The next step is the pre-processing phase. it processes the semantics that the parser cannot solve, checks permissions, and so on, and generates a new parsing tree.

2.4 then handed over to the corresponding module for processing

2.5 for SELECT queries, a large number of optimizations will be made through the 'query Optimizer 'to generate an execution plan.

2.6 after receiving the request, the module checks whether the connected user has the permission to access the target table and target field through the 'Access control module '.

2.7 call the 'table management module'. First, check whether the table cache exists. If yes, the corresponding table is directly used to obtain the lock. otherwise, open the table file again.

2.8 obtain the storage engine type and other information of a table based on the table's meta data, and call the corresponding storage engine for processing through the interface

2.9 if the log function is enabled when data changes are generated during the preceding process, the log is recorded in the corresponding binary log file.

3. Results

3.1after the query request is complete, return the result set to 'connect to/thread module'

3.2 The returned result can also be a status identifier, such as success or failure.

3.3 'Connect to the/thread module 'for subsequent cleanup, and continue to wait for a request or disconnect from the client

A small Summary

III. SQL parsing sequence

Next, let's take a look at the past and present of an SQL statement.

First, let's take a look at the example statement.

SELECT DISTINCT    < select_list >FROM    < left_table > < join_type >JOIN < right_table > ON < join_condition >WHERE    < where_condition >GROUP BY    < group_by_list >HAVING    < having_condition >ORDER BY    < order_by_condition >LIMIT < limit_number >

However, the execution sequence is as follows:

 1 FROM 
 
   2 ON 
  
    3 
   
     JOIN 
    
      4 WHERE 
     
       5 GROUP BY 
      
        6 HAVING 
       
         7 SELECT 8 DISTINCT 
        
          9 ORDER BY 
         
          10 LIMIT

Although I did not expect it to be like this, it is still quite natural and harmonious at first glance. Where can I get it? the filtering conditions should be the same or different, and the order should be sorted, the first few items are required.

Now, let's look at the details step by step.

Preparations

1. create a test database

Create database testQuery

2. create a test table

CREATE TABLE table1(    uid VARCHAR(10) NOT NULL,    name VARCHAR(10) NOT NULL,    PRIMARY KEY(uid))ENGINE=INNODB DEFAULT CHARSET=UTF8;CREATE TABLE table2(    oid INT NOT NULL auto_increment,    uid VARCHAR(10),    PRIMARY KEY(oid))ENGINE=INNODB DEFAULT CHARSET=UTF8;

3. Insert data

Insert into table1 (uid, name) VALUES ('AAA', 'Mike '), ('BBB', 'Jack'), ('CCC ', 'Mike '), ('ddd ', 'Mike ');

Insert into table2 (uid) VALUES ('aaa'), ('aaa'), ('BBB '), ('CCC '), (NULL );

4. the final result

SELECT    a.uid,    count(b.oid) AS totalFROM    table1 AS aLEFT JOIN table2 AS b ON a.uid = b.uidWHERE    a. NAME = 'mike'GROUP BY    a.uidHAVING    count(b.oid) < 2ORDER BY    total DESCLIMIT 1;

! Start SQL parsing now!

1. FROM

When multiple tables are involved, the output of the left table is used as the input of the right table, and a virtual table VT1 is generated.

(1-J1) Cartesian product

Calculates the Cartesian product (cross join) of two associated tables to generate a virtual table VT1-J1.

mysql> select * from table1,table2;+-----+------+-----+------+| uid | name | oid | uid  |+-----+------+-----+------+| aaa | mike |   1 | aaa  || bbb | jack |   1 | aaa  || ccc | mike |   1 | aaa  || ddd | mike |   1 | aaa  || aaa | mike |   2 | aaa  || bbb | jack |   2 | aaa  || ccc | mike |   2 | aaa  || ddd | mike |   2 | aaa  || aaa | mike |   3 | bbb  || bbb | jack |   3 | bbb  || ccc | mike |   3 | bbb  || ddd | mike |   3 | bbb  || aaa | mike |   4 | bbb  || bbb | jack |   4 | bbb  || ccc | mike |   4 | bbb  || ddd | mike |   4 | bbb  || aaa | mike |   5 | bbb  || bbb | jack |   5 | bbb  || ccc | mike |   5 | bbb  || ddd | mike |   5 | bbb  || aaa | mike |   6 | ccc  || bbb | jack |   6 | ccc  || ccc | mike |   6 | ccc  || ddd | mike |   6 | ccc  || aaa | mike |   7 | NULL || bbb | jack |   7 | NULL || ccc | mike |   7 | NULL || ddd | mike |   7 | NULL |+-----+------+-----+------+28 rows in set (0.00 sec)

(1-J2) ON filter

Filters based ON the virtual table VT1-J1, filters out all columns that meet the ON predicate conditions, and generates the virtual table VT1-J2.

Note: Because of syntax restrictions, 'Where' is used to replace the two, and the readers can also feel the subtle relationship between the two;

mysql> SELECT    -> *    -> FROM    -> table1,    -> table2    -> WHERE    -> table1.uid = table2.uid    -> ;+-----+------+-----+------+| uid | name | oid | uid  |+-----+------+-----+------+| aaa | mike |   1 | aaa  || aaa | mike |   2 | aaa  || bbb | jack |   3 | bbb  || bbb | jack |   4 | bbb  || bbb | jack |   5 | bbb  || ccc | mike |   6 | ccc  |+-----+------+-----+------+6 rows in set (0.00 sec)

(1-J3) Add external columns

If an external join (LEFT, RIGHT, FULL) is used, columns that do not meet the ON condition in the master table (reserved table) will also be added to the VT1-J2 as external rows, generate a virtual table VT1-J3.

mysql> SELECT    -> *    -> FROM    -> table1 AS a    -> LEFT OUTER JOIN table2 AS b ON a.uid = b.uid;+-----+------+------+------+| uid | name | oid  | uid  |+-----+------+------+------+| aaa | mike |    1 | aaa  || aaa | mike |    2 | aaa  || bbb | jack |    3 | bbb  || bbb | jack |    4 | bbb  || bbb | jack |    5 | bbb  || ccc | mike |    6 | ccc  || ddd | mike | NULL | NULL |+-----+------+------+------+7 rows in set (0.00 sec)

The following figure shows an image of 'SQL jobs' on the internet. if your rights are violated, please let us know how to delete it. thank you.

2. WHERE

Filter the temporary tables generated during the VT1 process. columns that meet the WHERE clause are inserted into the VT2 table.

Note:

In this case, aggregation cannot be used because of grouping, and aliases created in SELECT cannot be used;

Differences from ON:

If an external column exists, ON filters associated tables, and the primary table (retained table) returns all columns;

If no external columns are added, the results are the same;

Application:

Filter the primary table in the WHERE clause;

For associated tables, if conditions are queried first and then connected, ON is used. if conditions are queried first, WHERE is used;

mysql> SELECT    -> *    -> FROM    -> table1 AS a    -> LEFT OUTER JOIN table2 AS b ON a.uid = b.uid    -> WHERE    -> a. NAME = 'mike';+-----+------+------+------+| uid | name | oid  | uid  |+-----+------+------+------+| aaa | mike |    1 | aaa  || aaa | mike |    2 | aaa  || ccc | mike |    6 | ccc  || ddd | mike | NULL | NULL |+-----+------+------+------+4 rows in set (0.00 sec)

3. GROUP

This clause groups the tables generated in VT2 according to the columns in group. Generate a VT3 table.

Note:

The statements in the post-processing process, such as SELECT and HAVING, must contain columns in group by. aggregate functions must be used for those that do not exist;

Cause:

Group by changes the table reference and converts it to a new reference method. the columns that can perform the next-level logical operation on the table will be reduced;

My understanding is:

Based on the grouping field, records with the same grouping field are merged into one record, because only one record can be returned for each group, unless it is filtered out, fields that are not in the group field may have multiple values, and multiple values cannot be placed in one record. Therefore, you must use the aggregate function to convert these columns with multiple values into single values;

mysql> SELECT    -> *    -> FROM    -> table1 AS a    -> LEFT OUTER JOIN table2 AS b ON a.uid = b.uid    -> WHERE    -> a. NAME = 'mike'    -> GROUP BY    -> a.uid;+-----+------+------+------+| uid | name | oid  | uid  |+-----+------+------+------+| aaa | mike |    1 | aaa  || ccc | mike |    6 | ccc  || ddd | mike | NULL | NULL |+-----+------+------+------+3 rows in set (0.00 sec)

4. HAVING

This clause filters different groups in the VT3 table and applies only to the grouped data. clauses that meet the HAVING condition are added to the VT4 table.

mysql> SELECT    -> *    -> FROM    -> table1 AS a    -> LEFT OUTER JOIN table2 AS b ON a.uid = b.uid    -> WHERE    -> a. NAME = 'mike'    -> GROUP BY    -> a.uid    -> HAVING    -> count(b.oid) < 2;+-----+------+------+------+| uid | name | oid  | uid  |+-----+------+------+------+| ccc | mike |    6 | ccc  || ddd | mike | NULL | NULL |+-----+------+------+------+2 rows in set (0.00 sec)

5. SELECT

This clause processes the elements in the SELECT clause and generates a VT5 table.

(5-J1) calculate the expression in the SELECT clause and generate the VT5-J1

(5-J2) DISTINCT

Looks for duplicate columns in the VT5-1 and removes them to generate the VT5-J2

If the DISTINCT clause is specified in the query, a temporary memory table is created (if the memory cannot be stored, it needs to be stored on the hard disk ). The table structure of this temporary table is the same as that of the virtual table VT5 generated in the previous step. The difference is that a unique index is added to the column for the DISTINCT operation to remove duplicate data.

mysql> SELECT    -> a.uid,    -> count(b.oid) AS total    -> FROM    -> table1 AS a    -> LEFT OUTER JOIN table2 AS b ON a.uid = b.uid    -> WHERE    -> a. NAME = 'mike'    -> GROUP BY    -> a.uid    -> HAVING    -> count(b.oid) < 2;+-----+-------+| uid | total |+-----+-------+| ccc |     1 || ddd |     0 |+-----+-------+2 rows in set (0.00 sec)

6. ORDER

From a table in the VT5-J2, sort the results according to the conditions of the order by clause to generate a VT6 table.

Note:

The only place where the alias in SELECT can be used;

mysql> SELECT    -> a.uid,    -> count(b.oid) AS total    -> FROM    -> table1 AS a    -> LEFT OUTER JOIN table2 AS b ON a.uid = b.uid    -> WHERE    -> a. NAME = 'mike'    -> GROUP BY    -> a.uid    -> HAVING    -> count(b.oid) < 2    -> ORDER BY    -> total DESC;+-----+-------+| uid | total |+-----+-------+| ccc |     1 || ddd |     0 |+-----+-------+2 rows in set (0.00 sec)

7. LIMIT

The LIMIT clause selects the specified row data starting from the specified position from the VT6 virtual table obtained in the previous step.

Note:

The positive and negative effects of offset and rows;

When the offset is large, the efficiency is very low. you can do this:

The subquery is optimized. in the subquery, the maximum id is obtained from the index, and the result set of N rows is obtained in reverse order.

Using inner join optimization, the JOIN clause first obtains the ID list from the index and then directly associates the query to obtain the final result.

mysql> SELECT    -> a.uid,    -> count(b.oid) AS total    -> FROM    -> table1 AS a    -> LEFT JOIN table2 AS b ON a.uid = b.uid    -> WHERE    -> a. NAME = 'mike'    -> GROUP BY    -> a.uid    -> HAVING    -> count(b.oid) < 2    -> ORDER BY    -> total DESC    -> LIMIT 1;+-----+-------+| uid | total |+-----+-------+| ccc |     1 |+-----+-------+1 row in set (0.00 sec)

So far, the SQL parsing journey is over. to sum up:

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More

SQL parsing sequence _ MySQL

Contact Us

What's Trending

Top 10 Tags

Top 10 Keywords

A Free Trial That Lets You Build Big!

Sales Support

After-Sales Support

SQL parsing sequence _ MySQL

Contact Us

What's Trending

Top 10 Tags

Top 10 Keywords

Trending Topic

A Free Trial That Lets You Build Big!

Sales Support

After-Sales Support