SQL query Operations Processing flow

Source: Internet
Author: User
Tags joins

here to see the book on the SQL query processing, in SQL query is through the SELECT statement and some auxiliary clauses to achieve the required output, here is the use of MySQL, first of all, to understand the difference between physical and virtual tables, physical table is stored on the file system of one or more files, According to the corresponding storage data structure, each row of data is stored, the virtual table is we build on the basis of the physical table, it may be all the table structure, it may be part of the table or part of the row, or even some of the table field after some kind of operation results. But the SQL language is not like c/c++/java such language, their main difference is that SQL gives you a need, do not care how it is implemented, the real implementation is done by the database, of course, different databases may have a large difference in the implementation of the language, and C + + such languages, We have to implement the specified function in a step-by-step implementation. So we don't have to worry about the internal implementation of SQL, but for query requests, its execution and execution order have a big impact on our understanding of the results, but the real execution process needs to go deep inside the database to understand which methods (such as index) optimization are used, So here we focus on the order of execution of each clause in the query request. The results of the standard SQL query statements are as follows: (8) SELECT x, xx, xxx, xxxx \ (9) DISTINCT <x, xx,... > \ (1) from <tab Le1, Table2, table3,... > \ (3) <join_type>JOIN<join_table> \ (2) on <join_conditio N> \ (4) WHERE <where_condition> \ (5) GROUP by <group_by_list> \ (6) with[ CUBE | ROLLUP] \ (7) having The order in which each clause is executed is already marked before, and before we introduce these clauses, we need to understand that the input to execute this command is these table names, conditions (on, where and having), and some field names (Select,distinct, ORDER by, GROUP by), where the table name identifies the physically stored table, and the field name is the name of the field for each table, conditional on the logical expression of the aggregation function that is performed on the field or on that field. Each step of these clauses produces a logical relational table. 1), from operation, this step of the input is a number of tables, in this step is to perform Cartesian product operations on multiple relational tables, the resulting result is VT1, assuming there are 3 tables, T1, T2 and T3, three tables have R1, R2, R3 rows, each table has C1, C2, C3 fields, Then the VT1 that executes the Cartesian product have r1*r2*r3 rows and c1+c2+c3 fields. 2), this step is performed on the basis of the first step of the VT generated by the judgment operation, on the following logical expression is a VT1 table on each row of the judgment, we will eliminate the rows that do not meet the on condition, but in general, we do the logical operation returned by the result is always true or false, But in SQL, there may be a third value, that is null, this is generally considered null (note, not an empty string), it is neither true nor false, let's see:mysql> select null = NULL \g***************** 1. Row ***************************null = Null:null
Mysql> Select 0 = NULL \g*************************** 1. Row ***************************0 = Null:null
Mysql> Select 1 = NULL \g*************************** 1. Row ***************************1 = Null:null
Mysql> SELECT null! = NULL \g*************************** 1. Row ***************************null! = Null:null

So, we can think of NULL as an unknown state, so two unknown states are not equal. But when the condition is judged, we are going to tell true or false, because we are going to judge whether this line is eliminated by this result, and in the ON clause, NULL is considered false, so if the field of a row is null, it will be eliminated. For example:


Next, let's take a look at the result of executing the on operation after executing the Cartesian product of this table:


As you can see, the VT1 table generated after the Cartesian product should contain 9 rows, where the t2.b should contain ' hello ', ' world ' and ' null three ', where only ' world ' is returned, so when the null! = ' Hello ' is returned as NULL,     But the ON clause considers it false. So after the operation of the ON clause, we get table 2, which contains only the rows that satisfy the logical operation after on to get true (elimination of the rows that evaluates to False or null). 3), this step is to add the operation of the external row, this is for the external join, because the external join needs to ensure that the left side of the table or the right side of the table each row appears in the VT2, but if the on operation of this line is eliminated? Then add, just based on the left JOIN or right connection to add to the left side or the table on the right side of the missing row, and then the other fields to add null can be, get the VT3.

As can be seen from the above example, after the Cartesian product calculation, a total of 9 rows, and then perform the on operation, only 3 rows to meet the condition, and the condition is t2.b!= ' hello ', which resulted in the T2 table only the b= ' hello ' of the line, as shown in the previous figure, However, if you do a right outer JOIN operation, you need to add the T2 table (because T2 is the table on the right side of the join operation) and every row that is missing (two rows are added here), and for those two rows except the T2 field, add null. If you change the T1 and T2 in the example above, then the right join is replaced with a LEFT join, you can get a similar result, except that the added null is now the left side. 4), this step is the most familiar WHERE clause, this step is to perform conditional filtering on the basis of VT3, similarly, only the rows that the condition evaluates to True are left, and the rows that result in false and Null are eliminated. Marks the returned result as VT4.

However, there are some problems to be aware of when the WHERE clause is executed: 1, because where is the fourth step, and the GROUP BY clause is not executed at this time, it is not possible to use the result of the aggregate function operation in the WHERE clause as input to the conditional operation, which will result in an execution error. Mysql> SELECT * from T1 as T1 right joins T1 as t2 on t2.b! = ' Hello ' where count (t1.b)! = 1;
ERROR 1111 (HY000): Invalid use of group function 2, the as alias that appears in the select operation cannot be used in the WHERE clause because there is no execution to the select operation when the where is executed.     Mysql> Select T1.a as a from T1 as T1 right joins T1 as t2 on t2.b! = ' Hello ' A! = 10; Error 1064 (42000): You have a error in your SQL syntax;     Check the manual-corresponds to your MySQL server version for the right syntax-use near ' A! = ' at line 1 3. To make it clear that the where is executed after adding the missing rows, and on execution before adding the missing rows, the result may not be the same if the same logical calculation is used on and where. 5), this step is a grouping of operations, on the basis of VT4, based on the group by the column determined by grouping, in the GROUP BY clause will be considered all null is the same, instead of NULL! = NULL, they are divided into a group, for example:

6), with clauses are divided into rollup and cube Two, these two we basically do not, cube in MySQL also does not support, rollup just in the results to add an additional line.

This does not address the role of rollup, assuming that it produces a result of VT67), the execution of a HAVING clause, which is also the function of each group after aggregation (in the group by the column specified by the same behavior of a set), which is distinguished with the on and WHERE clauses, They are all judged on each row in the result of the previous operation, and the having is for each group. This can be a judgment of one or more fields, or it can be the judgment of a aggregation function in each group.     The result of the build is VT7. However, it is important to note that while a group by IS dividing the execution column to the same set of NULL, the count operation of the execution is considered to be 0, for example:

However, when you calculate the count (*), a row, even if it is null, is counted as one line, for example:
This requires particular attention in specific applications, especially when using the outer join, which adds some null, possibly due to an incorrect count by using count (*) instead of count (field name). 8), this step is performed by the select operation, then select appears at the top of the query statement, but it is the eighth step to be executed, this step is to select the column on VT7. Generate VT8. In addition, the alias of the preceding field cannot be used in select, which means that the alias of select is valid only after the entire SELECT clause has been executed.

This requires particular attention in specific applications, especially when using the outer join, which adds some null, possibly due to an incorrect count by using count (*) instead of count (field name). 8), this step is performed by the select operation, then select appears at the top of the query statement, but it is the eighth step to be executed, this step is to select the column on VT7. Generate VT8. In addition, the alias of the preceding field cannot be used in select, which means that the alias of select is valid only after the entire SELECT clause has been executed.

11), the last step is to perform the offset ... The limit clause, which is the VT10 table generated above, selects the specified number of rows from the specified offset, and offset can be omitted, written as limit m, N. It has the same meaning as limit n OFFSET m.
Well, the general SQL query statements are executed in this order, of course, the specific logical query and physical query scheme here we are not involved, just to introduce each step of the execution sequence and its role, of course, it also involves some rules in MySQL, the future use of SQL query statements before the first to correct this order , and then consider what query statements to use to get the desired results.

SQL query Operations Processing flow

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.