SQL query operation process and SQL operation process

Source: Internet
Author: User

SQL query operation process and SQL operation process
Here we can see the Query Processing of SQL in the book. In SQL, the query is achieved through the select statement and some auxiliary clauses. Here mysql is used. First, to understand the difference between a physical table and a virtual table, a physical table is one or more files stored in a file system. Each row of data is stored according to the corresponding storage data structure, A virtual table is built on the basis of a physical table. It may be a full table structure, or some fields or rows in the table, it may even be the result of some fields in the table after some calculation. However, unlike C/C ++/JAVA, the main difference between SQL is that SQL gives you a requirement and does not care how it is implemented. The real implementation is done by the database, of course, different databases may have very different implementation methods. For languages such as C/C ++, we must implement them step by step to complete the specified functions. Therefore, we do not need to care about the internal implementation process of SQL, but the execution process and sequence of the query request have a great impact on our understanding of the results, however, the real execution process must be in-depth inside the database to understand the methods used (such as indexes) for optimization. So here we will focus on the execution sequence of each clause in the query request. The results of the standard SQL query statement are as follows: (8) SELECT x, xx, xxx, xxxx \ (9) DISTINCT <x, xx,...> \ (1) FROM <table1, table2, table3,...> \ (3) <join_type> JOIN <join_table> \ (2) ON <join_condition> \ (4) WHERE <where_condition> \ (5) group by <group_by_list> \ (6) WITH [CUBE | ROLLUP] \ (7) HAVING The execution order of each clause has been marked above. Before we introduce these clauses one by one, we need to clarify first, the table names, conditions (ON, WHERE, and HAVING), and field names (SELECT, DISTINCT, order by, and group by) are input for executing this command ), the table name identifies a physically stored table, and the field name is the name of each table field. The condition is the logical expression that is applicable to the field or the clustering function executed on the field. Each step of these clauses generates a logical relational table. 1) FROM operation. The input in this step is multiple tables. In this step, perform the Cartesian Product Operation on multiple Relational Tables. The generated result is VT1, suppose there are three tables: t1, t2, and t3. The three tables have rows r1, r2, and r3 respectively. Each table has c1, c2, and c3 fields, then, the VT1 obtained by executing the Cartesian product contains r1 * r2 * r3 and c1 + c2 + c3 fields. 2) this step is performed based ON the VT generated in the first step. The logical expression after ON is a judgment for each row ON the VT1 table, we will remove rows that do not meet the ON condition, but in general, the result returned by the logical operation is always TRUE or FALSE. However, there may be a third value in SQL, that is NULL. This is generally considered NULL (note, it is not a NULL String). It is neither TRUE nor FALSE. Let's look at it: mysql> select NULL = NULL \ G **************************** 1. row ************************** NULL = NULL: NULL
Mysql> select 0 = NULL \ G **************************** 1. row ************************** 0 = NULL: NULL
Mysql> select 1 = NULL \ G **************************** 1. row *************************** 1 = NULL: NULL
Mysql> select NULL! = NULL \ G *************************** 1. row *************************** NULL! = NULL: NULL

Therefore, we can regard NULL as an unknown state, so the two unknown states are not equal. However, when determining the condition, we need to separate TRUE or FALSE, because we need to determine whether to eliminate this line based ON this result. In the ON clause, NULL is regarded as FALSE, therefore, if the field of a row is NULL, it will be eliminated. For example:


Next, let's take a look at the result of executing the ON operation after executing the Cartesian product of this table:


We can see that the VT1 table generated after the Cartesian product is executed should contain nine rows, of which t2. B should contain three types: 'hello', 'World, and 'null'. Here, only 'World' is returned ', so NULL is being determined! = 'Hello', although NULL is returned, the ON Clause treats it as FALSE. Therefore, after the operation of the ON clause, table 2 is obtained, which only contains rows that meet the logic operation after the ON clause to obtain TRUE (the calculation result is FALSE or NULL ). 3) In this step, the operation to add external rows is performed. Here, external JOIN operations are performed, because external JOIN requires that each row in the table ON the left or right of the table to be connected appear in VT2. But what if this row is eliminated during the ON operation? Then, add the rows in the upper left or right table based on the left or right connections. Then, add NULL for other fields to obtain VT3.

As shown in the preceding example, after Cartesian Product calculation, a total of nine rows are generated, and then the ON operation is executed. Only three rows meet the conditions, and the condition is t2. B! = 'Hello', which causes only the Row B = 'hello' left in Table t2, as shown in the preceding figure. However, if the right outer join operation is performed, then you need to add each row missing from Table t2 (because table t2 is the table on the right of the Join Operation) (two rows need to be added here ), for all fields except the t2 field in the two rows, add NULL. If you change t1 and t2 in the above example, and then change right join to left join, you can get a similar result, but the added NULL appears on the right. 4) this step is the WHERE clause we are most familiar with. This step is to implement conditional Filtering Based on VT3. Similarly, only rows with TRUE calculated by the condition are retained, the rows whose result is FALSE and NULL are eliminated. Mark the returned result as VT4.

However, when executing the WHERE clause, you must pay attention to the following issues: 1. Because the WHERE clause is executed in Step 4, the group by clause is not executed at this time, therefore, you cannot use the result after the aggregate function operation in the WHERE clause as the input of the conditional operation. This will cause execution errors. Mysql> select * from T1 as t1 right join T1 as t2 on t2. B! = 'Hello' where count (t1. B )! = 1;
ERROR 1111 (HY000): Invalid use of group function 2. Because the SELECT operation is not executed when the WHERE operation is executed, therefore, the as alias in the SELECT operation cannot be used in the WHERE clause. Mysql> select t1.a as A from T1 as t1 right join T1 as t2 on t2. B! = 'Hello'! = 10; ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'a! = 10' at line 1 3. make it clear that the WHERE statement is executed after the missing row is added, and the ON statement is executed before the missing row is added, therefore, if the same logic is used in ON and WHERE, the results may not be the same. 5) in this step, grouping is performed. Based on VT4, grouping is performed based on the columns determined by group by. In the group by clause, all NULL values are considered the same, instead of NULL! = NULL, which is divided into one group, for example:

6) The with clause can be divided into two types: ROLLUP and CUBE. We basically cannot use these two types, and CUBE is not supported in mysql. ROLLUP only adds an additional row to the result.

The role of ROLLUP is not discussed here. Assume that the result is VT67.) execute the HAVING clause, this clause is used to determine each GROUP after aggregation (the same behavior GROUP in the column specified by group by). This is different from the ON and WHERE clauses, both determine each row in the result of the previous operation, while HAVING is for each group. This can be the judgment of one or more fields, or the judgment of the aggregate function executed in each group. The generated result is VT7. However, when executing HAVING, you must note that although the execution column is NULL in group by, the execution column is divided into the same GROUP, but when a column is NULL, the count operation of this column is considered as 0, for example:

However, when calculating count (*), a row is counted as one even if it is NULL. For example:
This requires special attention in specific applications, especially when using outer join, some NULL values are added, probably because count (*) is used instead of count (field name) result In incorrect count. 8) This step executes the SELECT Operation, and then the SELECT statement appears at the beginning of the query statement, but it is executed only in Step 8, this step selects the columns specified by SELECT on VT7. Generate VT8. In addition, the alias of the preceding field cannot be used in SELECT, that is, the alias of SELECT is valid only after the entire SELECT clause is executed.

This requires special attention in specific applications, especially when using outer join, some NULL values are added, probably because count (*) is used instead of count (field name) result In incorrect count. 8) This step executes the SELECT Operation, and then the SELECT statement appears at the beginning of the query statement, but it is executed only in Step 8, this step selects the columns specified by SELECT on VT7. Generate VT8. In addition, the alias of the preceding field cannot be used in SELECT, that is, the alias of SELECT is valid only after the entire SELECT clause is executed.

11). The last step is OFFSET... in the VT10 table generated above, select the specified number of rows starting from the specified OFFSET. The OFFSET can also be omitted and written as LIMIT m, n. it has the same meaning as LIMIT n OFFSET m.
Well, the general SQL query statements are executed in this order. Of course, the specific logic query and physical query solutions are not involved here. It only introduces the execution sequence of each step and its role, of course, some mysql rules are also involved. Before using SQL query statements, you must first sort the order and then consider the query statements to get the expected results.

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.