Detailed summary of the keyword parsing process of SQL statements

Source: Internet
Author: User

Recently, we need to do some research on SQL query performance improvement, so we have studied the SQL statement solution process. I read it in the garden and wrote many related articles, with different focuses. This article is summarized on the mobile phone after reading various materials. It will detail the parsing process of each keyword of an SQL statement step by step. You are welcome to learn from each other.

SQL statement parsing Sequence

Simply put, an SQL statement is parsed in the following order:

1. FROMThe table following the FROM statement identifies the data source to be queried. And some clauses such as (1-J1) Cartesian Product, (1-J2) ON filter, (1-J3) add external columns, object to be applied. A virtual table VT1 is generated after the FROM process.
(1-J1) Cartesian ProductThis step calculates the Cartesian Product (cross join) of two associated tables and generates the virtual table VT1-J1.
(1-J2) ON FilterThis step filters based ON the virtual table VT1-J1, which generates a virtual table VT1-J2 by filtering out all columns that meet the ON predicates.
(1-J3) add external rowsIf an external join is used, columns that do not meet the ON condition in the table are retained to the VT1-J2 as external rows to generate a virtual table VT1-J3.
2. WHEREFilter the temporary tables generated during the VT1 process. columns that meet the where clause are inserted into the VT2 table.
3. GROUPThis clause groups the tables generated in VT2 according to the columns in group. Generate a VT3 table.
4. HAVINGThis clause filters different groups in the VT3 table. clauses that meet the HAVING condition are added to the VT4 table.
5. SELECTThis clause processes the elements in the SELECT clause and generates a VT5 table.
(5-1) Calculation ExpressionCalculate the expression in the SELECT clause to generate the VT5-1
(5-2)DISTINCT looks for duplicate columns in the VT5-1 and removes them to generate the VT5-2
(5-3)TOP: filter the columns that meet the conditions from the results defined BY the order by clause. Generate a VT5-3 table
ORDERFrom the table in the VT5-3, sort the results according to the conditions of the order by clause to generate the VC6 table.

Customer and order query example

First, create a mers MERs table and insert the following data:

Customerid City
FISSA Madrid
FRNDO Madrid
KRLOS Madrid
MRPHS Zion

Create an Orders table and insert the following data:

Orderid Customerid
1 FRNDO
2 FRNDO
3 KRLOS
4 KRLOS
5 KRLOS
6 MRPHS
7 NULL
 

If we want to query customers from Madrid who have less than 3 orders and display their orders, the results will be sorted in ascending order.
Copy codeThe Code is as follows:
Select c. customerid, COUNT (O. orderid) AS numorders
FROM dbo. Customers AS C
Left outer join dbo. Orders AS O
On c. customerid = O. customerid
Where c. city = 'Madrid'
Group by c. customerid
Having count (O. orderid) <3
Order by numorders

The query result is:

Customerid Numorders
FISSA 0
FRNDO 2

The following describes in detail how SQL calculates the result:

FROM clause

The FROM clause identifies the table to be queried. If a table operation is specified, it is processed FROM left to right. Each table operation based on one or two tables returns an output table. The output result of the Left table is used as the input result for the next table operation. For example, table submission operations include (1-J1) Cartesian Product, (1-J2) ON filter, and (1-J3) add external columns. FROM sentence Generation Virtual table VT1.

Step 1-J1: Execute Cartesian Product (cross join)

Cartesian product lists all possible combinations in each row of the left and right tables to generate a table VT1-J1. If the left table has m columns and the right table has n columns, then the VT1-J1 table generated after Descartes will have m × n columns.

Step 1-J1 is equivalent to executing:

SELECT * from Customers C CROSS JOIN Orders O

The execution result is: (a total of 4 × 7 columns)

C. customerid C. city O. orderid O. customerid
FISSA Madrid 1 FRNDO
FISSA Madrid 2 FRNDO
FISSA Madrid 3 KRLOS
FISSA Madrid 4 KRLOS
FISSA Madrid 5 KRLOS
FISSA Madrid 6 MRPHS
FISSA Madrid 7 NULL
FRNDO Madrid 1 FRNDO
FRNDO Madrid 2 FRNDO
FRNDO Madrid 3 KRLOS
FRNDO Madrid 4 KRLOS
FRNDO Madrid 5 KRLOS
FRNDO Madrid 6 MRPHS
FRNDO Madrid 7 NULL
KRLOS Madrid 1 FRNDO
KRLOS Madrid 2 FRNDO
KRLOS Madrid 3 KRLOS
KRLOS Madrid 4 KRLOS
KRLOS Madrid 5 KRLOS
KRLOS Madrid 6 MRPHS
KRLOS Madrid 7 NULL
MRPHS Zion 1 FRNDO
MRPHS Zion 2 FRNDO
MRPHS Zion 3 KRLOS
MRPHS Zion 4 KRLOS
MRPHS Zion 5 KRLOS
MRPHS Zion 6 MRPHS
MRPHS Zion 7 NULL
Step 1-J2: Apply ON filtering (JOIN condition)

The ON filter condition is the most advanced among the three SQL filter conditions (ON, WHERE, HAVING), and The ON filter condition is applied to the virtual table (VT1-J1) generated in the previous step ), rows that meet the ON filter condition are added to the virtual table VT1-J2. After the ON filter is applied, the generated VT1-J2 table is as follows:

C. customerid C. city O. orderid O. customerid
FRNDO Madrid 1 FRNDO
FRNDO Madrid 2 FRNDO
KRLOS Madrid 3 KRLOS
KRLOS Madrid 4 KRLOS
KRLOS Madrid 5 KRLOS
MRPHS Zion 6 MRPHS
Step 1-J3: Add external columns

This step only appears when an external connection is used. For Outer Join (LEFT, RIGHT, or FULL), you can mark one or two tables as reserved tables. As a reserved table, you want all columns in the table to be returned, even if the data in the table does not meet the filtering conditions of the on clause. Left outer join marks the table on the LEFT as a reserved table, rightouter join regards the table on the right as a reserved table, and full outer join marks both tables as reserved tables. step 1-J3 adds a column that does not meet the ON condition in the keep table based ON the virtual table in the VT1-J2 and does not have a corresponding column in The unretained table, so it is marked as NULL. This process generates a virtual table VT1-J3.

C. customerid C. city O. orderid O. customerid
FISSA Madrid NULL NULL
FRNDO Madrid 1 FRNDO
FRNDO Madrid 2 FRNDO
KRLOS Madrid 3 KRLOS
KRLOS Madrid 4 KRLOS
KRLOS Madrid 5 KRLOS
MRPHS Zion 6 MRPHS

If the FROM clause contains multiple table operation operations, the SQL statement is processed FROM left to right, and the temporary table result generated on the left is used as the input table of the right table.

Step 2 WHERE clause

The WHERE filter is applied to the temporary table generated in the previous step. A temporary table VT2 is generated based on the WHERE filter condition.

Note: Since the data has not yet been grouped, you cannot use aggregate operations now-for example, you cannot use such a sentence WHERE orderdate = MAX (orderdate ). In addition, you cannot use the variable alias created in the SELECT clause, because the SELECT clause has not been processed yet-for example, you cannot write such a sentence: select year (orderdate) AS orderyear... WHERE orderyear> 2008.

Apply this filter

WHERE C.city = 'Madrid'

The temporary table VT2 is generated as follows:

C. customerid C. city O. orderid O. customerid
FISSA Madrid NULL NULL
FRNDO Madrid 1 FRNDO
FRNDO Madrid 2 FRNDO
KRLOS Madrid 3 KRLOS
KRLOS Madrid 4 KRLOS
KRLOS Madrid 5 KRLOS

In this example, you need to use on c. customerid = O. customerid filtering. Customers without orders are filtered out in step 1-J2, but in step 1-J3, they are added as external columns. However, since you only want to return customers from Madrid, you need to filter the city (where c. city = 'Madrid '). If you put it in the ON filter, customers that do not belong to Madrid will be added back in the Add external column.

The difference between ON and WHERE must be explained here. The main difference between ON and WHERE is that ON is filtered before external columns are added, and WHERE is followed. Columns filtered out by ON are added in 1-j3. If you do not need to add external columns, the two filters are the same.

Step 3 group by clause

This clause groups the data in the temporary table generated in the previous step. Each row is grouped into only one group to generate the virtual table VT3. The VT3 table contains all the data in the VT2 table and the Group Identifier.

The contents of the temporary table VT3 generated here are as follows:

Groups
C. customerid
C. customerid C. city O. orderid O. customerid
FISSA FISSA Madrid NULL NULL
FRNDO FRNDO Madrid 1 FRNDO
FRNDO Madrid 2 FRNDO
  KRLOS Madrid 3 KRLOS
KRLOS KRLOS Madrid 4 KRLOS
  KRLOS Madrid 5 KRLOS

In the final result returned BY SQL, each GROUP must return only one row (unless filtered out). Therefore, when group by is used in an SQL statement, the clause processed after GROUP BY is used, for example, SELECT and HAVING clauses can only use columns after group by. For columns without group by, Aggregate functions (such as MAX, MIN, COUNT, AVG) to ensure that each GROUP returns only one row.

Step 4 HAVING clause

The HAVING clause is used to filter the temporary tables generated in the previous step and only apply to the grouped data. Groups that meet the HAVING condition are added to the virtual table VT4.

When this filter is applied:
Copy codeThe Code is as follows:
Having count (O. orderid) <3

The generated VT4 table is as follows:

Groups
C. customerid
C. customerid C. city O. orderid O. customerid
FISSA FISSA Madrid NULL NULL
FRNDO FRNDO Madrid 1 FRNDO
  FRNDO Madrid 2 FRNDO

Note that COUNT (O. orderid) instead of COUNT (*). Because an external column is added to this query, the COUNT method ignores NULL columns, resulting in unwanted results.

Step 5 SELECT clause

Even if it appears at the beginning of the SQL statement, the SELECT statement is processed in step 5, and the table returned by the SELECT clause is finally returned to the caller. This clause contains three substages: (5-1) calculation expression, (5-2) Processing DISTINCT, (5-3) Applying TOP filter.

Step 5-1 Calculation Expression

The expression in the select clause can return or operate the basic columns returned from the table in the previous step. If this SQL statement is an aggregate query, after Step 3, you can only use columns in GROUP BY, and must use aggregate operations for columns not in the GROUP set. An alias for a basic column that does not belong to a FROM table, such as year (orderdate) AS orderyear.

Note:Aliases created in the SELECT clause cannot be used in the previous Step, even in the SELECT clause. The reason is that many SQL operations are performed simultaneously (all at once operation). As for what is all-at-once operation, we will not introduce it here. Therefore, aliases created in the SELECT clause can only be used in subsequent clauses, such as order. For example, select year (orderdate) AS orderyear... order by orderyear.

In this example:
Copy codeThe Code is as follows:
Select c. customerid, COUNT (O. orderid) AS numorders

The result is a virtual table VT5-1:

C. customerid Numorders
FIFSSA 0
FRNDO 2

Step 5-2: Apply the DISTINCT clause

If DISTINCT is used in SQL statements, SQL removes duplicate columns to generate a virtual table VT5-2.

Step 5-3: Application TOP options

The TOP option is a feature provided by the T-SQL to indicate how many rows are displayed. Based on the ORDER defined BY the order by clause, the specified number of columns will be queried. This process generates a virtual table VT5-3.

As mentioned above, this step depends on the ORDER defined by order by to determine which columns should be displayed in the front. If you do not specify the order by order of results or use the with ties clause, the returned results may be inconsistent.

In our example, Step 5-3 is omitted because the TOP keyword is not used.

Step 6: order by clause

The virtual table returned in the previous step is ordered in this step. The cursor VC6 is returned based on the ORDER specified BY the order by clause. The order by clause is also the only place where aliases can be created using the SELECT clause.

Note:The difference between this step and the previous step is that the result returned by this step is a cursor, not a table. SQL is based on the set theory. A set does not define the sequence of its rows. It is only a logical set of members, so the sequence of members is not important. An SQL statement with an ORDER BY clause returns an object that organizes each row in a specific sequence. ANSI calls such an object a cursor. Understanding this is very important for you to understand SQL.

The above steps:

The main content of this book is reference Inside Microsoft SQL Server 2008: T-SQL Query, in the content, if you want to have a deep understanding of SQL Query related knowledge, you can look for this book to see, I have a pdf of the original English version. If you need it, contact me.

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.