Detailed summary of parsing process for each keyword of SQL statement _mssql

Source: Internet
Author: User
Tags aliases microsoft sql server one table reserved

Because of the recent need to do some research on the performance improvement of SQL query, I studied the solution of SQL statements. Read in the garden, we wrote a lot of related articles, we have different emphasis. This article is I read a variety of data after the phone summary, will be detailed, step-by-step about a SQL statement of each keyword parsing process, welcome to learn from each other.

parsing Order of SQL statements

Simply put, an SQL statement is parsed in the following order:

1. fromThe table following the from identifies the data source to query for this statement. And some clauses such as, (1-J1) Cartesian product, (1-J2) on filter, (1-J3) Add the outer column, the object to be applied. A virtual table VT1 is generated after the from procedure.
(1-j1) Cartesian productThis step calculates the Cartesian product (CROSS JOIN) of the two associated tables and generates the virtual table vt1-j1.
(1-J2) on filterThis step filters out all the columns that satisfy the on predicate condition based on the virtual table vt1-j1 this virtual table, and generates the virtual table vt1-j2.
(1-J3) adding external rowsIf an outer join is used, a column in the reserved table that does not conform to the on condition is added to the vt1-j2 as the outer row, generating the virtual table vt1-j3.
2. WHEREFilters the temporary tables generated during the VT1 process, and the columns that satisfy the WHERE clause are inserted into the VT2 table.
3. GROUP byThis clause groups the tables generated in the VT2 according to the columns in group by. Generates VT3 tables.
4. havingThis clause filters the different groups in the VT3 table, and clauses that satisfy the having condition are added to the VT4 table.
5. SELECTThis clause processes the elements in the SELECT clause to generate a VT5 table.
(5-1) Calculation of expressionEvaluates an expression in the SELECT clause to generate VT5-1
(5-2)DISTINCT Look for duplicate columns in the vt5-1 and delete them, generating vt5-2
(5-3)Top filters out the eligible columns from the results defined by the ORDER BY clause. Generate Vt5-3 Table
ORDER BYFrom the table in vt5-3, the results are sorted according to the conditions of the ORDER BY clause, and the VC6 table is generated.

customer, order query example

First create a Customers table and insert the following data:

CustomerID City
Fissa Madrid
Frndo Madrid
Krlos Madrid
Mrphs Zion

Create an Orders table and insert the following data:

OrderID CustomerID
1 Frndo
2 Frndo
3 Krlos
4 Krlos
5 Krlos
6 Mrphs
7 Null

If we want to query customers from Madrid with orders that are less than 3, and display their order numbers, the results are sorted from small to large orders.

Copy Code code as follows:

SELECT C.customerid, COUNT (O.orderid) as Numorders
FROM dbo. Customers as C
Left OUTER JOIN dbo. Orders as O
On c.customerid = O.customerid
WHERE c.city = ' Madrid '
GROUP by C.customerid
Having COUNT (O.orderid) < 3
ORDER BY Numorders

The results of the query are:

CustomerID Numorders
Fissa 0
Frndo 2

Here's a detailed account of how SQL calculates this result:

From clause

The FROM clause identifies the table that needs to be queried, and if a table operation is specified, it is handled from left to right, and each table operation based on one or two tables returns an output table. The output from the table on the left will be the result of the next table operation. For example, the operations associated with the table (1-J1) Cartesian product, (1-J2) on the filter, (1-J3) Add the outer column. The from sentence generates a virtual table VT1.

Step 1-j1: Perform Cartesian product (CROSS JOIN)

The Cartesian product will list all possible combinations of the left and right two tables to generate the table vt1-j1, if the left table has m columns, and the right-hand table has n columns, then the VT1-J1 table generated after the Cartesian product will have the MXN column.

Step 1-j1 This procedure is equivalent to executing :

SELECT * from Customers C CROSS JOIN Orders O

The results are as follows: (a total of 4x7 columns)

C.customerid C.city O.orderid O.customerid
Fissa Madrid 1 Frndo
Fissa Madrid 2 Frndo
Fissa Madrid 3 Krlos
Fissa Madrid 4 Krlos
Fissa Madrid 5 Krlos
Fissa Madrid 6 Mrphs
Fissa Madrid 7 Null
Frndo Madrid 1 Frndo
Frndo Madrid 2 Frndo
Frndo Madrid 3 Krlos
Frndo Madrid 4 Krlos
Frndo Madrid 5 Krlos
Frndo Madrid 6 Mrphs
Frndo Madrid 7 Null
Krlos Madrid 1 Frndo
Krlos Madrid 2 Frndo
Krlos Madrid 3 Krlos
Krlos Madrid 4 Krlos
Krlos Madrid 5 Krlos
Krlos Madrid 6 Mrphs
Krlos Madrid 7 Null
Mrphs Zion 1 Frndo
Mrphs Zion 2 Frndo
Mrphs Zion 3 Krlos
Mrphs Zion 4 Krlos
Mrphs Zion 5 Krlos
Mrphs Zion 6 Mrphs
Mrphs Zion 7 Null
Step 1-j2: Apply on filter (JOIN condition)

The on filter condition is first performed in three filter conditions (on,where,having) of SQL, the on filter condition is applied to the virtual table (VT1-J1) that was generated in the previous step, and rows satisfying on filter conditions are added to the virtual table vt1-j2. After the on filter is applied, the resulting VT1-J2 table looks like this:

C.customerid C.city O.orderid O.customerid
Frndo Madrid 1 Frndo
Frndo Madrid 2 Frndo
Krlos Madrid 3 Krlos
Krlos Madrid 4 Krlos
Krlos Madrid 5 Krlos
Mrphs Zion 6 Mrphs
Step 1-j3: adding external columns

This step only occurs when an outer join is used. For outer joins (Left,right, or full), you can mark one or two tables as reserved tables. As a reserved table means you want all columns in the table to be returned, even if the data inside it does not satisfy the filter on clause. The left OUTER join marks the table on the left-hand side as a reserved table, Rightouter join takes the table on the right as a reserved table, and the full OUTER join marks all two tables as reserved tables. Step 1-J3 is not satisfied on in the reserved table according to the virtual table in Vt1-j2 The column of the condition, which does not have a corresponding column in the unreserved table, is marked as null. This process generates a virtual table vt1-j3.

C.customerid C.city O.orderid O.customerid
Fissa Madrid Null Null
Frndo Madrid 1 Frndo
Frndo Madrid 2 Frndo
Krlos Madrid 3 Krlos
Krlos Madrid 4 Krlos
Krlos Madrid 5 Krlos
Mrphs Zion 6 Mrphs

If there is more than one table operation in the FROM clause, SQL is processed in left-to-right order, and the temporary table result on the left is the input table for the right table.

Step 2 WHERE clause

The where filter is applied to the temporary table generated in the previous step, and the temporary table VT2 is generated based on the where filter condition.

Note: Since the data has not been grouped now, you cannot use aggregation operations-for example: You cannot use such a sentence WHERE OrderDate = MAX (OrderDate). Alternatively, you cannot use the variable alias created in the SELECT clause, because the SELECT clause is not currently processed-for example, you cannot write a sentence like this: Select year (OrderDate) as OrderYear ... WHERE orderyear > 2008.

Apply this filter

WHERE c.city = ' Madrid '

The resulting temporary table VT2 reads as follows:

C.customerid C.city O.orderid O.customerid
Fissa Madrid Null Null
Frndo Madrid 1 Frndo
Frndo Madrid 2 Frndo
Krlos Madrid 3 Krlos
Krlos Madrid 4 Krlos
Krlos Madrid 5 Krlos

In this example, you need to use on C.customerid = O.customerid filter in the ON clause, and the customer without the order is filtered out in the 1-j2 step, but is added back as the outer column in the 1-J3 step. However, since you only want to return customers from Madrid, you need to filter the city in the WHERE clause (where c.city = ' Madrid '), and if you put it in on filter, customers who do not belong to Madrid will be added back in the Add external column.

The difference between on and where is needed to explain here that the main difference between on and where is that on actually filtering before adding the outer column, where is after. On filtered columns are added back in the 1-j3. If you don't need to add an external column, then the two filters are the same.

Step 3 GROUP by clause

This clause groups the data in the temporary table generated in the previous step, and each row is divided into one group and the virtual table VT3 is generated. The VT3 table contains all the data in the VT2 table, and the grouping identifier.

This is the generated temporary table VT3 that reads as follows:

/tr>
Groups
c.customerid
c.customerid c.city o.orderid o.customerid
fissa fissa Madrid NULL NULL
frndo Frndo Madrid 1 frndo
frndo Madrid 2 Frndo
  krlos Madrid 3 krlos
Krlos Krlos Madrid 4 krlos
  krlos Madrid 5 Krlos

In the final result of SQL return, each grouping must return only one row (unless filtered), so when group by is used in an SQL statement, the clauses that are processed after group by, such as the select,having clause, can only use the columns that appear after GROUP by. For columns that do not appear after group BY, you must use aggregate functions (such as MAX, MIN,COUNT,AVG, etc.) to ensure that each group returns only one row.

Step 4 HAVING clause

The HAVING clause is used to filter the temporary tables generated in the previous step and only for the grouped data, and the group that satisfies the having condition is added to the virtual table VT4.

When this filter is applied:

Copy Code code as follows:

Having COUNT (O.orderid) < 3

After that, the resulting VT4 table reads as follows:

Groups
C.customerid
C.customerid C.city O.orderid O.customerid
Fissa Fissa Madrid Null Null
Frndo Frndo Madrid 1 Frndo
Frndo Madrid 2 Frndo

One thing to keep in mind is that it uses count (O.orderid) instead of Count (*), and because the query adds an external column, the Count method ignores the null column, causing the result that you don't want.

Step 5 SELECT clause

Although appearing at the very front of the SQL statement, the select is processed at the fifth step, and the table returned by the SELECT clause is eventually returned to the caller. This clause contains three sub stages: (5-1) evaluates expressions, (5-2) handles distinct, (5-3) applies top filter.

Step 5-1 to evaluate an expression

An expression in the SELECT clause can return or manipulate the basic columns returned in the previous table. If this SQL statement is an aggregate query, after step 3, you can use only columns in group BY, and you must use aggregation operations on columns that are not part of the group collection. The base column that is not part of the from table must have an alias for it, such as year (OrderDate) as OrderYear.

Note: aliases created in the SELECT clause cannot be used in previous step, even in the SELECT clause. The reason is that many of the operations of SQL are simultaneous operations (all at once operation), as for what is all-at-once operation here is no longer introduced. Therefore, aliases created in the SELECT clause can only be used in subsequent clauses, such as order by. For example: SELECT year (OrderDate) as OrderYear ... Order by OrderYear.

In this example:

Copy Code code as follows:

SELECT C.customerid, COUNT (O.orderid) as Numorders

The result will be a virtual table vt5-1:

C.customerid Numorders
Fifssa 0
Frndo 2

Step 5-2: Apply the DISTINCT clause

If Distinct,sql is used in the SQL statement, the duplicate columns are removed and the virtual table vt5-2 is generated.

Step 5-3: Apply Top option

The top option is a feature provided by T-SQL that represents how many rows to display. The specified number of columns is queried based on the order defined by the orders by clause. This process generates virtual table vt5-3.

As mentioned above, this step relies on the order in which the orders are defined to determine which columns should appear in the front. If you do not specify an order by sequence for the result, and you do not use the WITH TIES clause, each return result may be inconsistent.

In our example, step 5-3 was omitted because we did not use the top keyword.

Step 6:order By clause

The virtual table returned in the previous step is sorted in this step, and the cursor VC6 is returned according to the order specified by the orders by clause. The ORDER BY clause is also the only place where an alias can be created using the SELECT clause.

Note: This step differs from the previous one in that the result returned by this step is a cursor, not a table. SQL is based on set theory, a set does not define the order of his rows, it is just a logical set of members, so the order of members is not important. SQL with an ORDER BY clause returns an object that organizes each row in a specific sequence. ANSI calls such an object a cursor. Understanding this is important to your understanding of SQL.

The steps above are shown in the following illustration:

The main content of this book is to refer to Inside Microsoft SQL Server 2008:t-sql query, if you want to learn more about SQL query related knowledge, you can find this book to see, I have the English version of the PDF, I need to be able to find.

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.