SQL Server subquery operation learning summary tutorial

Source: Internet
Author: User
Tags logical operators require scalar

Technical preparation

The database version is SQL Server2008R2 and is analyzed using Microsoft's simpler case Library (Northwind.

1. Independent subquery

The so-called independent subquery method means that the subquery is irrelevant to the primary query. The advantage of this method is that the subquery does not depend on the external query, so it can be evaluated for independent external queries, form your own execution plan.

For example

 

SELECT O1.OrderID, O1.Freight
FROM Orders O1
WHERE O1.Freight>
(
Select avg (O2.Freight)
FROM Orders O2
)

 

The purpose of this SQL statement is to query orders with freight rates greater than the average freight rate.

The clause used to extract the average freight here is a completely independent subquery, which is executed independently without relying on the primary query. At the same time, we use a scalar calculation (AVG) here, so a row is returned.

View the query plan of the statement:

 

 

This query plan is not well described. For the formation of the subquery execution plan, refer to my second article: SQL Server tuning series Basics (common operators)

However, you need to note that the result value (AVG) of stream aggregation and calculation scalar only contains one result value, so this statement can be executed normally.

Let's look at another situation.

 

Select o. OrderID
FROM Orders O
Where o. CustomerID =
(
Select c. CustomerID
FROM MERS C
Where c. ContactName = n' Maria Anders'
)

 

This statement is used to obtain the number of orders of customers named 'Maria Anders. The premise of whether the T-SQL statement can be executed is that there is no "'Maria Anders'" customer in the customer table, if the same name exists, the statement cannot be correctly executed, if only one customer happens to be 'Maria Anders', it can be executed normally.

Let's analyze how the SQL Server can judge whether the SQL Server can correctly execute this execution.

 

 

A new operator named assertion appears here. Let's use the text execution plan to check the main functions of this operator.

 

 

 

 

After the above analysis, we have analyzed the function of the above "assertion" operator, because our subquery statement cannot guarantee that the returned result is a row. Therefore, an asserted operator is introduced here for judgment.

Therefore, the function of assertion is to determine whether the query result of the subquery sentence meets the query requirements of the subject sentence based on the following conditions.

If the assertion finds that the sub-statement is not satisfied, an error is reported directly, for example, Expr1005> 1

 

 

In addition, the assertion operator is often used to detect whether other conditions are met, such as constraints, parameter types, and value lengths.

In fact, the asserted problem to be solved here is to judge whether the value in ContactName in our filtering conditions has repeated values. For such judgment, the relative performance consumption is relatively small, sometimes other complicated assertion operations consume a lot of resources, so we can avoid assertion operations as appropriate.

For example, the preceding statement clearly tells SQL Server that the ContactName column in The contmers table does not have duplicate values and does not need to be asserted. We create a unique and non-clustered index implementation above.

 

Create unique index ContactNameIndex on mers (ContactName)
GO
Select o. OrderID
FROM Orders O
Where o. CustomerID =
(
Select c. CustomerID
FROM MERS C
Where c. ContactName = n' Maria Anders'
)
Drop index Customers. ContactNameIndex
GO

 

 

 

After our only non-clustered index prompt, SQL Server clearly knows that our subquery statements do not return multiple rows, so the assertion operation is removed.

II. Related subqueries

Compared with the preceding independent subquery method, the subquery method is relatively complex, that is, our subquery depends on the results of the primary query. This subquery cannot be executed independently.

Let's take a look at this subquery example.

 

SELECT O1.OrderID
FROM Orders O1
WHERE O1.Freight>
(
Select avg (O2.Freight)
FROM Orders O2
WHERE O2.OrderDate

This statement returns the vertex number whose freight volume is greater than the average value in the previous order.

The statement is simple logic, but the subquery here depends on the results of the primary query, WHERE O2.OrderDate in the filter condition

Let's take a look at the execution plan of this statement.

 

 

Here, the query plan has a new operator: index off-line.

In fact, the function of index dummy offline is mainly used for the independent operation of subqueries, because we know that the subquery conditions here depend on the primary query, so, if you want to run it here, you can obtain the results of the primary query in advance, and the results of the primary query obtained here need an intermediate table for temporary storage, the tool temporarily saved here is: (Index pool) Index Spool, and the operations on this Index pool, such as new and add operations, are the "Index offline" marked above.

There are two types of indexes offline: Eager Spool and Lazy Spool. To put it simply, you do not need to store the results in Index Spool immediately or perform delayed operations.

The Index Spool is stored in the temporary database Tempdb of the system.

We use the text query plan to analyze the values of the two indexes in the dummy offline.

 

 

After the above analysis, we have seen that the Eager Spool in is a result value compared with the main query, because this must be formed in a timely manner, so that the subquery can be carried out, so its type is Eager Spool,

The Index Spool outside the subquery is Lazy Spool, and the result item is saved in a timely manner. This is the result item formed by the subquery, is the average value relative to the freight of each order.

I hope you can understand the analysis above.

In fact, the purpose of the Index Spool design is to improve performance, because we know that the results of each subquery in the preceding query statement must be called back to the master query, so in order to avoid every callback, Index Spool is used for temporary storage, and the storage location of this Index Spool is Tempdb. Therefore, the speed of running Tempdb is directly related to the performance of such query statements.

This is why we emphasize the establishment of highly concurrent databases. We recommend that you store the Tempdb database separately in a high-performance hardware environment.

Official Introduction to offline data operators in online books:

The Index Spool physical operator contains the SEEK :() predicate in the Argument column. The Index Spool operator scans its input rows and places copies of each row in hidden offline files (stored in the tempdb database and only exists during the query life, create non-clustered indexes for these rows. In this way, you can use the index search function to output only the rows that meet the predicate of SEEK.

If you repeat this operator (for example, repeat through the Nested Loops operator) without any re-binding, the offline data is used instead of scanning the input again.

Similar to offline indexing, there is also a similar operator: offline table functions are similar. Offline tables store key-value columns, while offline tables store multiple columns of data.

Let's look at the example

 

SELECT O1.OrderID, O1.Freight
FROM Orders O1
WHERE O1.Freight>
(
Select avg (O2.Freight)
FROM Orders O2
WHERE O2.CustomerID = O1.CustomerID
)

 

This query is similar to the preceding query, except that the same customer joins an order that exceeds the average freight of all orders.

This statement is also not an independent subquery statement. The results of each subquery depend on the results of the primary query. To speed up and improve performance, SQL Server saves the results of the primary table query to a temporary table. This table is called offline.

Let's look at the execution plan of this sentence:

 

 

Here we use an offline table operator. This operator is used to save the result set obtained by subsequent scans for the application of the following subqueries.

 

 

The results of offline table creation are also stored in the temporary database Tempdb. Therefore, its application is similar to the offline index mentioned above.

The preceding execution plan also mentions a new operator: Segment (Segment)

This operator is interpreted:

Segment is both a physical operator and a logical operator. It divides the input set into multiple segments based on the values of one or more columns. These columns are displayed as parameters in the Segment operator. Then, the operator outputs a segment each time.

In fact, the function is to summarize the results and aggregate the same values together, just like sorting, but here we can aggregate the values of multiple columns.

Let's take a look at an example to deepen the role of segment operations.

 

SELECT CustomerID, O1.OrderID, O1.Freight
FROM Orders O1
WHERE O1.Freight =
(
Select max (O2.Freight)
FROM Orders O2
WHERE O2.CustomerID = O1.CustomerID
)

 

This statement queries the order data of the maximum freight generated by each customer.

The above statement, if it is difficult to understand, we can work with the same logic of the T-SQL statement, the same logic

 

SELECT o1.mermerid, O1.OrderID, O1.Freight
FROM Orders O1
INNER JOIN
(
SELECT CustomerID, max (Freight) Freight
FROM Orders
Group by CustomerID
) AS O2
ON O1.CustomerID = O2.CustomerID
AND O1.Freight = O2.Freight

 

Group by customer ID, obtain the maximum freight items, and associate the table with the primary table to obtain order information.

The preceding two statements generate the same query plan:

 

 

Here we will explain that the power of SQL Server is also the best way to use segment operators.

To implement this statement, a nested loop connection is required according to the logic. The preceding method is used to obtain data offline.

However, the result item we get in this sentence is the order details item of the maximum Freight for each customer, and the CustomerID column is used as the output item, so here we use, first according to the Freight column (Freight) sort,

 

 

Then, use the segment operator to aggregate the same data of each customer, and then output the previous column (TOP 1) of each customer to obtain the order item with the largest freight for each customer.

Saves any complex operations such as table, index, and join.

SQL Server seems that this kind of intelligent operation is still quite strong.

We will analyze SQL Server's intelligent features of subquery, because after the above analysis and comparison, the related subquery statements require more consumption during running:

1. Sometimes you need to store intermediate result items through Index Spool and Table Spool, in this process, operations such as creating, adding, deleting, and destroying intermediate items require a large amount of memory and CPU resources.

2. The intermediate items mentioned above in related subqueries are all in the temporary Tempdb Database. Sometimes the space of Tempdb is increased, and the consumption of the Tempdb database and page contention are increased.

Therefore, to avoid the above problems, the best way is to avoid using related subqueries and try to use independent subqueries for operations.

Of course, SQL Server also provides the automatic conversion function to intelligently analyze statements to avoid related subquery operations:

Let's look at a slightly different way:

 

SELECT o. OrderID
FROM Orders O
WHERE EXISTS
(
SELECT c. CustomerID
FROM MERS C
Where c. City = N 'londom 'and c. CustomerID = O. CustomerID
)

 

 

 

In the preceding statement, we write related subquery operations, but a truly independent subquery is formed in the execution plan to avoid performance consumption caused by related subqueries.

In fact, the above statement is relatively well written as follows:

 

SELECT o. OrderID
FROM Orders O
Where o. CustomerID IN
(
SELECT c. CustomerID
FROM MERS C
Where c. City = n' Londom'
)

 

This forms a completely independent subquery, which is also the intention of SQL Server to execute. Therefore, the query plan formed by this statement is the same as the preceding query plan.

The optimization here is all due to the intelligence of SQL Server.

But when we write the statement, you need to understand, master, in order to write efficient T-SQL statement.

References

Logical operators and physical operators in Microsoft books online

Reference books: SQL. Server.2005. technical insider series

Conclusion

This article is a bit long, but the content of the subquery is not comprehensive. In the future, we will add that many of the SQL statements we write involve subqueries, therefore, this application is quite common. To this article on the daily optimization of the T-SQL query statements often used in some operators of the basic introduction, of course, there are some other addition, deletion and modification of a series of operators, in our daily life, we generally do not use query plan optimization. In the future, we will add these operators for reference.

After completing the optimization related to the query plan in this series, I plan to make a detailed analysis on the statistical information of the database. Because statistical information is the most important decision point supporting SQL Server to evaluate the optimal execution plan,

Therefore, the importance of statistical information is self-evident. If you are interested, follow these steps in advance.

SQL Server Performance tuning involves a wide range of content, which will be analyzed in subsequent articles.

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.