SQL Server Tuning Series Basics (subquery operations Summary)

Last Update:2014-12-14 Source: Internet

Author: User

Tags logical operators scalar what sql

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

Objective

In the previous articles, we introduced a series of descriptions of operators and the optimization methods and techniques for each operator. This includes a series of our common operators, including how to view execution plans, how several datasets are commonly used, how to join operators, and parallel operators. Interested children's shoes can be clicked to view.

In this article we introduce a series of sub-query statements, sub-query is generally the formation of some of the basic operations of complex queries, so the application of sub-query is very important.

Talk less, start the subject of this article.

Technical preparation

The database version is SQL SERVER2008R2 and is analyzed using a more concise case library (Northwind) from Microsoft.

One, independent sub-query mode

The so-called independent sub-query method, that is, the subquery and the main query has no correlation, the benefit is that the subquery does not depend on the external query, so it can be evaluated independently of the external query, to form their own execution plan execution.

As an example,

SELECT O1. Orderid,o1. Freightfrom Orders o1where O1. Freight>(   SELECT AVG. O2. Freight) from   Orders O2)

The goal of SQL execution is to query orders that have more than the average shipping charge in the order.

The clause that extracts the average freight here is a completely independent subquery that executes independently of the main query. At the same time here we take advantage of a scalar computation (AVG) so that it returns exactly one row.

Look at the query plan for the statement:

This query plan does not have much to introduce, about the sub-query execution plan formation can refer to my second article: SQL Server Tuning Series Basics (Summary of common operators)

However, it is important to note that the result value (AVG) of flow aggregation and the calculation of scalar formation contains only one result value, so the statement can execute normally.

Let's look at another situation.

Select O.orderidfrom Orders owhere o.customerid=(   select C.customerid from   Customers C   WHERE C.contactname=n'Maria Anders')

The statement also gets the number of orders for a customer named ' Maria Anders '. The premise of this T-SQL statement can be executed is that there is no "Maria Anders" customer with the same name in the customer's table, and if there is a situation with the same name, the statement will not execute correctly, and if only one customer is ' Maria Anders ', it will execute normally.

Let's analyze how SQL Server can judge when this execution is performed correctly.

A new operator appears here with the name: assertion. Let's take a look at the text execution plan to see what the main function of this operator is

Through the above analysis, we have analyzed the function of the "assertion" operator above, because our subquery does not guarantee that the result returned is a row, so here we introduce an assertion operator to make a judgment.

Therefore, the function of the assertion is to determine whether the query result of a subquery clause satisfies the query request of the main statement according to the conditions below.

If, the assertion found that the child statement is not satisfied, will be directly error, such as the above expr1005>1

Also, the Assert operator is often used to detect whether other conditions are satisfied, such as constraints, parameter types, value lengths, and so on.

In fact, here the assertion to solve the problem is to determine whether the value of the ContactName in our filter condition has duplicate values, for this judgment relative performance consumption is relatively small, sometimes for other complex assertion operations need to consume a lot of resources, Therefore, we can avoid the assertion operation according to the appropriate situation.

For example, the above statement can clearly tell SQL Server that there is no duplicate value in the ContactName column in table customers, and it does not need to be asserted. We build one on top: unique, nonclustered index implementations

CREATE UNIQUE INDEX contactnameindex on Customers (ContactName) goselect o.orderidfrom Orders owhere O.customerid =(   SELECT c.customerid from   Customers C   WHERE c.contactname=n'Maria Anders  ') drop index Customers.contactnameindexgo

With the hint of our only nonclustered index, SQL Server has made it clear that our subquery does not return multiple rows, so we have eliminated the assertion operation.

Second, the related sub-query method

Compared to the above independent sub-query method, here the relative sub-query method is relatively complex point, is that our sub-query depends on the results of the main query, for such a subquery can not be executed alone.

Let's take a look at such a subquery example

SELECT O1. Orderidfrom Orders o1where O1. Freight>(   SELECT AVG. O2. Freight) from   Orders O2   WHERE O2. OrderDate<O1. OrderDate)

This statement is the vertex number that returns the freight quantity above the average in the previous order.

The statement is simple logic, but the subquery in this is dependent on the result item of the main query, where O2 in the filter condition. Orderdate<o1. OrderDate, so this subquery cannot be run on its own.

Let's take a look at the execution plan for this statement

A new operator appears in the query plan here: Index spool.

In fact, the role of the index spool is mainly used for the independent operation of the subquery, because we know that the query condition of the subquery here is dependent on the main query, so, here you want to run the first to get the result of the main query, and here to get the main query result items need an intermediate table to be staged, The temporary tool is: Index Spool, and the operation of this index pool, such as: new, add, etc. is the above we labeled "Index Spool".

There are two types of index spooling: Eager Spool and Lazy Spool, in fact, the simple point is that you need not immediately put the results into the index Spool, or through the delay operation.

The indexed Pool (index Spool) that is formed here is stored in the system's temporary library tempdb.

We use a text query plan to analyze what the value of the next two index spool is

After the above analysis, we have seen that the inside of the eager Spool is the result of the comparison with the main query, because this must be formed in a timely manner in order to facilitate the sub-query, so its type is eager Spool,

And the subquery outside the index Spool for the lazy Spool, the result of the preservation of the item does not need to be so timely, the preservation of the sub-query is the result of the formation of the item, is relative to the average cost per order.

I above the analysis, I hope you crossing can read it.

In fact, the purpose of this index spool design, completely in order to improve performance, because we know the above query statement each subquery, we must callback the results of the main query, so in order to avoid each callback, the use of the index spool for staging, and this index The location of the spool storage is tempdb, so the speed at which tempdb runs is directly related to the performance of this query statement.

This is why we emphasize large concurrent database construction, and we recommend that the tempdb library be stored separately in high-performance hardware environments.

The official description of the spool data operator in the online books:

The Index Spool physical operator contains the SEEK: () verb in the Argument column. The Index Spool operator scans its input rows, places a copy of each row in the hidden spool file (stored in the tempdb database and exists only for the lifetime of the query), and creates a nonclustered index for those rows. This allows you to use the lookup function of the index to output only those rows that satisfy the SEEK: () predicate.

If you re-orbit the operator (for example, by Nested Loops operator) but do not require any rebinding, the spool data will be used without rescanning the input.

Similar to the index offline, there is a similar operator: The table is offline, its function is similar, the table is stored offline should be a key value column, and the table offline is the storage of multi-column data.

See Example

SELECT O1. Orderid,o1. Freightfrom Orders o1where O1. Freight>(   SELECT AVG. O2. Freight) from   Orders O2   WHERE O2. CustomerID=O1. CustomerID)

This query is similar to the above, except that it is an order that the same customer joins more than the average of all order shipping.

This statement is also not a separate subquery statement, the result of each subquery needs to rely on the results of the main query, in order to speed up, improve performance, SQL Server will be the main Table query result items are staged into a temporary table, this table is known as the table offline

Let's take a look at the execution plan of this sentence:

Here you use a table-offline operator that is used to stage the resulting collection of subsequent scans for the application of the following sub-queries

The result item that this table is formed offline is also stored in the temporary library tempdb, so its application is similar to the index mentioned earlier.

The above execution plan also mentions a new operator: segment (Segment)

The explanation for this operator is:
Segment is both a physical operator and a logical operator. It divides the input set into multiple segments based on the values of one or more columns. These columns are displayed as parameters in the Segment operator. The operator then outputs one segment at a time.

In fact, the effect is to summarize the results, the same values together, as in the sort, but here can be a multi-column value aggregation.

Let's look at an example to deepen the role of the segment operation

SELECT Customerid,o1. Orderid,o1. Freightfrom Orders o1where O1. Freight=(   SELECT MAX (O2). Freight) from   Orders O2   WHERE O2. CustomerID=O1. CustomerID)

This statement queries the order data for the maximum shipping cost generated by each customer.

The above statement, if it is difficult to understand, we can work with the same logical T-SQL statements, the same logic

SELECT O1. Customerid,o1. Orderid,o1. Freight from Orders O1inner joins (    SELECT Customerid,max (Freight) Freight from    orders    GROUP by CustomerID) as O2on O1. CustomerID=O2. Customeridand O1. Freight=o2. Freight

First, according to the customer number group, and then get the largest freight items, and then associated with the main table to obtain order information.

The same query plan generated by the above two statements:

Here we explain that the power of SQL Server is also the best way to use the segment operator.

Originally this sentence to achieve, according to the logic need to have a nested loop connection, referring to the above way, using the table offline method for data acquisition.

However, our sentence gets the result item is the order detail item for each customer's maximum shipping cost, and the CustomerID column as the output item, so this is used here, first sorted by freight column (Freight),

Then use the segment operator to bring together the same data for each customer, and then output the previous column (top 1) for each customer to get the most cost-per-customer order item.

A series of complex operations, such as table spooling, index spooling, and associated connections, are eliminated.

SQL Server seems to be a very intelligent operation.

Let's analyze SQL Server's intelligence about subqueries, because, by contrast, the related subquery needs more consumption at run time:

1, sometimes need to through the Index spool (index Spool), the table offline (tables Spool) for intermediate results of the staging, and this process of intermediate items need to create, add, delete, destroy and so on need to consume a lot of memory and CPU

2, about the related sub-query in the above mentioned in the middle of the formation are located in the tempdb temporary library, sometimes will increase the tempdb space, increase the tempdb library consumption, page contention and so on.

Therefore, to avoid the above problem, the best way is to avoid the use of related subqueries, try to use a separate subquery to operate.

Of course, SQL Server also provides the ability to automatically convert and intelligently parse statements to avoid the associated subquery operations:

Look at a slightly worse notation:

Select O.orderid from Orders owhere EXISTS (   select C.customerid   from  Customers c   WHERE c.city =n'londom' and c.customerid=O.customerid)

In the above statement, we are writing about the related subquery operations, but the actual sub-queries formed in the execution plan, thus avoiding the performance cost associated with the subquery.

In fact, the above statement, the relatively good wording is as follows

Select O.orderid from Orders owhere O.customerid in (   SELECT c.customerid   from  Customers c   WHERE c. City =n'londom')

This creates a completely separate subquery, which is what SQL Server intends to do. So the query plan formed by this statement is the same as the query plan above.

The optimizations here all benefit from the intelligence of SQL Server.

But when we write statements, we need to understand and master them so that we can write efficient T-SQL statements.

Reference documents

Microsoft Books Online logical operators and physical operator references
Refer to the book SQL. server.2005. Technical Insider "series

Conclusion

This article is a bit long, but the introduction of sub-query content is not very full, follow-up slowly add on, we write a lot of SQL statements are related to sub-query, so this application is quite common. To this article about the daily tuning of the query in T-SQL frequently used by some of the operators basically introduced all, of course, there are some other additions and deletions to change a series of operators, these daily life we generally do not use query plan tuning, subsequent our article will add these operators, for reference.

After completing this series of query plan-related tuning, I intend to do a detailed analysis of the database of statistical information. Because statistics are the most important decision point to support SQL Server in evaluating an optimal execution plan,

So the importance of statistical information is self-evident. Interested children's shoes can be noticed in advance.

The content of SQL Server performance tuning involves a wide range of topics, followed by an analysis in the following article.

If you have any questions, you can leave a message or private messages, and look forward to an in-depth study of your child's shoes with SQL Server. Learn together and progress together.

At the end of the article to give a few links, it seems necessary to sort out a catalogue .....

SQL Server Tuning Series Basics

SQL Server Tuning Series Basics (Summary of common operators)

SQL Server Tuning Series Basics (Union operator summary)

SQL Server Tuning Series basics (Parallel operations Summary)

SQL Server Tuning Series basics (Parallel operations Summary chapter II)

SQL Server Tuning Series Basics (Index operations Summary)

If you read this blog, feel that you have something to gain, please do not skimp on your " recommendation ".

SQL Server Tuning Series Basics (subquery operations Summary)

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More