Issues and workarounds that may be caused by the TOP clause in SQL Server

Last Update:2016-01-21 Source: Internet

Author: User

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

Brief introduction

In SQL Server, using the TOP clause for complex queries can have performance implications, which can be good or bad, with different possibilities for different situations.

The SQL statement in a relational database is only an abstract concept and does not contain any logic. Many of the metadata affects the build of the execution plan, and the SQL statement itself is not the metadata referenced by the generation of the execution plan (except for the hint), but the top keyword is a keyword that directly affects the execution plan, so using top in some cases can cause performance impact, so let's look at different scenarios.

Single-table case

For a single-table query (where the single table refers to a physical single table that does not contain views, table-valued functions), there is no performance impact on top, and if you add top to SQL Server, top itself can be considered a query hint, meaning that the optimizer "returns only n rows ”。 Let's look at a simple example, as shown in 1:

Figure 1: Specify a single-table execution plan for the top keyword

As you can see from the execution plan comparison in Figure 1, the use of the top clause can often improve performance for single-table queries with index support, when n of the number of rows in top N prompts the query optimizer to return n rows instead of using the data distribution in the statistics, where top n is reasonable for the query optimizer.

However, there are times when grant memory (which estimates the required memory each time the plan is executed, and if the estimated memory is less than the execution memory, it will spill to tempdb, which has a very large impact on performance, because the formula for each version predicts memory varies greatly, This is not explained in detail here) and is not allowed to have a very high performance impact. Before we start to talk about this, let's talk about the two operators first:

Sort

The sort operator is a very general sort operator and may appear in many places in the execution plan, such as before the merge join, due to order by. The algorithm is very generic and can be sorted on very large result sets, which are blocking (meaning that data cannot flow to the next operator until the end of the sort) and require a lot of memory and CPU resources. Another problem with this operator is that when the grant memory is insufficient, tempdb secondary is required to complete the sort, so there is a significant performance overhead.

Top N Sort

TOP N Sort is a sort algorithm that adapts to small scenes, specifically for a small number of queries. For the selection of only a few data, the cost of sorting the entire result set is too high, so the algorithm of top n is first to take the first data, compared with other data to see if the maximum (or minimum), then take the second data comparison, and so on, until the first n data is found. This algorithm is significantly better than the sort operator if the number of rows is small, but if the n value is too large, the algorithm is not appropriate for the following reasons:

1. The algorithm does not support spill to tempdb, resulting in the inability to host too large result sets.

2. The algorithm needs to traverse n times, if n is too large, the cost is too high.

For SQL Server, the threshold value for this n is too large is 100. Let's look at an example where the test data and code are shown in Listing 1.

CREATE TABLE Testtop

INT INT CHAR (1000))

  DECLARE INT =1

   while @i<300000

  BEGIN

   into VALUES (@i,@i,' a ')

  SET @[email protected]+1

END

  CREATE CLUSTERED INDEX  on Testtop (ID)

  --test 1

  SELECT TOP  from Testtop

  ORDER  by SortKey

  --test 2

  SELECT TOP  from Testtop

  ORDER  by SortKey

Code Listing 1. Test data and test code

The first test is top 100, just the top N sort algorithm, the second test is top 101, only the normal sort algorithm is used, and 2 is shown.

Figure 2. TOP 101 Sort requires more memory, resulting in insufficient memory grant spill to tempdb

Let's look at the execution time, as shown in execution time 3, due to the presence of spill to tempdb.

Figure 3: Very large difference in execution time

As can be seen from Figure 3, the execution time varies greatly.

Therefore, for the use of top, try to use the values within the top 100.

Multi-table case

Because the top statement has hints for optimizer cardinality estimates, multiple table queries can lead to an underestimation of the number of rows in extreme cases and thus affect performance.

For example, the following 4 sample queries

Figure 4: Query using the top 1 table

In this case, because of the existence of TOP1, the query optimizer uses 1 as the estimated number of rows, and the actual number of rows varies greatly, so for this case, using top can result in higher costs (although we see the estimated 0% vs 100% in Figure 4, but the actual difference is huge), as shown in cost 5.

Figure 5: Using top instead results in performance degradation

For the above scenario, we can usually have the following centralized workaround:

1. Using hints, since we know this is due to the fact that the actual number of rows is much larger than the estimated number of rows, we can try using a hash join,forcescan.

2. Increase the Where condition so that the number of rows returned has a higher selectivity.

3. Do not use TOP1, and use the number above the top 10 to make the estimated number of rows larger, compared to 5 of the query we changed from TOP1 to TOP10, then the execution plan becomes 6.

Figure 6. Implementation plan for TOP 10

This is due to the fact that LOOP join can return a limited number of rows more quickly when the number of rows is low, which is equivalent to a fast n hint to the table, but when the number of rows increases, the optimizer prefers to use merge or hash to complete the operation, with better performance in extreme cases where there are more rows returned (low selectivity). As shown in result 7.

Figure 7: TOP10 has better performance than TOP1 in special cases.

therefore, in conjunction with the single-table example, the top keyword is recommended when the number is between 10 and 100.

Summary

This article describes the possible impact of the top keyword on the execution plan under single table and multi-table conditions, which in turn affects the query plan. Top impact execution plans are mainly in the following two areas:

Memory Grant
Estimated number of rows

Therefore, in the special case, the top statement can be considered according to the actual situation of the recommendations of this article.

Issues and workarounds that may be caused by the TOP clause in SQL Server

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More

Issues and workarounds that may be caused by the TOP clause in SQL Server

Contact Us

What's Trending

Top 10 Tags

Top 10 Keywords

A Free Trial That Lets You Build Big!

Sales Support

After-Sales Support

Issues and workarounds that may be caused by the TOP clause in SQL Server

Contact Us

What's Trending

Top 10 Tags

Top 10 Keywords

Trending Topic

A Free Trial That Lets You Build Big!

Sales Support

After-Sales Support