Analysis of index operations for SQL Server Performance Optimization


Technical preparation

The analysis below is based on SQL Server 2008 R2 and uses Microsoft's simple sample database, Northwind.

Introduction

By "index application" we mean how the T-SQL statements we write every day make use of existing indexes. What we actually need to analyze are the query conditions we write, and in most cases they fall into one of the following categories:

1. Equality predicate: SELECT ... WHERE ... column = @parameter
2. Comparison predicate: SELECT ... WHERE ... column >, <, <=, or >= @parameter
3. Range predicate: SELECT ... WHERE ... column IN / NOT IN (...) or BETWEEN ... AND @parameter
4. Logical predicate: SELECT ... WHERE ... predicates combined with OR, AND, and so on

We will analyze in turn how each of these cases can use indexes for query optimization.
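To make the four categories concrete, here is a sketch against the Northwind Orders table. The column and parameter choices are my own, for illustration only; they are not taken from the article's examples.

```sql
-- 1. Equality predicate: a direct seek on an index key
SELECT OrderID FROM Orders WHERE CustomerID = @CustomerID;

-- 2. Comparison predicate: a range seek from a boundary value
SELECT OrderID FROM Orders WHERE OrderDate >= @StartDate;

-- 3. Range predicate: IN / BETWEEN expand into seekable ranges
SELECT OrderID FROM Orders WHERE ShipPostalCode IN (@Postal1, @Postal2);

-- 4. Logical predicate: simple predicates combined with AND / OR
SELECT OrderID FROM Orders
WHERE OrderDate = @OrderDate OR ShippedDate = @ShippedDate;
```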

I. Dynamic Index Seek

A dynamic index seek means that when SQL Server executes a statement, it normalizes the query conditions at runtime and matches them against index entries dynamically, improving performance.

Here is an example.

SET SHOWPLAN_TEXT ON
GO
SELECT OrderID
FROM Orders
WHERE ShipPostalCode IN (N'000000', N'000000')

 

 

Because the ShipPostalCode column of the Orders table has a non-clustered index, the query plan uses an index seek. This is exactly the situation indexes are created for.

We can use the text option (SHOWPLAN_TEXT) to view the detailed execution plan of the statement. The plan text is long, so I used Notepad to wrap and format it.

 

 

Since the table has a non-clustered index on this column, we want the query to use it whenever possible; an index scan would cost more, so SQL Server tries hard to use an index seek instead. In fact, the IN keyword and the OR keyword are logically equivalent.

The preceding query condition is expanded into:

[Northwind].[dbo].[Orders].[ShipPostalCode] = N'000000'
OR
[Northwind].[dbo].[Orders].[ShipPostalCode] = N'000000'

SQL Server can then use the index to seek the first value and then the second. This process is what SQL Server calls a dynamic index seek.

Pretty smart, right?

So when writing statements we should try to take advantage of this intelligence, letting SQL Server find the index automatically and improve performance.

Sometimes, however, the statements we write defeat SQL Server's intelligence. For example:

-- Parameterized query conditions
DECLARE @Parameter1 NVARCHAR(20), @Parameter2 NVARCHAR(20)
SELECT @Parameter1 = N'000000', @Parameter2 = N'000000'

SELECT OrderID
FROM Orders
WHERE ShipPostalCode IN (@Parameter1, @Parameter2)

All we changed was turning the two literal filter values into parameters. The stored procedures we write often look exactly like this! Now look at the query plan this produces.

 

 

We expected a simple non-clustered index seek plan; after all, we only used two variables instead of writing the two values directly inside the IN keyword.

Look at the query plan SQL Server generated! What on earth is this? A Nested Loops join appears, yet I am querying a single Orders table. Where does a nested loop come from, and where did the dynamic index seek capability go?

Okay, let's use the text query plan to see what this "simple" statement is actually doing...

 

|--Nested Loops(Inner Join, OUTER REFERENCES:([Expr1009], [Expr1010], [Expr1011]))
     |--Merge Interval
     |    |--Sort(TOP 2, ORDER BY:([Expr1012] DESC, [Expr1013] ASC, [Expr1009] ASC, [Expr1014] DESC))
     |         |--Compute Scalar(DEFINE:([Expr1012]=((4)&[Expr1011])=(4) AND NULL=[Expr1009], [Expr1013]=(4)&[Expr1011], [Expr1014]=(16)&[Expr1011]))
     |              |--Concatenation
     |                   |--Compute Scalar(DEFINE:([Expr1004]=[@Parameter2], [Expr1005]=[@Parameter2], [Expr1003]=(62)))
     |                   |    |--Constant Scan
     |                   |--Compute Scalar(DEFINE:([Expr1007]=[@Parameter1], [Expr1008]=[@Parameter1], [Expr1006]=(62)))
     |                        |--Constant Scan
     |--Index Seek(OBJECT:([Northwind].[dbo].[Orders].[ShipPostalCode]), SEEK:([Northwind].[dbo].[Orders].[ShipPostalCode] > [Expr1009] AND [Northwind].[dbo].[Orders].[ShipPostalCode] < [Expr1010]) ORDERED FORWARD)

 

Complicated, right? I analyzed this plan, and it is generated to solve the following problems:

1. In our earlier script, the two values inside IN were constants with known, distinct values, so SQL Server could expand them into two index seeks combined with OR. That approach seems fine, but once the two values become parameters, a new problem appears: if the two parameters happen to be equal, the earlier execution plan would produce

[Northwind].[dbo].[Orders].[ShipPostalCode] = N'000000'
OR
[Northwind].[dbo].[Orders].[ShipPostalCode] = N'000000'

and the output would then contain the same row twice, even though the table has only one matching row. The result would be wrong!

So the first problem to solve is deduplicating the parameters: if the two are equal, treat them as one.

2. Parameters introduce another problem: one, or both, of the values passed in may be NULL, and the output faces the same correctness issue. So the NULL values must be removed as well.
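A quick sketch of the NULL problem (the postal value is a placeholder, as in the article's masked examples): under IN, NULL never compares equal to anything, so the plan must be able to discard a NULL parameter at run time.

```sql
DECLARE @Parameter1 NVARCHAR(20), @Parameter2 NVARCHAR(20)
SELECT @Parameter1 = N'000000', @Parameter2 = NULL

-- ShipPostalCode IN (@Parameter1, @Parameter2) expands to
--   ShipPostalCode = @Parameter1 OR ShipPostalCode = @Parameter2
-- "ShipPostalCode = NULL" is never true, so the NULL branch must
-- contribute no seek range -- this is exactly what the Compute
-- Scalar / Merge Interval pair in the plan handles at run time.
SELECT OrderID
FROM Orders
WHERE ShipPostalCode IN (@Parameter1, @Parameter2)
```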

To see how SQL Server solves these problems, let's walk roughly through the execution plan.

 

 

Simply put, it scans the variable values (the Constant Scans), concatenates them, sorts them, and removes any duplicates among the parameters. The values that survive are correct, and finally the deduplicated parameter values are fed into the nested loop to drive an index seek against the Orders table.

One thing I did not fully understand during the analysis: after deduplication, the constant values drive the nested loop join, and in the index seek below, the filter condition turns into a range (greater-than / less-than) seek.

 

 

Cleaned up, the final index seek condition looks like this:

|--Index Seek(OBJECT:([Northwind].[dbo].[Orders].[ShipPostalCode]), SEEK:
   (
        [Northwind].[dbo].[Orders].[ShipPostalCode] > [Expr1009]
        AND
        [Northwind].[dbo].[Orders].[ShipPostalCode] < [Expr1010]
   ) ORDERED FORWARD)

How the equality predicate became this open range, I haven't figured out yet. If any reader can explain it, I'd love to hear.

Okay. Let's continue.

The preceding execution plan introduces a new operator: Merge Interval.

Let's analyze what this operator does. We already marked its role in the execution plan diagram above: removing duplicate values.

There are many ways to deduplicate, such as the various deduplication operators we covered in the previous article.

Why does a Merge Interval appear here for deduplication? The reason is simple: the results are sorted before this operator runs, so duplicate values end up adjacent to each other. Merging adjacent intervals is then the cheapest way to remove them, which gives the best performance.

More importantly, the Merge Interval operator is not limited to removing duplicate values; it is also used to remove overlapping intervals.

Let's take a look at the example below

 

-- Parameterized query conditions
DECLARE @Parameter1 DATETIME, @Parameter2 DATETIME
SELECT @Parameter1 = '2014-01-01', @Parameter2 = '2014-01-04'

SELECT OrderID
FROM Orders
WHERE OrderDate BETWEEN @Parameter1 AND DATEADD(DAY, 6, @Parameter1)
   OR OrderDate BETWEEN @Parameter2 AND DATEADD(DAY, 6, @Parameter2)

 

Let's take a look at the generated query plan.

 

 

The query plan SQL Server generates is essentially the same as for the previous statement. Of course, the statement itself barely changed; what changed are the query conditions.

Let's analyze the query conditions:

WHERE OrderDate BETWEEN @Parameter1 AND DATEADD(DAY, 6, @Parameter1)
   OR OrderDate BETWEEN @Parameter2 AND DATEADD(DAY, 6, @Parameter2)

This simple filter fetches rows from January 1 through January 7, or from January 4 through January 10 (BETWEEN includes both endpoints).

The logical predicate used here is OR, which, as shown earlier, works like IN.

But look again and you will find that the two time periods overlap.

 

 

If the earlier direct index seek were applied to these overlapping ranges, the rows in the overlap would be returned twice. To avoid this problem, SQL Server introduces the Merge Interval operator, which merges the overlapping ranges into one before seeking.

After the above analysis, the pros and cons of dynamic index seeks should be clear. Sometimes we want to avoid generating such a complicated plan, and the simplest way is to write the values directly into the statement as literals (a recompile is required for that). Of course, most of the programs we write define parameters first and then run the query, so they may run into the problems above. Also, when there are many parameters, the sort that the Merge Interval relies on consumes more memory. Which way to go should be analyzed case by case.
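As mentioned, a recompile lets the optimizer see the actual values. A minimal sketch, keeping the parameterized form (the postal values are placeholders): the standard query hint OPTION (RECOMPILE) asks SQL Server to compile this statement with the current parameter values, so the IN list can be expanded into plain OR seeks as with literals, at the cost of a fresh compilation on every execution.

```sql
DECLARE @Parameter1 NVARCHAR(20), @Parameter2 NVARCHAR(20)
SELECT @Parameter1 = N'000000', @Parameter2 = N'000000'

-- With RECOMPILE the optimizer sees the actual values at compile
-- time; the plan is not cached for reuse, which is the trade-off.
SELECT OrderID
FROM Orders
WHERE ShipPostalCode IN (@Parameter1, @Parameter2)
OPTION (RECOMPILE)
```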

II. Index Union

By index union we mean that SQL Server splits a query with multiple filter conditions so that each condition matches a different index, and then combines (unions) the results.

For example

SELECT OrderID
FROM Orders
WHERE OrderDate BETWEEN '1998-01-01' AND '1998-01-07'
   OR ShippedDate BETWEEN '1998-01-01' AND '1998-01-07'

This code queries for orders whose order date falls between January 1 and January 7, 1998, or whose shipped date falls in the same range.

The logic is simple. We know both of these columns are indexed, so SQL Server has two options for this query:

1. A single scan, outputting the rows that match either condition. This is simple and effective, but if the Orders table is large, performance suffers: most of the scanned rows are not what we want at all, and scanning them is wasted time.

2. Seek each of the two indexed columns separately and combine the results. This directly reduces the amount of data touched, but it creates a new problem: rows that satisfy both conditions (orders placed between January 1 and January 7, 1998 that were also shipped in that period) appear in both seek outputs, so the duplicates must be removed before the final output.

Let's see which option SQL Server chooses.

 

 

After costing both, SQL Server chose option 2. The plan is not perfect, though: the deduplication step consumes 64% of the plan's cost.

Based on the generated plan, we can rewrite the query with the following equivalent logic. The result is the same as the preceding statement, and so is the generated plan.

 

SELECT OrderID
FROM Orders
WHERE OrderDate BETWEEN '1998-01-01' AND '1998-01-07'
UNION
SELECT OrderID
FROM Orders
WHERE ShippedDate BETWEEN '1998-01-01' AND '1998-01-07'

 

 

 

 

Let's look at another index union example.

SELECT OrderID
FROM Orders
WHERE OrderDate = '1998-01-01'
OR ShippedDate = '1998-01-01'

Here we change the preceding BETWEEN range filters into equality filters. Let's look at the resulting execution plan.

 

 

The statement is basically the same; only the query conditions changed, yet the generated plan changed quite a bit. There are two differences:

1. With the BETWEEN ... AND filters, the values returned by the index seeks were combined by Concatenation. Concatenation simply pieces the two data sets together, with no regard for order. Here, with equality filters, they are combined by a merge join instead.

2. With the BETWEEN ... AND filters, the concatenated result set was deduplicated by a Sort Distinct, which consumes a lot of resources. Here a Stream Aggregate does the deduplication at almost no cost.

Let's analyze the reasons for these two differences:

First, once the filter becomes an equality, the rows produced by each index seek come out sorted by the OrderID column we want to output (within a single key value, the entries of a non-clustered index are ordered by the clustering key, here OrderID). Both data sets are therefore pre-sorted, which is a perfect fit for a merge join, so the optimal way to combine them is a Merge Join (Union)!

With the BETWEEN ... AND filters, the index seek results are also sorted, but by the OrderDate or ShippedDate column rather than by OrderID. Since what we output is OrderID, that sort order is useless to us, so SQL Server can only pick Concatenation to combine the results and then sort them afterwards. I hope that's clear by now.

Second, there is no doubt that Stream Aggregate is the best way to deduplicate: it uses little memory and is fast. But it requires pre-sorted input; the earlier plan, lacking that sort order, had no choice but Sort Distinct.

Conclusion: when writing a statement, if a condition can be expressed as an equality, prefer the equality. In addition, if the output columns are known, list them explicitly to avoid unnecessary bookmark (key) lookups — and beware the evil SELECT *.

If you write SELECT *, your statement can basically say goodbye to non-clustered index seeks; at best you get clustered index scans or RID lookups.

Consider the following statement:

SELECT *
FROM Orders
WHERE OrderDate = '1998-01-01'
OR ShippedDate = '1998-01-01'
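By contrast, when the output columns are fixed, a covering non-clustered index can keep the seek and avoid the lookups entirely. A sketch only: the index name and the INCLUDE column list below are my own choices, not part of the stock Northwind schema.

```sql
-- Cover the query: key on the filter column, INCLUDE the outputs.
CREATE NONCLUSTERED INDEX IX_Orders_OrderDate_Covering
ON Orders (OrderDate)
INCLUDE (CustomerID, ShippedDate)

-- This query can now be answered from the index alone,
-- with no bookmark (key) lookup back to the base table:
SELECT OrderID, CustomerID, ShippedDate
FROM Orders
WHERE OrderDate = '1998-01-01'
```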

 

 

Finally, let's look at predicates joined with AND. This technique is called index intersection: when two or more filter conditions each have their own index, the query can be answered by seeking just one of them.

Let's see a statement.

SELECT OrderID
FROM Orders
WHERE OrderDate = '1998-01-01'
AND ShippedDate = '1998-03-05'

 

 

Because the predicates are joined with AND, even though both columns have non-clustered indexes and either could in theory be used, SQL Server only needs to pick the better of the two for the seek; the other condition can be checked on the rows fetched through the bookmark lookup. This saves all of the sorting, deduplication, and stream aggregation described above.

AND turns out to be a rather handsome operator. Often, when we are tempted to write OR, rewriting the condition with AND is more efficient.

References

Logical and physical operators reference in Microsoft Books Online

Reference book: the Inside Microsoft SQL Server 2005 series

Conclusion

This article introduced several ways queries make use of indexes, focusing on the patterns we commonly write. The statements we write daily are mostly variations and combinations of the techniques described here, and the same analysis also serves as a guide for deciding which indexes to create.

The next article will cover a series of sub-query topics; if you are interested, stay tuned. SQL Server performance tuning spans a wide range of content, which will be analyzed one topic at a time in subsequent articles.

If you have any questions, leave a comment or send a private message. Let's dig into SQL Server together, learning and making progress together.
