SQL Server Tuning Series Basics (Index operations Summary)

Source: Internet
Author: User
Tags: logical operators, scalar

Objective

In previous articles we covered how to view a query plan, the common operators, and parallel operations; click through to those if you are interested.

This article analyzes how SQL Server uses existing index entries to optimize query performance. Understanding how these index entries are consumed guides both how we build indexes and how we adjust our query statements to achieve better performance.

Enough preamble; on to the topic of this article.

Technical preparation

The examples are based on SQL Server 2008 R2 and use Microsoft's compact sample database, Northwind, for the analysis.

Brief introduction

So-called index usage is about how the T-SQL statements we write every day can take advantage of the existing index entries. If we analyze the query conditions we write, most of them actually fall into one of the following patterns:

1. Equality predicate: SELECT ... WHERE column = @parameter

2. Comparison predicate: SELECT ... WHERE column > / < / <> / <= / >= @parameter

3. Range predicate: SELECT ... WHERE column IN / NOT IN / BETWEEN ... AND @parameter

4. Logical predicate: one predicate combined with one or more other predicates by OR or AND, and so on.

Let's analyze each of these cases in turn and see how the index is used for query optimization.

First, dynamic index seek

A so-called dynamic index seek means that when SQL Server executes the statement, it first normalizes the query conditions and then, according to the different condition values, automatically matches them to index entries, improving performance.

To give an example.

SET SHOWPLAN_TEXT ON
GO
SELECT OrderID
FROM Orders
WHERE ShipPostalCode IN (N'05022', N'99362')

Because we built a nonclustered index on the ShipPostalCode column of the Orders table, the query plan here uses an index seek. This is exactly the scenario where building an index pays off.

Let's view the detailed text execution plan for the statement. The plan text is long, so I wrapped it in Notepad and formatted it for viewing.

We know there is a nonclustered index on this column of this table, so we want queries to use it as much as possible. Rather than paying the cost of an index scan, SQL Server tries to turn the condition into index seeks; logically, the IN keyword here works the same as OR.

The query condition above is then converted into:

[Northwind].[dbo].[Orders].[ShipPostalCode] = N'05022'
OR
[Northwind].[dbo].[Orders].[ShipPostalCode] = N'99362'

This lets SQL Server seek the index for the first value and then seek again for the second. In SQL Server this is called a dynamic index seek.
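For intuition, the two seeks can be modeled outside SQL Server. The following is a toy Python sketch of a dynamic index seek over a sorted key list; the postal codes and row positions are invented, and this is of course not SQL Server's internal code:

```python
import bisect

def index_seek(sorted_keys, value):
    """One index seek: find every position in a sorted key list
    whose key equals the search value."""
    lo = bisect.bisect_left(sorted_keys, value)
    hi = bisect.bisect_right(sorted_keys, value)
    return list(range(lo, hi))

def dynamic_index_seek(sorted_keys, values):
    """One seek per IN-list value, results concatenated, which is
    roughly how the plan above handles IN (N'05022', N'99362')."""
    rows = []
    for v in values:
        rows.extend(index_seek(sorted_keys, v))
    return rows

# A toy nonclustered index on ShipPostalCode (keys kept sorted).
index = ["01581", "05022", "44000", "99362", "99362"]
print(dynamic_index_seek(index, ["05022", "99362"]))  # -> [1, 3, 4]
```

Each value costs one cheap binary search instead of a pass over every key, which is why a seek beats a scan when the predicate is selective.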

Feels a little intelligent, doesn't it?

So when we write statements, we should try to play to this intelligence of SQL Server, so that it can find the indexes automatically and improve performance.

Sometimes, though, the statements we write make that intelligence disappear. For example:

-- Parameterized query conditions
-- (note: give NVARCHAR an explicit length; a bare NVARCHAR in DECLARE
--  defaults to a single character and would silently truncate the values)
DECLARE @Parameter1 NVARCHAR(10), @Parameter2 NVARCHAR(10)
SELECT @Parameter1 = N'05022', @Parameter2 = N'99362'

SELECT OrderID
FROM Orders
WHERE ShipPostalCode IN (@Parameter1, @Parameter2)

We change the two static filter values into parameters; the stored procedures we write often look exactly like this! Let's look at the query plan generated this way.

Before, a very simple nonclustered index seek made up the whole execution plan; all we have done differently is to pass the two values through variables instead of writing them directly in the IN list.

Now look at the query plan SQL Server generates! What is all this? It even uses a nested loops join; I queried a single Orders table, so what is there to nest-loop over? And where did the dynamic index seek from above go?

Well, let's use the text query plan to see what this simple statement is actually doing.

|--Nested Loops(Inner Join, OUTER REFERENCES:([Expr1009], [Expr1010], [Expr1011]))
     |--Merge Interval
     |    |--Sort(TOP 2, ORDER BY:([Expr1012] DESC, [Expr1013] ASC, [Expr1009] ASC, [Expr1014] DESC))
     |         |--Compute Scalar(DEFINE:([Expr1012]=((4)&[Expr1011]) = (4) AND NULL = [Expr1009], [Expr1013]=(4)&[Expr1011], [Expr1014]=(16)&[Expr1011]))
     |              |--Concatenation
     |                   |--Compute Scalar(DEFINE:([Expr1004]=[@Parameter2], [Expr1005]=[@Parameter2], [Expr1003]=(62)))
     |                   |    |--Constant Scan
     |                   |--Compute Scalar(DEFINE:([Expr1007]=[@Parameter1], [Expr1008]=[@Parameter1], [Expr1006]=(62)))
     |                        |--Constant Scan
     |--Index Seek(OBJECT:([Northwind].[dbo].[Orders].[ShipPostalCode]),
          SEEK:([Northwind].[dbo].[Orders].[ShipPostalCode] > [Expr1009]
                AND [Northwind].[dbo].[Orders].[ShipPostalCode] < [Expr1010]) ORDERED FORWARD)

It is complicated. I analyzed the plan, and the reason it is generated is to solve the following problems:

1. In the previous script we wrote the two values directly into IN as constants, and they were two different values, so the plan formed two index seeks combined by the OR keyword.

That approach looks fine, but turning the two values into parameters introduces a new problem: if the two parameters we pass in are equal, the earlier execution plan would generate the following:

[Northwind].[dbo].[Orders].[ShipPostalCode] = N'05022'
OR
[Northwind].[dbo].[Orders].[ShipPostalCode] = N'05022'

Executed that way, the output contains 2 rows, but there is really only 1 such row in the table, so the result would be incorrect!

So the first problem to solve once parameters are used is deduplication: 2 identical values must become 1.

2. Turning the values into parameters also introduces another issue: one or both of the incoming values may be NULL, and the output faces the same kind of problem. So NULL values must be removed as well.

To solve these problems, let's take a rough look at how the execution plan does it.

Put simply: the plan scans the values out of the variables, concatenates them, sorts them, and removes the duplicate parameter values so that only correct values remain; finally it feeds these deduplicated parameter values into a nested loops join that drives an index seek against the Orders table.
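The parameter cleanup just described (drop NULLs, sort, deduplicate) can be sketched in a few lines of Python; this is a conceptual model of what the plan achieves, not SQL Server's implementation:

```python
def normalize_in_list(params):
    """Mimic what the Constant Scan / Sort / Merge Interval part of the
    plan achieves for IN-list parameters: drop NULLs, sort, and remove
    duplicates, so each surviving value drives exactly one index seek."""
    result = []
    for p in sorted(p for p in params if p is not None):
        if not result or result[-1] != p:  # duplicates are adjacent after sorting
            result.append(p)
    return result

print(normalize_in_list(["05022", "05022", None]))  # -> ['05022']
```

With both parameters equal, only one seek remains, so the row comes back once; that is exactly the correctness problem the plan is guarding against.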

One thing in this analysis I did not understand: after the constants are deduplicated and summarized, they feed the nested loops join, and the seek predicate of the index seek below turns into a range seek.

I extracted that final index seek and its predicate, formatted as follows:

|--Index Seek(OBJECT:([Northwind].[dbo].[Orders].[ShipPostalCode]),
SEEK:
(
     [Northwind].[dbo].[Orders].[ShipPostalCode] > [Expr1009]
     AND
     [Northwind].[dbo].[Orders].[ShipPostalCode] < [Expr1010]
) ORDERED FORWARD)

Why does the seek become a range? I don't know; if any readers can offer some guidance, please do.

All right, moving on.

The execution plan above mentions a new operator: Merge Interval.

Let's analyze what this operator does. As shown in the graphical execution plan, its function here is to remove duplicate values.

There are actually many operators that deduplicate, such as the various distinct operations mentioned in earlier articles.

So why does a Merge Interval show up to deduplicate here? The reason is simple: the results were sorted before this operator runs, and in a sorted stream duplicate values sit right next to each other, so introducing a Merge Interval to handle them gives the best performance.

More importantly, the Merge Interval operator is not limited to removing duplicate values; its more important job is to remove overlapping intervals.

Take a look at the following example

-- Parameterized query conditions
DECLARE @Parameter1 DATETIME, @Parameter2 DATETIME
SELECT @Parameter1 = '1998-01-01', @Parameter2 = '1998-01-04'

SELECT OrderID
FROM Orders
WHERE OrderDate BETWEEN @Parameter1 AND DATEADD(DAY, 6, @Parameter1)
   OR OrderDate BETWEEN @Parameter2 AND DATEADD(DAY, 6, @Parameter2)

Let's take a look at the query plan this generates.

As you can see, the query plan SQL Server generates is exactly the same shape as the one before. Of course, our statement has not changed much; the change is in the query condition.

Let's analyze this query condition:

WHERE OrderDate BETWEEN @Parameter1 AND DATEADD(DAY, 6, @Parameter1)
   OR OrderDate BETWEEN @Parameter2 AND DATEADD(DAY, 6, @Parameter2)

A very simple filter: fetch the orders whose order date falls between 1998-01-01 and 1998-01-07, or between 1998-01-04 and 1998-01-10 (BETWEEN includes both endpoints).

The logical predicate used here is OR, which is really the same pattern we wrote before.

But analyze it and you will see that these two time periods overlap.

If the direct index seeks from before were used on these overlapping ranges, the rows falling in the overlap would be returned twice. To avoid this problem, SQL Server introduces the Merge Interval operator.
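Conceptually, collapsing overlapping ranges is a small classic algorithm. Here is a toy Python sketch of what the Merge Interval operator achieves on the two date ranges above (a conceptual model, not the operator's real implementation):

```python
def merge_intervals(intervals):
    """Collapse overlapping [start, end] ranges into disjoint ranges,
    so that a subsequent index seek returns no row twice."""
    merged = []
    for start, end in sorted(intervals):  # sort first; overlaps become adjacent
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)  # extend the open range
        else:
            merged.append([start, end])
    return [tuple(r) for r in merged]

# The two overlapping ranges from the query above (ISO dates sort as strings).
print(merge_intervals([("1998-01-01", "1998-01-07"),
                       ("1998-01-04", "1998-01-10")]))
# -> [('1998-01-01', '1998-01-10')]
```

After the merge, a single seek over 1998-01-01 through 1998-01-10 covers both predicates with no duplicated rows.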

After the analysis above, we have seen both the advantages and the disadvantages of the dynamic index seek. Sometimes we want to avoid generating such a complicated plan, and the simplest way is to put the values directly into the statement (which, of course, requires a recompile). In most programs, however, we only define parameters and then execute, which runs into exactly the problems above. And when there are many parameters, the memory used by the sort that feeds the merge grows. Which way to use is something to weigh for your own scenario.

Second, index union

A so-called index union splits the filter into its different conditions and matches each condition to a different index entry.

As an example:

SELECT OrderID
FROM Orders
WHERE OrderDate BETWEEN '1998-01-01' AND '1998-01-07'
   OR ShippedDate BETWEEN '1998-01-01' AND '1998-01-07'

This code queries the orders whose order date is between January 1, 1998 and January 7, 1998, or whose shipped date is in the same range.

The logic is simple, and we know that both fields in this table have index entries. So this query has two choices in SQL Server:

1. Scan the table once and output the matching rows. Simple and effective, but if the Orders table is large the performance will be very poor, because most of the scanned data is not what we want; the time spent scanning it is wasted.

2. Seek each of the two indexed columns separately for its part of the data, which avoids scanning the whole table. The problem is that the two separate seeks overlap: orders placed between January 1, 1998 and January 7, 1998 whose shipped date also falls in that range are included in both seek results, so these duplicates must be removed before output.

Let's see which one SQL Server chooses.

It appears SQL Server settled on method 2 during costing. But the plan is not perfect: the deduplication step costs 64% of the plan's resources.

In fact, judging from the generated query plan, the statement above can be expressed with the following logic; its effect, and the query plan produced, are the same:

SELECT OrderID
FROM Orders
WHERE OrderDate BETWEEN '1998-01-01' AND '1998-01-07'
UNION
SELECT OrderID
FROM Orders
WHERE ShippedDate BETWEEN '1998-01-01' AND '1998-01-07'
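The index-union idea itself, two seeks whose results are combined and deduplicated on OrderID, can be sketched in Python; the OrderID values below are invented for illustration:

```python
def index_union(seek_a, seek_b):
    """Index union: combine the row IDs from two separate index seeks
    and remove the duplicates that matched both predicates."""
    return sorted(set(seek_a) | set(seek_b))

# Hypothetical OrderIDs from seeking the OrderDate index and the
# ShippedDate index; order 10270 satisfies both predicates and must
# appear in the output only once.
by_order_date = [10262, 10270]
by_shipped_date = [10270, 10288]
print(index_union(by_order_date, by_shipped_date))  # -> [10262, 10270, 10288]
```

This is exactly the semantics of the UNION rewrite above: UNION (unlike UNION ALL) removes the rows that both branches return.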

Let's look at another example of index union.

SELECT OrderID
FROM Orders
WHERE OrderDate = '1998-01-01' OR ShippedDate = '1998-01-01'

We change the BETWEEN range filter into an equality filter and look at the resulting execution plan.

It is basically the same statement, only with a different query condition, yet the resulting query plan changes quite a lot. There are a few differences:

1. With the earlier BETWEEN...AND filter, the rows returned by the index seeks were combined by Concatenation; concatenation simply pieces the two data sets together, in no particular order.

2. With the earlier BETWEEN...AND filter, the concatenated result set was deduplicated by a sort (Sort Distinct), which consumes a lot of resources. Here a stream aggregate does the job instead, at almost no cost.

Let's analyze the reasons for these two differences:

First, once the filter condition is an equality, each index seek returns rows already sorted by the OrderID column we want to query (within one key value, a nonclustered index is ordered by the clustered key). When two presorted data sets have to be combined, a merge is the natural fit, because a merge requires sorted input. So the best choice here is a merge union.

With the earlier BETWEEN...AND filter, the seek results were also sorted, but by the OrderDate or ShippedDate column rather than by OrderID, and OrderID is what we output. That sort order is useless for the union, so SQL Server could only choose a Concatenation, throw the results together, and then sort them. I hope I have made this clear.

Second, regarding deduplication, there is no doubt that a stream aggregate is the best way: it consumes little memory and runs fast. But the premise is that the input is sorted in advance. The earlier choice of sorting to deduplicate (Sort Distinct) was purely a last resort.

In conclusion: when writing statements, if we can use an equality predicate, we should. Better still, if we can enumerate exactly the output columns we need, we avoid redundant bookmark lookups and the evil SELECT *.

If you do write the evil SELECT *, you can basically say goodbye to nonclustered index seeks; the most you will get is a clustered index scan or an RID lookup.

Look at the following statement

SELECT *
FROM Orders
WHERE OrderDate = '1998-01-01' OR ShippedDate = '1998-01-01'

Finally, for predicates joined with AND: this method is called index intersection, meaning that when two or more filter conditions intersect, the query can be answered using just one of them.

Let's see a statement:

SELECT OrderID
FROM Orders
WHERE OrderDate = '1998-01-01' AND ShippedDate = '1998-03-05'

Here the predicates are joined with AND. In actual execution, although both columns have nonclustered indexes and in theory either could be used, SQL Server simply chooses the more optimal of the two for an index seek and verifies the other condition through a bookmark lookup, sparing us all the Sort Distinct and stream-aggregate deduplication gymnastics introduced earlier.
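The seek-plus-bookmark-lookup plan described here can also be sketched in Python. Everything below is a toy model with invented rows; the real plan operates on index and data pages:

```python
def seek_then_lookup(seek_ids, table, residual):
    """Seek the more selective index for candidate row IDs, then fetch
    each row (the bookmark lookup) and test the remaining AND predicate."""
    return [rid for rid in seek_ids if residual(table[rid])]

# Hypothetical rows: OrderID -> (OrderDate, ShippedDate).
rows = {
    10262: ("1998-01-01", "1998-03-05"),
    10263: ("1998-01-01", "1998-02-14"),
}
# Seek the OrderDate index (both rows qualify), then apply the
# ShippedDate condition as a residual predicate on the looked-up rows.
hits = seek_then_lookup([10262, 10263], rows, lambda r: r[1] == "1998-03-05")
print(hits)  # -> [10262]
```

Because AND only narrows the candidate set, no union or deduplication step is needed; each candidate is either kept or discarded.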

It seems the AND connector is a rather handsome operator. So in many cases where we are tempted to write OR, rewriting it as AND, where the logic allows, is more efficient.

Reference documents

    • Microsoft Books Online: logical operators and physical operators reference
    • The book series "Inside Microsoft SQL Server 2005"

Conclusion

This article introduced some of the ways index operations work, mainly the patterns we apply when writing everyday statements, illustrated with a few examples as starting points. Most of the statements we usually write are nothing more than variations and combinations of the patterns introduced here, and they should serve as a guideline for how we build our indexes.

Next we will cover subqueries; if you are interested, watch for it. The scope of SQL Server performance tuning is very broad, and we will analyze it article by article.

If you have any questions, you can leave a comment or send a private message. I look forward to studying SQL Server in depth with all of you; let's learn and progress together.

Finally, a few links; it seems I should put together a catalog:

SQL Server Tuning Series Basics

SQL Server Tuning Series Basics (Summary of common operators)

SQL Server Tuning Series Basics (Union operator summary)

SQL Server Tuning Series basics (Parallel operations Summary)

SQL Server Tuning Series basics (Parallel operations Summary chapter II)

SQL Server Tuning Series Basics (Index operations Summary)
