The use of multi-Column Composite indexes bypasses a defect of Microsoft SQL server

Source: Internet
Author: User

However, Microsoft SQL server has an important defect in processing such indexes, that is, the operations that should have been compiled into index seek are compiled into index scanning, which may cause serious performance degradation.

For example, assume that a table T has an index (cityid, sentdate, userid) and now has a paging list function, to query several records greater than a certain multi-Column Composite Index V0, the simplest ideographic method is V> = V0. If it is decomposed, it is:
Cityid> @ cityid0 or (cityid = @ cityid0 and (sentdate> @ sentdate0 or (sentdate = @ sentdate0 and userid >=@ userid0 ))),

When you write the above query, you will expect SQL server to automatically identify the above as a V0 type boundary condition, and use the index seek Operation to implement the query. However, Microsoft SQL server (Version 2005) has an important defect (How can other SQL servers not be known yet). When it encounters such SQL, SQL server uses index scan. The result is that the indexes you have created are not used at all. If the data volume of the table is large, the performance reduction is very large.
I have submitted this question to people from Microsoft who asked me to go to a formal website to submit this defect. I am too lazy to do this.

However, there is still a way to bypass this defect. As long as the above conditions are changed, the SQL server can still change back to index seek instead of Low-Performance index scan. for details, please refer to my original English text. (sorry, once I write Chinese, I don't want to translate it into English. The opposite is true. It is estimated that you can still write it in English, if this is not the case, check the simhei part ,):
The seek predicate of the form "x> bookmark_of_x" is needed in paging related query. the compiler has no difficulty to parse it correctly if x is a single column index, or two columns index, however, if x is a three columns index or more, then the compiler will have a hard time to recognize it. this failure will result in that the seek predicate ended up in residue predicate, which results in a much worse execution plan.
To initialize strate the point, take a example,
Create table A (a int, B int, c int, d float, primary key (a, B, c ))
Now check the plan for the query:
Select c, d from A where (a> 111 or a = 111 and
(B> 222 or B = 222 and c> 333 ))
You can see a table scan op is used, and the Where clause ended up in residue predicate.
However, if you rewrite the query in an equivalent form:
Select c, d from A where a> 111 or a = 111 and B> 222 or a = 111 and B = 222 and c> 333
Then the compiler can choose an index seek op, which is desired.
The problem is, the compiler shocould be able to recognize the first form of seek predicate on multiple columns index, it saves the user from having to pay extra time to figure out a get-around, not to mention the first form is a more efficient form of same expression.
The above problem can be said to be a part of the circle, but there is still a circle, then let's look at the following section:
It looks like that SQL server lacks a consept of vector bookmark, or vector comparison or whatever you like to call it.
The workaround is not a perfect workaround. If SQL server were to understand the concept of vector bookmark, then the following two wocould be the same in execution plan and performance:
1. select top (n) * from A where vectorIndex> = @ vectorIndex
2. select * from A where vectorIndex> = @ vectorIndex and vectorIndex <= @ vectorIndexEnd
-- @ VectorIndexEnd corresponds to the last row of 1.
However, test has shown that, the second statement takes far more time than the first statement, and SQL server actually only seek to the begining of the vector range and scan to the end of the whole Index, instead of stop at the end of the vector range.
Not only SQL server compile badly when the vector bookmark has 3 columns, test has shown that even with as few as 2 columns, SQL serer still can not correctly recognize this is actually a vector range, example:
3. select top (100) a, B, c, d from A where a> 60 or a = 60 and B> 20
4. select a, B, c, d from A where (a> 60 or a = 60 and B> 20) and
(A <60 or a = 60 and B <= 21 ),

The above two queries are essentially the same (The data in the table is exactly the same), and the result set of the same industry is provided. However, 3 is much faster than 4, if you look at the execution plan, it also proves that 3 should indeed be faster than 4.
That is to say, even if the index vectorIndex contains only two columns, SQL server cannot correctly understand the range expression @ vectorIndex0 <vectorIndex <@ vectorIndex1, it can correctly interpret the first half as seek, but the second half cannot be correctly interpreted. As a result, SQL server will always scan the end of the entire table instead of stopping at @ vectorIndex1.
The following test code can be used by anyone who is interested:

Copy codeThe Code is as follows: create table [dbo]. [A] (
[A] [int] not null,
[B] [int] NOT NULL,
[C] [int] not null,
[D] [float] NULL,
Primary key clustered ([a] ASC, [B] ASC, [c] ASC)
)
Declare @ a int, @ B int, @ c int
Set @ a = 1
While @ a <= 100
Begin
Set @ B = 1
Begin tran
While @ B <= 100
Begin
Set @ c = 1
While @ c <= 100
Begin
Insert into a (a, B, c, d)
VALUES (@ a, @ B, @ c, @ a + @ B + @ c)
Set @ c = @ c + 1
End
Set @ B = @ B + 1
End
Commit
Set @ a = @ a + 1
End
SET STATISTICS PROFILE ON
Set statistics time ON
Set statistics io ON

Select top (10) a, B, c, d from A where (a> 60 or a = 60 and
(B> 20 or B = 20 and c> = 31 ))
Select a, B, c, d from A where (a> 60 or a = 60 and
(B> 20 or B = 20 and c> = 31) and (a <60 or a = 60 and
(B <20 or B = 20 and c <= 40 ))

Select top (10) a, B, c, d from A where a> 60 or a = 60 and B> 20 or a = 60 and B = 20 and c> = 31
Select a, B, c, d from A where (a> 60 or a = 60 and B> 20 or a = 60 and B = 20 and c> = 31) and
(A <60 or a = 60 and B <20 or a = 60 and B = 20 and c <= 40)
Select top (100) a, B, c, d from A where a> 60 or a = 60 and B> 20
Select a, B, c, d from A where (a> 60 or a = 60 and B> 20) and (a <60 or a = 60 and B <= 21)
Select top (100) a, B, c, d from A where a> 60 or a = 60 and B> 20
Select a, B, c, d from A where (a> 60 or a = 60 and B> 20) and (a <60 or a = 60 and B <= 21)

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.