The use of multiple-column composite indexes bypasses a flaw in Microsoft SQL Server

The use of multiple-column composite indexes bypasses a flaw in Microsoft SQL Server _mssql

Last Update:2017-01-19 Source: Internet

Author: User

Tags microsoft sql server

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

However, Microsoft SQL Server has an important flaw in processing such indexes, which is to index the operations that should be compiled into the index seek, which can cause severe performance degradation

Let me give you an example to illustrate the problem. Assuming that a table T has an index (Cityid, Sentdate, UserID), and now has a paging list feature, to obtain a query that is larger than a number of records in a multiple-column composite index V0, the simplest way to write it is the V >= V0, which, if decomposed, It is:
Cityid > @cityid0 or (Cityid = @cityid0 and (Sentdate > @sentdate0 or (sentdate = @sentdate0 and userid >= @useri D0)),

When you write the above query, you expect SQL Server to automatically recognize this as a V->= V0 type boundary condition and use the index seek operation to implement the query. However, Microsoft's SQL Server (version 2005) has an important flaw (other SQL Server how does not know), when it encounters such SQL, SQL Server will be implemented with the index scan, the result is that your established index is not used at all, If this table has a large amount of data, the resulting performance degradation is very large.
For this issue, I have submitted to Microsoft's relevant people, they further asked me to go to a formal website to submit this flaw, I am lazy to do.

However, there is a way around this flaw, as long as the conditions given above are distorted, SQL Server can be changed back to the index seek, rather than the low performance of the index scan. Please see my original English (I'm sorry, once I wrote Chinese, I do not want to turn into English, the reverse is the same, it is estimated that everyone can be English, it is not good to see the bold part of it,):
The SEEK predicate of the form "x > bookmark_of_x" are needed in paging related query. The compiler has no difficulty to parse it correctly if x are a single column index, or two columns index, however, if X is A three columns index or more, then the compiler would have a hard time to recognize it. This failure would result into that the seek predicate ended up in residue predicate, which results in a much worse execution Plan.
To illustrate the point, take a example,
Create Table A (a int, b int, c int, d float, primary key (A, B, c))
Now check the * for the query:
Select C, D from A where (a> or a= and
(b > 222 or b = 222 and C > 333))
You can have a Table scan op is used, and the Where clause ended up in residue predicate.
However, if you are rewrite the query in a equivalent form:
Select C, D from A where a> or a= and B > 222 or a= b= and 222
Then The compiler can choose a index seek OP, which is desired.
The problem is, the compiler should being able to recognize the "the" I-seek predicate on multiple columns index, it SA The VES the user from has to pay extra time to figure out a get-around and not to mention the "I" a more efficient F ORM of same expression.
The above problem, can be said to be part of the past, but, there are also around the time, and then look at the following paragraph:
It looks like this SQL Server lacks a consept of vector bookmark, or vector comparison or whatever you.
The workaround is not a perfect workaround. If SQL Server were to understand the concept of vector bookmark, then the following two the would is the same in execution pla N and Performance:
1. Select TOP (N) * from A where Vectorindex >= @vectorIndex
2. Select * from A where Vectorindex >= @vectorIndex and Vectorindex <= @vectorIndexEnd
--@vectorIndexEnd corresponds to the last row of 1.
However, test has shown, the second statement takes far more time than the "a", and SQL Server statement O Nly the begining of the vector range and scan to the "end of" the whole Index, instead of stop at the end of the VEC Tor Range.
Not only SQL Server compile badly when the vector bookmark has 3 columns, test has shown this even with as few as 2 column s, SQL Serer still can not correctly recognize this is actually a vector range, example:
3. Select Top (MB) A, B, C, D from a where a> or a= and B > 20
4. Select a, B, C, D from where (a> or a= and B >) and
(a< or a= and B <= 21),

The above two queries are essentially the same (the data in the table is exactly the same), and give the same result set, but 3:4 is much faster, and if you go to see Execution plan, it also proves that 3 really should be faster than 4.
That is, even if the index Vectorindex contains only two columns, SQL Server cannot correctly understand the range expression @vectorIndex0 < Vectorindex < @vectorIndex1, It can interpret the first half correctly as seek, but the latter part is not properly interpreted, resulting in SQL Server scanning all the way to the end of the table, rather than stopping at @vectorindex1.
The following test code, interested people can take to play for themselves:

Copy Code code as follows:

CREATE TABLE [dbo]. A (
[a] [int] not NULL,
[b] [int] not NULL,
[c] [int] not NULL,
[d] [float] NULL,
PRIMARY KEY CLUSTERED ([a] ASC, [b] ASC, [c] ASC)
)
declare @a int, @b int, @c int
Set @a =1
While @a <= 100
Begin
Set @b = 1
BEGIN Tran
While @b <= 100
Begin
Set @c = 1
While @c <= 100
Begin
INSERT into a (a, B, C, D)
VALUES (@a,@b,@c,@a+@b+@c)
Set @c = @c + 1
End
Set @b = @b + 1
End
Commit
Set @a = @a + 1
End
SET STATISTICS profile on
SET STATISTICS time on
SET STATISTICS io on

Select Top A, B, C, D from a where (a> or a=)
(b > or b = 31 and C >=))
Select a, B, C, D from a where (a> or a=)
(b > or b = 31 and C >=)) and (a< or a=
(b < or B = 40 and C <=))

Select Top A, B, C, D from a where a> or a= and B > or a= and b= and C >= 31
Select a, B, C, D from where (a> or a= and B > or a= and b= and C >=) and
(a< or a= and B < a= and b= and C <= 40)
Select Top A, B, C, D from a where a> or a= and B > 20
Select a, B, C, D from a where (a> or a= and B >) and (a< or a= and B <= 21)
Select Top A, B, C, D from a where a> or a= and B > 20
Select a, B, C, D from a where (a> or a= and B >) and (a< or a= and B <= 21)

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More