Source: When SQL Server creates a composite index, the composite index column order affects the performance of the query
Talk about composite indexes
Write index blog Too much, has not wanted to write, there are two reasons:
One is the suspicion that there are fried leftovers, brothers have said: Index, as long as the query criteria to build the index on the line, really can be so violent?
Second, the index is a very big topic, it is difficult to summarize all the situation, you do not make a point of novelty, but there is a copy of the suspicion of copying
Since it's written, write something a little bit different,
All right, stop the crap.
Build a test environment:
Create a table, simulate a table in the actual business, fill in the data,
Time field, which is filled evenly by time, and other fields are populated with GUIDs
Create Tablebusinessinfotable (BuniessCode1varchar( -), BuniessCode2varchar( -), BuniessCode3varchar( -), BuniessCode4varchar( -), BuniessStatus1tinyint, BuniessStatus2tinyint, BuniessDateTime1Datetime, BuniessDateTime2Datetime, OtherColumn1varchar( -), OtherColumn2varchar( -), OtherColumn3varchar( -))Declare @i int=0 while @i<1000000begin Insert intobusinessinfotableValues ( NEWID(),NEWID(),NEWID(),NEWID(),RAND()* -,RAND()* -, DATEADD(MI,@i,GETDATE()),DATEADD(MI,@i,GETDATE()),NEWID(),NEWID(),NEWID() ) Set @i=@i+1End
Now there is a query (actually the query is far more complicated than this, I simplify a little, do not say that I deliberately built the environment)
SelectOtherColumn2, BuniessStatus1, BuniessStatus2, BuniessDateTime1, Buniessdatet Ime2 frombusinessinfotablewhereBuniessDateTime1between '2016-6-21' and '2016-6-28' andBuniessDateTime2between '2016-6-21' and '2016-6-28' andBuniessStatus1= - andBuniessStatus2= the
A solemn statement:
The clustered index is not considered for the time being, after all, there can be only one clustered index on a table.
No one else is a fool, not easy to build a clustered index, and the clustered index is already occupied
Since it is occupied, my principle is generally not to move other people's existing things, such as others built a clustered index, you have deleted, according to their own situation to build a clustered index
Isn't that a curse?
Experienced you must consider qualifying the index, taking into account the search for bookmarks to avoid key lookup, we have asked for the OtherColumn2 column include in the
Like this.
CREATE nonclustered INDEX on businessinfotable (buniessstatus1,buniessstatus2,buniessdatetime1,buniessdatetime2) INCLUDE (OTHERCOLUMN2)
Or so, just the index column order is different
CREATE nonclustered INDEX on businessinfotable (BUNIESSDATETIME1,BUNIESSDATETIME2,BUNIESSSTATUS1,BUNIESSSTATUS2) INCLUDE (OTHERCOLUMN2)
Of course you can adjust the order of four columns, I do not too much to do the demonstration, interested in their own try
The order of the leading columns here does not affect the use of the index, when the query is non-clustered index seek, absolute
So the question comes, exactly the same query condition, the same as the result, using different indexes, the difference between the index is only the column order is not the same, the cost of the same, first guess, there is a difference?
The same query, using the results of different indexes (Idx_1 and Idx_2, respectively):
Look at the picture below.
Look at the IO situation
Cause analysis
Seems to be a little different, good seems this difference is really not small (in the past, I test the environment is not good, compared to the effect is not obvious, the feeling is not convincing, this contrast is more obvious)
What is the reason?
The index is present in the form of a balanced tree (b-tree), and the order of the columns of the composite index determines the order in which the information of the B-tree is stored
If BuniessStatus1 is listed as the leading column, because the range of the BUNIESSSTATUS1 distribution (relative) is smaller,
This can filter out a relatively small result set by buniessstatus1=55 at the time of query, then filter with other conditions in order is relatively fast.
For example, buniessstatus1=55 filtered out the qualifying data with 5
Plus BuniessStatus2 BuniessDateTime1 BuniessDateTime2 These three conditions again filtered out a piece of data.
If BUNIESSDATETIME1 is the leading column of the index, filter with BuniessDateTime1 between ' 2016-6-21 ' and ' 2016-6-28 '
There may be 10,000 data, then BUNIESSDATETIME2,BUNIESSSTATUS1, BUNIESSSTATUS2 filtering
Finally, there is only one qualifying piece of data.
The difference is that: at the beginning of the filter conditions, determine how many page queries to determine the initial criteria to meet the data, and then further filtering
If the data range that satisfies the query condition is determined relatively accurately at the very beginning, the data that satisfies the condition can be finally confirmed by a relatively small price.
If the data range that satisfies the query condition is determined relatively vaguely at first, the cost of the process is relatively large, although the result is the same as the one behind each condition.
Of course, the establishment of this index is related to data distribution,
However, I did not conclude that the composite index must be in what order to be the best
Or that sentence: concrete analysis of specific problems, to avoid empiricism, there is no one-cut means to solve all the problems.
Summarize:
This article, through a simple example, analyzes the effect of the order of the columns on the query when the index is created, stating that when creating the index, you should not only consider which columns to create the index on, but also whether the order of indexed columns will affect the query. Avoid talking about indexing, which is the "Index on query criteria" approach to violence.
When SQL Server creates a composite index, the composite index column order affects the performance of the query