How to select the sequence of Data columns in the Resolution Index
When we create an index on multiple columns, we often encounter the following problem: "Which column needs to be put in front", because the column order in the index is different, it will have a great impact on the use of indexes and performance. This article analyzes this problem.
A common answer to the above question is "put the largest selective column in front". Here, we will explain the meaning of the selectivity to make the subsequent presentation order. Selectivity is used to describe the differences between data. For example, if a table contains 1000 pieces of data, one of the fields, such as ID, and if the ID values of each piece of data are different, then, the ID selectivity is 1. If 300 of the IDS are the same, that is, if 700 IDs are different, the selectivity is 70%. Obviously, the higher the data selectivity, the better the index creation effect.
Next, let's explain why we need to put the columns with high selectivity at the beginning when creating an index on multiple columns.
After hearing the above suggestions, some friends may regard the column where the clustered index of the table is located as the first field of the Multi-column index when creating any index based on multiple columns. For example, assume that the table has four fields: ID, name, age, and birthdate. ID is the primary key and also a clustered index. Now we need to create an index on name and birthdate, at this time, some friends found that: ID is the most selective, so it is better to put ID in the new index, so an index named ix_index contains three columns: ID, name, birthdate. Later, we may find that, if we did this, the newly created index would not work, but would lead to performance problems.
Each index in the database has the corresponding statistical data. This statistical data shows the data distribution, and the statistical information shows the data distribution in a column-like form. The database only places the data distribution of the first column in the index in the column chart. In other words, the statistics show the data distribution of the first data column in the index (the content involved here is a bit deep. You can pay attention to the "query optimizer kernel series" on this site ", ).
Let me show you an example. Suppose there is an index on the salesorderdetail table: x_salesorderdetail_productid. Run the following statement:
This index contains the following columns: productid, salesorderid, and salesorderdetailid.We can view the column distribution chart of its data as follows:
We found that range_hi_key lists the productid values. From the figure, we can know that there are 826 pieces of data with a productid value of 305 and 831 pieces of data with a value of 198. The value of productid ranges from 826 to 831. The query optimizer estimates the number of data entries based on this.
We can know from the above: it is very important to put the column in the index first. If a column with low selectivity is put in front, the data distribution displayed in the index will change completely, the Query Optimizer may choose an inefficient execution plan.
Next, let's take a further look at this problem through an example.
First, create a test table as follows:
This table contains 10000 data records and is a heap table, that is, a table without clustered indexes. In this table, there are 100 different somestring values and 5000 different somedate values, and IDs are unique, all of which are different.
The preceding values are optional as follows:
Field name |
selectivity |
id |
100% |
somestring |
100/10000 * 100% = 1% |
somedate |
5000/10000 * 100% = 50% |
In the table, there is a non-clustered index. Assume that the name is idx_test and contains three values in the table. The order of the three columns in the index is ID, somedate, somestring, sorting by selectivity is really good!
For the above index, only play a role in a query structure similar to the following:
... Where id = @ ID and somedate = @ DT and somestring = @ Str
... Where id = @ ID and somedate = @ dt
... Where id = @ ID
In other words, this index is only valid when the where/join column in the query is used in the order of the column in the index. If the query is as follows:
... Where somedate = @ DT or... Somedate = @ DT and somestring = @ Str
Then, this index will not be used in the above query, so the query will scan the entire table during execution.
We can use the execution plan to see if this is the case.
For queries with where id = @ ID, the execution plan is as follows:
Apparently, the seek operation was executed very quickly.
For queries with where id = @ ID and somedate = @ DT, the execution plan is as follows:
Or the seek operation is performed.
So... The query for somedate = @ DT and somestring = @ STR is as follows:
We can see that full table scanning has started at this time.
This article describes the sequence of columns when the index performs equal operations on columns. The next article describes how to perform unequal operations on columns, such as ID> 1, is the column order in the index still in this way?
SeriesArticleLink:
Detailed explanation of IIS Server Load balancer-application request route Article 1: arr Introduction
Detailed explanation of IIS Server Load balancer-application request route Article 2: Create and configure Server Farm
IIS Server Load balancer-application request route explanation Article 3: Server Load balancer for HTTP requests using Arr (I)
IIS Server Load balancer-application request Route details article 3: Server Load balancer for HTTP requests using Arr (below)
IIS Server Load balancer-application request route explanation article 4: layer-3 deployment architecture using arr
Server Load balancer Principles and Practices: Part 1 (reorganizing)
Server Load balancer Principles and Practices)
Server Load balancer principles and practices part 3 basic concepts of Server Load balancer-network Basics
Server Load balancer principles and practices Chapter 4 server load bal
Principles and Practices of Server Load balancer Article 5 Detailed description of data packet process during Server Load balancer
Server Load balancer principles and practices Chapter 6 Health Check Mechanism (part I)
Server Load balancer principles and practices Chapter 7 Health Check Mechanism (II)
Server Load balancer principles and practices Chapter 8 Network Address Translation (part I)
Server Load balancer principles and practices Chapter 8 Network Address Translation (Part II)
Server Load balancer principles and practices Article 9 Server Load balancer advanced technology-session persistence (I)
Server Load balancer principles and practices Article 10 Server Load balancer advanced technology-session persistence (medium)
Server Load balancer principles and practices 11th Server Load balancer advanced technology-session persistence (below): delayed binding