INDEX Summary from the perspective of data storage

Last Update:2017-01-13 Source: Internet

Author: User

Tags create index

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

Test table structure:

The code is as follows:	Copy code
Create table TB1 ( Id int identity (1, 1 ), C1 INT, C2 INT )

1. Clustered index)

Clustered index can be understood as an index that contains multiple columns except the index key in a table. To ensure the correctness of the DELETE/UPDATE operation, if the clustered index is not declared as UNIQUE ), the system will add an nullable int type identifier column (UNIQUIFIER) to the clustered index key to ensure the uniqueness of the record.

Unique clustered index:

Create unique clustered index IDX_ID
ON TB1
(
ID
)

Non-unique clustered index:

Create clustered index IDX_ID
ON TB1
(
ID
)

2. Non-unique non-clustered index

To locate data from a non-clustered index, for a heap table, a non-clustered index stores the index key + Data's RID (FILE_ID + PAGE_ID + SLOT_ID). For a clustered table, non-clustered indexes store the index key + clustered index key.

Non-clustered index:

Create index IDX_C1
ON TB1
(
C1
)

Non-clustered indexes on heap tables:

Non-clustered indexes on the unique clustered index table:

Non-clustered index:

-- ====================================

The unique clustered index is used by default in subsequent tests.

-- ====================================

3. Unique non-clustered index

Unique non-clustered indexes are different from non-unique non-clustered indexes mainly on non-leaf nodes. The unique non-clustered index does not contain RID data on non-leaf nodes.

Unique non-clustered index:

Create unique index IDX_C1_UNI
ON TB1
(
C1
)

4. Include indexes

Included indexes are introduced in SQL SERVER 2008. Data containing columns only exists on leaf nodes. Including columns does not affect the location of index rows (not sorted), and including columns does not affect the size of index keys (SQL SERVER limits the index key to no more than 900 bytes)
Copy code

Create index IDX_C1_INC_C2
ON TB1
(
C1
) INCLUDE
(
C2
)

5. Filter indexes

When a filter column is not used as an index key or contains a column, the system does not need to store the data of the filter column in the index. Therefore, the filter column does not appear on the leaf node and non-leaf node of the index.

Create index IDX_C1_WH_C2
ON TB1
(
C1
)
WHERE C2> 1

-- ===================================================== ======================================

Summary and suggestions:

1. For clustered tables, since non-clustered indexes of indexes all contain clustering keys, we recommend that you prioritize static + unique + incremental + small-length index keys as index keys.

A. static: When the clustering key is updated, in addition to moving table data to the corresponding location, updating all non-clustered indexes in turn will consume a lot of resources and cause page splitting and index fragmentation.
B. Unique: the consumption of non-unique clustered indexes increases by 2 to 6 bytes, resulting in more pages consumed by clustered and non-clustered indexes.
C. Increment: for non-incrementing clustered index keys, the insert operation will trigger page splitting and index fragmentation.
D. Small length: clustered INDEX keys with a large length also consume more pages for clustered and non-clustered indexes, especially when the number of INDEX layers increases and the index seek overhead increases.

2. the selectivity of the index column and the direction of the index column: The high selectivity does not mean that the column is suitable for the front of the index. You should also consider whether the column is a range query or equivalent query, for example, the CreatedTime column of the order table is mainly used for range query, and the product number ProductID of the order table is mainly used for equivalent query.

WHERE ProductID = @ P1
AND CreatedTime> @ P2
AND CreatedTime <@ p3

For such a query, INDEX (ProductID, CreatedTime) is more efficient (consuming less CPU and IO resources) than INDEX (CreatedTime, ProductID ).

3. index column order and statistics: the index column order is different. The density of the statistical information (density) and the histogram (histogram) are also different, which indirectly affects the generated execution plan.

4. For columns with low selectivity and located at the back end of the index column, you can consider placing them into the include index column.

5. although the filtering index has some problems in updating statistics, the filtering index is still a must to solve some difficult problems (such as select top (10) * FROM orders WHERE ProductID> 10000 order by OrderID DESC)

6. When creating an index for an incremental column, you should consider the inefficient execution plan caused by the expiration of statistics, such as creating an index for the date column on the order table.

Prepare test data:

The code is as follows:	Copy code
Create table TB1 ( C1 INT, C2 INT, C3 INT ) GO Create unique clustered index IDX_C1 ON TB1 (C1) GO Create unique index IDX_C2 ON TB1 (C2) GO Create index IDX_C3 ON TB1 (C3) GO Insert into TB1 (C1, C2, C3) VALUES (1, 1) GO Insert into TB1 (C1, C2, C3) VALUES (2, 2) GO Insert into TB1 (C1, C2, C3) VALUES (3, 3)

The index number is as follows:

View the PAGE information through dbcc ind and dbcc page.

Unique non-clustered index IDX_C2 data page:

Non-unique non-clustered index IDX_C3 data page:

The obvious difference between the above two images is C1 and C1 (key). Is it in "non-unique non-clustered index, "clustered index key" is also placed in "non-clustered index key" and involved in sorting?

I believe many DBA friends have encountered such a problem. They need to search for data based on certain status values. These status values are a small set (a small number ), for example, you can find the maximum order number with a status value of 1.

The code is as follows:	Copy code
Select top (1 )* FROM dbo. Orders WHERE OrderState = 1 Order by OrderID DESC

Although OrderID is the primary key and a unique clustered index, it may take a wide range of clustered index seek to locate the data that meets the condition OrderState = 1. Therefore, although OrderState is less selective, we will still create an index for it, so the problem arises? What should we build indexes?
Build:

The code is as follows:	Copy code
Create index IDX_OrderState ON dbo. Orders ( OrderState )

Still built:

Create index IDX_OrderState
ON dbo. Orders
(
OrderState,
OrderID
)

I used to think that the second method must be built, because the TOP (1) of OrderID needs to be sorted. However, after testing, the efficiency of the two methods is magically the same, no matter whether the "non-unique non-clustered index key" contains the "clustered index key", the "non-unique non-clustered index key" + "clustered index key" are sorted.

To solve this problem, we assume that we only sort the key defined by the non-unique non-clustered index key, such as OrderState. If OrderState = 0 is met, there may be 0.1 billion data records, during data update, the clustered index is updated first and the non-clustered index is updated in sequence. To update the index data, you must first locate the data row before updating the data, therefore, it is unacceptable to scan the 0.1 billion data to find the target row.

For "unique non-clustered index", because the index key can be used to quickly locate the index data row, and each key value only has one row, therefore, the significance of sorting the clustered index key is lost.

You can also observe the slotid and insert sequence of the same key value to find that the data is sorted by the clustered index key.

-- ===================================================== ==========================================

Summary:

1. For "non-unique non-clustered index", Index data is actually stored after sorting by "non-unique non-clustered index key" + "clustered index key;

2. For "unique non-clustered index", Index data is actually stored after sorting by "unique non-clustered index key;

3. All leaf nodes with non-clustered indexes store RID data, but the only non-clustered index node does not contain RID data;

-- ===================================================== ==========================================

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More