SQL Server: Clustered Indexes (Part 6)

Source: Internet
Author: User

The leaf pages of a clustered index store the table's data rows. Table rows are therefore physically sorted by the clustered index columns, and because table data can have only one physical order, a table can have only one clustered index.

When we create a PRIMARY KEY constraint, if the table has no clustered index yet and the constraint is not explicitly declared as nonclustered, SQL Server automatically creates a unique clustered index for it. The primary key does not have to be the clustered index; that is only the default behavior.

For example, a table can be created with its primary key declared as a nonclustered index, so that the primary key column is not the clustered index column:

nonclustered, Column2 int}
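
A complete statement of this kind might look like the following minimal sketch. The table name dbo.MyTable and the column name Column1 are assumptions chosen for illustration; only Column2 and the nonclustered keyword appear in the fragment above.

```sql
-- Hypothetical table: the names dbo.MyTable and Column1 are illustrative only.
CREATE TABLE dbo.MyTable
(
    Column1 int NOT NULL PRIMARY KEY NONCLUSTERED,  -- primary key backed by a nonclustered index
    Column2 int
);

-- The table stays a heap: sys.indexes should show a HEAP row (index_id = 0)
-- plus the NONCLUSTERED index that enforces the primary key.
SELECT name, index_id, type_desc, is_primary_key
FROM sys.indexes
WHERE object_id = OBJECT_ID(N'dbo.MyTable');
```

Because no clustered index is created, the primary key is still enforced as unique, but the rows themselves are not ordered by it.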

I. Heap tables and clustered tables

Tables that do not have a clustered index are called heap tables. The data rows of a heap table are not stored in any particular order, nor are its pages linked to the adjacent pages of the table. This unorganized structure typically makes accessing a large heap table more expensive than accessing a large clustered table.

A table with a clustered index is called a clustered table. A clustered table is organized as a B-tree, which can significantly reduce the number of reads when the data volume is large.
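
To see which tables in a database are heaps and which are clustered tables, a query along the following lines can help. It is a sketch that relies on the sys.tables and sys.indexes catalog views, where index_id 0 denotes a heap and index_id 1 a clustered index; the column aliases are arbitrary.

```sql
-- Lists each user table with its storage type (HEAP or CLUSTERED).
SELECT OBJECT_SCHEMA_NAME(t.object_id) AS schema_name,
       t.name                          AS table_name,
       i.type_desc                     AS storage_type
FROM sys.tables  AS t
JOIN sys.indexes AS i
  ON i.object_id = t.object_id
WHERE i.index_id IN (0, 1)   -- 0 = heap, 1 = clustered index
ORDER BY schema_name, table_name;
```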

II. Relationship with nonclustered indexes

There is an interesting relationship between clustered and nonclustered indexes in SQL Server. Each index row of a nonclustered index contains a pointer to the corresponding data row of the table. This pointer is called the row locator. The value of the row locator depends on whether the table is a heap table or a clustered table. For a heap table, the row locator is the RID, a pointer to the data row in the heap. For a table with a clustered index, the row locator is the clustered index key value.

Here's a table to illustrate the relationship.

Suppose you have a 2-column table:

RID (not an actual column) | Column 1 | Column 2
1                          | A1       | A2
2                          | B1       | B2

Heap table:

Indexed column (Column 1) | Row locator
A1                        | RID = 1 (points to the first data row in the table)
B1                        | RID = 2 (points to the second data row in the table)

Clustered table, assuming Column 2 is the clustered index column:

Indexed column (Column 1) | Row locator
A1                        | A2 (the clustered key value)
B1                        | B2 (the clustered key value)

Thus, finding a data row through a nonclustered index takes an extra step: the row locator is used to fetch the actual data. The row locator is a RID pointer in a heap table and the clustered key value in a clustered table.
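
The two-column example can be reproduced with a sketch like the one below. All object names (dbo.Demo, CIX_Demo_Column2, IX_Demo_Column1) and the char(2) data types are assumptions made for illustration.

```sql
-- Hypothetical two-column table matching the example rows above.
CREATE TABLE dbo.Demo
(
    Column1 char(2) NOT NULL,
    Column2 char(2) NOT NULL
);

INSERT INTO dbo.Demo (Column1, Column2)
VALUES ('A1', 'A2'), ('B1', 'B2');

-- Column2 becomes the clustered key, so the table is now a clustered table.
CREATE CLUSTERED INDEX CIX_Demo_Column2 ON dbo.Demo (Column2);

-- Each row of this nonclustered index carries the Column2 value as its row locator.
CREATE NONCLUSTERED INDEX IX_Demo_Column1 ON dbo.Demo (Column1);

-- Both requested columns are present in the nonclustered index rows
-- (Column1 as the key, Column2 as the locator), so no extra lookup is needed here.
SELECT Column1, Column2
FROM dbo.Demo
WHERE Column1 = 'A1';
```

A query that also asked for a third column would need the extra step described above, using the Column2 value to find the row in the clustered index.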

III. Recommendations for clustered indexes

  1. Create a clustered index first

For clustered tables, the order in which the nonclustered and clustered indexes are created matters, because every nonclustered index holds the clustered index key value in its index rows. If a nonclustered index is created before the clustered index, its row locator contains a RID pointer into the heap table. When you then create the clustered index, the RID pointers of all nonclustered indexes are changed to the clustered key, which is effectively the same as rebuilding every nonclustered index.
It is therefore a good idea to create the clustered index before any nonclustered indexes. The nonclustered indexes then set their row locators directly to the clustered index key value when they are created. This has little effect on final query performance, but SQL Server does far less work while building the indexes, so index creation is much faster. This matters most when you are doing maintenance on a system that is running online.
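
The recommended order can be sketched as follows. The table dbo.Orders and all index names are hypothetical.

```sql
-- Hypothetical table and index names.
CREATE TABLE dbo.Orders
(
    OrderID    int      NOT NULL,
    CustomerID int      NOT NULL,
    OrderDate  datetime NOT NULL
);

-- 1. Create the clustered index first, while no nonclustered indexes exist.
CREATE UNIQUE CLUSTERED INDEX CIX_Orders_OrderID ON dbo.Orders (OrderID);

-- 2. Create the nonclustered indexes afterwards. They are built once,
--    with OrderID as their row locator from the start.
CREATE NONCLUSTERED INDEX IX_Orders_CustomerID ON dbo.Orders (CustomerID);
CREATE NONCLUSTERED INDEX IX_Orders_OrderDate  ON dbo.Orders (OrderDate);
```

Reversing steps 1 and 2 would produce the same final indexes, but the nonclustered indexes would be built on the heap first and then rebuilt when the clustered index is created.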

  2. Keep the clustered index narrow

Because all nonclustered indexes use the clustered index key as their row locator, the overall length of the clustered index key should be as small as possible for best performance.

Imagine a wide clustered index, for example one on a CHAR(500) column: it adds a 500-byte value to every row of every nonclustered index. Even if the nonclustered index key itself is very small, the clustered key value alone takes up so much space that an index page can hold only about 16 rows.

Keeping the clustered index narrow effectively reduces logical reads and disk I/O.
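
The figure of about 16 rows per page follows from simple arithmetic, sketched below. The 4-byte narrow key (an int) is an assumption added for contrast, and the estimate ignores per-row overhead and the nonclustered key itself, so both results are upper bounds.

```sql
-- Rough estimate only: ignores row overhead and the nonclustered key columns.
DECLARE @usable_page_bytes int = 8096;  -- 8192-byte page minus the 96-byte page header
DECLARE @wide_key_bytes    int = 500;   -- wide clustered key from the example above
DECLARE @narrow_key_bytes  int = 4;     -- assumption: an int clustered key

SELECT @usable_page_bytes / @wide_key_bytes   AS max_rows_per_page_wide_key,    -- 16
       @usable_page_bytes / @narrow_key_bytes AS max_rows_per_page_narrow_key;  -- 2024
```

Fewer rows per page means more pages to read for the same number of index rows, which is where the extra logical reads come from.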

  3. Rebuild the clustered index in one step

Because nonclustered indexes depend on the clustered index, rebuilding a clustered index with separate DROP INDEX and CREATE INDEX statements causes all nonclustered indexes to be rebuilt twice: once after the DROP, when their row locators are changed to point to heap row pointers (RIDs), and again after the CREATE, when the row locators are changed to the new clustered key values. To avoid this, use the DROP_EXISTING clause of the CREATE INDEX statement to rebuild the clustered index in a single atomic step. You can likewise use the DROP_EXISTING clause on a nonclustered index.
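
A minimal sketch of the one-step rebuild follows. The table dbo.RebuildDemo and the index names are hypothetical; the index name in the second CREATE statement must match the existing index.

```sql
-- Hypothetical table with one clustered and one nonclustered index.
CREATE TABLE dbo.RebuildDemo
(
    ID   int         NOT NULL,
    Name varchar(50) NOT NULL
);
CREATE CLUSTERED INDEX CIX_RebuildDemo_ID ON dbo.RebuildDemo (ID);
CREATE NONCLUSTERED INDEX IX_RebuildDemo_Name ON dbo.RebuildDemo (Name);

-- Rebuild the clustered index in a single step instead of DROP INDEX + CREATE INDEX.
CREATE CLUSTERED INDEX CIX_RebuildDemo_ID
ON dbo.RebuildDemo (ID)
WITH (DROP_EXISTING = ON);
```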

  4. When to use a clustered index

In some cases, it is helpful to use a clustered index.

  1. Retrieve a range of data

Because the leaf pages of the clustered index are the actual data of the table, the order of the clustered index columns is the physical order of the data rows in the table. If the physical order of the data rows matches the order in which a query requests them, the disk head can read all the rows sequentially without much head movement.

Suppose the clustered index is built on the ID column and a query needs the rows with ID between 1 and 100, or with ID > 100. All of these rows are stored together on disk. The head can move to the position of the first row and then read all the data in order with minimal head movement. If, on the other hand, the rows are not arranged in the right physical order on disk, the head must move randomly from one position to another to read all the relevant rows. The physical movement of the head is the largest part of the cost of a disk operation, so arranging the rows on disk in the appropriate physical order (using a clustered index) optimizes the I/O cost.
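
The range queries described above might look like this sketch. The table dbo.RangeDemo, its Payload column, and the index name are assumptions.

```sql
-- Hypothetical table clustered on ID.
CREATE TABLE dbo.RangeDemo
(
    ID      int         NOT NULL,
    Payload varchar(50) NULL
);
CREATE CLUSTERED INDEX CIX_RangeDemo_ID ON dbo.RangeDemo (ID);

-- Closed range: the qualifying rows sit next to each other in clustered key order.
SELECT ID, Payload
FROM dbo.RangeDemo
WHERE ID BETWEEN 1 AND 100;

-- Open-ended range: read from the first qualifying row to the end of the index.
SELECT ID, Payload
FROM dbo.RangeDemo
WHERE ID > 100;
```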

  2. Read pre-sorted data

A clustered index is particularly effective when the data must be read in sorted order. If a clustered index is created on the column or columns that are likely to be sorted on, the rows are physically stored in that order, which eliminates the cost of sorting the data after it is read.

Retrieving data sorted by a range without a clustered index:

[Figure: retrieving range-sorted data without a clustered index]

Retrieving data sorted by a range with a clustered index:

[Figure: retrieving range-sorted data with a clustered index]

With a clustered index, the sorted range query returns data very quickly, because the clustered columns are already stored in sorted order in the database.
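
One way to make this comparison yourself is sketched below. The two tables, their columns, and the index name are hypothetical; load the same rows into both tables before running the queries, then compare the logical reads and the execution plans.

```sql
-- Hypothetical tables with identical columns: one heap, one clustered table.
CREATE TABLE dbo.SortHeap      (ID int NOT NULL, Payload varchar(50) NULL);
CREATE TABLE dbo.SortClustered (ID int NOT NULL, Payload varchar(50) NULL);
CREATE CLUSTERED INDEX CIX_SortClustered_ID ON dbo.SortClustered (ID);

SET STATISTICS IO ON;   -- report logical reads for each statement

-- Heap: rows have no guaranteed order, so the plan typically includes a Sort operator.
SELECT ID, Payload
FROM dbo.SortHeap
WHERE ID BETWEEN 1 AND 100
ORDER BY ID;

-- Clustered table: an ordered read of the clustered index can typically
-- satisfy the ORDER BY without a separate sort.
SELECT ID, Payload
FROM dbo.SortClustered
WHERE ID BETWEEN 1 AND 100
ORDER BY ID;

SET STATISTICS IO OFF;
```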

  5. When not to use clustered indexes

In some cases, it is best not to use a clustered index.

  1. Frequently updated columns

If the clustered index column is updated frequently, the row locators in all nonclustered index rows must be updated as well, which significantly increases the cost of the update. The update also blocks other queries that reference the same rows and nonclustered indexes during that time, which can hurt database concurrency. Therefore, avoid creating a clustered index on columns that are updated heavily.

  2. Wide keys

Because all nonclustered indexes use the clustered key as their row locator, you should, for performance reasons, avoid creating a clustered index on very wide columns or on too many columns. The recommendation above to keep the clustered index narrow explains why.

  3. Too many concurrent sequential inserts

If you want to add many new rows concurrently, it is better for performance to distribute them across the table's data pages. However, if all rows are added in the same order as the clustered index, every insert lands on the last page of the table. This may cause a huge "hotspot" on the corresponding sector of the disk. To avoid this disk hotspot, the data rows should not be physically arranged in the order in which they arrive. The inserts can be spread randomly across the whole table by creating the clustered index on another column, one that does not arrange the rows in the same order as the new rows. This problem occurs only with a large number of simultaneous inserts.

There is a trade-off, however: inserting at the end of the table avoids the page splits that would otherwise be needed to accommodate new rows. If the number of concurrent inserts is low, arranging the rows in the order of the new rows (using a clustered index) avoids page splits. If a disk hotspot does become a performance bottleneck, new rows can be accommodated in intermediate pages without page splits by reducing the fill factor of the table. In addition, the "hot" pages will be in memory, which also helps performance.
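
Reducing the fill factor could be done along these lines. The table and index names are hypothetical, and the value 80 is an arbitrary assumption; a suitable value depends on the workload.

```sql
-- Hypothetical clustered table.
CREATE TABLE dbo.FillDemo
(
    ID      int         NOT NULL,
    Payload varchar(50) NULL
);
CREATE CLUSTERED INDEX CIX_FillDemo_ID ON dbo.FillDemo (ID);

-- Rebuild the clustered index leaving roughly 20% free space on each leaf page,
-- so that new rows can fit into intermediate pages without splitting them.
ALTER INDEX CIX_FillDemo_ID ON dbo.FillDemo
REBUILD WITH (FILLFACTOR = 80);
```

The fill factor is applied only when the index is created or rebuilt, so the free space is gradually used up by later inserts.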

Finally, here is a way to make a non-primary-key column the clustered index column:

1. View all indexes. By default, the clustered index is created on the primary key.

View indexes:
EXEC sp_helpindex person
View constraints:
EXEC sp_helpconstraint person

2. Drop the PRIMARY KEY constraint, using the constraint name found in step 1 (for example, pk__person__117f9d94). After this, the column is no longer a primary key.
ALTER TABLE person DROP CONSTRAINT Pk_person

3. Create a clustered index on another column.

CREATE CLUSTERED INDEX Test_index ON person (Name)

4. Make the original column the primary key again. Because a clustered index already exists, a nonclustered index is created for the primary key automatically.

ALTER TABLE person ADD PRIMARY KEY (ID)

The complete script:

EXEC sp_helpindex person
EXEC sp_helpconstraint person
ALTER TABLE person DROP CONSTRAINT pk_person
CREATE CLUSTERED INDEX Test_index ON person (Name)
ALTER TABLE person ADD PRIMARY KEY (ID)
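
To confirm the result, the indexes on the table can be inspected with a query like the sketch below, which uses the sys.indexes catalog view as an alternative to sp_helpindex.

```sql
-- Expected after the steps above: Test_index is CLUSTERED,
-- and the primary key index is NONCLUSTERED with is_primary_key = 1.
SELECT i.name, i.type_desc, i.is_primary_key
FROM sys.indexes AS i
WHERE i.object_id = OBJECT_ID(N'person');
```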

  
