International - English

Cart Console

Topic Center

Contact Sales

Home > Developer > SQL Server

On the selection of SQL Server clustered index keys from a performance perspective

Last Update:2017-02-27 Source: Internet

Author: User

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

Brief introduction

In SQL Server, data is stored on a page. When you add a clustered index to a table, SQL Server searches for the data by using the columns of the clustered index as the keyword. Therefore, the impact on performance of the selection of clustered indexes becomes very important. This paper focuses on the selection of clustered indexes from the perspective of performance, but this is only considered in terms of performance. For tables with special business requirements, you need to choose the actual situation.

The combination of the column or column that contains the clustered index is best unique

This reason needs to be discussed from the data storage principle. In SQL Server, data is stored not in rows (row), but in page units. Therefore, when looking for data, the smallest unit that SQL Server looks for is actually a page. That is, even if you look for only a small row of data, SQL Server will find the entire page and put it in the buffer pool.

The size of each page is 8K. Each page will have a physical address for SQL Server. This address is written as a file number: Page number (Understanding the file number requires you to understand the file and file group). For example, page 50th of the first file. The page number is 1:50. When a table does not have a clustered index, the data pages in the table are stored in a heap (Heap), and on the basis of the page, SQL Server uniquely identifies each row by an extra line number, which is the legendary RID. The RID is a file number: Page number: line number to represent, assuming that this line is on line 5th of the page mentioned earlier, the RID is represented as 1:50:5, as shown in Figure 1.

Figure 1. Example of a RID

From the concept of the RID, the RID is not just a basis for SQL Server to determine each row, but also a place to store rows. Pages are rarely moved when the page is organized through a heap (Heap).

When a clustered index is established on a table, the pages in the table are organized according to B-Tree. At this point, SQL Server looks for rows that are no longer being looked up by the RID, instead using the keyword, which is the column of the clustered index to look up as a keyword. Assuming the table in Figure 1, we set the DepartmentID column as the clustered index column. The row of the non-leaf node of the B-tree contains only the DepartmentID and the bookmark (bookmark) that points to the next level of nodes.

When we create a clustered index with a value that is not unique, SQL Server cannot determine only one row through the clustered indexed column (that is, the keyword). In this case, in order to achieve a unique distinction for each row, SQL Server is required to make a distinction between the aggregated indexed columns of the same value, which is called uniquifiers. With the use of uniquifier, the impact on performance is divided into the following two parts:

SQL Server must judge the current data at the time of insertion or update whether it duplicates the existing key, and if it repeats, it needs to generate Uniquifier, which is an extra overhead.

Because of the need to add extra uniquifier to the keys of the same value, the size of the key is increased by an extra. Therefore, both leaf nodes and non leaf nodes require more pages for storage. This also affects the nonclustered index, which makes the bookmark columns of the nonclustered index larger so that the nonclustered indexes need more pages to store.

Here we test, create a test table, and create a clustered index. Insert 100,000 test data, each of which repeats in 2, as shown in Figure 2.

See more highlights of this column: http://www.bianceng.cnhttp://www.bianceng.cn/database/SQLServer/

Figure 2. Test code for inserting data

At this point, we'll look at the number of pages in this table, as shown in Figure 3.

Figure 3. After inserting the duplicate key, 100,000 data occupies 359 pages.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

Related Keywords:

SQL Server R2 download address and serial number, available t... 05-11

Python reads the configuration file and connects to the datab... 01-13

SQL Server 2005 32-bit + 64-bit, Enterprise Edition + Standar... 12-24

pymssql calling SQL Server stored procedure with output param... 09-07

Where is SQL Server Configuration Manager in Windows10? 11-23

Nolock, HOLDLOCK, UPDLOCK, TABLOCK, Tablockx in SQL Server 08-28

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

What's Trending

Top 10 Tags

datastax versions naming convention zookeeper client class definition md5 microsoft sql server 2005 data structures exception handling error handling

Top 10 Keywords

microsoft download center down wordpress address url site address url wordpress address url windows installer 4 0 download 302 not found web address url definition site address url wordpress db2 integer mac os installation step by step pdf abbreviation for return

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More

On the selection of SQL Server clustered index keys from a performance perspective

Contact Us

What's Trending

Top 10 Tags

Top 10 Keywords

Trending Topic

A Free Trial That Lets You Build Big!

Sales Support

After-Sales Support