This article is part of the "Stairway series: Steps for SQL Server Indexing"
Indexes are the basis of database design and tell developers to use the database with regard to the designer's intentions. Unfortunately, when performance problems arise, indexes are often added as an afterthought. The end here is a simple series of articles that should enable them to quickly make any database professional "fast"
The preceding levels of this phase provide an overview of general and nonclustered indexes. It ends with the following key concepts about SQL Server indexing. When a request arrives at your database, whether it is a SELECT statement or a insert,update or DELETE statement, SQL Server has only three possible ways to access the data of the tables referenced in the statement:
Access only nonclustered indexes and avoid accessing tables. This can only be possible if the index contains all the data for the table that the query requests
? Use the search key to access the index, and then use the selected bookmark to access a single row in the table.
Ignore the index and search the table for the requested row.
The focus of this level is the third option in the list above. Search the table. This in turn leads us to discuss clustered indexes. Topics mentioned at the second level but not covered.
The Primary AdventureWorks database table We will use at this level is the SalesOrderDetail table. In 121,317 rows, it is sufficient to illustrate some of the benefits of having a clustered index on a table. Furthermore, there are two foreign keys, which are sufficient to illustrate some design decisions about clustered indexes.
Sample Database
Although we have already discussed the first level of the sample database, but this time still want to repeat. Throughout this phase, we will use examples to illustrate concepts. These examples are based on the Microsoft AdventureWorks sample database. We specialize in sales orders. Five sheets will give us a good mix of trading and non-transactional data; Customer,salesperson,product,salesorderheader and SalesOrderDetail. To keep the focus, we use a subset of the columns. Since AdventureWorks normalization is good, salesperson information is broken down into three tables: Salesperson,employee and contact.
Throughout the ladder, we use the following two terms to exchange a single line on an order: "Order Items" and "Order Details". The former is a more common business term; The latter appears within the name of the AdventureWorks table.
Figure 1 shows a complete set of tables and their relationships.
Figure 1: The table used in this stairway example
Attention:
All TSQL code displayed at this stair level can be downloaded with the article.
Clustered index
We first ask the following question: How much work does it take to find a row in a table without using a nonclustered index? Does searching for a requested row in a table mean scanning every row in an unordered table? Or, SQL Server can permanently sort the rows in a table to quickly access them through search keywords, just as you would quickly access an entry for a nonclustered index by searching for keywords? The answer depends on whether you instruct SQL Server to create a clustered index on the table.
Unlike nonclustered indexes, which are separate objects and occupy their own space, clustered indexes are the same as tables. By creating a clustered index, you can instruct SQL Server to sort the rows in the table into an index key sequence and maintain the sequence during future data modifications. The upcoming level will look at the generated internal data structures to complete this operation. But now, think of the clustered index as an ordered table. Given the index key value of a row, SQL Server can quickly access the row, and it can be done sequentially from that row.
For demonstration purposes, we created two copies of the sample table SalesOrderDetail, one with no indexes and one with a clustered index. With regard to the key fields of the index, we made the same choice as the designer of the AdventureWorks database: Salesorderid/salesorderdetailid. The code in Listing 1 creates a copy of the SalesOrderDetail table. We can rerun this code at any time, and we want to start with a "clean slate".
IF EXISTS (SELECT * from sys.tables& #160;
WHERE object_id = object_id (' dbo. Salesorderdetail_index '))
DROP TABLE dbo. Salesorderdetail_index;
GO
IF EXISTS (SELECT * from sys.tables& #160;
WHERE object_id = object_id (' dbo. Salesorderdetail_noindex '))
DROP TABLE dbo. Salesorderdetail_noindex;
GO
SELECT * into dbo. Salesorderdetail_index from Sales.SalesOrderDetail;
SELECT * into dbo. Salesorderdetail_noindex from Sales.SalesOrderDetail;
GO
CREATE CLUSTERED INDEX Ix_salesorderdetail
ON dbo. Salesorderdetail_index (SalesOrderID, Salesorderdetailid)
GO
Listing 1: Creating a copy of the SalesOrderDetail table
Therefore, before you create a clustered index, assume that the SalesOrderDetail table looks like this:
SalesOrderID salesorderdetailid ProductID OrderQty UnitPrice
69389 102201 864) 3 38.10
56658 59519 711) 1 34.99
59044 70000 956) 2 1430.442
48299 22652 853) 4 44.994
50218 31427 854) 8 44.994
53713 50716 711) 1 34.99
50299 32777 739) 1 744.2727
45321 6303 775) 6 2024.994
72644 115325 873) 1 2.29
48306 22705 824) 4 141.615
69134 101554 876) 1 120.00
48361 23556 760) 3 469.794
53605 50098 888) 1 602.346
48317 22901 722) 1 183.9382
66430 93291 872) 1 8.99
65281 90265 889) 2 602.346
52248 43812 871) 1 9.99
47978 20189 794) 2 1308.9375
After you create the clustered index as shown above, the resulting table/clustered index will look like this:
SalesOrderID salesorderdetailid ProductID OrderQty UnitPrice
43668 106 722) 3 178.58
43668 107 708) 1 20.19
43668 108 733) 3 356.90
43668 109 763) 3 419.46
43669 110 747) 1 714.70
43670 111 710) 1 5.70
43670 112 709) 2 5.70
43670 113 773 2 2,039.99
43670 114 776 1 2,024.99
43671 115 753 1 2,146.96
43671 116 714) 2 28.84
43671 117 756) 1 874.79
43671 118 768) 2 419.46
43671 119 732) 2 356.90
43671 120 763) 2 419.46
43671 121 755) 2 874.79
43671 122 764) 2 419.46
43671 123 716) 1 28.84
43671 124 711) 1 20.19
43671 125 708) 1 20.19
43672 126 709) 6 5.70
43672 127 776 2 2,024.99
43672 128 774 1 2,039.99
43673 129 754) 1 874.79
43673 130 715) 3 28.84
43673 131 729) 1 183.94
When you look at the sample data shown above, you may notice that each salesorderdetailid value is unique. Do not confuse; Salesorderdetailid is not a primary key for a table. The combination of Salesorderid/salesorderdetailid is the primary key of the table, and the index key of the clustered index.
Understanding the fundamentals of clustered indexes
The clustered index key can be made up of any column you choose; it does not have to be based on a primary key. In our case, the most important thing is that the leftmost column is a foreign key, the SalesOrderID value. Therefore, all line items for the sales order appear consecutively in the SalesOrderDetail table.
Keep in mind the following additional points about SQL Server clustered indexes:
? Because the clustered index entry is a row of the table, there is no bookmark value in the clustered index entry. When SQL Server is already on a single line, it does not need a message to tell it where to find the line.
The clustered index always overwrites the query. Because the indexes and tables are the same, each column of the table is in the index.
? Creating a clustered index on a table does not affect the option to create a nonclustered index on the table.
Select Clustered index key column
Each table can have a maximum of one clustered index. The rows of a table can be only one sequence. You need to decide what order, if any, is best for each table and, where possible, create a clustered index before the table populates the data. When making this decision, remember that sorting means not only sorting, but also grouping, such as grouping order items by sales order.
This is why the designer of the AdventureWorks database chooses SalesOrderID within the salesorderdetailid as the reason for the order of the SalesOrderDetail tables; This is the natural order of the Order items.
For example, if a user requests an order item for an order, then all order items for that order are typically requested. A typical sales order form tells us that the printed version of the order always contains all the line items. The nature of a sales order business is to group line items by sales order. The warehouse occasionally requires that the order item be viewed by product rather than by the sales order, but most of the requirements, such as a salesperson or customer, a program that prints the invoice, or a query that calculates the total value of each order, will require all line items for all sales orders.
However, the user requirements themselves do not determine what is the best clustered index. The future level of this series will cover the interior of the indicator, because some internal aspects of the index will also affect your choice of clustered index columns.
Heap
If there is no clustered index in the table, the table is called a heap. Each table is a heap or a clustered index. So, although we often say that each indicator belongs to one of the two types of clustering or non-clustering, it is equally important to note that each table belongs to one of two types; it is a clustered index or it is a bunch. Developers often say that a table "has" or "does not" have a clustered index, but it is more meaningful to say that the table is "yes" or "not" a clustered index.
There is only one way for SQL Server to search the heap when it finds rows (excluding the use of nonclustered indexes), starting with the first row of the table, and then continuing to execute the table until all rows are read. There are no sequences, no search keys, and no way to quickly navigate to a specific row.
Compare clustered indexes and heaps
To assess the performance of clustered indexes and heaps, listing 1 creates two copies of Salesorderdetailtable. One replica is a heap version and the other is the same clustered index on the original table (Salesorderid,salesorderdetailid). Both tables do not have any nonclustered indexes.
We will perform the same three queries for each version of the table, one to retrieve a single row, one to retrieve all the rows for a single order, and one to retrieve all the rows for a single product. We give the SQL and the results of each execution in the table below.
Our first query retrieves a single row, and the execution details are shown in table 1.
Sql |
SELECT * From SalesOrderDetail WHERE SalesOrderID = 43671 and Salesorderdetailid = 120 |
Heap |
(1 row (s) affected) Table ' Salesorderdetail_noindex '. Scan count 1, logical reads 1495. |
Clustered Index |
(1 row (s) affected) Table ' Salesorderdetail_noindex '. Scan count 1, logical reads 3. |
Impact of having the Clustered Index |
IO reduced from 1495 reads to 3 reads. |
Comments |
No Surprise. Table scanning 121,317 rows to find just one are not very efficient. |
Table 1: Retrieving a single line
Our second query retrieves all the lines for a single sales order, and you can see the details of the execution in table 2.
Sql |
SELECT * From SalesOrderDetail WHERE SalesOrderID = 43671 |
Heap |
(one row (s) affected) Table ' Salesorderdetail_noindex '. Scan count 1, logical reads 1495. |
Clustered Index |
(one row (s) affected) Table ' Salesorderdetail_noindex '. Scan count 1, logical reads 3. |
Impact of having the Clustered Index |
IO reduced from 1495 reads to 3 reads. |
Comments |
Same statistics as the previous query. The heap still required a table scan, while the clustered index grouped the one detail rows of the requested order Sufficie Ntly close together So, the IO required to retrieve rows is the same as the IO required to retrieve one row. An upcoming level would explain in detail, no additional reads were required to retrieve the additional rows. |
Table 2: Retrieving all rows for a single SalesOrder
Our third query retrieves all the rows for a single product, and the results are shown in table 3.
Sql |
SELECT * From SalesOrderDetail WHERE ProductID = 755 |
Heap |
(228 row (s) affected) Table ' Salesorderdetail_noindex '. Scan count 1, logical reads 1495. |
Clustered Index |
(228 row (s) affected) Table ' Salesorderdetail_index '. Scan count 1, logical reads 1513. |
Impact of having the Clustered Index |
IO slightly greater for the clustered index version; 1513 reads versus 1495 reads. |
Comments |
Without a nonclustered index on the ProductID column to help find the "Rows for a" single Product, both versions had to be s Canned. Because of the overhead of have a clustered index, the clustered index version is the slightly larger table; Therefore scanning it required a few more reads than scanning the heap. |
Table 3: Retrieving all rows for a single product
The first two queries greatly benefit from the existence of clustered indexes; The third one is roughly equal. Is it sometimes harmful to have a clustered index? The answer is yes, mainly related to inserting, updating, and deleting rows. Like many other aspects of the index encountered in these early stages, this is also a higher-level, more detailed topic.
In general, the retrieval benefit is greater than the maintenance damage, making the clustered index more suitable for the heap. If you are creating a table in an Azure database, there is no choice. Each table must be a clustered index.
Conclusion
A clustered index is an ordered table whose order is specified by you when you create the index and maintained by SQL Server. Any row in the table can be accessed quickly, depending on its key value. In an index key sequence, any set of rows can be accessed quickly through the range of keys.
Each table can have only one clustered index. The decision about which columns should be clustered index key columns is the most important index decision you will make for any table.
In our level four, we will focus on moving from logic to physics, introducing pages and scopes, and examining the physical structure of the index.
Downloadable code
Clustered index: SQL Server index Level 3