An index is an on-disk structure associated with a table or view, which speeds up the retrieval of rows from a table or view. An index contains keys that are generated by one or more columns in a table or view. These keys are stored in a structure (B-tree), allowing SQL Server to quickly and efficiently find the rows associated with the key value.
A table or view can contain the following types of indexes:
Gathered
The clustered index sorts and stores the data rows in a table or view based on the key values of the data rows. The index definition contains a clustered index column. Each table can have only one clustered index, because the data rows themselves can only be sorted in one order.
Data rows in a table are stored in sorted order only when the table contains a clustered index. If the table has a clustered index, the table is called a clustered table. If a table does not have a clustered index, its data rows are stored in an unordered structure called a heap.
Non-clustered
Nonclustered indexes have a structure that is independent of the data rows. Nonclustered indexes contain nonclustered index key values, and each key-value entry has a pointer to the data row that contains the key value.
A pointer to a data row from an index row in a nonclustered index is called a row locator. The structure of a row locator depends on whether the data page is stored in a heap or in a clustered table. For heaps, a row locator is a pointer to a row. For clustered tables, the row locator is the clustered index key.
You can add nonkey columns to the leaf level of a nonclustered index to skip existing index key restrictions (900-byte and 16-key columns) and perform full-scope index queries.
Both clustered and nonclustered indexes can be unique. This means that no two rows can have the same index key value. In addition, the index may not be unique, that is, multiple rows can share the same key value. The index of the table or view is maintained automatically whenever the table data is modified.
Each page in the index B-tree is called an index node. The top node of the B-tree is called the root node. The underlying node in the index is called a leaf node. Any index level between the root node and the leaf node is collectively referred to as the intermediate level. In a clustered index, the leaf node contains the data page of the underlying table. The root node and the intermediate level node contain index pages that have index rows. Each index row contains a key value and a pointer to a data row in an intermediate-level page or leaf-level index on the B-tree. The pages in each level index are linked in a two-way link list.
The clustered index has a row in sys.partitions, where the index uses index_id = 1 for each partition. By default, a clustered index has a single partition. When a clustered index has more than one partition, each partition has a B-tree structure that contains data related to that particular partition. For example, if the clustered index has four partitions, there are four B-tree structures, each with a B-tree structure.
Depending on the data type in the clustered index, each clustered index structure will have one or more allocation units where the relevant data for the specific partition will be stored and managed. There is at least one In_row_data allocation unit in each partition of each clustered index. If a clustered index contains a large object (LOB) column, there is also a Lob_data allocation unit in each of its partitions. If a clustered index contains a variable length column that exceeds the row size limit of 8,060 bytes, there is also a Row_overflow_data allocation unit in each of its partitions. The pages and rows within the data chain are sorted by the clustered index key value. All insert operations are performed when the key value in the inserted row matches the sort order in the existing row. The B-Tree page collection is positioned by the page pointer in the sys.system_internals_allocation_units system view.
For a clustered index, the root_page column insys.system_internals_allocation_units points to the top of a particular partition for that clustered index. SQL Server moves down in the index to find the row that corresponds to a clustered index key. To find the scope of a key, SQL Server moves through the index to find the starting key value for the range, and then scans the data page with a forward or backward pointer. To find the first page of a data page chain, SQL Server scans from the root node of the index along the leftmost pointer.
A clustered index is explicitly structured in a single partition.
Nonclustered indexes have the same B-tree structure as clustered indexes, and the significant difference between them is the following two points:
You can either use a clustered index to define a nonclustered index for a table or view, or you can define a nonclustered index based on the heap. Each index row in a nonclustered index contains a nonclustered key value and a row locator. This locator points to a clustered index or a data row in the heap that contains the key value.
A row locator in a nonclustered index row, or a pointer to a row, or a clustered index key for a row, as described below:
If the table is a heap (meaning that the table does not have a clustered index), the row locator is a pointer to the row. The pointer is generated by the file identifier (ID), the page number, and the row count on the page. The entire pointer is called the row ID (RID).
If the table has a clustered index or a clustered index on the indexed view, the row locator is the clustered index key for the row. If the clustered index is not a unique index, SQL Server adds internally generated values (called unique values ) so that all duplicate keys are unique. This four-byte value is not visible to the user. This value is added only if you need to make the clustered key unique for use in a nonclustered index. SQL Server retrieves a data row by using the clustered index key stored in a leaf row of a nonclustered index to search for a clustered index.
For each partition used by the index, the nonclustered index has a corresponding row in the sys.partitions of index_id >0. By default, a nonclustered index has a single partition. If a nonclustered index has more than one partition, each partition has a B-tree structure that contains the index rows for that particular partition. For example, if a nonclustered index has four partitions, then there are four B-tree structures, one per partition.
Depending on the data type in the nonclustered index, each nonclustered index structure has one or more allocation units in which to store and manage data for a particular partition. Each nonclustered index has at least one In_row_data allocation unit (storage index B-tree page) for each partition. If the nonclustered index contains large object (LOB) columns, there is also a Lob_data allocation unit for each partition. In addition, if a nonclustered index contains a variable-length column that exceeds the 8,060-byte row size limit, there is also a Row_overflow_data allocation unit for each partition. The page collection of the B-tree is positioned by the root_page pointer in the sys.system_internals_allocation_units system view.
Describes the nonclustered index structure in a single partition.
Description of clustered and nonclustered indexes for SQL Server indexes