Index, I believe everyone knows that it is adding a directory to the data in the table so that we can quickly retrieve the data we want, but what is this directory? How is SQL Server managed? To understand this, we need to understand the concept of the page, the area (extents) in the SQL Server database, and, if there is something wrong, welcome the shot.
Pages (page)
- A page is the most unit of data storage, and SQL Server read-write is also done in pages, where the page is the smallest unit of data management. The size of the page 8192b=8kb, 1MB stores 128 data pages.
- A page specific as shown (picture reference MSDN), contains 3 parts:
- Header information page hearder, occupy 96byte, store information about the page, page number, page type, etc.;
- Data line DataRows, the content of the specific storage, occupy 8060byte;
- Line Offset table row offset tables records the distance between the first byte of each data row and the top of the page (that is, 96byte), in the order opposite to the row in the page, accounting for 36byte;
Type of page This article covers primarily the following, other page types can query MSDN on their own
- Data: When the text in a line is set to ON, it has rows with all data except text, ntext, image, nvarchar (max), varchar (max), varbinary (max), and XML data.
- Index: Indexed entry.
- Text/image: Large Object Data type: (text, ntext, Image, nvarchar (max), varchar (max), varbinary (max), and XML data), data rows exceeding 8 KB Variable-length data type columns: (varchar, nvarchar, varbinary, and sql_variant)
Seeing the index page type above, we understand that indexes and data rows are also stored in the page, and SQL Server divides different page types for different data types, Big data types (Text,image,varchar (max), nvarchar (max) is stored in the Text/image page type, which also stores a variable-length type when a single row of data exceeds 8KB, that is, only one row in a page, and the data row occupies a space greater than 8060 (that is, 8192-96-36) byte, The value of all columns in the actual table cannot be greater than 8053byte (which is why the difference between 7byte can refer to this link), in which case SQL Server dynamically moves one or more columns to the page in the Row_overflow_data allocation unit, starting with a variable length column of maximum length. and maintains a pointer in the original page, and if subsequent rows of data become smaller, SQL Server will move the columns back.
Area (extents)
The zone is a contiguous 8 pages on disk, 8*8kb=64kb.
- Mixed-area Mixed extents: Each page may correspond to different objects;
- Unified Panel Uniform Extents: 8 pages in a zone corresponding to one object
Indexing (Index)
By the above we already know that the index is also stored on the page, the table with the clustered index (Clustered index) is b-tree (not B minus tree ...). The structure is organized and the data is stored in a clustered index order, otherwise it is stored on the heap (heap). Each page on the B-tree is called an index node, which contains:
- Root node: The top node that contains the index page that holds the index row;
- Intermediate level: between the root and leaf nodes, the Roots node contains index pages that have index rows;
Leaf node: the underlying node that contains the data page of the underlying table;
Index row: Contains a key value and a pointer to a data row in an intermediate-level page or leaf-level index on the B-tree. The pages in each level index are linked in a two-way link list.
Nonclustered indexes (nonclustered index): Have the same B-tree as the clustered index, with the following differences:
- Data rows are not sorted and stored by non-clustered keys;
A leaf-level node is not referring to a data page but an index page;
Index row: Contains a nonclustered key and a row locator that points to a clustered index or a data row in the heap that contains the key value.
Table Organization Chart (image reference MSDN), the bottom line is three allocation units, detailed directly to the MSDN connection to view.
Some thoughts:
Through the above content, we can more intuitive, more easily summed up:
- You should try to set the index on a narrower column so that more index rows can be stored on one page, and the efficiency of the retrieval is higher;
- It is not appropriate to set the index on a frequently changing column because it will result in a series of modifications to the data rows and index pages;
- Establishing multiple indexes can provide the query performance of the table, but also increase the complexity of the modified data;
- The index should overwrite the query as much as possible because the data is all stored in the index, thereby reducing disk I/O
...
Recently spent some time to tidy up the page, area, index of relevant knowledge points, and read some blog, feel before a lot of things do not understand the enlightened, in fact, there are a lot of relevant content, here is just a part of the introduction, in addition we still want to read more official documents (MSDN recommended to read English, The Chinese translation of a lot of places you are confused, see the English version can read it and Daniel's blog.
Reference links
- Msdn:https://docs.microsoft.com/zh-cn/sql/relational-databases/pages-and-extents-architecture-guide
- Http://www.cnblogs.com/woodytu/p/4486193.html
SQL Server indexes and pages and extents