SQL Server is an easy-to-misunderstand problem with the physical ordering of clustered tables

Source: Internet
Author: User

For MS SQL Server databases, there are several issues that are misleading and, even though many SQL Server DBAs are wrong or not fully aware, I would like to write a few articles that explain these easy-to-understand mistakes in depth, I also hope to correct the online misinformation of these problems (I may also have a wrong understanding of the place, welcome criticism).

First question: The physical order of the clustered tables. This is a confusing question, because many textbooks say "clustered indexes are physically storing data in the sort order of a clustered key" when they talk about a clustered index, so it's easy to have the following misconceptions:

Misunderstanding one: "In order to physically store" is the disk itself;

Misunderstanding two: The pages of the clustered table are physically sequential;

Misunderstanding three: The pages of the clustered table are physically sequential, and are sequential.

Here are some of the three ideas that are really misleading.

For misunderstanding one, we have to understand how the data rows are stored on the data page.

I intercepted it from MSDN, and as you can see, at the end of each data page, there is an array of "line offsets" that record the starting position of each data row at the beginning of the page, that is, how many bytes should be offset from the page header at the beginning of each row of data. I will change it and it may be easier to understand correctly:

I adapted this figure to indicate that 96 bytes from the header of the data is the beginning of the first row of data, the offset of 200 bytes is the beginning of the second row of data, and the offset of 300 bytes is the beginning of the third row of data.

Let's take a practical example of how the aggregated table's data might look on a data page.

CREATE TABLE Test (   RowId   int NOT null primary key,   Column1 char (+)) INSERT into Test (rowid,column1) Select 1 , ' Unionselect 2, ' Unionselect 10, '

For the table above, the first time the data is inserted, the three rows of data appear on the data page as follows:

And if I insert the following data into it:

INSERT into Test (ROWID,COLUMN1) Select 7, "

Because the ROWID column of this table has a clustered index (primary key creates a clustered index by default), and the number 7 is about 2, and less than 10, does SQL Server move the data ROWID = 10 down when the insert operation is performed? Obviously SQL Server is not so stupid, but only append ROWID = 7 data data to ROWID = 10 after the row of data, and then modify the line offset array, as follows:

Now we should be able to understand that the physical order of data rows on a clustered Index table data page depends only on the row offset list and does not depend on the physical location on the disk.

For misunderstanding two and misunderstanding three can be put together in the discussion, before the discussion, we need to understand the SQL Server storage engine in the concept of page-to-region.
1. The basic unit of data storage in SQL Server is the page.
2. The district is a collection of eight logically contiguous pages that are used to manage pages efficiently, which also means that all data pages must belong to a certain area.
3. Distinguish between mixed and unified areas. Pages in a mixed area can be assigned to multiple database objects, and pages in a unified area must be assigned to a database object.
4. When SQL Server requests space for a database object, it needs to use the information of the GAM,SGAM,PFS system page, and when the space is allocated, it also maintains the interest of the GAM,SGAM,PFS system page.

Above I briefly listed a few points, more detailed information you can refer to the following link information:
Http://msdn.microsoft.com/zh-cn/library/cc280360 (v=sql.100). aspx

I will then cite specific circumstances to prove that two and three are indeed misunderstandings:
1. When we create a new aggregation table and insert data into the table, the SQL Server storage engine will look for free pages from the mixed area to allocate to the table when it takes up less than 8 data pages. In this phase, it is easy to show that a mixed zone is allocated to multiple database objects (up to 8 database objects) at the same time. So, if our aggregation table needs to request disk space again, it is likely to be allocated to a discontinuous page in the initial mixing zone (relative to the page assigned to the clustered table), or to find the free page from another mixed area, so that the "clustered table page is physically sequential" is not guaranteed anyway. The "page is continuous" is not guaranteed.

2. If a clustered table is over 8 pages, then all subsequent space requests will be allocated a complete unified area, can the subsequent unified zone be sequential or continuous? Of course not, when the action of an application space is issued, the storage engine traverses the first GAM of the data file where the table resides, and if a unified zone has previously been assigned to some other database object, but the current zone has been released, SQL Server This free zone will be allocated to our aggregation tables, which will not guarantee the physical "order and continuity" of the page, and even if SQL Server allocates a clustered table that has never been allocated to any database object, there is no way to guarantee the page's physical "order and continuity".

3. If the aggregation table appears paging, then the "previous page" of the page chain of the new application page will point to the split data page, and the next page will point to the next page that the split page points to before splitting. Obviously, there is no way to ensure that the page is physically "sequential and sequential", and that it is one of the index fragmentation scenarios.

Conclusion: For the physical order problem of the data rows of the clustered index table, the only way to indicate the data page order of the clustered Index table is the page linked list on the data page, in terms of page and page. Because the page link list clearly indicates the previous page of this page, and the next page of the page number is what, and the page number determines the previous page and the next page of the physical location; If you narrow your gaze to a data page, the factor that determines the physical location of the data row is only the row offset array of the footer.

(go) SQL Server easy to misunderstand the problem of aggregation table physical order

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.