Easily misunderstood SQL Server topics: the physical order of clustered tables

Source: Internet
Author: User

Several aspects of Microsoft SQL Server are easy to misunderstand, and even many SQL Server DBAs understand them incorrectly or incompletely. I therefore want to write a few articles that explain these commonly misunderstood topics, and I hope they help correct some of the claims circulating on the Internet (of course, I may have some things wrong myself).

The first topic is the physical order of clustered tables. It causes a lot of confusion because many textbooks, when discussing clustered indexes, say something like "a clustered index stores data physically in the order of the clustering key". That wording easily leads to the following misunderstandings:

Misunderstanding 1: "stored physically in order" means that the rows are laid out in key order on the disk itself;

Misunderstanding 2: the pages of a clustered table are physically ordered;

Misunderstanding 3: the pages of a clustered table are physically ordered and contiguous.

 

The following sections show that all three ideas really are misunderstandings.

For misunderstanding 1, we must understand how data rows are stored on data pages.

The page layout diagram referred to here is taken from MSDN. It shows that each data page ends with a "row offset" array. Each entry in this array records where one data row starts in the page, that is, how many bytes the start of the row is offset from the start of the page. I adapted the diagram to make it easier to read correctly:

In the adapted figure, the first data row starts at offset 96 from the start of the data page (immediately after the 96-byte page header), the second row starts at offset 200, and the third row starts at offset 300.

 

The following practical example shows what the rows of a clustered table can look like on a data page.

-- PRIMARY KEY creates a clustered index on rowid by default
CREATE TABLE Test (rowid INT NOT NULL PRIMARY KEY, column1 CHAR(100))

INSERT INTO Test (rowid, column1)
SELECT 1, ' ' UNION
SELECT 2, ' ' UNION
SELECT 10, ' '

When data is first inserted into this table, the three rows are laid out on the data page as follows:

Suppose I then insert the following row:

INSERT INTO Test (rowid, column1) SELECT 7, ' '

The rowid column of this table has a clustered index (a primary key creates a clustered index by default), and 7 is greater than 2 and less than 10. So when SQL Server performs the insert, does it move the row with rowid = 10 down to make room? Obviously SQL Server is not that stupid. Instead, it simply writes the row with rowid = 7 after the row with rowid = 10 and then updates the row offset array, as shown below:

Now we can see: the order of the data rows on a data page of a clustered table is determined by the row offset array, not by where the rows physically sit on the disk.
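The insert described above can be replayed with a small toy model. This is only a sketch in Python, not SQL Server's actual page format: the class name DataPage, its methods, and the row size of 111 bytes are assumptions made for illustration. Rows are always written into the free space after the last row, and only the row offset array is kept in key order.

```python
from dataclasses import dataclass, field

PAGE_HEADER_SIZE = 96   # bytes; the first row starts right after the page header
ROW_SIZE = 111          # assumed row size in bytes, for illustration only


@dataclass
class DataPage:
    """Toy data page: rows appended in arrival order, offsets kept in key order."""
    free_offset: int = PAGE_HEADER_SIZE          # where the next row will be written
    rows: dict = field(default_factory=dict)     # byte offset -> clustering key
    slots: list = field(default_factory=list)    # row offset array, in key order

    def insert(self, key: int, row_size: int = ROW_SIZE) -> None:
        # The row itself always goes into the free space after the last row.
        offset = self.free_offset
        self.rows[offset] = key
        self.free_offset += row_size
        # Only the offset array is rearranged so that it stays sorted by key.
        pos = 0
        while pos < len(self.slots) and self.rows[self.slots[pos]] < key:
            pos += 1
        self.slots.insert(pos, offset)

    def keys_in_slot_order(self) -> list:
        return [self.rows[offset] for offset in self.slots]

    def keys_in_physical_order(self) -> list:
        return [self.rows[offset] for offset in sorted(self.rows)]


page = DataPage()
for key in (1, 2, 10):
    page.insert(key)
page.insert(7)                         # arrives last, sorts between 2 and 10

print(page.slots)                      # [96, 207, 429, 318]
print(page.keys_in_slot_order())       # [1, 2, 7, 10]
print(page.keys_in_physical_order())   # [1, 2, 10, 7]
```

The row with key 7 sits after the row with key 10 in the page body, yet reading the rows through the offset array still returns them in key order.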

 

Misunderstandings 2 and 3 can be discussed together. Before that, we need to understand how the SQL Server storage engine allocates and tracks pages.
1. The basic unit of data storage in SQL Server is the page.
2. An extent is a set of eight physically contiguous pages and is the unit used to manage pages efficiently. It follows that every data page belongs to some extent.
3. Extents are either mixed or uniform. The pages of a mixed extent can be allocated to several database objects; the pages of a uniform extent all belong to a single database object.
4. When space is requested for a database object, SQL Server consults the GAM, SGAM, and PFS system pages. After the space is allocated, it also updates the information on those GAM, SGAM, and PFS pages.

I have only listed a few key points above. For more details, see the following link:
http://msdn.microsoft.com/zh-cn/library/cc280360(v=sql.100).aspx
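The relationship between pages and extents in points 1 and 2 comes down to integer arithmetic. The following Python sketch assumes the standard 8 KB page size and that extents begin on page numbers divisible by eight. The function names are made up for illustration and are not SQL Server APIs.

```python
PAGE_SIZE = 8192       # bytes per page (8 KB)
PAGES_PER_EXTENT = 8   # an extent is eight contiguous pages (64 KB)


def extent_of(page_id: int) -> int:
    """Return the number of the extent that contains the given page."""
    return page_id // PAGES_PER_EXTENT


def pages_in_extent(extent_no: int) -> list:
    """Return the eight page numbers that make up the given extent."""
    first = extent_no * PAGES_PER_EXTENT
    return list(range(first, first + PAGES_PER_EXTENT))


def file_offset(page_id: int) -> int:
    """Return the byte offset of the page from the start of the data file."""
    return page_id * PAGE_SIZE


print(extent_of(153))         # 19
print(pages_in_extent(19))    # [152, 153, 154, 155, 156, 157, 158, 159]
print(file_offset(153))       # 1253376
```

Because a page number maps directly to a position in the file, comparing page numbers is enough to tell whether two pages are physically adjacent.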

Next I will go through specific situations to show that misunderstandings 2 and 3 really are misunderstandings:
1. When we create a new clustered table and insert data into it, as long as the table occupies fewer than eight data pages, the SQL Server storage engine allocates free pages to it from mixed extents. In this phase a mixed extent can easily be shared by several database objects at the same time (at most eight). When our clustered table requests space again, it may well receive a page in the same mixed extent that is not adjacent to the pages it already owns, or a free page from a different mixed extent. So there is no guarantee that the pages of the clustered table are physically ordered, and no guarantee that they are contiguous.

2. Once the clustered table has filled eight pages, every subsequent space request is satisfied with a whole uniform extent. Are these uniform extents guaranteed to be ordered or contiguous? Of course not. When a space request is triggered, the storage engine scans the GAM pages of the data file that holds the table, starting from the first GAM, looking for a free uniform extent. If an extent was previously allocated to another database object and has since been released, SQL Server may hand that free extent to our clustered table, which obviously cannot keep the pages physically ordered and contiguous. Moreover, even if every extent SQL Server allocates to our clustered table has never been allocated to any other database object, there is still no guarantee that the pages are physically ordered and contiguous.

3. If a page split occurs in the clustered table, the "previous page" pointer of the newly allocated page points to the page that was split, and its "next page" pointer points to the page that followed the split page before the split. Clearly this cannot keep the pages physically ordered and contiguous either; it is one of the forms of index fragmentation.
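Situation 2 can be illustrated with a toy allocation bitmap. This is a simplified sketch in Python that assumes a plain first-free scan over one flag per extent; the class ToyGam and the scenario are hypothetical and leave out mixed extents, SGAM, and PFS entirely.

```python
PAGES_PER_EXTENT = 8


class ToyGam:
    """Toy allocation bitmap: one flag per extent, True means the extent is free."""

    def __init__(self, extent_count: int) -> None:
        self.free = [True] * extent_count

    def allocate(self) -> int:
        """Hand out the first free extent, scanning from the start of the file."""
        for extent_no, is_free in enumerate(self.free):
            if is_free:
                self.free[extent_no] = False
                return extent_no
        raise RuntimeError("no free extent left in this toy file")

    def release(self, extent_no: int) -> None:
        """Mark an extent as free again, for example after its object is dropped."""
        self.free[extent_no] = True


gam = ToyGam(extent_count=8)
our_extents = []

other_object = gam.allocate()        # extent 0 goes to some other object
our_extents.append(gam.allocate())   # extent 1 goes to our clustered table
gam.allocate()                       # extent 2 goes to yet another object
our_extents.append(gam.allocate())   # extent 3 goes to our clustered table
gam.release(other_object)            # the other object is dropped, extent 0 is free
our_extents.append(gam.allocate())   # our table now receives the released extent 0

print(our_extents)                                    # [1, 3, 0]
print([e * PAGES_PER_EXTENT for e in our_extents])    # first page of each: [8, 24, 0]
```

The extents allocated to the table are neither in ascending order nor adjacent, even though each allocation on its own was perfectly reasonable.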

 

Conclusion: regarding the physical order of data rows in a clustered table, at the level of pages the only thing that defines the order of the data pages is the page linked list stored on the data pages themselves. The linked list names the previous page and the next page explicitly, and those page numbers determine where the previous and next pages are physically located. If you narrow your view to a single data page, only the row offset array at the end of the page determines where each data row is physically located.
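The role of the page linked list, including the page split from situation 3, can be sketched with another toy model in Python. The class ToyLeafLevel, the page numbers 100, 101, and 205, and the rule of moving the upper half of the keys to the new page are all assumptions for illustration, not a description of SQL Server internals.

```python
class ToyLeafLevel:
    """Toy leaf level of a clustered index: pages chained by prev/next page numbers."""

    def __init__(self) -> None:
        self.pages = {}    # page_id -> {"prev": id or None, "next": id or None, "keys": [...]}
        self.head = None   # first page in key order
        self.tail = None   # last page in key order

    def append_page(self, page_id: int, keys: list) -> None:
        """Link a new page at the end of the chain."""
        self.pages[page_id] = {"prev": self.tail, "next": None, "keys": list(keys)}
        if self.tail is None:
            self.head = page_id
        else:
            self.pages[self.tail]["next"] = page_id
        self.tail = page_id

    def split(self, page_id: int, new_page_id: int) -> None:
        """Move the upper half of a page's keys to a new page linked right after it."""
        old = self.pages[page_id]
        half = len(old["keys"]) // 2
        self.pages[new_page_id] = {
            "prev": page_id,        # new page points back to the page that was split
            "next": old["next"],    # and forward to the old successor
            "keys": old["keys"][half:],
        }
        old["keys"] = old["keys"][:half]
        if old["next"] is None:
            self.tail = new_page_id
        else:
            self.pages[old["next"]]["prev"] = new_page_id
        old["next"] = new_page_id

    def logical_order(self) -> list:
        """Follow the next pointers from the head page."""
        order, page_id = [], self.head
        while page_id is not None:
            order.append(page_id)
            page_id = self.pages[page_id]["next"]
        return order

    def keys_in_logical_order(self) -> list:
        return [k for pid in self.logical_order() for k in self.pages[pid]["keys"]]


leaf = ToyLeafLevel()
leaf.append_page(100, [1, 2, 3, 4])
leaf.append_page(101, [9, 10, 11, 12])
leaf.split(100, new_page_id=205)       # the newly allocated page happens to be 205

print(leaf.logical_order())            # [100, 205, 101]
print(sorted(leaf.pages))              # physical order: [100, 101, 205]
print(leaf.keys_in_logical_order())    # [1, 2, 3, 4, 9, 10, 11, 12]
```

Following the linked list returns the keys in order even though the page numbers along the way go up and then down again.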

 
