Related to the problem of index fragmentation, we should have heard a lot, perhaps also a lot of friends have done with the relevant work. Let's take a look at the problem today.
In order to better illustrate this problem, we first to popularize some background knowledge.
Knowledge popularization
As we all know, each table in the database is either a heap table or a table containing a clustered index, or we call it an ordered table. If the table is a heap table, then when querying data using a nonclustered index, a bookmark is used to find the data in the underlying data table to retrieve the information needed, and the bookmark lookup will locate each data row of the underlying data table through the row identification (RID) contained in each index. If there is a clustered index on the table, the clustered index key is used to locate the underlying data row when the nonclustered index is used to find other required data.
We also know that indexes are made up of index pages, and each entry in the index is contained in a page. Each of the 8 pages consists of one block.
The index level is bottom-up, is a tree structure, the bottom is the No. 0 layer, is also a leaf node. The root node in the index is at the top of the entire index.
If you want to scan an entire index, it means that you must read each page in the page node (either a data page or an index page). Where each page contains a pointer to the page that precedes it and one that points back to it. Before, we also mentioned: if you look at a certain layer of nodes, is actually a doubly linked list. Or the last picture, let's feel it.
We should know: the logical order in which the page (whether it is a data page, an index page, or another type of page) is in the logic sequence and its physics is not necessarily the same, it is said that the pointer in page a points to its next page B, it is said that A and B pages are together logically, but they may not be the same in physics, Even page B and page A are physically separated by hundreds of pages.
If the pages that are connected on the logic are closer to the physical storage level, the I/O cost of reading these pages will be lower because of the delay in generating the disk's head movement. Conversely, if their physical storage order is consistent with the logical order, SQL Server can read it one time because it reads a block (8 pages) at a time.
Well, when we get to the basics, we'll see what is fragmentation.
What is index fragmentation
Index fragmentation can be divided into two categories: internal index fragmentation and external index fragmentation. Let's take a look at the differences and how to check this before.
Internal index Fragmentation
Each index page contains some indexed entries (a row of data rows that resemble a data page), as we've said before. However, many times, not every page contains the maximum number of bars. For example, a page size of 8k, that is 4096 bytes, except for some headers, footers, etc., and 8,000 more bytes left, if each index entry of the big trivial 100 bytes, then the index page can contain the maximum of 80 entries, but in many cases, but does not contain so much.
Also say that, a lot of times, the index page is not fully filled, or this is a problem, perhaps so we deliberately, we will refer to later. When we talk about index fragmentation, we often mean that these index pages are not completely filled. Or more clearly: we originally wanted the page to be filled, but as the data additions and deletions, so that the data in the index is not filled, the result is as follows:
Index_page_after_delete.png(26.79 K)
9/9/2012 11:19:53 AM
The picture is not very clear, we can just have a little bit of it.
We can use
Sys.dm_db_index_physical_stats to view the relevant internal fragmentation, execute the query as follows:
- SELECT ix.name as ' name '
- , ps.index_level as ' level '
- , Ps.page_count as ' Pages '
- , ps.avg_page_space_used_in_percent as ' page fullness (%) '
- From Sys.dm_db_index_physical_stats (
- DB_ID (),
- OBJECT_ID (' Sales.SalesOrderDetail '),
- Default, default, ' detailed ') PS
- JOIN sys.indexes IX
- On IX. object_id = PS. object_id and ix.index_id = ps.index_id
Copy Code
Execution results
20120904184258.png(56.68 K)
9/9/2012 11:19:53 AM
We can see the fill of the page for each index.
Next, let's talk about external index fragmentation.
External index Fragmentation
Understanding the above problem, this external index fragment is good to understand, the simplest is that the index page in the logical order and the physical order of the inconsistent. Let's take a look at a comparison of the figures.
9865.jpg(95.72 K)
9/9/2012 11:19:53 AM
In, an index contains 16 pages. However, the 16 pages are not contained in 2 contiguous blocks, but are distributed in different places because some of their previous blocks are occupied by other objects. This leads to 16 pages that are not contiguous in physics, which is fragmentation. When read, additional I/O is consumed.
As before, we can use
Sys.dm_db_index_physical_stats to see the external fragmentation situation. But the value of the parameter here may change: Before the last parameter value in Sys.dm_db_index_physical_stats is ' detailed ', here our value is limited or default. Because external fragments are concerned with continuity issues before the index page, and do not pay attention to the data on each page, this is only a partial scan, and there is no need for all the scans. You can refer to MSDN for further understanding of the meaning of these parameters.
The query is as follows:
- SELECT ix.name as ' name '
- , ps.index_level as ' level '
- , Ps.page_count as ' Pages '
- , ps.avg_fragmentation_in_percent as ' External fragmentation (%) '
- , ps.fragment_count as ' fragments '
- , ps.avg_fragment_size_in_pages as ' avg Fragment size '
- From Sys.dm_db_index_physical_stats (
- DB_ID (),
- OBJECT_ID (' Sales.SalesOrderDetail '),
- Default, default, ' LIMITED ') PS
- JOIN sys.indexes IX
- On IX. object_id = PS. object_id and ix.index_id = ps.index_id
Copy Code
The results are as follows:
20120904184512.png(30.11 K)
9/9/2012 11:19:53 AM
In addition to using scripting, we can also view in SQL Server Manager, right-click on an index, and the properties are as follows:
Here to explain, because the original English version in the understanding of the above may be difficult, in order to make people better understand the original text, we here deliberately added some other content, help friends to make a transition.
Because index fragmentation analysis involves some knowledge of page splitting, page splitting occurs when the data on a page fills up without extra space for the new data, and adding new data to a page that already fills the data may lead to another operation, so let's talk about a , so that everyone better understand.
As we've mentioned before, SQL Server puts any information in the database on a 8KB-based page (no matter what type of page we have here, we don't consider a large object's data page) above. If the sum of the size of the record (whether it is the underlying data row record or an entry in the index, etc.) adds up to less than 8KB, SQL Server may store multiple records on a single page. If it is larger than 8KB, then it is definitely necessary to save more pages for the record, at which point SQL Server must change the records on each page. SQL Server implements this change primarily based on two methods: Record forwarding and page splitting.
Note: Record-we have a general term for the data, such as the data page above each piece of data is a record, the index page above an entry is a record.
Record forwarding
When the size of a record exceeds the capacity of a page, the first way to store records is "record forwarding."
This method is only used when the underlying data table is a heap . If the data record of a row is modified so that the data page in which it is located cannot hold all of the data for its modified row, SQL Server will move the record to a new data page and add two pointers at a time. The first pointer will indicate the new position of the data row, which is usually referred to as the "record forward Pointer", and the second pointer will be placed on the new data page, pointing to the original data page of the record, which is referred to as the "callback pointer". A friend who is familiar with data structure can actually think of this process as adding a node to a linked list.
To make it easier for everyone to understand the above, let's look at an example. In the example, we'll take a look at how the record forwarding process is going. Such as:
20120906203332.png(30.64 K)
9/9/2012 11:20:20 AM
Assume that the page in the diagram is numbered 100 and that the page is in a heap table. This page contains 4 data, and each data size is about 2K, plus 8KB. If the second data is updated so that its data size becomes 2.5KB, then the data page will not be able to store all the data at this time, SQL Server then assigns a new page, assuming the number is 101. Then the second piece of data is moved to the newly allocated data page, and a pointer to the new position of the second data is added to the previous page (number 100). So the place where the second record was originally placed is the pointer.
In addition, in the new page 101, there is also a pointer back to page 100. It is not drawn in the picture.
The problem with record forwarding is that it allows a single piece of data to exist in two locations in a single table: one position holds the pointer, and one position holds the real data. As the record continues to grow, more additional disk space is added, especially when the data is read with additional I/O operations, because there may be situations where some records are constantly modified so that they do not fit on the current page, so they are placed on a new page, the first record is forwarded, and then modified, Then repeat the second record forwarding .... For example:
20120906203429.png(31.27 K)
9/9/2012 11:20:20 AM
You should be able to realize that the original data A has been forwarded several times, and on the other page is just the next page of the forwarding process is the location, so that in order to find a data, it is necessary to go through a number of pointers to find, until the end.
Page splits
For page splitting, I believe that many friends listen to more than one word. Now, let's take a look at this topic. Page splits occur in tables that contain indexes, either with clustered indexes or nonclustered indexes. At the same time, page splits occur not only on the data page, but also on the index page.
The process of page splitting is basically this: if the size of a record is updated (or incremented) so that the original page does not fit the size of the data, and SQL Server cannot write the changed data, then it will move half the records above the original page to the new page. After that, SQL Server tries to write the data again, and if it does not, it will be paged again until the data can be written.
We are still using an example to explain the problem. We mainly talk about it through an updated operation. Still see the following figure:
20120906203529.png(31.46 K)
9/9/2012 11:20:20 AM
On page 100 There are 4 records, each of the size of about 2KB, this time just to fill a page. If the second piece of data is modified so that its size becomes 2.5KB, then the page is split. Then the original 4 data will be divided into 2 parts on different pages, and SQL Server will put a pointer to the new page on the previous page 100, and then SQL Server to update the second record.
Well, after we've done both of these things, let's look at how they affect the fragmentation of the index.
In fact, when it comes to fragmentation, as long as it happens on a page split operation, especially if the index's B-tree structure occurs when the page is split.
Below, we will refine this process.
If, at this point, the table already has an index, if one row of data is added to the data table, then the row of data must be reflected in the index structure (unless a filtered index is used), thus making the index structure begin to adjust.
If this entry added to the index structure can be added to an index page, in other words, the free space in the index page can accommodate the size of the new index entry, which is the end of the process.
If there is not enough space, then, it is necessary to assign a new page, at this time is not sure whether the new page and the old page in the physical space above the continuous, then this generates external index fragments, and the original page of the index records distributed on two pages, so that the two pages have more free space than before, This increases the internal index fragmentation.
However, the internal fragmentation may decrease as the index record grows and its idle padding is increased. But the outer fragments only disappear when we maintain the index.
In fact, as you can see, not only index fragmentation, but also the fragmentation of the underlying data pages can be used the same analysis method.
SQL Server Introduction Stage 11th: Index fragmentation Analysis and resolution