T-SQL queries fragments and fill factors in advanced-SQL Server Indexes

Source: Internet
Author: User

Preface: This articleArticleI need you to have a certain understanding of the index and SQL data storage methods. The title of the two words is only because this article requires my T-SQL advanced series of articles some content as the basis.

Introduction

In SQL Server, the minimum unit of data storage is page. Each page can contain 8060 bytes of data. the page is organized by the B-tree structure (the heap structure is used when no clustered index is found on the table, which is not discussed in this article), as shown in:

In clustered Index Tree B, only leaf nodes actually store data, while other root nodes and intermediate nodes are only used to store data for searching leaf nodes.

Each leaf node is one page, and each page is inseparable. the smallest unit for SQL Server to store data to each page is the row of the table ). when a newly inserted row or updated row in a leaf node makes the leaf node unable to accommodate the row currently updated or inserted, the page is generated. fragments are generated during the paging process.

 

 

Understanding external fragments

First, understanding the "external" of external fragments is relative to the page. External fragmentation refers to the fragmentation caused by paging. For example, if I want to insert a row in an existing clustered index, this row will make the existing page space unable to accommodate new rows. This leads to paging:

Because in SQL Server, new pages are generated as data grows, and clustered indexes require continuous rows, in many cases, the pages after pages are not consecutive with the original pages on the disk.

This is the so-called external fragment.

Because paging will move data between pages, If you insert updates and other operations often need to cause paging, Io consumption will be greatly increased, resulting in performance degradation.

For search, external fragments do not affect the performance of a specific search condition, such as a where clause that has a very small limit or returns an unordered result set. However, if you want to return a scanning clustered index and search for consecutive pages, external fragments will have a performance impact.

In SQL Server, the unit larger than the page is area (extent ). a zone can contain 8 pages. zone is the physical unit allocated to the disk. therefore, if a page is split across zones, multiple partition operations are required. More scans are required because continuous data reading cannot be preread, resulting in additional physical reads and disk I/O increase.

 

Understand internal fragments

Like external fragments, the "inner" of internal fragments is also relative to the page. Let's look at an example:

We create a table in which each row is composed of INT (4 bytes), char (999 bytes), and varchar (0 bytes). Therefore, each row contains 1003 bytes, in this case, 8 rows occupy 1003x8 = 8024 bytes of space, and some internal overhead can be accommodated in one page:

When we randomly update the col3 field in a row, the page cannot accommodate new data, resulting in paging:

After pagination:

However, when a page is displayed, if the new page and the current page are physically discontinuous, external fragments will be generated.

Impact of internal and external fragments on query performance

The impact of external fragments on performance has been mentioned above, mainly because more cross-zone scans are required, resulting in more Io operations.

Internal fragmentation may cause data rows to be distributed on more pages, which increases the page tree to be scanned and reduces query performance.

The following example shows that we insert some data into the table to cause internal fragmentation:

By viewing the fragments, we found that the fragments have reached a high level:

By viewing the IO before and after the fragmentation, we can see that Io is greatly reduced:

 

Solution to fragmentation

Basically, all solutions are based on re-indexing and sorting of indexes.

1. Delete and recreate the index

This method is not good. The index is unavailable during the deletion of the index, which may cause blocking. If you delete a clustered index, the corresponding non-clustered index will be re-built twice (Reconstruction upon deletion and reconstruction upon creation ). although this method is not good, it is most effective for sorting indexes.

2. Use the drop_existing statement to recreate the index.

To avoid re-indexing twice, use the drop_existing statement to re-indexing because the statement is atomic and will not cause non-clustered indexes to be re-built twice. However, this method will also cause blocking.

3. As shown in the previous article, use the alter index rebuild statement to recreate the index.

Using this statement also re-creates indexes, but dynamically re-creates indexes without uninstalling and re-indexing. It is better than the first two methods, but it still causes blocking. You can use the online keyword to reduce the lock, but it will prolong the reconstruction time.

4. Use alter index reorganize

This method does not rebuild the index, nor generate new pages. It just sorts out the pages. When a page is locked, it is skipped, so it will not cause blocking. But at the same time, the sorting effect will be worse than the first three.

 

Understanding fill factor

Re-indexing can solve the fragmentation problem, but the re-indexing cost is not only troublesome, but also causes blocking. The index reconstruction cost is not high when the data is relatively small. When the index itself exceeds MB. The time for re-indexing will be a headache.

This is exactly how fill factor works. For the default value, if the fill factor is 0 (0 and 100 indicate a concept), the page can be 100%. Therefore, the page is displayed due to insufficient space during the previous update or insert operation. By setting the fill factor, you can set the page usage level:

Here is an example:

Or the table above. If I insert 31 data records, it takes 4 pages:

By setting the fill factor, the page is set to 5 pages:

At this time, I insert another page, which will not cause pagination:

The above concepts can be explained as follows:

We can see that using the fill factor will reduce the number of pages updated or inserted, but because more pages are required, the corresponding query performance will be lost.

How to set the fill factor value

There is no formula or idea for setting the fill factor value. Although the fill factor can reduce the page size during update or insertion, it also reduces the query performance and occupies more disk space because more pages are required. how to set this value for trade-off depends on the specific situation.

Based on the read/write ratio of the table, I will provide a suitable value here:

1. When the read/write ratio is greater than, do not set the fill factor, 100% fill

2. When the number of writes exceeds the number of reads, set 50%-70% to fill

3. 80%-90% fill when the read/write ratio is between the two

The above data is just my opinion. The specific data must be tested based on the actual situation to find the optimal one.

 

Summary

This article describes how fragments are generated in SQL Server and the concepts of internal fragments and external fragments. As well as solutions to fragmentation and fill factors, in the database, each function that adds performance to a certain aspect will also weaken the performance. It is a required course for DBAs and developers to learn database knowledge and balance the knowledge according to specific situations.

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.