Microsoft SQL Server 2000 index defragmentation best practices (Part 2)


DSS workload

During the test, the DSS workload consisted of 22 report-style queries made up of complex SELECT statements, run serially against the server as a batch. All of the queries contain one or more multi-table joins, and most require scanning a broad range of the indexes.

Table 2 records the average fragmentation and page density of the indexes used in the test. The fragmentation levels were produced by a combination of the following operations (sketched in code after the list):


  • Inserting new data into the database with BULK INSERT and periodically refreshing the data.
  • Deleting data within a certain range.
  • Performing some updates based on key values. Although these affect the fragmentation level the least, the number of rows involved in the updates was small compared with the inserts and deletes.
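As a rough sketch of the kind of statements behind these operations (the table, file path, and key values below are invented for illustration, not taken from the test):

-- Bulk insert new data in batches
BULK INSERT Sales.dbo.Orders
FROM 'd:\feeds\orders.dat'
WITH (BATCHSIZE = 10000)

-- Delete data within a certain range
DELETE FROM Sales.dbo.Orders
WHERE OrderDate < '19990101'

-- Update a comparatively small number of rows by key value
UPDATE Sales.dbo.Orders
SET Status = 5
WHERE OrderID = 1004231
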
Table 2: Average logical fragmentation and page density in the small-scale and large-scale environments

Fragmentation level | Average logical fragmentation (%) | Average page density (%)
Small-scale environment
Low (1) | 7.8 | 80.1
Medium (2) | 16.6 | 68.1
High (3) | 29.5 | 69.2
Large-scale environment
Low (1) | 5.9 | 84.4
Medium (2) | 13.8 | 70.3

The test results for the DSS workload differ greatly from those for OLTP: defragmenting the indexes improved performance significantly. Because the performance of this workload depends heavily on disk throughput (most queries involve index scans), this result is predictable. Figures 2 and 3 below show the performance of the DSS workload before and after defragmenting; as they show, defragmenting produced significant performance gains.

In the small-scale environment, performance improved by 60% at the low fragmentation level and by 460% at the high fragmentation level. In the large-scale environment, performance improved by 13% at the low fragmentation level and by 40% at the medium level. These results show that fragmentation had a relatively smaller effect on performance in the large-scale environment, because that environment benefited more from the performance of its disk subsystem. For details, see the section "Understanding the impact of fragmentation and the SQL Server read-ahead manager" later in this article.

Figure 2: Running time of the entire DSS workload in the small-scale environment at different fragmentation levels. Lower values indicate better performance.


Figure 3: Running time of the entire DSS workload in the large-scale environment at different fragmentation levels. Lower values indicate better performance.


Figure 2 shows that in the small-scale environment the results for DBCC INDEXDEFRAG were slightly better than those for DBCC DBREINDEX. In most cases, however, fully rebuilding the index should deliver better performance.

Note the following points when interpreting these test results:


  • The results shown in Figure 2 and Figure 3 represent the "best case" for DBCC INDEXDEFRAG. During the tests, DBCC INDEXDEFRAG was run on a quiescent system, so it was able to remove fragmentation completely. When DBCC INDEXDEFRAG runs on a live system, that is, while data is being updated, it skips locked pages, and it therefore may not remove all of the fragmentation. To measure how effective DBCC INDEXDEFRAG was, run DBCC SHOWCONTIG immediately after it (a sketch follows this list).
  • Data placement affects disk performance. In the small-scale environment, the data was spread across only two physical disks (total capacity 33.5 GB). The disks were empty before the database was created, and the data files afterward occupied between 22 GB and 30 GB, so when the database was created the data was laid out toward the outer edge of the disks. DBCC INDEXDEFRAG also defragments in place, staying close to the original data. DBCC DBREINDEX, by contrast, rebuilds the index completely: it must allocate space for the new index pages before releasing the old ones, and that new space placed the data farther from its original position, toward the inner part of the disks, causing a slight drop in I/O throughput. In the small-scale benchmark this appeared as a 15% reduction in read throughput.
  • The amount of free space also affects DBCC DBREINDEX. Without a large amount of contiguous free space, DBREINDEX is forced to use scattered free space within the data files, which leaves a small amount of logical fragmentation in the rebuilt index. For more about DBCC DBREINDEX's free space requirements, see the "DBCC DBREINDEX" section later in this article.
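
A minimal sketch of the check described in the first bullet above; the database, table, and index names are placeholders:

DBCC INDEXDEFRAG (Sales, Orders, IX_Orders_OrderDate)
GO
-- The Logical Scan Fragmentation and Avg. Page Density values in this
-- report show how completely INDEXDEFRAG was able to do its work
DBCC SHOWCONTIG (Orders, IX_Orders_OrderDate)
GO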


Determining the I/O volume of a query

Because queries that perform large amounts of I/O benefit the most from defragmenting, it is worth discussing how to determine the I/O incurred by a particular query. The SET STATISTICS IO command reports the amount and type of reads performed on the server instance for a given query. It can be turned ON and OFF in Query Analyzer as follows:

SET STATISTICS IO ON
GO
SELECT * FROM table_1
GO
SET STATISTICS IO OFF
GO


Output example

Table 'table_1'. Scan count 1, logical reads 12025, physical reads 0, read-ahead reads 11421.

Table 3: Description of SET STATISTICS IO output

Value | Description
Scan count | Number of scans performed
Logical reads | Number of pages read from the data cache
Physical reads | Number of pages read from disk
Read-ahead reads | Number of pages placed into the cache for the query

Using the physical reads and read-ahead reads values, you can estimate the amount of disk I/O a particular query incurs. Both physical reads and read-ahead reads indicate the number of pages read from disk; in most cases, the read-ahead reads value will be larger than the physical reads value.
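
For example, using the sample output above: (0 physical reads + 11421 read-ahead reads) * 8 KB per page is roughly 89 MB read from disk for this query.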

Note:
When you capture this information through SQL Profiler, the Reads column reports logical reads, not physical reads.

In addition to putting the pages back in order, defragmenting also reduces the I/O a query performs by raising the page density of the index leaf pages: as page density increases, fewer pages must be read to satisfy the same query, which improves performance.
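
As an illustrative calculation (the numbers are not from the tests): the number of leaf pages is roughly inversely proportional to page density, so raising the average page density from 70% to 90% shrinks the leaf level, and with it the reads needed to scan it, by about 22% (70/90 ≈ 0.78).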

Understanding the impact of fragmentation and the SQL Server read-ahead manager

Fragmentation adversely affects operations that read large amounts of data from disk. You can measure the impact with Windows Performance Monitor, which lets you observe disk activity and helps determine when to defragment.

To understand why fragmentation has such an impact on DSS-type workloads, it is important to understand how fragmentation affects the SQL Server read-ahead manager. To complete queries that scan one or more indexes, the read-ahead manager scans ahead through the index pages and places the upcoming data pages into the SQL Server data cache. The read-ahead manager dynamically sizes its reads based on the physical ordering of the underlying pages. When fragmentation is low, it can read large blocks at a time, making more efficient use of the I/O subsystem; as the data becomes fragmented, it must issue smaller reads. Although the amount of data read ahead is unrelated to the physical order of the pages, many small read requests cost more CPU cycles per byte transferred, so overall disk throughput drops.

In all cases the read-ahead manager improves performance; however, when fragmentation prevents it from reading large blocks, overall disk throughput decreases. You can detect this by examining the physical-disk counters in Performance Monitor. The following table lists and describes those counters.

Table 4: Physical disk counters in Performance Monitor

Physical disk counter | Description
Avg. Disk sec/Read | Measures disk latency. In the tests, disk latency increased at high levels of fragmentation (greater than or equal to 30%).
Disk Read Bytes/sec | A good measure of overall disk throughput. A sustained drop during a given workload can indicate that fragmentation is affecting performance.
Avg. Disk Bytes/Read | Measures the amount of data read per read request. When index pages are contiguous, the SQL Server read-ahead manager can read large blocks at a time, using the I/O subsystem more efficiently. The tests show a direct correlation between this value and the level of fragmentation: as fragmentation rises, this value falls, and overall disk throughput drops with it.
Avg. Disk Read Queue Length | Sustained averages above about 2 per physical disk generally indicate a bottleneck. In the tests, this counter tended to rise as fragmentation increased, because of the higher latency and lower overall disk throughput.

Figures 4 through 7 show the disk throughput and average read size reported by Performance Monitor during the DSS workload.

Figure 4: Disk throughput of the DSS workload in the small-scale environment. Higher values indicate better disk throughput.


Figure 5: Disk throughput of the DSS workload in the large-scale environment. Higher values indicate better disk throughput.


Figure 6: Average size of each disk read during the DSS workload in the small-scale environment. Higher values mean more bytes read per read.


Figure 7: Average size of each disk read during the DSS workload in the large-scale environment. Higher values mean more bytes read per read.


The figures above show the trend of fragmentation's impact on disk performance. Although DBCC DBREINDEX and DBCC INDEXDEFRAG produce somewhat different results, note that the results are consistent across all the systems: the average size of each read and the overall disk throughput decrease as fragmentation increases. As you can see, fully defragmenting the indexes greatly improves disk throughput.

The average size of each read also demonstrates how fragmentation limits the read-ahead manager's ability to read large blocks. Keep in mind, however, that a large average read size does not always mean high overall throughput; what it indicates is a lower CPU cost while the data is transferred. When an index had no fragmentation, reads of 64 KB performed almost as well as larger reads, and this held especially true on large systems with data spread across many disks. These general conclusions will vary from system to system for many reasons (for example, differences in the I/O subsystem, in data types, and in how the data is distributed across the disks). When monitoring a system, look for sustained declines in disk throughput and in read size over time. Those periods, together with the information provided by the DBCC SHOWCONTIG command, can be used to determine when indexes should be defragmented.

The test results also show that fragmentation increases disk latency; however, only the highest level of fragmentation had a severely negative impact on latency, and only in the small-scale environment. In the large-scale environment, the SAN provided such high I/O performance that disk latency remained very low and never became a problem.

 

DBCC DBREINDEX vs. DBCC INDEXDEFRAG

Besides dropping and re-creating an index with the CREATE INDEX command, you can use the DBCC DBREINDEX and DBCC INDEXDEFRAG commands to help maintain indexes.
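
As a point of reference, a hypothetical sketch of the CREATE INDEX approach just mentioned; the DROP_EXISTING clause rebuilds the named index in place, and the index definition below is invented for illustration:

-- Rebuild an existing index in place, resetting its fill factor
CREATE UNIQUE CLUSTERED INDEX IX_Orders_OrderID
ON dbo.Orders (OrderID)
WITH DROP_EXISTING, FILLFACTOR = 90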

DBCC DBREINDEX

DBCC DBREINDEX rebuilds one or more indexes on a specified table. DBCC DBREINDEX is an offline operation: while it is running, the tables involved are unavailable to users. DBCC DBREINDEX rebuilds indexes dynamically; you do not need to know anything about the structure of the affected table, or whether it has primary keys or unique constraints, because all of that is handled automatically during the rebuild. DBCC DBREINDEX completely rebuilds the index, restoring page density to the original (default) fill factor, although you can also choose a new value. DBCC DBREINDEX is very similar to dropping and re-creating an index with T-SQL statements.
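
A minimal sketch, with illustrative table, index, and fill factor values:

-- Rebuild a single index with a 90 percent fill factor
DBCC DBREINDEX ('dbo.Orders', IX_Orders_OrderDate, 90)
GO
-- Rebuild all indexes on the table; fill factor 0 keeps each index's original setting
DBCC DBREINDEX ('dbo.Orders', '', 0)
GO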

DBCC DBREINDEX has two advantages over DBCC INDEXDEFRAG:


  • DBCC DBREINDEX rebuilds statistics automatically as part of rebuilding the index, which can significantly improve performance.
  • DBCC DBREINDEX can run on multiple processors, so on a multiprocessor system it can rebuild large, heavily fragmented indexes very quickly.


All the work of DBCC DBREINDEX occurs as a single, atomic transaction: the new index must be built and must replace the old one before the old index pages are released. The rebuild therefore requires sufficient free space in the data files; without it, DBCC DBREINDEX may be unable to rebuild the index, or may leave the index with logical fragmentation above zero. The amount of free space required depends on the number of indexes being created in the transaction. For clustered indexes, a good guiding formula is: free space required = 1.2 * (average row size) * (number of rows).

For nonclustered indexes, you can estimate the required space by multiplying the average size of each nonclustered index row (the length of the nonclustered key plus the length of the clustering key or row ID) by the number of rows. If you rebuild all the indexes on a table, leave enough free space for the clustered index and all the nonclustered indexes. Likewise, rebuilding a non-unique clustered index implicitly rebuilds the nonclustered indexes as well, because SQL Server must create a new unique identifier for each row. When using DBCC DBREINDEX, it is best to name the specific indexes you want to defragment; this gives you more control over the operation and avoids unnecessary work.
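
To make the formula concrete (the numbers are invented for illustration): rebuilding a clustered index on 10 million rows averaging 100 bytes each needs roughly 1.2 * 100 * 10,000,000 bytes, or about 1.2 GB, of free space in the data files.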

DBCC INDEXDEFRAG

DBCC INDEXDEFRAG defragments a single specified index. Like DBCC DBREINDEX, it requires no knowledge of the underlying table structure; unlike DBCC DBREINDEX, however, it cannot defragment all of a table's indexes in one statement. DBCC INDEXDEFRAG must be run once for each index you want to defragment.
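
For example (again with placeholder names), defragmenting three indexes on one table requires three separate statements:

DBCC INDEXDEFRAG (Sales, Orders, PK_Orders)
DBCC INDEXDEFRAG (Sales, Orders, IX_Orders_OrderDate)
DBCC INDEXDEFRAG (Sales, Orders, IX_Orders_CustomerID)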

Unlike DBCC DBREINDEX, DBCC INDEXDEFRAG is an online operation, so the tables and indexes being defragmented remain accessible while it runs. Another difference is that DBCC INDEXDEFRAG can be stopped and restarted without losing any completed work. The entire DBCC DBREINDEX operation runs as one atomic transaction: if you abort DBCC DBREINDEX, the whole operation rolls back, and you must start over to finish it. If you stop DBCC INDEXDEFRAG, however, it halts immediately and no completed work is lost, because every unit of DBCC INDEXDEFRAG's work is an independent transaction.

DBCC INDEXDEFRAG has two phases:


  1. Compact the pages, based on the fill factor specified when the index was created. DBCC INDEXDEFRAG raises the page density toward that original fill factor as far as it can, but it does not reduce the density of pages that are already fuller than the original fill factor.
  2. Defragment the index by moving pages so that the physical order matches the logical order of the index's leaf level. This work is performed as a series of small independent transactions, so DBCC INDEXDEFRAG has little impact on overall system performance. Figure 8 shows the page movement performed during the defragmentation phase of DBCC INDEXDEFRAG.


Figure 8: Data file page movement during DBCC INDEXDEFRAG

DBCC INDEXDEFRAG does not help with indexes whose pages are scattered among those of other objects in the data files; this scattered allocation is called interleaving. Nor does DBCC INDEXDEFRAG remove extent fragmentation. Interleaving occurs when the extents of an index (one extent = 8 pages) are not contiguous in the data file, so that extents from multiple objects alternate through the file. Even when logical order matches physical order, the index pages are not necessarily contiguous, so interleaving can exist even when there is no logical fragmentation.

Despite the limitations described above, the tests show DBCC INDEXDEFRAG to be just as useful as DBCC DBREINDEX in improving performance. In fact, the results indicate that even though a full rebuild can reduce interleaving, that part of the optimization did not significantly improve performance; it is reducing the level of logical fragmentation that provides the greatest gains. This is why we recommend concentrating on logical fragmentation and page density when examining index fragmentation. Table 5 summarizes the differences between DBCC DBREINDEX and DBCC INDEXDEFRAG.

Table 5: Comparison of DBCC DBREINDEX and DBCC INDEXDEFRAG

Functionality | DBCC DBREINDEX | DBCC INDEXDEFRAG
Online/offline | Offline | Online
Faster when logical fragmentation is: | High | Low
Parallel processing | Yes | No
Compacts pages | Yes | Yes
Can be stopped and restarted without losing work completed to that point | No | Yes
Able to untangle interleaved indexes | May reduce interleaving | No
Additional free space required in the data file for defragmenting | Yes | No
Faster on larger indexes | Yes | No
Rebuilds statistics | Yes | No
Log space usage | High in full recovery mode (logs the entire contents of the index); low in bulk-logged or simple recovery mode (logs only the space allocations) | Varies with the amount of work performed
May skip pages on busy systems | No | Yes

Performance: DBCC DBREINDEX vs. DBCC INDEXDEFRAG

The tests show that both DBCC DBREINDEX and DBCC INDEXDEFRAG can effectively defragment indexes and restore page density to a level close to that specified by the original fill factor. Given these results, the question becomes which command to apply, and when.

If a maintenance window allows indexes to be rebuilt offline, DBCC DBREINDEX is generally faster than DBCC INDEXDEFRAG, because it can take full advantage of the parallelism of a multiprocessor system. DBCC INDEXDEFRAG is designed for production scenarios where interference must be minimal, and its impact is small: the tests show that even with several DBCC INDEXDEFRAG operations running in parallel, performance never dropped by more than 10%. The tradeoff is that DBCC INDEXDEFRAG can take much longer to defragment a large index, and how long it takes also depends on the workload running on the server at the time.

Figure 9 compares the performance of DBCC INDEXDEFRAG and DBCC DBREINDEX. The data in the figure is the time needed to defragment all indexes in the small-scale environment (results in the large-scale environment were similar; DBCC INDEXDEFRAG took roughly eight times as long as DBCC DBREINDEX). As the fragmentation level and the index size grow, DBCC DBREINDEX completes faster and faster relative to DBCC INDEXDEFRAG.

Figure 9: Time required to defragment all indexes in the small-scale environment


Log considerations: DBCC DBREINDEX vs. DBCC INDEXDEFRAG

The final issue to examine is the difference in the amount of data written to the transaction log when using DBCC INDEXDEFRAG versus DBCC DBREINDEX. How much DBCC INDEXDEFRAG writes to the log depends on the level of fragmentation and the amount of work performed. The tests show that, with the database in full recovery mode, DBCC INDEXDEFRAG writes considerably less to the transaction log than DBCC DBREINDEX. DBCC INDEXDEFRAG's log usage can vary widely, because the amount of defragmentation work is determined by the number of pages moved and the number of pages that must be compacted. Because DBCC INDEXDEFRAG runs as a series of small transactions, the log space it uses can be reclaimed by backing up the log while it runs.
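
For example (the database name and backup path are hypothetical), a routine log backup issued while DBCC INDEXDEFRAG is running reclaims the space used by its already-committed transactions:

-- Each small INDEXDEFRAG transaction commits independently,
-- so a log backup can truncate that portion of the log mid-run
BACKUP LOG Sales TO DISK = 'e:\backup\sales_log.bak'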

From a logging perspective, DBCC DBREINDEX and DBCC INDEXDEFRAG differ the most in bulk-logged recovery mode. In full recovery mode, DBCC DBREINDEX logs an image of every index page, which it does not do in bulk-logged recovery mode; the log space DBCC DBREINDEX requires in full recovery mode is therefore approximately the number of index pages multiplied by 8 KB. You can use DBCC SHOWCONTIG to determine the number of pages in a given index. When running DBCC DBREINDEX in a large-scale environment, we recommend switching the database to bulk-logged recovery mode for the operation and switching back to full recovery mode once it completes.
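
A sketch of that sequence, with placeholder names; for a sense of scale from the estimate above, an index of 500,000 pages rebuilt in full recovery mode would log roughly 500,000 * 8 KB, or about 4 GB:

-- Switch to bulk-logged recovery so DBCC DBREINDEX logs only space allocations
ALTER DATABASE Sales SET RECOVERY BULK_LOGGED
GO
DBCC DBREINDEX ('dbo.Orders', '', 0)
GO
-- Return to full recovery; back up the log afterward so that
-- point-in-time recovery is possible from this point forward
ALTER DATABASE Sales SET RECOVERY FULL
GO
BACKUP LOG Sales TO DISK = 'e:\backup\sales_log.bak'
GO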

Note:
In a large-scale environment, rolling back a transaction can take a great deal of time, so it is important to understand the log requirements in advance.

Figure 10 shows the log space used by DBCC INDEXDEFRAG and DBCC DBREINDEX at the small and medium fragmentation levels. Although DBCC INDEXDEFRAG's log usage fluctuates widely, the results reflect the general difference between DBCC DBREINDEX and DBCC INDEXDEFRAG.

Figure 10: Total log space used by DBCC INDEXDEFRAG and DBCC DBREINDEX


 

Conclusion

The impact of index fragmentation varies with the type of workload; some applications can gain substantial performance from defragmenting. Understanding the application's characteristics, the system's performance, and the fragmentation statistics SQL Server provides is the key to deciding when to defragment. SQL Server supplies the commands needed to defragment indexes; this article should help you decide when and how to defragment them for the greatest performance benefit.

 

More information

For more information about monitoring, analyzing, and improving performance, visit the following resources:


  • Microsoft Knowledge Base article 243589, "HOW TO: Troubleshoot Slow-Running Queries on SQL Server 7.0 or Later," at http://support.microsoft.com/default.aspx?scid=kb;en-us;243589&sd=tech
  • Microsoft SQL Server 2000 Performance Tuning Technical Reference, Microsoft Press, ISBN 0-7356-1270-6
  • "Windows 2000 I/O Performance," Microsoft Research, at http://research.microsoft.com/BARC/Sequential_IO/Win2K_IO.pdf


 

Appendix A: test environment

The tests in this article used the following hardware and software environment:

Microsoft software

Microsoft Windows 2000 Datacenter Server (Service Pack 3)
Microsoft SQL Server 2000 Enterprise Edition (Service Pack 2)

Hardware Platform

Small-scale environment:
Dell PowerEdge 6450
4 Intel Pentium III Xeon 550 MHz processors
4 GB RAM

Large-scale environment:
Dell PowerEdge 8450
8 Intel Pentium III Xeon 550 MHz processors
16 GB RAM

Storage

Small-scale environment:
1 Dell PowerVault 660F with 2 18-GB 10,000 RPM disks
Total disk space = 36 GB (RAID 0)

Large-scale environment:
1 Hitachi Freedom Storage Lightning 9960 system with 192 73-GB 10,000 RPM disks
Total disk space = 13 TB (~6 TB after RAID 1+0 and further striping/slicing)

Host bus adapters (HBA)

8 Emulex LP9002L PCI host bus adapters
Firmware 3.82a1
Port driver v5-2.11a2

Fabric Switch

1 McDATA switch (1 Gbit/s)

Storage Management Software

Hitachi Command Control Interface (CCI)
Hitachi ShadowImage

Databases

Representative OLTP and DSS Databases
