How to identify IO bottlenecks in SQL Server

Source: Internet
Author: User

Problem:

We may often encounter frequent shutdown of SQL Server databases. After analyzing the memory and CPU usage, we need to continue investigating whether the root cause is I/O. How should we identify whether SQLServer has I/O-related bottlenecks?

Solution:

When data pages are frequently moved from the buffer pool, the I/O subsystem becomes one of the key factors in SQLServer performance problems. Transaction logs and tempdb also generate significant I/O pressure. Therefore, make sure that your I/O Sub-system runs as expected. Otherwise, you will become the victim of response time growth and frequent timeout. This article describes how to use built-in tools to identify I/O-related bottlenecks and provides some disk configuration methods:

Performance Monitor ):

You can use performance counters to check the load of the I/O subsystem. The following counters can be used to check disk performance:

PhysicalDisk Object: Avg. DiskQueue Length: calculates the average Read and Write Request queue from the physical disk. An excessively high value indicates that the disk operation is in the waiting state. When this value exceeds 2 for a long time during the SQLServer peak, it proves that you need to pay attention to it. If there are multiple hard disks, divide these values by 2. for example, if there are four hard disks and the queue is 10, the average value is 10/4 = 2.5. Although it also proves that you need to pay attention to it, you cannot use the value 10.

Avg. Disk Sec/Read and Avg. Disk Sec/Write: displays the average time of reading from or writing to a Disk. 10 ms is a good performance, and below 20 is acceptable. This value proves that there is a problem.

Physical Disk: % Disk Time: Ratio of the duration when the Disk is busy reading or writing requests. According to the thumb's law, this value should be less than 50%.

The Disk Reads/Sec and Disk Writes/Sec counters show the read/write operations in the Disk. These two values should be less than 85% of the disk capacity. When this value is exceeded, the disk access time will increase exponentially.

You can use the following methods to calculate the capacity of increasing loads. One way is to use SQLIO. You should find that the throughput is stable but slow.

You can use the following formula to calculate RAID configurations:

Raid 0: I/O per disk = (reads + writes)/number ofdisks

Raid 1: I/O per disk = [reads + (writes * 2)]/2

Raid 5: I/O per disk = [reads + (writes * 4)]/number of disks

Raid 10: I/O per disk = [reads + (writes * 2)]/number of disks

For example, for RAID 1, if the following counter is obtained:

Disk Reads/sec = 90

Disk Writes/sec = 75

According to the formula: [reads + (writes * 2)]/2 or [90 + (75*2)]/2 = 120I/OS for each disk.

Dynamic Management View (DMVs ):

There are a lot of swimming DMVs that can be used to check I/O bottlenecks:

When a page is used for read or write access and the page does not exist or is unavailable in the buffer pool, an I/O latch wait (I/O latch) is triggered ), it will be in PAGEIOLATCH_EX/PAGEIOLATCH_SH (depending on the request type ). These waits indicate an I/O bottleneck. You can use sys. dm_ OS _wait_stats to find the lock wait information. If you save the waiting_task_counts and wait_time_ms values under normal SQL server running and compare them with the values, you can identify the I/O problems:

Select *

Fromsys. dm_ OS _wait_stats

Where wait_type like 'pageiolatch %'

Order by wait_typeasc

Pending I/O requests can be found in the following query and used to identify the bottleneck of the Disk:

Select database_id,

File_id,

Io_stall,

Io_pending_ms_ticks,

Scheduler_address

From sys. dm_io_virtual_file_stats (NULL, NULL) iovfs,

Sys. dm_io_pending_io_requests as iopior

Where iovfs. file_handle = iopior. io_handle

Disk Fragmentation ):

We recommend that you check disk fragments and configure disks for SQLServer instances. Fragments in the NTFS file system may have a serious performance impact. Disk fragments need to be sorted frequently and the fragment plan needs to be specified. Research shows that in some cases, SAN performance is worse after fragmentation. Therefore, SAN must be treated as needed.

The index fragmentation on NTFS can also cause high I/O usage. However, this is different from SANs.

Disk configuration/best practices:

In general, you should store log files and data files separately for better performance. I/O features of heavily loaded data files (including tempdb) are read at random. Log files are accessed sequentially, unless the transaction needs to be rolled back.

Built-in disks can only be used for database log files because they have good performance for sequential I/O, but have low performance for random I/O.

The data and log files of the database should be stored in the corresponding dedicated disk. Ensure good performance. It is recommended that log files be placed on two built-in disks and configured as RAID 1. The data files reside in the SAN system only used for access to SQLServer, and are only controlled by queries and reports. Special access should be disabled.

Write buffer should be allowed when possible and can be used even when power is down.

To minimize the impact of I/O bottlenecks on OLTP systems, we should not mix OLAP and OLTP environments. And make sure that your code is optimized and you have an appropriate index to avoid unnecessary I/O.

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.