Problem:
We may often experience the frequent shutdown of SQL Server databases. After analyzing memory and CPU usage, we need to continue to investigate whether the source is in I/O. How do we recognize that SQL Server has I/O-related bottlenecks?
Solve:
When data pages are often moved out of the buffer pool, the I/O subsystem becomes one of the key factors in SQL Server performance problems. Transaction logs and tempdb also generate significant I/O pressure. Therefore, you must make sure that your I/O subsystem works as expected. Otherwise you will be a victim of increased response times and frequent timeouts. In this article, you will describe how to use the built-in tools to identify I/O-related bottlenecks and provide some methods for disk configuration:
Performance counters (performance Monitor):
You can use performance counters to check the load on the I/O subsystem. The following counters can be used to check disk performance:
PhysicalDisk Object:Avg.DiskQueue Length: Calculates the average read and write request queues from a physical disk. An excessively high value indicates that the disk operation is in the waiting state. When this value is longer than 2 for the peak of SQL Server, it is necessary to note that. If you have more than one hard disk, you need to divide the values by 2. For example, there are 4 hard drives, and the queue is 10, then the average is 10/4=2.5, although it also proves to be concerned, but not the value of 10.
Avg.Disk Sec/read and Avg.Disk sec/write: Displays the average time that is read from disk or written to disk. 10ms is a good performance, below 20 is acceptable. Higher than this value proves that there is a problem.
Physical Disk:%disk Time: The rate of duration when the disk is busy reading or writing requests. According to the law of thumb, this value should be less than 50%.
Disk reads/sec and Disk writes/sec counters show the rate of read and write operations on disk. These two values should be less than 85% of the disk's capacity. When this value is exceeded, the disk's access time will grow exponentially.
You can calculate the ability to progressively increase the load in the following ways. One way to do this is to use Sqlio. You should find the throughput more stable, but slow to grow.
You can use the following formula to calculate a RAID configuration:
Raid 0:i/o per disk = (reads + writes)/number Ofdisks
Raid 1:i/o per disk = [reads + (WRITES*2)]/2
Raid 5:i/o per disk = [reads + (WRITES*4)]/number of disks
Raid 10:i/o per disk = [reads + (WRITES*2)]/number of disks
For example: For RAID 1, if you get the following counter:
Disk reads/sec = 90
Disk writes/sec =75
According to the formula: [Reads + (WRITES*2)]/2 or [A + (75*2)]/2 = 120i/os per disk.
dynamic management view (DMVs):
There are a lot of swimming DMVs can be used to check I/O bottlenecks:
When a page is used for read or write access and the page does not exist or is unavailable in the buffer pool, an I/O latch wait (I/O latch) is raised, which is pageiolatch_ex/pageiolatch_sh (depending on the type of request). These waits indicate an I/O bottleneck. You can use Sys.dm_os_wait_stats to find information about the latch wait. If you save the waiting_task_counts and Wait_time_ms values of SQL Server under normal operation, and compare this value, you can identify the I/O problem:
SELECT *
Fromsys.dm_os_wait_stats
where wait_type like ' pageiolatch% '
ORDER BY WAIT_TYPEASC
Pending I/O requests can be found in the following query and used to identify the bottleneck that the disk is responsible for:
Select database_id,
FILE_ID,
Io_stall,
Io_pending_ms_ticks,
Scheduler_address
From sys.dm_io_virtual_file_stats (null, NULL) IOVFS,
Sys.dm_io_pending_io_requests as Iopior
where Iovfs.file_handle = Iopior.io_handle
Disc fragmentation (disk fragmentation):
It is recommended that you check disk fragmentation and configure the disks that are used for SQL Server instances. Fragmentation in the NTFS file system can have a significant performance impact. The disk needs to defragment frequently and specify the defragmentation plan. Research shows that in some cases the SAN performs worse after defragmenting. Therefore, the SAN must be treated according to the actual situation.
Index fragmentation on NTFS also causes high I/O to be useful. But this is not the same as the effect in Sans.
Disk Configuration/Best practices:
In general, you should keep log files and data files separate for better performance. I/O characteristics for heavily loaded data files (including tempdb) are random reads. For log files, sequential access is required, unless the transaction needs to be rolled back.
The built-in disks can only be used for database log files because they have good performance for sequential I/O, but low on random I/O performance.
The data and log files of the database should be placed on the corresponding dedicated disk. To ensure good performance. It is recommended that the log files be placed on two internal disks and configured for RAID 1. Data files reside in San systems that are only accessible to SQL Server and are only queried and reports controlled. Special access should be prohibited.
Write buffers should be allowed when possible, and ensure that power outages are also available.
To minimize the impact of I/O bottlenecks on OLTP systems, you should not mix OLAP and OLTP environments. and ensure that your code is optimized and that there are appropriate indexes to avoid unnecessary I/O.