SQL Server I/O subsystem (I)

Source: Internet
Author: obuntu

A useful methodology holds that SQL Server performance problems can be located in four steps:
1. Resource bottlenecks
I. Memory II. CPU III. I/O
2. tempdb bottlenecks
3. Slow statements, which can be found in three ways:
I. Statistics II. Missing indexes III. Blocking
4. Analysis of cached execution plans

For more information, see this article: http://blogs.msdn.com/b/jimmymay/archive/2008/09/01/sql-server-performance-troubleshooting-methodology.aspx

As you can see, when a system performance problem occurs, the first step is to determine whether a resource bottleneck exists. Among CPU, memory, and I/O, the I/O subsystem is the most likely bottleneck. The I/O subsystem is in fact a broad topic: factors affecting its performance include the number, size, and speed of the disks, the file allocation unit size, the HBA, network bandwidth, disk cache, the controller, whether a SAN (storage area network) is used, the RAID level, bus speed, I/O channels, and so on.
As users of SQL Server, we rarely get to adjust the I/O subsystem configuration, partly for lack of access and partly for lack of the relevant knowledge and skills. It is still worth understanding these issues, however: besides making better use of the hardware, the knowledge helps in locating and analyzing system performance problems when they occur.

Contents
I/O subsystem concepts
SQL Server I/O concepts
I/O subsystem counters in Performance Monitor
SQLIO
I/O best practices
Summary
References

I/O subsystem concepts

A. Disk

Disks have developed rapidly over the past few decades. From ATA and SATA to SAS, and now to SSDs, each technology change has improved disk performance. The most widely used disk today is probably the 15K RPM SAS drive. For such disks, some traditional disk concepts remain unchanged, such as tracks and sectors.

A modern hard disk generally consists of a stack of platters. Each platter surface is divided into concentric tracks, numbered identically across surfaces, and each track is divided into arc segments called sectors. A sector is typically 512 bytes, though 1 KB, 2 KB, and 4 KB sectors also exist. Tracks with the same number on different platters form a cylinder; the number of cylinders equals the number of tracks per surface, and the number of usable surfaces equals the number of heads. This is the origin of the CHS (Cylinder, Head, Sector) addressing scheme. The disk capacity equals the number of cylinders × number of heads × sectors per track × sector size.
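As a worked example of the CHS capacity formula (the geometry values below are illustrative, not from any particular drive):

```python
# Disk capacity under CHS addressing:
# capacity = cylinders * heads * sectors_per_track * sector_size
def chs_capacity(cylinders, heads, sectors_per_track, sector_size=512):
    return cylinders * heads * sectors_per_track * sector_size

# The classic maximum CHS geometry (16383 x 16 x 63, 512-byte sectors):
cap = chs_capacity(16383, 16, 63)
print(cap)                      # 8455200768 bytes
print(round(cap / 1024**3, 2))  # ~7.87 GB
```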

I/O comes in two patterns, sequential read/write and random read/write, and a disk performs differently for each. For sequential reads and writes, a 10K RPM disk can generally sustain a transfer rate of 40-80 MB/s, and a 15K RPM disk 70-125 MB/s. For random reads and writes, performance depends on rotational speed and seek time. A 10K RPM disk takes 6 ms to complete one full rotation (60 × 1000 / 10000). The rotational wait time, also called rotational latency, is the time it takes for the sector to be accessed to rotate under the head once the head is on the right track; on average it is half the time of one full rotation, so it should generally be under 4 ms. The average wait time is therefore usually taken to be 3 ms for a 10K RPM disk and 2 ms for a 15K RPM disk.
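The rotation figures above follow directly from the spindle speed; a quick check:

```python
# Full-rotation time and average rotational latency from spindle speed (RPM).
def rotation_ms(rpm):
    return 60 * 1000 / rpm          # one full revolution, in milliseconds

def avg_rotational_latency_ms(rpm):
    return rotation_ms(rpm) / 2     # on average, half a revolution

print(rotation_ms(10000))                 # 6.0 ms per revolution at 10K RPM
print(avg_rotational_latency_ms(10000))   # 3.0 ms average wait
print(avg_rotational_latency_ms(15000))   # 2.0 ms average wait
```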

Another factor affecting disk performance is seek time: the average time the drive needs, after receiving a command, to move the head from its starting position to the track where the data resides. Currently, a 10K RPM disk has an average read seek time of about 4.6 ms and an average write seek time of about 5.2 ms; for a 15K RPM disk, the figures are about 3.5 ms for reads and 4.2 ms for writes.

For example, for a small 8 KB read the transfer time is about 0.1 ms. Ignoring other negligible factors, the total latency comes to about 8 ms for a 10K RPM disk and 5.6 ms for a 15K RPM disk. The random-read performance of a single disk is therefore roughly 125 IOPS (10K RPM) or 175 IOPS (15K RPM).
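Putting the seek, rotational, and transfer components together reproduces these latency and IOPS figures (the quoted values in the text are rounded):

```python
# Random-read latency = average seek + average rotational latency + transfer.
# Seek times are the figures quoted above; transferring an 8 KB block takes ~0.1 ms.
def random_read(seek_ms, rotational_ms, transfer_ms=0.1):
    latency_ms = seek_ms + rotational_ms + transfer_ms
    return latency_ms, 1000 / latency_ms   # (latency, IOPS)

lat10, iops10 = random_read(seek_ms=4.6, rotational_ms=3.0)
lat15, iops15 = random_read(seek_ms=3.5, rotational_ms=2.0)
print(round(lat10, 1), round(iops10))  # ~7.7 ms -> ~130 IOPS (quoted as ~8 ms / 125)
print(round(lat15, 1), round(iops15))  # 5.6 ms -> ~179 IOPS (quoted as ~175)
```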

The above describes the ideal case. If the data set occupies only a small area of the disk, the average seek time drops and performance improves. On the other hand, if multiple I/O requests arrive at the same time, the disk has to queue and reorder them, so latency grows as throughput rises. In general, if the data is spread across the whole disk, the greater the queue depth (the number of I/O operations outstanding against the disk in parallel), the longer the latency: at a queue depth of 4 latency can reach 20 ms, and at 32 it can reach 100 ms. A queue depth of 2 is therefore a commonly recommended value, though different storage systems have different recommendations, and you should consult the relevant documentation before setting it. Note, however, that if the data occupies only a small area of the disk, say 5%, latency does not grow as sharply with queue depth: roughly 20 ms at a queue depth of 8 and 40 ms at 16, while random-read performance improves greatly, reaching about 400 IOPS. This characteristic gives you considerable flexibility for transaction-heavy workloads.

B. RAID

Today, real enterprise environments rarely store files on a single disk; they use RAID. RAID can improve performance and provide effective fault tolerance. Simply put, RAID combines multiple independent physical hard disks in different ways into a hard disk group (a logical disk), providing higher storage performance than a single disk plus data redundancy. The different ways of combining the disks into an array are called RAID levels.

RAID has many levels, that is, many ways of combining disks. RAID 10 and RAID 5 are the most commonly used. RAID 10 generally outperforms RAID 5, but it also costs more. The choice of RAID level has a large impact on system performance, so test thoroughly and weigh your actual situation before deciding.

C. Other Concepts

The file allocation unit size, that is, the cluster size, is generally an integer multiple of the sector size. For example, with a 4 KB cluster and 512-byte sectors, one cluster occupies eight sectors. You can specify it with the /A:size option of the format command when formatting a disk. A 64 KB allocation unit is generally appropriate for SQL Server data and log files, but 32 KB sometimes performs better, so it is best to test thoroughly before settling on a value. The following example shows how to view the current file allocation unit size.
C:\Documents and Settings\Administrator> fsutil fsinfo ntfsinfo D:
NTFS volume serial number: 0xde500ef9500ed7e3
Version: 3.1
Total sectors: 0x0000000012c03620
Total clusters: 0x0000000012c03620
Free clusters: 0x000000001098efb6
Total reserved: 0x0000000000000000
Bytes per sector: 512
Bytes per cluster: 512
Bytes per FileRecord segment: 1024
Clusters per FileRecord segment: 2
MFT valid data length: 0x0000000004a68000
MFT start LCN: 0x0000000000600000
MFT2 start LCN: 0x0000000009601b10
MFT zone start: 0x0000000000625460
MFT zone end: 0x0000000002b80800
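The relationship between the two sizes in this output can be checked directly (sectors per cluster = bytes per cluster / bytes per sector):

```python
def sectors_per_cluster(bytes_per_cluster, bytes_per_sector=512):
    # a cluster is always a whole number of sectors
    return bytes_per_cluster // bytes_per_sector

print(sectors_per_cluster(512))    # 1   (the volume shown above)
print(sectors_per_cluster(4096))   # 8   (the 4 KB cluster example in the text)
print(sectors_per_cluster(65536))  # 128 (a 64 KB allocation unit)
```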

When configuring RAID, there is one parameter you can set manually: the stripe size. The stripe size of a logical drive is the amount of data the controller writes to one physical disk at a time, in KB. The choice of stripe size directly affects performance, both IOPS and throughput. A small stripe size lets multiple disks serve multiple I/O requests at once, increasing the I/O access rate (IOPS); a large stripe size lets multiple disks serve a single I/O request together, increasing the data transfer rate (MB/s). For good performance, choose a stripe size equal to or smaller than the operating system's cluster size. Larger stripes yield higher read performance, especially for sequential reads; for random reads, a smaller stripe size works better.

As you can see, these settings also affect SQL Server performance, but it is hard to give a universal recommendation, and in most cases the defaults are kept. If you really need to tune this area, ask the vendor of the product in question, or test thoroughly yourself.

SQL Server I/O concepts

The SQL Server engine has its own internal disk I/O management mechanisms, and it is worth understanding how SQL Server processes I/O. Microsoft has published two excellent white papers, SQL Server I/O Basics Chapter 1 and SQL Server I/O Basics Chapter 2; if you are interested in this area, do not miss them. They are available only in English and add up to about 100 pages. The following briefly describes some key points of SQL Server I/O; for the details, refer to the two white papers.

Write Ahead Logging (WAL) Protocol

Before SQL Server writes a change to a data file, it must first write the corresponding log records to the transaction log file on disk. This is the WAL mechanism. It protects and hardens transactions; only with it can transaction durability be guaranteed. SQL Server implements WAL by opening files with the FILE_FLAG_WRITE_THROUGH flag of the CreateFile API.

Synchronous vs asynchronous I/O

With synchronous I/O, the I/O API does not return to the caller until the I/O request has completed. With asynchronous I/O, the API merely submits the I/O request and returns immediately, so the caller can continue with other work and check later whether the I/O has completed.

SQL Server issues about 98% of its I/O asynchronously, which lets it continue to make effective use of the CPU and other resources after initiating a page read or write. In Windows asynchronous I/O, an OVERLAPPED structure holds the information about each request, and the HasOverlappedIoCompleted macro indicates whether the I/O has finished. SQL Server 2005 introduced the dynamic management view sys.dm_io_pending_io_requests, whose io_pending column corresponds to HasOverlappedIoCompleted.
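The issue-then-poll pattern can be sketched in a few lines. This is an illustration only: SQL Server actually uses Windows overlapped I/O, not Python threads, and the file layout here (four 8 KB "pages") is invented for the example.

```python
# Sketch of asynchronous I/O: issue all reads without waiting, keep working,
# then collect completions -- the same pattern HasOverlappedIoCompleted supports.
import concurrent.futures
import os
import tempfile

def read_page(path, offset, size=8192):
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(size)

# Build a small "data file" of four 8 KB pages.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(bytes(8192 * 4))
    path = f.name

with concurrent.futures.ThreadPoolExecutor() as pool:
    # Issue all I/O requests without blocking (asynchronous issue).
    pending = [pool.submit(read_page, path, off) for off in range(0, 32768, 8192)]
    # ... the caller is free to do other work here ...
    # Then collect results, analogous to polling io_pending.
    pages = [p.result() for p in pending]

os.unlink(path)
print(len(pages))  # 4
```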

Scatter/gather I/O

Before SQL Server 2000, when checkpoint wrote the buffer pool's dirty pages to disk it had to maintain a list of dirty pages and write them out one by one, so an I/O problem on a single page write could degrade the performance of the whole checkpoint. SQL Server 2000 therefore introduced scatter/gather I/O. Scatter means that when pages are read from disk into memory they need not occupy contiguous memory; by calling the ReadFileScatter API, the pages can be scattered to different locations in the buffer pool. Gather means that when pages are written from memory to disk there is no need to maintain the old dirty-page list: after scanning the buffer pool, the dirty pages are written directly to a contiguous area of the disk by calling the WriteFileGather API. This is clearly more efficient, and it is used not only in the SQL Server I/O path but also for the page file.

Sector alignment, block alignment

In SQL Server, transaction logs are written not in page-size units (8 KB) but in sector-size units. Writing by sector prevents a rewritten log block from corrupting the transaction log: a parity bit is maintained on each sector, and when the log file is written this bit is checked to determine whether the sector can accept the log write, ensuring log validity.

The sector size is transparent to users; SQL Server automatically adapts to the disk's sector size. For example, when a database from a disk with 512-byte sectors is restored to a disk with 1024-byte sectors, subsequent log writes are done in 1024-byte units.
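The effect of sector-sized log writes can be sketched as a round-up to the sector boundary (a simplified model; the real log block format also carries the parity bits described above):

```python
def log_write_size(bytes_needed, sector_size=512):
    # A log write is padded up to a whole number of sectors (ceiling division).
    return -(-bytes_needed // sector_size) * sector_size

print(log_write_size(100))        # 512  -> one 512-byte sector
print(log_write_size(700))        # 1024 -> two sectors
print(log_write_size(700, 1024))  # 1024 -> one sector on a 1024-byte-sector disk
```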

The smallest unit of a block is 8 KB. On a disk, however, the first 63 sectors are by default hidden sectors used to store the MBR (Master Boot Record), so the hidden area is 31.5 KB in size. This is called the partition offset. If it is not set appropriately, extra I/O is generated, which hurts performance. This issue is covered in detail later.

In general, to determine a proper partition offset, use the formula (partition offset in sectors) × (disk sector size) / (stripe unit size) and make sure the result is an integer. For example, with a 256 KB stripe, the result only becomes an integer at an offset of 512 sectors, so you must set a partition offset of at least 256 KB:
(63*512)/262144 = 0.123046875
(64*512)/262144 = 0.125
(128*512)/262144 = 0.25
(256*512)/262144 = 0.5
(512*512)/262144 = 1
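The calculations above can be generated from the alignment formula directly; only the 512-sector offset divides the 256 KB stripe evenly:

```python
STRIPE = 262144  # 256 KB stripe unit, in bytes
SECTOR = 512     # bytes per sector

def is_aligned(offset_sectors, sector_size=SECTOR, stripe_size=STRIPE):
    # Aligned when (offset * sector size) is a whole multiple of the stripe size.
    return (offset_sectors * sector_size) % stripe_size == 0

for off in (63, 64, 128, 256, 512):
    print(off, (off * SECTOR) / STRIPE, is_aligned(off))
# only 512 sectors (0.5 * 512 * 1024 bytes = 256 KB) yields an integer -> True
```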

You can also use DBCC FILEHEADER('dbname') to view the sector size recorded for a file.

Latching and a page: A read walk-through

A latch is a lightweight lock used to protect various internal resources. For I/O, latches protect data pages in memory and ensure data consistency. SQL Server has two families of page latch waits, PAGEIOLATCH_* and PAGELATCH_*, and these two wait types can be used to distinguish I/O problems from memory problems. Latches also come in modes such as SH (shared) and EX (exclusive).

PAGEIOLATCH_* waits occur while a page is being read from or written to disk; if reads or writes take too long, these waits become prominent. For example, when a page is read from a physical file, an EX latch is acquired and held until the read completes, guaranteeing that the page is not modified while it is being read in. PAGELATCH_* waits are latches on pages already in memory, taken only when needed. An SH latch does not block other SH latches, but it does block EX latches.

Note also that latches exist only in user mode; SQLOS manages this resource contention itself rather than relying on kernel-mode synchronization objects.

SQL Server also has a sub-latch mechanism to reduce contention on hot pages. Sub-latches apply only to pages already in memory. For example, when SQL Server detects sustained heavy SH latch activity on a page over a period of time, it promotes the held latch to a sub-latch, which partitions the single latch into per-logical-CPU latch structures. A worker then only needs to acquire the SH sub-latch belonging to its local scheduler, avoiding the global synchronization chain and using fewer resources, which improves the ability to handle hot pages. All of this happens automatically inside SQL Server; no intervention is needed.

Reading a page

When a worker thread needs a page, it calls the BufferPool::GetPage routine. GetPage scans the BUF structures; if the requested page is found, it latches the page and returns it to the caller. If not, the page must be read from disk.

Reading a page involves several possible behaviors, such as the read-ahead mechanism, but the basic steps are as follows:
Step 1: Request a page-sized allocation from the memory manager.
Step 2: Associate the page with a BUF structure that tracks it.
Step 3: Acquire an EX latch on the page to prevent modification.
Step 4: Insert the BUF structure into a hash table in memory, so that all requests for the same BUF and page are protected by the EX latch. If the entry is already in the hash table, this step is skipped and the content is fetched from the hash table directly.
Step 5: Build the I/O request and issue it (asynchronous I/O).
Step 6: Attempt to acquire the latch type originally requested.
Step 7: Check for error conditions, and raise an error if one is found.
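The lookup path above can be sketched as a toy buffer pool. All names here are hypothetical stand-ins for the internal structures: a hash table maps page ids to BUF-like entries, and a per-page lock stands in for the EX latch held while a page is read in.

```python
import threading

class Buf:
    """Toy stand-in for the BUF structure that tracks one page."""
    def __init__(self, page_id):
        self.page_id = page_id
        self.latch = threading.Lock()  # stands in for the EX latch
        self.data = None

class BufferPool:
    def __init__(self, read_page_from_disk):
        self.pages = {}                # the hash table of BUF structures
        self.read_page_from_disk = read_page_from_disk

    def get_page(self, page_id):
        buf = self.pages.get(page_id)
        if buf is None:                # not cached: read from "disk"
            buf = Buf(page_id)
            with buf.latch:            # EX latch held until the read completes
                self.pages[page_id] = buf
                buf.data = self.read_page_from_disk(page_id)
        return buf.data

# A fake disk that returns an empty 8 KB page for any page id.
pool = BufferPool(lambda pid: b"\x00" * 8192)
print(len(pool.get_page(1)))   # 8192 -> read from "disk", then cached
print(1 in pool.pages)         # True -> second request hits the hash table
```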

If an error occurs, further actions may follow; for example, if a checksum verification fails, the page may be read again. As the steps show, the EX latch taken for the read is not released as soon as the page is in memory, but only after page verification completes.

Writing a page

Writing a page is very similar to reading one. A page being written already exists in memory with its BUF status marked dirty (changed); you can view dirty pages through sys.dm_os_buffer_descriptors. To write a page, SQL Server calls WriteMultiple. Three mechanisms drive page writes: lazywriter, checkpoint, and eager write.

Lazywriter is a thread that periodically scans the buffer pool and checks the size of the free list. SQL Server 2008 introduced the TLA (time of last access) algorithm, an improvement on LRU. Using this algorithm, lazywriter judges whether pages marked dirty have aged out; if so, it calls WriteMultiple to write the dirty pages to disk.

Checkpoint ensures that the changed pages associated with all committed transactions have been written to disk; the checkpoint is the starting point for recovery. Unlike lazywriter, checkpoint does not remove dirty pages from the cache; it marks them clean. Many conditions can trigger a checkpoint, and when one occurs, WriteMultiple is called to perform the writes.

Eager write is used in some BCP and BLOB operations, where pages in memory must be written to disk for the transaction to complete; this too is done by calling WriteMultiple.

When a page write is requested, SQL Server writes not only the requested dirty page but also adjacent dirty pages, reducing the number of I/O requests and improving I/O performance. Writing pages also requires latch support: generally an EX latch is requested to prevent the page from being modified mid-write, but SQL Server can also use an SH latch while writing, allowing the page still to be read.

PAE and AWE

Little needs to be said here, but note two points. First, PAE and AWE are independent: enabling AWE does not require PAE, and enabling PAE does not require AWE. Second, AWE extends only the buffer pool, not the plan cache.

Read ahead

If we enable SET STATISTICS IO ON, we often see read-ahead counts in the output. SQL Server's read-ahead mechanism greatly improves its asynchronous I/O capability.

Sparse files and copy on write (COW) pages

Sparse files are used mainly for online DBCC and snapshot databases; they generally occupy far less space than their nominal size. Creating a snapshot database enables copy-on-write behavior: before a page is written, copy-on-write checks whether the page has already been copied to the snapshot database, and if not, the page is written to the snapshot before it is changed, so the snapshot keeps the database's content as of a point in time. To maintain the snapshot data, a chain of file control blocks (FCBs) in the parent database manages the correspondence between snapshot and parent, so that copy-on-write can locate pages quickly.

Although a snapshot is very small at first, it grows as the parent database changes, which must be taken into account when creating a snapshot database. Because snapshots may interact with the data files frequently, they should also be placed on a device with good I/O performance.

When the snapshot database is queried, the read request first goes to the snapshot. If the relevant page has not yet been copied from the parent, the request is redirected through the FCBs to the parent database, and the page is read from there. This is what keeps the snapshot sparse.

DBCC also uses snapshots to do its work, though this snapshot is maintained internally. This clears up the misconception that "DBCC CHECKDB locks pages in the database": since SQL Server 2005, DBCC maintains an internal snapshot to check database consistency. Of course, this demands a lot of I/O and can require a lot of space; if those conditions cannot be met, you can use WITH TABLOCK to run the consistency check directly against the database files.

Scribbler(s)

"Scribbler" is a term borrowed from a child scribbling outside the lines: it denotes a component modifying data in a memory region that does not belong to it, resulting in data corruption. SQL Server 2000 introduced torn page detection to catch such behavior, and SQL Server 2005 added the checksum mechanism.

If the page verify option is set to CHECKSUM, lazywriter checks pages in memory by recomputing each page's checksum; if the values do not match, it logs an error and removes the page from memory, which indicates that a scribbler has struck. Tracking down a scribbler is difficult, but trace flag -T831 can be enabled to obtain more detail.

Page verification is an important part of SQL Server I/O. For more information, see the SQL Server I/O Basics white papers.

As you can see, SQL Server has rich internal I/O management mechanisms. Understanding these concepts helps you understand how SQL Server works and leaves you better equipped to handle internal errors or I/O subsystem configuration issues. For more information, see the SQL Server I/O Basics white papers.
