[Translation] Concepts of RAID and the Impact of RAID on SQL Performance

Source: Internet
Author: User
Introduction

We have all heard of RAID, and as SQL Server DBAs, developers, or architects we often discuss it. However, many of us are not familiar with RAID principles, RAID levels, and how RAID affects SQL Server performance.

This article aims to fill that gap.

 

Disk Architecture

Today's disk is very similar to a 45 rpm (revolutions per minute) record on a 1970s record player (do you still remember those?). It is just a medium (the platter) that spins on an axis (the spindle) and stores data in segments of the disk called sectors.

Like a phonograph, a disk drive has an arm that positions the needle (here called a "head") to access data. Unlike a record player, which is read-only, a disk can be both read and written.

To read or write data in a specific sector, the platter must rotate and the arm must move so that the head sits over the specified sector.

This process is the basic input/output (I/O) operation.

 

IOPS

The term IOPS is used a lot, but not many people really understand it.

Many people know that IOPS is short for input/output operations per second, but turning that definition into a practical concept is a bit difficult for some.

At its most basic, IOPS is a measure derived from the average time needed to satisfy a single input/output request.

Note that this measurement is based on reading a 0-byte file. This is purely for the purpose of statistics and standardization, because sector sizes differ between disks.

 

Restrictions on physical disks

A disk has physical restrictions that limit the IOPS it can deliver. These limits are seek time and rotational latency.

Seek time is the average time spent moving the arm so that the head is over the track containing the sector to be read.

Rotational latency is the time spent waiting for the platter to rotate until the required position is under the head (usually measured in milliseconds).

The time consumed by a single I/O operation is as follows:

Time per I/O = seek time + rotational latency

 

With this formula, we can easily calculate the maximum IOPS of a given disk.

The number of I/O operations per second is the figure we are most interested in. The formula is as follows:

IOPS = 1 second / (seek time + rotational latency)

 

Let's look at the following example:

HP 300 GB 15k SAS drive ($200)
Rotational speed: 15,000 rpm
Average seek time: 2.9 ms
Average rotational latency: 1.83 ms

 

We can use the formula to calculate iops:

IOPS = 1 s / (2.9 ms + 1.83 ms)
     = 1 s / 4.73 ms
     = 1 / 0.00473
     ≈ 211 IOPS

We can see that this disk delivers 211 IOPS (not terrible).

If we want to save money, let's look at a cheaper disk and how it differs from the one above:

 

HP 300 GB 7,200 rpm SATA drive ($100)
Rotational speed: 7,200 rpm
Average seek time: 10 ms
Average rotational latency: 2.1 ms

 

You can see the maximum iops of the disk through the formula:

IOPS = 1 s / (10 ms + 2.1 ms)
     = 1 s / 12.1 ms
     = 1 / 0.0121
     ≈ 82 IOPS

The maximum IOPS of the 7,200 rpm disk is 82.
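
The two calculations above can be reproduced with a short sketch. The function name max_iops and its parameter names are my own choices for illustration, not part of any vendor tool. The function applies the formula from this article and truncates to whole operations, as the worked examples do.

```python
def max_iops(seek_ms: float, rotational_latency_ms: float) -> int:
    """Theoretical peak IOPS: 1 second / (seek time + rotational latency)."""
    time_per_io_ms = seek_ms + rotational_latency_ms
    # There are 1000 ms in one second; truncate to whole operations.
    return int(1000 / time_per_io_ms)


sas_15k = max_iops(2.9, 1.83)    # 15,000 rpm SAS drive from the example
sata_7200 = max_iops(10.0, 2.1)  # 7,200 rpm SATA drive from the example
print(sas_15k, sata_7200)        # prints: 211 82
```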

The comparison shows that the performance of the two disks differs significantly, which explains why disks of the same capacity differ so much in price.

In addition, the documentation shows that if you drive a disk at its peak IOPS, requests start to queue, which leads to latency (a very evil term in SQL Server; avoid it like the plague).

Most of the documents I have read suggest keeping the load at around 80% of the maximum IOPS. The maximum IOPS of the first disk we discussed is 211, so if we serve more than 168 IOPS from it, we will begin to see latency.
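
The 80% guideline is easy to express as a helper. This is only a sketch: the name sustainable_iops and the default utilization of 0.8 are assumptions taken from the rule of thumb above, not a hard limit.

```python
def sustainable_iops(peak_iops: int, utilization: float = 0.8) -> int:
    """IOPS budget that leaves headroom below the disk's theoretical peak."""
    # Truncate rather than round so the budget never exceeds the guideline.
    return int(peak_iops * utilization)


print(sustainable_iops(211))  # prints: 168
```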

Now we know how many IOPS a single disk can deliver. The next question is: how many IOPS does a SQL Server instance need in a production environment?

I obtained this data simply by looking at the Physical Disk: Disk Transfers/sec counter in Perfmon in a production environment.

The numbers are:

 

Drive               Average IOPS    Maximum IOPS
Data and indexes    2,313           16,164
Logs                81.5            1,127
TempDB              141             2,838

 

Our fast disk can handle only 168 IOPS, so the conclusion is that a single disk cannot meet the above requirements under any circumstances.

The only way to solve this problem is to use some mechanism that combines multiple disks to meet the requirements.

If we have 100 of the 300 GB 15k SAS drives, we get not only 30 TB of storage but also 16,800 IOPS.

If we use the slow disk from the previous example instead, we need 205 such drives (at 82 IOPS each) to reach 16,800 IOPS, which costs more than using the fast disks ($20,500 versus $20,000). Ironic, isn't it?
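
The drive counts and costs in this comparison can be checked with a small sketch. The helper name drives_needed is hypothetical. The prices of $200 and $100 per drive are the ones implied by the $20,000 and $20,500 totals. Note that, as far as I can tell, the figure of 205 slow drives comes from the slow disk's 82 IOPS peak rather than from 80% of it.

```python
import math


def drives_needed(target_iops: int, iops_per_drive: int) -> int:
    """Smallest number of drives whose combined IOPS reaches the target."""
    return math.ceil(target_iops / iops_per_drive)


TARGET_IOPS = 16_800

fast_count = drives_needed(TARGET_IOPS, 168)  # 15k SAS at 80% of its 211 IOPS
slow_count = drives_needed(TARGET_IOPS, 82)   # 7,200 rpm SATA at its 82 IOPS peak

fast_cost = fast_count * 200  # dollars, assuming $200 per fast drive
slow_cost = slow_count * 100  # dollars, assuming $100 per slow drive

print(fast_count, fast_cost)  # prints: 100 20000
print(slow_count, slow_cost)  # prints: 205 20500
```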

 

Necessity of RAID

We now need a group of disks to meet our speed or capacity requirements, so we need a mechanism to spread the workload across multiple disks. The main way to achieve this is RAID.
RAID stands for "Redundant Array of Inexpensive Disks" (nowadays often expanded as "Independent Disks"; is that because the word "inexpensive" would hinder vendors from charging more money?). RAID provides a way to combine a group of disks so that they appear as one logical disk.

Depending on how you combine your disks, RAID can provide fault tolerance: data is not lost when one disk in the array crashes.

In addition, because multiple disks work together, we are no longer bound by the IOPS limit of a single disk. More disks mean more IOPS; it is that simple.

 

RAID levels

RAID serves only two purposes: 1) improving performance by increasing IOPS; 2) fault tolerance. Higher fault tolerance means lower disk performance. Likewise, high-performance solutions reduce fault tolerance.

The way a RAID array balances fault tolerance against performance is called its RAID level. RAID levels are a classification of the commonly used array configurations. The common RAID levels are RAID 0, RAID 1, RAID 5, RAID 1 + 0, and RAID 0 + 1.

Depending on the RAID level you choose, you pay a so-called "RAID penalty". Some RAID levels must write data twice to ensure fault tolerance, which sacrifices performance. The duplicated writes also require more disk space. The RAID penalty can greatly increase the cost of your RAID solution.

To understand the impact and benefits of RAID on your system, it is important to be familiar with the common RAID levels and how they work.

 

RAID 0

The first and most basic RAID level is RAID 0. RAID 0 works around the I/O limit by striping writes across the disk array. If an I/O request writes a given amount of data, RAID 0 splits it and writes a portion to each disk in the array.

This greatly reduces the load on each disk and reduces rotational latency (each disk needs fewer rotations than a single disk would to satisfy the request).
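
Striping can be illustrated with a toy sketch that deals fixed-size chunks to the disks in round-robin order. The function name stripe, the in-memory lists standing in for disks, and the 2-byte stripe size are all assumptions made for illustration; real controllers use much larger stripe sizes.

```python
def stripe(data: bytes, disk_count: int, stripe_size: int) -> list[list[bytes]]:
    """Split data into stripe_size chunks and deal them round-robin to disks."""
    disks: list[list[bytes]] = [[] for _ in range(disk_count)]
    for index, offset in enumerate(range(0, len(data), stripe_size)):
        chunk = data[offset:offset + stripe_size]
        disks[index % disk_count].append(chunk)
    return disks


layout = stripe(b"ABCDEFGHIJKL", disk_count=3, stripe_size=2)
for number, chunks in enumerate(layout):
    print(number, chunks)
# prints:
# 0 [b'AB', b'GH']
# 1 [b'CD', b'IJ']
# 2 [b'EF', b'KL']
```

Each disk ends up holding only a third of the data, which is why the per-disk load drops, and also why losing any one disk makes the whole data set unreadable.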

Although RAID 0 greatly improves I/O performance, it provides no fault tolerance at all: if one disk in the array crashes, all data in the entire array is lost.

Because RAID 0 provides no fault tolerance, it is rarely used in production environments.

It is worth noting that every disk in a RAID 0 array is used to store data, so no disk space is lost. For example, ten 300 GB disks in RAID 0 provide 3 TB of usable storage, which means RAID 0 carries no capacity penalty.

 

RAID 1

RAID 1 is also called "mirroring" because it uses a mirror disk to ensure fault tolerance. Each disk in the mirror set has a mirror disk, and every write to RAID 1 is written once to each disk. This means that if one disk has a problem, the other takes over. From the user's perspective, the disk crash goes unnoticed.

RAID 1 pays a write performance penalty: each write I/O must be performed twice. Read performance improves, however, because the RAID controller can serve read requests from both disks.
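
A toy model makes the mirroring behaviour concrete. The class MirroredPair below is hypothetical and keeps its "disks" in memory; it only demonstrates that one logical write becomes two physical writes and that reads survive the loss of one disk.

```python
class MirroredPair:
    """Toy RAID 1: two in-memory 'disks' holding identical blocks."""

    def __init__(self):
        self.disks = [{}, {}]         # block number -> data, one dict per disk
        self.failed = [False, False]  # set an entry to True to simulate a crash
        self.physical_writes = 0

    def write(self, block, data):
        # Every logical write goes to each healthy disk in the pair.
        for disk_id in range(2):
            if not self.failed[disk_id]:
                self.disks[disk_id][block] = data
                self.physical_writes += 1

    def read(self, block):
        # Any healthy disk can serve the read.
        for disk_id in range(2):
            if not self.failed[disk_id]:
                return self.disks[disk_id][block]
        raise IOError("both mirrors have failed")


pair = MirroredPair()
pair.write(0, b"page")
pair.failed[0] = True                      # simulate a crash of the first disk
print(pair.read(0), pair.physical_writes)  # prints: b'page' 2
```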

 

RAID 5

RAID 5 is also known as "striping with parity". It improves performance through disk striping (as in RAID 0) and provides fault tolerance through parity: when a disk crashes, the lost data can be reconstructed by computation from the parity data.

Parity is a good way to implement fault tolerance, but it makes writes expensive: RAID 5 requires four I/O operations for every write request.

The reason for the high write cost is as follows:

    • Read the original data (1 I/O)
    • Read the current parity data (1 I/O)
    • Compare the current data with the new write request
    • Calculate the new parity value based on the data difference
    • Write the new data (1 I/O)
    • Write the new parity value (1 I/O)
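
The steps above can be sketched with byte-wise XOR, which is how parity is conventionally computed in RAID 5 (the article itself does not name the operation, so treat XOR here as my assumption). The helper xor_blocks and the sample block values are hypothetical. The sketch shows both the small-write parity update and the reconstruction of a lost block.

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three data blocks on three disks, plus one parity block on a fourth disk.
d0, d1, d2 = b"\x0f\x10", b"\xf0\x01", b"\x33\x44"
parity = xor_blocks(d0, d1, d2)

# Small write to d1: read old data and old parity (2 I/Os), compute the
# new parity from the difference, then write new data and parity (2 I/Os).
new_d1 = b"\xaa\x55"
new_parity = xor_blocks(parity, d1, new_d1)
assert new_parity == xor_blocks(d0, new_d1, d2)  # same as a full recompute

# If the disk holding d2 crashes, its block is rebuilt from the survivors.
rebuilt_d2 = xor_blocks(d0, new_d1, new_parity)
assert rebuilt_d2 == d2
```

Updating parity from the old data and old parity avoids reading every other disk in the stripe, which is why the cost is four I/Os regardless of how many disks the array has.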

RAID 1 + 0

As its name suggests, RAID 1 + 0 combines RAID 0 (striping) and RAID 1 (mirroring). This method is also called a stripe of mirrors.

Because RAID 1 + 0 stripes data across multiple disks and has no parity to maintain as RAID 5 does, writes are very fast.

Writes are still affected somewhat, because every write must be repeated on the mirror disk, but they remain very fast.

The storage cost of RAID 1 + 0 is the same as that of RAID 1 (mirroring): only half of the disk space can be used to store data.

 

RAID 0 + 1

RAID 0 + 1 is similar to RAID 1 + 0. Both are built from disk striping and mirroring. The differences between them are mostly academic, so here we treat them as the same.

The cost of RAID 0 + 1 is the same as that of RAID 1 + 0.
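
The capacity rules stated so far (RAID 0 loses nothing; RAID 1, RAID 1 + 0 and RAID 0 + 1 keep half) can be captured in a small helper. The function usable_capacity_gb and its level labels are hypothetical, and the sketch deliberately covers only the levels whose capacity cost this article spells out.

```python
def usable_capacity_gb(level: str, disk_count: int, disk_size_gb: int) -> int:
    """Usable space for the RAID levels whose capacity cost is described above."""
    raw = disk_count * disk_size_gb
    if level == "0":
        return raw       # every disk stores data
    if level in ("1", "1+0", "0+1"):
        return raw // 2  # half of the disks hold mirror copies
    raise ValueError(f"level {level!r} is not covered by this sketch")


print(usable_capacity_gb("0", 10, 300))    # prints: 3000
print(usable_capacity_gb("1+0", 10, 300))  # prints: 1500
```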

 

Other RAID levels (2, 3, 4, 6, DP, etc.)

There are also some other, less common and non-standard RAID levels. RAID 2, 3, 4, 6 and RAID DP are similar to RAID 5. They all provide performance and fault tolerance through striping and some form of parity. The only difference between these RAID 5-like levels is how they write parity data. Some of them reserve a dedicated disk for the parity data, while others distribute it across multiple disks. You can research them further if necessary, but for me, I just call them all "RAID 5".

Another non-standard RAID level is RAID DP, short for "dual parity". It is similar to RAID 5 but writes the parity data twice, which makes writes costly: the write penalty rises to 6 (each write request requires 6 I/O operations).

 

RAID comparison

Selecting an appropriate RAID level is not easy. You need to consider the following factors: cost, performance, and capacity.

The following table summarizes the advantages and disadvantages of each standard RAID level.

RAID level   Fault tolerance   Read performance   Write performance   RAID write penalty   Cost
0            None              Good               Excellent           1                    Excellent
1            Good              Good               Good                2                    Fair
5            Fair              Good               Poor                4                    Good
1 + 0        Excellent         Excellent          Excellent           2                    Poor
DP           Good              Good               Terrible            6                    Good
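
The write penalty column can be turned into an estimate of the disk-level I/O load a workload generates. This is only a sketch: the function backend_iops is hypothetical, the 700 reads and 300 writes per second are made-up numbers, and it assumes each read costs exactly one disk I/O and that no controller cache absorbs any writes.

```python
# RAID write penalties taken from the comparison table above.
WRITE_PENALTY = {"0": 1, "1": 2, "5": 4, "1+0": 2, "DP": 6}


def backend_iops(read_iops: int, write_iops: int, level: str) -> int:
    """Disk-level IOPS generated by a workload at the given RAID level."""
    return read_iops + write_iops * WRITE_PENALTY[level]


# Hypothetical workload: 700 reads/sec and 300 writes/sec.
for level in WRITE_PENALTY:
    print(level, backend_iops(700, 300, level))
# prints:
# 0 1000
# 1 1300
# 5 1900
# 1+0 1300
# DP 2500
```

The same 1,000 application-level IOPS nearly doubles at the disks under RAID 5, which is why write-heavy files such as logs are usually kept off parity-based levels.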

 

SQL storage recommendations

 

SQL Server file                              RAID level
Operating system and SQL Server binaries     RAID 1
Data and indexes                             RAID 1 + 0 (RAID 5 can be used if the budget does not allow it)
Logs                                         RAID 1 + 0
TempDB                                       RAID 1 + 0
Backups                                      RAID 5

 

Other considerations

When you plan your I/O subsystem, SQL Server file layout, and RAID levels, there are other factors to consider.

 

RAID Controller

RAID can be implemented in two ways: in software or in hardware.

In a software RAID configuration, the operating system manages the RAID level and the distribution of I/O load across the disks.

In a hardware RAID configuration, a physical hardware device acts as the RAID controller.

Generally, a hardware RAID solution is more robust, flexible, and powerful. The configuration options you get depend on your budget for the RAID controller.

For example, some RAID controllers support only one RAID level (such as RAID 5), while some more expensive controllers provide a cache. The cache can be used for read operations, write operations, or both. Better RAID controllers even let you choose how the cache is divided between reads and writes.

Caching is very important for SQL Server, especially for writes. No RAID level carries a read performance penalty; all RAID levels speed up reads. Writes are where RAID costs you. You can cache all write operations in the RAID cache, which greatly improves write performance. In general, caching RAID controllers come with a battery, so the cached data is not lost even if the power fails.

Remember that SQL Server itself is very good at caching reads, so it makes little sense to spend an expensive RAID controller's cache on reads.

 

Virtualization

Another factor worth considering is virtualization. Whether you like it or not, we have entered the world of virtualization. Production SQL Server instances running on VMware are becoming more and more common.

Virtualization also affects RAID and I/O. Depending on the virtualization product you use, choosing a RAID level requires you to consider additional factors, such as how the virtual machines interact with the storage system.

Summary

Obviously, there is more to this topic than we have covered here. RAID is important to both the performance and the fault tolerance of SQL Server.

I hope this article helps you understand how RAID affects the performance of your SQL Server. As a DBA or database architect, you must understand the performance and fault tolerance characteristics of your current RAID configuration.

 

References

MSDN:
http://msdn.microsoft.com/en-us/library/ms190764.aspx

TechNet:
http://technet.microsoft.com/en-us/library/cc966534.aspx

Ed Whalen, perftuning.com:
http://www.perftuning.com/files/pdf/RAID1.pdf

 

Link: http://www.sqlservercentral.com/articles/RAID/88945/

Translated by careyson

Note: Some readers have pointed out in the forum that the text may contain some errors. Please follow the link above to view the discussion about this article. Nevertheless, the article provides a systematic description of RAID.