RAID comparison: performance, speed, and read/write characteristics

Source: Internet
Author: User

In terms of overall performance (data safety and speed together), RAID 5 is the better choice. For reading data, RAID 1 is the fastest. For data safety, RAID 1 is the best. For writing data, RAID 0 is the fastest. RAID 5 combines both advantages, speed and safety.

 

Understanding RAID disk arrays

I. Functions
1. High-speed disk access (acceleration): RAID combines ordinary hard disks into a disk array. When the host writes data, the RAID controller splits the data into multiple blocks and writes them to the disks in the array in parallel. When the host reads data, the RAID controller reads the blocks spread across the disks concurrently and reassembles them for the host. Parallel reads and writes improve the access speed of the storage system.

2. Capacity expansion

3. Data redundancy
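The split-and-reassemble idea in item 1 can be sketched in a few lines. This is a minimal Python illustration, not a controller implementation. The function names `stripe` and `reassemble` and the fixed `chunk_size` are assumptions. The per-disk lists stand in for physical disks, which a real controller would access in parallel.

```python
# Hypothetical helpers illustrating striping; not a real controller API.
def stripe(data: bytes, num_disks: int, chunk_size: int) -> list[list[bytes]]:
    """Split data into fixed-size chunks and deal them round-robin across disks."""
    disks: list[list[bytes]] = [[] for _ in range(num_disks)]
    for index, start in enumerate(range(0, len(data), chunk_size)):
        # Chunk i goes to disk i % num_disks, at position i // num_disks on that disk.
        disks[index % num_disks].append(data[start:start + chunk_size])
    return disks


def reassemble(disks: list[list[bytes]]) -> bytes:
    """Read the chunks back row by row (one chunk per disk) and concatenate them."""
    pieces = []
    for row in range(max(len(disk) for disk in disks)):
        for disk in disks:
            if row < len(disk):  # the last row may be only partially filled
                pieces.append(disk[row])
    return b"".join(pieces)


disks = stripe(b"ABCDEFGHIJ", num_disks=3, chunk_size=2)
print(disks)              # [[b'AB', b'GH'], [b'CD', b'IJ'], [b'EF']]
print(reassemble(disks))  # b'ABCDEFGHIJ'
```

Each inner list plays the role of one disk. Because consecutive chunks land on different disks, a controller can service them at the same time. That parallelism is where the speed-up comes from.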

II. Classification

RAID is divided into levels 0 to 6, generally called RAID 0, RAID 1, RAID 2, RAID 3, RAID 4, RAID 5, and RAID 6.

RAID 0: RAID 0 is not a true RAID structure because it has no data redundancy. It continuously splits data and reads/writes it on multiple disks in parallel, which gives it a high data transfer rate. However, it improves performance without providing any data reliability. If one disk fails, all the data in the array is affected. RAID 0 therefore cannot be used in critical applications that require high data availability.

RAID 1: RAID 1 achieves data redundancy through mirroring, keeping identical copies of the data on pairs of independent disks. RAID 1 can improve read performance: when the primary disk is busy, data can be read directly from the mirror. RAID 1 has the highest cost per unit of usable capacity of all the RAID levels, but it provides the highest data availability. When a disk fails, the system can automatically switch to the mirror disk without reconstructing the lost data.

RAID 2: Conceptually, RAID 2 is similar to RAID 3. Both distribute data across different hard disks in units of bits or bytes. However, RAID 2 uses an error-correcting code (a Hamming code) for error detection and recovery. This coding requires several disks to store the check and recovery information, which makes RAID 2 more complex to implement. As a result, it is rarely used in commercial environments.

RAID 3: Unlike RAID 2, RAID 3 uses a single disk to store parity information. If a data disk fails, its data can be regenerated from the parity disk and the remaining data disks. If the parity disk fails, data access is not affected. RAID 3 provides a good transfer rate for large amounts of sequential data. For random writes, however, the parity disk becomes a bottleneck.

RAID 4: Like RAID 2 and RAID 3, RAID 4 and RAID 5 split data and distribute it across different disks, but the striping unit is a block or record. RAID 4 uses one disk as the parity disk. Every write must access the parity disk, which becomes the bottleneck for writes. For this reason RAID 4 is rarely used in commercial applications.

RAID 5: RAID 5 does not dedicate a separate parity disk. Instead, it stores data and parity information across all disks. Reads and writes can run on several disks of the array at the same time, which provides higher data throughput. RAID 5 is better suited to small data blocks and random reads/writes. An important difference between RAID 3 and RAID 5 is that every data transfer in RAID 3 involves all disks in the array. In RAID 5, most transfers touch only one disk and can proceed in parallel. RAID 5 has a "write penalty": each small write generates four actual disk operations, namely two reads (the old data and the old parity) and two writes (the new data and the new parity).
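The four-operation write penalty follows from how parity is updated on a small write. Here is a minimal sketch. It assumes one stripe of a three-disk array held in a Python list and the usual read-modify-write update `new_parity = old_parity XOR old_data XOR new_data`. The function names `xor_bytes` and `small_write` are hypothetical.

```python
# Hypothetical helpers; one stripe is modeled as a list of per-disk blocks.
def xor_bytes(*blocks: bytes) -> bytes:
    """Byte-wise XOR of equally sized blocks."""
    result = bytearray(len(blocks[0]))
    for block in blocks:
        for i, value in enumerate(block):
            result[i] ^= value
    return bytes(result)


def small_write(stripe: list, data_disk: int, parity_disk: int, new_data: bytes) -> int:
    """Update one data block via read-modify-write; return the number of disk I/Os."""
    ios = 0
    old_data = stripe[data_disk]      # I/O 1: read the old data block
    ios += 1
    old_parity = stripe[parity_disk]  # I/O 2: read the old parity block
    ios += 1
    # XOR-ing out the old data and XOR-ing in the new data updates the parity.
    new_parity = xor_bytes(old_parity, old_data, new_data)
    stripe[data_disk] = new_data      # I/O 3: write the new data block
    ios += 1
    stripe[parity_disk] = new_parity  # I/O 4: write the new parity block
    ios += 1
    return ios


# Disks 0 and 1 hold data; disk 2 holds their parity for this stripe.
stripe = [b"\x0f\xf0", b"\x33\x33", xor_bytes(b"\x0f\xf0", b"\x33\x33")]
print(small_write(stripe, data_disk=0, parity_disk=2, new_data=b"\xaa\x55"))  # 4
print(stripe[2] == xor_bytes(stripe[0], stripe[1]))  # True
```

Disk 1 is never touched in this example. Small writes to stripes whose data and parity live on different disks can therefore run in parallel.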

RAID 6: Compared with RAID 5, RAID 6 adds a second, independent block of parity information. The two independent parity schemes use different algorithms, which gives high data reliability: even if two disks fail at the same time, data access is not affected. However, more disk space must be allocated to parity, and the "write penalty" is larger than in RAID 5, so RAID 6 write performance is poor. Poor write performance and complex implementation mean that RAID 6 is seldom used.
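The article does not say which two algorithms are used. One widely used choice, for example in the Linux md driver, is "P+Q" parity. P is the plain XOR of the data blocks. Q is a Reed-Solomon syndrome computed in the finite field GF(2^8). The sketch below assumes that scheme (field polynomial 0x11D, generator 2) and shows how any two lost data blocks can be solved for. The function names are hypothetical, and the byte-by-byte loops favor clarity over speed.

```python
from typing import List, Optional, Tuple

# Hypothetical helpers sketching P+Q (RAID 6 style) parity; not a driver API.


def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x^2 + 1 (0x11D)."""
    product = 0
    while b:
        if b & 1:
            product ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11D
        b >>= 1
    return product


def gf_pow2(n: int) -> int:
    """Return g**n for the generator g = 2."""
    result = 1
    for _ in range(n):
        result = gf_mul(result, 2)
    return result


def gf_inv(a: int) -> int:
    """Inverse of a nonzero element: a**254, since a**255 == 1 in GF(2^8)."""
    result = 1
    for _ in range(254):
        result = gf_mul(result, a)
    return result


def compute_pq(data: List[bytes]) -> Tuple[bytes, bytes]:
    """P = XOR of all blocks; Q = XOR of g**i * D_i over all data disks i."""
    p = bytearray(len(data[0]))
    q = bytearray(len(data[0]))
    for i, block in enumerate(data):
        coeff = gf_pow2(i)
        for k, byte in enumerate(block):
            p[k] ^= byte
            q[k] ^= gf_mul(coeff, byte)
    return bytes(p), bytes(q)


def recover_two(data: List[Optional[bytes]], p: bytes, q: bytes) -> List[bytes]:
    """Rebuild exactly two missing data blocks (marked None) from P and Q."""
    x, y = [i for i, block in enumerate(data) if block is None]
    # Recompute P and Q with the missing blocks treated as zeros.
    survivors = [block if block is not None else bytes(len(p)) for block in data]
    p_part, q_part = compute_pq(survivors)
    gx, gy = gf_pow2(x), gf_pow2(y)
    inv = gf_inv(gx ^ gy)  # nonzero because x != y and g has order 255
    dx, dy = bytearray(len(p)), bytearray(len(p))
    for k in range(len(p)):
        s = p[k] ^ p_part[k]  # equals Dx ^ Dy
        t = q[k] ^ q_part[k]  # equals gx*Dx ^ gy*Dy
        # Substituting Dy = s ^ Dx gives (gx ^ gy) * Dx = t ^ gy*s.
        dx[k] = gf_mul(t ^ gf_mul(gy, s), inv)
        dy[k] = s ^ dx[k]
    survivors[x], survivors[y] = bytes(dx), bytes(dy)
    return survivors


blocks = [b"RAID", b"six!", b"P+Q ", b"demo"]
p, q = compute_pq(blocks)
print(recover_two([blocks[0], None, blocks[2], None], p, q) == blocks)  # True
```

P alone could only repair one missing block, the same way RAID 5 does. Q adds a second, independent equation, so two unknowns can be solved for.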

III. Details

RAID 0 is intended for speed and capacity expansion

In RAID 0 mode, data is divided into a number of chunks and written across multiple hard disks. Generally, the number of data partitions in a RAID 0 system depends on the number of hard disks in the array. For example, with three hard disks, data is written to the three disks in three parts. In other words, RAID 0 makes the system treat the three hard disks as one larger hard disk. This RAID mode offers the fastest reads and writes because no data verification takes place.

RAID 0 does not take data safety into consideration. If one hard disk in a RAID 0 array breaks down, all the data is damaged and cannot be recovered. This makes the safety of RAID 0 very poor, and many users avoid it for that reason. Even so, RAID 0 is the fastest of all RAID modes. With two hard disks, reading from RAID 0 is in theory twice as fast as reading from a single disk. With six hard disks, it is in theory six times as fast.

Using mismatched hard disks in RAID 0 causes two problems. First, the usable capacity is the capacity of the smallest disk multiplied by the number of disks. RAID 0 distributes data evenly across the disks, so it cannot continue striping once the smallest disk is full. Second, if the disks run at different speeds, the overall speed is that of the slowest disk multiplied by the number of disks. Each striped operation must complete on every disk before the next one can proceed, so the faster disks sit idle while the slow one finishes its reads or writes, and overall performance drops. Users of RAID 0 are therefore advised to choose disks of the same capacity and speed, ideally the same model from the same brand.
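The two rules of thumb for mismatched disks can be written down directly. In both, the smallest capacity or the lowest speed is multiplied by the number of disks. This is a minimal sketch. The function name and the example capacities and throughputs are hypothetical, and real throughput also depends on the controller, the stripe size, and the workload.

```python
# Hypothetical estimator encoding the article's RAID 0 rules of thumb.
def raid0_estimate(disks: list[tuple[float, float]]) -> tuple[float, float]:
    """Estimate RAID 0 usable capacity (GB) and theoretical throughput (MB/s).

    Each disk is given as (capacity_gb, throughput_mb_s). Striping fills every
    disk evenly and waits for the slowest disk, so the smallest capacity and the
    lowest throughput are each multiplied by the number of disks.
    """
    if len(disks) < 2:
        raise ValueError("RAID 0 needs at least two disks")
    count = len(disks)
    capacity = min(capacity for capacity, _ in disks) * count
    throughput = min(speed for _, speed in disks) * count
    return capacity, throughput


print(raid0_estimate([(80, 100), (120, 60)]))  # (160, 120)
```

In this example, the extra 40 GB on the larger disk and the extra 40 MB/s of the faster disk go unused.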

Strictly speaking, RAID 0 is therefore not a "redundant array of independent disks". It is generally used where data must be processed quickly but the safety requirements for that data are low. RAID 0 is simple and does not require complex or expensive controllers. It requires at least two hard disks, and its storage capacity is the sum of the disk capacities (assuming the disks are the same size).

Random read performance of RAID 0: Good
Random write performance of RAID 0: Good
Sequential read performance of RAID 0: Good
Sequential write performance of RAID 0: Good

RAID 0 advantages: the fastest read/write performance. Performance is even better if each hard disk has its own controller.

Disadvantages of RAID 0: if any hard disk fails, all data is lost. Most controllers are implemented in software, so efficiency is poor.

RAID 1

In RAID 1 mode, the hard disks mirror each other. When data is written, both hard disks store the same data at the same time, so if one of them fails the system can keep running on the other. Compared with a single hard disk, RAID 1 offers better read performance, because when one disk is busy the RAID controller can read the same data from the other. Write performance does not improve, however, and may degrade slightly. When one disk fails, new data can still be written to the disk that is working. After the failed disk is replaced with a new one, the RAID controller automatically copies the data to the new disk. The main feature of RAID 1 is its high redundancy. Because most implementations are in software, however, it increases the load on the processor. This RAID mode suits users with extremely high requirements for data safety.

In RAID 1 mode it is best to use identical hard disks, otherwise disk space is wasted. Because RAID 1 writes the same information to each disk, its usable capacity is the capacity of the smallest disk in the array. For example, with a 20 GB disk and a 30 GB disk in RAID 1, the total usable capacity is 20 GB, and the remaining 10 GB on the 30 GB drive is wasted. Likewise, if the two disks run at different speeds, the faster disk has to wait for the slower one to finish each task before moving on to the next.
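The mirroring behavior described above can be modeled with a small class. This is an illustrative sketch, not a driver. `MirrorSet` is a hypothetical name, each disk is a dictionary mapping block numbers to data, and a "failure" simply marks a disk as unusable.

```python
# Hypothetical toy model of RAID 1 mirroring; not a real driver interface.
class MirrorSet:
    """Every write goes to all healthy disks; reads use any healthy disk."""

    def __init__(self, capacities_gb: list[int]):
        self.capacities_gb = capacities_gb
        self.disks = [{} for _ in capacities_gb]
        self.healthy = [True] * len(capacities_gb)

    @property
    def usable_gb(self) -> int:
        # Every disk holds a full copy, so the smallest disk limits capacity.
        return min(self.capacities_gb)

    def write(self, block: int, data: bytes) -> None:
        for disk, ok in zip(self.disks, self.healthy):
            if ok:
                disk[block] = data

    def read(self, block: int) -> bytes:
        for disk, ok in zip(self.disks, self.healthy):
            if ok:
                return disk[block]
        raise IOError("all mirror members have failed")

    def fail(self, index: int) -> None:
        self.healthy[index] = False

    def replace(self, index: int) -> None:
        """Swap in a blank disk and copy the data from a surviving member."""
        source = next(d for d, ok in zip(self.disks, self.healthy) if ok)
        self.disks[index] = dict(source)
        self.healthy[index] = True


mirror = MirrorSet([20, 30])
mirror.write(0, b"payroll")
mirror.fail(0)
print(mirror.usable_gb, mirror.read(0))  # 20 b'payroll'
```

The `replace` method mirrors the article's description: after the failed disk is swapped, the surviving copy is duplicated onto the new disk.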

Random read performance of RAID 1: Good
Random write performance of RAID 1: Good
Sequential read performance of RAID 1: Average
Sequential write performance of RAID 1: Good

Advantages of RAID 1: high data reliability, easy implementation, and simple design.

Disadvantages of RAID 1: it is slower than RAID 0, especially for writes. In addition, only half of the total disk capacity is usable.

RAID 0+1

This RAID mode combines RAID 0 and RAID 1 and requires at least four hard disks. The disks are split into two pairs, and each pair forms a RAID 0 array. The two RAID 0 arrays can then be treated as two larger, faster hard disks, which together form a RAID 1 array. This design gives both high disk performance and high data safety. Its drawbacks are high cost and a complicated structure. RAID 0+1 is second only to RAID 5 in fault tolerance and is generally used in file servers.
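One way to see what the mirrored-stripe layout protects against is to enumerate disk failures. The sketch below assumes the minimal four-disk RAID 0+1 described above, with stripe set A = disks 0 and 1 and stripe set B = disks 2 and 3. The array keeps working as long as at least one stripe set has no failed disk. The disk numbering and function name are illustrative.

```python
from itertools import combinations

# Illustrative layout: two RAID 0 stripe sets mirrored against each other.
STRIPE_SETS = [{0, 1}, {2, 3}]


def survives(failed: set[int]) -> bool:
    """RAID 0+1 keeps working if some stripe set has no failed member."""
    return any(not (stripe_set & failed) for stripe_set in STRIPE_SETS)


single = [survives({d}) for d in range(4)]
double = [pair for pair in combinations(range(4), 2) if survives(set(pair))]
print(all(single))  # True: any one disk can fail
print(double)       # [(0, 1), (2, 3)]: only 2 of the 6 two-disk failures survive
```

Losing one disk takes its whole stripe set offline. A second failure in the other set is then fatal.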

Random read performance of RAID 0+1: Good
Random write performance of RAID 0+1: Good
Sequential read performance of RAID 0+1: Good
Sequential write performance of RAID 0+1: Good

RAID 0+1 advantages: higher read/write performance than a single hard disk, and much better data safety.

Disadvantages of RAID 0+1: high cost. At least four hard disks are required.

RAID 2

RAID 2 is also quite complex. The hard disks that store data are combined in RAID 0 mode, and additional hard disks are added to store the Hamming ECC check code. To protect the check-code data, the check-code disks are themselves organized in RAID 1 mode using at least two disks. If one of the data disks is damaged, the RAID controller can use the Hamming code to restore its data onto a new hard disk. RAID 2 is generally suited to large data volumes and supercomputer applications, but not to ordinary users. Because the check code must be generated whenever data is stored, the performance of this disk array is not high. For various reasons, this mode has not been adopted in practical commercial products, and its high price keeps it out of reach of ordinary users.
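The Hamming code mentioned above can be illustrated with the classic Hamming(7,4) code. Four data bits gain three check bits, and any single flipped bit can be located and corrected. In a RAID 2 layout, each of the seven bits would sit on a different disk. The sketch below shows only the coding itself. The function names and the bit ordering (check bits at positions 1, 2, and 4) follow the textbook convention and are not taken from any particular controller.

```python
# Textbook Hamming(7,4); hypothetical helper names, not a controller API.
def hamming74_encode(data: list[int]) -> list[int]:
    """Encode 4 data bits as 7 bits: positions 1..7 = p1 p2 d1 p3 d2 d3 d4."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(code: list[int]) -> tuple[list[int], int]:
    """Correct up to one flipped bit; return (data bits, 1-based error position or 0)."""
    c = list(code)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    position = s1 + 2 * s2 + 4 * s3  # the syndrome spells out the bad position
    if position:
        c[position - 1] ^= 1
    return [c[2], c[4], c[5], c[6]], position


word = hamming74_encode([1, 0, 1, 1])
word[4] ^= 1  # simulate one bad disk flipping bit 5
print(hamming74_decode(word))  # ([1, 0, 1, 1], 5)
```

Three check bits for every four data bits is why RAID 2 needs several dedicated check disks, unlike the single parity disk of RAID 3.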

Random read performance of RAID 2: Average
Random write performance of RAID 2: Poor, mainly because every operation must go through ECC computation
Sequential read performance of RAID 2: Good
Sequential write performance of RAID 2: Average

RAID 2 advantages: high data safety. As long as the disks storing the check code do not fail, data can be restored.

Disadvantages of RAID 2: expensive, requires dedicated hard disks for the check code, has low efficiency, and lacks commercial application support.

RAID 3

Like RAID 2, RAID 3 divides data into blocks and stores them on multiple hard disks in sequence, but RAID 3 splits the data across the disks in bit units. Its advantage is high-speed reading and writing. Write performance suffers, however, because parity must be generated during each write, and a dedicated hard disk is needed to store it. When one of the data disks fails, the system can keep running, but performance is affected. If another disk fails before the failed one is replaced, the data in the array is lost and cannot be recovered. This mode requires the rotation of all hard disks to be synchronized, which is difficult in practice. RAID 3 requires at least three hard disks, one of which stores the parity. The parity is computed with an exclusive-OR (XOR) operation.
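The XOR parity used by RAID 3 (and also by RAID 4 and RAID 5) can be shown in a few lines. XOR-ing bytes is the same as XOR-ing each bit, so a byte-wise sketch also covers RAID 3's bit-level striping. The function names are hypothetical. The example assumes two data disks and one parity disk, the three-disk minimum mentioned above.

```python
from functools import reduce


# Hypothetical helpers illustrating XOR parity and single-disk rebuild.
def xor_parity(blocks: list[bytes]) -> bytes:
    """XOR equally sized blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild(surviving: list[bytes], parity: bytes) -> bytes:
    """A lost block is the XOR of the parity with every surviving data block."""
    return xor_parity(surviving + [parity])


disk1, disk2 = b"VIDEO", b"FRAME"
parity = xor_parity([disk1, disk2])   # stored on the dedicated parity disk
print(rebuild([disk1], parity))       # b'FRAME': disk 2 recovered after failure
```

This only works while at most one disk is missing. With two unknowns and one XOR equation, the data cannot be recovered, which matches the two-failure data loss described above.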

When implemented with a software controller, this RAID mode significantly hurts performance because the scheme is complex. Compared with RAID 0+1, however, it needs only three hard disks, so its cost is lower. In general, this disk array suits video processing and editing applications.

Random read performance of RAID 3: Good
Random write performance of RAID 3: Poor
Sequential read performance of RAID 3: Good
Sequential write performance of RAID 3: Average

RAID 3 advantages: suitable for video editing and other scenarios involving large volumes of data.

Disadvantages of RAID 3: synchronizing the rotation speed of every drive is very difficult (most current hard disks do not support it), and complicated controllers are required.

RAID 4

RAID 4 is almost the same as RAID 3. Data is divided into small blocks stored on multiple hard disks in sequence, and the parity is stored on a separate parity disk. The only difference is the striping unit: RAID 3 works in bits, whereas RAID 4 works in larger blocks. As a result, RAID 4 reads as fast as RAID 3, but write performance suffers because parity must be generated during each write and stored on the parity disk.

The biggest advantage of this mode is that the disks do not need synchronized rotation, which simplifies the controller. Its write performance, however, is the worst of all the RAID modes. As with RAID 3, no data is lost when one hard disk is damaged. If a second disk fails before the faulty one is replaced, however, all data is lost. Compared with other RAID modes, rebuilding the data of a failed disk is relatively inefficient.

This disk array mode requires at least three hard disks, and the parity is computed with an exclusive-OR operation. It suits general applications, including video processing. The cost is not high, because only one hard disk is needed for parity.

Random read performance of RAID 4: Good
Random write performance of RAID 4: Average, mainly because every write must also update the parity disk
Sequential read performance of RAID 4: Good
Sequential write performance of RAID 4: Average

RAID 4 advantages: unlike RAID 3, it does not require the drive speeds to be synchronized.

Disadvantages of RAID 4: low write performance and demanding controller requirements.

RAID 5

RAID 5 uses at least three hard disks. It provides the speed-up of RAID 0 together with the data protection of RAID 1. With three hard disks in the array, it splits the data into fragments and stores them on two of the disks. The third disk does not receive file fragments. Instead, it receives verification (parity) data computed from the data on the other two, which can be used to restore either of them. The roles of the three disks are not fixed. In one write, disks 1 and 2 may hold the data fragments, and in the next write, disks 2 and 3. The parity role therefore moves from disk to disk between storage operations, but in every operation two disks hold data fragments and one holds verification information.
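The rotation of the parity role can be made concrete by mapping each stripe to the disk that holds its parity. This sketch assumes one common rotation in which parity moves one disk to the left per stripe. Actual controllers choose among several layouts. RAID 4's fixed parity disk is included for comparison, and the function names are hypothetical.

```python
from collections import Counter


# Hypothetical helpers; RAID 5 uses one assumed rotation among several in practice.
def parity_disk(stripe: int, num_disks: int, level: int) -> int:
    """Return the index of the disk holding parity for the given stripe."""
    if level == 4:
        return num_disks - 1  # RAID 4: one dedicated parity disk
    if level == 5:
        return (num_disks - 1 - stripe) % num_disks  # RAID 5: parity rotates
    raise ValueError("only RAID 4 and RAID 5 are modeled")


def layout(stripe: int, num_disks: int, level: int) -> str:
    """Show a stripe as a row of roles, e.g. 'D D P'."""
    p = parity_disk(stripe, num_disks, level)
    return " ".join("P" if disk == p else "D" for disk in range(num_disks))


for s in range(3):
    print(f"stripe {s}: RAID 4 [{layout(s, 3, 4)}]  RAID 5 [{layout(s, 3, 5)}]")

# Parity updates per disk after one small write to each of 6 stripes.
for level in (4, 5):
    load = Counter(parity_disk(s, 3, level) for s in range(6))
    print(level, [load[d] for d in range(3)])  # 4 [0, 0, 6] then 5 [2, 2, 2]
```

With RAID 4, every small write lands on disk 2, which is the parity-disk bottleneck described earlier. Rotating the parity in RAID 5 spreads that load evenly across all disks.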

This verification information is generally computed by the RAID controller, usually by a dedicated chip on the controller that also decides which disk receives it. RAID 5 thus combines the high-speed storage and reading of RAID 0 with the data recovery of RAID 1. In the example above, RAID 5 uses three hard disks to reach about twice the speed of a single disk, as RAID 0 would, while also protecting the data as RAID 1 does. When one hard disk in RAID 5 is damaged, the data can be restored onto a newly added disk.

RAID 5 has the most complex controller design of the RAID modes introduced so far. It can be applied in most fields, such as multi-user and multitasking environments. Many web servers and other Internet servers currently use this type of disk array. For example, the recently launched Quantum Snap Server uses an external RAID 5 disk array. In a three-disk array, parity occupies about 33% of the disk space, so a RAID 5 array with a total raw capacity of 120 GB provides about 80 GB of usable space. However, ordinary onboard RAID controllers do not support this mode. The Abit KR7A-RAID motherboard, for example, supports only RAID 0, RAID 1, and RAID 0+1. Because parity must be computed, write performance is affected to some extent, so many disk array vendors add a write cache to improve write performance.

RAID 5 is not without drawbacks. When the information on one disk in the array changes, the parity must be recalculated, and in the three-disk case all three disks may be involved again. As with the other modes, it is best to use hard disks of the same capacity and speed in a RAID 5 array. The usable capacity of RAID 5 is the capacity of the smallest disk multiplied by the number of disks minus one. One disk is subtracted because one disk's worth of space stores verification information.
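The capacity rule above can be checked with a short helper: usable capacity equals the smallest disk multiplied by the number of disks minus one. The function name is hypothetical, and the disk sizes are example values.

```python
# Hypothetical helper encoding the RAID 5 capacity rule stated above.
def raid5_capacity(disk_sizes_gb: list[float]) -> float:
    """Usable RAID 5 capacity: smallest disk times (number of disks - 1)."""
    if len(disk_sizes_gb) < 3:
        raise ValueError("RAID 5 needs at least three disks")
    # One disk's worth of space, spread across all members, holds parity.
    return min(disk_sizes_gb) * (len(disk_sizes_gb) - 1)


print(raid5_capacity([40, 40, 40]))  # 80
print(raid5_capacity([40, 60, 80]))  # 80: extra space on the larger disks is unused
print(raid5_capacity([40] * 5))      # 160
```

For three 40 GB disks (120 GB raw), one third of the space (about 33%) goes to parity, leaving 80 GB. With five disks, the parity share drops to one fifth.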

Random read performance of RAID 5: Very good (with large data blocks)
Random write performance of RAID 5: Average, but better than RAID 3 or RAID 4
Sequential read performance of RAID 5: Good (with small data blocks)
Sequential write performance of RAID 5: Average

RAID 5 advantages: no dedicated parity disk is required and reads are fast, although writes are relatively slow.

Disadvantages of RAID 5: write performance is still unsatisfactory.

RAID 6

RAID 6 is a newer technology in the RAID family and extends RAID 5. Like RAID 5, it divides data and verification codes into blocks and stores them separately on each hard disk of the array. RAID 6 adds an independent verification disk on top of the verification codes distributed across the disks. A RAID 6 array can therefore tolerate two disks failing at the same time, which is essential for applications with high data-safety requirements. For this reason, a RAID 6 array requires at least four hard disks. However, RAID 6 does not improve on the write performance of RAID 5. A write cache can offset this weakness to some extent but cannot eliminate it. Both RAID 5 and RAID 6 allow the data block size to be tuned to the application, so their actual performance also depends on this setting.

In practice, RAID 6 is less widely used than the other RAID modes. Implementing it generally requires a more complex and expensive RAID controller, so it is usually not integrated into motherboards.

Random read performance of RAID 6: Good (with large data blocks)
Random write performance of RAID 6: Poor, because verification data must be written not only to each hard disk but also to a dedicated verification disk
Sequential read performance of RAID 6: Good (with small data blocks)
Sequential write performance of RAID 6: Average

Advantages of RAID 6: fast reads and higher fault tolerance.

Disadvantages of RAID 6: low write speed. RAID controllers are more complex to design and more expensive.

Hot swap and hot spare

Most RAID systems support hot swapping and hot spares. Hot swapping allows a faulty hard disk to be replaced without shutting down the system or cutting the power. The system also identifies, configures, and adds the new disk dynamically, so there is no need to restart the computer. The benefit is self-evident: maintenance becomes very simple. In many scenarios, such as web servers, users cannot tolerate server downtime, which can cause immeasurable losses. Many HP and Dell server products and RAID disk arrays support hot swapping.

A hot spare is generally used where hot swapping is not available. This design requires an extra hard disk to be installed in the computer before any failure occurs. When a disk fails, the spare automatically takes over the failed disk's position. In such a system, the damaged disk cannot be removed until the system is shut down. A hot spare is not as convenient as hot swapping, but it is still much better than having no spare at all.
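The hot-spare takeover described above can be sketched on top of XOR parity. This is a toy model with stated assumptions: a single-stripe, three-disk RAID 5 style set with one standby disk, disks represented as byte strings, and a hypothetical `ArrayWithSpare` class. A real controller rebuilds stripe by stripe in the background while it continues to serve I/O.

```python
from functools import reduce
from typing import List, Optional


# Hypothetical toy model of hot-spare takeover; not a real controller API.
def xor_blocks(blocks: List[bytes]) -> bytes:
    """XOR equally sized blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


class ArrayWithSpare:
    """Data + parity members plus one idle hot spare that takes over on failure."""

    def __init__(self, members: List[bytes], spare_installed: bool = True):
        self.members: List[Optional[bytes]] = list(members)
        self.spare_installed = spare_installed

    def fail(self, index: int) -> None:
        self.members[index] = None  # dead disk; it may stay physically in place
        if self.spare_installed:
            self._rebuild_onto_spare(index)

    def _rebuild_onto_spare(self, index: int) -> None:
        survivors = [m for m in self.members if m is not None]
        self.members[index] = xor_blocks(survivors)  # spare now holds the lost block
        self.spare_installed = False                 # the spare has been used up


data1, data2 = b"web", b"srv"
array = ArrayWithSpare([data1, data2, xor_blocks([data1, data2])])
array.fail(1)
print(array.members[1], array.spare_installed)  # b'srv' False
```

Once the spare has been consumed, a further failure leaves the block missing until an administrator installs a new disk.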

Summary

There are many types of disk arrays, and this article has introduced only some basic modes. To achieve sufficient performance and stability, various RAID modes can be combined. This places higher demands on the RAID controller and raises the cost of the disk array system.

Server RAID is generally based on SCSI, so such RAID systems cost more. For personal use, RAID is still some way off. Even with a motherboard that integrates a RAID controller, at least two hard disks are needed (ideally identical in capacity, brand, and speed), which is not a small expense for individual users. If you have special requirements, such as running a workstation or web server, but do not want to spend too much money, IDE RAID is a good choice. Note, however, that typical onboard IDE RAID has high CPU usage, and in some applications IDE RAID performs worse than SCSI hard disks.
