Principles of RAID Disk Arrays

Source: Internet
Author: User

RAID (Redundant Array of Independent Disks, originally Redundant Array of Inexpensive Disks) is a disk array that offers redundancy at low cost. The principle is to combine many inexpensive disks into a single large-capacity disk group and to distribute data across them in a way that improves data security. Because the member disks serve data in parallel, their combined throughput improves the performance of the whole disk system. To achieve this, data is cut into multiple segments that are stored across the hard disks. A disk array can also use parity: when any one hard disk in the array fails, the data can still be read, and the missing data is recalculated and written to a replacement disk.

 

There are three types of disk arrays: external disk array cabinets, internal disk array cards, and software emulation.
External disk array cabinets are most often used with large servers and support hot swapping, but these products are expensive.
Internal disk array cards are cheap but demand considerable installation skill, so they suit technical staff.
Software emulation is not suitable for servers with heavy data traffic because it slows the machine down.

RAID 0 has no redundancy. If one physical disk is damaged, none of the data can be used.
RAID 1 has a disk utilization of only 50% (when two disks are used), which is the lowest among all RAID levels.
RAID 5 can be seen as a compromise between RAID 0 and RAID 1. It provides data protection for the system, but the level of protection is lower than that of mirroring, while the disk space utilization is higher than that of mirroring.
RAID 0: RAID 0 splits data continuously at the bit or byte level and reads and writes it on multiple disks in parallel. It therefore has a high data transfer rate, but because it has no data redundancy it is not a true RAID structure. RAID 0 only improves performance and does not guarantee data reliability: the failure of a single disk affects all data. RAID 0 therefore cannot be used where data security requirements are high.
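
The striping described above can be sketched in a few lines. This is only an illustration: the chunk size, the round-robin order, and the function names `stripe` and `unstripe` are assumptions made for the example, and in-memory lists stand in for disks.

```python
def stripe(data: bytes, num_disks: int, chunk_size: int) -> list[list[bytes]]:
    """Split data into chunks and deal them round-robin across the disks."""
    disks: list[list[bytes]] = [[] for _ in range(num_disks)]
    for offset in range(0, len(data), chunk_size):
        chunk_index = offset // chunk_size
        disks[chunk_index % num_disks].append(data[offset:offset + chunk_size])
    return disks


def unstripe(disks: list[list[bytes]]) -> bytes:
    """Reassemble the data by reading the chunks back in round-robin order."""
    out = []
    for row in range(max(len(d) for d in disks)):
        for d in disks:
            if row < len(d):
                out.append(d[row])
    return b"".join(out)


data = b"ABCDEFGHIJ"
disks = stripe(data, num_disks=3, chunk_size=2)
restored = unstripe(disks)  # needs every disk; there is no redundancy
```

Because consecutive chunks sit on different disks, they can be transferred in parallel, and because no chunk is stored twice, losing any one list makes the original data unrecoverable.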

 

RAID 1: RAID 1 achieves data redundancy through disk mirroring, keeping mutually backed-up copies of the data on paired independent disks. When the original disk is busy, data can be read directly from the mirror copy, so RAID 1 can improve read performance. RAID 1 has the highest cost per unit of capacity of any disk array, but it provides high data security and availability. When a disk fails, the system can automatically switch to the mirror disk for reads and writes without having to rebuild the failed data.
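
A minimal sketch of this behaviour, assuming a two-disk array modelled as in-memory lists. The class name `Mirror` and its methods are hypothetical and exist only for this example.

```python
class Mirror:
    """Two-copy array: writes go to every healthy copy, reads use any healthy copy."""

    def __init__(self, num_blocks: int) -> None:
        self.copies = [[b""] * num_blocks, [b""] * num_blocks]
        self.healthy = [True, True]

    def write(self, block: int, data: bytes) -> None:
        for i, copy in enumerate(self.copies):
            if self.healthy[i]:
                copy[block] = data

    def read(self, block: int) -> bytes:
        # Fall through to the second copy when the first one has failed.
        for i, copy in enumerate(self.copies):
            if self.healthy[i]:
                return copy[block]
        raise IOError("both copies have failed")

    def fail(self, disk: int) -> None:
        self.healthy[disk] = False


array = Mirror(num_blocks=4)
array.write(0, b"payload")
array.fail(0)                 # first disk dies
survivor = array.read(0)      # still served from the second copy
```

No reconstruction step is needed after the failure: the surviving copy already holds the complete data.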

 

RAID 01/10: RAID 10 and RAID 01 both combine RAID 0 and RAID 1: data is split continuously at the bit or byte level and read and written on multiple disks in parallel, while every disk is also mirrored for redundancy. The advantage is that the array has both the speed of RAID 0 and the high data reliability of RAID 1, but CPU usage is higher and disk utilization is relatively low. RAID 1+0 mirrors first and then stripes: the hard disks are divided into groups that each operate as RAID 1, and the groups are then combined as RAID 0. RAID 0+1 works the other way around: it stripes first and then mirrors. The hard disks are divided into two groups that each operate as RAID 0, and the two groups are then combined as RAID 1. In terms of performance, RAID 0+1 is said to have a slightly faster read/write speed than RAID 1+0, although in practice the two are usually comparable. In terms of reliability, when one hard disk in a four-disk RAID 1+0 array is damaged, the other three hard disks continue to operate. In RAID 0+1, as soon as one hard disk is damaged, the other hard disk in the same RAID 0 group also stops operating, leaving only two hard disks running, so reliability is lower. RAID 10 is therefore far more commonly used than RAID 01. Most retail motherboards support RAID 0/1/5/10, but not RAID 01.
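
The reliability difference can be checked by enumerating failures on a four-disk array. The sketch below assumes disks 0 and 1 form one group and disks 2 and 3 the other; the grouping and the function names are hypothetical.

```python
from itertools import combinations

DISKS = [0, 1, 2, 3]
GROUPS = [(0, 1), (2, 3)]  # assumed grouping of the four disks


def raid10_survives(failed: set[int]) -> bool:
    # RAID 1+0: each group is a mirror pair and data is striped across pairs.
    # The array survives while every pair still has at least one working disk.
    return all(any(d not in failed for d in group) for group in GROUPS)


def raid01_survives(failed: set[int]) -> bool:
    # RAID 0+1: each group is a stripe set and the two sets mirror each other.
    # A stripe set is lost as soon as one of its disks fails, so the array
    # survives only while at least one stripe set is fully intact.
    return any(all(d not in failed for d in group) for group in GROUPS)


def count_survivable(survives, num_failures: int) -> int:
    return sum(
        1 for failed in combinations(DISKS, num_failures) if survives(set(failed))
    )


raid10_double = count_survivable(raid10_survives, 2)  # 4 of the 6 two-disk failures
raid01_double = count_survivable(raid01_survives, 2)  # 2 of the 6 two-disk failures
```

Both layouts survive any single failure, but under this model RAID 1+0 tolerates twice as many of the possible double failures as RAID 0+1.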

 

RAID 2: Data is striped across different hard disks at the bit or byte level, and Hamming code, an error-correcting code, is used to provide error detection and recovery.
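
To illustrate the kind of error-correcting code involved, here is a minimal sketch that assumes a Hamming(7,4) code, with each of the seven bits imagined on its own disk (four data disks and three check disks). The code size and the function names are assumptions for the example, not something the text above specifies.

```python
def hamming74_encode(d: list[int]) -> list[int]:
    """Encode 4 data bits as a 7-bit codeword laid out p1 p2 d1 p4 d2 d3 d4."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4
    p2 = d1 ^ d3 ^ d4
    p4 = d2 ^ d3 ^ d4
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_correct(c: list[int]) -> list[int]:
    """Return a copy of the codeword with any single flipped bit repaired."""
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    position = s1 + 2 * s2 + 4 * s4  # 1-based index of the bad bit, 0 if none
    fixed = list(c)
    if position:
        fixed[position - 1] ^= 1
    return fixed


def hamming74_decode(c: list[int]) -> list[int]:
    """Correct a possibly damaged codeword and extract the 4 data bits."""
    fixed = hamming74_correct(c)
    return [fixed[2], fixed[4], fixed[5], fixed[6]]


word = [1, 0, 1, 1]
codeword = hamming74_encode(word)
damaged = list(codeword)
damaged[4] ^= 1                      # one "disk" returns a wrong bit
recovered = hamming74_decode(damaged)
```

The syndrome bits locate the faulty position directly, which is why this scheme can both detect and repair a single bad bit without knowing in advance which disk misbehaved.
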
RAID 3: Like RAID 2, RAID 3 stripes data across different hard disks. The difference is that RAID 3 uses simple parity and stores the parity information on a single dedicated disk. If a data disk fails, its data can be regenerated from the parity disk and the remaining data disks. If the parity disk fails, access to the data is not affected. RAID 3 provides a good transfer rate for large amounts of sequential data, but for random data the parity disk becomes the bottleneck for write operations.
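
Simple parity of this kind is commonly computed as a byte-wise XOR, and the sketch below assumes exactly that. The helper name `xor_blocks` and the sample contents are made up for illustration.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three data disks and one dedicated parity disk (contents are arbitrary).
data_disks = [b"\x01\x02", b"\x10\x20", b"\xaa\xbb"]
parity = xor_blocks(data_disks)

# Data disk 1 fails: XOR the surviving data disks with the parity disk.
survivors = [data_disks[0], data_disks[2], parity]
rebuilt = xor_blocks(survivors)
```

XOR is its own inverse, so combining the parity with every surviving data block leaves exactly the missing block. Losing the parity disk costs nothing for reads, since the data disks are still complete.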

 

RAID 4: RAID 4 also stripes data across different disks, but the striping unit is a block or record. RAID 4 uses one disk as the parity disk, and every write operation must access it, so the parity disk becomes the bottleneck for write operations. RAID 4 is therefore rarely used in commercial environments.
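
The bottleneck can be made concrete by counting physical writes per disk. This sketch assumes logical blocks are placed round-robin on the data disks and that each logical write updates one data block plus the parity block; the function name and the `"parity"` label are hypothetical.

```python
from collections import Counter


def raid4_write_counts(logical_blocks: list[int], num_data_disks: int) -> Counter:
    """Count physical writes per disk for single-block writes with a dedicated parity disk."""
    counts: Counter = Counter()
    for block in logical_blocks:
        counts[block % num_data_disks] += 1  # the data block itself
        counts["parity"] += 1                # the parity block of its stripe
    return counts


# Twelve single-block writes spread evenly over four data disks.
counts = raid4_write_counts(list(range(12)), num_data_disks=4)
```

In this model each data disk receives a quarter of the writes, while the parity disk receives all of them, so it saturates first.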

RAID 5: RAID 5 does not designate a separate parity disk. Instead, data and parity information are distributed across all disks. Read and write operations can be performed on the array devices at the same time, which provides higher data throughput. RAID 5 is well suited to small data blocks and random reads and writes. The main difference between RAID 3 and RAID 5 is that every data transfer in RAID 3 involves all disks in the array, whereas in RAID 5 most transfers touch only one disk and can proceed in parallel. RAID 5 has a "write penalty": each write operation generates four actual I/O operations, two reads (the old data and the old parity) and two writes (the new data and the new parity).
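
The four-operation small write described above can be sketched with XOR parity. The rotation order in `parity_disk` is one possible layout chosen for illustration, and both function names are hypothetical; blocks are modelled as small integers.

```python
def parity_disk(stripe: int, num_disks: int) -> int:
    """Disk that holds the parity block of a stripe (assumed rotation order)."""
    return (num_disks - 1 - stripe) % num_disks


def small_write(old_data: int, old_parity: int, new_data: int) -> tuple[int, int]:
    """Read-modify-write of one block; returns (new_parity, io_operations)."""
    # Reads: old data, old parity. Writes: new data, new parity.
    new_parity = old_parity ^ old_data ^ new_data
    return new_parity, 4


# One stripe holding three data blocks.
blocks = [5, 9, 12]
parity = blocks[0] ^ blocks[1] ^ blocks[2]

# Overwrite the middle block without reading the other data blocks.
new_parity, io_ops = small_write(old_data=blocks[1], old_parity=parity, new_data=7)
blocks[1] = 7

layout = [parity_disk(s, num_disks=4) for s in range(4)]
```

Removing the old data from the parity and adding the new data gives the same result as recomputing parity over the whole stripe, which is why only two disks are touched. Because the parity block moves from stripe to stripe, no single disk takes every parity write.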

 

RAID 6: Compared with RAID 5, RAID 6 adds a second independent parity block. The two independent parity schemes use different algorithms, so data reliability is very high: even if two disks fail at the same time, access to the data is not affected. However, RAID 6 has to allocate more disk space to parity information and has a larger write penalty than RAID 5, so its write performance is poor. This weaker write performance and the more complex implementation have limited the use of RAID 6 in practice.
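
One common way to obtain two independent parity blocks is to pair an XOR parity P with a second block Q computed in the finite field GF(2^8). The sketch below assumes that scheme, with reduction polynomial 0x11D and generator 2. The text above does not say which algorithms are used, so treat the whole block, including every function name, as an illustrative assumption.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) with reduction polynomial 0x11D."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11D
        b >>= 1
    return result


def gf_pow(a: int, n: int) -> int:
    result = 1
    for _ in range(n):
        result = gf_mul(result, a)
    return result


def gf_inv(a: int) -> int:
    """Inverse of a nonzero byte: a^254, because a^255 == 1 in GF(2^8)."""
    return gf_pow(a, 254)


def compute_pq(data: list[bytes]) -> tuple[bytes, bytes]:
    """P is the XOR of all blocks; Q weights block i by 2^i before the XOR."""
    size = len(data[0])
    p, q = bytearray(size), bytearray(size)
    for i, block in enumerate(data):
        coeff = gf_pow(2, i)
        for k, byte in enumerate(block):
            p[k] ^= byte
            q[k] ^= gf_mul(coeff, byte)
    return bytes(p), bytes(q)


def recover_two(data: list, x: int, y: int, p: bytes, q: bytes) -> tuple[bytes, bytes]:
    """Rebuild lost data blocks x and y (given as None) from the survivors, P and Q."""
    size = len(p)
    survivors = [b if b is not None else bytes(size) for b in data]
    p_part, q_part = compute_pq(survivors)  # zero blocks contribute nothing
    gx, gy = gf_pow(2, x), gf_pow(2, y)
    denom_inv = gf_inv(gx ^ gy)
    dx, dy = bytearray(size), bytearray(size)
    for k in range(size):
        pxy = p[k] ^ p_part[k]              # D_x ^ D_y
        qxy = q[k] ^ q_part[k]              # 2^x * D_x ^ 2^y * D_y
        dx[k] = gf_mul(qxy ^ gf_mul(gy, pxy), denom_inv)
        dy[k] = pxy ^ dx[k]
    return bytes(dx), bytes(dy)


data = [b"RAID", b"six!", b"demo", b"data"]
p, q = compute_pq(data)
damaged = [data[0], None, data[2], None]    # two data disks fail together
block1, block3 = recover_two(damaged, 1, 3, p, q)
```

P and Q give two independent equations in the two unknown blocks, which is what allows both to be solved for. The extra field arithmetic on every write is also one source of the larger write cost compared with RAID 5.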

RAID 7: This is a new RAID level. It comes with an intelligent real-time operating system and software tools for storage management, can run completely independently of the host, and does not consume host CPU resources. RAID 7 can be viewed as a storage computer, which makes it significantly different from the other RAID levels. In addition to the levels above (see Table 1), multiple RAID levels can be combined, as in RAID 0+1, to build the required array; RAID 5+3 (RAID 53), for example, is a widely used array. In general, disk arrays can be configured flexibly to obtain a storage system that better suits the requirements.

RAID 5E (RAID 5 Enhancement): RAID 5E is an improvement on RAID 5. As in RAID 5, parity information is evenly distributed across the hard disks, but a portion of each hard disk is left unused and is not striped. Up to two physical hard disks can fail. RAID 5E may look like RAID 5 with a hot spare disk, but because RAID 5E distributes data across all hard disks, it performs better than RAID 5 with a hot spare. When a hard disk fails, the data from the failed disk is compressed into the unused space on the other hard disks, and the logical disk continues as RAID 5.

RAID 5EE: Compared with RAID 5E, RAID 5EE distributes data more efficiently. A portion of each hard disk is used as a distributed hot spare that is part of the array, so data is rebuilt faster when a physical hard disk in the array fails.

RAID 50: RAID 50 is a combination of RAID 5 and RAID 0. Data, including parity information, is striped across every disk of the RAID 5 sub-groups. Each RAID 5 sub-group requires at least three hard disks. RAID 50 is highly fault tolerant because one disk in each sub-group can fail without causing data loss. In addition, rebuilds are much faster because the parity is confined to the RAID 5 sub-group. Advantages: higher fault tolerance and the potential for faster reads. Note that a disk failure reduces throughput, and rebuilding after a failure takes longer than with a mirrored configuration.
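
The space cost of the levels discussed in this article can be compared with a small calculator. This is a sketch under stated assumptions: all disks are the same size, mirrored layouts keep exactly two copies, and `group_size` (a hypothetical parameter) is the number of disks in each RAID 5 sub-group of a RAID 50 array.

```python
def usable_disks(level: str, n: int, group_size: int = 3) -> int:
    """Return how many disks' worth of capacity hold user data."""
    if level == "0":
        return n                      # striping only, no redundancy
    if level in ("1", "10", "01"):
        if n % 2:
            raise ValueError("mirrored layouts need an even number of disks")
        return n // 2                 # every block is stored twice
    if level in ("3", "4", "5"):
        return n - 1                  # one disk's worth of parity
    if level == "6":
        return n - 2                  # two disks' worth of parity
    if level == "50":
        if group_size < 3 or n % group_size:
            raise ValueError("n must be a multiple of group_size (at least 3)")
        return (n // group_size) * (group_size - 1)
    raise ValueError(f"unsupported level: {level}")


for level, n in [("0", 4), ("1", 2), ("5", 4), ("6", 6), ("10", 4), ("50", 6)]:
    share = usable_disks(level, n) / n
    print(f"RAID {level} with {n} disks: {share:.0%} usable")
```

With six disks in two sub-groups of three, RAID 50 gives up one disk per sub-group to parity, so it stores less than a single six-disk RAID 5 but can survive one failure in each sub-group.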

 
