RAID level features (reference: Wikipedia)
A redundant array of independent disks (RAID), formerly called a redundant array of inexpensive disks, is commonly referred to as a disk array. The basic idea is to combine several relatively inexpensive hard drives into an array whose performance matches or exceeds that of a single expensive, large-capacity drive. Depending on the level selected, RAID offers one or more of the following benefits over a single hard drive: enhanced data integration, enhanced fault tolerance, and increased throughput or capacity. In addition, to the computer a disk array looks like a single hard disk or logical storage unit. RAID levels include RAID 0, RAID 1, RAID 1E, RAID 5, RAID 6, RAID 7, RAID 10, RAID 50, and RAID 60.
In short, RAID combines multiple hard disks into a single logical disk, so the operating system treats the array as one hard disk. RAID is often used on servers, and arrays are usually built from identical drives. As hard drive prices keep falling and RAID features are integrated more effectively into motherboards, RAID has also become an option for home users, especially for work that needs large amounts of storage, such as video and audio production.
RAID was originally divided into several levels, each with its own theoretical advantages and disadvantages. The levels strike different balances between two goals: increasing data reliability and increasing the read/write performance of the storage (array). Over the years, the RAID concept has been applied in different ways.
Standard RAID
RAID 0
RAID 0, also known as a stripe set, joins two or more disks in parallel into one large-capacity disk. Data is split into segments that are stored across these disks. Because reads and writes can be processed in parallel, RAID 0 is the fastest of all the levels. However, RAID 0 provides neither redundancy nor fault tolerance: if one (physical) disk is damaged, all data is lost, a risk comparable to that of JBOD.
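As a rough illustration of how striping spreads data across disks, the sketch below maps a logical block number to a disk and an offset on that disk. The function name `raid0_locate` and the `blocks_per_chunk` parameter are hypothetical, and real controllers may use different chunk sizes and layouts.

```python
def raid0_locate(logical_block: int, num_disks: int, blocks_per_chunk: int = 1):
    """Map a logical block to (disk index, block offset on that disk).

    Assumes simple round-robin striping: consecutive chunks go to
    consecutive disks, wrapping around after the last disk.
    """
    chunk, within_chunk = divmod(logical_block, blocks_per_chunk)
    stripe, disk = divmod(chunk, num_disks)
    return disk, stripe * blocks_per_chunk + within_chunk


# Eight consecutive logical blocks spread over four disks.
layout = [raid0_locate(block, num_disks=4) for block in range(8)]
for block, (disk, offset) in enumerate(layout):
    print(f"logical block {block} -> disk {disk}, offset {offset}")
```

Consecutive blocks land on different disks, which is why large sequential reads and writes can keep every disk busy at once.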
RAID 1
Two or more disks mirror each other. On some multi-threaded operating systems this gives good read speed: in theory, read speed is the single-disk speed multiplied by the number of disks, while write speed drops slightly. The array keeps running as long as one disk is healthy, so reliability is the highest. The principle is that data written to the primary drive is also written to the mirror drive. When the primary (physical) drive is damaged, the mirror drive takes over its work. Because a mirror drive holds a copy of the data, RAID 1 offers the best data security of all RAID levels. However, no matter how many disks are used, the usable capacity is that of a single disk, which makes RAID 1 the level with the lowest disk utilization.
If two disks of different sizes are used to build RAID 1, the usable space is that of the smaller disk. The remaining space on the larger disk can be partitioned off and used separately, so it is not wasted.
RAID 2
RAID 2 is a modified version of RAID 0. Data is encoded with a Hamming code, split into individual bits, and written across the drives. Because an error correction code (ECC) is added to the data, the stored data is larger than the original data. RAID 2 requires at least three drives to operate.
RAID 3
RAID 3 uses bit interleaving: the data is encoded, split bit by bit across the drives, and the parity bits are stored on a separate dedicated drive. Because the bits of any piece of data are scattered across different drives, even reading a small piece of data may require all the drives to work, so this level is better suited to reading large amounts of data.
RAID 4
RAID 4 differs from RAID 3 in that it splits the data into blocks rather than bits (block interleaving). However, every access must consult the parity data on the same dedicated parity drive. Because that drive is used so frequently, its wear may be higher.
RAID 5
RAID 5 is a storage solution that balances storage performance, data security, and storage cost. It uses disk striping and requires at least three hard drives. Rather than keeping a backup copy of the stored data, RAID 5 distributes the data and the corresponding parity information across all member disks, with each parity block stored on a different disk from the data it protects. When the data on one disk is damaged, it can be recovered from the remaining data and the corresponding parity information. RAID 5 can be understood as a compromise between RAID 0 and RAID 1: it protects the system's data, though less strongly than mirroring, and it uses disk space more efficiently than mirroring. Read speed is similar to RAID 0, but because of the extra parity information, writes are somewhat slower than writing to a single disk; a write-back cache can improve write performance. Because several data blocks share one parity block, RAID 5 has higher disk space utilization than RAID 1 and a relatively low storage cost.
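The recovery described above can be sketched with XOR parity, the usual basis of RAID 5. This is a minimal illustration on one stripe of three data blocks; the helper name `xor_blocks` and the sample data are hypothetical, and the rotation of parity across disks is left out.

```python
from functools import reduce


def xor_blocks(blocks):
    """Return the byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe: three data blocks plus the parity block computed from them.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# Suppose the disk holding data[1] fails. XOR of everything that
# survived (remaining data plus parity) yields the missing block.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)
print(rebuilt)
```

The same operation serves both to compute parity and to rebuild a lost block, which is why a single parity block can protect against exactly one missing disk per stripe.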
RAID 6
Compared with RAID 5, RAID 6 adds a second independent parity block. The two independent parity systems use different algorithms, so data reliability is very high: even if any two disks fail at the same time, data integrity is not compromised. RAID 6 must devote more disk space to parity and perform additional parity calculations, so it involves more I/O operations and computation than RAID 5, and its write performance depends strongly on the implementation. For this reason RAID 6 is less often implemented in software and more often in hardware or firmware.
Up to two disks in the same array may fail. After a failed disk is replaced, its data is recalculated and written to the new disk. By design, RAID 6 needs at least four disks.
The usable capacity is the total number of drives $N$ minus 2, multiplied by the capacity of the smallest drive $S_{\min}$:
$$C_{\text{usable}} = (N - 2) \times S_{\min}$$
In the same vein, the capacity devoted to data protection is the capacity of the smallest drive multiplied by 2.
RAID 6 is also the array level most commonly offered by hardware RAID cards.
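The capacity rule for RAID 6 can be checked with a few lines of code. This is a sketch only: the function name `raid6_capacity` is hypothetical, and sizes may be in any unit as long as it is used consistently.

```python
def raid6_capacity(drive_sizes):
    """Return (usable capacity, protection capacity) for a RAID 6 array.

    Every member contributes only as much space as the smallest drive,
    and the equivalent of two drives is reserved for parity.
    """
    if len(drive_sizes) < 4:
        raise ValueError("RAID 6 needs at least four drives")
    smallest = min(drive_sizes)
    usable = (len(drive_sizes) - 2) * smallest
    protection = 2 * smallest
    return usable, protection


# Five drives of mixed size (in TB): the 2 TB drive sets the limit.
print(raid6_capacity([2, 3, 4, 4, 4]))
```

With mixed sizes, the space above the smallest drive's capacity on each larger drive is simply not used by the array.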
Implementation
The Storage Networking Industry Association (SNIA) defines RAID 6 as any form of RAID that can continue to execute read and write requests to all of the array's virtual disks in the presence of any two concurrent disk failures. Several methods have been used to implement RAID 6, including dual check data computations (parity and Reed-Solomon), orthogonal dual parity, and diagonal parity.
To tolerate the failure of any two disks, two different syndromes must be computed. One of them, $P$, is obtained by a simple XOR of the data, as in RAID 5. The other, $Q$, uses a more complex encoding that relies on field theory.
To solve this problem, a Galois field $GF(m)$ with $m = 2^k$ is introduced, where $GF(m) \cong F_2[x]/(p(x))$ for a suitable irreducible polynomial $p(x)$ of degree $k$. A chunk of data can be written in binary as $d_{k-1}d_{k-2}\cdots d_0$, where each $d_i$ is 0 or 1, corresponding to the element $d_{k-1}x^{k-1} + d_{k-2}x^{k-2} + \cdots + d_1 x + d_0$ of the Galois field. Let $D_0, \ldots, D_{n-1} \in GF(m)$ be the data in the corresponding disk stripes encoded as field elements in this way (in practice the data may be cut into byte-sized chunks). If $g$ is a generator of the field, $\oplus$ denotes addition in the field, and juxtaposition denotes multiplication in the field, then $P$ and $Q$ can be computed as follows ($n$ is the number of data disks):
$$P = \bigoplus_{i} D_i = D_0 \oplus D_1 \oplus \cdots \oplus D_{n-1}$$
$$Q = \bigoplus_{i} g^{i} D_i = g^{0} D_0 \oplus g^{1} D_1 \oplus \cdots \oplus g^{n-1} D_{n-1}$$
For readers with a computer science background, it is easier to think of $\oplus$ as a bitwise XOR operation and of $g^{i}$ as the action of a linear feedback shift register on a chunk of data. Thus the calculation of $P$ in the formula above is simply the XOR of each stripe [2]. This is because addition in any finite field of characteristic two is an XOR. The calculation of $Q$ is the XOR of a shifted version of each stripe.
If one data disk fails, the data can be recalculated from $P$ just as in RAID 5. If two data disks fail, or one data disk and the disk containing $P$ fail, the data can be recalculated through a more complex process using $P$ and $Q$ (or $Q$ alone, respectively); this computation relies on field theory. If $D_i$ and $D_j$ ($i \neq j$) are the two lost data disks, the remaining data together with $P$ and $Q$ give constants $A$ and $B$ such that $D_i \oplus D_j = A$ and $g^{i} D_i \oplus g^{j} D_j = B$:
$$A = P \oplus \bigoplus_{\ell \neq i,\, \ell \neq j} D_\ell$$
$$B = Q \oplus \bigoplus_{\ell \neq i,\, \ell \neq j} g^{\ell} D_\ell$$
Multiplying both sides of the equation for $B$ by $g^{-i}$ and adding the result to the equation for $A$ gives $(g^{j-i} \oplus 1) D_j = g^{-i} B \oplus A$, which can be solved for $D_j$; $D_i$ then follows from $D_i = A \oplus D_j$.
The calculation of Q is more CPU-intensive than the calculation of P. A software implementation of RAID 6 can therefore have a significant impact on system performance, while a hardware solution is relatively complex.
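The P and Q computation and the two-disk recovery can be sketched as follows. This is an illustration under stated assumptions, not any particular product's implementation: it works on single bytes in GF(2^8), and it assumes the reduction polynomial x^8 + x^4 + x^3 + x^2 + 1 (0x11D) with generator g = 2, a choice used by some software implementations. All function names are hypothetical.

```python
G = 2  # assumed generator of the field


def gf_mul(a: int, b: int) -> int:
    """Multiply two elements of GF(2^8), reducing modulo 0x11D."""
    result = 0
    while b:
        if b & 1:
            result ^= a          # field addition is XOR
        a <<= 1                  # multiply a by x (one shift-register step)
        if a & 0x100:
            a ^= 0x11D           # reduce when the degree reaches 8
        b >>= 1
    return result


def gf_pow(a: int, n: int) -> int:
    """Raise a nonzero element to the power n (negative n allowed)."""
    result = 1
    for _ in range(n % 255):     # nonzero elements satisfy a^255 = 1
        result = gf_mul(result, a)
    return result


def gf_inv(a: int) -> int:
    """Multiplicative inverse of a nonzero element."""
    return gf_pow(a, 254)


def compute_pq(data):
    """Compute the P and Q syndromes for one byte from each data disk."""
    p = q = 0
    for i, d in enumerate(data):
        p ^= d
        q ^= gf_mul(gf_pow(G, i), d)
    return p, q


def recover_two(data, i, j, p, q):
    """Recover the lost bytes at positions i and j (i != j).

    Entries of `data` at i and j are ignored; the rest must be intact.
    """
    a, b = p, q
    for pos, d in enumerate(data):
        if pos not in (i, j):
            a ^= d                           # a becomes D_i xor D_j
            b ^= gf_mul(gf_pow(G, pos), d)   # b becomes g^i D_i xor g^j D_j
    numerator = gf_mul(gf_pow(G, -i), b) ^ a
    d_j = gf_mul(numerator, gf_inv(gf_pow(G, j - i) ^ 1))
    return a ^ d_j, d_j


stripe = [0x12, 0x34, 0x56, 0x78]
p, q = compute_pq(stripe)
damaged = [0x12, None, 0x56, None]           # disks 1 and 3 are lost
print(recover_two(damaged, 1, 3, p, q))
```

A real array applies the same arithmetic to every byte of every block, usually with lookup tables or vector instructions rather than the bit-by-bit loop shown here, which is part of why Q costs more CPU time than P.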
Hybrid RAID
JBOD
JBOD (Just a Bunch of Disks) is, strictly speaking, not a RAID level. Because there is no standard, two mainstream practices exist in the market:
- Using separate ports, such as SATA, USB, or 1394, to control several independent drives at the same time. Devices that work this way are usually higher-end products that also offer RAID functions, so they do not rely on JBOD to merge drives into one logical disk.
- Simply concatenating the space of several hard disks into one large logical disk, with no redundancy mechanism.
Data is stored starting from the first hard disk and continues onto the following disks, so the operating system sees one large disk (made up of many smaller disks). If a drive is damaged, all data on that drive is lost. If the first drive is damaged, recovery is usually impossible, because most file systems keep the partition table at the front of the disk, that is, on the first drive, and losing the partition table means losing all the data. When the array or a drive fails, the risk can therefore be even greater than with RAID 0. The advantage of JBOD is that, unlike RAID, it does not have to read or write all the drives on every access.
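The second practice, concatenation, can be sketched as a lookup from a logical block to the member disk that holds it. The function name `jbod_locate` and the block-count sizes are hypothetical; the point is only that each access touches a single disk.

```python
from bisect import bisect_right
from itertools import accumulate


def jbod_locate(logical_block: int, disk_sizes):
    """Map a logical block to (disk index, block offset on that disk).

    Disks are filled one after another, so the logical address space is
    the member disks laid end to end.
    """
    ends = list(accumulate(disk_sizes))      # cumulative end of each disk
    if not 0 <= logical_block < ends[-1]:
        raise IndexError("block is outside the concatenated volume")
    disk = bisect_right(ends, logical_block)
    start = ends[disk] - disk_sizes[disk]
    return disk, logical_block - start


sizes = [100, 50, 200]                       # blocks per member disk
for block in (0, 99, 100, 150, 349):
    print(block, "->", jbod_locate(block, sizes))
```

Block 0, where a partition table would normally live, always maps to the first disk, which matches the recovery risk described above.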
RAID 7
RAID 7 is not a public RAID standard but a patented hardware product name of Storage Computer Corporation. It is based on RAID 3 and RAID 4, enhanced to address some of their limitations. In addition, its implementation uses a large amount of cache and a dedicated real-time processor for asynchronous array management, which lets RAID 7 handle a large number of I/O requests at the same time, so its performance exceeds that of many implementations of other RAID standards. For the same reason, it is very expensive. [3]
RAID 10/01
RAID 10 mirrors the data first and then stripes it: all drives are divided into two groups that together form a minimal RAID 0, and each group itself operates as RAID 1.
RAID 01 is the reverse of RAID 10: it stripes the data first and then mirrors it across two groups of drives. All drives are divided into two groups that together form a minimal RAID 1, and each group itself operates as RAID 0.
When one drive in a RAID 10 array is damaged, the remaining drives continue to function. In RAID 01, a single damaged drive stops every drive in the same RAID 0 group, leaving only the other group running, so reliability is lower. For example, if RAID 01 is built from six drives as a mirror of two three-drive RAID 0 groups, one failed drive takes three drives offline. As a result, RAID 10 is far more common than RAID 01: most retail motherboards support RAID 0/1/5/10 but not RAID 01.
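The reliability gap can be made concrete by enumerating failure combinations on six drives. The sketch assumes one particular arrangement for each level: RAID 10 as a stripe over three two-drive mirrors, and RAID 01 as a mirror of two three-drive stripes (the six-drive example above). Other groupings are possible and would give different counts; all names here are hypothetical.

```python
from itertools import combinations


def raid10_survives(failed, mirror_groups):
    """A stripe of mirrors survives if every mirror keeps one healthy drive."""
    return all(any(d not in failed for d in group) for group in mirror_groups)


def raid01_survives(failed, stripe_groups):
    """A mirror of stripes survives if one stripe group is fully intact."""
    return any(all(d not in failed for d in group) for group in stripe_groups)


def count_survivable(survives, layout, num_disks, num_failed):
    """Return (survivable cases, total cases) for a given failure count."""
    cases = [set(c) for c in combinations(range(num_disks), num_failed)]
    return sum(survives(failed, layout) for failed in cases), len(cases)


MIRRORS = [(0, 1), (2, 3), (4, 5)]        # assumed RAID 10 layout
STRIPES = [(0, 1, 2), (3, 4, 5)]          # assumed RAID 01 layout

for failures in (1, 2):
    print(failures, "failed drive(s):",
          "RAID 10", count_survivable(raid10_survives, MIRRORS, 6, failures),
          "RAID 01", count_survivable(raid01_survives, STRIPES, 6, failures))
```

Both layouts survive any single failure, but under these assumptions a second failure is fatal far more often for RAID 01, because the first failure has already taken a whole stripe group offline.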
RAID 50
RAID 50 is a combination of RAID 5 and RAID 0: RAID 5 first, then RAID 0. In other words, several RAID 5 groups are striped together. Because RAID 5 requires at least 3 drives, a RAID 50 built from multiple RAID 5 groups requires at least 6 drives. In the minimal 6-drive configuration, the 6 drives are divided into 2 groups of 3, each group forms a RAID 5, and the two RAID 5 groups are then combined into a RAID 0.
RAID 50 remains operational when 1 drive is damaged in any one or several of the underlying RAID 5 groups, but the entire RAID 50 fails if 2 or more drives are damaged in any single RAID 5 group.
Because RAID 50 stripes several RAID 5 groups in its upper layer, its performance is higher than that of plain RAID 5, but its capacity utilization is lower. For example, with 9 drives, a RAID 50 built as a RAID 0 of 3 RAID 5 groups loses one drive's capacity to parity in each group, so its utilization is (1 - 3/9), whereas a single RAID 5 achieves (1 - 1/9).
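The nine-drive comparison can be reproduced with exact fractions. The function names below are hypothetical, and the sketch assumes equal-sized drives split evenly across the RAID 5 groups.

```python
from fractions import Fraction


def raid5_utilization(num_disks: int) -> Fraction:
    """Fraction of raw capacity usable in one RAID 5 group."""
    if num_disks < 3:
        raise ValueError("RAID 5 needs at least three drives")
    return Fraction(num_disks - 1, num_disks)


def raid50_utilization(num_disks: int, num_groups: int) -> Fraction:
    """Fraction of raw capacity usable when RAID 5 groups are striped."""
    if num_groups < 2 or num_disks % num_groups:
        raise ValueError("drives must split evenly into two or more groups")
    if num_disks // num_groups < 3:
        raise ValueError("each RAID 5 group needs at least three drives")
    # Each group gives up one drive's worth of space to parity.
    return Fraction(num_disks - num_groups, num_disks)


print("RAID 50, 9 drives in 3 groups:", raid50_utilization(9, 3))
print("RAID 5, 9 drives:", raid5_utilization(9))
```

The more groups the drives are split into, the more parity drives are spent, which is the capacity price paid for the extra striping performance.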
RAID 53
It is a mirrored striped array in which each stripe is itself a disk array: a RAID 3 built from three or more RAID 5 groups.
RAID 60
RAID 60 is a combination of RAID 6 and RAID 0: RAID 6 first, then RAID 0. In other words, two or more RAID 6 groups are striped together. RAID 6 requires at least 4 drives, so the minimum requirement for RAID 60 is 8 drives.
Because the bottom layer consists of RAID 6 groups, RAID 60 keeps running when up to 2 drives are damaged in any one RAID 6 group. However, if 3 drives are damaged in any single underlying RAID 6 group, the entire RAID 60 fails, although the probability of this is fairly low.
Because RAID 60 combines several RAID 6 groups with striped access, its performance is higher than that of plain RAID 6. However, the high entry threshold and low capacity utilization are major drawbacks.
Application
RAID 2, 3, and 4 see little practical use because RAID 5 already covers the functions they provide. They have mostly been implemented only in research, and practical deployments are based on RAID 5.
RAID 4 is used on some commercial machines; for example, NAS systems designed by NetApp use the RAID 4 design concept.
Disk array comparison table
RAID level | Minimum drives | Maximum fault tolerance | Usable capacity | Read performance | Write performance | Security | Purpose | Application industry
--- | --- | --- | --- | --- | --- | --- | --- | ---
Single drive | (reference) | 0 | 1 | 1 | 1 | None | |
JBOD | 1 | 0 | N | 1 | 1 | None (same as RAID 0) | Increase capacity | Personal (temporary) backup storage
0 | 2 | 0 | N | N | N | If one drive fails, the whole array fails | Maximum capacity and speed | Video editing cache
1 | 2 | N-1 | 1 | N | 1 | Highest; one healthy drive is enough | Maximum security | Personal and enterprise backup
5 | 3 | 1 | N-1 | N-1 | N-1 | High | Maximum capacity on a minimum budget | Personal and enterprise backup
6 | 4 | 2 | N-2 | N-2 | N-2 | Higher than RAID 5 | Same as RAID 5, but more secure | Personal and enterprise backup
10 | 4 | N/2 | N/2 | N | N/2 | High | Combines the advantages of RAID 0 and RAID 1; faster in theory | Large databases, servers
1. N represents the total number of drives.
2. JBOD can be attached to existing hard drives to increase capacity directly.
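The fault tolerance and usable capacity columns of the comparison table can be expressed as small functions of N. This is a sketch that assumes equal-sized drives; the names `RAID_TABLE` and `describe` are hypothetical, and the RAID 10 tolerance of N/2 is the table's best-case figure (at most one failure per mirror).

```python
# level -> (minimum drives, max tolerated failures, usable capacity in drives)
RAID_TABLE = {
    "JBOD": (1, lambda n: 0, lambda n: n),
    "0": (2, lambda n: 0, lambda n: n),
    "1": (2, lambda n: n - 1, lambda n: 1),
    "5": (3, lambda n: 1, lambda n: n - 1),
    "6": (4, lambda n: 2, lambda n: n - 2),
    "10": (4, lambda n: n // 2, lambda n: n // 2),
}


def describe(level: str, n: int) -> dict:
    """Evaluate the table's formulas for an array of n equal drives."""
    min_drives, tolerance, usable = RAID_TABLE[level]
    if n < min_drives:
        raise ValueError(f"RAID level {level} needs at least {min_drives} drives")
    return {"max_failures": tolerance(n), "usable_drives": usable(n)}


for level in RAID_TABLE:
    print(level, describe(level, 8))
```

Multiplying `usable_drives` by the size of one drive gives the array's usable capacity for that level.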
Types
By implementation, RAID is divided into software and hardware types:
- Software disk array (software RAID)
  The array's storage work is handled mainly by the CPU on the computer's motherboard. The disadvantage is that running the RAID consumes more CPU resources; the advantage is the low price. There are two sub-types:
  - Software-only disk array (pure software RAID): needs only motherboard support, with no RAID card. If the motherboard is damaged, it may be hard to buy the same motherboard to rebuild the RAID.
  - Hardware-assisted disk array (hardware-assisted RAID): requires a RAID card and the driver provided by its manufacturer. This kind of RAID is easier to migrate to another computer.
- Hardware disk array (hardware RAID)
  The processor is built into the RAID card, so the server's CPU is not needed for RAID computation. The advantages are the fastest read/write performance, no use of server resources, and support for any operating system. In addition, when the system loses power, the backup battery unit (BBU) and non-volatile memory (NVRAM) preserve the read/write journal, including any pending read and write operations, in memory. Once power is restored, the journal is retrieved from NVRAM and the remaining operations are completed, which ensures read/write integrity. The BBU is usually paired with the card's write-back cache mode, which caches read and write operations for higher performance. On a hardware RAID card without a BBU, write-back cache mode should not be used, to avoid losing in-flight data during a power outage. Furthermore, because the card carries its own processor, it can operate on the drives independently of the host system, and rebuilds are faster than with software RAID. The disadvantage is the high price, and hardware RAID is typically used only for RAID 5 and RAID 6.
Typical disk array customers
- General consumers backing up data, and enterprises backing up important data when building an ERP or NAS system.
- Multimedia digital content production companies and personal audio/video editing studios.
- Industries that must store large volumes of surveillance video, such as digital video recorder (DVR) and network video recorder (NVR) systems. Military sites and casinos, which need extensive monitoring, are also common disk array customers.
- Securities firms, banks, and other financial institutions that must retain important customer data.