In this layout, data is written in block stripes to the first three disks
(Disks 0, 1, and 2) while the fourth drive (Disk 3) is the parity drive.
Parity of the blocks across the drives is computed by the RAID controller and
stored on the dedicated parity drive. In the figure the parity for A1, A2, and
A3 is listed as Ap on the parity drive.
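The text above does not say how the controller computes parity. The sketch below assumes the usual bytewise XOR across the blocks of a stripe; the helper name `xor_parity` and the block contents are made up for illustration.

```python
from functools import reduce


def xor_parity(blocks):
    """Return the bytewise XOR of a list of equally sized blocks."""
    if len({len(block) for block in blocks}) != 1:
        raise ValueError("all blocks in a stripe must be the same size")
    # zip(*blocks) yields one tuple per byte position ("column") of the stripe.
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


# Hypothetical 4-byte blocks standing in for A1, A2, and A3.
a1 = bytes([0x0F, 0x10, 0xAA, 0x00])
a2 = bytes([0xF0, 0x01, 0x55, 0x00])
a3 = bytes([0xFF, 0x11, 0xFF, 0x7E])

ap = xor_parity([a1, a2, a3])  # the block that would go on the parity drive
```

Real controllers work on much larger blocks, typically in hardware or with vectorized instructions, but the arithmetic is the same idea.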
The dedicated parity drive becomes a performance bottleneck in RAID-4,
particularly for write I/O. Since RAID-4 has block-level striping, you can write
to blocks A1 and B2 at the same time since they are on different disks. However,
the parity for both blocks has to be written to the same drive, which can only
accommodate a single write I/O request at a time. Consequently, one of the
parity writes (A1 parity or B2 parity) is blocked and the write I/O performance
is reduced. For more on the performance of RAID-4, please see this link.
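To make the contention concrete, here is a small sketch that lists which disks each small write has to touch in the four-disk layout described above. The function names are hypothetical, and the model ignores caching and request queuing.

```python
PARITY_DISK = 3  # the dedicated parity drive in the four-disk layout


def disks_touched(data_disk):
    """Disks a small RAID-4 write must update: its data disk plus the parity disk."""
    return {data_disk, PARITY_DISK}


def shared_disks(data_disk_a, data_disk_b):
    """Disks that two concurrent small writes both need."""
    return disks_touched(data_disk_a) & disks_touched(data_disk_b)


# A1 lives on Disk 0 and B2 lives on Disk 1.
contended = shared_disks(0, 1)
```

The two data writes land on different disks, yet `contended` still contains the parity disk, which is where one of the two requests has to wait.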
The capacity of RAID-4 is the following:
Capacity = min(disk sizes) * (n-1)
meaning that the capacity of RAID-4 is limited by the smallest disk (you can
use different size drives in RAID-4) multiplied by the number of drives, n,
minus one. The "minus one" part is because of the dedicated parity
drive. However, it is recommended that you use drives that are the same size in
RAID-4.
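As a quick worked example of the capacity formula, the sketch below uses a hypothetical helper `raid4_capacity` with drive sizes given in GB. The three-drive minimum is taken from the summary table for RAID-4.

```python
def raid4_capacity(disk_sizes):
    """Usable capacity of a RAID-4 array: min(disk sizes) * (n - 1)."""
    n = len(disk_sizes)
    if n < 3:
        raise ValueError("RAID-4 needs at least three drives")
    return min(disk_sizes) * (n - 1)


# Sizes in GB (illustrative values only).
mixed = raid4_capacity([500, 500, 500, 750])    # one larger drive
matched = raid4_capacity([500, 500, 500, 500])  # identical drives
```

Both arrays end up with the same usable capacity, which shows why mixing sizes buys nothing: the extra space on the larger drive goes unused.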
RAID-4 improves on the redundancy of RAID-0, which has zero data redundancy,
by adding a parity disk. You can lose one drive without losing data. For example,
you could lose the parity disk without losing data or you could lose one of the
data disks without losing data. But the introduction of the single dedicated
parity drive has reduced write performance relative to RAID-0. However, if the
loss of write performance of RAID-4 is acceptable, it does give you more data
redundancy than RAID-0.
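Here is a minimal sketch of why either kind of single-drive loss is survivable, assuming the parity is a bytewise XOR of the data blocks (the usual choice, although the text does not spell it out). Block contents and names are illustrative.

```python
def xor_blocks(blocks):
    """Bytewise XOR of equally sized blocks."""
    result = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            result[i] ^= byte
    return bytes(result)


# Hypothetical contents of one stripe on a four-disk array.
a1, a2, a3 = b"RAID", b"four", b"demo"
ap = xor_blocks([a1, a2, a3])

# Case 1: the disk holding A2 fails. XOR the survivors with the parity block.
rebuilt_a2 = xor_blocks([a1, a3, ap])

# Case 2: the parity disk fails. The data is intact; parity is simply recomputed.
rebuilt_ap = xor_blocks([a1, a2, a3])
```

Losing two drives at once leaves two unknowns per byte position and only one parity equation, so the data cannot be recovered.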
RAID-4 was one of the RAID configurations defined in the original RAID paper.
In the real world, RAID-4 is rarely used because RAID-5 (see next sub-section)
has replaced it.
Table 4 below is a quick summary of RAID-4 with a few highlights.
Table 4: RAID-4 highlights
| RAID level | Pros | Cons | Storage Efficiency | Minimum number of disks |
| RAID-4 | Good data redundancy/availability (can tolerate the loss of 1 drive); good read performance since all of the drives are read at the same time; can lose one drive without losing data | Single parity disk (causes bottleneck); write performance is not that good because of the bottleneck of the parity drive | (n-1)/n, where n is the number of drives | 3 (have to be identical) |
RAID-5
RAID-5 is similar to RAID-4, but now the parity is distributed across all of
the drives instead of using a dedicated parity drive. This greatly improves
write performance relative to RAID-4 since the parity is written on all of the
drives in the RAID-5 array. Figure 5 below from Wikipedia (image by Cburnett)
illustrates how the data is written to four disks in RAID-5.
Figure 5: RAID-5 layout (from Cburnett at Wikipedia under the GFDL license)
In this layout, the parity blocks are labeled with a subscript "p" to
indicate parity. Notice how they are distributed across all four drives.
Blocks that line up (one block per drive) are typically called a "stripe". In
Figure 5 the blocks in a stripe are all the same color. The data stripe size is
simply the following:
Data stripe size = block size * (n-1)
where n is the number of drives in the RAID-5 array. Inside a stripe
there is a single parity block and all other blocks are data blocks. Anytime a
block inside the stripe is changed or written to, the parity block is recomputed
and rewritten (this is sometimes called the read-modify-write process). This
process can add overhead, reducing performance.
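The following sketch shows one way the parity block can rotate from stripe to stripe, along with the data stripe size formula. The rotation rule (parity starts on the last disk and moves one disk to the left per stripe) is an assumption chosen to match the four-disk layout described here; real controllers may use other rotations, and the function names are hypothetical.

```python
def raid5_parity_disk(stripe, n):
    """Disk index holding parity for a stripe, assuming a leftward rotation."""
    return (n - 1 - stripe) % n


def raid5_stripe_layout(stripe, n):
    """Block labels per disk for one stripe, e.g. ['A1', 'A2', 'A3', 'Ap']."""
    letter = chr(ord("A") + stripe)  # only meaningful for the first 26 stripes
    parity = raid5_parity_disk(stripe, n)
    labels, data_index = [], 1
    for disk in range(n):
        if disk == parity:
            labels.append(letter + "p")
        else:
            labels.append(f"{letter}{data_index}")
            data_index += 1
    return labels


def data_stripe_size(block_size, n):
    """Data stripe size = block size * (n - 1)."""
    return block_size * (n - 1)


layout = [raid5_stripe_layout(stripe, 4) for stripe in range(4)]
stripe_kib = data_stripe_size(64, 4)  # 64 KiB blocks on four drives
```

Because each stripe puts its parity on a different disk, two small writes to different stripes usually do not queue up behind the same parity drive.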
RAID-5 also has some write performance problems for small writes that are
smaller than a single stripe, since the parity needs to be computed several
times, which eats up computational capability of the RAID controller. As
mentioned previously, the read-modify-write process that must be followed
happens much more often in this case.
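A minimal sketch of the read-modify-write step for a single small write, assuming XOR parity. The shortcut used here (new parity = old parity XOR old data XOR new data) is a common approach rather than something the text specifies, and the names and block contents are illustrative.

```python
def xor_blocks(*blocks):
    """Bytewise XOR of equally sized blocks."""
    result = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            result[i] ^= byte
    return bytes(result)


def read_modify_write(old_data, new_data, old_parity):
    """Return the new parity after one data block in the stripe changes.

    Costs two reads (old data, old parity) and two writes (new data, new parity).
    """
    return xor_blocks(old_parity, old_data, new_data)


# Hypothetical stripe with three data blocks.
a1, a2, a3 = b"aaaa", b"bbbb", b"cccc"
ap = xor_blocks(a1, a2, a3)

new_a2 = b"BBBB"
new_ap = read_modify_write(a2, new_a2, ap)
```

The result matches a full recomputation over the whole stripe, but the extra reads before every small write are where the performance penalty comes from.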
The capacity of RAID-5 is very similar to RAID-4 and is the following:
Capacity = min(disk sizes) * (n-1)
meaning that the capacity of RAID-5 is limited by the smallest disk (you can
use different size drives in RAID-5) multiplied by the number of drives, n,
minus one. The "minus one" part is because of the parity block per
stripe.
With RAID-5 you can lose a single drive and not lose data because either the
data or the parity for the missing blocks on the lost drive can be found on the
remaining drives. In addition, RAID controllers allow what is called a
hot-spare drive. This drive is typically part of the RAID array but is initially
not used for storing data. If the RAID group loses a drive, the hot-spare is
immediately brought into the RAID group by the controller.
In the case of RAID-5, the controller immediately starts redistributing data
and parity blocks to this new drive. To do this, the initial drives in the
RAID-5 array have to have all blocks read and the RAID controller has to
recompute parity or rebuild missing data blocks. This combination means that it
can take quite a bit of time to fail over data to the hot-spare drive. The nice
thing about having a hot-spare drive is that typically the fail-over process
happens automatically, so there is almost no delay in bringing the hot-spare
drive into use.
RAID-5 has been used for a very long time and during this time the data
availability and redundancy have been very good. However, there is a new
phenomenon that impacts RAID-5 that has been explained in various articles
around the Web such as this one. Basically, the capacity of drives is growing
quicker than the unrecoverable read error (URE) rate of drives is improving, to
the point where losing a drive in a RAID-5 array and recovering it to a
hot-spare drive is almost guaranteed to lead to a URE, which means that the
RAID-5 array will be lost and the data has to be restored from backup. However,
this is the subject for another article.
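For a rough feel of the URE problem, here is a back-of-the-envelope sketch. It assumes errors are independent and that a drive averages one URE per 10^14 bits read, a figure often quoted on consumer drive data sheets; neither assumption comes from this article, and the array sizes are illustrative.

```python
import math


def ure_probability(bytes_read, bits_per_ure=1e14):
    """Chance of at least one URE while reading bytes_read bytes.

    Assumes independent errors, one per bits_per_ure bits on average.
    """
    bits = bytes_read * 8
    # 1 - (1 - 1/bits_per_ure) ** bits, computed in a numerically stable way.
    return -math.expm1(bits * math.log1p(-1.0 / bits_per_ure))


def raid5_rebuild_ure_probability(n_drives, drive_bytes, bits_per_ure=1e14):
    """A rebuild has to read every surviving drive in full."""
    return ure_probability((n_drives - 1) * drive_bytes, bits_per_ure)


TB = 10**12
small = raid5_rebuild_ure_probability(4, 1 * TB)  # four 1 TB drives
large = raid5_rebuild_ure_probability(8, 4 * TB)  # eight 4 TB drives
```

Under these assumptions `small` comes out near 0.21 and `large` near 0.89, which illustrates how quickly the risk climbs as drives get bigger and arrays get wider.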
There is no shortage of articles about RAID-5 on the web. You will see some
strong opinions both for and against RAID-5 based on usage cases. Be sure to
understand the application used when reading about both pros and cons of RAID-5.
A reasonable overview of the trade-offs of RAID-5 is this article.
Table 5 below is a quick summary of RAID-5 with a few highlights.
Table 5: RAID-5 highlights
| RAID level | Pros | Cons | Storage Efficiency | Minimum number of disks |
| RAID-5 | Good data redundancy/availability (can tolerate the loss of 1 drive); very good read performance since all of the drives can be read at the same time; write performance is adequate (better than RAID-4); can lose one drive without losing data | Write performance is adequate (better than RAID-4); write performance for small I/O is not good at all | (n-1)/n, where n is the number of drives | 3 (have to be identical) |
RAID-6
As mentioned previously, there is a potential problem with RAID-5 for larger
capacity drives and a larger number of them. RAID-6 attempts to help that
situation by using two parity blocks per stripe instead of RAID-5's single
parity block. This allows you to lose two drives without losing any data.
Figure 6 below from Wikipedia (image by Cburnett) illustrates how the data is
written to four disks in RAID-6.
Figure 6: RAID-6 layout (from Cburnett at Wikipedia under the GFDL license)
In this figure, the first parity block is noted with a subscript "p" such
as Ap. The second parity block in a stripe is noted with a subscript "q"
such as Aq. The use of two parity blocks reduces the usable capacity
of a RAID-6 array as in the following:
Capacity = min(disk sizes) * (n-2)
meaning that the capacity of RAID-6 is limited by the smallest disk (you can
use different size drives in RAID-6) multiplied by the number of drives, n,
minus two. The "minus two" part is because of the two parity blocks
per stripe.
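A short worked example of the RAID-6 capacity formula and the resulting storage efficiency. The helper names are hypothetical, sizes are in GB, and the four-drive minimum is taken from the RAID-6 summary table.

```python
def raid6_capacity(disk_sizes):
    """Usable capacity of a RAID-6 array: min(disk sizes) * (n - 2)."""
    n = len(disk_sizes)
    if n < 4:
        raise ValueError("RAID-6 needs at least four drives")
    return min(disk_sizes) * (n - 2)


def raid6_efficiency(n):
    """Fraction of raw capacity available for data: (n - 2) / n."""
    return (n - 2) / n


four_drives = raid6_capacity([2000] * 4)   # four 2 TB drives
eight_drives = raid6_capacity([2000] * 8)  # eight 2 TB drives
```

With four drives only half of the raw capacity is usable, while eight drives reach 75 percent, so the cost of the second parity block shrinks as the array grows.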
Computing the first parity block, P, is done in the same fashion as RAID-5.
However, computing the Q parity block is more complicated as explained here.
This means that the write performance of a RAID-6 array can be slower than a
RAID-5 array for a given level of RAID controller performance. However, read
performance from a RAID-6 array is just as fast as a RAID-5 array since reading
parity blocks is skipped. But in exchange for worse write performance, RAID-6
arrays can tolerate the loss of two drives while RAID-5 can only tolerate the
loss of a single drive. Coupled with larger drives and larger drive counts, this
means that larger RAID-6 arrays can be constructed relative to RAID-5 arrays.
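To give a flavor of why Q costs more than P, the sketch below assumes one widely used scheme: Q is a Reed-Solomon style sum over the Galois field GF(2^8) with generator 2 and reduction polynomial 0x11D. The article does not say which scheme its link described, so treat this as an illustration only; all names and block contents are hypothetical.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8), reducing by the polynomial 0x11D."""
    product = 0
    while b:
        if b & 1:
            product ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11D
        b >>= 1
    return product


def gf_pow(a, exponent):
    """Raise a to a non-negative integer power in GF(2^8)."""
    result = 1
    for _ in range(exponent):
        result = gf_mul(result, a)
    return result


def pq_parity(data_blocks):
    """Return (P, Q): P is a plain XOR, Q weights block i by 2**i in GF(2^8)."""
    size = len(data_blocks[0])
    p, q = bytearray(size), bytearray(size)
    for index, block in enumerate(data_blocks):
        coefficient = gf_pow(2, index)
        for i, byte in enumerate(block):
            p[i] ^= byte
            q[i] ^= gf_mul(coefficient, byte)
    return bytes(p), bytes(q)


def recover_from_q(data_blocks, missing, q):
    """Rebuild data block `missing` from Q alone (for example when P is also lost).

    The entry at index `missing` in data_blocks is ignored and may be None.
    """
    partial = bytearray(q)
    for index, block in enumerate(data_blocks):
        if index == missing:
            continue
        coefficient = gf_pow(2, index)
        for i, byte in enumerate(block):
            partial[i] ^= gf_mul(coefficient, byte)
    # Every nonzero element satisfies a**255 == 1, so a**254 is its inverse.
    inverse = gf_pow(gf_pow(2, missing), 254)
    return bytes(gf_mul(inverse, byte) for byte in partial)


data = [b"alpha", b"bravo", b"delta"]
p, q = pq_parity(data)
rebuilt = recover_from_q([data[0], None, data[2]], 1, q)
```

P needs one XOR per byte, while Q needs a field multiplication per byte on top of that, which is the extra computational load the RAID controller has to carry.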
Table 6 below is a quick summary of RAID-6 with a few highlights.
Table 6: RAID-6 highlights
| RAID level | Pros | Cons | Storage Efficiency | Minimum number of disks |
| RAID-6 | Excellent data redundancy/availability (can tolerate the loss of 2 drives); very good read performance since all of the drives can be read at the same time; can lose two drives without losing data | Write performance is not that good (worse than RAID-5); write performance for small I/O is not good at all; more computational horsepower is required for parity computations | (n-2)/n, where n is the number of drives | 4 (have to be identical) |
Hybrid RAID levels
As you can see, there are some limitations to each of the standard RAID
levels (0-6). Some of them have great performance (RAID-0) but pretty awful
data availability or redundancy, while others have very good data availability
and redundancy (RAID-6) but the performance is not so hot. So as you can
imagine, people started to wonder if they couldn't combine RAID levels to
combine features and perhaps achieve better performance while still having very
good data redundancy and availability. This led to what people called hybrid
RAID levels or what is more commonly called nested RAID levels.
The topic of nested RAID levels is fairly lengthy so I will save that for
another article. But the basic concept is to combine RAID levels in some
fashion. For example, a common configuration is called RAID 1+0 or RAID-10.
The first number (the furthest to the left) refers to the "bottom" or initial
part of the RAID array. Then the second number from the left refers to the "top"
level of the RAID array. The top-level RAID uses the bottom-level RAID
deployments as building blocks.
In the case of RAID-10, the approach is to use multiple pairs of drives at
the lowest level (RAID-1) and then to combine them using RAID-0. This retains
the goodness of RAID-1 for data availability and redundancy while gaining back
some performance from RAID-0 striping.
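As a rough sketch of the nesting, the code below maps a logical block to a mirrored pair (the RAID-0 step) and then to the two drives in that pair (the RAID-1 step). The drive numbering, where pair k uses drives 2k and 2k+1, and the function names are assumptions made for illustration.

```python
def raid10_drives_for_block(block, num_pairs):
    """Return the two drives holding a logical block.

    RAID-0 (top level) picks the pair; RAID-1 (bottom level) mirrors within it.
    """
    pair = block % num_pairs
    return (2 * pair, 2 * pair + 1)


def raid10_survives(failed_drives, num_pairs):
    """True if every mirrored pair still has at least one working drive."""
    failed = set(failed_drives)
    return all(
        not {2 * pair, 2 * pair + 1} <= failed for pair in range(num_pairs)
    )


# Four drives arranged as two mirrored pairs.
placement = [raid10_drives_for_block(block, 2) for block in range(4)]
one_per_pair = raid10_survives({0, 2}, 2)  # one drive lost from each pair
whole_pair = raid10_survives({0, 1}, 2)    # both drives of one pair lost
```

The array can ride out the loss of one drive in every pair, but losing both drives of the same pair takes the whole array down, because the RAID-0 layer has no redundancy of its own.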
Summary
This wraps up our introduction to RAID. For some people it may be new and for
others it will be review. Now that we've covered the basics, in coming articles
we will be looking at nested RAID in more depth, including RAID-01, RAID-5,
RAID-6, and RAID-10 deployments.
Have questions about RAID or topics you'd like to see covered? Post them in
the comments and we'll try to incorporate them as we dive deeper into redundant
arrays.