Transferred from: http://www.linuxidc.com/Linux/2015-08/122191.htm
RAID originally stood for Redundant Array of Inexpensive Disks, but it is now usually read as Redundant Array of Independent Disks. In the past even a very small disk was expensive; today a much larger disk can be bought cheaply. RAID is a collection of disks combined into a single logical volume.
Understanding RAID Settings in Linux
A RAID is made up of disks grouped into a set, also called an array. At least two disks are connected to a RAID controller and combined into one logical volume, and a group can contain more drives than that. A given set of disks can use only one RAID level. RAID can improve server performance, and performance varies with the RAID level. Depending on the level, it also protects our data through fault tolerance and high availability.
This series, named "Using RAID under Linux", is divided into 9 parts covering the following topics:
- Part 1: Introduction to RAID levels and concepts
- Part 2: How to set up RAID 0 (striping) in Linux
- Part 3: How to set up RAID 1 (mirroring) in Linux
- Part 4: How to set up RAID 5 (striping with distributed parity) in Linux
- Part 5: How to set up RAID 6 (striping with dual distributed parity) in Linux
- Part 6: Setting up RAID 10 or 1 + 0 (nested) in Linux
- Part 7: Growing an existing RAID array and removing failed disks
- Part 8: Recovering (rebuilding) a failed drive in RAID
- Part 9: Managing RAID in Linux
This is Part 1 of the 9-part series. It covers the RAID concepts and RAID levels you need to understand before building a RAID on Linux.
Software RAID and hardware RAID
Software RAID has lower performance because it consumes the host's resources. The RAID software must be loaded before data can be read from a software RAID volume, which means the operating system has to boot far enough to load it first. Software RAID needs no dedicated hardware, so it requires no extra investment.
Hardware RAID has higher performance. It uses a dedicated RAID controller, physically provided as a PCI Express card, and does not consume host resources. The controller has NVRAM to cache reads and writes. A battery backup preserves the cache contents if power fails, including while the cache is being used for a RAID rebuild. For large-scale use it is a very expensive investment.
[Figure: a hardware RAID card]
Important RAID Concepts
- Parity is information computed from the data and stored alongside it; during a RAID rebuild it is used to regenerate lost content. RAID 5 and RAID 6 are based on parity.
- Striping splits data into chunks and spreads them across multiple disks, so no single disk holds the full data. If we use 2 disks, each disk stores half of our data.
- Mirroring is used in RAID 1 and RAID 10. It automatically keeps a copy of the data: in RAID 1, the same content is also written to the other disk.
- A hot spare is a standby drive in the server that can automatically take the place of a failed drive. If any drive in the array fails, the hot spare is used automatically to rebuild the RAID.
- A chunk is the smallest unit of data that the RAID controller reads or writes at a time, with a minimum of 4KB. Choosing a suitable chunk size can improve I/O performance.
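The regeneration of lost content described in the first item above can be sketched with XOR, the operation commonly used for single parity. This is a minimal illustration rather than how any particular controller is implemented; the chunk contents and the helper name `xor_blocks` are made up for the example.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three data chunks that would sit on three different disks (made-up contents).
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)  # would be stored on a fourth disk

# Pretend the disk holding data[1] failed: XOR the survivors with the parity.
rebuilt = xor_blocks([data[0], data[2], parity])
print(rebuilt)  # b'BBBB'
```

Because XOR-ing a value twice cancels it out, any single missing chunk can be recovered from the remaining chunks plus the parity, but two missing chunks cannot.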
There are different RAID levels. Here we list only the levels most used in real-world environments.
- RAID 0 = Striping
- RAID 1 = Mirroring
- RAID 5 = Single distributed parity
- RAID 6 = Dual distributed parity
- RAID 10 = Mirroring + striping (nested RAID)
On most Linux distributions RAID is managed with a package called mdadm. Let's first get to know each RAID level.
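As a rough preview of the trade-offs described in the sections that follow, the usable capacity and guaranteed fault tolerance of each level can be estimated with a few lines of arithmetic. This is a simplified sketch: the function name `usable_capacity` is hypothetical, all disks are assumed to be the same size, RAID 10 is assumed to be built from two-way mirrors, and the minimum disk counts are the commonly quoted ones.

```python
def usable_capacity(level, disks, size_tb):
    """Return (usable TB, number of disk failures that are always survivable)."""
    if level == 0:
        minimum, usable, tolerated = 2, disks * size_tb, 0
    elif level == 1:
        minimum, usable, tolerated = 2, size_tb, disks - 1
    elif level == 5:
        minimum, usable, tolerated = 3, (disks - 1) * size_tb, 1
    elif level == 6:
        minimum, usable, tolerated = 4, (disks - 2) * size_tb, 2
    elif level == 10:
        if disks % 2:
            raise ValueError("this simplified model needs an even disk count")
        # Striped two-way mirrors: one failure is always survivable,
        # a second one only if it hits a different mirror pair.
        minimum, usable, tolerated = 4, disks // 2 * size_tb, 1
    else:
        raise ValueError(f"unsupported RAID level: {level}")
    if disks < minimum:
        raise ValueError(f"RAID {level} needs at least {minimum} disks")
    return usable, tolerated

for level, disks, size in [(0, 2, 2), (1, 2, 2), (5, 4, 1), (6, 6, 1), (10, 4, 1)]:
    usable, tolerated = usable_capacity(level, disks, size)
    print(f"RAID {level}: {disks} x {size}TB -> {usable}TB usable, "
          f"survives {tolerated} failure(s)")
```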
RAID 0/Stripe
Striping gives good performance. In RAID 0 (striping), data is split into chunks and written across the disks: with two disks, half of the content goes to one disk and the other half to the other.
Suppose we have 2 disk drives. If we write the data "TECMINT" to the logical volume, "T" is saved on the first disk, "E" on the second disk, "C" on the first disk, "M" on the second disk, and so on in rotation. (LCTT translator's note: in practice data is not sliced byte by byte; it is sliced by data block.)
In this setup, if either drive fails we lose the data, because the half of the data left on one disk is not enough to rebuild the RAID. In terms of write speed and performance, however, RAID 0 is very good. Creating RAID 0 (striping) requires a minimum of 2 disks. If your data is valuable, do not use this RAID level.
- High performance.
- No capacity is lost in RAID 0.
- No fault tolerance.
- Both writes and reads are fast.
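The round-robin placement in the two-disk example above can be sketched as follows. The one-byte chunk size only mirrors the letter-by-letter illustration; real arrays stripe by much larger chunks, and the helper names `stripe` and `read_back` are hypothetical.

```python
def stripe(data, disks=2, chunk_size=1):
    """Distribute data over `disks` lists of chunks, round-robin."""
    layout = [[] for _ in range(disks)]
    for i in range(0, len(data), chunk_size):
        layout[(i // chunk_size) % disks].append(data[i:i + chunk_size])
    return layout

def read_back(layout):
    """Reassemble the original data by visiting the disks in turn."""
    out = []
    longest = max(len(disk) for disk in layout)
    for row in range(longest):
        for disk in layout:
            if row < len(disk):
                out.append(disk[row])
    return b"".join(out)

layout = stripe(b"TECMINT")
print(layout)             # [[b'T', b'C', b'I', b'T'], [b'E', b'M', b'N']]
print(read_back(layout))  # b'TECMINT'
```

Each disk holds only every other chunk, which is why losing either disk leaves nothing to rebuild from.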
RAID 1/Mirroring
Mirroring also performs well. It keeps an identical copy of our data. Suppose we have two 2TB hard drives, 4TB in total. In a mirror, the drives behind the RAID controller form one logical drive, and we see only 2TB of usable space.
Whatever we save is written to both 2TB drives at the same time. Creating RAID 1 (mirroring) requires a minimum of two drives. If a disk fails, we can restore the RAID by replacing it with a new disk. Because the same data is present on the other disk, a single disk failure causes no data loss.
- Good performance overall.
- Half of the total capacity is lost.
- Full fault tolerance against a single disk failure.
- Rebuilds are faster.
- Writes are slower.
- Reads are faster.
- Can be used for operating systems and small databases.
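The behaviour described above (every write goes to both drives, reads survive a single failure, and a replacement disk is rebuilt by copying) can be modelled with a toy class. This is only an illustration with hypothetical names; dictionaries stand in for disks and `None` marks a failed member.

```python
class Mirror:
    """Toy RAID 1: every block is written to every healthy member disk."""

    def __init__(self, members=2):
        self.disks = [dict() for _ in range(members)]  # None marks a failed disk

    def write(self, block_no, data):
        for disk in self.disks:
            if disk is not None:
                disk[block_no] = data

    def read(self, block_no):
        # Any healthy member can serve the read.
        for disk in self.disks:
            if disk is not None:
                return disk[block_no]
        raise IOError("all mirror members have failed")

    def fail(self, index):
        self.disks[index] = None

    def replace(self, index):
        """Rebuild a replacement disk by copying from a surviving member."""
        source = next(d for d in self.disks if d is not None)
        self.disks[index] = dict(source)

m = Mirror()
m.write(0, b"TECMINT")
m.fail(0)
print(m.read(0))  # b'TECMINT', served by the surviving disk
m.replace(0)      # the new disk is rebuilt from the survivor
```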
RAID 5/Distributed parity
RAID 5 is mostly used at the enterprise level. It works with distributed parity: the parity information is used to reconstruct data from the information remaining on the healthy drives. This protects our data when a drive fails.
Suppose we have 4 drives. If one drive fails and we replace it, the data can be rebuilt onto the replacement drive from the parity. The parity information is spread across all 4 drives: with four 1TB drives, about 256GB of each drive holds parity and the other 768GB holds user data. RAID 5 keeps working after a single drive failure, but if more than one drive fails, data is lost.
- Excellent performance.
- Read speed is very good.
- Write speed is average, and slow without a hardware RAID controller.
- Rebuilds use the parity information from all drives.
- Tolerates the failure of a single drive.
- The capacity of 1 disk is used for parity.
- Can be used for file servers, web servers, and very important backups.
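The way parity is spread over all drives, and the 256GB / 768GB split for four 1TB drives, can be sketched as below. The rotation used here is just one simple scheme chosen for illustration (an assumption, not necessarily the layout your controller or mdadm uses), and the function names are hypothetical.

```python
def parity_disk(stripe_no, disks):
    """Index of the disk holding parity for a stripe (simple rotation, an assumption)."""
    return (disks - 1 - stripe_no) % disks

def layout(stripes, disks):
    """Rows of 'P' (parity) and 'D' (data) labels, one row per stripe."""
    return [["P" if d == parity_disk(s, disks) else "D" for d in range(disks)]
            for s in range(stripes)]

def parity_share(disk_gb, disks):
    """GB of parity and GB of data held by each disk when parity is spread evenly."""
    parity = disk_gb // disks
    return parity, disk_gb - parity

for row in layout(4, 4):
    print(" ".join(row))
# D D D P
# D D P D
# D P D D
# P D D D

print(parity_share(1024, 4))  # (256, 768)
```

Every stripe has exactly one parity chunk, so over many stripes each of the 4 drives ends up holding a quarter of the parity.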
RAID 6 Dual distributed parity disk
RAID 6 is similar to RAID 5 but has two sets of distributed parity. It is mostly used in large arrays. It needs a minimum of 4 drives, and even if 2 drives fail we can still rebuild the data after replacing them.
It is slower than RAID 5 because two sets of parity must be computed and written for every stripe. With a hardware RAID controller the speed is average. If we have six 1TB drives, the capacity of 4 drives is used for data and the capacity of 2 drives for parity.
- Poorer performance than RAID 5.
- Read performance is very good.
- Write performance is poor without a hardware RAID controller.
- Rebuilds from the two sets of parity.
- Tolerates the failure of up to 2 drives.
- The capacity of 2 disks is used for parity.
- Can be used in large arrays.
- Used for backups, video streaming, and other large-scale purposes.
RAID 10/Mirror + stripe
RAID 10 is also written 1 + 0 and should not be confused with RAID 01 (0 + 1). Both combine mirroring and striping. In RAID 10, the mirrors are created first and then striped together. In RAID 01, the stripes are created first and then mirrored. RAID 10 is better than RAID 01.
Let's say we have 4 drives. When data is written to the logical volume, it is saved across all 4 drives using both mirroring and striping.
If we write the data "TECMINT" to RAID 10, it is saved as follows. Through mirroring, "T" is written to two disks at the same time, "E" is likewise written to the other two disks, and every piece of data ends up on two disks, so each piece has a copy on a different disk.
At the same time, the data is striped as in RAID 0: "T" goes to the first pair of disks, "E" to the second pair, "C" to the first pair again, and "M" to the second pair.
- Good read and write performance.
- Half of the total capacity is lost.
- Fault tolerant.
- Fast rebuilds from the mirrored copy.
- Because of its high performance and high availability, it is often used for database storage.
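The combination of mirroring and striping in the 4-drive example can be sketched as follows, together with a check of which two-disk failures such an array survives. This is a toy model with hypothetical names: disks 0 and 1 are assumed to form the first mirror pair, disks 2 and 3 the second, and the one-byte chunk size only follows the letter-by-letter illustration.

```python
from itertools import combinations

def raid10_write(data, pairs=2, chunk_size=1):
    """Stripe chunks across mirror pairs; both disks of a pair get the same chunk."""
    disks = [[] for _ in range(pairs * 2)]
    for n, i in enumerate(range(0, len(data), chunk_size)):
        pair = n % pairs
        chunk = data[i:i + chunk_size]
        disks[2 * pair].append(chunk)      # first disk of the pair
        disks[2 * pair + 1].append(chunk)  # its mirror
    return disks

def survives(failed, pairs=2):
    """The array survives as long as no mirror pair loses both of its disks."""
    return all(not {2 * p, 2 * p + 1} <= set(failed) for p in range(pairs))

disks = raid10_write(b"TECMINT")
print(disks[0])  # [b'T', b'C', b'I', b'T'], identical on disk 1
print(disks[2])  # [b'E', b'M', b'N'], identical on disk 3

ok = [f for f in combinations(range(4), 2) if survives(f)]
print(ok)  # [(0, 2), (0, 3), (1, 2), (1, 3)]
```

In this model four of the six possible two-disk failures are survivable; the array is lost only when both disks of the same mirror pair fail.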
Conclusion
In this article we have seen what RAID is and which RAID levels are used in the real world. You need these basics before building a RAID, and the material above should give you a working understanding of them.
In the next article, I'll show how to set up and create RAID at the various levels, grow a RAID group (array), troubleshoot failed drives, and more.