Using RAID in Linux (1): an introduction to RAID levels and concepts
RAID originally stood for Redundant Array of Inexpensive Disks, but it is now usually expanded as Redundant Array of Independent Disks. In the past even a small disk was very expensive, but today a much larger disk can be bought at low cost. RAID is a collection of disks pooled together to form a logical volume.
Understanding RAID settings in Linux
A RAID may be called a group, a set, or an array. A group of drives combined together forms a RAID array or RAID set. At least two disks connected to a RAID controller form a logical volume, and more drives can be placed in the same group. A group of disks can use only one RAID level. RAID can improve server performance, and the performance varies with the RAID level. Depending on the level, it also protects our data through fault tolerance and high availability.
This series is named "Using RAID in Linux" and is divided into nine parts covering the following topics:
- Part 1: Introduction to RAID levels and concepts
- Part 2: How to set up RAID 0 (striping) in Linux
- Part 3: How to set up RAID 1 (mirroring) in Linux
- Part 4: How to set up RAID 5 (striping with distributed parity) in Linux
- Part 5: How to set up RAID 6 (striping with dual distributed parity) in Linux
- Part 6: Configuring RAID 10 or 1 + 0 (nested) in Linux
- Part 7: Growing an existing RAID array and removing failed disks
- Part 8: Recovering (rebuilding) a failed drive in RAID
- Part 9: Managing RAID in Linux
This is Part 1 of the nine-part tutorial series. Here we introduce the RAID concepts and RAID levels that you should understand before building RAID in Linux.
Software RAID and hardware RAID
Software RAID has lower performance because it uses host resources. The RAID software must be loaded before data can be read from a software RAID volume, so the operating system has to boot far enough to load it first. Software RAID needs no dedicated hardware, so it requires no extra investment.
Hardware RAID has high performance. It uses a dedicated RAID controller, physically provided as a PCI Express card, and does not consume host resources. The controller has NVRAM to cache reads and writes, and the cache is also used during RAID rebuilds. If a power failure occurs, a backup battery preserves the cache contents. It is a very expensive investment for large-scale use.
[Figure: a hardware RAID card]
Important RAID concepts
- Parity is used during a RAID rebuild to regenerate lost content from the saved parity information. RAID 5 and RAID 6 are based on parity.
- Striping spreads data in slices across multiple disks, so no single disk holds the complete data. With two disks, each disk stores half of our data.
- Mirroring is used in RAID 1 and RAID 10. It keeps an automatic copy of the data: in RAID 1, the same content is saved to the other disk.
- A hot spare is a standby drive in our server that can automatically replace a faulty drive. If any drive in the array fails, the hot spare is used automatically to rebuild the RAID.
- A chunk (block) is the smallest unit of data that the RAID controller reads or writes, with a minimum of 4 KB. Choosing a suitable chunk size can improve I/O performance.
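To make the idea of the block (chunk) size more concrete, here is a minimal Python sketch of how a logical byte offset could be mapped to a disk in a striped layout. The function name `locate`, the 4 KB chunk size, and the two-disk setup are illustrative assumptions, not the behavior of any particular controller or of mdadm.

```python
def locate(offset, chunk_size, num_disks):
    """Map a logical byte offset to (disk index, byte offset on that disk)."""
    chunk_index = offset // chunk_size      # which chunk of the logical volume
    disk = chunk_index % num_disks          # chunks rotate across the disks
    stripe = chunk_index // num_disks       # full stripes that come before it
    disk_offset = stripe * chunk_size + offset % chunk_size
    return disk, disk_offset


CHUNK = 4 * 1024  # assumed 4 KB chunk size

for offset in (0, 4096, 8192, 10000):
    print(offset, locate(offset, CHUNK, num_disks=2))
```

With a larger chunk size, a long sequential read stays on one disk for longer before moving to the next, which is one reason the chunk size affects I/O behavior.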
RAID has different levels. Here, we only list the most used RAID levels in real environments.
- RAID 0 = striping
- RAID 1 = mirroring
- RAID 5 = single distributed parity
- RAID 6 = dual distributed parity
- RAID 10 = mirroring + striping (nested RAID)
RAID is managed using the package named mdadm on most Linux distributions. Let's first look at each RAID level.
RAID 0/Striping
Striping has good performance. In RAID 0 (striping), data is written to the disks in slices. Half of the content is stored on one disk, and the other half is written to the other disk.
Suppose we have two disk drives. If we write the data "TECMINT" to the logical volume, "T" is saved on the first disk, "E" on the second disk, "C" on the first disk, "M" on the second disk, and the cycle continues. (LCTT note: in practice data is not sliced byte by byte but by data block.)
In this case, if either drive fails, the data is lost, because the half that remains on the surviving drive is not enough to rebuild the RAID. In terms of write speed and performance, however, RAID 0 is very good. Creating RAID 0 requires at least two disks. If your data is valuable, do not use this RAID level.
- High performance.
- RAID 0 has zero capacity loss.
- Zero fault tolerance.
- Writing and reading have high performance.
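The "TECMINT" example can be sketched in a few lines of Python. This is a toy model under stated assumptions: each character stands in for one data block, the disks are plain lists, and the helper names `stripe` and `unstripe` are made up for illustration.

```python
def stripe(data, num_disks):
    """Distribute the units of `data` across the disks in round-robin order."""
    disks = [[] for _ in range(num_disks)]
    for i, unit in enumerate(data):
        disks[i % num_disks].append(unit)
    return disks


def unstripe(disks):
    """Reassemble the original data; every disk must be present."""
    if any(d is None for d in disks):
        raise IOError("RAID 0 has no redundancy: a missing disk means data loss")
    total = sum(len(d) for d in disks)
    n = len(disks)
    return "".join(disks[i % n][i // n] for i in range(total))


disks = stripe("TECMINT", 2)
print(disks)            # disk 0 holds T, C, I, T and disk 1 holds E, M, N
print(unstripe(disks))  # the volume reads back as TECMINT
```

Setting one of the entries in `disks` to `None` before calling `unstripe` raises an error, which mirrors the zero fault tolerance listed above.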
RAID 1/Mirroring
Mirroring also has good performance. Mirroring keeps an identical copy of our data. Suppose we have two 2 TB hard drives, for a total of 4 TB. With mirroring, the drives behind the RAID controller form a single logical drive, and we see only 2 TB of usable space on it.
When we save data, it is written to both 2 TB drives at the same time. Creating RAID 1 (mirroring) requires at least two drives. If a disk fails, we can restore the RAID by replacing it with a new disk. If any disk in RAID 1 fails, we can read the same data from the other disk, because both disks hold the same content. Therefore, there is no data loss.
- Good performance.
- The usable capacity is half of the total disk space.
- Fault tolerant: the data survives as long as one mirror remains.
- Rebuilding is fast.
- Write performance is slower.
- Read performance improves.
- It can be used in operating systems and small-scale databases.
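The mirroring behavior described above can be illustrated with a small Python model. The `Mirror` class and its methods are hypothetical names chosen for this sketch, and each disk is simply a dictionary from block number to data. It is not how mdadm or a hardware controller is implemented.

```python
class Mirror:
    """A toy RAID 1 set: every write goes to all member disks."""

    def __init__(self, num_disks=2):
        self.disks = [dict() for _ in range(num_disks)]  # block number -> data

    def write(self, block, data):
        for disk in self.disks:
            if disk is not None:
                disk[block] = data

    def fail(self, index):
        self.disks[index] = None  # simulate a dead drive

    def read(self, block):
        for disk in self.disks:
            if disk is not None:
                return disk[block]
        raise IOError("all mirrors have failed")

    def replace(self, index):
        """Rebuild a replaced drive by copying from a surviving mirror."""
        source = next(d for d in self.disks if d is not None)
        self.disks[index] = dict(source)


m = Mirror()
m.write(0, "TECMINT")
m.fail(0)
print(m.read(0))                 # still readable from the second disk
m.replace(0)
print(m.disks[0] == m.disks[1])  # the new disk matches the survivor
```

The rebuild is a straight copy from the surviving disk, which is why rebuilding a mirror needs no parity calculation.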
RAID 5/distributed parity
RAID 5 is mostly used at the enterprise level. RAID 5 works with distributed parity. The parity information is used to reconstruct data: lost content is rebuilt from the information on the remaining healthy drives. This protects our data when a drive fails.
Suppose we have four drives. If one drive fails and we replace it, the data can be recreated on the replacement drive from the parity information. The parity information is spread across all four drives. With four 1 TB drives, parity takes up the equivalent of one drive in total, that is, roughly a quarter of each drive, while the rest of each drive is available to the user. After a single drive failure, RAID 5 keeps working. If more than one drive fails, data will be lost.
- Excellent performance.
- The read speed is very good.
- The write speed is average. Without a hardware RAID controller, the write speed is slow.
- Rebuilds from the parity information spread across all drives.
- Tolerates the failure of one drive.
- The capacity of one disk is used for parity.
- It can be used for file servers, web servers, and very important backups.
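RAID 5 parity is conventionally computed with XOR, which is what makes the rebuild possible: XOR-ing the surviving blocks of a stripe yields the missing one. The sketch below assumes a single stripe on four drives with three data blocks and one parity block. The block contents and the helper name `xor_blocks` are illustrative, and the rotation of the parity block from stripe to stripe is left out.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equally sized byte blocks together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe on a four-drive array: three data blocks plus one parity block.
data = [b"TEC", b"MIN", b"T\x00\x00"]
parity = xor_blocks(data)
stripe = data + [parity]

# Simulate losing drive 1, then rebuild its block from the three survivors.
lost = 1
survivors = [blk for i, blk in enumerate(stripe) if i != lost]
rebuilt = xor_blocks(survivors)
print(rebuilt)  # b'MIN'
```

If two blocks of the same stripe are missing, a single XOR parity block is not enough to recover them, which matches the single-drive tolerance noted above.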
RAID 6/dual distributed parity
RAID 6 is similar to RAID 5, but it keeps two sets of distributed parity. It is mostly used in large arrays. We need at least four drives. Even if two drives fail, we can still replace the drives and recreate the data.
It is slower than RAID 5 because every write must also update two sets of parity. With a hardware RAID controller, the speed is average. If we have six 1 TB drives, the capacity of four drives is used for data storage and the capacity of two drives is used for parity.
- Overall performance is lower than RAID 5.
- Read performance is good.
- Without a hardware RAID controller, the write performance is poor.
- Rebuilds from the two sets of distributed parity.
- Tolerates the failure of up to two drives.
- The capacity of two disks is used for parity.
- It can be used for large arrays.
- Suited to large-scale backup and video streaming.
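The capacity figures quoted so far follow a simple rule per level, which can be written as a short Python helper. This is a sketch assuming equally sized disks. The function name `usable_capacity` is made up for illustration, and the RAID 10 case (described in the next section) assumes two-way mirrors.

```python
def usable_capacity(level, num_disks, disk_size_tb):
    """Return the usable capacity in TB for equally sized disks."""
    if level == 0:
        data_disks = num_disks         # no redundancy, nothing is lost
    elif level == 1:
        data_disks = 1                 # every disk holds the same copy
    elif level == 5:
        data_disks = num_disks - 1     # one disk's worth of parity
    elif level == 6:
        data_disks = num_disks - 2     # two disks' worth of parity
    elif level == 10:
        data_disks = num_disks // 2    # assumed two-way mirrors, then striped
    else:
        raise ValueError(f"unsupported RAID level: {level}")
    return data_disks * disk_size_tb


print(usable_capacity(1, 2, 2))  # two 2 TB drives in RAID 1
print(usable_capacity(5, 4, 1))  # four 1 TB drives in RAID 5
print(usable_capacity(6, 6, 1))  # six 1 TB drives in RAID 6
```

The three calls reproduce the examples from the text: 2 TB visible on the mirrored pair, three drives' worth of data on the four-drive RAID 5, and four drives' worth on the six-drive RAID 6.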
RAID 10/mirroring + striping
RAID 10 is also written as RAID 1 + 0. It combines mirroring and striping: in RAID 10, the drives are mirrored first and the mirrors are then striped. RAID 01 (0 + 1) works the other way around: it stripes first and then mirrors the stripes. RAID 10 is better than RAID 01.
Suppose we have four drives. When we write data to the logical volume, it is saved across all four drives using both mirroring and striping.
If we write the data "TECMINT" to RAID 10, it is saved as follows. "T" is written to two disks at the same time, "E" is written to the other two disks, and so on, so that every piece of data is written to two disks. In this way, each piece of data has a copy on another disk.
At the same time, the data is striped as in RAID 0: "T" goes to the first group of disks and "E" to the second group, then "C" goes to the first group and "M" to the second group.
- Good read/write performance.
- The usable capacity is half of the total disk space.
- Fault tolerant.
- Quickly rebuild from the copy data.
- Because of its high performance and high availability, it is often used in the storage of databases.
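The four-drive "TECMINT" example can be sketched in Python as well. As before, each character stands in for one data block. The function name `raid10_layout` and the pairing of disks 0/1 and 2/3 into mirror groups are assumptions made for illustration.

```python
def raid10_layout(data, num_disks=4):
    """Stripe units across mirror pairs; both disks of a pair store each unit."""
    if num_disks < 4 or num_disks % 2:
        raise ValueError("this sketch expects an even number of disks, at least four")
    pairs = num_disks // 2
    disks = [[] for _ in range(num_disks)]
    for i, unit in enumerate(data):
        pair = i % pairs                  # RAID 0 step: pick the mirror pair
        disks[2 * pair].append(unit)      # RAID 1 step: write both members
        disks[2 * pair + 1].append(unit)
    return disks


for n, content in enumerate(raid10_layout("TECMINT")):
    print(f"disk {n}: {''.join(content)}")
```

Disks 0 and 1 end up with identical content, as do disks 2 and 3, so losing one disk from each pair still leaves a full copy of the data.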
Conclusion
In this article, we have seen what RAID is and which RAID levels are most used in real environments. This basic knowledge is required before building a RAID, and the content above should cover what you need to understand.
In the following articles, I will show how to set up and use the various RAID levels, grow RAID groups (arrays), and troubleshoot drives.
Via: Introduction to RAID, Concepts of RAID and RAID Levels
Author: Babin Lonston Translator: strugglingyouth Proofreader: wxy
This article was originally translated by LCTT and is proudly presented by Linux China.