Using RAID under Linux (1): Introduction to RAID levels and concepts

Source: Internet
Author: User

RAID originally stood for Redundant Array of Inexpensive Disks, but it is now usually expanded as Redundant Array of Independent Disks. In the past even a very small disk was expensive, whereas today large disks are cheap. A RAID is a set of disks combined into a single logical volume.

A RAID is built from a group, set, or array of disks: a set of drives combined together makes up a RAID array or RAID set. At least two disks are connected to a RAID controller and presented as one logical volume, and more drives can be placed in the same group. A given set of disks can use only one RAID level. RAID can improve server performance, although performance varies with the RAID level. Depending on the level, it also protects our data through fault tolerance and high availability.

This series, named "Using RAID under Linux", is divided into 9 sections, including the following topics:

    • Part 1: Introduction to RAID levels and concepts

    • Part 2: How to set up RAID 0 (striping)

    • Part 3: How to set up RAID 1 (mirroring)

    • Part 5: How to set up RAID 6 (striping with dual distributed parity) in Linux

    • Part 6: Setting up RAID 10 or 1+0 (nested) in Linux

    • Part 7: Adding to an existing RAID array and removing damaged disks

    • Part 8: Recovering (rebuilding) a damaged drive in RAID

    • Part 9: Managing RAID in Linux

This is Part 1 of the 9-part series. It covers the RAID concepts and RAID levels that you need to understand before building a RAID on Linux.

Software RAID and hardware RAID

Software RAID generally performs worse because it uses the host's resources, such as CPU and memory. Data on a software RAID volume can be read only after the RAID software has been loaded, and the operating system must boot before it can load the RAID software. Software RAID needs no dedicated hardware, so it requires no extra investment.

Hardware RAID performs well. It relies on a dedicated RAID controller, physically provided as a PCI Express card, and does not use host resources. The controller has NVRAM for caching reads and writes. A battery backup preserves the cache contents if power fails, including during a RAID rebuild. For large-scale use, it is a very expensive investment.

Important RAID Concepts

  • Parity is a method of regenerating lost content during a RAID rebuild from stored parity information. RAID 5 and RAID 6 are based on parity.

  • Striping spreads data in slices across multiple disks, so no single disk holds the complete data. If we use 2 disks, each disk stores half of our data.

  • Mirroring is used in RAID 1 and RAID 10. It keeps an automatic copy of the data: in RAID 1, the same content is saved to the other disks in the array.

  • A hot spare is a spare drive in our server that can automatically replace a failed drive. If any drive in the array fails, the hot spare is automatically used to rebuild the RAID.

  • A block (chunk) is the smallest unit of data, at least 4KB, that the RAID controller reads or writes at a time. Choosing a suitable block size can improve I/O performance.
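To make the parity idea above concrete, here is a minimal sketch in Python. It assumes single XOR parity, which is the scheme commonly used for RAID 5. The block contents and the helper name `xor_blocks` are made up for illustration, and the sketch does not cover the second parity that RAID 6 adds.

```python
from functools import reduce


def xor_blocks(blocks):
    """Return the byte-wise XOR of a list of equally sized byte blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three data blocks, as they might sit on three data disks (hypothetical contents).
data = [b"TECM", b"INT!", b"RAID"]

# The parity block is the XOR of all data blocks.
parity = xor_blocks(data)

# Simulate losing the second disk, then rebuild its block
# from the surviving data blocks plus the parity block.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)

print(rebuilt == data[1])  # True: the lost block has been regenerated
```

Because XOR is its own inverse, XOR-ing the parity with every surviving block leaves exactly the missing block, which is why one parity block can cover the loss of any single disk.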

RAID comes in different levels. Here we list only the RAID levels that are most used in real-world environments.

    • RAID0 = Striped

    • RAID1 = Mirror

    • RAID5 = Single distributed parity

    • RAID6 = Dual distributed parity

    • RAID10 = Mirror + stripe. (Nested RAID)

On most Linux distributions, RAID is managed with a package named mdadm. Let's first get to know each RAID level.

RAID 0 / Striping

Striping performs very well. In RAID 0 (striping), data is written to the disks in slices: half of the content is placed on one disk, and the other half is written to the other disk.

Suppose we have 2 disk drives and we write the data "TECMINT" to the logical volume. "T" is saved on the first disk, "E" on the second disk, "C" on the first disk, "M" on the second disk, and so on in rotation. (Translator's note, LCTT: in practice data is never sliced by byte; it is sliced by data block.)
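The round-robin placement described above can be sketched in a few lines of Python. This is only an illustration: the function names `stripe` and `unstripe` are hypothetical, and the chunk size of one character is chosen to match the example, whereas a real array stripes in blocks of several kilobytes.

```python
def stripe(data, num_disks, chunk_size=1):
    """Distribute data across disks in round-robin chunks, RAID 0 style."""
    disks = [[] for _ in range(num_disks)]
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    for index, chunk in enumerate(chunks):
        disks[index % num_disks].append(chunk)
    return disks


def unstripe(disks):
    """Reassemble the data by reading one chunk from each disk in turn."""
    result = []
    for row in range(max(len(disk) for disk in disks)):
        for disk in disks:
            if row < len(disk):
                result.append(disk[row])
    return "".join(result)


striped = stripe("TECMINT", num_disks=2)
print(striped[0])         # ['T', 'C', 'I', 'T']
print(striped[1])         # ['E', 'M', 'N']
print(unstripe(striped))  # TECMINT
```

Reading the data back needs every disk, which is why the loss of any one drive makes the whole volume unreadable.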

In this situation, if either drive fails we lose all our data, because the half that remains on the other disk cannot be used to rebuild the RAID. In write speed and overall performance, however, RAID 0 is very good. Creating RAID 0 (striping) requires a minimum of 2 disks. If your data is valuable, do not use this RAID level.

    • High performance.

    • No capacity is lost in RAID 0.

    • Zero fault tolerance.

    • Both writes and reads are fast.

RAID 1 / Mirroring

Mirroring also performs well. It keeps an identical copy of our data. Suppose we have two 2TB hard drives, 4TB in total. Once they are mirrored, the drives behind the RAID controller form a single logical drive, and we see only 2TB on that logical drive.

When we save data, it is written to both 2TB drives at the same time. Creating RAID 1 (mirroring) requires a minimum of two drives. If either disk fails, the same data is still available on the other disk, and we can restore the RAID by replacing the failed disk with a new one. So there is no data loss.
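The behaviour described above (write to every disk, read from any survivor, copy back after a replacement) can be modelled with a toy Python class. The class name `MirrorSketch` and its methods are hypothetical and exist only for this illustration; they are not part of mdadm or any real RAID tool.

```python
class MirrorSketch:
    """Toy RAID 1 model: every write goes to all healthy member disks."""

    def __init__(self, num_disks=2):
        self.disks = [dict() for _ in range(num_disks)]
        self.failed = set()

    def write(self, block_no, payload):
        # The same payload is stored on every healthy disk.
        for i, disk in enumerate(self.disks):
            if i not in self.failed:
                disk[block_no] = payload

    def fail(self, disk_no):
        # A failed disk loses its contents and is skipped from now on.
        self.failed.add(disk_no)
        self.disks[disk_no] = {}

    def read(self, block_no):
        # Any healthy disk can serve the read.
        for i, disk in enumerate(self.disks):
            if i not in self.failed:
                return disk[block_no]
        raise IOError("all mirrors have failed")

    def replace(self, disk_no):
        # Rebuild the replaced disk by copying from a healthy mirror.
        source = next(d for i, d in enumerate(self.disks) if i not in self.failed)
        self.disks[disk_no] = dict(source)
        self.failed.discard(disk_no)


array = MirrorSketch()
array.write(0, "Tecmint")
array.fail(0)
survivor_copy = array.read(0)  # still readable from the second disk
array.replace(0)               # the new disk is filled from the survivor
print(survivor_copy, array.disks[0] == array.disks[1])
```

The rebuild here is a plain copy from the healthy disk, with no parity to compute.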

    • Good performance.

    • Half of the total capacity is lost.

    • Complete fault tolerance.

    • Rebuilds are fast.

    • Write performance is slow.

    • Read performance is good.

    • Can be used for operating systems and small-scale databases.

RAID 5 / Distributed parity

RAID 5 is mostly used at the enterprise level. It works with distributed parity: parity information is used to reconstruct data, rebuilding it from the information that remains on the healthy drives. This protects our data if a drive fails.

Suppose we have 4 drives. If one drive fails and we replace it, the data can be reconstructed onto the replacement drive from the parity. Parity information is spread across all 4 drives: with four 1TB drives, 256GB of each drive holds parity and the remaining 768GB holds user data. RAID 5 keeps working after a single drive failure, but data is lost if more than one drive fails.
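The arithmetic in this example can be checked with a short Python sketch. It assumes that 1TB means 1024GB and that parity is spread evenly over the drives. The function name `raid5_capacity` is hypothetical, and the three-drive minimum is the usual requirement for RAID 5 rather than something stated in this article.

```python
def raid5_capacity(num_drives, drive_size_gb):
    """Return (parity_per_drive, data_per_drive, usable_total) in GB for RAID 5."""
    if num_drives < 3:
        # Assumption: RAID 5 is normally built from at least three drives.
        raise ValueError("RAID 5 needs at least 3 drives")
    # One drive's worth of parity, spread evenly across all drives.
    parity_per_drive = drive_size_gb / num_drives
    data_per_drive = drive_size_gb - parity_per_drive
    usable_total = data_per_drive * num_drives
    return parity_per_drive, data_per_drive, usable_total


parity_gb, data_gb, usable_gb = raid5_capacity(num_drives=4, drive_size_gb=1024)
print(parity_gb, data_gb, usable_gb)  # 256.0 768.0 3072.0
```

The usable total equals three full drives, which matches the rule that one drive's worth of space goes to parity.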

    • Excellent performance.

    • Read speed is very good.

    • Write speed is average, and slow without a hardware RAID controller.

    • Rebuilds from the parity information on all drives.

    • Tolerates the failure of one drive.

    • The equivalent of 1 disk's capacity is used for parity.

    • Can be used for file servers, web servers, and very important backups.

RAID 6 / Dual distributed parity

RAID 6 is similar to RAID 5, but it keeps two sets of distributed parity. It is mostly used in arrays with a large number of drives. We need a minimum of 4 drives, and even if 2 drives fail we can still rebuild the data once the failed drives are replaced.

It is slower than RAID 5 because two sets of parity have to be calculated and written for every write. With a hardware RAID controller, the speed is average. If we have six 1TB drives, the capacity of 4 drives is used for data and the capacity of 2 drives is used for parity.

    • Poor write performance compared with RAID 5.

    • Read performance is very good.

    • Write performance is especially poor without a hardware RAID controller.

    • Rebuilds from two sets of parity.

    • Tolerates the failure of up to two drives.

    • The equivalent of 2 disks' capacity is used for parity.

    • Can be used with large arrays.

    • Used for backups and video streaming at large scale.

RAID 10 / Mirror + stripe

RAID 10 is also written RAID 1+0, and it has a close relative, RAID 0+1 (RAID 01). Both do two jobs, mirroring and striping. RAID 10 mirrors first and then stripes across the mirrors, whereas RAID 01 stripes first and then mirrors the stripes. RAID 10 is better than RAID 01.

Let's say we have 4 drives. When we write data to the logical volume, it is saved across all 4 drives using both mirroring and striping.

If we write the data "TECMINT" to RAID 10, it is saved as follows. "T" is written to two disks at the same time, "E" is written to the other two disks, and so on, so that every piece of data is written to two disks. In this way each piece of data is copied to a different disk.

At the same time the data is striped as in RAID 0: "T" is written to the first pair of disks and "E" to the second pair, then "C" to the first pair and "M" to the second pair.
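The combined placement can be sketched in Python as striping across mirrored pairs. This is an illustration only: the function name `raid10_layout` is hypothetical, it assumes 4 drives arranged as two 2-disk mirrors, and it places one character per chunk to match the example rather than real block sizes.

```python
def raid10_layout(data, num_pairs=2):
    """Stripe chunks across mirrored pairs; both disks in a pair get the same chunk."""
    pairs = [[[], []] for _ in range(num_pairs)]
    for index, chunk in enumerate(data):
        first_disk, second_disk = pairs[index % num_pairs]
        first_disk.append(chunk)   # the primary copy
        second_disk.append(chunk)  # the mirrored copy
    return pairs


pairs = raid10_layout("TECMINT")
for pair_no, (disk_a, disk_b) in enumerate(pairs, start=1):
    print(f"pair {pair_no}: disk A {disk_a} | disk B {disk_b}")
```

With this layout the first pair holds T, C, I, T on both of its disks and the second pair holds E, M, N on both of its disks, so losing one disk from a pair still leaves a full copy of that pair's chunks.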

    • Good read and write performance.

    • Half of the total capacity is lost.

    • Fault tolerant.

    • Rebuilds quickly by copying the mirrored data.

    • Because of its high performance and high availability, it is often used for database storage.
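The capacity and fault-tolerance figures quoted for each level above can be collected into one small Python helper for comparison. This is a sketch under stated assumptions: the function name `raid_summary` is hypothetical, all drives are the same size, RAID 10 is built from two-disk mirrored pairs, and the minimum drive counts for RAID 5 (3) and RAID 10 (4) are the customary ones rather than figures from this article. The tolerance value is the number of failures that is always survivable, not the best case.

```python
def raid_summary(level, num_drives, drive_size_tb):
    """Return (usable_tb, guaranteed_failures_tolerated) for a RAID level."""
    n = num_drives
    # level: (minimum drives, drives' worth of usable space, failures tolerated)
    table = {
        "RAID0": (2, n, 0),
        "RAID1": (2, 1, n - 1),
        "RAID5": (3, n - 1, 1),
        "RAID6": (4, n - 2, 2),
        "RAID10": (4, n // 2, 1),
    }
    if level not in table:
        raise ValueError(f"unknown level: {level}")
    min_drives, data_drives, tolerated = table[level]
    if n < min_drives:
        raise ValueError(f"{level} needs at least {min_drives} drives")
    if level == "RAID10" and n % 2:
        raise ValueError("this sketch assumes RAID 10 uses an even number of drives")
    return data_drives * drive_size_tb, tolerated


examples = [("RAID0", 2, 1), ("RAID1", 2, 2), ("RAID5", 4, 1), ("RAID6", 6, 1), ("RAID10", 4, 1)]
for level, drives, size_tb in examples:
    usable, tolerated = raid_summary(level, drives, size_tb)
    print(f"{level}: {drives} x {size_tb}TB -> {usable}TB usable, survives {tolerated} failure(s)")
```

The example configurations are the ones used in this article: two 2TB drives for RAID 1, four 1TB drives for RAID 5, and six 1TB drives for RAID 6.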

Conclusion

In this article we have seen what RAID is and which RAID levels are used in the real world. You need to know these basics before building a RAID, and the material above should give you a working understanding of it.

In the next article, I'll show you how to set up and use RAID at various levels, grow RAID groups (arrays), troubleshoot drives, and more.
