Article Title: Fault Tolerance Analysis of RAID-5 Disk Arrays
In the previous article, "Playing with disk arrays in Linux" (linux.chinaitlab.com/administer/778593.html), the author mentioned that RAID-5 is currently the most common type of disk array. In RAID-5 mode, data and parity information are distributed evenly across all of the hard disks. Therefore, if one hard disk is damaged, its contents can be restored from the other hard disks. However, when two or more hard disks fail at the same time, the data cannot be repaired. To make sure the disk array can do its job, the Linux system administrator needs to analyze its fault tolerance. Generally, a test is performed once every quarter or every half year, depending on how much data loss the company can tolerate, to ensure that the disk array works properly.
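The reason a single failed disk can be restored, while two cannot, can be illustrated with XOR parity. The following is a minimal sketch, assuming one stripe on a hypothetical three-disk array (two data blocks plus one parity block); the block contents and the helper name xor_blocks are made up for illustration and are not taken from any RAID tool.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe on a hypothetical three-disk array: two data blocks plus parity.
data_a = b"ABCD"
data_b = b"wxyz"
parity = xor_blocks([data_a, data_b])

# The disk holding data_b fails: XOR of the surviving blocks gives it back.
rebuilt_b = xor_blocks([data_a, parity])
assert rebuilt_b == data_b

# If both data disks failed, only the parity block would remain, and a single
# block is not enough to determine two unknown blocks.
```

The same relation holds whichever block of the stripe is lost, which is why one missing disk is tolerable and two are not.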
1. Test whether the disk array is working.
Sometimes it is necessary to verify that the disk array really works by performing a "destructive" test on it. The word "destructive" is in quotes because the test does not actually damage a hard disk. To test how the RAID-5 disk array behaves when a hard disk is damaged, first shut down the server, then unplug one of the hard disks so that the system cannot find it. The system will then treat this hard disk as damaged and will use the data on the other hard disks to reconstruct its contents. After Linux is restarted, the system can still start the RAID-5 disk array because only one hard disk has been removed. When the array is accessed, the missing data is reconstructed automatically; that is, with one hard disk missing, everything still runs normally. You can confirm this by viewing the status file of the disk array.
In Linux, the disk array status is available in /proc/mdstat. This file shows the number of hard disks currently active in the array and the serial number of each one. Although data cannot be restored if two or more hard disks are damaged at the same time, such cases are rare, because the probability of two hard disks being damaged at the same time is not very high. Therefore, the disk array offers a high level of data security.
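Checking the status file can be scripted. The sketch below assumes the common /proc/mdstat layout in which each array reports a counter such as [3/2] (expected members / active members); the sample text is illustrative rather than captured from a real machine, and the function name degraded_arrays is hypothetical.

```python
import re

# Illustrative status text for a three-disk RAID-5 array with one disk removed.
SAMPLE_MDSTAT = """\
Personalities : [raid5]
md0 : active raid5 hdc1[1] hdb1[0]
      4194048 blocks level 5, 32k chunk, algorithm 2 [3/2] [UU_]
unused devices: <none>
"""


def degraded_arrays(mdstat_text):
    """Return {array_name: (expected, active)} for arrays missing a member."""
    result = {}
    current = None
    for line in mdstat_text.splitlines():
        header = re.match(r"^(md\d+)\s*:", line)
        if header:
            current = header.group(1)  # remember which array the next lines describe
            continue
        counts = re.search(r"\[(\d+)/(\d+)\]", line)
        if counts and current:
            expected, active = int(counts.group(1)), int(counts.group(2))
            if active < expected:
                result[current] = (expected, active)
    return result


print(degraded_arrays(SAMPLE_MDSTAT))
```

On a live system the same function could be fed the contents of /proc/mdstat, for example with open("/proc/mdstat").read().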
Deploying disk arrays in Linux has another benefit. Even if two hard disks are damaged and the data cannot be repaired, the Linux operating system can still be started. When two or more hard disks are damaged, the system issues a warning after it is restarted. The administrator can log on with the root account, rename the /etc/raidtab file, and restart the system to log on to Linux normally. Unfortunately, the data on the disk array cannot be recovered.
2. Enhance the security of the disk array by using a backup hard disk.
Data cannot be recovered when two or more hard disks are damaged. Therefore, the Linux system administrator may consider attaching a backup (spare) hard disk to the Linux system. Normally this hard disk is not an active part of the disk array. When a hard disk in the array is damaged, the backup hard disk is used as its replacement. In this case, even if the administrator does not notice the damaged hard disk right away, the impact on the enterprise will not be great.
Most disk array technologies support backup hard disks. When a hard disk stops working properly, the disk array immediately brings the backup hard disk into service and uses its fault tolerance mechanism to restore the data to a normal state. However, not all disk array modes support backup hard disks. The previous article introduced several common disk array modes, including Linear mode. This mode does not distribute data across the disks; it simply continues on the next hard disk when the current one is full. It therefore has no fault tolerance mechanism, and a backup hard disk is of no use. In RAID-5 mode, by contrast, data is stored in a distributed manner, which gives it high fault tolerance, so configuring an additional backup hard disk further improves the security of the disk array. When the operating system starts, the backup hard disk is started as well, but no data is stored on it. Data is written to the backup hard disk only when another hard disk is damaged.
To add a backup hard disk to an existing disk array, the system administrator modifies the disk array configuration file, /etc/raidtab. The change is simple: generally, you only need to add two statements to the configuration file.
The first statement is nr-spare-disks 1. It declares that the array has one backup (spare) hard disk. Unless another hard disk is damaged, the disk array does not store data on the spare.
The second statement is device /dev/had, which specifies the partition name of the backup hard disk (substitute the actual partition on your system). Disk array technology in Linux differs considerably from that in Microsoft operating systems: in Microsoft operating systems the unit is the hard disk, whereas in Linux it is the partition. Therefore, in Linux a disk array can be built even with a single hard disk, although it then provides none of the real benefits such as fault tolerance. This is why you must specify the partition name rather than just the hard disk.
There is one more point to note. Some Linux administrators worry that multiple hard disks may be damaged at the same time, so they configure multiple backup hard disks for the disk array. When a hard disk is damaged, the system then has to decide which backup hard disk to use first. You can use the spare-disk statement to set this order. A serial number of 0 indicates that the backup hard disk is the first one the disk array will use. The other settings are the same as in a general disk array configuration, so we will not repeat them here.
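The statements described above can be put together as follows. This is only a sketch that prints the spare-related lines of a raidtab section; the partition names /dev/hde1 and /dev/hdf1 and the helper spare_stanza are hypothetical, and the assumption that spare-disk 0 is used first follows the description in this article. In a complete /etc/raidtab these lines sit inside the raiddev section alongside the raid-disk entries.

```python
def spare_stanza(spare_partitions):
    """Build the raidtab lines that declare spare partitions, in priority order."""
    lines = [f"nr-spare-disks {len(spare_partitions)}"]
    for index, partition in enumerate(spare_partitions):
        lines.append(f"device {partition}")    # partition that acts as a spare
        lines.append(f"spare-disk {index}")    # 0 = first spare to be used
    return "\n".join(lines)


# Two hypothetical spare partitions; /dev/hde1 would be used first.
print(spare_stanza(["/dev/hde1", "/dev/hdf1"]))
```

Generating the lines from an ordered list keeps the nr-spare-disks count and the spare-disk numbering consistent with each other, which is easy to get wrong when editing the file by hand.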
Note, however, that the backup hard disk does not take effect as soon as it is added to the configuration file. After modifying the configuration file, use the raidstop command to stop the disk array, and then use the mkraid command to reinitialize it. During initialization, the system synchronizes the data on all hard disks in the background, so the duration depends on the amount of data; with a large amount of data, it may take a long time. Do not restart the Linux system during this process. Once a backup hard disk is configured, if a hard disk in the array is damaged, the system automatically brings the backup hard disk into service and rebuilds its contents from the data on the other hard disks. The rebuilt data is equivalent to a copy of the data on the damaged hard disk. In this way, the fault tolerance of the disk array is improved.
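The automatic takeover by a backup hard disk can be modeled in a few lines. The sketch below is a simplified simulation, not how the kernel implements it: each disk holds a single block, the partition names are hypothetical, and fail_and_rebuild is an invented helper. It only shows that the first spare in the list is chosen and that its contents end up identical to those of the failed disk.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def fail_and_rebuild(members, spares, failed):
    """Replace a failed member with the first spare and rebuild its contents.

    members: dict mapping partition name to its block (data or parity)
    spares:  list of spare partition names; index 0 is used first
    """
    if failed not in members:
        raise KeyError(f"{failed} is not a member of the array")
    if not spares:
        raise RuntimeError("no spare available; the array stays degraded")
    survivors = [block for name, block in members.items() if name != failed]
    spare = spares.pop(0)
    rebuilt = {name: block for name, block in members.items() if name != failed}
    rebuilt[spare] = xor_blocks(survivors)  # reconstructed copy of the lost block
    return rebuilt


d0, d1 = b"ABCD", b"wxyz"
members = {"hdb1": d0, "hdc1": d1, "hdd1": xor_blocks([d0, d1])}
spares = ["hde1", "hdf1"]

after = fail_and_rebuild(members, spares, "hdc1")
assert after["hde1"] == d1   # the spare now holds a copy of the failed disk's data
assert spares == ["hdf1"]    # the second spare remains available
```

After the rebuild the array is back to three healthy members, so it can again survive the loss of one more disk.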