Common threads: Software RAID in the new Linux 2.4 kernel, Part 2
By Daniel Robbins
Setting up RAID-1 in a production environment
The 2.4 kernel has finally been released, so now is a good time to find a spare PC, install Linux on it, and see what it can do. In this two-part series, Daniel Robbins introduces Linux 2.4 software RAID, a technology that increases disk performance and reliability by distributing data across multiple disks. In this article, he explains what RAID-1, 4, and 5 can and cannot do and how to implement these RAID levels in a production environment. He then walks you through a simulated failure and replacement of a RAID-1 drive.
RAID in reality
In my previous article, I introduced the software RAID functionality in Linux 2.4 and explained how to create linear, RAID-0, and RAID-1 volumes. This article covers using RAID-1 in a production environment to improve availability. Compared with setting up RAID-1 on a test box or at home, this requires a deeper understanding of RAID: in particular, you need to know exactly what protection RAID-1 provides and how to keep a RAID volume up and running when a disk fails. In this article I discuss what RAID-1, 4, and 5 can and cannot do, and then walk through a complete simulated replacement of a failed RAID-1 drive, one that I strongly encourage you to perform yourself. After completing the simulation, you will have all the experience needed to handle a real RAID-1 failure.
What RAID cannot do
RAID fault tolerance is designed to protect you against the occasional drive failure, and for that purpose it works very well. But RAID is not an ideal solution for every reliability problem. Before implementing fault-tolerant RAID (levels 1, 4, or 5) in a production environment, it is important to know exactly what RAID can and cannot do: when you find yourself depending on RAID, you do not want to hold mistaken beliefs about what it will do for you. Let us start by clearing up some common misconceptions about RAID 1, 4, and 5.
Many people assume that if all their important data is stored on a RAID 1/4/5 volume, they no longer need to perform regular backups. This is completely false, and here is why. RAID 1/4/5 helps you avoid unplanned downtime caused by the occasional drive failure, but it does nothing to protect you against accidental or malicious data corruption. If you type `cd /; rm -rf *` as root on a RAID volume, you will lose a great deal of important data in seconds, and a ten-drive RAID-5 configuration will not help you one bit. Likewise, RAID will not help if your server is stolen or your building catches fire. And without a backup strategy, you will have no archive of historical data: if a colleague deletes a batch of important files, you will have no way to restore them. This alone should be enough to convince you that, in most cases, you should plan and implement a backup strategy before even considering RAID-1, 4, or 5.
Another misconception is that it makes sense to run software RAID on a system built from mediocre hardware. If you are assembling a server for any important task, it is only reasonable to buy the best-quality hardware your budget allows. If your system is unstable or poorly cooled, you will find yourself in trouble that RAID cannot fix. Similarly, RAID obviously cannot keep you running through a power failure, so if the server is going to perform any important role, make sure it is connected to an uninterruptible power supply (UPS).
Next, a point about file systems. The file system lives on top of the software RAID volume, which means that software RAID does nothing to protect you from file system problems. For example, if you happen to be using a non-journaling file system such as ext2, an unclean shutdown can still mean a time-consuming file system check. Software RAID therefore does not improve the reliability of ext2 at all, which is one reason journaling file systems such as ReiserFS, JFS, and XFS receive so much attention in the Linux camp. Software RAID plus a reliable journaling file system is an ideal combination.
Implementing RAID intelligently
I hope the previous section has cleared up any mistaken ideas you held about RAID. When implementing RAID-1, 4, or 5, it is important to view it as a technique for extending uptime. Once you implement one of these RAID levels, you are protected against one very specific scenario: the unexpected, complete failure of one or more drives. When that happens, software RAID lets the system keep running while you schedule a replacement for the failed drive. In other words, implementing RAID 1, 4, or 5 trades the risk of prolonged unplanned downtime from a full drive failure for a short planned outage, just long enough to swap in the new drive. Obviously, this means that if high availability is not a priority for your system, there is little reason to implement software RAID unless you plan to use it primarily as a way of improving file I/O performance.
Savvy system administrators use RAID for one specific purpose: to make an already reliable server even more reliable. If you are a savvy system administrator, you already have the fundamentals covered. You have a regular backup plan in place to protect your organization from disaster. Your server is connected to a UPS, and UPS monitoring software is running so that the machine shuts down cleanly during an extended power outage. You are probably using a journaling file system such as ReiserFS to shorten file system checks and improve file system reliability and performance. Ideally, your server is well cooled and built from quality hardware, and you have been paying close attention to security. Only then should you consider implementing software RAID-1, 4, or 5, which protects the server against full drive failures and may thereby extend its uptime by a few percentage points. Software RAID is an extra layer of protection that makes an already stable server more robust.
Preparing for the RAID-1 simulation
Now that you know what RAID can and cannot do, I hope you have reasonable expectations and the right attitude toward it. In this section, I will show you how to simulate a disk failure and then bring your RAID volume back out of degraded mode. If at all possible, set up a RAID-1 volume on a test machine and walk through the simulation along with me; I strongly recommend it. The simulation is genuinely interesting, and taking the time now means you will stay calm and know exactly what to do when a drive really fails.
OK, first set up a RAID volume; if you need a refresher on how to do that, see my previous article. For this test, you must set up your RAID-1 volume so that you can disconnect one of its hard drives, which is how we will simulate a drive failure, and still boot your Linux system.
Once you have set up your volume, executing `cat /proc/mdstat` should produce output similar to the sample listing.
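The original sample listing is not reproduced here. On a 2.4 kernel with devfs, with the two mirrored partitions described below, the output during the initial resync typically looks something like the following; the device paths, block counts, and progress figures are illustrative, not from the original article:

```
Personalities : [linear] [raid0] [raid1]
read_ahead 1024 sectors
md0 : active raid1 ide/host2/bus0/target0/lun0/part1[1] ide/host0/bus0/target0/lun0/part5[0]
      90069632 blocks [2/2] [UU]
      [==>..................]  resync = 12.5% (11263712/90069632) finish=23.2min
unused devices: <none>
```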
Note the very long device names; that is because this system uses devfs. I am actually using /dev/hda5 and /dev/hde1 as the RAID-1 disks. At this point, the kernel's software RAID code is synchronizing the two drives so that they become exact mirrors of each other. If your RAID-1 volume looks healthy, go ahead to the next step: create a file system on the volume and mount it somewhere. Copy some files to the volume, and set up /etc/fstab so that the volume (/dev/md0) is mounted automatically at boot. Here is the line I added to my fstab; the line you add may differ slightly:
/dev/md0    /mnt/raid1    reiserfs    defaults    0 0
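The file system creation and mounting steps above look like this at a root shell (`# ` is the root prompt). This is a sketch assuming ReiserFS and a /mnt/raid1 mount point, as used elsewhere in this article; adjust to taste:

```
# mkreiserfs /dev/md0
# mkdir /mnt/raid1
# mount /dev/md0 /mnt/raid1
# cp -a /etc/* /mnt/raid1/
```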
Now we are almost ready to simulate a drive failure, but not quite. First, run `cat /proc/mdstat` again and wait until the two disks in the volume are fully synchronized. After synchronization completes, /proc/mdstat will look similar to the sample listing.
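Again, the original listing is missing; after a successful resync the output generally resembles the following (device names and sizes are illustrative), with the "[UU]" string indicating that both mirror halves are up:

```
Personalities : [linear] [raid0] [raid1]
read_ahead 1024 sectors
md0 : active raid1 ide/host2/bus0/target0/lun0/part1[1] ide/host0/bus0/target0/lun0/part5[0]
      90069632 blocks [2/2] [UU]
unused devices: <none>
```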
Starting the simulation
Now that synchronization is complete, we are ready for the simulation. Go ahead: shut down the machine and cut the power. Then open the case and disconnect one of the hard disks that make up the RAID-1 array. Of course, do not disconnect the disk containing your Linux root partition; we will need that one to boot Linux again! With the disk disconnected, restart the machine. After logging in, you should find that /dev/md0 is mounted and the volume is still usable. Running `cat /proc/mdstat` will reveal a major change:
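The degraded-mode listing was also lost from this copy of the article; with /dev/hde disconnected as described next, the output looks roughly like this (illustrative names and sizes):

```
Personalities : [linear] [raid0] [raid1]
read_ahead 1024 sectors
md0 : active raid1 ide/host0/bus0/target0/lun0/part5[0]
      90069632 blocks [2/1] [U_]
unused devices: <none>
```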
As you can see, the /dev/md0 volume is now running in degraded mode. I disconnected drive /dev/hde, so the kernel could not find /dev/hde1 when it booted and tried to autostart the array. Fortunately, the kernel did find /dev/hda5, and it was able to start /dev/md0 in degraded mode. As you can see, the /dev/hde1 partition no longer appears in /proc/mdstat, and one RAID disk is marked as missing ("[U_]" rather than "[UU]"). But since /dev/md0 is still running, software RAID-1 is doing exactly what it is supposed to do: keeping your data available.
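A quick way to spot a degraded array from a script is to look for an underscore inside the "[UU]"-style status string of /proc/mdstat. This is a minimal sketch, not from the original article; it runs against a saved sample in a hypothetical file `mdstat.sample` rather than the live /proc/mdstat, so it can be tried on any machine:

```shell
# Save a sample of degraded-array output.
# On a real system you would read /proc/mdstat directly.
cat > mdstat.sample <<'EOF'
md0 : active raid1 ide/host0/bus0/target0/lun0/part5[0]
      90069632 blocks [2/1] [U_]
EOF

# A "_" inside the [..] status brackets means a member disk is missing.
if grep -q '\[U*_[U_]*\]' mdstat.sample; then
    echo "md0 is degraded"
else
    echo "md0 is healthy"
fi
```

Running this against the sample prints "md0 is degraded"; against a healthy "[UU]" line, the pattern does not match and it prints "md0 is healthy".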
Recovery
We have now experienced a simulated drive failure. If one of the drives had really died while the system was running, this is exactly the situation we would be in: our RAID-1 volume running in degraded mode, meaning the volume is still usable but has no redundancy. At a convenient time, we would shut down the system, replace the failed drive, and restart. At that point, our RAID-1 volume would still be running in degraded mode.
Once the new drive is installed in the machine, we want to create an appropriately sized RAID autodetect ("fd") partition on it. You may need to reboot once more so that Linux rereads the disk's partition table. Once the system can see the new partition, we can restore the degraded RAID-1 array to full strength, and from then on we have redundancy again.
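Creating the autodetect partition can be done with fdisk. The session below is a sketch; the device name /dev/hde and partition number 1 are assumptions matching the setup in this article, and `# ` is a root prompt:

```
# fdisk /dev/hde
Command (m for help): n            <- create a new partition at least as
                                      large as the one being replaced
Command (m for help): t            <- change the partition type
Partition number (1-4): 1
Hex code (type L to list codes): fd
Command (m for help): w            <- write the table and exit
```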
Of course, we are only running a simulation here. To add a partition back to the RAID array, you can do one of two things, depending on which scenario you want to simulate. You can shut down the machine, reconnect the drive, boot, and add the original partition back to the array. Or you can shut down the machine, reconnect the drive, boot, wipe the drive, create a new RAID autodetect ("fd") partition of appropriate size (at least as large as the partition it replaces), and then add that new partition to the array. While the first option also simulates certain events, such as disk control
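Either way, the final step is adding the partition back into the array. With the raidtools package used in the 2.4 era, this is done with raidhotadd; the partition name /dev/hde1 below is an assumption matching this article's setup, and `# ` is a root prompt:

```
# raidhotadd /dev/md0 /dev/hde1
# cat /proc/mdstat
```

After the raidhotadd, /proc/mdstat should show the array resynchronizing onto the new partition, and once the resync finishes you are back to a fully redundant "[UU]" state.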