Detailed explanation of Hard Disk RAID technology
I. RAID Definition
RAID (Redundant Array of Independent Disks) technology was proposed at the University of California, Berkeley in 1987. The initial goal was to combine small, cheap disks to replace large, expensive ones, while making sure that data remained accessible when a disk failed; this led to the development of several levels of data protection. A RAID is a redundant array composed of multiple low-cost disks, and it appears to the operating system as a single large storage device. RAID exploits the advantages of multiple hard disks to increase speed and capacity, and at the redundant levels it provides fault tolerance that protects data and eases management: the system can continue to work when a hard disk fails, without being affected by the damage to that disk.
II. Several RAID Modes
1. RAID 0
RAID 0 is data striping. It joins multiple hard disks into one larger disk group to improve disk performance and throughput. RAID 0 has no redundancy or error repair capability and has a low cost. It requires at least two disks and is generally used only where data security requirements are low.
(1) The simplest RAID 0 method
N hard disks are concatenated to form one independent logical drive, either in hardware by a smart disk controller or in software by a disk driver in the operating system. The capacity is N times that of a single hard disk. During writes, data is written to each disk in turn: when the space on one disk is exhausted, data is automatically written to the next disk. The advantage is increased disk capacity. The speed is the same as that of a single disk. If any disk fails, the entire system is damaged, so the reliability is 1/N of that of a single hard disk.
(2) Another RAID 0 method
N hard disks are used to create a stripe set with a reasonable stripe size. It is best to assign a dedicated disk controller to each hard disk. Data is then read from and written to the N disks at the same time, which in theory improves the speed by up to N times and improves system performance.
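The way a stripe set spreads consecutive blocks over N disks can be sketched in a few lines of Python. This is only an illustration: the function name stripe_location and the one-block stripe unit are assumptions made for the example, not details of any particular controller.

```python
from typing import Tuple

def stripe_location(logical_block: int, num_disks: int) -> Tuple[int, int]:
    """Map a logical block number to (disk index, block offset on that disk).

    Assumes a stripe unit of exactly one block (hypothetical simplification).
    """
    if num_disks < 2:
        raise ValueError("RAID 0 needs at least two disks")
    if logical_block < 0:
        raise ValueError("logical_block must be non-negative")
    # Consecutive blocks rotate across the disks, so N neighbouring blocks
    # land on N different disks and can be transferred at the same time.
    return logical_block % num_disks, logical_block // num_disks

# Blocks 0..5 on a three-disk stripe set.
layout = [stripe_location(block, 3) for block in range(6)]
print(layout)
```

With three disks, blocks 0, 1 and 2 fall on disks 0, 1 and 2 at offset 0, and block 3 wraps around to disk 0 at offset 1.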
2. RAID 1
RAID 1 is called disk mirroring: it mirrors the data on one disk to another disk, ensuring the reliability and maintainability of the system to the greatest extent without affecting performance. It has high data redundancy, but disk utilization is only 50%, so its cost is the highest. It is used to store critical data. RAID 1 has the following features:
(1) Each disk in RAID 1 has a corresponding mirror disk, and data is synchronized to the mirror at all times. The system can read data from any disk in a mirrored group.
(2) The usable space is only half of the total disk capacity, so the system cost is high.
(3) As long as at least one disk in each mirrored pair is working, the system can run normally, even when half of the hard disks have failed.
(4) A hard disk failure leaves the RAID system without redundancy, so the damaged hard disk should be replaced promptly. Otherwise, the entire system will crash if the remaining mirror disk also develops problems.
(5) After a new disk is installed, resynchronizing the mirror from the original data takes a long time. External access to the data is not interrupted, but the performance of the entire system decreases during this time.
(6) RAID 1 places a heavy load on the disk controller. Using multiple disk controllers can improve data security and availability.
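The mirroring behaviour in the list above (write to every mirror, read from any healthy one, resynchronize after a replacement) can be modelled with a toy in-memory class. The class MirrorSet and its method names are hypothetical, and each disk is simply a Python dict; a real implementation works on block devices.

```python
class MirrorSet:
    """Toy RAID 1: every write goes to all healthy disks; reads use any healthy one."""

    def __init__(self, num_disks=2):
        if num_disks < 2:
            raise ValueError("RAID 1 needs at least two disks")
        # Each disk is a dict of block number -> bytes; None marks a failed disk.
        self.disks = [dict() for _ in range(num_disks)]

    def write(self, block, data):
        for disk in self.disks:
            if disk is not None:
                disk[block] = data

    def read(self, block):
        for disk in self.disks:
            if disk is not None:
                return disk[block]
        raise IOError("all mirrors have failed")

    def fail(self, index):
        self.disks[index] = None

    def replace(self, index):
        """Install a blank disk and resynchronize it from a surviving mirror."""
        source = next((d for d in self.disks if d is not None), None)
        if source is None:
            raise IOError("no surviving mirror to copy from")
        self.disks[index] = dict(source)  # the full copy is the slow resync step

mirror = MirrorSet(2)
mirror.write(0, b"critical")
mirror.fail(0)                 # one disk dies; data is still readable
survivor_copy = mirror.read(0)
mirror.replace(0)              # new disk is rebuilt from the survivor
mirror.fail(1)                 # the old survivor can now fail safely
after_rebuild = mirror.read(0)
```

The full copy in replace() corresponds to the long resynchronization in point (5): the array stays readable throughout, but the copy competes with normal I/O.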
3. RAID 0+1
RAID 0+1 combines the RAID 0 and RAID 1 technologies. Each disk has its own physical mirror disk, which provides full redundancy: one disk, and in some combinations more than one, can fail without affecting data availability, and reads and writes remain fast. RAID 0+1 requires at least four hard disks, with a stripe set created inside the disk mirror.
4. RAID 2
When writing data, the system stores the data bits on one set of disks and stores the Hamming check code computed from those bits on other disks. If data errors occur, the Hamming code can correct them to ensure correct output. However, because the Hamming code relies on redundant data, the output data rate depends on the slowest disk in the drive group. The RAID 2 controller is relatively easy to design.
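To make the error-correcting idea concrete, here is a small sketch of a Hamming(7,4) code, which protects 4 data bits with 3 check bits and can correct any single flipped bit. The text does not say which Hamming code a given RAID 2 array uses, so the (7,4) parameters and the function names are assumptions chosen for illustration; in a RAID 2 layout each of the seven bit positions would sit on a different disk.

```python
from typing import List

def hamming74_encode(data_bits: List[int]) -> List[int]:
    """Encode 4 data bits as 7 bits; check bits sit at positions 1, 2 and 4."""
    d1, d2, d3, d4 = data_bits
    p1 = d1 ^ d2 ^ d4  # covers positions 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers positions 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # covers positions 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]

def hamming74_decode(code_bits: List[int]) -> List[int]:
    """Correct at most one flipped bit and return the 4 data bits."""
    c = list(code_bits)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    # The syndrome, read as a binary number, is the 1-based position of the
    # flipped bit; 0 means no error was detected.
    error_position = s1 + 2 * s2 + 4 * s4
    if error_position:
        c[error_position - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]

word = hamming74_encode([1, 0, 1, 1])
word[4] ^= 1  # simulate one bad bit, for example from one faulty disk
recovered = hamming74_decode(word)
```

Here recovered equals the original [1, 0, 1, 1] even though one of the seven stored bits was corrupted.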
5. RAID 3
Parallel transfer with parity
RAID 3 uses a dedicated disk to store all the parity data and stripes the data across the remaining disks for read and write operations. When reading from an intact RAID 3 system, only the corresponding data block on the data disks needs to be located and read. When writing to RAID 3, however, the parity of all data blocks in the same stripe as the written block must be recalculated and the new value rewritten to the parity block, which increases system overhead. When a disk fails, all data blocks on that disk must be re-created from the parity information. If the data block to be read is on the damaged disk, the system must read all the other data blocks in the same stripe and reconstruct the lost data from the parity value, which slows the system down. After the damaged disk is replaced, the system must reconstruct its data block by block, and the performance of the entire system is seriously affected during the rebuild. The biggest disadvantage of RAID 3 is that the parity disk easily becomes the bottleneck of the entire system: applications that perform many write operations degrade the performance of the whole array. RAID 3 is suitable for read-mostly workloads such as databases and web servers.
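The parity calculation and the reconstruction of a lost block both come down to a byte-wise XOR over the blocks of one stripe. The following sketch shows this under simplifying assumptions: blocks are short byte strings of equal length, and the helper name xor_blocks is made up for the example.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("all blocks must have the same length")
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# One stripe: three data blocks plus the parity block on the dedicated disk.
data = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
parity = xor_blocks(data)

# The disk holding data[1] fails. Reading every other block in the stripe,
# including parity, and XOR-ing them yields the missing block.
rebuilt = xor_blocks([data[0], data[2], parity])
```

Because rebuilding a single block requires reading every surviving disk in the stripe, reads in degraded mode and the rebuild after a disk replacement are both slow, as described above.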
6. RAID 4
RAID 4 is an independent disk structure with parity. It is very similar to RAID 3, but it accesses data by block, that is, one disk at a time. Its features are similar to those of RAID 3, but failure recovery is much harder than with RAID 3, the controller is much harder to design, and data access efficiency is not very good.
7. RAID 5
RAID 5 distributes the parity blocks across all the disks. It uses a special algorithm to calculate where the parity block of each stripe is stored, which ensures that reads and writes of parity blocks are balanced across all the disks in the array and removes the parity-disk bottleneck. The read efficiency of RAID 5 is very high, the write efficiency is average, and block-based access efficiency is good. RAID 5 improves system reliability, but it does not handle concurrent data transfers well, and the controller is quite difficult to design.
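The text does not say which placement algorithm is used, so the sketch below simply assumes one common-looking rotation in which the parity block moves one disk to the left with each stripe. The function name raid5_stripe_layout and the "D"/"P" labels are illustrative only; real controllers may use a different layout.

```python
from typing import List

def raid5_stripe_layout(stripe: int, num_disks: int) -> List[str]:
    """Return what each disk holds in the given stripe: 'P' or a data label."""
    if num_disks < 3:
        raise ValueError("RAID 5 needs at least three disks")
    # Assumed rotation: parity starts on the last disk and shifts left by
    # one disk per stripe, wrapping around.
    parity_disk = (num_disks - 1 - stripe) % num_disks
    layout = []
    data_index = 0
    for disk in range(num_disks):
        if disk == parity_disk:
            layout.append("P")
        else:
            layout.append("D%d" % (stripe * (num_disks - 1) + data_index))
            data_index += 1
    return layout

# Four stripes on a four-disk array: each disk holds parity exactly once.
for stripe in range(4):
    print(raid5_stripe_layout(stripe, 4))
```

Over any run of N consecutive stripes each disk stores parity once, which is what spreads parity updates evenly and avoids the single parity-disk bottleneck of RAID 3.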
8. RAID 6
RAID 6 is an independent disk structure with two distributed parity codes. It is an extension of RAID 5 and is mainly used where data must be absolutely error-free. Because it uses two parity values, N + 2 disks are required. The controller design also becomes very complex, and the write speed is poor: computing the parity values and verifying data correctness takes a lot of time and adds load, so RAID 6 is rarely used.
9. RAID 7
RAID 7 is an optimized high-speed data transfer disk structure. All of its I/O transfers are synchronized and can be controlled separately, which improves the system's concurrency and its data access speed. Each disk has its own high-speed buffer memory. Its real-time operating system can run on any real-time processor chip to meet the needs of different real-time systems. SNMP can be used for management and monitoring, and an independent transfer channel can be assigned to the parity area to improve efficiency. Multiple hosts can be connected, and when multiple users access the system, the access time is close to zero. However, if the system loses power, all the data in the high-speed buffer memory is lost, so RAID 7 must be used together with a UPS. A RAID 7 system is also very expensive.
10. RAID 10
RAID 10 is a highly reliable and efficient disk structure. It combines a stripe structure with a mirror structure, so it offers both the speed of striping and the reliability of mirroring. This structure is expensive and does not scale well.
11. RAID 53
RAID 53 is an efficient data transfer disk structure that combines RAID 3 with striping. It is therefore fast and fault-tolerant. However, it is very expensive and not easy to implement.
Why disk arrays?
How to increase disk access speed, how to prevent data loss caused by disk failures, and how to use disk space effectively have always been problems for computer professionals and users. Large-capacity disks are also very expensive and place a heavy burden on users. Disk array technology addresses these problems.
Over the past decade, CPU processing speed has increased by more than 50 times and memory access speed has also increased significantly, while the access speed of data storage devices, primarily hard disks, has increased by only three or four times. This creates a bottleneck in the computer system and lowers its overall performance (throughput). If disk access speed cannot be improved effectively, the imbalance among CPU, memory, and disk wastes the improvements made to the CPU and memory.
Currently, there are two ways to improve disk access speed. The first is a disk cache controller, which stores data read from the disk in cache memory to reduce the number of disk accesses. Reads and writes are carried out in the cache memory, which greatly increases the access speed; the disk itself is accessed only when the data to be read is not in the cache or when data must be written to the disk. In a single-tasking environment such as DOS, this method performs well when accessing large amounts of data (it does not help with small, frequent accesses). In a multitasking environment (because of constant swapping) or in database access (because each record is small), however, it cannot show its benefit. This method also provides no data protection.
The second way is disk array technology. A disk array combines multiple disks and uses them as a single disk. It stores data on different disks in segments, and the disks in the array work together during access, greatly reducing data access time and improving space utilization. The different techniques used by disk arrays are called RAID levels. Different levels suit different systems and applications and address data security in different ways.
Generally, high-performance disk arrays are implemented in hardware, which further combines disk cache control and the disk array on one controller (RAID controller) or controller card, to meet the four requirements that different users place on the disk subsystem:
(1) increase the access speed;
(2) provide fault tolerance, that is, data security;
(3) make effective use of disk space;
(4) balance the performance differences between CPU, memory, and disk as far as possible to improve the overall performance of the computer.
Disk arrays can be implemented in two ways: software arrays and hardware arrays.
1. A software array uses the disk management function provided by the network operating system to configure multiple hard disks attached to an ordinary SCSI card into one logical disk, forming an array. A software array can provide data redundancy, but the performance of the disk subsystem may be reduced. Currently, both the Windows NT and NetWare operating systems can provide software arrays. Windows NT can provide RAID 0, RAID 1, and RAID 5. NetWare can implement RAID 1.
2. A hardware array is implemented using a dedicated disk array card. Currently, almost all non-entry-level servers provide a disk array function, either integrated on the motherboard or on a separate card, and it is easy to set up. A hardware array provides functions such as online capacity expansion, dynamic modification of the array level, automatic data recovery, drive roaming, and high-speed caching. It offers a solution for performance, data protection, reliability, availability, and manageability. The disk array card has a dedicated processor, typically an Intel i960 chip, and dedicated memory for high-speed data buffering. The server therefore handles disk operations directly through the disk array card, using the card's own processing unit, so disk operations do not consume large amounts of CPU and system memory and the performance of the disk subsystem is not reduced. Its performance is much higher than that of conventional non-array hard disks, and it is safer and more stable.
Whether it is an on-board IDE RAID control chip or a separate PCI IDE RAID controller, each has its own BIOS for configuration, and its BIOS screen is displayed after the system POST completes. In this example, the BIOS screen of the HighPoint HPT372 IDE RAID control chip appears; press Ctrl + H to enter the control interface.
The control interface offers the following options: Create Array, Delete Array, Create/Delete Spare, and Select Boot Disk. Below the options, the detected hard disks and their working status are displayed. To create an array, select Create Array. On the screen that appears, all the array settings can be made. Under Array Mode, select the RAID type; the default mode is RAID 0. You can select RAID 0, 0 + 1, or JBOD as needed.
After choosing the RAID mode, name the array; pick any name that is easy to remember. After confirming the name, select the drives. Because we are creating a RAID 0 array here, only two hard disks are needed, so we select two hard disks in the device column below.
Next, set the block size. The block is the basic data unit of the array. In general, the smaller the block, the more fully space is used. However, because IDE RAID does not have its own I/O processor, the smaller the block, the higher the resource consumption. In general, the default of 64 KB or 32 KB is a good choice. Once all this is done, select Start Creation Process to create the array, then save and exit.
Enable RAID 0
To boot from the RAID array, set the boot sequence in the BIOS to ATA RAID; for a separate RAID card, set it to SCSI boot. For Windows 9x users, the RAID setup is now complete: in DOS, the array can be partitioned and formatted and the system installed just as on a single hard disk, because the operating system treats the array as one drive. If you are using Windows NT/2000/XP, you also need to install the RAID driver.
Notes
1. To create RAID 0, you must have at least two hard disks. In addition, once hard disks are made into a RAID array, the data originally on them is lost.
2. It is best to select identical hard disks; otherwise, performance is reduced. With two mismatched disks, the theoretical speed is only twice the speed of the slower hard disk.
3. If one hard disk in a RAID 0 array is damaged, all data in the array is lost.
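Note 2 can be written as a one-line estimate: in a stripe set every disk handles an equal share of each transfer, so the slowest disk sets the pace. The sketch below is an idealized model only; the function name and the MB/s figures are hypothetical, and real throughput also depends on the controller, bus, and block size.

```python
from typing import List

def raid0_theoretical_throughput(disk_speeds_mb_s: List[float]) -> float:
    """Idealized RAID 0 throughput: number of disks times the slowest disk."""
    if len(disk_speeds_mb_s) < 2:
        raise ValueError("RAID 0 needs at least two disks")
    return len(disk_speeds_mb_s) * min(disk_speeds_mb_s)

matched = raid0_theoretical_throughput([60.0, 60.0])     # two identical disks
mismatched = raid0_theoretical_throughput([40.0, 60.0])  # one slower disk
print(matched, mismatched)
```

In this model the mismatched pair reaches only twice the speed of the 40 MB/s disk, so the extra speed of the faster disk is wasted.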
In a disk array, data is stored on multiple disk drives, and reads can be distributed evenly across the drives to improve efficiency. With redundancy across multiple drives, the mean time to data loss is extended, and the duplicate copies also increase fault tolerance.
RAID 0 is parallel (striped) disk storage. The data to be written to the array is first divided into blocks, and the blocks are then written to different member disks in the array. This method provides high I/O efficiency at a low cost, but it has no fault tolerance. The storage capacity of a RAID 0 array equals the total capacity of all member disks in a hardware RAID setup, or the total capacity of the member partitions in a software RAID setup.
RAID 0 requires at least two disks.
RAID 1 (mirrored storage) writes the same data to each member disk in the array to provide fault tolerance.
RAID 1 requires at least two disks.
RAID 3 is parallel disk storage with dedicated parity. The data to be written to the array is first divided into blocks, and the blocks are then written to different member disks in the array. Parity data is generated for the data blocks; it is written to a dedicated parity disk when data is written and is used to check the correctness of the data when data is read.
RAID 3 requires at least three disks.
RAID 5 is parallel disk storage with distributed parity. The data to be written to the array is first divided into blocks, and the blocks are then written to different member disks in the array. Parity data is generated for the data blocks and written to one of the disks in the array when data is written. The parity data is distributed across some or all of the member disk drives in the array.
RAID 5 requires at least three disks.
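The minimum disk counts and capacities summarized above can be collected into a small calculator. This is a rough sketch under stated assumptions: all member disks are the same size, RAID 1 is treated as every member holding the same data (as described above), and RAID 3 and RAID 5 each give up one disk's worth of space to parity. The names MIN_DISKS and usable_capacity are made up for the example.

```python
MIN_DISKS = {"RAID 0": 2, "RAID 1": 2, "RAID 3": 3, "RAID 5": 3}

def usable_capacity(level: str, num_disks: int, disk_size_gb: float) -> float:
    """Usable capacity in GB for equal-sized disks (simplified model)."""
    if level not in MIN_DISKS:
        raise ValueError("unsupported level: %s" % level)
    if num_disks < MIN_DISKS[level]:
        raise ValueError("%s needs at least %d disks" % (level, MIN_DISKS[level]))
    if level == "RAID 0":
        return num_disks * disk_size_gb        # striping only, no redundancy
    if level == "RAID 1":
        return disk_size_gb                    # every member stores the same data
    return (num_disks - 1) * disk_size_gb      # RAID 3 / RAID 5: one disk of parity

for level in ("RAID 0", "RAID 1", "RAID 3", "RAID 5"):
    print(level, usable_capacity(level, 4, 80.0))
```

For a two-disk RAID 1 set this gives the 50% utilization mentioned earlier, while RAID 3 and RAID 5 lose a smaller fraction of space as more disks are added.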
JBOD lets you simply group disk drives to create one large virtual disk. In this mode, space is allocated from the member disks in sequence: when the first disk is full, allocation moves to the second disk, and so on. This kind of grouping provides no performance gain, because I/O operations are not split across the member disk drives. JBOD provides no redundancy and in fact reduces reliability: if any member disk fails, the entire array becomes inaccessible. The total capacity is the sum of the capacities of all member disk drives.
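The sequential allocation used by JBOD can be sketched as a simple address translation. The function name jbod_location and the disk sizes (counted in blocks) are hypothetical values chosen for the example.

```python
from typing import List, Tuple

def jbod_location(logical_block: int, disk_sizes: List[int]) -> Tuple[int, int]:
    """Map a logical block to (disk index, block offset) in a concatenated set."""
    if logical_block < 0:
        raise ValueError("logical_block must be non-negative")
    remaining = logical_block
    for disk_index, size in enumerate(disk_sizes):
        if remaining < size:
            return disk_index, remaining
        remaining -= size  # this disk is full; continue on the next one
    raise ValueError("block is beyond the total capacity")

sizes = [100, 50, 200]            # member disks may differ in size
total_blocks = sum(sizes)         # capacity is the sum of all members
first = jbod_location(0, sizes)
spilled = jbod_location(120, sizes)
last = jbod_location(total_blocks - 1, sizes)
```

Unlike striping, neighbouring blocks almost always land on the same disk, which is why JBOD adds capacity but no speed.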