A brief introduction to RAID

Source: Internet
Author: User

RAID stands for "Redundant Array of Independent Disks". It is a technology for storing, accessing, and backing up data on disks. Before discussing RAID, here is a brief review of memory basics.

1. Memory basics

Memory, as the name implies, is used to store data. There are many kinds of memory on the market, but they fall broadly into two categories: volatile and non-volatile. Volatile memory loses its data when power is lost, while non-volatile memory retains it; which category a memory belongs to is determined by its storage medium. In general, volatile memory is significantly faster than non-volatile memory and, of course, more expensive. Textbooks on computer systems usually present the memory in a computer as a hierarchy drawn as a pyramid: the closer a layer is to the top, the faster the memory and the higher the cost per byte; the closer to the bottom, the slower the memory and the lower the cost per byte.

  

The fastest storage is the registers in the CPU. Because they are expensive, they hold only dozens of bytes.

Layers L1, L2, and L3 are the caches. Caches are also very fast and are implemented with SRAM (static random-access memory); the CPU needs only a few nanoseconds to access a cache. Because SRAM is relatively expensive, caches hold only a few MB to a few dozen MB.

Layer L4 is main memory. It is not as fast as cache, but it is also much cheaper, and it is implemented with DRAM (dynamic random-access memory). Modern computers generally have several GB of main memory, and a CPU access to main memory generally takes from tens to hundreds of nanoseconds.

To summarize the two categories: in a computer system, volatile memory mainly comprises the caches and main memory, and it is generally implemented with random-access memory (RAM). RAM is divided into static RAM (SRAM) and dynamic RAM (DRAM); SRAM is faster than DRAM but more expensive. Caches use SRAM, while main memory uses DRAM. Non-volatile storage mainly includes disks, SSDs, CDs, tapes, floppy disks, and so on.

Layer L5 is the local disk. Local disks are generally mechanical devices, so every access involves physical movement and wear, and an access that moves data between disk and memory takes about 10 ms. Database data is generally stored on disk, so a database relies on a series of algorithms and data structures to reduce the disk access time needed for inserting, deleting, updating, and querying data.

A disk is made up of platters. Each platter has two surfaces, and each surface is divided into concentric circles called tracks. Each track is divided into arc-shaped segments called sectors; a sector is the smallest indivisible unit of a disk. Each surface has a head that reads and writes the data on it. When main memory needs data from the disk, the disk drive controls the access. The data sits in some sector of some track on some surface, so the head must first move to the right track; this is called a seek, and the time it takes is the seek time. Once the head is over the track, the platter must rotate until the target sector is under the head; this delay is called the rotational latency. The time for the sector, together with the gap between sectors, to pass under the head is called the transfer time.

So the time it takes to access a disk is: seek time + rotational latency + transfer time. It is determined mainly by the disk's rotation speed, the number of sectors, and so on. For a typical 7200 RPM disk in which the gaps between sectors account for about 10% of each track, the average access time is about 10 ms.
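The sum above can be sketched in a few lines of Python. This is only an illustration: the 5 ms average seek time and 0.1 ms transfer time are assumed figures, not values given in this article, and the function name is hypothetical. The average rotational latency is taken as half a revolution.

```python
def average_access_time_ms(rpm: float, avg_seek_ms: float, transfer_ms: float) -> float:
    """Average access time = seek time + rotational latency + transfer time."""
    # One full revolution takes 60_000 / rpm milliseconds. On average the
    # target sector is half a revolution away from the head.
    avg_rotational_latency_ms = 0.5 * 60_000 / rpm
    return avg_seek_ms + avg_rotational_latency_ms + transfer_ms


# Assumed figures for illustration only: 5 ms average seek, 0.1 ms transfer.
print(round(average_access_time_ms(rpm=7200, avg_seek_ms=5.0, transfer_ms=0.1), 2))
# 9.27
```

With these assumed inputs the result is close to the 10 ms figure quoted above; about 4.17 ms of it is rotational latency alone.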

Because disks are non-volatile and inexpensive, almost all data storage and backup today uses disks.

2. Increase reliability through redundancy

Any device can fail, and storage is no exception. How can reliability be improved? The answer is redundancy. The simplest way to implement redundancy is to duplicate every disk, which is called mirroring. A single logical disk then consists of two physical disks, and every write is performed on both disks. If one of the disks fails, the data can be read from the other. Data is truly lost only if the second disk also fails before the first failed disk has been repaired. We evaluate mirroring by its mean time to failure (where failure here means loss of data), which depends on the mean time to failure of each individual disk and on the mean time to repair (the time it takes to replace the failed disk and restore the data on it).

Suppose the failures of the two disks are independent of each other, a single disk has a mean time to failure of 100,000 hours, and the mean time to repair is 10 hours. The mean time to data loss of the mirrored pair is then 100,000^2 / (2 × 10) = 500 × 10^6 hours, or about 57,000 years. Of course, this holds only in the ideal case; in practice other factors must also be considered.
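The arithmetic above can be checked with a short Python sketch. The function name is hypothetical, and the formula is the idealized one used in this section (independent failures, two disks), not a general reliability model.

```python
HOURS_PER_YEAR = 24 * 365


def mirrored_mttf_hours(disk_mttf_hours: float, mttr_hours: float) -> float:
    """Mean time to data loss of a two-disk mirror, assuming independent failures."""
    return disk_mttf_hours ** 2 / (2 * mttr_hours)


hours = mirrored_mttf_hours(100_000, 10)
print(hours)                          # 500000000.0
print(round(hours / HOURS_PER_YEAR))  # about 57,000 years
```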

3. Improve performance by parallelism

Disks are cheap, but their access speed is limited: an average access takes about 10 ms, which greatly affects system performance. How can disk access be made faster? With mirroring, the rate at which read requests can be handled doubles, because a read request can be sent to either disk. The transfer rate of each individual read is the same as on a single-disk system; only the number of reads per unit of time doubles. To increase the transfer rate itself, it is common to split data across multiple disks, a technique usually called striping.

Data can be split (striped) in several ways. Splitting the bits of each byte across multiple disks is called bit-level striping. Splitting data block by block across multiple disks is called block-level striping.

Block-level striping is the most common form. For example, to store 8 logical blocks on four disks numbered 0, 1, 2, and 3, block i is stored on disk (i mod 4), so that each time we read this data we can read from all 4 disks in parallel.
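The (i mod 4) placement rule can be sketched as follows. The function name `stripe_blocks` is hypothetical; the sketch only shows which disk each logical block lands on, not how a real controller lays out data.

```python
from typing import Dict, List


def stripe_blocks(num_blocks: int, num_disks: int) -> Dict[int, List[int]]:
    """Map each logical block i to disk (i mod num_disks)."""
    layout: Dict[int, List[int]] = {disk: [] for disk in range(num_disks)}
    for block in range(num_blocks):
        layout[block % num_disks].append(block)
    return layout


print(stripe_blocks(8, 4))
# {0: [0, 4], 1: [1, 5], 2: [2, 6], 3: [3, 7]}
```

Each disk ends up holding two of the eight blocks, so a read of all eight blocks can keep all four disks busy at once.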

Whatever the form, a parallel disk system improves performance in two main ways:

(1) It load-balances multiple small accesses, which increases their throughput.

(2) It performs large accesses in parallel, which reduces their response time.

4. RAID

Although mirroring is highly reliable, each logical disk requires two physical disks, which is expensive. Striping increases the transfer rate but does nothing for reliability. A series of alternative schemes have been proposed that make different trade-offs between cost and performance, and these schemes are classified into RAID levels. They are all built on the ideas of parity and data striping.

  • RAID Level 0: Block-level striping (splitting) without any redundancy. This level only increases the speed at which data is read from disk; it offers no reliability guarantee, and when a disk fails there is no way to recover its data.
  • RAID Level 1: Disk mirroring with block-level striping (many vendors use RAID 1+0 to refer to mirroring with striping, and RAID 1 to refer to mirroring without striping). Striping allows the read rate to be increased through parallelism, and mirroring provides high reliability. Data reconstruction is simple and write performance is high, but the cost is very high.
  • RAID Level 2: Uses parity bits in what is also known as a memory-style error-correcting code (ECC) organization.
  • RAID Level 3: Bit-interleaved parity organization, an improvement on RAID level 2. Because the disk controller can detect whether a sector has been read correctly, a single parity bit is enough for both error detection and error correction. If a sector is damaged, the system knows exactly which sector it is, and for each bit in that sector it can work out the correct value from the parity of the corresponding bits in the corresponding sectors on the other disks.
      • For example, assume there are three data disks, with only one sector per disk and only eight bits per sector.
      • Disk 1: 11110000
      • Disk 2: 10101010
      • Disk 3: 00111000
      • We also need a redundant disk to hold the parity bits:
      • Disk 4: 01100010
      • For each of the eight bit positions, the number of 1s across the 4 disks is even. If the data on disk 2 is corrupted, say from 10101010 to 11001100, we can recover it from the remaining disks: the data on any one disk is the bitwise modulo-2 sum (XOR) of the corresponding data on the other disks.
  • RAID Level 4: Block-interleaved parity organization. It uses block-level striping and keeps, on a separate disk, a parity block for the corresponding blocks on the other n disks. Multiple reads can proceed in parallel, giving a high overall I/O rate, and large writes also achieve a high transfer rate because data and parity can be written in parallel. Small writes, however, cannot proceed in parallel: writing a single block requires access to both the disk that stores the block and the parity disk, because the parity must be updated.
  • RAID Level 5: Block-interleaved distributed parity, an improvement on RAID level 4. RAID level 5 distributes the data and parity across all n+1 disks, so all disks can take part in read operations. For every n logical blocks, n+1 physical blocks are needed: the corresponding block on one disk stores the parity, and the corresponding blocks on the remaining n disks store the data.
  • RAID Level 6: P+Q redundancy scheme. It is similar to RAID level 5 but stores additional redundant information so that it can handle the simultaneous failure of multiple disks.
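The parity idea used by RAID levels 3 to 5 can be sketched with the four-disk example given under RAID level 3, taking disk 1 as 11110000 (the value consistent with the stated parity on disk 4). The helper name `xor_parity` is hypothetical, and each "disk" here is just one 8-bit integer.

```python
from functools import reduce
from typing import List


def xor_parity(blocks: List[int]) -> int:
    """Bitwise modulo-2 sum (XOR) of all blocks."""
    return reduce(lambda a, b: a ^ b, blocks, 0)


disk1, disk2, disk3 = 0b11110000, 0b10101010, 0b00111000

# The parity disk makes the count of 1s in every bit position even.
disk4 = xor_parity([disk1, disk2, disk3])
print(format(disk4, "08b"))    # 01100010

# Suppose disk 2 is lost or corrupted: rebuild it from the other three.
rebuilt = xor_parity([disk1, disk3, disk4])
print(format(rebuilt, "08b"))  # 10101010
```

The same function both computes the parity and rebuilds a lost disk, because XOR-ing all four disks together always gives zero.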

5. How to choose a RAID level

The choice depends on the specific application. The main factors to consider are:

(1) The cost of the additional storage required

(2) Performance requirements in terms of the number of I/O operations

(3) Performance when a disk has failed

(4) Performance during data reconstruction (failure recovery)

RAID level 0 should be used only in applications where data safety is not critical.

RAID level 3 is an improvement on RAID level 2, and RAID level 5 is an improvement on RAID level 4, so we only need to compare RAID level 3 and RAID level 5. RAID level 3 uses bit-level striping, whereas RAID level 5 uses block-level striping. Block-level striping gives as good a transfer rate as RAID level 3 for large transfers, and it uses fewer disks for small transfers. RAID level 6 is more reliable than RAID level 5 and can be used in applications where data safety is critical. RAID level 1 provides the best write performance. RAID level 5 has a lower storage overhead than RAID level 1, but its writes take more time, so RAID level 5 suits applications with many reads and few writes.
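To make the storage-overhead comparison concrete, here is a rough sketch that counts how many disks hold user data under each level. The function name is hypothetical, and the RAID 6 line assumes the common form in which P and Q together take two disks' worth of space; this article only says "additional redundant information".

```python
def data_disks(level: int, total_disks: int) -> int:
    """Return how many of total_disks are available for user data."""
    if level == 0:
        return total_disks       # striping only, no redundancy
    if level == 1:
        return total_disks // 2  # every disk has a mirror copy
    if level == 5:
        return total_disks - 1   # one disk's worth of parity
    if level == 6:
        return total_disks - 2   # assumed: two disks' worth of P+Q data
    raise ValueError(f"unsupported RAID level: {level}")


for level in (0, 1, 5, 6):
    print(level, data_disks(level, 8))
```

With 8 disks, mirroring leaves 4 disks for data while RAID level 5 leaves 7, which is the lower storage overhead mentioned above.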
