For a long time there has been plenty of debate over the relative performance of RAID 5 and RAID 10. Many people have published benchmark data, but who is right? Here I will analyze how these two RAID levels work internally and explain under what circumstances each one should be chosen.
To make the comparison fair, I will compare arrays with the same number of disks: a 3D + 1P scheme (three data disks plus one parity disk) for RAID 5, and a 2D + 2D scheme (two data disks plus two mirror disks) for RAID 10.
We will analyze three processes: reads, continuous (sequential) writes, and discrete (random) writes. Before that, however, I need to introduce a particularly important concept: cache.
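The space trade-off between the two four-disk layouts, which matters again in the conclusion, can be computed directly. A minimal Python sketch, assuming all disks have the same size; the function name is only illustrative:

```python
def usable_fraction(data_disks: int, redundancy_disks: int) -> float:
    """Fraction of raw capacity available for user data (equal-size disks assumed)."""
    return data_disks / (data_disks + redundancy_disks)

# The two four-disk layouts compared in this article
raid5 = usable_fraction(3, 1)   # 3D + 1P: one disk's worth of parity
raid10 = usable_fraction(2, 2)  # 2D + 2D: every block has a mirror copy

print(f"RAID 5 (3D+1P): {raid5:.0%} usable")    # RAID 5 (3D+1P): 75% usable
print(f"RAID 10 (2D+2D): {raid10:.0%} usable")  # RAID 10 (2D+2D): 50% usable
```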
In recent years, cache technology in disk storage has developed very rapidly. In high-end storage, the cache is already the core of the whole system, and even low-end storage has a large cache: the simplest RAID card generally carries tens or even hundreds of megabytes of cache.
What does the cache do? It plays two different roles: one for reads and one for writes. For writes, the array only has to place the data in cache to complete the write operation, so writes to the array are very fast. Once the data accumulated in cache reaches a certain amount, the array flushes it to disk, which turns many small writes into batch writes. The cached data is generally protected by cache mirroring and a battery (or UPS).
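The write-back behaviour just described can be modelled with a toy Python class. This is a sketch under simplifying assumptions (a fixed flush threshold counted in blocks, a dict standing in for the disks); the class and parameter names are hypothetical, and real controllers use more elaborate destaging policies:

```python
class WriteBackCache:
    """Toy write-back cache: acknowledges writes immediately and flushes
    them to disk in one batch once a threshold is reached.
    Illustrative only; real controllers also mirror the cache and
    protect it with a battery or UPS."""

    def __init__(self, flush_threshold: int):
        self.flush_threshold = flush_threshold
        self.pending = {}     # block address -> data waiting to be flushed
        self.disk = {}        # stands in for the physical disks
        self.flush_count = 0  # number of batch flushes performed

    def write(self, block: int, data: bytes) -> None:
        # The host sees the write as complete once it is in cache.
        # Rewriting a block that is still cached simply replaces it.
        self.pending[block] = data
        if len(self.pending) >= self.flush_threshold:
            self.flush()

    def flush(self) -> None:
        # Destage all accumulated writes to disk in one batch.
        self.disk.update(self.pending)
        self.pending.clear()
        self.flush_count += 1


cache = WriteBackCache(flush_threshold=4)
for block in range(10):
    cache.write(block, b"x")
print(cache.flush_count, len(cache.pending))  # 2 2
```

Ten host writes produce only two batch flushes; the last two writes stay in cache until the next threshold is reached.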
The read side of the cache should not be ignored either. If a read hits the cache, a disk seek is avoided. A disk generally needs more than 6 ms from the moment it starts looking for the data, which may not be good enough for I/O-intensive applications, whereas a cache hit can generally be served within 1 ms.
Do not trust storage vendors' IOPS (I/O operations per second) figures. They may be measured entirely on cache hits, while the cache hit rate in your real workload may be only 10%.
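Why the hit rate matters so much can be seen from a simple expected-value calculation. A Python sketch, taking the figures above as round numbers (about 1 ms for a cache hit and 6 ms for a disk access) and ignoring queuing; real latencies vary by hardware:

```python
def average_read_latency_ms(hit_rate: float,
                            hit_ms: float = 1.0,
                            miss_ms: float = 6.0) -> float:
    """Expected read latency when a fraction `hit_rate` of reads hit the cache.
    hit_ms and miss_ms are assumed round figures, not measurements."""
    return hit_rate * hit_ms + (1.0 - hit_rate) * miss_ms

for hit_rate in (0.9, 0.1):
    print(f"hit rate {hit_rate:.0%}: {average_read_latency_ms(hit_rate):.1f} ms")
# hit rate 90%: 1.5 ms
# hit rate 10%: 5.5 ms
```

A benchmark run at a 90% hit rate can therefore look several times faster than the same array under a workload with a 10% hit rate.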
With cache introduced, we can now explain the efficiency differences between RAID 5 and RAID 10 in each mode. Let us analyze the three processes one by one.
1. Read operations
Because every disk in both RAID 5 and RAID 10 can serve reads, there is basically no difference in read performance, unless the layout of the data affects the cache hit rate and leads to different hit rates.
2. Continuous (sequential) writes
Continuous writing generally means writing a large amount of contiguous data, such as a media stream or a large file. In this case, provided a write cache is present and the controller's algorithms are sound, RAID 5 outperforms RAID 10 (assuming the storage has a write cache of reasonable size and the CPU that computes parity is not a bottleneck). This is because parity is calculated in cache: a four-disk RAID 5 can compute the parity in memory and then write 3 data blocks plus 1 parity block at the same time, whereas a four-disk RAID 10 can only write 2 data blocks plus their 2 mirror copies at the same time.
For example, a four-disk RAID 5 can take data blocks 1, 2 and 3 into cache together and compute their parity, which I will pretend is 6 (real parity is not calculated this way; this is only for illustration), and then write the three data blocks and the parity to the disks at the same time. With or without a cache, a four-disk RAID 10 writes two data blocks and two mirror copies at the same time.
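Real RAID 5 parity is a byte-wise XOR rather than the sum used in the toy example above. The following Python sketch, with made-up two-byte blocks, shows a full-stripe write: the parity is computed entirely in memory, no disk reads are needed, and the four disk writes carry three blocks of new data (a four-disk RAID 10 carries two). The helper name and block contents are illustrative:

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """Byte-wise XOR of equal-length blocks, as used for RAID 5 parity."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Full-stripe write on a 4-disk RAID 5 (3D + 1P): parity comes from
# data already in memory, so nothing has to be read from disk first.
d1, d2, d3 = b"\x0f\xf0", b"\x33\x33", b"\x55\xaa"
parity = xor_blocks(d1, d2, d3)
stripe = [d1, d2, d3, parity]  # one write per disk, issued together
print(parity.hex())  # 6969
print(f"{len(stripe)} disk writes carry {len(stripe) - 1} blocks of new data")

# The same XOR lets the array rebuild any single lost block of the stripe.
assert xor_blocks(d1, d3, parity) == d2
```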
As mentioned earlier, writes can be held in cache and flushed to disk later. Unlike reads, however, writes cannot be satisfied by the cache alone: every write must reach the disks sooner or later, so the final disk writes cannot be avoided. That said, if the writes are not continuous and do not push the disks to their write limit, the difference between the two layouts is small.
3. Discrete (random) writes
This may be the hardest part to understand, but it is also the most important. Most operations in a database such as Oracle are discrete writes, for example writing one data block (such as 8 KB) at a time. Online redo logs appear to be written sequentially, but because each write is small, there is no guarantee that it fills a whole RAID 5 stripe (that is, writes to every disk in the stripe), so in practice they are often written in discrete mode as well.
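Whether a write counts as "continuous" from RAID 5's point of view depends on whether it covers whole stripes. A Python sketch, assuming a hypothetical 64 KB stripe unit on the 3D + 1P array (the real stripe unit depends on the controller and configuration), shows that a single 8 KB block never fills a 192 KB stripe:

```python
def stripe_coverage(offset_kb: int, length_kb: int,
                    stripe_unit_kb: int = 64, data_disks: int = 3):
    """Return (full_stripes, partial_stripes) touched by a write of
    length_kb > 0 starting at offset_kb.

    Assumes a RAID 5 layout with `data_disks` data chunks of
    `stripe_unit_kb` each per stripe (parity chunk excluded)."""
    stripe_kb = stripe_unit_kb * data_disks
    end_kb = offset_kb + length_kb
    first = offset_kb // stripe_kb
    last = (end_kb - 1) // stripe_kb
    full = partial = 0
    for s in range(first, last + 1):
        start, stop = s * stripe_kb, (s + 1) * stripe_kb
        covered = min(stop, end_kb) - max(start, offset_kb)
        if covered == stripe_kb:
            full += 1     # parity can be computed from the new data alone
        else:
            partial += 1  # needs a read-modify-write of the parity
    return full, partial


print(stripe_coverage(0, 8))     # (0, 1): one 8 KB database block
print(stripe_coverage(0, 1536))  # (8, 0): 1.5 MB aligned sequential write
```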
Next, suppose we want to change a data block from 2 to 4. For RAID 5 this actually takes four I/O operations:
First, read the old data 2 and the old parity 6 (either read may hit the cache).
Then compute the new parity in cache.
Finally, write the new data 4 and the new parity 8.
For RAID 10, the same single update needs only two I/Os (the data block and its mirror copy), whereas RAID 5 needs four.
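The same read-modify-write can be written out with real XOR parity, where new parity = old parity XOR old data XOR new data, so the other data blocks in the stripe never need to be read. A minimal Python sketch counting disk I/Os, assuming no cache hits; the function names are illustrative:

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


def raid5_small_write(old_data: bytes, old_parity: bytes, new_data: bytes):
    """Read-modify-write on RAID 5. Returns (new_parity, disk_ios)."""
    ios = 2  # read old data + read old parity (assumed to miss the cache)
    new_parity = xor_bytes(xor_bytes(old_parity, old_data), new_data)
    ios += 2  # write new data + write new parity
    return new_parity, ios


def raid10_small_write(new_data: bytes) -> int:
    """RAID 10 writes the block to a disk and to its mirror: 2 I/Os."""
    copies = [new_data, new_data]  # primary + mirror
    return len(copies)


stripe = [b"\x01", b"\x02", b"\x03"]
parity = xor_bytes(xor_bytes(stripe[0], stripe[1]), stripe[2])
new_parity, ios = raid5_small_write(stripe[1], parity, b"\x04")  # change 2 -> 4
stripe[1] = b"\x04"

# The incrementally updated parity matches a full recomputation.
assert new_parity == xor_bytes(xor_bytes(stripe[0], stripe[1]), stripe[2])
print(ios, raid10_small_write(b"\x04"))  # 4 2
```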
However, I have ignored the fact that the two RAID 5 reads may hit the cache: if the data to be read is already in cache, four disk I/Os may not be needed. This again shows how important the cache is to RAID 5. It is needed not only to compute parity but also to deliver acceptable performance; testing shows that if the write cache is disabled on a RAID 5 array, its performance becomes much worse.
This does not mean the cache is unimportant to RAID 10: write buffering, read hits and so on are key to its speed as well. But RAID 10 depends on the cache much less than RAID 5 does.
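The effect of read hits on RAID 5 write performance can be put into rough numbers. This Python sketch ignores write-cache coalescing, assumes each pre-read hits the cache with the same probability, and uses a hypothetical 150 random IOPS per disk; it only illustrates the arithmetic of the write penalty, not the behaviour of any real array:

```python
def disk_ios_per_host_write(level: str, read_hit_rate: float = 0.0) -> float:
    """Average back-end disk I/Os per small random host write.

    RAID 5: 2 pre-reads (skipped on a cache hit) + 2 writes.
    RAID 10: 2 writes (data + mirror)."""
    if level == "raid5":
        return 2 * (1.0 - read_hit_rate) + 2
    if level == "raid10":
        return 2.0
    raise ValueError(f"unknown RAID level: {level}")


def max_random_write_iops(level: str, disks: int, iops_per_disk: float,
                          read_hit_rate: float = 0.0) -> float:
    """Upper bound on host random-write IOPS if the disks are the bottleneck."""
    return disks * iops_per_disk / disk_ios_per_host_write(level, read_hit_rate)


for level, hit in (("raid5", 0.0), ("raid5", 0.5), ("raid10", 0.0)):
    print(level, hit, round(max_random_write_iops(level, 4, 150, hit)))
# raid5 0.0 150
# raid5 0.5 200
# raid10 0.0 300
```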
By now the principles behind, and the differences between, RAID 5 and RAID 10 should be roughly clear. In general, we recommend RAID 10 for databases and other small-I/O workloads, while for large-file storage and data warehouses RAID 5 can be used, given its better space utilization.