I. Overview
In earlier storage systems, data was generally protected with RAID: when a hard drive in the array failed, the lost data could be recovered from the redundancy (mirror or parity) that RAID maintains. With the arrival of massive data volumes, however, RAID is increasingly struggling to do its job. With 2 TB hard disks as the storage medium, rebuilding a failed drive from its mirror takes about 4 hours, and that figure assumes recovery runs at the highest priority. In practice this rarely happens. The RAID rebuild is usually given low priority and runs in the background only when computing resources are idle, so the rebuild keeps stretching out, and a recovery lasting 1 to 2 weeks is not unusual. If another hard drive fails while the data is being rebuilt, the data can be lost permanently. RAID 5 tolerates at most one failed drive in the array and RAID 6 tolerates two simultaneous failures, but in the age of massive data it is entirely possible for several disks to fail at once. How, then, can data, one of an enterprise's most important resources, be kept safe?
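To make the rebuild-time arithmetic above concrete, the following sketch computes how long it takes to rewrite a 2 TB disk. It assumes decimal units (1 TB = 10^6 MB), that the 4-hour figure corresponds to the full rebuild bandwidth, and that rebuild time scales inversely with the share of bandwidth the background rebuild actually receives; `rebuild_hours` and the chosen shares are illustrative, not measured values.

```python
def rebuild_hours(capacity_tb: float, throughput_mb_s: float) -> float:
    """Hours needed to rewrite a whole disk at a sustained throughput (decimal units)."""
    capacity_mb = capacity_tb * 1_000_000  # assumption: 1 TB = 10^6 MB
    return capacity_mb / throughput_mb_s / 3600

# Throughput implied by "2 TB rebuilt in about 4 hours" (roughly 139 MB/s).
full_speed = 2 * 1_000_000 / (4 * 3600)

# Hypothetical shares of that bandwidth left over for a low-priority rebuild.
for share in (1.0, 0.25, 0.05, 0.02, 0.01):
    hours = rebuild_hours(2, full_speed * share)
    print(f"{share:>5.0%} of bandwidth -> {hours:7.1f} h ({hours / 24:5.1f} days)")
```

Under these assumptions, a rebuild that receives only 1% to 2% of the full bandwidth takes roughly 200 to 400 hours (about 8 to 17 days), in line with the 1-to-2-week figure in the text.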
At the IDF conference, Intel presented a new, scalable approach to data protection: erasure coding. The storage system cuts each large chunk of incoming data into pieces and encodes them, then cuts and encodes those pieces again until they reach a suitable chunk size, so that the data is spread across many fragments. It then computes redundant check (coding) information and writes the data fragments and coding fragments to the storage system without duplicating the data itself. A comparison with traditional RAID data protection is shown in the following figure:
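The simplest instance of the split-then-encode idea is a single XOR parity fragment. The sketch below is not Intel's scheme (it tolerates only one lost fragment, like RAID 5); it only shows how an object is cut into k equal chunks, how redundancy is computed from them, and how a missing chunk is rebuilt. The function names `split`, `encode` and `recover` are hypothetical.

```python
from functools import reduce
from typing import Optional

def split(data: bytes, k: int) -> list[bytes]:
    """Cut data into k equal-length chunks, zero-padding the last one."""
    size = -(-len(data) // k)  # ceiling division
    padded = data.ljust(size * k, b"\0")
    return [padded[i * size:(i + 1) * size] for i in range(k)]

def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length chunks."""
    return bytes(x ^ y for x, y in zip(a, b))

def encode(data: bytes, k: int) -> list[bytes]:
    """Return k data chunks followed by one XOR parity chunk."""
    chunks = split(data, k)
    return chunks + [reduce(xor_bytes, chunks)]

def recover(fragments: list[Optional[bytes]]) -> list[bytes]:
    """Rebuild at most one missing fragment (None) by XOR-ing all the others."""
    missing = [i for i, f in enumerate(fragments) if f is None]
    if len(missing) > 1:
        raise ValueError("single parity tolerates only one lost fragment")
    fragments = list(fragments)
    if missing:
        survivors = [f for f in fragments if f is not None]
        fragments[missing[0]] = reduce(xor_bytes, survivors)
    return fragments

# Usage: 4 data chunks + 1 parity chunk; lose chunk 2 and rebuild it.
frags = encode(b"hello erasure coding", 4)
frags[2] = None
restored = recover(frags)
print(b"".join(restored[:4]).rstrip(b"\0"))
```

The XOR of all surviving fragments equals the lost one because every data chunk appears exactly twice in that XOR (once itself, once inside the parity) except the missing chunk. Tolerating more than one failure needs a stronger code.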
Erasure coding extends the data protection model from RAID 5/6 to "RAID k", where k is the number of drive failures that can be tolerated without data loss. For RAID 5, k = 1; for RAID 6, k = 2; for erasure coding, k can be set to a chosen value n. For example, in an array of 16 hard disks protected by erasure coding, the lost data can still be recovered even if 6 disks fail.
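To show how a code can tolerate k = 6 failures among 16 disks, here is a minimal Reed-Solomon-style sketch, assuming a 10 + 6 layout (10 data symbols, 6 parity symbols) with one symbol per disk. For brevity it uses polynomial interpolation over the prime field GF(257); production erasure codes typically work over GF(2^8) so every symbol fits in a byte, whereas here a parity value can be 256. The names `rs_encode` and `rs_decode` are illustrative, not a library API.

```python
P = 257  # prime modulus: GF(257) contains every byte value 0..255

def lagrange_eval(points, x):
    """Evaluate at x the unique polynomial (mod P) through the (xi, yi) points."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        # pow(den, P - 2, P) is the modular inverse of den (Fermat's little theorem)
        total = (total + yi * num * pow(den, P - 2, P)) % P
    return total

def rs_encode(data, m):
    """Systematic encoding: data at x = 0..k-1, m parity symbols at x = k..k+m-1."""
    k = len(data)
    points = list(enumerate(data))
    return list(data) + [lagrange_eval(points, x) for x in range(k, k + m)]

def rs_decode(shares, k):
    """Rebuild the k data symbols from any k surviving (x, y) shares."""
    if len(shares) < k:
        raise ValueError("too many fragments lost: fewer than k shares survive")
    points = shares[:k]
    return [lagrange_eval(points, x) for x in range(k)]

# 16 disks: 10 data symbols + 6 parity symbols, one symbol per disk.
data = list(b"ERASURE123")
codeword = rs_encode(data, m=6)
lost = {0, 3, 5, 7, 11, 15}  # six failed disks
shares = [(x, y) for x, y in enumerate(codeword) if x not in lost]
print(bytes(rs_decode(shares, k=10)))
```

Because any 10 of the 16 shares determine the degree-9 polynomial uniquely, any 6 disks can fail; a seventh loss leaves too few shares. A real system repeats this computation at every byte offset of the stripe.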
Erasure coding is currently aimed mainly at large data blocks. Intel has also applied it in the large-object storage system it recommends, and practical tests showed erasure coding with clear advantages over traditional RAID in every aspect measured. The concrete results are shown in the following figure:
The applications of erasure coding do not end there. In a technical session on erasure coding, the engineers involved also described how erasure coding can be applied flexibly across multiple data centers, as shown in the following figure:
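One way to read the multi-data-center idea is as a placement problem: spread the fragments of each stripe over several sites so that losing a whole site still leaves enough fragments to decode. The sketch below assumes a hypothetical 10 + 6 code and four hypothetical sites (`dc-a` to `dc-d`) with simple round-robin placement; the helpers `place` and `survives_site_loss` are illustrative, not part of any described system.

```python
from collections import Counter
from itertools import combinations

def place(n_fragments: int, sites: list[str]) -> list[str]:
    """Round-robin placement: fragment i goes to sites[i % len(sites)]."""
    return [sites[i % len(sites)] for i in range(n_fragments)]

def survives_site_loss(placement: list[str], k: int, failed_sites: int = 1) -> bool:
    """True if losing any `failed_sites` whole sites still leaves at least k fragments."""
    per_site = Counter(placement)
    total = len(placement)
    return all(
        total - sum(per_site[s] for s in down) >= k
        for down in combinations(per_site, failed_sites)
    )

# Hypothetical 10 + 6 code spread over four data centers (4 fragments each).
layout = place(16, ["dc-a", "dc-b", "dc-c", "dc-d"])
print(Counter(layout))
print(survives_site_loss(layout, k=10))                  # any one site down
print(survives_site_loss(layout, k=10, failed_sites=2))  # any two sites down
```

With this layout, losing one site leaves 12 fragments, enough to decode, while losing two leaves only 8 of the 10 required; tolerating two site failures would need more parity or more sites.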
In general, traditional erasure coding has a relatively large impact on performance, especially on IOPS and latency, so its use is currently confined mainly to cold data such as archives and cloud storage.
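Part of that performance cost is the extra disk I/O an erasure-coded small write or degraded read triggers. The sketch below counts operations under a common simplified model (an assumption, not taken from the source): a small write updates one data fragment with a read-modify-write of that fragment and all m parity fragments, and reading a fragment from a failed disk means fetching k surviving fragments and decoding. The layouts listed are illustrative.

```python
def small_write_ios(m: int) -> int:
    """Read old data + m old parities, then write new data + m new parities."""
    return 2 * (1 + m)

def degraded_read_ios(k: int) -> int:
    """A fragment on a failed disk is rebuilt from k surviving fragments."""
    return k

# (label, k data fragments, m parity fragments); layouts are illustrative.
layouts = [("RAID 5 (4+1)", 4, 1), ("RAID 6 (4+2)", 4, 2), ("EC 10+6", 10, 6)]
for name, k, m in layouts:
    print(f"{name:<13} small write: {small_write_ios(m):2d} I/Os, "
          f"degraded read: {degraded_read_ios(k):2d} I/Os")
```

With m = 1 the model reproduces the familiar four-I/O RAID 5 small-write penalty; the 10 + 6 layout needs 14 I/Os per small write and 10 reads to serve one lost fragment, compared with a single read from a healthy disk.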
A new and scalable way to protect data: erasure coding