Talk more about HDFS Erasure Coding

Source: Internet
Author: User
Tags: erasure coding

Objective

In a previous article I already talked about HDFS EC (article link: Hadoop 3.0 Erasure Coding Erasure code function pre-analysis), so this article is a supplement to that content. The previous article mainly explained HDFS EC at the macro level: the role of EC and the corresponding usage scenarios, without going deep into the internal architecture design or the details of the specific EC algorithms. Those two aspects are exactly what this article elaborates on.

Erasure Coding Technology

EC is short for Erasure Coding, known in Chinese as 纠删码, a data error-correction technique. One of the most common use scenarios for EC is signal transmission in data communication, where errors occur frequently. How does EC coding automatically correct the data? Let us start with a very widely used EC algorithm: the XOR code.

XOR code

XOR is "XOR", the principle of XOR code is as follows:

When encoding, the data bits are combined with a bit-wise XOR operation; when recovering (decoding) the data, the lost bit is obtained by the inverse operation, i.e. XORing the result with the remaining data bits.

The XOR operation is slightly different from the familiar "and" and "or" operations: it follows the rule "same gives 0, different gives 1". For example (⊕ denotes the XOR operation):

1 ⊕ 1 = 0; 0 ⊕ 0 = 0; 1 ⊕ 0 = 1; 0 ⊕ 1 = 1; 0 ⊕ 1 ⊕ 1 = 0;

Now suppose that the second bit in the last equation, the 1, is lost, giving the following equation:

0 ⊕ ? ⊕ 1 = 0;

We can recover the data through the inverse of the XOR operation. Because the final result is 0, the value of 0 ⊕ ? must be 1 (so that 1 ⊕ 1 = 0). Since XOR yields 1 only when the two bits differ, and the known bit is 0, the lost bit must be 1, and the data is successfully recovered. But there is a problem here: if more than 1 bit is lost or corrupted, the data can no longer be recovered, for example when the first 2 bits are lost:

? ⊕ ? ⊕ 1 = 0;

Are the first 2 bits 0,1 or 1,0? There is no way to tell. From this we can see that the XOR code tolerates too few errors. So is there another EC algorithm that can solve this problem? In many scenarios multiple pieces of data are lost at once, and there is no guarantee that only 1 error occurs at a time. The coding algorithm described next solves this tricky problem.
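To make the XOR idea above concrete, here is a minimal, self-contained Java sketch of my own (not HDFS code): it computes a parity cell as the byte-wise XOR of several data cells and then reconstructs a single lost cell from the surviving cells plus the parity.

// Minimal illustration of XOR parity; my own example, not HDFS code.
public class XorParityDemo {

    // The parity cell is the byte-wise XOR of all data cells.
    static byte[] encodeParity(byte[][] dataCells) {
        byte[] parity = new byte[dataCells[0].length];
        for (byte[] cell : dataCells) {
            for (int i = 0; i < parity.length; i++) {
                parity[i] ^= cell[i];
            }
        }
        return parity;
    }

    // A single lost cell is the XOR of the parity and all surviving cells.
    static byte[] recoverLostCell(byte[][] survivingCells, byte[] parity) {
        byte[] lost = parity.clone();
        for (byte[] cell : survivingCells) {
            for (int i = 0; i < lost.length; i++) {
                lost[i] ^= cell[i];
            }
        }
        return lost;
    }

    public static void main(String[] args) {
        byte[][] data = { {1, 0, 1}, {0, 1, 1}, {1, 1, 0} };
        byte[] parity = encodeParity(data);
        // Pretend data[1] was lost; rebuild it from data[0], data[2] and the parity.
        byte[] recovered = recoverLostCell(new byte[][] { data[0], data[2] }, parity);
        System.out.println(java.util.Arrays.toString(recovered)); // prints [0, 1, 1]
    }
}

Note that exactly as discussed above, this scheme breaks down as soon as two cells are lost: a single parity cell only gives one equation to solve.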

Reed-Solomon Codes

The Reed-Solomon code is another EC code, usually abbreviated as RS code. Here is how the RS code works in HDFS. An RS code must be given 2 parameters when used, RS(k, m): k is the number of data cells and m is the number of parity cells. A parity cell can be understood as a check block, because it is generated by encoding the data blocks. The recovery principle of the RS code is as follows:

If a data cell is damaged, it can be recovered by decoding from the parity cells and the other data cells. If a parity cell is damaged, it can be regenerated by re-encoding the data cells.

The encoding and decoding of data cells and parity cells is related to matrix operations; interested readers can consult the relevant material to study it further. Note that the maximum number of damaged cells the scheme above can tolerate is m, and this number can be adjusted by choosing different parameters, such as RS(6, 3) or RS(10, 4). Data cells and parity cells are also laid out slightly differently from traditional storage: they use horizontal striped storage rather than traditional contiguous storage. This may be a little hard to picture at first; it does not matter, a detailed explanation is given below.
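Before moving on, here is a tiny Java sketch of the RS(k, m) parameterization just described; the class and method names are made up for illustration and are not part of HDFS. It only captures the counting rule: any k of the k + m cells are enough to rebuild the rest, so at most m cells may be lost.

// Hypothetical helper illustrating the RS(k, m) tolerance rule; not HDFS code.
public class RsPolicySketch {
    final int numDataCells;    // k
    final int numParityCells;  // m

    RsPolicySketch(int k, int m) {
        this.numDataCells = k;
        this.numParityCells = m;
    }

    // The code tolerates at most m lost cells per block group.
    int maxTolerableFailures() {
        return numParityCells;
    }

    // Recovery is possible as long as at least k of the k + m cells survive.
    boolean canRecover(int survivingCells) {
        return survivingCells >= numDataCells;
    }

    public static void main(String[] args) {
        RsPolicySketch rs63 = new RsPolicySketch(6, 3);
        System.out.println(rs63.maxTolerableFailures()); // 3
        System.out.println(rs63.canRecover(6));          // true: 3 cells lost
        System.out.println(rs63.canRecover(5));          // false: 4 cells lost
    }
}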

Contiguous storage in HDFS and striped storage in EC

From this point on, we slowly move the subject to HDFS EC. Whether it is the contiguous storage method or EC striped storage, the question is one of block layout. Let's take a closer look at the layout characteristics of these 2 approaches.

Contiguous storage: Contiguous

Contiguous storage is the familiar, ordinary HDFS file storage: data is written in units of blocks, and if the data being written exceeds 1 block, a new block is created and writing continues until the whole file has been written. The logical structure of this storage is as follows:

The figure above is a standard contiguous-storage diagram with a 128MB block size. To learn more about HDFS contiguous storage, you can look at the related class BlockInfoContiguous, or read another of my articles, on the HDFS contiguous information block BlockInfoContiguous.

Striped Storage: Striping

The word stripe here means a strip. The first time I saw striped storage in HDFS, my first impression was that the storage logic looked very awkward. Although a block is still used as the storage unit, the data is laid out horizontally across the blocks. In other words, the data segments within the same block are completely discontinuous. The striped storage structure is shown below:

Each small cell within a block corresponds to the data cell concept described above for the RS code. The following shows striped storage in HDFS based on data cells, combined with the parity cells:

The corresponding EC encoding type is RS(6, 3). The data cells are stored on a total of 6 nodes, DataNode0~5, and the following DataNode6~8 hold the parity cells. We can see that the striped storage layout of HDFS EC is very different from the original storage layout, and this is bound to change the way data is read and written. How does HDFS EC handle that? There is another question as well: can data under HDFS EC still support many of the original HDFS features, such as Snapshot, EncryptionZone, or the HDFS Cache?
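To make the striped layout easier to picture, here is a small Java sketch (my own illustration, not HDFS internals) that maps a logical byte offset of a file to its cell, data block, and stripe inside an RS(6, 3) block group; the 64 KB cell size is only an assumption for this example.

// Illustrative mapping of a logical offset to its position in a striped
// block group; the cell size and all names here are assumptions, not HDFS internals.
public class StripeLayoutDemo {
    static final int DATA_BLOCKS = 6;          // k in RS(6, 3)
    static final long CELL_SIZE = 64 * 1024L;  // assumed cell size for the example

    public static void main(String[] args) {
        long logicalOffset = 500_000L; // some byte offset within the file

        long cellIndex = logicalOffset / CELL_SIZE;   // which cell overall
        long blockIndex = cellIndex % DATA_BLOCKS;    // which data block in the group (round robin)
        long stripeIndex = cellIndex / DATA_BLOCKS;   // which stripe (row) of the group
        long offsetInCell = logicalOffset % CELL_SIZE;
        long offsetInBlock = stripeIndex * CELL_SIZE + offsetInCell;

        System.out.println("cell " + cellIndex
            + " -> data block " + blockIndex
            + ", stripe " + stripeIndex
            + ", offset in block " + offsetInBlock);
    }
}

The round-robin assignment of cells to blocks is exactly why data within one block is discontinuous from the file's point of view: consecutive cells of the file land on different DataNodes.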

Architecture design of HDFS EC

After all that has been said, we finally come to the design of the HDFS EC architecture, which I think is also what many people interested in HDFS EC want to know. EC, as a data-protection technique, carries a certain learning cost, and bringing it into HDFS is definitely not a simple matter: a lot of adaptation work is required. For the more detailed parts of the architecture design, refer to the community's HDFS EC design document (see the link in the references). Here is my condensed summary.

Adaptation of the data read and write classes

As mentioned at the end of the previous section, the introduction of the striped storage layout changes the data read and write logic, so HDFS EC needs dedicated input/output stream classes for striped data, mainly the striped counterparts of the existing stream classes (in the current code, classes such as DFSStripedInputStream and DFSStripedOutputStream).

From the class names we can directly see which storage layout each class serves for reads and writes. One additional class worth mentioning here is ErasureCodingWork, which is similar to ReplicationWork: ReplicationWork assigns the task of replicating a new block copy to the corresponding DataNode, while ErasureCodingWork assigns the EC encoding (reconstruction) task to the corresponding DataNode.

Architecture design of HDFS EC

The architecture of HDFS EC also follows the master-slave structure: there is a central management object (ECManager) and corresponding worker objects (ECWorker). These 2 main role classes have a clear division of labor:

ECManager: the EC management object does many things, such as coordinating data recovery, health checking, managing block groups, and so on.
ECWorker: what it does is very direct, namely the operations related to EC data recovery.

So now there is a question: how is the EC coding algorithm described earlier chosen when doing EC data recovery? HDFS controls this through EC policies; each EC policy corresponds to an EC algorithm configured via an ECSchema parameter. These EC policy objects are in turn held by the ErasureCodingPolicyManager object. The ErasureCodingPolicyManager currently maintains the following 3 EC policies:

private static final ErasureCodingPolicy SYS_POLICY1 = new ErasureCodingPolicy(
    ErasureCodeConstants.RS_6_3_SCHEMA, DEFAULT_CELLSIZE,
    HdfsConstants.RS_6_3_POLICY_ID);
private static final ErasureCodingPolicy SYS_POLICY2 = new ErasureCodingPolicy(
    ErasureCodeConstants.RS_3_2_SCHEMA, DEFAULT_CELLSIZE,
    HdfsConstants.RS_3_2_POLICY_ID);
private static final ErasureCodingPolicy SYS_POLICY3 = new ErasureCodingPolicy(
    ErasureCodeConstants.RS_6_3_LEGACY_SCHEMA, DEFAULT_CELLSIZE,
    HdfsConstants.RS_6_3_LEGACY_POLICY_ID);

The hierarchy diagram of the above objects is as follows:

In summary, the overall architecture diagram given in the HDFS EC design document is shown here at the end:

Using EC is also very convenient: the data under a specified path can be flexibly switched between EC and replication directly via external commands, and different EC policies can be set on different paths. For the specific usage of EC, you can also refer to the HDFS EC design documentation. Overall, HDFS EC will be a very useful feature.
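As a rough illustration of setting a policy per path programmatically (based on the Hadoop 3.x client API as eventually released, which may differ from the code snapshot this article was written against), the Java client usage looks roughly like this:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DistributedFileSystem;
import org.apache.hadoop.hdfs.protocol.ErasureCodingPolicy;

public class EcPolicyExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Assumes fs.defaultFS points at an HDFS cluster with EC support.
        DistributedFileSystem dfs = (DistributedFileSystem) FileSystem.get(conf);

        Path dir = new Path("/ec-data");
        dfs.mkdirs(dir);

        // Apply an EC policy to the directory; new files under it are stored striped.
        dfs.setErasureCodingPolicy(dir, "RS-6-3-1024k");

        // Read the policy back to confirm.
        ErasureCodingPolicy policy = dfs.getErasureCodingPolicy(dir);
        System.out.println("EC policy on " + dir + ": " + policy.getName());
    }
}

The equivalent shell-level operations are exposed through the hdfs ec subcommands in released Hadoop 3.x versions; see the design document and official documentation for the exact usage.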

Reference

1. http://blog.cloudera.com/blog/2015/09/introduction-to-hdfs-erasure-coding-in-apache-hadoop/
2. https://issues.apache.org/jira/secure/attachment/12697210/hdfserasurecodingdesign-20150206.pdf
3. Baidu Encyclopedia. Reed-Solomon Codes.
