Comparison of Double Codec Algorithms

Source: Internet
Author: User
Overview

Double-fault-tolerant coding (the "dual codec") is a key data-protection technology. The algorithm most widely used in disk arrays is RAID 6, which is based on Reed-Solomon coding. RAID 6 is computationally expensive and often relies on hardware acceleration units. As flash memory technology has developed, new encoding and decoding schemes have kept emerging, so it is worth understanding the double-fault coding algorithms commonly used in arrays. Three are in common use:

 

1. Two-dimensional (2D) parity coding

2. Even-odd parity coding

3. Reed-Solomon parity coding

 

The principles of these three algorithms are described in detail below, and the performance and computing complexity of these three algorithms are compared.

 

Two-dimensional parity Algorithm

The two-dimensional parity method arranges the data disks in an m × n matrix and computes parity in both the horizontal and vertical directions. With m × n data disks, m + n disks are needed to store the parity information. Two-dimensional parity is easy to implement with simple XOR operations. The figure below shows the principle of the algorithm:

 

[Figure: principle of the two-dimensional parity algorithm (1.jpg): http://s3.51cto.com/wyfs02/M01/47/DE/wKioL1QBrBWxsNg3AACJDJyn3yI207.jpg]

 

The disk Redundancy Rate in this mode is:

(m + n) / (m × n + m + n)

 

Space Utilization:

(m × n) / (m × n + m + n)

 

This algorithm has very low computational complexity but needs a large number of redundant disks. For example, an array of 12 (3 × 4) data disks needs seven redundant disks with this algorithm, whereas the traditional RAID 6 algorithm needs only two.
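The 3 × 4 example can be sketched in a few lines of Python. This is only an illustration under simple assumptions: each disk is reduced to a single integer symbol, and the function names `encode_2d`, `rebuild_from_row`, and `redundancy_rate` are made up for this sketch rather than taken from any array product.

```python
from functools import reduce
from operator import xor

def encode_2d(grid):
    """grid: m rows x n columns, one integer symbol per data disk (assumed model)."""
    row_parity = [reduce(xor, row) for row in grid]        # m horizontal parity disks
    col_parity = [reduce(xor, col) for col in zip(*grid)]  # n vertical parity disks
    return row_parity, col_parity

def rebuild_from_row(grid, row_parity, r, c):
    """Rebuild the symbol at (r, c) from the other symbols in row r and the row parity."""
    value = row_parity[r]
    for j, symbol in enumerate(grid[r]):
        if j != c:
            value ^= symbol
    return value

def redundancy_rate(m, n):
    """(m + n) / (m x n + m + n), the formula given above."""
    return (m + n) / (m * n + m + n)

grid = [[1, 2, 3, 4],
        [5, 6, 7, 8],
        [9, 10, 11, 12]]
row_parity, col_parity = encode_2d(grid)
print(len(row_parity) + len(col_parity), "parity disks for", 3 * 4, "data disks")
print("rebuilt symbol:", rebuild_from_row(grid, row_parity, 1, 2))
print("redundancy rate:", redundancy_rate(3, 4))
```

If two disks in the same row fail, the row parity alone is not enough, and each lost symbol can instead be rebuilt from its column parity in the same way.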

 

Even-odd CODEC algorithm

This encoding method builds its parity from horizontal redundancy and diagonal redundancy, as shown in the two figures below:

 

[Figure: even-odd horizontal parity (2.jpg): http://s3.51cto.com/wyfs02/M02/47/DE/wKioL1QBrDLwDj8ZAADKm-K3Mbk687.jpg]

Horizontal parity

 

[Figure: even-odd diagonal parity (3.jpg): http://s3.51cto.com/wyfs02/M00/47/DE/wKioL1QBrEqz9GqhAADPKyZmEuQ383.jpg]

Diagonal parity

 

For m data disks, this encoding method needs two redundant disks, and the encoding uses only ordinary XOR operations. One redundant disk holds the usual RAID 5 horizontal parity; the other holds the diagonal parity. This algorithm is used by XDP in EMC's XtremIO all-flash array (refer to XDP Technology for xtremio profiling).
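As a rough sketch of the two parity columns, the Python below follows the published EVENODD construction rather than the figures of this article, so the exact layout may differ. It assumes a prime number p of data disks and p - 1 rows per stripe, with one integer symbol per cell. The function name `evenodd_encode` is hypothetical.

```python
def evenodd_encode(data, p):
    """data: (p - 1) rows x p columns of ints; p is assumed to be prime.
    Returns (horizontal, diagonal): two parity columns of length p - 1."""
    rows = p - 1
    # An imaginary all-zero row p - 1 keeps the diagonal indexing uniform.
    a = [list(row) for row in data] + [[0] * p]
    # s is the XOR along the one diagonal that has no parity cell of its own.
    s = 0
    for t in range(1, p):
        s ^= a[p - 1 - t][t]
    horizontal = [0] * rows
    diagonal = [0] * rows
    for l in range(rows):
        for t in range(p):
            horizontal[l] ^= a[l][t]          # RAID 5 style row parity
            diagonal[l] ^= a[(l - t) % p][t]  # cells on the diagonal through (l, 0)
        diagonal[l] ^= s                      # fold in the adjuster s
    return horizontal, diagonal

# Usage: 3 data disks (p = 3), 2 rows per stripe.
stripe = [[0x11, 0x22, 0x33],
          [0x44, 0x55, 0x66]]
h, d = evenodd_encode(stripe, 3)
print("horizontal parity:", h)
print("diagonal parity:", d)
```

Only XOR is used, which matches the description above. Decoding two lost columns walks the diagonals and rows alternately and is omitted here.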

 

In traditional disk arrays, the use of this algorithm is limited: every data update turns into a complex, time-consuming read-modify-write operation, so small-write performance is very poor. In an all-flash array the situation is very different: the small-write problem does not exist, and the algorithm becomes useful.
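The read-modify-write cost comes from the parity update rule. For any XOR parity, the new parity equals the old parity XOR the old data XOR the new data, so a small write must first read the old data block and the old parity block. The sketch below shows this for a single parity block, with `small_write_update` as an illustrative name. A double-parity scheme repeats the step for each parity block the changed data touches.

```python
def small_write_update(old_data, new_data, old_parity):
    """Return the new parity block after one data block is overwritten."""
    return bytes(p ^ o ^ n for p, o, n in zip(old_parity, old_data, new_data))

def full_parity(blocks):
    """Recompute the parity from scratch by XOR-ing all data blocks."""
    parity = bytearray(len(blocks[0]))
    for block in blocks:
        for k, byte in enumerate(block):
            parity[k] ^= byte
    return bytes(parity)

blocks = [b"AAAA", b"BBBB", b"CCCC"]
parity = full_parity(blocks)

# Overwrite block 1: two reads (old data, old parity), then two writes.
new_block = b"bbbb"
parity = small_write_update(blocks[1], new_block, parity)
blocks[1] = new_block
print(parity == full_parity(blocks))
```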

 

Reed-Solomon Encoding Algorithm

Reed-Solomon coding is the algorithm used by RAID 6 today. The figure below shows the basic principle:

 

[Figure: basic principle of Reed-Solomon encoding in RAID 6 (4.jpg): http://s3.51cto.com/wyfs02/M02/47/DC/wKiom1QBq1GS99a9AAEpezMn6xM559.jpg]

 

The encoding process of this algorithm is as follows (for the details, refer to raid6 algorithm parsing):

* The P parity is computed with XOR operations; this step is the same as in RAID 5.

* The Q parity is computed by multiplying each data block by a weighting coefficient and summing the products. Together with the P parity this gives two equations, so two disks may fail at the same time.

* Data is stored in stripes.

* Two redundant disks are required, so the code belongs to the optimal (minimum-redundancy) category.
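The P and Q computation can be sketched as follows. The choices here are assumptions made for illustration, not details from this article: arithmetic is done in GF(2^8) with the reduction polynomial 0x11D, and the weighting coefficient of data disk i is 2 to the power i, which is the convention used by the Linux RAID 6 driver. All function names are hypothetical.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x^2 + 1 (0x11D)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11D
        b >>= 1
    return r

def gf_pow(a, n):
    r = 1
    for _ in range(n):
        r = gf_mul(r, a)
    return r

def gf_inv(a):
    return gf_pow(a, 254)  # a^255 == 1 for every nonzero a

def encode_pq(data):
    """data: equal-length bytes objects, one per data disk. Returns (P, Q)."""
    p = bytearray(len(data[0]))
    q = bytearray(len(data[0]))
    for i, block in enumerate(data):
        coeff = gf_pow(2, i)  # assumed weighting coefficient for disk i
        for k, byte in enumerate(block):
            p[k] ^= byte                 # P: plain XOR, as in RAID 5
            q[k] ^= gf_mul(coeff, byte)  # Q: weighted sum in GF(2^8)
    return bytes(p), bytes(q)

def recover_two(data, x, y, p, q):
    """Rebuild data disks x and y (their entries in data are ignored)."""
    n = len(p)
    survivors = [bytes(n) if i in (x, y) else b for i, b in enumerate(data)]
    p_s, q_s = encode_pq(survivors)
    gx, gy = gf_pow(2, x), gf_pow(2, y)
    inv = gf_inv(gx ^ gy)
    dx, dy = bytearray(n), bytearray(n)
    for k in range(n):
        pd = p[k] ^ p_s[k]  # equals Dx ^ Dy
        qd = q[k] ^ q_s[k]  # equals gx*Dx ^ gy*Dy
        dx[k] = gf_mul(inv, qd ^ gf_mul(gy, pd))
        dy[k] = pd ^ dx[k]
    return bytes(dx), bytes(dy)

disks = [b"abcd", b"efgh", b"ijkl", b"mnop"]
P, Q = encode_pq(disks)
damaged = [disks[0], None, disks[2], None]
print(recover_two(damaged, 1, 3, P, Q))
```

The two equations mentioned above appear in `recover_two` as `pd` and `qd`: solving them for the two unknown blocks needs a Galois-field multiplication and inversion per byte, which is where the extra cost over pure XOR schemes comes from.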

 

Performance Comparison of Three encoding/Decoding Algorithms

Theoretical analysis shows that the three encoding/decoding algorithms differ in I/O performance: 2D encoding is the simplest, followed by even-odd, and RS encoding is the most complex. When a disk fails, the three algorithms also differ in the computational complexity of data recovery. The two figures below show that complexity when one disk is damaged and when two disks are damaged. The red line is the RS algorithm, the green line is the EO (even-odd) algorithm, and the white line is the 2D algorithm. (Note that the horizontal axis is the number of disks, not time; the vertical axis is computational complexity.)

 

[Figure: recovery complexity with one failed disk (5.jpg): http://s3.51cto.com/wyfs02/M01/47/DE/wKioL1QBrILAea1ZAAGU2u1P6i4525.jpg]

Computing complexity of data recovery when a single disk is damaged (X axis: Number of disks, Y axis: complexity)

 

[Figure: recovery complexity with two failed disks (6.jpg): http://s3.51cto.com/wyfs02/M00/47/DC/wKiom1QBq4ORt-vLAAGPRpRBBjI240.jpg]

Computing complexity of data recovery when two disks are damaged (X axis: Number of disks, Y axis: complexity)

We can see that, of the three double-error-correcting codes, 2D decoding is the simplest, followed by even-odd, and RS decoding is the most complex.

 

The comparison results of the three encoding/decoding algorithms in the disk storage array are shown in the following table:

 

[Table image: comparison of the three encoding/decoding algorithms in a disk storage array (7.jpg): http://s3.51cto.com/wyfs02/M01/47/DC/wKiom1QBq6-T-KesAADqtDkfY3A961.jpg]

 

In summary, in the disk array application field, we can draw the following conclusions:

  1. The 2D encoding method is simple and easy to implement, but the resulting disk array is not cost-effective (the code is not an optimal redundant code).

  2. RS encoding and decoding are more complex than even-odd and generally need to be implemented in hardware. However, the RS code performs well on small writes. RS coding therefore suits arrays implemented in hardware or in software optimized with instruction-level acceleration.

  3. Even-odd encoding and decoding are simpler than the RS code, using only XOR operations, but small-write performance is poor. Even-odd therefore suits real-time storage of continuous data.

 

With the development of flash storage, the characteristics of the storage medium have changed fundamentally, and the optimal codec algorithm changes with them: the EO algorithm, which was previously ill-suited to disk arrays, can shine in flash storage.

 

This article is from the "Storage path" blog, please be sure to keep this source http://alanwu.blog.51cto.com/3652632/1546914
