A Simple Analysis of the RLP Encoding Principle

Source: Internet
Author: User
Tags 0xc0 data structures serialization
RLP encoding is the main method of data serialization in Ethereum. This article introduces the main rules of RLP encoding and analyzes the principle behind them. RLP processes data efficiently, chiefly because a single prefix carries both the length and the type of the data that follows. In effect, RLP is a structured extension built on top of ASCII encoding: it can express both length and type, which makes it a very compact and structured encoding scheme.


RLP (Recursive Length Prefix) is an encoding algorithm for arbitrarily nested structures of binary data. It is the main method of data serialization and deserialization in Ethereum: data structures such as blocks and transactions are RLP encoded before being stored in the database when they are persisted.


Definition


RLP encoding is defined for only two types of data: strings (that is, byte arrays) and lists. A string is a sequence of binary data. A list is a nested, recursive structure that can contain both strings and other lists; for example, ["cat", ["puppy", "cow"], "horse", [[]], "pig", [""], "sheep"] is a complex list. Other types of data must first be converted into one of these two types. RLP itself does not define the conversion rules, so each application can convert according to its own rules: a struct can be turned into a list, and an int can be converted into its binary form (which is a string). In Ethereum, integers are stored in big-endian form.
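The integer conversion mentioned above can be sketched in Python as follows. The function name `int_to_rlp_string` is hypothetical, and two details go beyond what this article states: leading zero bytes are dropped, and zero is mapped to the empty string. Both follow the usual Ethereum convention, but treat them as assumptions of this sketch.

```python
def int_to_rlp_string(n: int) -> bytes:
    """Convert a non-negative integer to its shortest big-endian byte string.

    Hypothetical helper: RLP itself does not define this conversion.
    """
    if n < 0:
        raise ValueError("this sketch handles non-negative integers only")
    if n == 0:
        # Assumption: zero becomes the empty string (no leading zero bytes).
        return b""
    # Number of bytes needed to hold n, rounded up from its bit length.
    byte_count = (n.bit_length() + 7) // 8
    return n.to_bytes(byte_count, "big")
```

With this helper, 1024 becomes the two-byte string `b"\x04\x00"`, which is then RLP encoded like any other string.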


The name of RLP encoding reveals its two characteristics. The first is recursion: the encoded data is a recursive structure, and the encoding algorithm processes it recursively. The second is the length prefix: every RLP encoding begins with a prefix whose value depends on the length of the encoded data, as the encoding rules below show.


Encoding Rules

Rule one: for a single byte whose value is in the range [0x00, 0x7f], the RLP encoding is the byte itself. Note that the boundary is 0x7f because that is the largest ASCII code, so any byte up to 0x7f is encoded exactly as it would be in ASCII.


Rule two: if a string is 0-55 bytes long, its RLP encoding is a single-byte prefix followed by the string itself, where the prefix is 0x80 plus the length of the string. (A one-byte string whose value is at most 0x7f is covered by rule one instead.) Since the longest such string is 55 = 0x37 bytes, the largest single-byte prefix is 0x80 + 0x37 = 0xb7, so the first byte of the encoding lies in [0x80, 0xb7].
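Rules one and two can be sketched together in a few lines of Python. The function name `encode_short_string` is made up for illustration; the sketch simply rejects strings longer than 55 bytes.

```python
def encode_short_string(data: bytes) -> bytes:
    """RLP-encode a string of at most 55 bytes (rules one and two)."""
    # Rule one: a single byte in [0x00, 0x7f] is its own encoding.
    if len(data) == 1 and data[0] <= 0x7f:
        return data
    # Rule two: prefix 0x80 + length, then the string itself.
    if len(data) <= 55:
        return bytes([0x80 + len(data)]) + data
    raise ValueError("strings longer than 55 bytes are not handled here")
```

Note that a single byte above 0x7f, such as 0x80, does not qualify for rule one and is encoded by rule two with the prefix 0x81.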


Rule three: if a string is longer than 55 bytes, its RLP encoding is a single-byte prefix, followed by the length of the string, followed by the string itself. The prefix is 0xb7 plus the number of bytes needed to write the string length in binary. This is a little roundabout, so take an example: a string of length 1024 has the length 10000000000 in binary, which occupies 2 bytes, so the prefix is 0xb7 + 2 = 0xb9. The length itself is 1024 = 0x0400, so the whole RLP encoding is \xb9\x04\x00 followed by the string. The prefix lies in [0xb8, 0xbf]: the binary length occupies at least 1 byte, giving a minimum of 0xb7 + 1 = 0xb8, and at most 8 bytes, giving a maximum of 0xb7 + 8 = 0xbf.
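The following Python sketch walks through rule three. The function name `encode_long_string` is hypothetical, and the sketch only accepts strings longer than 55 bytes.

```python
def encode_long_string(data: bytes) -> bytes:
    """RLP-encode a string longer than 55 bytes (rule three)."""
    length = len(data)
    if length <= 55:
        raise ValueError("rule three applies only to strings longer than 55 bytes")
    # Write the length in big-endian form using as few bytes as possible.
    length_bytes = length.to_bytes((length.bit_length() + 7) // 8, "big")
    # The prefix is 0xb7 plus the number of bytes used by the length.
    prefix = bytes([0xb7 + len(length_bytes)])
    return prefix + length_bytes + data
```

For a 1024-byte string the length occupies two bytes, so the encoding starts with `\xb9\x04\x00`, matching the worked example in rule three.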


Rule four: if the total length of a list is 0-55 bytes, where the total length means the combined length of the RLP encodings of all its items, its RLP encoding is a single-byte prefix followed by the RLP encoding of each item in turn. The prefix is 0xc0 plus the total length of the list, so the first byte of the encoding lies in [0xc0, 0xf7].


Rule five: if the total length of a list is greater than 55 bytes, its RLP encoding is a single-byte prefix, followed by the total length of the list, followed by the RLP encoding of each item in turn. The prefix is 0xf7 plus the number of bytes needed to write the total length in binary, so the first byte of the encoding lies in [0xf8, 0xff].
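Putting the five rules together, a minimal recursive encoder might look like the Python sketch below. The names `rlp_encode` and `_length_prefix` are illustrative, not taken from any particular library, and the sketch accepts only `bytes` and (possibly nested) lists of `bytes`.

```python
def _length_prefix(length: int, offset: int) -> bytes:
    """Build the prefix for a payload of the given length.

    offset is 0x80 for strings and 0xc0 for lists.
    """
    if length <= 55:
        # Short form (rules two and four): one byte, offset + length.
        return bytes([offset + length])
    # Long form (rules three and five): offset + 55 + size of the length,
    # followed by the length itself in big-endian form.
    length_bytes = length.to_bytes((length.bit_length() + 7) // 8, "big")
    return bytes([offset + 55 + len(length_bytes)]) + length_bytes


def rlp_encode(item) -> bytes:
    """RLP-encode a byte string or a nested list of byte strings."""
    if isinstance(item, (bytes, bytearray)):
        data = bytes(item)
        # Rule one: a single byte in [0x00, 0x7f] is its own encoding.
        if len(data) == 1 and data[0] <= 0x7f:
            return data
        return _length_prefix(len(data), 0x80) + data
    if isinstance(item, (list, tuple)):
        # Encode every element first; the total length of the list is the
        # length of the concatenated element encodings.
        payload = b"".join(rlp_encode(element) for element in item)
        return _length_prefix(len(payload), 0xc0) + payload
    raise TypeError("only bytes and lists are supported in this sketch")
```

Because 0x80 + 55 = 0xb7 and 0xc0 + 55 = 0xf7, a single helper covers the string and list cases; only the offset differs.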


RLP Encoding Examples

String "dog" = [0x83, 'd', 'o', 'g'] (rule two)


List ["cat", "dog"] = [0xc8, 0x83, 'c', 'a', 't', 0x83, 'd', 'o', 'g'] (rule four)


Empty string "" = [0x80] (rule two)


Empty list [] = [0xc0] (rule four)


Integer 15 ('\x0f') = [0x0f] (rule one)


Integer 1024 ('\x04\x00') = [0x82, 0x04, 0x00] (rule two)


List [[], [[]], [[], [[]]]] = [0xc7, 0xc0, 0xc1, 0xc0, 0xc3, 0xc0, 0xc1, 0xc0] (rule four)


String "Lorem ipsum dolor sit amet, consectetur adipisicing elit" = [0xb8, 0x38, 'L', 'o', 'r', 'e', 'm', ' ', ..., 'e', 'l', 'i', 't'] (rule three)


RLP Analysis


The examples above show the design idea of RLP encoding: the first byte alone is enough to determine quickly what kind of encoding follows. RLP makes full use of the storage space of one byte by giving new meanings to the values above 0x7f. The encodings we are used to, such as Unicode encodings, mainly encode with a specified number of bytes per unit, and processing them usually means splitting and decoding the data according to that specified length. The biggest drawback of such traditional encodings is that they cannot represent a structure, such as the lists described in this article. The biggest advantage of RLP is that it supports list structures while still making full use of every byte, so a tree structure can easily be stored with RLP.


RLP encodings are also easy for a program to process: the first byte determines the type of the encoding, and a different decoding method is called for each type. If you are familiar with JSON, you will find RLP very similar in that both support nested structures. With recursive calls, an entire RLP encoding can be quickly restored to a tree, or converted into a JSON structure that other programs can use easily.
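The decoding process described above can be sketched in Python as follows. The names `rlp_decode`, `_read_header` and `_decode_at` are hypothetical, and the sketch performs only minimal validation: it checks bounds, but it does not reject non-canonical encodings.

```python
def _read_header(data: bytes, pos: int):
    """Return (is_list, payload_start, payload_length) for the item at pos."""
    first = data[pos]
    if first <= 0x7f:                      # rule one: the byte is the payload
        return False, pos, 1
    if first <= 0xb7:                      # rule two: short string
        return False, pos + 1, first - 0x80
    if first <= 0xbf:                      # rule three: long string
        n = first - 0xb7
        return False, pos + 1 + n, int.from_bytes(data[pos + 1:pos + 1 + n], "big")
    if first <= 0xf7:                      # rule four: short list
        return True, pos + 1, first - 0xc0
    n = first - 0xf7                       # rule five: long list
    return True, pos + 1 + n, int.from_bytes(data[pos + 1:pos + 1 + n], "big")


def _decode_at(data: bytes, pos: int):
    """Decode one item starting at pos; return (item, position after it)."""
    if pos >= len(data):
        raise ValueError("unexpected end of input")
    is_list, start, length = _read_header(data, pos)
    end = start + length
    if end > len(data):
        raise ValueError("declared length exceeds the available data")
    if not is_list:
        return data[start:end], end
    # A list payload is a concatenation of encoded items: decode recursively.
    items, cursor = [], start
    while cursor < end:
        item, cursor = _decode_at(data, cursor)
        items.append(item)
    if cursor != end:
        raise ValueError("list items overrun the declared list length")
    return items, end


def rlp_decode(data: bytes):
    """Decode a complete RLP encoding into bytes or nested lists of bytes."""
    item, end = _decode_at(data, 0)
    if end != len(data):
        raise ValueError("trailing bytes after the encoded item")
    return item
```

The result is a tree of Python lists with `bytes` leaves; interpreting a leaf as text or as an integer is left to the caller, since RLP does not record that information.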


In the long forms, RLP uses the first byte to store the number of bytes occupied by the length, and the following bytes to store the length of the whole string. According to rule three, the length field can occupy up to 8 bytes, so the longest single string RLP supports is just under 2^64 bytes, an astronomical figure. Combined with the nesting rules, RLP can therefore in theory encode any data.
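The size limit follows directly from the prefix ranges in rules three and five. A short arithmetic check in Python (the constant names are illustrative only):

```python
# Rule three prefixes run from 0xb8 to 0xbf, so the length field of a
# string can occupy at most 0xbf - 0xb7 bytes.
MAX_LENGTH_BYTES = 0xbf - 0xb7

# The same holds for lists: rule five prefixes run from 0xf8 to 0xff.
MAX_LIST_LENGTH_BYTES = 0xff - 0xf7

# The largest value that fits in that many bytes is the longest payload
# a single string (or list) can declare.
MAX_PAYLOAD_LENGTH = 2 ** (8 * MAX_LENGTH_BYTES) - 1
```

Both length fields come out at 8 bytes, so the largest declarable payload is 2^64 - 1 bytes.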


Reposted from block net
