Google Protobuf Serialization principle

Source: Internet
Author: User
Tags deprecated format message

When a message is serialized, protobuf first computes the number of bytes that all of its fields will occupy. The calculation is simple: fixed-width types always occupy a known number of bytes, varint types occupy a number of bytes determined by the value, and string and bytes fields occupy their payload length plus a length prefix, so the sizes only need to be added up. This length is the serialized size, a 32-bit integer. The plain wire format does not write it, but the length-delimited serialization methods of protobuf (for example writeDelimitedTo in the Java API) write it in front of the message as a varint32, a compressed int that uses between one and five bytes depending on the magnitude of the value.
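The varint32 compression mentioned above can be sketched as follows. This is a minimal illustration of base-128 varint encoding, not code from the protobuf library; the function names `encode_varint` and `length_prefixed` are hypothetical.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint (illustrative helper)."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        low7 = value & 0x7F          # take the lowest 7 bits
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # more bytes follow: set the continuation bit
        else:
            out.append(low7)         # last byte: continuation bit clear
            return bytes(out)


def length_prefixed(payload: bytes) -> bytes:
    """Prefix an already serialized message with its size as a varint."""
    return encode_varint(len(payload)) + payload
```

Small values fit in one byte, while the largest 32-bit value needs five, which is why the prefix is usually shorter than a fixed 4-byte int.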

After this comes the list of fields. Each field is written as a key followed by its value. The key is a varint that combines the field's tag, the unique number that identifies the field in the message definition, with the type of the field. If the field is a string or bytes, a varint32 holding the byte length of the data is written before the value.
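As a rough sketch of how a single field is laid out, the helpers below write a key, then the value, and for strings a length before the data. The helper names are hypothetical, and the key is computed as (field_number << 3) | wire_type, as in the protobuf wire format.

```python
def encode_varint(value: int) -> bytes:
    """Base-128 varint for non-negative integers (illustrative helper)."""
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)
        else:
            out.append(low7)
            return bytes(out)


def encode_key(field_number: int, wire_type: int) -> bytes:
    """The key carries the field number and the wire type in one varint."""
    return encode_varint((field_number << 3) | wire_type)


def encode_varint_field(field_number: int, value: int) -> bytes:
    """Wire type 0: key followed by the varint value."""
    return encode_key(field_number, 0) + encode_varint(value)


def encode_string_field(field_number: int, text: str) -> bytes:
    """Wire type 2: key, then byte length as a varint, then the UTF-8 data."""
    data = text.encode("utf-8")
    return encode_key(field_number, 2) + encode_varint(len(data)) + data
```

For example, `encode_varint_field(1, 150)` yields the three bytes 08 96 01, and `encode_string_field(2, "testing")` yields 12 07 followed by the seven ASCII bytes of the string.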

When the message is serialized, it becomes a binary data stream made up of a series of key-value pairs.

The binary format uses a numeric tag as the key. The key identifies a specific field, and when unpacking, Protocol Buffers uses the key to determine which field of the message the corresponding value belongs to.

At deserialization time, for a message that was written with a length prefix, first read the integer that represents the serialized size, then read that many bytes into a ByteBuffer, which gives one complete packet. Next, read a varint and parse the tag and type out of it. If the type is string or bytes, additionally read a varint32 to learn the length of the data. After that, depending on the type or the byte length, read the subsequent bytes and convert them to the corresponding Java type. Repeat until the entire packet has been parsed.
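The read loop described above might look like the following sketch. It is written in Python rather than Java for brevity, it assumes the size prefix is a varint, and it returns raw values without converting them to typed fields. The names `read_varint`, `parse_fields` and `parse_delimited` are hypothetical.

```python
def read_varint(buf: bytes, pos: int):
    """Read one varint starting at pos; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:          # continuation bit clear: last byte
            return result, pos
        shift += 7


def parse_fields(buf: bytes):
    """Walk the key-value pairs of one message body."""
    fields = []
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        field_number, wire_type = key >> 3, key & 0x07
        if wire_type == 0:           # varint
            value, pos = read_varint(buf, pos)
        elif wire_type == 1:         # fixed 64-bit
            value, pos = buf[pos:pos + 8], pos + 8
        elif wire_type == 2:         # length-delimited: read the length first
            length, pos = read_varint(buf, pos)
            value, pos = buf[pos:pos + length], pos + length
        elif wire_type == 5:         # fixed 32-bit
            value, pos = buf[pos:pos + 4], pos + 4
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
        fields.append((field_number, wire_type, value))
    return fields


def parse_delimited(stream: bytes):
    """Read the size prefix, then parse exactly that many bytes."""
    size, pos = read_varint(stream, 0)
    return parse_fields(stream[pos:pos + size])
```

Mapping each (field number, raw value) pair to a typed field requires the message definition, which this sketch leaves out.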

Using this key-value structure eliminates the need for separators between fields. For an optional field, if the field is not set in the message, it is simply absent from the serialized buffer. These properties help keep the messages small.
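A small sketch of the size saving: the hypothetical `encode_message` helper below takes a dict of field number to integer value, treats `None` as "not set", and writes nothing at all for such a field. It only handles varint fields and is not how the protobuf library represents messages.

```python
def encode_varint(value: int) -> bytes:
    """Base-128 varint for non-negative integers (illustrative helper)."""
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)
        else:
            out.append(low7)
            return bytes(out)


def encode_message(fields: dict) -> bytes:
    """Encode {field_number: int or None}; unset fields produce no bytes."""
    out = bytearray()
    for field_number in sorted(fields):
        value = fields[field_number]
        if value is None:
            continue                                  # absent field: no key, no value
        out += encode_varint((field_number << 3) | 0)  # key with wire type 0
        out += encode_varint(value)
    return bytes(out)
```

With field 2 unset the message is three bytes long; setting it to 1 adds only the two bytes for its key and value.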

As stated above, the binary format uses a numeric tag as the key. The key is not just the tag, though: it combines the tag with a wire type, and the wire type determines how the length of the value is found.

Definition of key:

(field_number << 3) | wire_type

The key is made up of two parts. The first part is field_number and the second part is wire_type, which represents the wire type of the value. The lowest three bits of the key hold the wire type, and the remaining higher bits hold the field number.
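The two parts of the key can be packed and unpacked with plain bit operations. The function names `make_key` and `split_key` below are hypothetical and only illustrate the layout.

```python
def make_key(field_number: int, wire_type: int) -> int:
    """Pack the field number into the high bits and the wire type into the low 3 bits."""
    if not 0 <= wire_type <= 7:
        raise ValueError("wire type must fit in 3 bits")
    return (field_number << 3) | wire_type


def split_key(key: int):
    """Recover (field_number, wire_type) from a key."""
    return key >> 3, key & 0x07
```

For instance, field 1 with wire type 0 gives key 8, and field 2 with wire type 2 gives key 18. The resulting integer is then written to the stream as a varint.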

The possible wire types are shown in the following table:

Type  Meaning           Used for
0     Varint            int32, int64, uint32, uint64, sint32, sint64, bool, enum
1     64-bit            fixed64, sfixed64, double
2     Length-delimited  string, bytes, embedded messages, packed repeated fields
3     Start group       groups (deprecated)
4     End group         groups (deprecated)
5     32-bit            fixed32, sfixed32, float
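The table can be mirrored in code as an enumeration, together with a lookup for the wire types whose value width is fixed. The names `WireType`, `FIXED_SIZES` and `fixed_value_size` are illustrative and are not taken from the protobuf library.

```python
from enum import IntEnum


class WireType(IntEnum):
    """Wire type numbers as listed in the table."""
    VARINT = 0
    FIXED64 = 1
    LENGTH_DELIMITED = 2
    START_GROUP = 3   # deprecated
    END_GROUP = 4     # deprecated
    FIXED32 = 5


# Only these two wire types fix the width of the value in advance.
FIXED_SIZES = {WireType.FIXED64: 8, WireType.FIXED32: 4}


def fixed_value_size(wire_type: int):
    """Return the value width in bytes, or None when it must be read from the stream."""
    return FIXED_SIZES.get(WireType(wire_type))
```

For varint and length-delimited values the function returns `None`, because a parser has to read the continuation bits or the length prefix to find where the value ends.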
