Application and Analysis of Protocol Buffers

Source: Internet
Author: User

1 Introduction to Protocol Buffers

Protocol Buffers (PB) is a mechanism for serializing structured data. It is flexible, efficient, and automated: similar to XML, but smaller, faster, and simpler. At Google, almost all internal RPC protocols and file formats use PB.
PB has the following features:

  1. Platform-independent and language-independent
  2. High performance: about 20 times faster than XML
  3. Small size: 3 to 10 times smaller than XML
  4. Easy to use
  5. Good compatibility

Here, I ran a small experiment, converting 29230 KB of custom text data into PB and into XML:

                                        PB          XML
Converted size                          21011 KB    43202 KB
Parsing time (100 iterations)           18610 ms    169251 ms
Lines of code written for parsing       1           50

Table 1: Experimental comparison of PB and XML

The gap is smaller than the officially quoted figures, probably because the fields in my test data are relatively long; results vary with the application scenario.

As the table shows, PB, as a lightweight data protocol, has clear advantages in both time and space.

2 Simple Application of Protocol Buffers

2.1 Creation Process

2.1.1 Define a .proto File

Create a new file named addressbook.proto with the following content:

    package tutorial;                                    // Namespace
    option java_package = "com.example.tutorial";        // Package of the generated Java file
    option java_outer_classname = "AddressBookProtos";   // Name of the generated outer class

    message Person {                      // The structured data being described
        required string name = 1;         // required: the field must be set
        required int32 id = 2;            // The number after the equals sign is the field number
        optional string email = 3;        // optional: the field may be left unset

        message PhoneNumber {             // Nested message
            required string number = 1;
            optional int32 type = 2;
        }

        repeated PhoneNumber phone = 4;   // repeated: zero or more values
    }

    message AddressBook {
        repeated Person person = 1;       // A collection of Person messages
    }

Some explanations of the above content:

  • For the data types supported by PB, see the PB data type documentation.
  • Modifier required: use this modifier with caution. Misuse may cause compatibility problems when the message is modified later;
  • Modifier optional: for frequently used fields, use field numbers 1 to 15 to save space, because their keys fit in a single byte;
  • PB serializes structured data as key-value pairs. The key is a single number, encoded as a varint, that combines the field number (the number after the equals sign) with the field's type.

2.1.2 Use the PB Compiler

Run: protoc -I=$SRC_DIR --java_out=$DST_DIR $SRC_DIR/addressbook.proto
where -I specifies the directory containing the .proto file, and
--java_out specifies the directory in which the Java file is generated.

2.1.3 Use the PB APIs to Write and Read Messages

After the preceding steps, an AddressBookProtos.java file is generated under the specified $DST_DIR directory. Once the protobuf-java dependency is added in Maven, data can be serialized and deserialized using this class.
The generated code is structured as follows:

    class AddressBookProtos {
        class Person {
            class PhoneNumber { class Builder {} }
            class Builder {}
        }
        class AddressBook { class Builder {} }
    }

The nested classes Person, PhoneNumber, and AddressBook correspond to the messages defined in the .proto file.

2.2 Serialization and Analysis

Reading the generated code shows that the member variables of the three classes above are private. Only getter methods are provided; there are no setters for assigning values to the fields.
PB takes advantage of the fact that a nested class can access the private members of its outer class. Any assignment to the outer class must go through its nested Builder class. The Builder holds a reference (named result) to an instance of the outer class. When assignment is complete and the Builder's build() method is called, this object is returned and result is set to null.
PB ensures data safety in this way: once a message is built, it cannot be modified.
For the PhoneNumber class, assign values to the member variables number and type as follows:

    PhoneNumber.Builder builder = PhoneNumber.newBuilder();
    // Call the setters to assign values. Each setter returns this, so calls can be chained.
    builder.setNumber("111").setType(1);
    // After assignment is complete, call the builder's build() method to get the PhoneNumber object.
    PhoneNumber phoneNumber = builder.build();

After building, you can call the writeTo method to write the data to an output stream.
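The builder mechanism described in this section can be sketched in plain Java. This is a hand-written illustration, not the code that protoc generates: the class name PhoneNumberSketch is hypothetical, and only the idea (private fields, a nested Builder holding a result reference, build() setting it to null) is taken from the description above.

```java
// Hand-written sketch of the builder idea; NOT the code protoc generates.
final class PhoneNumberSketch {
    private String number; // private: the message itself exposes no setter
    private int type;

    private PhoneNumberSketch() {}

    public String getNumber() { return number; }
    public int getType() { return type; }

    public static Builder newBuilder() { return new Builder(); }

    static final class Builder {
        // Reference to the outer-class instance under construction.
        private PhoneNumberSketch result = new PhoneNumberSketch();

        public Builder setNumber(String value) {
            result.number = value; // a nested class may touch private fields
            return this;           // returning this allows chaining
        }

        public Builder setType(int value) {
            result.type = value;
            return this;
        }

        public PhoneNumberSketch build() {
            PhoneNumberSketch built = result;
            result = null; // this builder can no longer reach the built object
            return built;
        }
    }

    public static void main(String[] args) {
        PhoneNumberSketch phone = newBuilder().setNumber("111").setType(1).build();
        System.out.println(phone.getNumber() + " " + phone.getType()); // 111 1
    }
}
```

In this sketch, calling a setter on the same builder after build() fails with a NullPointerException, which is one simple way to guarantee that a built object is never modified.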

2.3 Deserialization and Analysis

One line of code can complete deserialization:

    AddressBook list = AddressBook.parseFrom(input); // input: an InputStream or a byte buffer

Behind this one call, PB does several things:

  1. Construct a CodedInputStream from the InputStream or buffer;
  2. Use the mergeFrom method in the generated code to parse the binary data:
    call readTag on the CodedInputStream to obtain the key (an int), then assign the value to the matching field in a switch block (PB encodes this number as a base 128 varint, which is introduced later);
  3. After the data is parsed, call the build() method to return the constructed object.
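
The three steps above can be sketched as a small hand-written parse loop. This is only an illustration under stated assumptions: ParseLoopSketch and its helper readVarint are hypothetical names, the message has just two fields (name = 1 as a string, id = 2 as an int32), and the real work is done by CodedInputStream and the generated mergeFrom method.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;

// Hand-written sketch of a key-driven parse loop; not the generated code.
final class ParseLoopSketch {
    String name = ""; // field 1, length-delimited
    int id;           // field 2, varint

    // Reads one base 128 varint: 7 value bits per byte, low-order group first.
    static long readVarint(ByteArrayInputStream in) {
        long value = 0;
        for (int shift = 0; shift < 64; shift += 7) {
            int b = in.read();
            if (b < 0) throw new IllegalStateException("truncated varint");
            value |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) return value; // high bit clear: last byte
        }
        throw new IllegalStateException("varint too long");
    }

    static ParseLoopSketch parseFrom(byte[] data) {
        ByteArrayInputStream in = new ByteArrayInputStream(data); // step 1: wrap the input
        ParseLoopSketch msg = new ParseLoopSketch();
        while (in.available() > 0) {                              // step 2: read key, then value
            int key = (int) readVarint(in);
            switch (key >>> 3) {                                  // upper bits: field number
                case 1: {                                         // name: length, then UTF-8 bytes
                    int length = (int) readVarint(in);
                    byte[] raw = new byte[length];
                    if (length > 0 && in.read(raw, 0, length) != length) {
                        throw new IllegalStateException("truncated string");
                    }
                    msg.name = new String(raw, StandardCharsets.UTF_8);
                    break;
                }
                case 2:                                           // id: a single varint
                    msg.id = (int) readVarint(in);
                    break;
                default:
                    // Simplification: this sketch rejects fields it does not know.
                    throw new IllegalStateException("unknown field " + (key >>> 3));
            }
        }
        return msg;                                               // step 3: return the object
    }

    public static void main(String[] args) {
        // name = "Al" (key 0A, length 02, bytes 41 6C), id = 150 (key 10, value 96 01)
        byte[] wire = {0x0A, 0x02, 0x41, 0x6C, 0x10, (byte) 0x96, 0x01};
        ParseLoopSketch p = parseFrom(wire);
        System.out.println(p.name + " " + p.id); // Al 150
    }
}
```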

3 Message Encoding Features

PB parses quickly and produces small output largely because of the way it encodes serialized data.

3.1 Base 128 Varints

PB uses base 128 varints to encode integers in a variable number of bytes:

  1. A varint is a variable-length integer that may occupy several bytes. In each byte, the low 7 bits carry the value and the highest bit indicates whether another byte follows: 0 means no, 1 means yes;
  2. The least significant group comes first: lower-order 7-bit groups are stored in earlier bytes, higher-order groups in later bytes.

Example:
300 encoded as a varint: 1010 1100 0000 0010
Explanation:
300 in binary: 0001 0010 1100
Following the rules above, the low-order bits come first. The low 7 bits (010 1100) go into the first byte, which becomes 1010 1100 (the highest bit 1 indicates that more bytes follow); the remaining bits (10) go into the second byte, which is 0000 0010 (the highest bit 0 indicates that no byte follows and the number ends here).
Therefore, the combined encoding is 1010 1100 0000 0010.
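
The same rules can be checked with a short Java sketch. The class VarintSketch and its methods are hypothetical helpers written for this article, not part of the protobuf library; they treat the input as an unsigned 64-bit value.

```java
import java.io.ByteArrayOutputStream;

// Minimal base 128 varint encoder and decoder (illustrative helper names).
final class VarintSketch {
    // Encodes value as a varint, treating the long as unsigned.
    static byte[] encode(long value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80)); // low 7 bits, continuation bit set
            value >>>= 7;
        }
        out.write((int) value);                       // last byte, continuation bit clear
        return out.toByteArray();
    }

    // Decodes the first varint found at the start of bytes.
    static long decode(byte[] bytes) {
        long value = 0;
        int shift = 0;
        for (byte b : bytes) {
            value |= (long) (b & 0x7F) << shift;      // low-order groups come first
            if ((b & 0x80) == 0) break;               // high bit clear: no more bytes
            shift += 7;
        }
        return value;
    }

    public static void main(String[] args) {
        byte[] encoded = encode(300);
        StringBuilder hex = new StringBuilder();
        for (byte b : encoded) hex.append(String.format("%02X ", b & 0xFF));
        System.out.println(hex.toString().trim() + " -> " + decode(encoded)); // AC 02 -> 300
    }
}
```

AC 02 in hexadecimal is 1010 1100 0000 0010 in binary, matching the worked example.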

3.2 Key-Value

As mentioned above, a PB message is a series of key-value pairs. In the binary data, a varint that combines the field number and the field's type information serves as the key; the data is then constructed and parsed by the code that the PB compiler generates.
PB encodes the key in the following structure:
x yyyy zzz
where the highest bit x indicates whether more bytes follow to encode the field number; yyyy encodes the field number, so field numbers of 16 or above need additional bytes, and frequently used fields should therefore be numbered 1 to 15; zzz indicates the type of the field. The following table lists the types PB supports:

Type    Meaning             Used for
0       Varint              int32, int64, uint32, uint64, sint32, sint64, bool, enum
1       64-bit              fixed64, sfixed64, double
2       Length-delimited    string, bytes, embedded messages, packed repeated fields
3       Start group         groups (deprecated)
4       End group           groups (deprecated)
5       32-bit              fixed32, sfixed32, float

Table 2: Types used in PB keys
Example:
required int32 a = 1; if the application assigns 150 to a, the serialized bytes are 08 96 01

  • 08 is the key, 0 0001 000 in binary. The highest bit is 0, indicating that the key occupies one byte; the middle four bits are the field number of a (1), and the last three bits are its type (0, varint);
  • 96 01 is the value, in binary: 1001 0110 0000 0001
    → 001 0110 000 0001 (remove the highest bit of each byte)
    → 22 + 1 × 2^7 = 150
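
The key layout and the 08 96 01 example can be reproduced with a few lines of Java. This is a sketch under the assumption, taken from the official encoding rules, that the key is (field number << 3) | type; FieldKeySketch and its methods are hypothetical names, not protobuf APIs.

```java
// Sketch of how a key is formed and why field numbers 1 to 15 fit in one byte.
final class FieldKeySketch {
    static final int TYPE_VARINT = 0; // type 0 in the table above

    // key = (field number << 3) | type
    static int makeKey(int fieldNumber, int type) {
        return (fieldNumber << 3) | type;
    }

    static int fieldNumber(int key) { return key >>> 3; }
    static int type(int key) { return key & 0x7; }

    // Renders an unsigned value as the hex bytes of its varint encoding.
    static String varintHex(long value) {
        StringBuilder sb = new StringBuilder();
        while ((value & ~0x7FL) != 0) {
            sb.append(String.format("%02X ", (value & 0x7F) | 0x80));
            value >>>= 7;
        }
        return sb.append(String.format("%02X ", value)).toString();
    }

    public static void main(String[] args) {
        // required int32 a = 1; with a = 150
        String field = varintHex(makeKey(1, TYPE_VARINT)) + varintHex(150);
        System.out.println(field.trim());                                  // 08 96 01
        System.out.println(varintHex(makeKey(15, TYPE_VARINT)).trim());    // 78
        System.out.println(varintHex(makeKey(16, TYPE_VARINT)).trim());    // 80 01
    }
}
```

The last two lines show the boundary: the key for field number 15 still fits in one byte, while field number 16 needs two.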

3.3 ZigZag

Varints encode signed integers inefficiently: the highest bit of a negative number is 1, so a negative number is encoded as if it were a very large number (ten bytes for a negative int32 or int64).

To solve this problem, Protocol Buffers defines the sint32 and sint64 types. They use ZigZag encoding, which maps signed integers to unsigned integers by alternating between negative and positive values, so that numbers with a small absolute value get a small encoding. The table below makes this clear:

Original (signed)    Encoded
0                    0
-1                   1
1                    2
-2                   3
2147483647           4294967294
-2147483648          4294967295

Table 3: ZigZag encoding rules

This method effectively saves storage space and improves parsing efficiency.
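
The mapping can be written as two one-line bit operations. The sketch below covers the 32-bit case only; ZigZagSketch, encode32, and decode32 are hypothetical names chosen for this example.

```java
// Sketch of 32-bit ZigZag encoding and decoding (illustrative helper names).
final class ZigZagSketch {
    // The arithmetic shift n >> 31 is all ones for negative n and zero otherwise.
    static int encode32(int n) {
        return (n << 1) ^ (n >> 31);
    }

    // Reverses the mapping: odd encodings are negative, even ones non-negative.
    static int decode32(int n) {
        return (n >>> 1) ^ -(n & 1);
    }

    public static void main(String[] args) {
        int[] samples = {0, -1, 1, -2, Integer.MAX_VALUE, Integer.MIN_VALUE};
        for (int n : samples) {
            // The encoded value is unsigned, so print it as such.
            System.out.println(n + " -> " + Integer.toUnsignedString(encode32(n)));
        }
    }
}
```

The output maps 0, -1, 1, and -2 to 0, 1, 2, and 3, and maps 2147483647 and -2147483648 to 4294967294 and 4294967295.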

With the above in mind, the encoding of the other data types is easy to understand. For details, refer to the official documentation.

4 Others

According to the official documentation, PB lets you define RPC service interfaces but does not provide a concrete implementation. Add the following definition to the .proto file:

    service XXX {
        rpc MMM(request) returns (response);
    }

PB will generate an abstract class XXX that represents this service. By implementing the abstract MMM method in this class and providing an RpcChannel implementation, you can use Protocol Buffers to implement your own RPC.

For third-party RPC implementations, refer to thirdpartyrpc.

I wrote a small example using one third-party implementation, protobuf-socket-rpc. If you are interested, take a look at the Protocol Buffers RPC example.

5 Summary

PB is cross-platform, fast to parse, compact when serialized, highly extensible, and simple to use. However, compared with XML, PB data is not human-readable; in addition, the generated code is not a plain POJO and is somewhat invasive to your code. If these shortcomings are not a serious concern in your project, PB is worth trying.
