Application and Analysis of Protocol Buffers
1 Introduction to Protocol Buffers
Protocol Buffers (PB) is a mechanism for serializing structured data. It is flexible, efficient, and automated: similar to XML, but smaller, faster, and simpler. At Google, almost all internal RPC protocols and file formats use PB.
PB has the following features:
- Platform-independent and language-independent
- High performance: about 20 times faster than XML
- Small size: 3 to 10 times smaller than XML
- Easy to use
- Good compatibility
As a small experiment, I converted a custom text data set of 29230 KB into both PB and XML:
| | PB | XML |
|---|---|---|
| Converted size | 21011 KB | 43202 KB |
| Parsing time (100 cycles) | 18610 ms | 169251 ms |
| Lines of code written for parsing | 1 line | 50 lines |

Note: the gap is smaller than the official figures, probably because the fields in my test data are relatively long. Results depend on the application scenario.
Table 1: Experimental comparison of PB and XML
It can be seen that PB, as a lightweight data protocol, has clear advantages in both time and space.
2 Simple Application of Protocol Buffers
2.1 Creation Process
2.1.1 Define a .proto File
Create a new file named addressbook.proto with the following content:
    package tutorial; // Namespace
    option java_package = "com.example.tutorial"; // Java package of the generated code
    option java_outer_classname = "AddressBookProtos"; // Outer class name of the generated code

    message Person { // Structured data to be described
      required string name = 1; // required: this field must be set
      required int32 id = 2; // The number after the equals sign is the field number
      optional string email = 3; // optional: this field may be left unset

      message PhoneNumber { // Nested message
        required string number = 1;
        optional int32 type = 2;
      }

      repeated PhoneNumber phone = 4;
    }

    message AddressBook {
      repeated Person person = 1; // repeated: a collection of zero or more elements
    }
Some explanations of the above content:
- For the field types supported by PB, see the official PB documentation.
- Modifier required: use this modifier with caution. Misuse may cause compatibility problems later, because a required field cannot safely be removed or changed to optional once it is in use;
- Modifier optional: the field may be left unset. Frequently used fields should be given field numbers 1 to 15, whose keys encode in a single byte, to save space;
- PB serializes structured data as key-value pairs. The key is a single number that combines the field number (the number after the equals sign) with the field's wire type, and it is encoded as a varint.
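To make the key encoding concrete, here is a minimal sketch written from the published wire-format rules, not taken from the PB library. The class name KeySketch and its methods are hypothetical. It assumes the key is (field number << 3) | wire type and that varints carry 7 bits per byte with the high bit as a continuation flag.

```java
import java.io.ByteArrayOutputStream;

final class KeySketch {
    // Wire types as defined by the protobuf encoding.
    static final int WIRETYPE_VARINT = 0;           // int32, int64, bool, enum, ...
    static final int WIRETYPE_LENGTH_DELIMITED = 2; // string, bytes, nested messages

    // Combine the field number and the wire type into one key.
    static int makeKey(int fieldNumber, int wireType) {
        return (fieldNumber << 3) | wireType;
    }

    // Encode an int as a Base 128 varint: 7 bits per byte, low-order group first,
    // with the high bit set on every byte except the last.
    static byte[] encodeVarint(int value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // Key for "required string name = 1": field 1, length-delimited.
        System.out.println(makeKey(1, WIRETYPE_LENGTH_DELIMITED));
        // Number of key bytes for field 15 and for field 16.
        System.out.println(encodeVarint(makeKey(15, WIRETYPE_VARINT)).length);
        System.out.println(encodeVarint(makeKey(16, WIRETYPE_VARINT)).length);
    }
}
```

Because three bits of the key are taken by the wire type, only four bits of the first byte remain for the field number, which is why field numbers above 15 spill into a second byte.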
2.1.2 Use the PB Compiler
Run: protoc -I=$SRC_DIR --java_out=$DST_DIR $SRC_DIR/addressbook.proto
Here -I specifies the directory where the .proto file is located,
and --java_out specifies the directory where the generated Java files are written.
2.1.3 Use PB APIs to Write and Read Messages
After the preceding steps, an AddressBookProtos.java file is generated under the specified $DST_DIR directory. Once the protobuf-java dependency is added in Maven, data can be serialized and deserialized using this class.
The generated code structure is as follows:
    class AddressBookProtos {
        class Person {
            class PhoneNumber {
                class Builder {}
            }
            class Builder {}
        }
        class AddressBook {
            class Builder {}
        }
    }
We can see that the nested classes Person, PhoneNumber, and AddressBook correspond to the defined messages.
2.2 Serialization and Analysis
Reading the generated code, we can see that the member variables of the three classes above are private and that only getter methods are provided. There are no setter methods for assigning values to them.
PB relies on the fact that a nested class can access the private members of its enclosing class. Any assignment to a message class must go through its nested Builder class. The Builder holds a reference (named result) to an instance of the enclosing class. When the assignments are complete and the Builder's build() method is called, that object is returned and result is set to null.
PB ensures data safety in this way: once a message is built, it cannot be modified.
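The pattern described above can be sketched in plain Java without the PB library. The class PhoneSketch below is hypothetical and much simpler than the code protoc generates. It only illustrates the idea under the assumptions stated in the text: private fields, getters only, and a nested Builder whose result reference is cleared by build().

```java
final class PhoneSketch {
    private String number = "";
    private int type;

    private PhoneSketch() {} // instances are created only through the Builder

    public String getNumber() { return number; }
    public int getType() { return type; }

    public static Builder newBuilder() { return new Builder(); }

    public static final class Builder {
        // The message under construction; set to null once build() has run.
        private PhoneSketch result = new PhoneSketch();

        public Builder setNumber(String value) {
            requireNotBuilt();
            result.number = value; // a nested class may write the private field
            return this;           // returning this allows chained calls
        }

        public Builder setType(int value) {
            requireNotBuilt();
            result.type = value;
            return this;
        }

        public PhoneSketch build() {
            requireNotBuilt();
            PhoneSketch built = result;
            result = null; // the Builder can no longer reach the built object
            return built;
        }

        private void requireNotBuilt() {
            if (result == null) {
                throw new IllegalStateException("build() has already been called");
            }
        }
    }

    public static void main(String[] args) {
        PhoneSketch phone = PhoneSketch.newBuilder().setNumber("111").setType(1).build();
        System.out.println(phone.getNumber() + " " + phone.getType());
    }
}
```

Since the built object exposes no setters and the Builder drops its reference, nothing outside the class can change the object after build() returns.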
For the PhoneNumber class, assign values to the member variables number and type as follows:
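    // PhoneNumber is nested inside Person, so it is referenced as Person.PhoneNumber
    // (requires: import com.example.tutorial.AddressBookProtos.Person;).
    Person.PhoneNumber.Builder builder = Person.PhoneNumber.newBuilder();
    // Call the setters to assign values. Each setter returns this, so calls can be chained.
    builder.setNumber("111").setType(1);
    // Once the assignments are complete, call the Builder's build() method to obtain the PhoneNumber object.
    Person.PhoneNumber phoneNumber = builder.build();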
After building, you can call the writeTo method to write the message to an output stream.
2.3 Deserialization and Analysis
One line of code can complete deserialization:
    AddressBook list = AddressBook.parseFrom(input); // input is an InputStream or a byte buffer
Behind this one line, PB does several things:
- It constructs a CodedInputStream from the InputStream or buffer;
- It then uses the mergeFrom method in the generated code to parse the binary data: mergeFrom calls CodedInputStream's readTag to obtain the key (an int), then dispatches on that key in a switch block to assign the corresponding field (PB encodes this number using Base 128 Varints, which will be introduced later);
- After the data is parsed, the build() method is called to return the constructed object.
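The read-tag-then-switch loop described in these steps can be sketched as follows. This is a hand-written illustration, not the generated mergeFrom or the real CodedInputStream. The class ParseSketch is hypothetical, the input bytes are a hand-encoded PhoneNumber with number "111" and type 1, and unknown tags are rejected here for brevity.

```java
import java.nio.charset.StandardCharsets;

final class ParseSketch {
    private final byte[] buf;
    private int pos = 0;

    String number = "";
    int type = 0;

    ParseSketch(byte[] buf) { this.buf = buf; }

    // Read one Base 128 varint (this sketch handles values up to 32 bits).
    int readVarint() {
        int result = 0;
        for (int shift = 0; shift < 35; shift += 7) {
            byte b = buf[pos++];
            result |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return result;
            }
        }
        throw new IllegalStateException("malformed varint");
    }

    // Return the next key, or 0 when the input is exhausted.
    int readTag() {
        return pos < buf.length ? readVarint() : 0;
    }

    // Loop over the key-value pairs and dispatch on the key.
    void mergeFrom() {
        while (true) {
            int tag = readTag();
            switch (tag) {
                case 0: // end of input
                    return;
                case 10: { // (1 << 3) | 2: field 1, length-delimited string
                    int len = readVarint();
                    number = new String(buf, pos, len, StandardCharsets.UTF_8);
                    pos += len;
                    break;
                }
                case 16: // (2 << 3) | 0: field 2, varint
                    type = readVarint();
                    break;
                default:
                    throw new IllegalStateException("unhandled tag " + tag);
            }
        }
    }

    public static void main(String[] args) {
        byte[] data = {0x0A, 0x03, 0x31, 0x31, 0x31, 0x10, 0x01};
        ParseSketch parser = new ParseSketch(data);
        parser.mergeFrom();
        System.out.println(parser.number + " " + parser.type);
    }
}
```

The case labels are the precomputed keys, so a single integer comparison identifies both the field and how its value is laid out on the wire.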