A Brief Introduction to Google Protocol Buffers
The following content is mainly based on the official documentation.
- Why use Protocol Buffers
- .proto File
- Compiling the .proto File
- Protocol Buffers API
  - Enumeration and Nested classes
  - Builders vs. Messages
  - Parsing and serialization
    - Writing A Message
    - Reading A Message
- Extending the Protocol
- Encoding
- Comparison with XML and JSON
  - Data size
  - Serialization Performance
  - Parsing Performance
Why use Protocol Buffers
What options are there for serializing and parsing structured data?
- Use Java's default serialization mechanism. This approach has obvious disadvantages: poor performance and poor cross-language support.
- Encode the data into a custom string format. This is simple and efficient, but only suitable for relatively simple data.
- Use XML serialization. This is a common practice with obvious advantages: XML is human-readable, highly extensible, and self-describing. However, its structure is verbose, and parsing it is comparatively expensive.
Protocol Buffers is a more flexible, efficient, and automated solution. It uses a .proto file to describe the desired data structure, from which it automatically generates a Java class that parses that structure. The generated class provides an efficient API for reading and writing data in binary format. Most importantly, Protocol Buffers offers strong extensibility and compatibility: as long as you follow a few rules, you can ensure forward and backward compatibility.
.proto File
syntax = "proto2";  // required fields and [default = ...] exist only in proto2

package tutorial;

option java_package = "com.example.tutorial";
option java_outer_classname = "AddressBookProtos";

message Person {
  required string name = 1;
  required int32 id = 2;
  optional string email = 3;

  enum PhoneType {
    MOBILE = 0;
    HOME = 1;
    WORK = 2;
  }

  message PhoneNumber {
    required string number = 1;
    optional PhoneType type = 2 [default = HOME];
  }

  repeated PhoneNumber phone = 4;
}

message AddressBook {
  repeated Person person = 1;
}
Protocol Buffers syntax
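The syntax of a .proto file is similar to Java: message corresponds to a class, and enum defines an enumeration type. The basic data types include bool, int32, float, double, and string. Each field's type is preceded by one of the following modifiers:
- required: a value must be provided for the field; otherwise the message is considered uninitialized.
- optional: the field may or may not be set; if it is not set, its default value is used.
- repeated: the field may be repeated any number of times (including zero), and the order of the values is preserved.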
NOTE 1: For historical reasons, repeated fields of scalar numeric types are not encoded as efficiently as they could be. Adding [packed = true] gives a more compact encoding, for example: repeated int32 samples = 4 [packed = true];
NOTE 2: The Protocol Buffers version covered here does not support maps (map fields were only added in protobuf 3.0). If you need one, use two repeated fields instead: one for the keys and one for the values.
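A minimal sketch of reading such a pair back into a map, assuming a hypothetical message with repeated string keys = 1; and repeated int32 values = 2;, whose generated getKeysList() and getValuesList() would return List objects. ParallelListMap is a hypothetical helper in plain Java with no protobuf dependency:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: rebuilds a map from two parallel repeated fields,
// e.g. the lists returned by generated getKeysList() / getValuesList().
final class ParallelListMap {
    private ParallelListMap() {}

    static <K, V> Map<K, V> zip(List<K> keys, List<V> values) {
        if (keys.size() != values.size()) {
            throw new IllegalArgumentException(
                "keys and values must have the same length: "
                    + keys.size() + " vs " + values.size());
        }
        // LinkedHashMap keeps the order in which the entries were serialized.
        Map<K, V> result = new LinkedHashMap<>();
        for (int i = 0; i < keys.size(); i++) {
            // A later duplicate key overwrites an earlier one.
            result.put(keys.get(i), values.get(i));
        }
        return result;
    }
}
```

The writer must append to both fields in lockstep; the length check catches messages where that did not happen.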
The numbers 1, 2, 3, ... are the field numbers (tag numbers). Note that a field's number must not be changed when the protocol is extended later. [default = HOME] specifies the field's default value. To avoid naming conflicts, it is best to declare a package in each .proto file; package works much like a Java package. Definitions from other .proto files can be used via import:
import "myproject/other_protos.proto";
Extension
Although PB syntax resembles Java, it has no inheritance mechanism. Instead it offers Extensions, which are quite different from the object-oriented, JavaBeans-style protocol design we are used to.
An extension range reserves a block of field numbers in a message we define so that third parties can add their own fields to it.
message Foo {
  required int32 a = 1;
  extensions 100 to 199;
}

message Bar {
  optional string name = 1;
  optional Foo foo = 2;
}

extend Foo {
  optional int32 bar = 102;
}
It can also be nested:
message Bar {
  extend Foo {
    optional int32 bar = 102;
  }
  optional string name = 1;
  optional Foo foo = 2;
}
Setting an extension field in Java (using the nested form):
BarProto.Bar.Builder bar = BarProto.Bar.newBuilder();
bar.setName("zjd");

FooProto.Foo.Builder foo = FooProto.Foo.newBuilder();
foo.setA(1);
// A nested extension is accessed through its enclosing message class (Bar).
foo.setExtension(BarProto.Bar.bar, 12);

bar.setFoo(foo.build());
System.out.println(bar.getFoo().getExtension(BarProto.Bar.bar));  // prints 12
Personally, I find extensions quite inconvenient to use.
For more details on PB syntax, see the official documentation. The syntax is relatively simple, yet with nesting it can describe very complex data structures, which covers basically all our needs.
Compiling the .proto File
The .proto file is compiled with protoc, Google's Protocol Buffers compiler; on Windows you can download a prebuilt protoc.exe. The basic usage is as follows:
protoc.exe -I=$SRC_DIR --java_out=$DST_DIR $SRC_DIR/addressbook.proto
The java_package and java_outer_classname options in the .proto file define the package name and class name of the generated Java class.
Protocol Buffers API
In the generated AddressBookProtos.java, each message in the .proto file becomes a nested class: AddressBook and Person. Each of these classes has its own nested Builder class, which is used to create instances. Messages have only read-only getter methods, while builders have both getters and setters.
Person
// required string name = 1;
public boolean hasName();
public String getName();

// required int32 id = 2;
public boolean hasId();
public int getId();

// optional string email = 3;
public boolean hasEmail();
public String getEmail();

// repeated .tutorial.Person.PhoneNumber phone = 4;
public List<PhoneNumber> getPhoneList();
public int getPhoneCount();
public PhoneNumber getPhone(int index);
Person.Builder
// required string name = 1;
public boolean hasName();
public java.lang.String getName();
public Builder setName(String value);
public Builder clearName();

// required int32 id = 2;
public boolean hasId();
public int getId();
public Builder setId(int value);
public Builder clearId();

// optional string email = 3;
public boolean hasEmail();
public String getEmail();
public Builder setEmail(String value);
public Builder clearEmail();

// repeated .tutorial.Person.PhoneNumber phone = 4;
public List<PhoneNumber> getPhoneList();
public int getPhoneCount();
public PhoneNumber getPhone(int index);
public Builder setPhone(int index, PhoneNumber value);
public Builder addPhone(PhoneNumber value);
public Builder addAllPhone(Iterable<PhoneNumber> value);
public Builder clearPhone();
In addition to JavaBeans-style getters and setters, some other accessor methods are generated:
- has_: every non-repeated field has this method, which tells whether the field has been set or its default value is in use.
- clear_: every field has a clear method, which resets the field to its empty state (the default value for singular fields, an empty list for repeated fields).
- _Count: returns the number of elements in a repeated field.
- addAll_: appends a collection of values to a repeated field.
- Repeated fields can also be set and read by index.
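To make these semantics concrete, here is a hand-written model of a few Person.Builder accessors. It is not protoc output: PersonBuilderModel is a hypothetical class, and the phone field is simplified to a list of strings. It shows has_/clear_ on an optional field without an explicit default, plus _Count, addAll_ and indexed access on a repeated field:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hand-written model of the generated accessor semantics. This is NOT
// protoc output; the names only mirror Person.Builder.
final class PersonBuilderModel {
    // optional string email = 3; no explicit default, so "" is used.
    private String email = "";
    private boolean hasEmail = false;
    // Simplified stand-in for: repeated PhoneNumber phone = 4;
    private final List<String> phone = new ArrayList<>();

    boolean hasEmail() { return hasEmail; }
    String getEmail() { return email; }  // returns "" when unset, never null

    PersonBuilderModel setEmail(String value) {
        if (value == null) throw new NullPointerException("email");
        email = value;
        hasEmail = true;
        return this;  // returning the builder enables call chaining
    }

    PersonBuilderModel clearEmail() {
        email = "";  // back to the default value
        hasEmail = false;
        return this;
    }

    int getPhoneCount() { return phone.size(); }
    String getPhone(int index) { return phone.get(index); }
    List<String> getPhoneList() { return Collections.unmodifiableList(phone); }

    PersonBuilderModel addPhone(String value) { phone.add(value); return this; }
    PersonBuilderModel setPhone(int index, String value) { phone.set(index, value); return this; }

    PersonBuilderModel addAllPhone(Iterable<String> values) {
        for (String v : values) {
            phone.add(v);  // appends; existing elements are kept
        }
        return this;
    }

    PersonBuilderModel clearPhone() { phone.clear(); return this; }
}
```

Note that getEmail() on an unset field returns the default rather than null, which is why hasEmail() is the only way to tell "unset" apart from "set to the empty string".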
Enumeration and Nested classes
A message nested inside another message becomes a nested class, and an enum becomes a Java 5 enum type.
public static enum PhoneType {
  MOBILE(0, 0),
  HOME(1, 1),
  WORK(2, 2),
  ;
  ...
}
Builders vs. Messages
All message classes generated by protoc are immutable, just like Java Strings. To construct a message, you first create a builder, set the fields through the builder's setters, and then call build(); once built, a message object cannot be modified. Each setter returns the builder itself, so all fields can be set in a single chained expression:
Person john =
  Person.newBuilder()
    .setId(1234)
    .setName("John Doe")
    .setEmail("jdoe@example.com")
    .addPhone(
      Person.PhoneNumber.newBuilder()
        .setNumber("555-4321")
        .setType(Person.PhoneType.HOME))
    .build();
Each message and builder provides the following methods:
- isInitialized(): checks whether all required fields have been set.
- toString(): returns a human-readable representation of the message, which is particularly useful for debugging.
- mergeFrom(Message other): builder only. Merges the contents of other into this message: singular scalar fields that are set in other overwrite the current values, embedded message fields are merged recursively, and repeated fields are concatenated.
- clear(): builder only. Resets all fields to their empty state.
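The mergeFrom() rules can be modelled in plain Java. MergeModel below is a hypothetical, hand-written class (not generated code) with one singular string field and one repeated field; it covers only scalar fields, not the recursive merging of embedded messages:

```java
import java.util.ArrayList;
import java.util.List;

// Hand-written model of mergeFrom() semantics; not generated code.
final class MergeModel {
    String name = "";
    boolean hasName = false;
    final List<String> phones = new ArrayList<>();

    MergeModel setName(String value) { name = value; hasName = true; return this; }
    MergeModel addPhone(String value) { phones.add(value); return this; }

    MergeModel mergeFrom(MergeModel other) {
        // Singular scalar field: overwritten only if the other message has it set.
        if (other.hasName) {
            setName(other.name);
        }
        // Repeated field: the other message's elements are appended.
        phones.addAll(other.phones);
        return this;
    }
}
```

Merging a message whose name is unset keeps the current name but still appends its phones.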
Parsing and serialization
Each message has the following methods for reading and writing it in the protocol buffer binary format (for details of the format, see the official encoding documentation):
- byte[] toByteArray(): serializes the message and returns its raw bytes.
- static Person parseFrom(byte[] data): parses a message from the given byte array.
- void writeTo(OutputStream output): serializes the message and writes it to an OutputStream.
- static Person parseFrom(InputStream input): reads and parses a message from an InputStream.
Each generated protocol buffer class provides only basic operations on the binary data and is not well suited to rich object-oriented behavior. If you need more functionality, or cannot modify the .proto file, it is recommended to wrap the generated class in a layer of your own.
Writing A Message

import com.example.tutorial.AddressBookProtos.AddressBook;
import com.example.tutorial.AddressBookProtos.Person;
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.InputStreamReader;
import java.io.IOException;
import java.io.PrintStream;

class AddPerson {
  // This function fills in a Person message based on user input.
  static Person PromptForAddress(BufferedReader stdin, PrintStream stdout) throws IOException {
    Person.Builder person = Person.newBuilder();

    stdout.print("Enter person ID: ");
    person.setId(Integer.valueOf(stdin.readLine()));

    stdout.print("Enter name: ");
    person.setName(stdin.readLine());

    stdout.print("Enter email address (blank for none): ");
    String email = stdin.readLine();
    if (email.length() > 0) {
      person.setEmail(email);
    }

    while (true) {
      stdout.print("Enter a phone number (or leave blank to finish): ");
      String number = stdin.readLine();
      if (number.length() == 0) {
        break;
      }

      Person.PhoneNumber.Builder phoneNumber =
        Person.PhoneNumber.newBuilder().setNumber(number);

      stdout.print("Is this a mobile, home, or work phone? ");
      String type = stdin.readLine();
      if (type.equals("mobile")) {
        phoneNumber.setType(Person.PhoneType.MOBILE);
      } else if (type.equals("home")) {
        phoneNumber.setType(Person.PhoneType.HOME);
      } else if (type.equals("work")) {
        phoneNumber.setType(Person.PhoneType.WORK);
      } else {
        stdout.println("Unknown phone type. Using default.");
      }

      person.addPhone(phoneNumber);
    }

    return person.build();
  }

  // Main function: reads the entire address book from a file,
  // adds one person based on user input, then writes it back out to the same file.
  public static void main(String[] args) throws Exception {
    if (args.length != 1) {
      System.err.println("Usage: AddPerson ADDRESS_BOOK_FILE");
      System.exit(-1);
    }

    AddressBook.Builder addressBook = AddressBook.newBuilder();

    // Read the existing address book (try-with-resources closes the stream).
    try (FileInputStream input = new FileInputStream(args[0])) {
      addressBook.mergeFrom(input);
    } catch (FileNotFoundException e) {
      System.out.println(args[0] + ": File not found. Creating a new file.");
    }

    // Add an address.
    addressBook.addPerson(
      PromptForAddress(new BufferedReader(new InputStreamReader(System.in)), System.out));

    // Write the new address book back to disk.
    try (FileOutputStream output = new FileOutputStream(args[0])) {
      addressBook.build().writeTo(output);
    }
  }
}

Reading A Message

import com.example.tutorial.AddressBookProtos.AddressBook;
import com.example.tutorial.AddressBookProtos.Person;
import java.io.FileInputStream;

class ListPeople {
  // Iterates through all people in the AddressBook and prints info about them.
  static void Print(AddressBook addressBook) {
    for (Person person : addressBook.getPersonList()) {
      System.out.println("Person ID: " + person.getId());
      System.out.println("  Name: " + person.getName());
      if (person.hasEmail()) {
        System.out.println("  E-mail address: " + person.getEmail());
      }

      for (Person.PhoneNumber phoneNumber : person.getPhoneList()) {
        switch (phoneNumber.getType()) {
          case MOBILE:
            System.out.print("  Mobile phone #: ");
            break;
          case HOME:
            System.out.print("  Home phone #: ");
            break;
          case WORK:
            System.out.print("  Work phone #: ");
            break;
        }
        System.out.println(phoneNumber.getNumber());
      }
    }
  }

  // Main function: reads the entire address book from a file and prints all
  // the information inside.
  public static void main(String[] args) throws Exception {
    if (args.length != 1) {
      System.err.println("Usage: ListPeople ADDRESS_BOOK_FILE");
      System.exit(-1);
    }

    // Read the existing address book.
    try (FileInputStream input = new FileInputStream(args[0])) {
      AddressBook addressBook = AddressBook.parseFrom(input);
      Print(addressBook);
    }
  }
}

Extending the Protocol
In practice, .proto files often need to be extended. Extending a protocol requires attention to compatibility, but Protocol Buffers is designed to be extensible; you only need to follow a few rules:
- You must not change the tag number of any existing field.
- You must not add or delete any required field.
- You may delete optional and repeated fields.
- You may add new optional and repeated fields, but you must give them fresh tag numbers (numbers never used before in this message, not even by deleted fields).
Forward compatibility (old code reading new messages): old code simply ignores the new fields. To old code, an optional field that was deleted has its default value, and a deleted repeated field is empty.
Backward compatibility (new code reading old messages): new code can read old messages transparently, but keep in mind that new fields are not present in old messages. You therefore need to call the has_ method to check whether a field was set, or give the new field a sensible default in the .proto file with [default = value]. If an optional field has no explicit default in the .proto, the type-specific default is used: the empty string for strings, 0 for numeric types, false for booleans, and the first listed value for enums.
Note that repeated fields have no has_ method, so when a newly added repeated field is empty, new code cannot tell whether the message was written by new code that left it empty or by old code that did not know about the field.
It is recommended to declare all fields as optional, which leaves the most room for future extension.
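Forward compatibility works because every field on the wire carries its own wire type, so a reader can step over a field it has never heard of. The sketch below is a hand-written illustration, not the protobuf runtime: OldReader and knownFieldNumbers are hypothetical names, and it handles only wire types 0, 1, 2 and 5 (the deprecated group types 3 and 4 are rejected). It reports which of the field numbers it knows appear in a message and skips everything else:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical "old" reader: knows some field numbers, skips all others
// using only the wire type encoded in each field's key.
final class OldReader {
    private final byte[] data;
    private int pos = 0;

    OldReader(byte[] data) { this.data = data; }

    private long readVarint() {
        long result = 0;
        for (int shift = 0; shift < 64; shift += 7) {
            if (pos >= data.length) throw new IllegalArgumentException("truncated varint");
            int b = data[pos++] & 0xFF;
            result |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) return result;  // high bit clear: last byte
        }
        throw new IllegalArgumentException("varint longer than 10 bytes");
    }

    private void skip(long n) {
        if (n < 0 || n > data.length - pos) throw new IllegalArgumentException("truncated field");
        pos += (int) n;
    }

    // Walks the whole message; reports known field numbers in wire order.
    List<Integer> knownFieldNumbers(Set<Integer> known) {
        List<Integer> found = new ArrayList<>();
        while (pos < data.length) {
            long key = readVarint();
            int fieldNumber = (int) (key >>> 3);
            int wireType = (int) (key & 0x7);
            if (known.contains(fieldNumber)) found.add(fieldNumber);
            switch (wireType) {
                case 0: readVarint(); break;        // varint
                case 1: skip(8); break;             // fixed 64-bit
                case 2: skip(readVarint()); break;  // length-delimited
                case 5: skip(4); break;             // fixed 32-bit
                default: throw new IllegalArgumentException("unsupported wire type " + wireType);
            }
        }
        return found;
    }
}
```

For example, a reader that knows fields 1 to 3 can still walk a message that also carries a new varint field 4 and a new 32-bit field 5, reporting only fields 1 and 2.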
Encoding
If you read English comfortably, you can go straight to the official documentation, though I find this blog article clearer.
In general, the Protocol Buffers encoding has the advantage of being very compact and efficient: it takes little space and parses quickly, which makes it well suited to mobile devices. The disadvantage is that the encoded data carries no type information and is not self-describing (although some techniques can work around this), so parsing depends on the .proto file.
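Each field on the wire starts with a key that packs only the field number and a 3-bit wire type, (field_number << 3) | wire_type; no field name or declared type is written, which is why the .proto file is needed to interpret the data. A small sketch using the wire-type values from the official encoding documentation (0 = varint, 1 = 64-bit, 2 = length-delimited, 5 = 32-bit); the FieldKey class and its method names are hypothetical:

```java
// Hypothetical helper illustrating how a field key is built and split.
final class FieldKey {
    static final int VARINT = 0;
    static final int FIXED64 = 1;
    static final int LENGTH_DELIMITED = 2;
    static final int FIXED32 = 5;

    private FieldKey() {}

    // The key is itself written as a varint; for field numbers 1 to 15 it fits in one byte.
    static int make(int fieldNumber, int wireType) {
        return (fieldNumber << 3) | wireType;
    }

    static int fieldNumber(int key) { return key >>> 3; }

    static int wireType(int key) { return key & 0x7; }
}
```

For the Person message above, id = 2 (an int32, varint) gives key 0x10, name = 1 (a string, length-delimited) gives 0x0A, and phone = 4 (an embedded message, also length-delimited) gives 0x22.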
Google calls this PB encoding format the wire format.
The compactness of PB comes largely from its Varint (variable-length integer) encoding.
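Varint stores an integer in 7-bit groups, least-significant group first; the high bit of each byte signals whether another byte follows, so small numbers take a single byte. The following is a self-contained sketch; the Varint class and its encode/decode names are hypothetical, not the protobuf runtime API. It treats the value as an unsigned 64-bit number, so a negative int32/int64 takes ten bytes (the sint32/sint64 types use ZigZag encoding to avoid that):

```java
import java.io.ByteArrayOutputStream;

// Sketch of base-128 varint encoding as used by the protobuf wire format.
final class Varint {
    private Varint() {}

    // Treats 'value' as unsigned 64-bit, so negative values take 10 bytes.
    static byte[] encode(long value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7FL) != 0) {
            // Low 7 bits plus the continuation bit (0x80): more bytes follow.
            out.write((int) ((value & 0x7F) | 0x80));
            value >>>= 7;
        }
        out.write((int) value);  // last byte: continuation bit clear
        return out.toByteArray();
    }

    static long decode(byte[] bytes) {
        long result = 0;
        int shift = 0;
        for (byte b : bytes) {
            result |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) return result;  // final byte reached
            shift += 7;
            if (shift >= 64) break;  // more than 10 bytes is malformed
        }
        throw new IllegalArgumentException("truncated or overlong varint");
    }
}
```

For example, a field declared as optional int32 a = 1; and set to 150 is serialized as the three bytes 08 96 01: the key 0x08 (field 1, wire type 0) followed by the varint 96 01.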
(Image