Getting Started with Google Protocol Buffers

1. Preface

This introductory tutorial uses Java. It walks through how to:

  1. Create a .proto file that defines some PB messages
  2. Use the PB compiler
  3. Use the PB Java API to read and write data

This article is only an introductory guide. For more information, see the Protocol Buffer Language Guide, the Java API Reference, the Java Generated Code Guide, and the Encoding Reference.

2. Why Protocol Buffers?

We will use an "address book" application as the running example. The application can write and read contact information. Each contact has a name, an ID, an email address, and a contact phone number. The information is stored in a file.

How can such structured data be serialized and read back? There are several options:

  1. Use Java serialization. This is the most direct approach because it is built into the language, but it has many well-known problems (Effective Java discusses them in detail), and it does not work well when the data must be shared with applications written in other languages, such as C++ or Python.
  2. Invent an ad hoc way to encode the data items into a single string, for example encoding four integers as "12:3:-23:67". This is simple and flexible, but it requires writing one-off encoding and parsing code, and parsing has a small run-time cost. It works best for very simple data.
  3. Serialize the data to XML. This is attractive because XML is (somewhat) human-readable and has parsing libraries for many languages, so it is a good choice for sharing data with other applications or projects. However, XML is notoriously space-intensive, and encoding and decoding it can impose a large performance penalty. Navigating an XML DOM tree is also considerably more complicated than working with simple fields in a class.
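To make option 2 concrete, here is a minimal sketch of such a hand-rolled format for the four-integer example. `ColonCodec` is a hypothetical name used only for this illustration; the point is that every new data shape would need another pair of functions like these, plus its own error handling.

```java
import java.util.Arrays;

// Hypothetical hand-rolled codec for option 2: a list of ints joined by ':'.
// Every new data structure would need its own pair of methods like these.
final class ColonCodec {
  private ColonCodec() {}

  // Encodes e.g. {12, 3, -23, 67} as "12:3:-23:67".
  static String encode(int[] values) {
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < values.length; i++) {
      if (i > 0) {
        sb.append(':');
      }
      sb.append(values[i]);
    }
    return sb.toString();
  }

  // Parses the string back; throws NumberFormatException on malformed input.
  static int[] decode(String text) {
    if (text.isEmpty()) {
      return new int[0]; // "".split(":") would yield one empty element
    }
    String[] parts = text.split(":");
    int[] values = new int[parts.length];
    for (int i = 0; i < parts.length; i++) {
      values[i] = Integer.parseInt(parts[i]);
    }
    return values;
  }

  public static void main(String[] args) {
    String encoded = encode(new int[] {12, 3, -23, 67});
    System.out.println(encoded);                          // 12:3:-23:67
    System.out.println(Arrays.toString(decode(encoded))); // [12, 3, -23, 67]
  }
}
```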

Protocol buffers are a flexible, efficient, automated solution to this problem. You only need to:

  1. Write a .proto file that describes the data structure you want to store.
  2. Use the PB compiler to generate classes that automatically encode and parse PB data in an efficient binary format.

The generated classes provide getters and setters for the fields of a PB message and take care of the details of reading and writing the message as a unit. Importantly, the PB format supports extending the definition over time: code built against the extended definition can still correctly read data encoded in the old format.

3. Define the Protocol Format

First, create a .proto file. The definitions are simple: for each data structure you want to serialize, add a PB message, then give each field in the message a name and a type. The address book's .proto file, addressbook.proto, is defined as follows:

```protobuf
package tutorial;

option java_package = "com.example.tutorial";
option java_outer_classname = "AddressBookProtos";

message Person {
  required string name = 1;
  required int32 id = 2;
  optional string email = 3;

  enum PhoneType {
    MOBILE = 0;
    HOME = 1;
    WORK = 2;
  }

  message PhoneNumber {
    required string number = 1;
    optional PhoneType type = 2 [default = HOME];
  }

  repeated PhoneNumber phone = 4;
}

message AddressBook {
  repeated Person person = 1;
}
```

As you can see, the syntax is similar to C++ or Java. Let's go through each part of the file and see what it does:

  • The .proto file starts with a package declaration, which helps avoid naming conflicts between different projects. In Java, this package name is used as the Java package of the generated classes unless java_package is specified, as it is here, so the generated classes end up in the com.example.tutorial package. Even when java_package is specified, you should still declare a normal package, to avoid name collisions in the Protocol Buffers namespace and in non-Java languages.
  • After the package declaration come two Java-specific options: java_package and java_outer_classname. java_package was covered above. java_outer_classname defines the name of the outer class that will contain all of the classes generated from this .proto file. If it is not given explicitly, the name is derived by converting the .proto file name to camel case: for example, "address_book.proto" produces "AddressBook", while "addressbook.proto" produces "Addressbook".
  • Next come the message definitions. A message is an aggregate containing a set of typed fields. Many standard simple data types are available as field types, including bool, int32, float, double, and string. You can also use other message types as field types: in the example, the Person message contains PhoneNumber messages, and the AddressBook message contains Person messages. Message types can be nested: PhoneNumber is defined inside Person and is used as the type of the phone field. If a field should take one of a predefined list of values, you can also define an enum type, such as PhoneType.
  • Each field in a message carries a marker such as "= 1" or "= 2". This is not an initial value: it is the field's unique tag number within the message, used to identify the field in the binary encoding. Tag numbers 1 to 15 take one byte less to encode than higher numbers, so as an optimization you can use them for commonly used or repeated elements and reserve tags 16 and higher for less commonly used optional elements. Each element of a repeated field has its tag number encoded again, so repeated fields benefit most from this optimization.
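The one-byte saving comes from the wire format described in the encoding reference: each field is preceded by a key, (tag_number << 3) | wire_type, written as a base-128 varint that stores 7 bits per byte. For tags 1 to 15 the key fits in 7 bits, so it takes one byte; from tag 16 on it needs two. The stdlib-only sketch below computes the key size for a few tag numbers; the class and method names are hypothetical, and it is not the library's own encoder.

```java
import java.io.ByteArrayOutputStream;

// Sketch of how a field's tag number is written in the binary encoding:
// key = (fieldNumber << 3) | wireType, stored as a base-128 varint.
final class TagSize {
  private TagSize() {}

  // Two of the wire types listed in the encoding reference.
  static final int WIRETYPE_VARINT = 0;
  static final int WIRETYPE_LENGTH_DELIMITED = 2;

  // Writes an unsigned varint: 7 payload bits per byte, high bit = "more bytes follow".
  static byte[] encodeVarint(long value) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    while ((value & ~0x7FL) != 0) {
      out.write((int) ((value & 0x7F) | 0x80));
      value >>>= 7;
    }
    out.write((int) value);
    return out.toByteArray();
  }

  static byte[] encodeKey(int fieldNumber, int wireType) {
    return encodeVarint(((long) fieldNumber << 3) | wireType);
  }

  public static void main(String[] args) {
    for (int field : new int[] {1, 15, 16, 2047, 2048}) {
      int size = encodeKey(field, WIRETYPE_LENGTH_DELIMITED).length;
      System.out.println("field " + field + " -> key takes " + size + " byte(s)");
    }
  }
}
```

Running main reports 1 byte for tags 1 and 15, 2 bytes for 16 and 2047, and 3 bytes for 2048.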

Each field must be annotated with one of the following three modifiers:

  1. required: a value for the field must be provided; otherwise the message is considered "uninitialized". Building an uninitialized message throws a RuntimeException (an UninitializedMessageException), and parsing an uninitialized message throws an IOException (an InvalidProtocolBufferException). Apart from this, a required field behaves exactly like an optional field.
  2. optional: the field may or may not be set. If an optional field is not set, a default value is used. For simple types you can specify your own default value, as was done for the type field of PhoneNumber in the example. Otherwise, a system default is used: zero for numeric types, the empty string for strings, and false for bools. For embedded messages, the default value is always the "default instance" or "prototype" of the message, which has none of its fields set. Calling the getter of an optional field that has not been explicitly set returns that field's default value.
  3. repeated: the field may be repeated any number of times (including zero). The order of the repeated values is preserved in the protocol buffer. Think of a repeated field as a dynamically sized array.

Note: Be careful about marking fields as required. If at some point you want to stop writing or sending a required field, changing it to optional is problematic: old readers will consider messages without this field to be incomplete and may reject or drop them. (That is what the Google documentation says. In my own test, I changed a field from required to optional and read the data with the old parsing code that still treated it as required. If the field had been set, no error occurred; if it had not been set, parsing failed with: Exception in thread "main" com.google.protobuf.InvalidProtocolBufferException: Message missing required fields: fieldname.) Consider writing application-specific validation routines instead. Some engineers at Google have concluded that using required does more harm than good and prefer to use only optional and repeated fields. However, this view is not universal.

To learn more about writing .proto files, see the Protocol Buffer Language Guide. Don't go looking for anything similar to class inheritance, though: protocol buffers don't support it.

4. Compile the Protocol Buffers

Once the .proto file is defined, run the PB compiler, protoc, on it to generate the classes you need to read and write AddressBook (and hence Person and PhoneNumber) messages. To do this:

  1. If you have not installed the compiler, download the package and install it.
  2. Once it is installed, run protoc as shown below. It generates AddressBookProtos.java in the com/example/tutorial subdirectory (package com.example.tutorial) of the destination directory:
```shell
protoc -I=$SRC_DIR --java_out=$DST_DIR $SRC_DIR/addressbook.proto
# for example:
protoc -I=G:\workspace\protobuf\messages --java_out=G:\workspace\protobuf\src\main\java G:\workspace\protobuf\messages\addressbook.proto
```
  • -I: specifies the directory in which protoc looks for .proto files and their imports; if the option is omitted, the current directory is used. (Translator's note: I first translated this literally without understanding it, so I tested it. The value cannot be left empty; to use the current directory, pass "." explicitly. Otherwise any directory works, as long as the .proto file being compiled is located under it.)
  • --java_out: specifies the destination directory, i.e. where the generated code is written. Since we want Java classes, the option is --java_out; other languages have corresponding options, such as --cpp_out and --python_out.
  • The last argument is the .proto file to compile.

Note: After running the command, inspect the generated code. You may see errors such as "com.google cannot be resolved to a type". This means the protocol buffers runtime library is missing from your project. The protocol buffers source package contains the library's Java sources under java/src/main/java; copying those files into your project may solve the problem. I can only describe this roughly, because I was just learning at the time and ran into all kinds of annoying errors. A simpler approach is to create a Maven Java project and add the protocol buffers dependency to pom.xml, which solves all of these problems. Add the following dependency to pom.xml (make sure the version matches the version of protoc you used):

```xml
<dependency>
  <groupId>com.google.protobuf</groupId>
  <artifactId>protobuf-java</artifactId>
  <version>2.5.0</version>
</dependency>
```

5. The Protocol Buffer Java API

5.1 Generated Classes and Methods

Next, let's look at the classes and methods the PB compiler generated. First, there is a .java file defining a class called AddressBookProtos, the name we specified with java_outer_classname in addressbook.proto. This class contains a set of nested classes, one for each message defined in addressbook.proto, and each of these classes has its own Builder class that is used to create instances of it. Both the message classes and their builders have auto-generated accessor methods for each field of the message; the difference is that message classes have only getters, while builders have both getters and setters. In this example, the Person class has only getters:

[Figure: accessor methods of the generated Person class]

Person.Builder, however, has both getters and setters:

[Figure: accessor methods of Person.Builder]

As the two figures above show:

  1. Each field has JavaBean-style getters and setters.
  2. Each singular (non-repeated) field also has a has method, which returns true if the field has been set and false otherwise.
  3. Each field has a clear method that un-sets the field, returning it to its empty state.
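Since the figures are not reproduced here, the following hand-written imitation shows the four accessors for an optional string field such as email. It is not generated code, and `EmailFieldSketch` is a hypothetical name; it only illustrates the idea that presence (the has method) is tracked separately from the value, so an unset field still returns its default ("" for strings).

```java
// Hand-written imitation (not generated code) of the builder accessors for
// "optional string email = 3;".
final class EmailFieldSketch {
  private String email = "";        // value returned while unset: the string default ""
  private boolean hasEmail = false; // presence is tracked separately from the value

  boolean hasEmail() {
    return hasEmail;
  }

  String getEmail() {
    return email;
  }

  // Returns this so calls can be chained, like the generated builders.
  EmailFieldSketch setEmail(String value) {
    if (value == null) {
      throw new NullPointerException("email"); // this sketch rejects null values
    }
    email = value;
    hasEmail = true;
    return this;
  }

  // Un-sets the field: the getter falls back to the default again.
  EmailFieldSketch clearEmail() {
    email = "";
    hasEmail = false;
    return this;
  }

  public static void main(String[] args) {
    EmailFieldSketch b = new EmailFieldSketch();
    System.out.println(b.hasEmail() + " '" + b.getEmail() + "'");
    b.setEmail("john@example.com");
    System.out.println(b.hasEmail() + " '" + b.getEmail() + "'");
    b.clearEmail();
    System.out.println(b.hasEmail() + " '" + b.getEmail() + "'");
  }
}
```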

Repeated fields have some extra methods:

[Figure: accessor methods for the repeated phone field]

As the figure shows:

  1. The repeated field has an extra count getter, getPhoneCount(), which is shorthand for the size of the list.
  2. There are getters and setters that get or set a specific element by index.
  3. An add() method appends a new element to the list.
  4. An addAll() method appends all the elements of a container to the list.
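The repeated-field accessors can be imitated in the same way. This is again a hypothetical, hand-written sketch rather than generated code; it uses String elements in place of PhoneNumber messages so that it compiles on its own.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hand-written imitation (not generated code) of the builder accessors for a
// repeated field such as "repeated PhoneNumber phone = 4;". Strings stand in
// for PhoneNumber messages to keep the sketch self-contained.
final class RepeatedFieldSketch {
  private final List<String> phone = new ArrayList<>();

  List<String> getPhoneList() {
    return Collections.unmodifiableList(phone); // read-only view
  }

  int getPhoneCount() {
    return phone.size(); // shorthand for getPhoneList().size()
  }

  String getPhone(int index) {
    return phone.get(index);
  }

  RepeatedFieldSketch setPhone(int index, String value) {
    phone.set(index, value); // replaces an existing element; index must be < count
    return this;
  }

  RepeatedFieldSketch addPhone(String value) {
    phone.add(value); // appends, preserving insertion order
    return this;
  }

  RepeatedFieldSketch addAllPhone(Iterable<String> values) {
    for (String v : values) {
      phone.add(v);
    }
    return this;
  }

  RepeatedFieldSketch clearPhone() {
    phone.clear();
    return this;
  }

  public static void main(String[] args) {
    RepeatedFieldSketch b = new RepeatedFieldSketch()
        .addPhone("555-0100")
        .addAllPhone(Arrays.asList("555-0101", "555-0102"))
        .setPhone(0, "555-0199");
    System.out.println(b.getPhoneCount() + " " + b.getPhoneList());
    // 3 [555-0199, 555-0101, 555-0102]
  }
}
```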

Note that all of these accessor methods use camel-case names, even though the .proto file uses lowercase names. The PB compiler applies this transformation so that the generated methods and fields follow standard Java conventions, and it does the same for other languages according to their conventions. You should therefore always use lowercase words separated by underscores ("_") for field names in your .proto files.
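A rough sketch of the naming transformation follows. It is an approximation written for this article, not protoc's actual implementation, which handles further special cases; the class `CamelCase` and its methods are hypothetical. In this approximation, underscores are dropped and the next letter is capitalized, and a letter following a digit also starts a new word. The same rule gives the default outer class name when java_outer_classname is omitted.

```java
// Approximation (an assumption, not protoc's code) of the underscore-to-camel-case rule.
final class CamelCase {
  private CamelCase() {}

  static String toCamel(String input, boolean capitalizeFirst) {
    StringBuilder out = new StringBuilder();
    boolean capNext = capitalizeFirst;
    for (int i = 0; i < input.length(); i++) {
      char c = input.charAt(i);
      if (c >= 'a' && c <= 'z') {
        out.append(capNext ? Character.toUpperCase(c) : c);
        capNext = false;
      } else if (c >= 'A' && c <= 'Z') {
        // A leading capital is lowered when a lowerCamel name is requested.
        out.append(i == 0 && !capNext ? Character.toLowerCase(c) : c);
        capNext = false;
      } else if (c >= '0' && c <= '9') {
        out.append(c);
        capNext = true; // a letter after a digit starts a new word
      } else {
        capNext = true; // '_' (or any other separator) is dropped
      }
    }
    return out.toString();
  }

  // Field name -> Java getter name, e.g. "phone_number" -> "getPhoneNumber".
  static String getterFor(String fieldName) {
    return "get" + toCamel(fieldName, true);
  }

  // .proto file name -> default outer class name (directory prefixes not handled).
  static String defaultOuterClassName(String protoFileName) {
    String base = protoFileName.endsWith(".proto")
        ? protoFileName.substring(0, protoFileName.length() - ".proto".length())
        : protoFileName;
    return toCamel(base, true);
  }

  public static void main(String[] args) {
    System.out.println(getterFor("phone_number"));                   // getPhoneNumber
    System.out.println(defaultOuterClassName("address_book.proto")); // AddressBook
    System.out.println(defaultOuterClassName("addressbook.proto"));  // Addressbook
  }
}
```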

5.2 Enums and Nested Classes

The generated code also includes a PhoneType enum, nested inside the Person class:

```java
public enum PhoneType
    implements com.google.protobuf.ProtocolMessageEnum {
  /**
   * <code>MOBILE = 0;</code>
   */
  MOBILE(0, 0),
  /**
   * <code>HOME = 1;</code>
   */
  HOME(1, 1),
  /**
   * <code>WORK = 2;</code>
   */
  WORK(2, 2),
  ;
  ...
}
```

In addition, as you would expect, there is a Person.PhoneNumber class nested inside Person. You can look at it in the generated code; it is not reproduced here.
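The key idea of the generated enum is the two-way mapping between constants and the numbers declared in the .proto file. The hand-written enum below imitates that mapping; `PhoneTypeSketch` and `fromNumber` are hypothetical names. The generated constructor shown above takes two arguments, which in 2.x-era generated code are a declaration index and the number, and the generated class provides its own lookup (valueOf(int)); this sketch keeps only the number. Returning null for an unknown number mirrors the case where an older program meets an enum value added later.

```java
// Hand-written imitation (not generated code) of the number mapping that a
// generated enum such as Person.PhoneType carries.
enum PhoneTypeSketch {
  MOBILE(0),
  HOME(1),
  WORK(2);

  private final int number; // the value written on the wire, from "MOBILE = 0;" etc.

  PhoneTypeSketch(int number) {
    this.number = number;
  }

  int getNumber() {
    return number;
  }

  // Maps a wire value back to a constant; returns null for numbers this
  // version of the enum does not know about.
  static PhoneTypeSketch fromNumber(int number) {
    for (PhoneTypeSketch t : values()) {
      if (t.number == number) {
        return t;
      }
    }
    return null;
  }

  public static void main(String[] args) {
    System.out.println(WORK.getNumber()); // 2
    System.out.println(fromNumber(1));    // HOME
    System.out.println(fromNumber(7));    // null
  }
}
```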

5.3 Builders vs. Messages

The message classes generated by the PB compiler are all immutable. Once a message object is constructed, it cannot be modified, just like a Java String. To construct a message, you must first construct a builder, set any fields you want using the builder's setter or add() methods, and then call the builder's build() method.

You may notice that each builder method that modifies the message returns a builder. The returned object is actually the same builder on which you called the method; it is returned only for convenience, so that you can chain several setters in a single line of code.

Here is how to create a Person instance:

```java
Person john =
    Person.newBuilder()
        .setId(1)
        .setName("john")
        .setEmail("john@example.com")
        .addPhone(
            Person.PhoneNumber.newBuilder()
                .setNumber("1861xxxxxxx")
                .setType(Person.PhoneType.WORK)
                .build())
        .build();
```

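To see what "immutable message plus mutable builder" means without running protoc, here is a hand-written imitation for a cut-down Person. It is not generated code: `PersonSketch` is hypothetical, and where the real library throws UninitializedMessageException (a RuntimeException) on a missing required field, this sketch throws IllegalStateException. It also includes toBuilder(), a convenient way to derive a modified copy of an existing message (generated messages offer a method of that name as well).

```java
// Hand-written imitation (not generated code) of the message/builder split:
// the message is immutable, the builder is mutable and returns itself.
final class PersonSketch {
  private final String name;  // required string name = 1;
  private final int id;       // required int32 id = 2;
  private final String email; // optional string email = 3; null means "not set"

  private PersonSketch(Builder b) {
    this.name = b.name;
    this.id = b.id;
    this.email = b.email;
  }

  // Getters only: there is no way to change a built message.
  String getName() { return name; }
  int getId() { return id; }
  boolean hasEmail() { return email != null; }
  String getEmail() { return email == null ? "" : email; }

  static Builder newBuilder() { return new Builder(); }

  // Copies this message into a fresh builder; the original stays unchanged.
  Builder toBuilder() {
    Builder b = new Builder().setName(name).setId(id);
    if (email != null) {
      b.setEmail(email);
    }
    return b;
  }

  static final class Builder {
    private String name;
    private Integer id; // boxed so that "not set" differs from 0
    private String email;

    Builder setName(String v) { name = v; return this; }
    Builder setId(int v) { id = v; return this; }
    Builder setEmail(String v) { email = v; return this; }

    boolean isInitialized() {
      return name != null && id != null;
    }

    PersonSketch build() {
      if (!isInitialized()) {
        throw new IllegalStateException("Message missing required fields");
      }
      return new PersonSketch(this);
    }
  }

  public static void main(String[] args) {
    PersonSketch john = PersonSketch.newBuilder().setId(1).setName("john").build();
    PersonSketch withEmail = john.toBuilder().setEmail("john@example.com").build();
    System.out.println(john.hasEmail() + " " + withEmail.getEmail()); // false john@example.com
  }
}
```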
5.4 Standard Message Methods

Each message and builder class also contains a number of other methods for checking and manipulating the entire message, including:

  1. isInitialized(): checks whether all required fields have been set.
  2. toString(): returns a human-readable representation of the message (the binary encoding itself is not readable), which is particularly useful for debugging.
  3. mergeFrom(Message other): builder only; merges the contents of other into this message, overwriting singular scalar fields, merging embedded message fields, and concatenating repeated fields.
  4. clear(): builder only; clears all fields back to the empty state.

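The mergeFrom() rules are easier to see on a toy model. The sketch below is an assumption, not the library's implementation: `MergeSketch` and `Phone` are hypothetical classes in which null means "not set". It overwrites a singular scalar that is set in the other message, merges an embedded message field by field, and concatenates repeated fields, following the description above.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model (an assumption, not the library's code) of Builder.mergeFrom(other).
// In this sketch, null means "field not set".
final class MergeSketch {
  // Stands in for an embedded message such as PhoneNumber.
  static final class Phone {
    String number;
    String type;

    void mergeFrom(Phone other) {
      if (other.number != null) {
        number = other.number;
      }
      if (other.type != null) {
        type = other.type;
      }
    }
  }

  String name;                                   // singular scalar field
  Phone primary;                                 // singular embedded message
  final List<String> emails = new ArrayList<>(); // repeated field

  MergeSketch mergeFrom(MergeSketch other) {
    if (other.name != null) {
      name = other.name;                // scalar set in other: overwrite
    }
    if (other.primary != null) {
      if (primary == null) {
        primary = new Phone();
      }
      primary.mergeFrom(other.primary); // embedded message: merge field by field
    }
    emails.addAll(other.emails);        // repeated: append other's elements
    return this;
  }

  public static void main(String[] args) {
    MergeSketch a = new MergeSketch();
    a.name = "john";
    a.primary = new Phone();
    a.primary.number = "555-0100";
    a.emails.add("a@example.com");

    MergeSketch b = new MergeSketch();
    b.name = "john smith";
    b.primary = new Phone();
    b.primary.type = "WORK";
    b.emails.add("b@example.com");

    a.mergeFrom(b);
    System.out.println(a.name + " " + a.primary.number + " " + a.primary.type + " " + a.emails);
    // john smith 555-0100 WORK [a@example.com, b@example.com]
  }
}
```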
5.5 Parsing and Serialization

Finally, each PB message class has methods for writing and reading messages of its type in the binary format:

  1. byte[] toByteArray(): serializes the message and returns a byte array containing its raw bytes.
  2. static Person parseFrom(byte[] data): parses a message from the given byte array.
  3. void writeTo(OutputStream output): serializes the message and writes it to an OutputStream.
  4. static Person parseFrom(InputStream input): reads and parses a message from an InputStream.

These are just a few of the parsing and serialization methods. For a complete list, see the Message API reference.
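To give a feel for what toByteArray() and parseFrom() deal with, here is a hand-written, stdlib-only encoder and decoder for a cut-down Person containing only name = 1 and id = 2, following the key and varint rules from the encoding reference. It is a sketch for illustration only; `WireSketch` and its methods are hypothetical and no substitute for the generated classes.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hand-written sketch of the binary encoding of a cut-down Person message
// (string name = 1; int32 id = 2). Real code would call toByteArray() and
// parseFrom() on the generated class instead.
final class WireSketch {
  private WireSketch() {}

  static final class Decoded {
    String name = ""; // default when the field is absent
    int id = 0;       // default when the field is absent
  }

  static void writeVarint(ByteArrayOutputStream out, long value) {
    while ((value & ~0x7FL) != 0) {
      out.write((int) ((value & 0x7F) | 0x80)); // low 7 bits + continuation bit
      value >>>= 7;
    }
    out.write((int) value);
  }

  static long readVarint(byte[] data, int[] pos) {
    long result = 0;
    for (int shift = 0; shift < 64; shift += 7) {
      byte b = data[pos[0]++];
      result |= (long) (b & 0x7F) << shift;
      if ((b & 0x80) == 0) {
        return result;
      }
    }
    throw new IllegalArgumentException("malformed varint");
  }

  // Writes fields in tag order: name (wire type 2), then id (wire type 0).
  static byte[] encode(String name, int id) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] utf8 = name.getBytes(StandardCharsets.UTF_8);
    writeVarint(out, (1 << 3) | 2); // key for field 1, length-delimited
    writeVarint(out, utf8.length);  // length prefix
    out.write(utf8, 0, utf8.length);
    writeVarint(out, (2 << 3) | 0); // key for field 2, varint
    writeVarint(out, id);           // negative int32 values are sign-extended (10 bytes)
    return out.toByteArray();
  }

  // Parses only the two fields this sketch knows about.
  static Decoded decode(byte[] data) {
    Decoded d = new Decoded();
    int[] pos = {0};
    while (pos[0] < data.length) {
      int field = (int) (readVarint(data, pos) >>> 3);
      if (field == 1) {
        int len = (int) readVarint(data, pos);
        d.name = new String(data, pos[0], len, StandardCharsets.UTF_8);
        pos[0] += len;
      } else if (field == 2) {
        d.id = (int) readVarint(data, pos);
      } else {
        throw new IllegalArgumentException("unexpected field " + field);
      }
    }
    return d;
  }

  static String hex(byte[] bytes) {
    StringBuilder sb = new StringBuilder();
    for (byte b : bytes) {
      if (sb.length() > 0) {
        sb.append(' ');
      }
      sb.append(String.format("%02x", b & 0xFF));
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    byte[] bytes = encode("john", 1);
    System.out.println(hex(bytes)); // 0a 04 6a 6f 68 6e 10 01
    Decoded d = decode(bytes);
    System.out.println(d.name + " " + d.id); // john 1
  }
}
```

For name = "john" and id = 1 the sketch produces eight bytes: key 0x0a (field 1, wire type 2), length 4, the UTF-8 bytes of "john", then key 0x10 (field 2, wire type 0) and the value 1.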

6. Write Data with the Generated Classes

Now let's use the generated PB classes to create some contacts and write them to a file.

The following program reads an address book from a file, adds one new contact based on user input, and writes the updated address book back to the same file.

```java
package com.example.tutorial;

import com.example.tutorial.AddressBookProtos.AddressBook;
import com.example.tutorial.AddressBookProtos.Person;
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintStream;

class AddPerson {
  // This function fills in a Person message based on user input.
  static Person PromptForAddress(BufferedReader stdin, PrintStream stdout)
      throws IOException {
    Person.Builder person = Person.newBuilder();

    stdout.print("Enter person ID: ");
    person.setId(Integer.valueOf(stdin.readLine()));

    stdout.print("Enter name: ");
    person.setName(stdin.readLine());

    stdout.print("Enter email address (blank for none): ");
    String email = stdin.readLine();
    if (email.length() > 0) {
      person.setEmail(email);
    }

    while (true) {
      stdout.print("Enter a phone number (or leave blank to finish): ");
      String number = stdin.readLine();
      if (number.length() == 0) {
        break;
      }

      Person.PhoneNumber.Builder phoneNumber =
          Person.PhoneNumber.newBuilder().setNumber(number);

      stdout.print("Is this a mobile, home, or work phone? ");
      String type = stdin.readLine();
      if (type.equals("mobile")) {
        phoneNumber.setType(Person.PhoneType.MOBILE);
      } else if (type.equals("home")) {
        phoneNumber.setType(Person.PhoneType.HOME);
      } else if (type.equals("work")) {
        phoneNumber.setType(Person.PhoneType.WORK);
      } else {
        stdout.println("Unknown phone type.  Using default.");
      }

      person.addPhone(phoneNumber);
    }

    return person.build();
  }

  // Main function: Reads the entire address book from a file,
  // adds one person based on user input, then writes it back out to the same
  // file.
  public static void main(String[] args) throws Exception {
    if (args.length != 1) {
      System.err.println("Usage:  AddPerson ADDRESS_BOOK_FILE");
      System.exit(-1);
    }

    AddressBook.Builder addressBook = AddressBook.newBuilder();

    // Read the existing address book; the stream is closed automatically.
    try (FileInputStream input = new FileInputStream(args[0])) {
      addressBook.mergeFrom(input);
    } catch (FileNotFoundException e) {
      System.out.println(args[0] + ": File not found.  Creating a new file.");
    }

    // Add an address.
    addressBook.addPerson(
        PromptForAddress(new BufferedReader(new InputStreamReader(System.in)),
                         System.out));

    // Write the new address book back to disk.
    try (FileOutputStream output = new FileOutputStream(args[0])) {
      addressBook.build().writeTo(output);
    }
  }
}
```

7. Read Data with the Generated Classes

Run the program from section 6 a few times to write several contacts to the file. Next, let's read them back. Here is the program:

```java
package com.example.tutorial;

import com.example.tutorial.AddressBookProtos.AddressBook;
import com.example.tutorial.AddressBookProtos.Person;
import java.io.FileInputStream;

class ListPeople {
  // Iterates through all people in the AddressBook and prints info about them.
  static void Print(AddressBook addressBook) {
    for (Person person : addressBook.getPersonList()) {
      System.out.println("Person ID: " + person.getId());
      System.out.println("  Name: " + person.getName());
      if (person.hasEmail()) {
        System.out.println("  E-mail address: " + person.getEmail());
      }

      for (Person.PhoneNumber phoneNumber : person.getPhoneList()) {
        switch (phoneNumber.getType()) {
          case MOBILE:
            System.out.print("  Mobile phone #: ");
            break;
          case HOME:
            System.out.print("  Home phone #: ");
            break;
          case WORK:
            System.out.print("  Work phone #: ");
            break;
        }
        System.out.println(phoneNumber.getNumber());
      }
    }
  }

  // Main function:  Reads the entire address book from a file and prints all
  //   the information inside.
  public static void main(String[] args) throws Exception {
    if (args.length != 1) {
      System.err.println("Usage:  ListPeople ADDRESS_BOOK_FILE");
      System.exit(-1);
    }

    // Read the existing address book; the stream is closed automatically.
    try (FileInputStream input = new FileInputStream(args[0])) {
      AddressBook addressBook = AddressBook.parseFrom(input);
      Print(addressBook);
    }
  }
}
```

Now we can use the generated classes to both write and read PB messages.

8. Extend a PB Definition

Sooner or later after you release code that uses your PB definition, you will want to improve it. If you want new messages to be backward-compatible and old messages to be forward-compatible, the new version of the definition must follow these rules:

  1. Do not change the tag number of any existing field.
  2. Do not add or delete any required fields.
  3. You may delete optional or repeated fields.
  4. You may add new optional or repeated fields, but you must use fresh tag numbers (tag numbers that have never been used in this PB, not even by fields that have since been deleted).

There are some exceptions to these rules, but they are rarely used; see the Protocol Buffer Language Guide for details.
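The rules above are mechanical enough to check by hand. The sketch below is a hypothetical helper, not part of protoc or the Java library; `EvolutionCheck`, `Field`, and `Label` are made-up names. It matches fields by name, so a pure rename that keeps its tag would be reported as a reused tag, and the set of previously used tags has to be supplied by you, since an old .proto file alone does not record deleted fields. Following the note on required fields in section 3, it also flags a change of a field's required status.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical checker for the evolution rules, comparing two versions of one
// message's field list. Not a real protobuf tool.
final class EvolutionCheck {
  enum Label { REQUIRED, OPTIONAL, REPEATED }

  static final class Field {
    final String name;
    final int tag;
    final Label label;

    Field(String name, int tag, Label label) {
      this.name = name;
      this.tag = tag;
      this.label = label;
    }
  }

  // everUsedTags: every tag this message has ever used, including deleted fields.
  static List<String> check(List<Field> oldFields, List<Field> newFields, Set<Integer> everUsedTags) {
    Map<String, Field> newByName = new HashMap<>();
    for (Field f : newFields) {
      newByName.put(f.name, f);
    }
    Set<String> oldNames = new HashSet<>();
    List<String> problems = new ArrayList<>();
    for (Field old : oldFields) {
      oldNames.add(old.name);
      Field now = newByName.get(old.name);
      if (now == null) {
        if (old.label == Label.REQUIRED) {
          problems.add("required field deleted: " + old.name);  // rule 2
        }                                                        // rule 3: others may go
      } else if (now.tag != old.tag) {
        problems.add("tag changed: " + old.name);                // rule 1
      } else if ((old.label == Label.REQUIRED) != (now.label == Label.REQUIRED)) {
        problems.add("required status changed: " + old.name);
      }
    }
    for (Field f : newFields) {
      if (oldNames.contains(f.name)) {
        continue;
      }
      if (f.label == Label.REQUIRED) {
        problems.add("required field added: " + f.name);         // rule 2
      }
      if (everUsedTags.contains(f.tag)) {
        problems.add("tag reused: " + f.name + " = " + f.tag);   // rule 4
      }
    }
    return problems;
  }

  public static void main(String[] args) {
    List<Field> v1 = Arrays.asList(
        new Field("name", 1, Label.REQUIRED),
        new Field("id", 2, Label.REQUIRED),
        new Field("email", 3, Label.OPTIONAL),
        new Field("phone", 4, Label.REPEATED));
    List<Field> v2 = Arrays.asList(
        new Field("name", 1, Label.REQUIRED),
        new Field("id", 2, Label.REQUIRED),
        new Field("phone", 4, Label.REPEATED),
        new Field("nickname", 3, Label.OPTIONAL),  // reuses the deleted email's tag
        new Field("birthday", 5, Label.OPTIONAL)); // fine: fresh tag
    Set<Integer> used = new HashSet<>(Arrays.asList(1, 2, 3, 4));
    System.out.println(check(v1, v2, used)); // [tag reused: nickname = 3]
  }
}
```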

If you follow these rules, old code will happily read new messages and simply ignore any new fields. To the old code, optional fields that were deleted will have their default values, and deleted repeated fields will be empty.

New code will also transparently read old messages. However, keep in mind that new optional fields will not be present in old messages, so you need either to check explicitly whether they are set with the has method (hasXxx() in Java), or to provide a reasonable default value in the .proto file with [default = value] after the tag number. If no default is specified for an optional field, a type-specific default is used: the empty string for strings, false for bools, and zero for numeric types.
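Both directions work because every field on the wire carries its wire type, so a reader that meets an unknown tag number knows how many bytes to skip, and a field that never appears simply keeps its default. The stdlib-only sketch below shows this for an "old" reader that knows only id (field 2); the class name and the new field 5, `nickname`, are hypothetical. The real Java library performs this skipping for you, and in proto2 it also keeps the skipped bytes as unknown fields.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Sketch (not the library) of why old code can read new messages: an unknown
// field can be skipped because its wire type says how long it is.
final class SkipUnknownSketch {
  private SkipUnknownSketch() {}

  static void writeVarint(ByteArrayOutputStream out, long value) {
    while ((value & ~0x7FL) != 0) {
      out.write((int) ((value & 0x7F) | 0x80));
      value >>>= 7;
    }
    out.write((int) value);
  }

  static long readVarint(byte[] data, int[] pos) {
    long result = 0;
    for (int shift = 0; shift < 64; shift += 7) {
      byte b = data[pos[0]++];
      result |= (long) (b & 0x7F) << shift;
      if ((b & 0x80) == 0) {
        return result;
      }
    }
    throw new IllegalArgumentException("malformed varint");
  }

  // "New" writer: id (field 2) plus a field 5 that old readers have never seen.
  static byte[] writeNewVersion(int id, String nickname) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    writeVarint(out, (2 << 3) | 0); // field 2, varint: known to old readers
    writeVarint(out, id);
    byte[] utf8 = nickname.getBytes(StandardCharsets.UTF_8);
    writeVarint(out, (5 << 3) | 2); // field 5, length-delimited: new
    writeVarint(out, utf8.length);
    out.write(utf8, 0, utf8.length);
    return out.toByteArray();
  }

  // "Old" reader: knows only field 2 (id) and skips everything else.
  static int readIdOldVersion(byte[] data) {
    int id = 0; // default when the field is absent
    int[] pos = {0};
    while (pos[0] < data.length) {
      long key = readVarint(data, pos);
      int field = (int) (key >>> 3);
      int wireType = (int) (key & 0x7);
      if (field == 2 && wireType == 0) {
        id = (int) readVarint(data, pos);
      } else if (wireType == 0) {
        readVarint(data, pos);                   // skip an unknown varint
      } else if (wireType == 2) {
        pos[0] += (int) readVarint(data, pos);   // skip an unknown length-delimited value
      } else if (wireType == 1) {
        pos[0] += 8;                             // skip an unknown 64-bit value
      } else if (wireType == 5) {
        pos[0] += 4;                             // skip an unknown 32-bit value
      } else {
        throw new IllegalArgumentException("unsupported wire type " + wireType);
      }
      if (pos[0] > data.length) {
        throw new IllegalArgumentException("truncated message");
      }
    }
    return id;
  }

  public static void main(String[] args) {
    System.out.println(readIdOldVersion(writeNewVersion(42, "johnny"))); // 42
    System.out.println(readIdOldVersion(new byte[0]));                   // 0
  }
}
```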

9. Advanced Usage

Protocol buffers have uses that go well beyond simple accessors and serialization. To learn more, explore the Java API reference.

One key feature of the protocol message classes is reflection. You can iterate over the fields of a message and manipulate their values without writing code against any specific message type. A very useful application of reflection is converting protocol messages to and from other encodings, such as XML or JSON.
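As a rough illustration of reflection-driven conversion, the following sketch turns a generic field map into JSON without knowing the message type. It is a toy built on an assumption: in the real Java API the map would come from a message's getAllFields() method, keyed by FieldDescriptor objects; here plain String keys stand in for field descriptors, nested Maps for embedded messages, and Lists for repeated fields. `ToJsonSketch` is a hypothetical name.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy reflection-style converter: walks a generic field map (a stand-in for
// what getAllFields() returns) and emits JSON without knowing the message type.
final class ToJsonSketch {
  private ToJsonSketch() {}

  static String toJson(Object value) {
    StringBuilder sb = new StringBuilder();
    write(value, sb);
    return sb.toString();
  }

  private static void write(Object value, StringBuilder sb) {
    if (value instanceof Map) {                   // embedded message
      sb.append('{');
      boolean first = true;
      for (Map.Entry<?, ?> e : ((Map<?, ?>) value).entrySet()) {
        if (!first) {
          sb.append(',');
        }
        first = false;
        writeString(String.valueOf(e.getKey()), sb);
        sb.append(':');
        write(e.getValue(), sb);
      }
      sb.append('}');
    } else if (value instanceof List) {           // repeated field
      sb.append('[');
      boolean first = true;
      for (Object item : (List<?>) value) {
        if (!first) {
          sb.append(',');
        }
        first = false;
        write(item, sb);
      }
      sb.append(']');
    } else if (value instanceof Number || value instanceof Boolean) {
      sb.append(value);
    } else {                                      // strings and enum names
      writeString(String.valueOf(value), sb);
    }
  }

  // Quotes a string, escaping quotes, backslashes and control characters.
  private static void writeString(String s, StringBuilder sb) {
    sb.append('"');
    for (int i = 0; i < s.length(); i++) {
      char c = s.charAt(i);
      if (c == '"' || c == '\\') {
        sb.append('\\').append(c);
      } else if (c < 0x20) {
        sb.append(String.format("\\u%04x", (int) c));
      } else {
        sb.append(c);
      }
    }
    sb.append('"');
  }

  public static void main(String[] args) {
    Map<String, Object> phone = new LinkedHashMap<>();
    phone.put("number", "555-0100");
    phone.put("type", "WORK");
    Map<String, Object> person = new LinkedHashMap<>();
    person.put("name", "john");
    person.put("id", 1);
    person.put("phone", Arrays.asList(phone));
    System.out.println(toJson(person));
    // {"name":"john","id":1,"phone":[{"number":"555-0100","type":"WORK"}]}
  }
}
```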

A more advanced use of reflection might be to find the differences between two messages of the same type, or to develop a sort of "regular expressions for protocol messages" in which you can write expressions that match certain message contents.
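Along the same lines, a field-by-field diff can be written once for every message type. The sketch below rests on the same assumption as a reflection-based converter would: a plain Map<String, ?> stands in for the result of getAllFields(), which contains only the fields that are set. `DiffSketch` is a hypothetical name, and nested messages are compared with equals() rather than recursively.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

// Toy sketch of a reflection-based diff: compares two messages of the same
// type field by field, given their field maps (a stand-in for getAllFields()).
final class DiffSketch {
  private DiffSketch() {}

  static List<String> diff(Map<String, ?> a, Map<String, ?> b) {
    Set<String> names = new LinkedHashSet<>(a.keySet()); // a's fields first, then b's
    names.addAll(b.keySet());
    List<String> out = new ArrayList<>();
    for (String name : names) {
      boolean inA = a.containsKey(name);
      boolean inB = b.containsKey(name);
      if (inA && !inB) {
        out.add("- " + name + ": " + a.get(name));        // set only in a
      } else if (!inA && inB) {
        out.add("+ " + name + ": " + b.get(name));        // set only in b
      } else if (!Objects.equals(a.get(name), b.get(name))) {
        out.add("~ " + name + ": " + a.get(name) + " -> " + b.get(name)); // changed
      }
    }
    return out;
  }

  public static void main(String[] args) {
    Map<String, Object> before = new LinkedHashMap<>();
    before.put("name", "john");
    before.put("id", 1);
    before.put("email", "john@example.com");
    Map<String, Object> after = new LinkedHashMap<>();
    after.put("name", "john");
    after.put("id", 2);
    after.put("phone", "555-0100");
    for (String line : diff(before, after)) {
      System.out.println(line);
    }
    // ~ id: 1 -> 2
    // - email: john@example.com
    // + phone: 555-0100
  }
}
```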

If you use your imagination, you may find that protocol buffers can solve a far wider range of problems than you initially expected.

Translated from: https://developers.google.com/protocol-buffers/docs/javatutorial

To tell the truth, translating this whole article was a lot of work, and the code had to be tested to check that it actually works. You are very welcome to reprint it, but please cite the source; it is a sign of respect for the work involved.

This work may be reprinted. When reprinting, you must indicate the original source, the author information, and this statement; otherwise legal liability will be pursued. http://shitouer.cn/2013/04/google-protocol-buffers-tutorial/
