Google protocol buffers Overview

My personal website is still being organized; you are welcome to visit it at http://shitouer.cn

Recommended reading order for this series; I hope you find it useful:

Google protocol buffers Overview

Google protocol buffers getting started

Protocol buffers syntax Guide

Google protocol buffers encoding

1. Overview

Protocol buffers is a lightweight, efficient format for storing structured data, and it is used to serialize that data. It is well suited to data storage and to RPC data exchange, and it serves as a language-neutral, platform-neutral, extensible serialization format in communication protocols, data storage, and other fields. APIs are currently provided for C++, Java, and Python.

This article gives an overview of protocol buffers and how to get started with them. The series focuses mainly on Java (my Python is not yet good enough to cover it).

In the rest of this article, protocol buffers is abbreviated as PB.

2. What is protocol buffers?

Protocol buffers provides a flexible, efficient, automated mechanism for serializing structured data. Think of XML, but smaller, faster, and simpler. You define how you want your data to be structured once, and then use source code generated by PB to easily write and read that structured data to and from a variety of data streams, using a variety of languages and platforms. You can even update your data structure without breaking programs that were built against the old format.

3. How does protocol buffers work?

In PB, you specify how the information you serialize should be structured by defining PB "message" types in a .proto file. Each PB message is a small logical record of information containing a series of name-value pairs. Here is a simple example of a .proto file that defines a message containing information about a person:

message Person {
  required string name = 1;
  required int32 id = 2;
  optional string email = 3;

  enum PhoneType {
    MOBILE = 0;
    HOME = 1;
    WORK = 2;
  }

  message PhoneNumber {
    required string number = 1;
    optional PhoneType type = 2 [default = HOME];
  }

  repeated PhoneNumber phone = 4;
}

As shown above, the message format is simple: each message type has one or more uniquely numbered fields, and each field has a name and a value type. Value types can be numbers (integer or floating-point), booleans, strings, raw bytes, or even other PB message types, which lets messages nest inside one another to structure data hierarchically. Fields can be declared optional, required, or repeated. To learn more about writing .proto files, see the Protocol Buffer Language Guide.
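To make "uniquely numbered" concrete: on the wire, a field is identified by its number, not its name. The sketch below follows the scheme described in Google's Protocol Buffer Encoding document, where each encoded field starts with a key computed as (field_number << 3) | wire_type and written as a base-128 varint. The FieldKeys class and its methods are hypothetical, written for illustration only, and are not part of the protobuf Java API; only the two wire types needed by the Person example are shown.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical helper (not the protobuf API) showing how a field key is formed.
public class FieldKeys {
    // Wire types used by the Person example:
    // 0 = varint (int32, enum), 2 = length-delimited (string, embedded message).
    static final int WIRE_VARINT = 0;
    static final int WIRE_LENGTH_DELIMITED = 2;

    // Field number goes in the upper bits, wire type in the lowest 3 bits.
    static int makeKey(int fieldNumber, int wireType) {
        return (fieldNumber << 3) | wireType;
    }

    // Base-128 varint: 7 bits per byte, least-significant group first,
    // high bit set on every byte except the last. Non-negative values only.
    static byte[] encodeVarint(long value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80));
            value >>>= 7;
        }
        out.write((int) value);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println(makeKey(1, WIRE_LENGTH_DELIMITED)); // name  -> 10
        System.out.println(makeKey(2, WIRE_VARINT));           // id    -> 16
        System.out.println(makeKey(3, WIRE_LENGTH_DELIMITED)); // email -> 26
        System.out.println(makeKey(4, WIRE_LENGTH_DELIMITED)); // phone -> 34
        byte[] v = encodeVarint(300);
        System.out.printf("%02X %02X%n", v[0] & 0xFF, v[1] & 0xFF); // AC 02
    }
}
```

Field numbers 1 to 15 produce a key that fits in a single byte, which is why they are usually given to the most frequently used fields.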

Once your messages are defined, run the PB compiler for your application's language on the .proto file to generate data access classes. These provide simple accessors for each field, plus methods to serialize the whole structure to raw bytes and parse it back. For example, if you choose Java, compiling the example .proto above generates a Person class. You can then use this class in your application to populate, serialize, and retrieve Person messages, as in the following code:

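import java.io.FileOutputStream;
import java.io.IOException;
// Also import Person, the class generated by the PB compiler from the .proto above.

public class WritePerson {
    public static void main(String[] args) throws IOException {
        // Build an immutable Person message with the generated builder API.
        Person john = Person.newBuilder()
                .setId(1)
                .setName("john")
                .setEmail("john@example.com") // placeholder address
                .addPhone(
                        Person.PhoneNumber.newBuilder()
                                .setNumber("1861xxxxxxx")
                                .setType(Person.PhoneType.WORK)
                                .build())
                .build();

        // Write the message in PB's binary format; try-with-resources closes the stream.
        try (FileOutputStream output = new FileOutputStream("abc.txt")) {
            john.writeTo(output);
        }
    }
}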

Later, you can read the message back with the following code:

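import java.io.FileInputStream;
import java.io.IOException;
// Also import Person, the class generated by the PB compiler from the .proto above.

public class ReadPerson {
    public static void main(String[] args) throws IOException {
        // Parse the binary data written by the previous example back into a Person.
        try (FileInputStream input = new FileInputStream("abc.txt")) {
            Person person = Person.parseFrom(input);
            System.out.println(person.getId());
            System.out.println(person.getName());
            System.out.println(person.getEmail());
            System.out.println(person.getPhoneCount());
            System.out.println(person.getPhone(0).getNumber());
            System.out.println(person.getPhone(0).getType());
        }
    }
}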

PB is easy to extend and backward compatible. You can add new fields to a message format without breaking backward compatibility: old binaries simply ignore any new field when parsing. So if you have a communication protocol that uses PB as its data format, you can extend the protocol without worrying about breaking existing code.
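Old code can tolerate new fields because every encoded field carries its own number and wire type, so a parser can skip any field it does not recognize. The minimal sketch below shows that behavior with a hypothetical OldReader class that only knows field 1 (name) and field 2 (id) and then meets a field 5 added by a newer writer. It handles only the varint and length-delimited wire types and simply drops unknown fields; the real protobuf runtimes can also retain unknown fields so they survive re-serialization. It is an illustration, not the library's implementation.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical "old" reader: understands field 1 (name, string) and
// field 2 (id, varint); any other field number is skipped, not rejected.
public class OldReader {
    private final byte[] buf;
    private int pos = 0;

    OldReader(byte[] buf) { this.buf = buf; }

    // Reads a base-128 varint (7 payload bits per byte, low groups first).
    private long readVarint() {
        long result = 0;
        int shift = 0;
        while (true) {
            byte b = buf[pos++];
            result |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) return result;
            shift += 7;
        }
    }

    Map<String, Object> parse() {
        Map<String, Object> known = new LinkedHashMap<>();
        while (pos < buf.length) {
            long key = readVarint();
            int fieldNumber = (int) (key >>> 3);
            int wireType = (int) (key & 0x7);
            if (wireType == 0) {                 // varint
                long value = readVarint();
                if (fieldNumber == 2) known.put("id", value);
            } else if (wireType == 2) {          // length-delimited
                int len = (int) readVarint();
                if (fieldNumber == 1) {
                    known.put("name", new String(buf, pos, len, StandardCharsets.UTF_8));
                }
                pos += len;                      // unknown fields are skipped by length
            } else {
                throw new IllegalArgumentException("wire type not handled in this sketch: " + wireType);
            }
        }
        return known;
    }

    public static void main(String[] args) {
        // name = "john" (field 1), id = 1 (field 2), plus a string field 5
        // that a newer writer added and this reader has never heard of.
        byte[] data = {
            0x0A, 4, 'j', 'o', 'h', 'n',
            0x10, 1,
            0x2A, 3, 'n', 'e', 'w'
        };
        System.out.println(new OldReader(data).parse()); // {name=john, id=1}
    }
}
```

The unknown field 5 is stepped over using its length prefix instead of causing a parse error, which is what lets an existing protocol gain fields without breaking old readers.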

For a complete guide to using the code generated from .proto files, see the API Reference section. To learn how PB messages are encoded, see Protocol Buffer Encoding.

4. Why not use XML directly?

For serializing structured data, PB has many advantages over XML:

  1. Simpler
  2. 3 to 10 times smaller than XML
  3. 20 to 100 times faster than XML
  4. Less ambiguous (clearer semantics)
  5. Generates data access classes that are easier to use programmatically

Suppose, for example, that you want to model a person with a name and an email address. In XML, you would write:

<person>
  <name>John Doe</name>
  <email>jdoe@example.com</email>
</person>

The corresponding Pb is as follows:

person {
  name: "John Doe"
  email: "jdoe@example.com"
}

Note: this is only a human-readable (text format) rendering of the PB message. PB does not actually store data this way; on the wire, PB data is in a binary format.

Encoded in PB's binary format, this message is about 28 bytes long and takes roughly 100 to 200 nanoseconds to parse. The XML version is at least 69 bytes even with whitespace removed, and takes roughly 5,000 to 10,000 nanoseconds to parse.
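The 28-byte figure can be checked by hand. The sketch below encodes only the name and email fields, using field numbers 1 and 3 from the Person example and assuming the address jdoe@example.com, the one used in Google's original overview. Each string field costs one key byte, one length byte, and its UTF-8 bytes, so the message is (1 + 1 + 8) + (1 + 1 + 16) = 28 bytes, while the whitespace-free XML is 69 bytes. The SizeComparison class is hypothetical and measures size only, not parsing speed.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical hand-encoder used only to check the size figures in the text.
public class SizeComparison {
    // Appends one string field: key byte, length byte, then the UTF-8 bytes.
    // One byte each is enough here (field number < 16, value shorter than 128 bytes).
    static void writeStringField(ByteArrayOutputStream out, int fieldNumber, String value) {
        byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
        out.write((fieldNumber << 3) | 2); // wire type 2 = length-delimited
        out.write(bytes.length);
        out.write(bytes, 0, bytes.length);
    }

    static byte[] encodePerson(String name, String email) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeStringField(out, 1, name);  // field 1 = name in the Person example
        writeStringField(out, 3, email); // field 3 = email in the Person example
        return out.toByteArray();
    }

    // The XML form from the text, with all whitespace removed.
    static String toXml(String name, String email) {
        return "<person><name>" + name + "</name><email>" + email + "</email></person>";
    }

    public static void main(String[] args) {
        String name = "John Doe";
        String email = "jdoe@example.com"; // assumed address, as in Google's overview
        System.out.println("PB bytes:  " + encodePerson(name, email).length); // 28
        System.out.println("XML bytes: " + toXml(name, email).getBytes(StandardCharsets.UTF_8).length); // 69
    }
}
```

The one-byte key and length shortcut in writeStringField only holds for small field numbers and short values; a general encoder writes both as varints.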

Working with the data is also much easier in PB than in XML. With PB:

person.getName();
person.getEmail();

XML:

personNode.getElementsByTagName("name")
personNode.getElementsByTagName("email")

PB access is more direct: there is no need for XML-style node traversal.
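For comparison, here is the XML side using the DOM parser that ships with the JDK (javax.xml.parsers). Note that getElementsByTagName returns a NodeList, so reaching the actual text still takes an item(0) and getTextContent() call per field. The XmlAccess class and its helpers are hypothetical; this is a sketch of the traversal involved, not a recommended XML-handling pattern.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;

// Hypothetical helper showing the DOM traversal needed to read two fields.
public class XmlAccess {
    // Returns the text of the first descendant element with the given tag name.
    static String firstText(Element parent, String tag) {
        return parent.getElementsByTagName(tag).item(0).getTextContent();
    }

    static String[] readPerson(String xml) throws Exception {
        Element personNode = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                .getDocumentElement(); // the <person> element
        return new String[] { firstText(personNode, "name"), firstText(personNode, "email") };
    }

    public static void main(String[] args) throws Exception {
        String xml = "<person><name>John Doe</name><email>jdoe@example.com</email></person>";
        String[] fields = readPerson(xml);
        System.out.println(fields[0] + " / " + fields[1]); // John Doe / jdoe@example.com
    }
}
```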

However, nothing is perfect, and neither is PB. For text-based documents with a lot of markup (such as HTML), XML is a better fit than PB, because XML lets structure and text be freely interleaved. XML is also self-describing, whereas PB is not: a PB message can only be interpreted with its format definition (the .proto file).

5. A little history

PB was developed at Google, originally to handle the request/response protocol of an index server. Before PB, Google marshalled and unmarshalled requests and responses by hand. This approach had to support many versions of the protocol, which led to some very ugly code, such as:

if (version == 3) {
  // ...
} else if (version > 4) {
  if (version == 5) {
    // ...
  }
  // ...
}

Explicitly formatted protocols like this also made rolling out a new protocol version complicated: before switching to the new protocol, developers had to make sure that every server between the originator of a request and the server that actually handled it understood the new protocol.

PB was designed to solve these problems:

  1. New fields are easy to introduce, and intermediate servers that do not need to inspect the data can simply parse it and pass it through without knowing about all of its fields.
  2. The format is more self-describing and can be handled from a variety of languages (C++, Java, Python, etc.).

Even so, at this stage users still had to hand-write their own parsing and encoding code.

As the system evolved, PB acquired a number of other features and uses:

  1. Automatically generated serialization and deserialization code, which removes the need for hand parsing.
  2. Besides short-lived RPC requests, PB came to be used as a handy self-describing format for storing data persistently.
  3. Server RPC interfaces came to be declared as part of the protocol files; the PB compiler generates stub classes that users can override with their own implementations of the server interface.

Protocol buffers (protobuf for short) are now Google's lingua franca for data. At the time the original overview was written, 48,162 different message types were defined in Google's code tree across 12,183 .proto files. They are used both in RPC systems and for persistent storage of data in a variety of storage systems.

Translated from: https://developers.google.com/protocol-buffers/docs/overview

This is my first attempt at translation, so please point out any shortcomings you find. Thank you!
