Introduction and examples of Google's open-source technology protobuf


Today I want to introduce Google's Protocol Buffers (protobuf for short). I had been planning the next post in my "producer/consumer model" series, about the format of the data exchanged between producers and consumers. Since that topic involves protobuf, I decided to give protobuf a post of its own.

★What is protobuf?
For readers who have never heard of it, let's start with the basics.
First, protobuf is an open-source project (the official site is "here"), and one with a strong pedigree. Most open-source projects on the Internet (at least 80% of them) are the work of a single person or a few hobbyists. Protobuf, by contrast, was developed by Google and tested inside Google, so its authors are clearly no ordinary programmers.
So what is this impressive tool for? Put simply, it does much the same job as XML: it saves the contents of a data structure in a particular format. It is mainly used as a format for data storage and for communication protocols. Some readers may be puzzled: XML works perfectly well, so why reinvent the wheel?! Don't worry, I will get to that later.
Last year (around July), Google generously contributed this tool to the open-source community, so people like me who enjoy ready-made tools are in luck. Quite a few people seem to feel the same way: helped along by Google's reputation, protobuf became quite popular less than a year after it was open-sourced. To keep up with the times, I am devoting a separate post to it.

★What are the features of protobuf?
With the basics covered, let's move on to the technical side. Because protobuf has been public for less than a year, I have not been using it for long myself. I am passing on what I have learned so far, so please bear with me, dear readers :-)

◇ Good performance/high efficiency
Now back to the earlier question: why didn't Google just use XML, instead of building something new? One fundamental reason is that XML's performance is not good enough.
Time overhead: formatting (serializing) XML is reasonably cheap, but parsing (deserializing) it is another story. I have run into time-sensitive scenarios where I had to abandon XML because parsing it was unbearably slow.
Space overhead: anyone familiar with XML syntax knows that, for the sake of readability, XML adds a lot of redundant text. So its space efficiency is poor as well (although this drawback matters less often).
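To make the space argument concrete, here is a small C++ sketch of how protobuf's binary wire format stores an integer field: a key (the field number shifted left by 3 bits, combined with wire type 0 for varints) followed by the value as a base-128 varint. EncodeVarint and EncodeIntField are hypothetical helper names used only for illustration; in practice the generated classes do this for you, and negative int32 values (which protobuf encodes in 10 bytes) are not handled here.

```cpp
#include <cstdint>
#include <string>

// Append an unsigned integer as a base-128 varint: 7 bits per byte,
// least-significant group first, with the high bit set on every byte
// except the last one.
void EncodeVarint(uint64_t value, std::string& out) {
    while (value >= 0x80) {
        out.push_back(static_cast<char>((value & 0x7F) | 0x80));
        value >>= 7;
    }
    out.push_back(static_cast<char>(value));
}

// Hypothetical helper: encode one non-negative integer field the way the
// protobuf wire format does. Key = (field_number << 3) | wire_type, where
// wire type 0 means "varint".
std::string EncodeIntField(uint32_t fieldNumber, uint64_t value) {
    std::string out;
    EncodeVarint((static_cast<uint64_t>(fieldNumber) << 3) | 0, out);
    EncodeVarint(value, out);
    return out;
}
```

For example, EncodeIntField(2, 123) (userid = 123, which is field number 2 in the Order message shown later) produces just the two bytes 0x10 0x7B, whereas the XML element <userid>123</userid> takes 20 characters.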
Google prides itself on handling massive amounts of data with massive processing power. With clusters of hundreds of thousands or even millions of machines and data volumes in the petabytes, even a tiny relative gain in performance adds up to a lot. So Google naturally could not tolerate XML's obvious performance drawbacks, and since Google has never been short of brilliant engineers, protobuf came into being.
Google is famous for taking performance seriously, so I am quite reassured about protobuf. I would not claim it has the best performance of all, but it certainly won't be bad.

◇ Code generation mechanism
Apart from good performance, the code generation mechanism is protobuf's main attraction. To illustrate it, let's look at an example.
Suppose there is an e-commerce system (implemented in C++). Module A needs to send a large amount of order information to module B, and the two communicate over a socket.
Assume that the order includes the following attributes:
--------------------------------
Time: time (an integer)
Customer id: userid (an integer)
Transaction amount: price (a floating-point number)
Transaction description: desc (a string)
--------------------------------
To implement this with protobuf, you first write a proto file (Order.proto) and add to it a message named "Order" that describes the structured data in the communication protocol. The file looks roughly like this:

--------------------------------

syntax = "proto2";

message Order
{
    required int32 time = 1;
    required int32 userid = 2;
    required float price = 3;
    optional string desc = 4;
}

--------------------------------

Next, compile Order.proto with protoc, the compiler that ships with protobuf. Since the modules in this example are written in C++, use the compiler's command-line options (see "here") to generate the C++ wrapper class for orders, for example: protoc --cpp_out=. Order.proto, which produces Order.pb.h and Order.pb.cc. (In general, each message generates one wrapper class.)
You can then serialize and parse orders through the wrapper class with code like the following:


--------------------------------

// Sender
#include <string>
#include "Order.pb.h"  // generated by protoc from Order.proto

void SendOrder()
{
    Order order;
    order.set_time(XXXX);  // XXXX: an integer timestamp (placeholder)
    order.set_userid(123);
    order.set_price(100.0f);
    order.set_desc("a test order");

    std::string sOrder;
    order.SerializeToString(&sOrder);

    // Then call a socket library to send the serialized string
    // ......
}

--------------------------------

// Receiver
#include <iostream>
#include <string>
#include "Order.pb.h"

void ReceiveOrder()
{
    std::string sOrder;
    // First receive the data through the network library and store it in sOrder
    // ......

    Order order;
    if (order.ParseFromString(sOrder))  // parse the string
    {
        std::cout << "userid: " << order.userid() << std::endl
                  << "desc: " << order.desc() << std::endl;
    }
    else
    {
        std::cerr << "parse error!" << std::endl;
    }
}

--------------------------------
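One practical detail the example glosses over: a serialized protobuf message does not record its own length, so if module A sends many orders back to back over one socket, module B needs some way to tell where one message ends and the next begins. A common approach is to prefix each message with its length. Below is a minimal sketch assuming a 4-byte big-endian length prefix; FrameMessage and ReadFrame are hypothetical helpers, not part of protobuf, and the payload would be the string filled in by SerializeToString.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical framing helper: prepend a 4-byte big-endian length to a payload
// (e.g. the string filled in by order.SerializeToString).
std::string FrameMessage(const std::string& payload) {
    uint32_t len = static_cast<uint32_t>(payload.size());
    std::string frame;
    frame.push_back(static_cast<char>((len >> 24) & 0xFF));
    frame.push_back(static_cast<char>((len >> 16) & 0xFF));
    frame.push_back(static_cast<char>((len >> 8) & 0xFF));
    frame.push_back(static_cast<char>(len & 0xFF));
    frame += payload;
    return frame;
}

// Hypothetical helper: try to extract one complete frame from the front of
// `buffer` (the bytes received from the socket so far). On success, stores the
// payload in `payload`, removes the frame from `buffer`, and returns true.
// Returns false if the buffer does not yet hold a complete frame.
bool ReadFrame(std::string& buffer, std::string& payload) {
    if (buffer.size() < 4) return false;
    uint32_t len = 0;
    for (std::size_t i = 0; i < 4; ++i) {
        len = (len << 8) | static_cast<unsigned char>(buffer[i]);
    }
    if (buffer.size() - 4 < len) return false;
    payload = buffer.substr(4, len);
    buffer.erase(0, 4 + len);
    return true;
}
```

On the receiving side, module B would append whatever the socket delivers to a buffer, call ReadFrame in a loop, and hand each payload to order.ParseFromString. A fixed 4-byte prefix is only one possible choice; a varint length prefix would work just as well.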

 


With this code generation mechanism, developers no longer have to write the protocol parsing code by hand (a typically thankless job).
If requirements change later and you need to add a "status" attribute to Order, you only need to add one line to Order.proto. The sender (module A) needs one extra line to set the status, and the receiver (module B) one extra line to read it. Wow, that's easy!
In addition, if the two communicating parties use different programming languages, this mechanism effectively ensures that both sides handle the protocol consistently.
By the way, a quick digression.
In a sense, a proto file can be viewed as a specification (an interface specification) of the communication protocol. This trick has actually been around for a long time: anyone who has done Microsoft COM programming or worked with CORBA will recognize the shadow of IDL in it (there is a detailed explanation "here"). The underlying idea is the same.

◇ Supports backward compatibility and forward compatibility
Let's continue with the same example. For convenience, I'll call the order protocol with the added "status" attribute the "new version", and the original protocol the "old version".
Backward compatibility means that after module B is upgraded, it can still correctly understand old-version messages sent by module A. Since old-version messages have no "status" attribute, when extending the protocol you should make "status" an optional attribute, or give it a default value (for how to set a default value, see "here").
Forward compatibility means that after module A is upgraded, module B (not yet upgraded) can still correctly understand new-version messages sent by module A. In this case, the new "status" attribute is simply ignored.
What are backward and forward compatibility good for? Suppose you maintain a large distributed system: you cannot upgrade all modules at the same time. To keep the system running normally while the upgrade is in progress, the communication protocol must be as backward and forward compatible as possible.
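Both kinds of compatibility rest on the fact that every field in protobuf's binary encoding is tagged with its field number and a wire type. A reader can therefore skip a field it does not recognize, and any field absent from the data simply keeps its default value. The toy decoder below illustrates that rule for the order example. It is a hand-written sketch, not what protoc generates; OrderView and DecodeOrder are hypothetical names, and it assumes "status" was added as an optional integer field with field number 5. Only the integer fields are decoded; price and desc are skipped.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Toy view of the Order message; only the integer fields are decoded here.
struct OrderView {
    uint64_t time = 0;
    uint64_t userid = 0;
    uint64_t status = 0;  // default used when an old-version message lacks "status"
};

// Read a base-128 varint starting at pos; returns false on truncated input.
bool ReadVarint(const std::string& in, std::size_t& pos, uint64_t& value) {
    value = 0;
    for (int shift = 0; shift < 64; shift += 7) {
        if (pos >= in.size()) return false;
        uint8_t byte = static_cast<uint8_t>(in[pos++]);
        value |= static_cast<uint64_t>(byte & 0x7F) << shift;
        if ((byte & 0x80) == 0) return true;
    }
    return false;  // malformed: too many continuation bytes
}

// Fields missing from the input keep their defaults (backward compatibility);
// fields this reader does not know are skipped by wire type (forward compatibility).
// knowsStatus = false simulates an old reader built before "status" existed.
bool DecodeOrder(const std::string& in, bool knowsStatus, OrderView& out) {
    std::size_t pos = 0;
    while (pos < in.size()) {
        uint64_t key = 0;
        if (!ReadVarint(in, pos, key)) return false;
        uint64_t field = key >> 3;
        uint64_t value = 0;
        switch (key & 0x7) {
            case 0:  // varint
                if (!ReadVarint(in, pos, value)) return false;
                break;
            case 1:  // 64-bit fixed (e.g. double): skip
                if (in.size() - pos < 8) return false;
                pos += 8;
                continue;
            case 2: {  // length-delimited (string, bytes, nested message): skip
                uint64_t len = 0;
                if (!ReadVarint(in, pos, len) || in.size() - pos < len) return false;
                pos += len;
                continue;
            }
            case 5:  // 32-bit fixed (e.g. float): skip
                if (in.size() - pos < 4) return false;
                pos += 4;
                continue;
            default:
                return false;  // groups (wire types 3 and 4) are not handled here
        }
        if (field == 1) out.time = value;
        else if (field == 2) out.userid = value;
        else if (field == 5 && knowsStatus) out.status = value;
        // any other varint field is unknown to this reader and is ignored
    }
    return true;
}
```

Calling DecodeOrder with knowsStatus set to false on a new-version message plays the role of the not-yet-upgraded module B: it ignores field 5 and still reads time and userid. Calling it with knowsStatus set to true on an old-version message plays the upgraded module B: status stays at its default of 0. (The real library goes a step further and keeps unknown fields, so they survive if the message is re-serialized.)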

◇ Supports multiple programming languages
Like several open-source projects I have reviewed on this blog (such as "SQLite" and "cURL"), protobuf supports many programming languages. The source code officially released by Google covers C++, Java, and Python (exactly the three most commonly used languages). If you normally work in one of these three languages, you are all set.
What if you want to use protobuf in other languages? Thanks to Google's appeal, the open-source community responded eagerly, and support has appeared for many other programming languages (such as ActionScript, C#, Lisp, Erlang, Perl, PHP, and Ruby). Some languages even have several competing open-source projects. For details, see "here".
However, I feel obliged to remind you: if you want to use protobuf in these languages, evaluate the corresponding open-source library carefully. These libraries are not provided officially by Google, and they have not been around for long, so their quality and performance may fall short.

★What are the defects of protobuf?
A few days ago, in my post on the "halo effect", I stressed that "both advantages and disadvantages should be evaluated". So, to finish, I will criticize this tool's shortcomings.
◇ Not widely adopted yet
Because protobuf was released only recently, it is a newcomer compared with XML, and far behind XML in popularity and breadth of use. For this reason, if the system you are designing needs to expose interfaces for third-party systems to call, I advise against the protobuf format for the time being.
◇ Poor readability caused by binary format
To improve performance, protobuf uses a binary encoding. This directly leads to poor readability (strictly speaking, none at all). Although protobuf provides the TextFormat utility class (documented "here"), it cannot fully solve the problem.
To see why poor readability is dangerous, take another example. When a problem occurs between two communicating parties, it easily turns into a blame game (neither side admits the problem is theirs; each insists the other is at fault). The simple way to settle it is to capture the packets and dump them into a log, which makes it much easier to see which side is wrong. With protobuf's binary format, however, the captured and dumped data is very hard to make sense of.
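One partial workaround for the logging problem is to at least dump captured packets as hex, so that both sides can compare the exact bytes that were sent and received. Below is a minimal sketch of such a helper; HexDump is a hypothetical name, not part of protobuf.

```cpp
#include <cstddef>
#include <string>

// Hypothetical logging helper: render binary data (e.g. a serialized protobuf
// message captured from the network) as space-separated lowercase hex bytes,
// so it can be written to a text log and compared byte by byte.
std::string HexDump(const std::string& data) {
    static const char kDigits[] = "0123456789abcdef";
    std::string out;
    for (std::size_t i = 0; i < data.size(); ++i) {
        unsigned char byte = static_cast<unsigned char>(data[i]);
        if (i > 0) out.push_back(' ');
        out.push_back(kDigits[byte >> 4]);
        out.push_back(kDigits[byte & 0x0F]);
    }
    return out;
}
```

If the side doing the logging has the generated class, printing the parsed message with order.DebugString() (or google::protobuf::TextFormat::PrintToString) gives much friendlier output, but that only helps when the bytes parse successfully, which is exactly what is in dispute.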
◇ Lack of self-description
Generally speaking, XML is self-describing, while protobuf is not. If I hand you a chunk of protocol data in binary format without the corresponding proto file, it will read like gibberish.
Because of this lack of self-description, combined with the poor readability of the binary format, protobuf cannot replace XML for things like configuration files.
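To see what "lack of self-description" means in practice, here is a sketch that walks raw protobuf bytes without the proto file. It relies only on the field keys in the wire format, so all it can report is field numbers, wire types, and sizes. DescribeWithoutSchema is a hypothetical helper name.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Read a base-128 varint starting at pos; returns false on truncated input.
bool ReadVarint(const std::string& in, std::size_t& pos, uint64_t& value) {
    value = 0;
    for (int shift = 0; shift < 64; shift += 7) {
        if (pos >= in.size()) return false;
        uint8_t byte = static_cast<uint8_t>(in[pos++]);
        value |= static_cast<uint64_t>(byte & 0x7F) << shift;
        if ((byte & 0x80) == 0) return true;
    }
    return false;
}

// Hypothetical helper: describe raw protobuf wire data without its .proto file.
// Only field numbers, wire types and raw sizes/values can be recovered; field
// names (and whether a varint means an int, a bool or an enum) are not in the data.
std::string DescribeWithoutSchema(const std::string& in) {
    std::string out;
    std::size_t pos = 0;
    while (pos < in.size()) {
        uint64_t key = 0;
        if (!ReadVarint(in, pos, key)) return out + "<malformed>\n";
        uint64_t wireType = key & 0x7;
        out += "field #" + std::to_string(key >> 3);
        if (wireType == 0) {  // varint
            uint64_t v = 0;
            if (!ReadVarint(in, pos, v)) return out + " <malformed>\n";
            out += " varint " + std::to_string(v) + "\n";
        } else if (wireType == 2) {  // length-delimited: string? bytes? nested message?
            uint64_t len = 0;
            if (!ReadVarint(in, pos, len) || in.size() - pos < len) return out + " <malformed>\n";
            out += " length-delimited, " + std::to_string(len) + " bytes\n";
            pos += len;
        } else if (wireType == 1 || wireType == 5) {  // fixed 64-bit or 32-bit
            std::size_t n = (wireType == 1) ? 8 : 4;
            if (in.size() - pos < n) return out + " <malformed>\n";
            out += " fixed " + std::to_string(n * 8) + "-bit\n";
            pos += n;
        } else {
            return out + " <unsupported wire type>\n";
        }
    }
    return out;
}
```

Fed the bytes of an order containing userid, price, and desc, it can only report a varint in field #2, a 32-bit value in field #3, and a 2-byte length-delimited value in field #4. Without Order.proto there is no way to tell that field #3 is a price stored as a float, or whether field #4 is a string or a nested message.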

★Why did I use protobuf?
Since I started working with protobuf some time ago, I have replaced some of the data transmission protocols in my products with it. Some may ask: why choose protobuf in particular, when there are many similar tools? Since today's post is already quite long, I'll leave you in suspense and save this topic for "Producer/consumer model [5]: How to choose the transmission protocol and format?", where I will compare various protocol formats and share my views.
