Google's open source technology: protobuf

★ What is protobuf?
For readers who have never heard of it, I'll start, as usual, with a quick primer.
First, protobuf is an open source project (the official site is here), and one with heavyweight backing. Most open source projects found online (at least 80%) are the work of a single person or a handful of hobbyists. Protobuf is different: it was developed by Google and has been battle-tested inside Google. Its authors are not your average hobbyists.
What is this impressive-sounding thing good for? Put simply, it does almost the same job as XML: it saves the information in a data structure in some format. It is mainly used for data storage, as a transport protocol format, and in similar situations. Some readers may be muttering: XML works fine, so why reinvent the wheel? Don't worry, I'll get to that later.
In the middle of last year (around July 2008), Google suddenly had a change of heart and contributed this gem to the open source community. People like me who enjoy picking up ready-made tools are in luck. Apparently there are quite a lot of us, and with Google's pull added in, protobuf has become very popular in less than a year since it was open-sourced. So, to keep up with the times, I'm devoting a post to it.

★ What's special about protobuf?
With the primer done, it's time for the technical topics. Since this thing was released only recently (less than a year ago), I haven't been using it for long. I'm passing on what I have only just learned, so please forgive any mistakes :-)

◇ Good performance/high efficiency
Now let me explain why Google reinvented the wheel instead of using perfectly good XML. One fundamental reason is that XML's performance is not good enough.
First, time overhead: the cost of XML formatting (serialization) is acceptable, but the cost of XML parsing (deserialization) is not flattering. I have often run into time-sensitive situations where XML parsing was unbearably slow and XML had to be abandoned.
Next, space overhead: readers familiar with XML syntax will know that, for the sake of readability, the format carries some redundant text. So its space overhead is not great either (although I don't run into this drawback often).
Google is known for its massive data volumes and massive processing power. With clusters of hundreds of thousands or even millions of machines routinely handling petabytes of data, even a 0.1% performance gain is significant. So Google naturally cannot tolerate XML's obvious performance drawbacks. Add to that the fact that Google has never lacked wheel-makers, and protobuf was born.
Google's obsession with performance is well known. So I am quite comfortable trusting something that came out of Google: protobuf's performance may not be the very best, but it certainly won't be bad.
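
To make the space argument a little more concrete, here is a minimal C++ sketch that hand-encodes a small order record (a time, a user id, a price, and a description) following the encoding rules described in protobuf's documentation: each field is written as a key, (field_number << 3) | wire_type, followed by a varint, a fixed 4-byte value, or a length-prefixed string. The function names (AppendVarint, AppendKey, EncodeOrderSketch) and the field numbers 1 to 4 are my own assumptions for illustration. This is not the library's code; in practice the generated classes do all of this for you.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>

// Append an unsigned integer as a base-128 varint: 7 payload bits per byte,
// lowest group first, high bit set on every byte except the last.
void AppendVarint(std::string& out, std::uint64_t value) {
    while (value >= 0x80) {
        out.push_back(static_cast<char>((value & 0x7F) | 0x80));
        value >>= 7;
    }
    out.push_back(static_cast<char>(value));
}

// Append a field key: (field_number << 3) | wire_type.
void AppendKey(std::string& out, std::uint32_t field_number, std::uint32_t wire_type) {
    AppendVarint(out, (static_cast<std::uint64_t>(field_number) << 3) | wire_type);
}

// Hand-encode a hypothetical order record; field numbers 1..4 are assumed.
std::string EncodeOrderSketch(std::int32_t time, std::int32_t userid,
                              float price, const std::string& desc) {
    std::string out;

    // int32 fields use wire type 0 (varint); negative values are
    // sign-extended to 64 bits before encoding.
    AppendKey(out, 1, 0);
    AppendVarint(out, static_cast<std::uint64_t>(static_cast<std::int64_t>(time)));
    AppendKey(out, 2, 0);
    AppendVarint(out, static_cast<std::uint64_t>(static_cast<std::int64_t>(userid)));

    // float uses wire type 5: the 4 bytes of the IEEE 754 value, little-endian.
    std::uint32_t bits = 0;
    static_assert(sizeof(bits) == sizeof(price), "float is assumed to be 32 bits");
    std::memcpy(&bits, &price, sizeof(bits));
    AppendKey(out, 3, 5);
    for (int shift = 0; shift < 32; shift += 8) {
        out.push_back(static_cast<char>((bits >> shift) & 0xFF));
    }

    // string uses wire type 2: a varint length followed by the raw bytes.
    AppendKey(out, 4, 2);
    AppendVarint(out, desc.size());
    out += desc;
    return out;
}
```

Called as EncodeOrderSketch(1, 123, 100.0f, "A Test order"), the sketch produces 23 bytes, 12 of which are the description text itself. An XML rendering with one element per field would be several times larger, and it would still have to be parsed as text on the receiving side.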

◇ Code generation mechanism
Besides good performance, the code generation mechanism is what attracts me most. Let me illustrate it with an example.
Suppose there is an e-commerce system (assume it is implemented in C++) in which module A needs to send a large amount of order information to module B over a socket.
Suppose an order has the following attributes:
--------------------------------
Time: time (an integer)
Customer ID: userid (an integer)
Transaction amount: price (a floating-point number)
Transaction description: desc (a string)
--------------------------------
If you implement this with protobuf, first write a proto file (say, Order.proto) and add a message structure named "Order" to it, describing the structured data in the communication protocol. The contents of the file are as follows:

--------------------------------

message Order
{
  required int32 time = 1;
  required int32 userid = 2;
  required float price = 3;
  optional string desc = 4;
}

--------------------------------


Then compile the proto file with the compiler that ships with protobuf. Since the modules in this example are written in C++, you can use the protobuf compiler's command-line arguments (for C++, protoc's --cpp_out option; see "here") to generate the "Order wrapper class" in C++. (In general, each message structure generates one wrapper class.)
Then you can serialize and parse the Order wrapper class with code like the following:


--------------------------------

// Sender
// (assumes the generated header is included and "using namespace std;")

Order order;
order.set_time(XXXX);
order.set_userid(123);
order.set_price(100.0f);
order.set_desc("A Test order");

string sOrder;
order.SerializeToString(&sOrder);  // serialize the order into sOrder

// Then call some socket communication library to send the serialized string.
// ......

--------------------------------

// Receiver
// (assumes the generated header is included and "using namespace std;")

string sOrder;
// First, receive the data through the network communication library
// and store it in the string sOrder.
// ......

Order order;
if (order.ParseFromString(sOrder))  // parse the string
{
    cout << "userid: " << order.userid() << endl
         << "desc: " << order.desc() << endl;
}
else
{
    cerr << "Parse error!" << endl;
}

--------------------------------


With this code generation mechanism, developers no longer have to grind out protocol-parsing code by hand (a typically thankless job).
If requirements change later and a "status" attribute has to be added to the order, only one line needs to be added to the Order.proto file. The sender (module A) adds one line of code to set the status, and the receiver (module B) adds one line to read it. Wow, that's easy!
In addition, if the two communicating sides are implemented in different programming languages, this mechanism effectively ensures that the modules on both sides handle the protocol consistently.
By the way, a digression.
In a sense, a proto file can be seen as a specification (or interface specification) describing a communication protocol. This trick is actually an old one: anyone who has done Microsoft COM programming or worked with CORBA should see the shadow of IDL in it (for a detailed explanation, see "here"). The ideas are the same.

◇ Support for "backward compatibility" and "forward compatibility"
Let's continue with the example above. For convenience, I'll call the order protocol with the added "status" attribute the "new version" and the earlier one the "old version".
So-called "backward compatibility" means that after module B is upgraded, it can still correctly recognize the old version of the protocol sent by module A. Since the old version has no "status" attribute, when extending the protocol you can make the "status" attribute optional, or give it a default value (see "here").
So-called "forward compatibility" means that after module A is upgraded, module B (still on the old version) can still recognize the new version of the protocol sent by module A. In that case, the newly added "status" attribute is simply ignored.
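
Why an old reader can ignore a field it has never heard of can be sketched in a few lines. In the binary encoding described in protobuf's documentation, every field starts with a key holding a field number and a wire type, and the wire type alone tells a reader how many bytes to skip. The following C++ sketch is a hand-written reader, not the library's parser; ReadVarint, SkipField, ReadUserId, and the choice of field number 5 for the new status attribute are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Read one base-128 varint starting at pos; returns false on truncated input.
bool ReadVarint(const std::string& in, std::size_t& pos, std::uint64_t& value) {
    value = 0;
    for (int shift = 0; shift < 64 && pos < in.size(); shift += 7) {
        const std::uint8_t byte = static_cast<std::uint8_t>(in[pos++]);
        value |= static_cast<std::uint64_t>(byte & 0x7F) << shift;
        if ((byte & 0x80) == 0) return true;
    }
    return false;
}

// Skip the payload of a field the reader does not know, using only its wire type.
bool SkipField(const std::string& in, std::size_t& pos, std::uint32_t wire_type) {
    std::uint64_t n = 0;
    switch (wire_type) {
        case 0:  // varint
            return ReadVarint(in, pos, n);
        case 1:  // fixed 64-bit value
            if (in.size() - pos < 8) return false;
            pos += 8;
            return true;
        case 2:  // length-delimited: varint length, then that many bytes
            if (!ReadVarint(in, pos, n) || in.size() - pos < n) return false;
            pos += static_cast<std::size_t>(n);
            return true;
        case 5:  // fixed 32-bit value
            if (in.size() - pos < 4) return false;
            pos += 4;
            return true;
        default:
            return false;
    }
}

// An "old" reader that only cares about field 2 (userid). Every other field,
// including one added in a later version, is skipped rather than rejected.
bool ReadUserId(const std::string& in, std::int32_t& userid) {
    std::size_t pos = 0;
    bool found = false;
    while (pos < in.size()) {
        std::uint64_t key = 0;
        if (!ReadVarint(in, pos, key)) return false;
        const std::uint32_t field_number = static_cast<std::uint32_t>(key >> 3);
        const std::uint32_t wire_type = static_cast<std::uint32_t>(key & 0x7);
        if (field_number == 2 && wire_type == 0) {
            std::uint64_t raw = 0;
            if (!ReadVarint(in, pos, raw)) return false;
            userid = static_cast<std::int32_t>(raw);
            found = true;
        } else if (!SkipField(in, pos, wire_type)) {
            return false;
        }
    }
    return found;
}
```

Fed the old-version bytes 08 01 10 7b, ReadUserId yields 123. Fed the same bytes with an extra field appended (28 02, that is, the assumed field 5 with value 2), it still yields 123, because the unknown field is skipped instead of causing a parse error.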
What are "backward compatibility" and "forward compatibility" good for? Here is an example: when you maintain a very large distributed system, you cannot upgrade all the modules at the same time. To keep the whole system as unaffected as possible during the upgrade, you need the communication protocol to be "backward compatible" or "forward compatible".

◇ Support multiple programming languages
The open source projects I have reviewed since starting this blog (such as "SQLite" and "cURL") all support many programming languages, and protobuf is no exception. The source code officially released by Google covers three languages: C++, Java, and Python (exactly the three most commonly used, which is great). If you normally use one of these three, you're all set.
What if you want to use protobuf in some other language? Thanks to Google's considerable pull, the open source community has responded enthusiastically, and implementations for many other programming languages have recently appeared (such as ActionScript, C#, Lisp, Erlang, Perl, PHP, Ruby, etc.). Some languages even have several open source projects at once. See "here" for details.
But I have a duty to warn you here: if you are considering protobuf for one of these languages, be sure to evaluate the corresponding open source library carefully. These libraries are not officially provided by Google and have not been around for long, so their quality, performance, and so on may fall short.

★ What are protobuf's shortcomings?
A few days ago, in the post on the "halo effect", I stressed that you should "evaluate pros and cons together". So I'll try to pick at this thing's shortcomings.
◇ Not yet widely used
Since protobuf was released only recently, it is still a fledgling compared with XML, and it falls far short of XML in name recognition and breadth of adoption. For this reason, if the system you are designing needs to expose interfaces for third-party systems to call, I advise against the protobuf format for now.
◇ Binary format results in poor readability
To improve performance, protobuf encodes data in a binary format. This leads directly to poor readability (strictly speaking, no readability at all). Although protobuf provides the TextFormat utility class (documented "here"), it does not fully solve the problem.
Let me give an example of the harm poor readability does. When something goes wrong between two communicating parties, it easily turns into finger-pointing (neither side admits fault; each blames the other). I have a simple way to settle such disputes: capture the packets and dump them straight into a log, which makes it easy to see which side is at fault. But with protobuf's binary format, a packet dumped straight into a log is hard to read.
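
The "dump it into the log" trick can be sketched as below. HexDump is a hypothetical helper of my own, not part of protobuf; it merely renders a received payload as hex bytes so that it can be written to a log line.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Render a binary payload as space-separated lowercase hex bytes.
std::string HexDump(const std::string& payload) {
    static const char kDigits[] = "0123456789abcdef";
    std::string out;
    for (std::size_t i = 0; i < payload.size(); ++i) {
        const unsigned char byte = static_cast<unsigned char>(payload[i]);
        if (i != 0) out.push_back(' ');
        out.push_back(kDigits[byte >> 4]);    // high nibble
        out.push_back(kDigits[byte & 0x0F]);  // low nibble
    }
    return out;
}
```

For a four-byte payload, the log line would read something like 08 01 10 7b. Nothing in that line says which byte is a user id and which is a timestamp, whereas the equivalent XML in a log could be read at a glance.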
◇ Lack of self-description
In general, XML is self-describing, while the protobuf format is not. Given a piece of protocol content in binary form without the matching proto file, you might as well be reading an indecipherable script.
Given the "lack of self-description" plus the "poor readability of the binary format", protobuf is definitely no replacement for XML when it comes to configuration files.
