Original article: Protocol Buffers: Google's Data Interchange Format
Address: http://hellobmw.com/archives/protocol-buffers-googles-data.html
Translated by Arctic ice Aberdeen.
At Google, our mission is to organize the world's information. It is no exaggeration to say that we use thousands of different data formats to represent messages exchanged between servers, index records in storage, geospatial datasets, and more. Most of these formats are structured rather than flat. This raises an important question: how do we encode them?
XML? No, that would not work. As nice as XML is, it is not efficient enough at this scale. When all of your machines and network links are running at capacity, XML is an extremely expensive proposition. Not to mention that writing code to work with the DOM tree can become unwieldy.
Then do we just write the raw bytes of our in-memory data structures to the wire? No, that would not work either. When we roll out a new version of a server, it almost always has to start out talking to older servers. New servers must be able to read the data produced by old servers, and vice versa, even if individual fields have been added or removed. When data stored on disk is involved, this is even more important. Also, some of our code is written in Java or Python, so we need a portable solution.
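One way to let old and new servers read each other's data, and the approach protocol buffers takes, is to tag every field on the wire. The sketch below assumes the protocol buffer wire format's varint (wire type 0) and length-delimited (wire type 2) encodings; the helper names and field numbers are hypothetical, chosen only for illustration. Because each value is prefixed with a key carrying its field number and wire type, a reader that does not recognize a field still knows how many bytes to skip, and a reader expecting a field that is absent simply does not find it.

```python
# Minimal sketch of tagged fields, assuming the protocol buffer wire format's
# varint (wire type 0) and length-delimited (wire type 2) encodings.
# Helper names and field numbers are hypothetical, chosen for illustration.

def encode_varint(value: int) -> bytes:
    """Encode a non-negative int as a base-128 varint, low 7 bits first."""
    if value < 0:
        raise ValueError("this sketch only encodes non-negative integers")
    out = bytearray()
    while value > 0x7F:
        out.append((value & 0x7F) | 0x80)  # high bit set: more bytes follow
        value >>= 7
    out.append(value)
    return bytes(out)


def decode_varint(data: bytes, pos: int) -> tuple[int, int]:
    """Decode a varint starting at pos; return (value, position after it)."""
    result, shift = 0, 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if byte < 0x80:
            return result, pos
        shift += 7


def encode_field(field_number: int, value) -> bytes:
    """Prefix a value with its key: (field_number << 3) | wire_type."""
    if isinstance(value, int):
        return encode_varint((field_number << 3) | 0) + encode_varint(value)
    raw = value.encode("utf-8")
    return encode_varint((field_number << 3) | 2) + encode_varint(len(raw)) + raw


def decode_known(data: bytes, known: dict[int, str]) -> dict[str, object]:
    """Keep fields listed in `known`; skip any other field by its wire type."""
    fields: dict[str, object] = {}
    pos = 0
    while pos < len(data):
        key, pos = decode_varint(data, pos)
        field_number, wire_type = key >> 3, key & 0x7
        if wire_type == 0:
            value, pos = decode_varint(data, pos)
        elif wire_type == 2:
            length, pos = decode_varint(data, pos)
            value = data[pos:pos + length]
            pos += length
        else:
            raise ValueError(f"wire type {wire_type} is not handled in this sketch")
        if field_number in known:
            fields[known[field_number]] = value
    return fields


# A "new" writer adds field 3, which an "old" reader has never heard of.
new_message = (encode_field(1, 150) + encode_field(2, "alice")
               + encode_field(3, "alice@example.com"))
print(decode_known(new_message, {1: "id", 2: "name"}))
# {'id': 150, 'name': b'alice'}

# A "new" reader given an "old" message simply finds field 3 missing.
old_message = encode_field(1, 150) + encode_field(2, "alice")
print(decode_known(old_message, {1: "id", 2: "name", 3: "email"}))
# {'id': 150, 'name': b'alice'}
```

The design choice worth noting is that field numbers, not field order or byte offsets, identify data on the wire, which is why adding or removing a field does not break either side.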
Do we hand-write parsing and serialization code for each data structure? Well, we used to. Needless to say, that did not last. When you have tens of thousands of different structures in your code base, each needing its own serialization format, you simply cannot write them all by hand.
So we developed Protocol Buffers. Protocol Buffers let you define simple data structures in a special definition language and then compile them into classes that represent those structures in the programming language of your choice. These classes come with heavily optimized code to parse and serialize your messages in an extremely compact format. Best of all, the classes are easy to use: each field has simple get and set methods, and once you are ready, serializing the whole thing to a byte array or an I/O stream, or parsing it back, takes a single method call.
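The "get, set, then one call to serialize" pattern can be sketched in Python. This is a hand-written imitation, not output of the protocol buffer compiler: the `Person` message, its fields, and the `serialize()`/`parse()` method names are hypothetical (the real Python library spells the last two `SerializeToString()` and `ParseFromString()`), and Python properties stand in for the get and set methods. The byte layout assumes the protocol buffer wire format for varint and length-delimited fields, which is why an `id` of 150 takes only three bytes.

```python
# Hand-written stand-in for a compiler-generated message class.
# Hypothetical schema it mirrors (proto2 syntax):
#
#   message Person {
#     optional int32 id = 1;
#     optional string name = 2;
#   }

def _write_varint(out: bytearray, value: int) -> None:
    """Append a non-negative int as a base-128 varint."""
    while value > 0x7F:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    out.append(value)


def _read_varint(data: bytes, pos: int) -> tuple[int, int]:
    """Return (value, position after the varint)."""
    result, shift = 0, 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if byte < 0x80:
            return result, pos
        shift += 7


class Person:
    def __init__(self) -> None:
        self._id = None    # unset fields are not written at all
        self._name = None

    @property
    def id(self):
        return self._id

    @id.setter
    def id(self, value: int) -> None:
        if not isinstance(value, int) or value < 0:
            raise ValueError("this sketch only handles non-negative integers")
        self._id = value

    @property
    def name(self):
        return self._name

    @name.setter
    def name(self, value: str) -> None:
        if not isinstance(value, str):
            raise TypeError("name must be a str")
        self._name = value

    def serialize(self) -> bytes:
        """One call writes every set field as key + payload."""
        out = bytearray()
        if self._id is not None:
            _write_varint(out, (1 << 3) | 0)  # field 1, wire type 0 (varint)
            _write_varint(out, self._id)
        if self._name is not None:
            raw = self._name.encode("utf-8")
            _write_varint(out, (2 << 3) | 2)  # field 2, wire type 2 (length-delimited)
            _write_varint(out, len(raw))
            out += raw
        return bytes(out)

    @classmethod
    def parse(cls, data: bytes) -> "Person":
        """One call rebuilds a Person; a real parser would skip unknown fields."""
        msg = cls()
        pos = 0
        while pos < len(data):
            key, pos = _read_varint(data, pos)
            if key == (1 << 3) | 0:
                msg.id, pos = _read_varint(data, pos)
            elif key == (2 << 3) | 2:
                length, pos = _read_varint(data, pos)
                msg.name = data[pos:pos + length].decode("utf-8")
                pos += length
            else:
                raise ValueError(f"unexpected key {key} in this sketch")
        return msg


p = Person()
p.id = 150
p.name = "alice"
data = p.serialize()
print(data.hex())               # 0896011205616c696365
print(Person.parse(data).name)  # alice
```

The whole message is 10 bytes: 3 for the `id` field and 7 for the `name` field (key, length, and five UTF-8 bytes).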
OK, I know what you are thinking: "Yet another IDL?" Yes, you could call it that. But IDLs in general have earned a reputation for being hopelessly complicated, and one of the major design goals of Protocol Buffers is simplicity. By sticking to a simple lists-and-records model that solves the majority of problems, and by resisting the desire to chase diminishing returns, we believe we have created something that is powerful without being bloated. And yes, it is very fast: at least an order of magnitude faster than XML.
And now we are releasing Protocol Buffers to the open source community. We have seen how effective a solution they can be for certain tasks, and we hope more people can benefit from them and build on them. Take a look at the documentation, download the source code, and let us know what you think.