Disclaimer: Most of this article is translated from the official English documentation, with some of my own remarks added to aid understanding. Reprinting this article is prohibited.
First, what are protocol buffers?
Protocol buffers are a flexible, efficient, automated mechanism for serializing structured data. Compared with XML, protocol buffers are smaller, faster, and simpler. You define how your data is structured once (in a .proto file), and then use specially generated source code (note: produced by the compiler that protobuf provides) to read and write that structured data to and from a variety of data streams, even from different languages (protobuf's cross-language support). You can even update the definition of your data structure (that is, update the .proto file) without breaking programs that were compiled against the "old" format.
Second, how protocol buffers work
The following .proto file defines a message containing information about a person:

    message Person {
      required string name = 1;
      required int32 id = 2;
      optional string email = 3;

      enum PhoneType {
        MOBILE = 0;
        HOME = 1;
        WORK = 2;
      }

      message PhoneNumber {
        required string number = 1;
        optional PhoneType type = 2 [default = HOME];
      }

      repeated PhoneNumber phone = 4;
    }
As you can see, the message format is very simple. Each message type has one or more uniquely numbered fields, and each field has a name and a type. The field type can be a number (integer or floating-point), a Boolean, a string, raw bytes, or even (as in the example above) another protocol buffer message type, which lets you organize your data hierarchically. You can mark each field as optional, required, or repeated. The next blog post will describe the .proto file in more detail.
Once you have defined your messages, you run the protocol buffer compiler on the .proto file to generate data access classes for the language you are using (such as Java, C++, or Python). These classes provide simple accessors for each field (for example, name() and set_name()), as well as methods (called functions in C++) to serialize the whole structure to raw bytes and to parse it back from raw bytes. For example, if your language is C++, running the compiler on the example above generates a class named Person. Your application can then use this class to populate, serialize, and deserialize Person protocol buffer messages. You might write code like this to serialize a message:

    Person person;
    person.set_name("John Doe");
    person.set_id(1234);
    person.set_email("[email protected]");
    fstream output("myfile", ios::out | ios::binary);
    person.SerializeToOstream(&output);

After that, you can read your message back in ("deserialization"):

    fstream input("myfile", ios::in | ios::binary);
    Person person;
    person.ParseFromIstream(&input);
    cout << "Name: " << person.name() << endl;
    cout << "E-mail: " << person.email() << endl;
You can add new fields to your message without breaking compatibility: when parsing, old binaries simply ignore the new fields. So if your communication protocol uses protocol buffers as its data interchange format, you can extend the protocol without worrying about breaking existing code.
Third, why not use XML?
Protocol buffers have many advantages over XML for serializing structured data:
1. They are simpler.
2. The serialized data is 3 to 10 times smaller than XML.
3. Serialization is 20 to 100 times faster than with XML.
4. They are less ambiguous.
5. They generate data access classes that are easy to use from application code.
For example, suppose you want to describe a person with a name and an email. In XML, you need to write:

    <person>
      <name>John Doe</name>
      <email>[email protected]</email>
    </person>

The corresponding protocol buffer message (shown in the protocol buffer text format) is:

    # Textual representation of a protocol buffer.
    # This is *not* the binary format used on the wire.
    person {
      name: "John Doe"
      email: "[email protected]"
    }
When this message is encoded in the protocol buffer binary format (the text format above is only a convenient representation for reading, debugging, and editing), it is probably about 28 bytes long and takes around 100 to 200 nanoseconds to parse. In contrast, the XML version needs at least 69 bytes (after whitespace and newlines are removed) and takes around 5,000 to 10,000 nanoseconds to parse.
In addition, working with a protocol buffer in code is much more convenient, as in the following C++ code:

    cout << "Name: " << person.name() << endl;
    cout << "E-mail: " << person.email() << endl;

With XML, however, you would need to write something like this:

    cout << "Name: "
         << person.getElementsByTagName("name")->item(0)->innerText()
         << endl;
    cout << "E-mail: "
         << person.getElementsByTagName("email")->item(0)->innerText()
         << endl;
There are always two sides to things, and protocol buffers are not always a better choice than XML. For example, protocol buffers are not suitable for describing a text-based document with markup (such as HTML), because you cannot easily interleave structure with text. In addition, XML is human-readable and human-editable, while protocol buffers, at least in their native format, are not. XML is also extensible and self-describing, whereas a protocol buffer is only meaningful if you have the message definition (the .proto file).
Fourth, how to start using protocol buffers
First, download the installation package or the source package here:
https://developers.google.com/protocol-buffers/docs/downloads#release-packages
The package contains the complete source code for the Java, Python, and C++ protocol buffer compilers, along with the classes you need for I/O and testing. To build and install the compiler, follow the instructions in the README file.
Once you have finished building and installing, you can start using protocol buffers. Later posts will explain the details of using them from C++ and Java.
Fifth, introducing proto3
Our latest version 3 alpha release introduces a new language version, protocol buffers language version 3 (known as proto3), as well as some new features in our existing language version (proto2). Proto3 simplifies the protocol buffer language, both to make it easier to use and to support more programming languages. Our current alpha release lets you generate protocol buffer code in Java, C++, Python, JavaNano, Ruby, Objective-C, and C#, though sometimes with some limitations. In addition, you can generate proto3 code for Go using the latest Go protoc plugin, which is available from the golang/protobuf GitHub repository.
For now, we recommend using proto3 only in these cases:
1. If you want to try protocol buffers in one of our newly supported languages.
2. If you want to try our latest open source RPC implementation, gRPC (currently in alpha release). We recommend using proto3 for all gRPC servers and clients to avoid compatibility issues.
Note that the APIs of the two language versions are not fully compatible. To avoid inconvenience to existing users, we will continue to maintain the previous version (note: proto2).
Sixth, a bit of history
Protocol buffers were originally developed at Google to handle the request/response protocol of an index server. Before protocol buffers, there was a format that required requests and responses to be encoded and decoded by hand, and that supported several versions of the protocol. This led to some very ugly code, such as:

    if (version == 3) {
      ...
    } else if (version > 4) {
      if (version == 5) {
        ...
      }
      ...
    }

Explicitly formatted protocols also complicated the rollout of new protocol versions, because developers had to make sure that every server between the originator of a request and the server that actually handled it understood the new protocol.
Protocol buffers were designed to solve these problems:
1. New fields can be introduced easily, and intermediate servers can simply parse the data and pass it along without needing to know about all the fields.
2. The format is more self-describing and can be handled from a variety of languages (such as Java, C++, and Python).
3. Serialization and deserialization code is generated automatically, which avoids hand-written parsing.
4. In addition to being used for short-lived RPC requests, protocol buffers came to be used as a handy self-describing format for storing data (for example, in Bigtable).
5. Server RPC interfaces began to be declared as part of protocol files, with the protocol compiler generating stub classes that users can override with the actual implementation of the server's interface.
Protobuf Chinese Tutorial (Part 1)