Protocol Buffer Technology (language specification)

Source: Internet
Author: User

The download address for nanopb, the nano version of Protobuf, is http://koti.kapsi.fi/~jpa/nanopb/download/

The body of this blog series is drawn mainly from the official Protocol Buffer documentation, while the code examples are taken from the demo of a company-internal project that is currently under development. The aim is to keep the good style and structure of Google's documentation while combining it with practical, common use cases, which makes the material convenient both for in-house training and for technical exchange with other readers. Note that the blog is not a line-by-line translation: it includes some summaries drawn from experience, and it leaves out some rarely used features. Interested developers can consult Google's official documentation directly.

First, why use Protocol Buffer.
Before answering this question, let us first describe a system scenario that often comes up in real development. Suppose our client programs are developed in Java and may run on different platforms, such as Linux, Windows, or Android, while our server programs usually run on Linux and are developed in C++. There are several ways to design the message format for data communication between these two programs, such as:
1. Directly transfer byte-aligned C/C++ structures. As long as the structure is declared in a fixed-length format, this approach is very convenient for a C++ program: it only needs to cast the received data to the structure type. Variable-length structures are in fact not much more trouble. When sending data, you only need to define a struct variable, set the value of each member, and send the binary data to the remote end as a char*. For Java developers, by contrast, this approach is very cumbersome. You first have to store the received data in a ByteBuffer, then read each field one by one in the agreed byte order, and then assign each value to the corresponding field of a value object so that the rest of the program logic can be written against it. With this kind of design, both the client and the server must finish writing their message construction and parsing code before other development can proceed, so the approach directly slows down development of the Java program. Even in the debugging phase, you will often run into small errors in the field-by-field assembly code of the Java program.
2. Use SOAP (WebService) as the message format. Messages produced this way are text-based and carry a large amount of XML descriptive information, which greatly increases the network I/O load. Because XML parsing is complex, it also considerably reduces message parsing performance. In short, this design significantly reduces the overall performance of the system.
Protocol Buffer solves the problems of both approaches well. It also has another very important advantage: it keeps messages compatible between old and new versions of the same message definition. The details are given later in this blog.
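To make the first approach concrete, here is a minimal Python sketch of reading a fixed-layout binary structure by hand. The layout (an 8-byte little-endian account ID followed by 16 bytes of zero-padded text) and the names pack_logon and unpack_logon are assumptions made up for illustration; they do not come from the project described in this blog.

```python
import struct

# Hypothetical fixed layout agreed on by both sides:
# little-endian, one 8-byte signed integer, then 16 bytes of zero-padded text.
LOGON_LAYOUT = struct.Struct("<q16s")

def pack_logon(acct_id, passwd):
    """Build the raw bytes a C++ sender would write out as a struct."""
    return LOGON_LAYOUT.pack(acct_id, passwd.encode("ascii"))

def unpack_logon(data):
    """Read every field by hand, in the agreed order and byte order."""
    acct_id, raw_passwd = LOGON_LAYOUT.unpack(data)
    return acct_id, raw_passwd.rstrip(b"\x00").decode("ascii")

wire = pack_logon(10001, "secret")
print(len(wire))           # 24
print(unpack_logon(wire))  # (10001, 'secret')
```

Any change to the layout (a new field, a longer string, a different byte order) has to be repeated by hand on both sides, which is the maintenance burden described above.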

Second, define the first Protocol Buffer message.
Create a file with the .proto extension, such as MyMessage.proto, and save the following content to the file.
message LogonReqMessage {
    required int64 acctID = 1;
    required string passwd = 2;
}
The key points of the above message definition are described here.
1. message is the keyword for defining a message, equivalent to struct/class in C++ or class in Java.
2. LogonReqMessage is the name of the message, equivalent to the name of the struct or class.
3. The required prefix indicates that the field is mandatory and must have a value before the message is serialized or deserialized. Protocol Buffer has two other similar keywords, optional and repeated, and message fields with these two qualifiers are not subject to that restriction. Compared with optional, repeated is mainly used to represent array fields. Their specific usage is shown in the use cases below.
4. int64 and string indicate a long integer field and a string field. Protocol Buffer has a type table that maps Protocol Buffer data types to the types used in other programming languages (C++/Java). The table also notes which types are more efficient in which data scenarios. The table is given later in this article.
5. acctID and passwd are the message field names, equivalent to field names in Java or member variable names in C++.
6. The tag numbers 1 and 2 identify the fields in the serialized binary data. In this example, the encoded data of the passwd field is written after that of acctID. Note that a tag value cannot be repeated within the same message. In addition, fields with tag values from 1 to 15 are encoded more compactly: the tag value and the type information together occupy only one byte. Tags in the range 16 to 2047 occupy two bytes. The largest tag number Protocol Buffer supports is 2 to the power of 29, minus one. For this reason, when designing a message structure, try to give frequently occurring fields, such as repeated fields, tag values between 1 and 15, which saves bytes in the encoded output.
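The one-byte and two-byte ranges for tag numbers follow from how a field key is encoded: the tag number is shifted left by three bits, combined with a three-bit wire type, and written as a base-128 varint. The following Python sketch illustrates this. The helper names encode_varint and encode_key are chosen for this example and are not part of any Protocol Buffer library.

```python
def encode_varint(value):
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low7)
            return bytes(out)

def encode_key(field_number, wire_type):
    """A field key is (field_number << 3) | wire_type, stored as a varint."""
    return encode_varint((field_number << 3) | wire_type)

VARINT = 0  # wire type used by int32, int64, bool and enum fields

for number in (1, 15, 16, 2047, 2048):
    print(number, len(encode_key(number, VARINT)))
# 1 1
# 15 1
# 16 2
# 2047 2
# 2048 3
```

Because three of the first byte's seven payload bits are taken by the wire type, only four bits remain for the tag number, which is why the one-byte range ends at 15.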

Third, define the second (contains enumerated fields) Protocol Buffer message.
When you define a Protocol Buffer message, you can add comments in the same way as in C++/Java code.
enum UserStatus {
    OFFLINE = 0;  // The user is offline.
    ONLINE = 1;   // The user is online.
}
message UserInfo {
    required int64 acctID = 1;
    required string name = 2;
    required UserStatus status = 3;
}
The key points of the above message definition are described here (only those not covered in the previous section).
1. enum is the keyword for defining an enumeration type, equivalent to enum in C++/Java.
2. UserStatus is the name of the enumeration.
3. Unlike enumerations in C++/Java, the delimiter between enumeration values is a semicolon, not a comma.
4. OFFLINE/ONLINE are the enumeration values.
5. 0 and 1 are the actual integer values of the enumeration values. As in C++, you can assign any integer value (within the 32-bit integer range) to an enumeration value; the values do not have to start from 0. For example:
enum OperationCode {
    LOGON_REQ_CODE = 101;
    LOGOUT_REQ_CODE = 102;
    RETRIEVE_BUDDIES_REQ_CODE = 103;

    LOGON_RESP_CODE = 1001;
    LOGOUT_RESP_CODE = 1002;
    RETRIEVE_BUDDIES_RESP_CODE = 1003;
}
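As a rough analogy only, the operation-code enumeration above can be mirrored in Python with IntEnum to show that the numeric values need not start at 0 or be contiguous. This is not the code that protoc generates, and the describe helper is a hypothetical name used just for this sketch.

```python
from enum import IntEnum

class OperationCode(IntEnum):
    # The values are arbitrary integers; only the numbers travel on the wire.
    LOGON_REQ_CODE = 101
    LOGOUT_REQ_CODE = 102
    RETRIEVE_BUDDIES_REQ_CODE = 103
    LOGON_RESP_CODE = 1001
    LOGOUT_RESP_CODE = 1002
    RETRIEVE_BUDDIES_RESP_CODE = 1003

def describe(code_on_wire):
    """Map a received integer back to its symbolic name, if there is one."""
    try:
        return OperationCode(code_on_wire).name
    except ValueError:
        return "UNKNOWN(%d)" % code_on_wire

print(describe(1002))  # LOGOUT_RESP_CODE
print(describe(7))     # UNKNOWN(7)
```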

Fourth, define the third (contains nested message fields) Protocol Buffer message.
We can define multiple messages in the same .proto file, which makes it easy to define a nested message. For example:
enum UserStatus {
    OFFLINE = 0;
    ONLINE = 1;
}
message UserInfo {
    required int64 acctID = 1;
    required string name = 2;
    required UserStatus status = 3;
}
message LogonRespMessage {
    // LoginResult is not defined in this snippet; it must be defined elsewhere.
    required LoginResult logonResult = 1;
    required UserInfo userInfo = 2;
}
The key points of the above message definition are described here (only those not covered in the previous two sections).
1. The definition of LogonRespMessage uses another message type as the type of one of its fields, namely UserInfo userInfo.
2. In the example above, UserInfo and LogonRespMessage are defined in the same .proto file. Can we also use a message defined in another .proto file? Protocol Buffer provides the import keyword for this purpose, so we can define many common messages in one .proto file, and other message definition files can use the messages defined in that file by importing it, such as:
import "myproject/CommonMessages.proto";
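On the wire, a nested message is simply the encoded bytes of the inner message, stored as a length-delimited field of the outer message. The Python sketch below encodes the user-info and logon-response messages from this section by hand to show this. The helper names are invented for the example, the field values are arbitrary, and the login result value 0 is an assumption, since that type is not defined in this blog.

```python
def encode_varint(value):
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)
        else:
            out.append(low7)
            return bytes(out)

def varint_field(number, value):
    """Wire type 0: key followed by the value as a varint."""
    return encode_varint((number << 3) | 0) + encode_varint(value)

def bytes_field(number, payload):
    """Wire type 2: key, payload length, then the payload itself."""
    return encode_varint((number << 3) | 2) + encode_varint(len(payload)) + payload

ONLINE = 1

# UserInfo { acctID = 1, name = "ann", status = ONLINE }
user_info = (varint_field(1, 1)
             + bytes_field(2, "ann".encode("utf-8"))
             + varint_field(3, ONLINE))

# LogonRespMessage { logonResult = 0 (assumed value), userInfo = <above> }
logon_resp = varint_field(1, 0) + bytes_field(2, user_info)

print(user_info.hex())   # 08011203616e6e1801
print(logon_resp.hex())  # 0800120908011203616e6e1801
```

The nested message appears unchanged inside the outer one, preceded only by its key (0x12) and its length (0x09).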

Fifth, the basic rules of the qualifiers (required/optional/repeated).
1. A well-formed message must contain exactly one value for each of its required fields.
2. Each message can contain 0 or more fields of type optional.
3. A field declared as repeated can contain 0 or more values. Note that this differs from a built-in C++ array, which must contain at least one element.
4. If you intend to add a new field to an existing message and still have older versions of the program read and write the message correctly, the newly added field must be optional or repeated. The reason is simple: the old version of the program knows nothing about the new field, so it can neither read it nor write it, and a message written by the old program would be missing a field that the new program treats as required.

Sixth, the type table.

.proto Type | Notes | C++ Type | Java Type
double | | double | double
float | | float | float
int32 | Uses variable-length encoding. Inefficient for encoding negative numbers; if your field is likely to have negative values, use sint32 instead. | int32 | int
int64 | Uses variable-length encoding. Inefficient for encoding negative numbers; if your field is likely to have negative values, use sint64 instead. | int64 | long
uint32 | Uses variable-length encoding. | uint32 | int
uint64 | Uses variable-length encoding. | uint64 | long
sint32 | Uses variable-length encoding. Signed int value. These encode negative numbers more efficiently than regular int32s. | int32 | int
sint64 | Uses variable-length encoding. Signed int value. These encode negative numbers more efficiently than regular int64s. | int64 | long
fixed32 | Always four bytes. More efficient than uint32 if values are often greater than 2^28. | uint32 | int
fixed64 | Always eight bytes. More efficient than uint64 if values are often greater than 2^56. | uint64 | long
sfixed32 | Always four bytes. | int32 | int
sfixed64 | Always eight bytes. | int64 | long
bool | | bool | boolean
string | A string must always contain UTF-8 encoded or 7-bit ASCII text. | string | String
bytes | May contain any arbitrary sequence of bytes. | string | ByteString
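The efficiency notes in the type table can be checked with a few lines of Python. The sketch below assumes the standard varint and ZigZag encodings; the function names are made up for this example. It compares the encoded size of a negative number as int32 and as sint32, and shows where a varint starts to need more than the four bytes that fixed32 always uses.

```python
def encode_varint(value):
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)
        else:
            out.append(low7)
            return bytes(out)

def int32_size(value):
    """A negative int32 is sign-extended to 64 bits before varint encoding."""
    return len(encode_varint(value & 0xFFFFFFFFFFFFFFFF))

def zigzag32(value):
    """Map signed to unsigned so small magnitudes stay small: 0,-1,1,-2 -> 0,1,2,3."""
    return ((value << 1) ^ (value >> 31)) & 0xFFFFFFFF

def sint32_size(value):
    return len(encode_varint(zigzag32(value)))

print(int32_size(-1), sint32_size(-1))   # 10 1
print(len(encode_varint(2**28 - 1)))     # 4
print(len(encode_varint(2**28)))         # 5
```

From 2^28 upward a varint needs five bytes, one more than fixed32, which is where the threshold in the table comes from.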


Seventh, Protocol Buffer message upgrade principles.
In real development, there are cases where a message format has to be upgraded because requirements change, while some applications that use the original message format cannot be upgraded right away. We therefore need to follow certain rules when upgrading a message format, so that old and new programs, each based on its own version of the format, can run at the same time. The rules are as follows:
1. Do not change the tag number of any existing field.
2. Any newly added field must use the optional or repeated qualifier. Otherwise, message compatibility cannot be guaranteed when new and old programs pass messages to each other.
3. An existing required field cannot be removed from the original message. Fields of type optional and repeated can be removed, but the tag numbers they used must be kept reserved and cannot be reused by new fields.
4. int32, uint32, int64, uint64 and bool are compatible with each other; sint32 and sint64 are compatible; string and bytes are compatible; fixed32 and sfixed32 are compatible; and fixed64 and sfixed64 are compatible. This means that if you want to change the type of an existing field, you can only change it to a type compatible with the original one. Otherwise you will break compatibility between the old and new message formats.
5. The optional and repeated qualifiers are also compatible with each other.
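The reason new optional fields are safe is that a parser can skip any tag number it does not recognize. The following Python sketch shows the idea for the two wire types used in this blog (varint and length-delimited). It is a simplified illustration, not the parser of any Protocol Buffer library, and the third field (a hypothetical clientVersion added in a newer version of the logon request) is an assumption made up for the example.

```python
def decode_varint(data, pos):
    """Decode one varint starting at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7

def parse_known(data, known):
    """Return {tag_number: value} for known tags and skip all others."""
    fields = {}
    pos = 0
    while pos < len(data):
        key, pos = decode_varint(data, pos)
        number, wire_type = key >> 3, key & 0x07
        if wire_type == 0:       # varint
            value, pos = decode_varint(data, pos)
        elif wire_type == 2:     # length-delimited
            length, pos = decode_varint(data, pos)
            value = data[pos:pos + length]
            pos += length
        else:
            raise ValueError("wire type %d is not handled in this sketch" % wire_type)
        if number in known:
            fields[number] = value
    return fields

# New-format message: acctID = 7, passwd = "pw", clientVersion = 5 (hypothetical).
new_message = bytes.fromhex("0807120270771805")
# Old-format message: the same without field 3.
old_message = bytes.fromhex("080712027077")

# The old program only knows tags 1 and 2 and silently skips tag 3.
print(parse_known(new_message, {1, 2}))     # {1: 7, 2: b'pw'}
# The new program reads an old message and falls back to a default for tag 3.
print(parse_known(old_message, {1, 2, 3}).get(3, 0))  # 0
```

If field 3 were required instead, the new program would have to reject every message written by the old program.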

Eighth, packages.
We can define a package name in the .proto file, such as:
package ourproject.lyphone;
In the generated C++ files, the package name is turned into namespaces, that is, namespace ourproject { namespace lyphone { ... } }. In the generated Java files, it becomes the Java package name.

Ninth, options.
Protocol Buffer allows us to set some commonly used options in the .proto file, which tell the Protocol Buffer compiler how to generate code that better fits the target language. The built-in options fall into the following three levels:
1. File level: the option affects all messages and enumerations defined in the current file.
2. Message level: the option affects only one message and all the fields it contains.
3. Field level: the option affects only the field it is attached to.
Some commonly used Protocol Buffer options are given below.
1. option java_package = "com.companyname.projectname";
java_package is a file-level option that specifies the package name of the generated Java code, such as com.companyname.projectname in the example above. The generated Java files are also placed automatically in the com/companyname/projectname subdirectory of the specified output directory. If this option is not specified, the Java package name is the name given by the package keyword. This option has no effect on generated C++ code.
2. option java_outer_classname = "LYPhoneMessage";
java_outer_classname is a file-level option whose main purpose is to explicitly specify the outer class name of the generated Java code. If this option is not specified, the outer class name is derived from the name of the current file, converted to camel case. For example, for my_project.proto the default outer class name is MyProject. This option has no effect on generated C++ code.
Note: this option exists mainly because Java requires that one .java file contain only one top-level public class or interface, while C++ has no such limitation. The messages defined in the .proto file therefore become inner classes of the specified outer class, so that all of them can be generated into the same Java file. In practice, to avoid typing the outer class qualifier every time, you can statically import the outer class into the current Java file, such as import static com.company.project.LYPhoneMessage.*.
3. option optimize_for = LITE_RUNTIME;
optimize_for is a file-level option. Protocol Buffer defines three optimization levels: SPEED, CODE_SIZE and LITE_RUNTIME. The default is SPEED.
SPEED: the generated code runs efficiently, but takes up more space once compiled.
CODE_SIZE: the opposite of SPEED. The code runs less efficiently, but the generated code takes up less space. It is typically used on platforms with limited resources, such as mobile devices.
LITE_RUNTIME: the generated code runs efficiently and also takes up very little space once compiled. This comes at the expense of the reflection features provided by Protocol Buffer. When linking Protocol Buffer in C++, we therefore only need to link libprotobuf-lite rather than libprotobuf. In Java, we only need to include protobuf-java-2.4.1-lite.jar rather than protobuf-java-2.4.1.jar.
Note: with the LITE_RUNTIME option, the generated classes inherit from MessageLite rather than Message.
4. [packed = true]: For historical reasons, repeated fields of numeric types such as int32 and int64 are not encoded as efficiently as they could be. In newer versions of Protocol Buffer, you can add the [packed=true] field option to tell Protocol Buffer to encode such a field more efficiently. For example:
repeated int32 samples = 4 [packed=true];
Note: this option is available only in Protocol Buffer 2.3.0 and above.
5. [default = default_value]: If an optional field was not set when the message was serialized, or does not exist at all in an older version of the message, then when the message is deserialized the field is given a type-dependent default value. For example, a bool is set to false and an int32 is set to 0. Protocol Buffer also supports custom default values, such as:
optional int32 result_per_page = 3 [default = 10];
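To illustrate what the packed option listed above changes on the wire, the Python sketch below encodes a repeated int32 field with tag number 4 both ways. Without packing, every element carries its own key. With packing, the elements share one key and one length prefix. The helper names are invented for this example, and the sketch handles non-negative values only.

```python
def encode_varint(value):
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)
        else:
            out.append(low7)
            return bytes(out)

def encode_unpacked(number, values):
    """One varint key (wire type 0) in front of every element."""
    key = encode_varint((number << 3) | 0)
    return b"".join(key + encode_varint(v) for v in values)

def encode_packed(number, values):
    """One length-delimited key (wire type 2), then all elements back to back."""
    payload = b"".join(encode_varint(v) for v in values)
    return encode_varint((number << 3) | 2) + encode_varint(len(payload)) + payload

samples = [1, 2, 3]
print(encode_unpacked(4, samples).hex())  # 200120022003
print(encode_packed(4, samples).hex())    # 2203010203
```

The saving grows with the number of elements, because the unpacked form repeats the key once per element.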

Tenth, the command-line compilation tool.
protoc --proto_path=IMPORT_PATH --cpp_out=DST_DIR --java_out=DST_DIR --python_out=DST_DIR path/to/file.proto
The parameters of the above command are explained here.
1. protoc is the command-line compilation tool provided by Protocol Buffer.
2. --proto_path is equivalent to the -I option. It specifies a directory in which to look for .proto files, including imported ones, and it can be given multiple times in the same command.
3. The --cpp_out option generates C++ code, --java_out generates Java code, and --python_out generates Python code. The value of each option is the directory in which the generated code is stored.
4. path/to/file.proto is the message definition file to be compiled.
Note: for C++, the Protocol Buffer compiler generates a pair of .h and .cc files for each .proto file. The generated files can be added directly to the project of the application. For example, the files generated from MyMessage.proto are MyMessage.pb.h and MyMessage.pb.cc.
