Protocol Buffers Official Documentation (proto3 Language Guide)

Last Update:2018-05-25 Source: Internet

Author: User



This guide describes how to use the protocol buffer language to structure your protocol buffer data, including the syntax of .proto files and how to generate data access classes from them.

Defining a Message Type (defining a message)



syntax = "proto3";

message SearchRequest {
  string query = 1;
  int32 page_number = 2;
  int32 result_per_page = 3;
}

Only empty lines or comments may appear before the syntax declaration.
Each field definition consists of four parts: an optional field rule, a field type, a field name, and a field number.

Specifying Field Types (specifying field types)

In the example above, the message defines three fields: two of type int32 and one of type string. Fields can also use composite types such as enumerations and other message types.

Assigning Tags (assigning field numbers)

Each field in a message has a unique number that identifies it in the binary encoding. Field numbers 1 to 15 take one byte to encode (the field number together with the field's wire type), and numbers 16 to 2047 take two bytes, so 1 to 15 should be reserved for frequently used fields.
The smallest field number you can specify is 1, and the largest is 2^29 - 1, or 536,870,911. The numbers 19000 to 19999 cannot be used because they are reserved for the protocol buffers implementation.
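
The one-byte and two-byte ranges follow from how a field's tag is written: the field number is shifted left by three bits, combined with a 3-bit wire type, and encoded as a varint. A minimal Python sketch of that calculation follows; the function names are illustrative and not part of any protobuf library.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    if value < 0:
        raise ValueError("varint input must be non-negative")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(low7)
            return bytes(out)


def tag_size(field_number: int, wire_type: int = 0) -> int:
    """Number of bytes the tag (field number plus wire type) takes on the wire."""
    if not 1 <= field_number <= 2**29 - 1:
        raise ValueError("field number out of range")
    if 19000 <= field_number <= 19999:
        raise ValueError("19000 to 19999 is reserved for the implementation")
    return len(encode_varint((field_number << 3) | wire_type))


if __name__ == "__main__":
    for number in (1, 15, 16, 2047, 2048):
        print(number, tag_size(number))
```

Only four of the seven payload bits in the first byte are left for the field number, which is why the one-byte range ends at 15.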

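Specifying Field Rules (specifying field rules)

singular: the message can have zero or one of this field. This is the default in proto3 when no rule is written.
repeated: the field can appear any number of times, including zero (a variable-length field). The order of the values is preserved.
The required rule from proto2 is not available in proto3, and the proto2-style optional rule is not needed because singular fields are optional by default.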

Adding More Message Types (adding more message types)

A single .proto file can define multiple message types:

message SearchRequest {
  string query = 1;
  int32 page_number = 2;
  int32 result_per_page = 3;
}

message SearchResponse {
 ...
}

Adding Comments (adding comments)

.proto files use C/C++-style // and /* ... */ comment syntax:

message SearchRequest {
  string query = 1;
  int32 page_number = 2;  // Which page number do we want?
  int32 result_per_page = 3;  // Number of results to return per page.
}

Reserved Fields (reserved fields)

If a field is deleted or commented out, a later change may reuse its field number. When old versions of the same .proto are still in use, that reuse can cause serious problems such as data corruption and privacy bugs. To prevent this, mark the field numbers (and, if needed, the names) of deleted fields as reserved. The protocol buffer compiler then complains if anyone tries to use them. Field names and field numbers cannot be mixed in the same reserved statement.



message Foo {
  reserved 2, 15, 9 to 11;
  reserved "foo", "bar";
}
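
The check that reserved enables can be pictured with a small Python sketch. It hard-codes the reserved numbers and names from message Foo above; reserved_conflicts is a hypothetical helper for illustration, not how the protocol buffer compiler is implemented.

```python
# Mirrors: reserved 2, 15, 9 to 11;  (each entry is an inclusive range)
RESERVED_NUMBERS = [(2, 2), (15, 15), (9, 11)]
# Mirrors: reserved "foo", "bar";
RESERVED_NAMES = {"foo", "bar"}


def reserved_conflicts(fields):
    """Return a list of conflicts for an iterable of (name, number) pairs."""
    problems = []
    for name, number in fields:
        if name in RESERVED_NAMES:
            problems.append(f'field name "{name}" is reserved')
        if any(low <= number <= high for low, high in RESERVED_NUMBERS):
            problems.append(f"field number {number} is reserved")
    return problems


if __name__ == "__main__":
    print(reserved_conflicts([("foo", 10), ("baz", 3)]))
```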

What's Generated from Your .proto? (compiling the .proto file)

For C++, the compiler generates a .h and a .cc file from each .proto file, with a class for each message type defined in it.

Scalar Value Types (type comparison table)

.proto Type	Notes	C++ Type
double		double
float		float
int32	Uses variable-length encoding. Inefficient for encoding negative numbers; if your field is likely to have negative values, use sint32 instead.	int32
int64	Uses variable-length encoding. Inefficient for encoding negative numbers; if your field is likely to have negative values, use sint64 instead.	int64
uint32	Uses variable-length encoding.	uint32
uint64	Uses variable-length encoding.	uint64
sint32	Uses variable-length encoding. Signed int value. These encode negative numbers more efficiently than regular int32s.	int32
sint64	Uses variable-length encoding. Signed int value. These encode negative numbers more efficiently than regular int64s.	int64
fixed32	Always four bytes. More efficient than uint32 if values are often greater than 2^28.	uint32
fixed64	Always eight bytes. More efficient than uint64 if values are often greater than 2^56.	uint64
sfixed32	Always four bytes.	int32
sfixed64	Always eight bytes.	int64
bool		bool
string	A string must always contain UTF-8 encoded or 7-bit ASCII text.	string
bytes	May contain any arbitrary sequence of bytes.	string
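
The notes on negative numbers and on the 2^28 threshold can be checked with a short Python sketch of varint and ZigZag encoding. The helper names are illustrative; the sketch models the wire format and does not use a protobuf library.

```python
def encode_varint(value: int) -> bytes:
    """Encode an integer as a base-128 varint, treating it as unsigned 64-bit."""
    value &= 0xFFFFFFFFFFFFFFFF  # negative int32/int64 values are sign-extended to 64 bits
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(low7)
            return bytes(out)


def zigzag32(n: int) -> int:
    """Map a signed 32-bit integer to unsigned: 0, -1, 1, -2 become 0, 1, 2, 3."""
    return ((n << 1) ^ (n >> 31)) & 0xFFFFFFFF


def size_as_int32(n: int) -> int:
    return len(encode_varint(n))


def size_as_sint32(n: int) -> int:
    return len(encode_varint(zigzag32(n)))


if __name__ == "__main__":
    for n in (1, -1, -1000):
        print(n, size_as_int32(n), size_as_sint32(n))
```

A negative int32 always costs ten bytes because of the sign extension, while sint32 keeps small magnitudes small. A uint32 of 2^28 or more needs five varint bytes, one more than the four bytes fixed32 always uses.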

Default Values (default values)

When a message is parsed and the encoded data does not contain a particular field, that field is set to the default value for its type: the empty string for string, empty bytes for bytes, false for bool, 0 for numeric types, and the first defined element (which must be 0) for enum. The default for a repeated field is empty (an empty list).

Enumerations (enumeration)

message SearchRequest {
  string query = 1;
  int32 page_number = 2;
  int32 result_per_page = 3;
  enum Corpus {
    UNIVERSAL = 0;
    WEB = 1;
    IMAGES = 2;
    LOCAL = 3;
    NEWS = 4;
    PRODUCTS = 5;
    VIDEO = 6;
  }
  Corpus corpus = 4;
}

Setting the allow_alias option to true lets you define aliases in an enumeration, that is, two enum constants with the same value:

enum EnumAllowingAlias {
  option allow_alias = true;
  UNKNOWN = 0;
  STARTED = 1;
  RUNNING = 1;
}
enum EnumNotAllowingAlias {
  UNKNOWN = 0;
  STARTED = 1;
  // RUNNING = 1;  // Uncommenting this line will cause a compile error inside Google and a warning message outside.
}

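Because enum values are encoded as varints, negative values are inefficient and therefore not recommended. An enum can be defined inside a message, as above, or at the top level of a .proto file, and it can be reused in other message definitions. An enum declared inside a message is referenced from elsewhere as MessageType.EnumType.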

Using Other Message Types (using other message types)

You can use the definition of one message as the field type of another message.

message SearchResponse {
  repeated Result results = 1;
}

message Result {
  string url = 1;
  string title = 2;
  repeated string snippets = 3;
}

Importing Definitions (importing definitions)

Much like including a C++ header file, you can import definitions from other .proto files:


import "myproject/other_protos.proto";

If you want to move a .proto file but do not want to update every import statement in the project, you can leave a placeholder .proto file at the original location and use import public in it to forward all imports to the new location:



// new.proto
// All definitions are moved here



// old.proto
// This is the proto that all clients are importing.
import public "new.proto";
import "other.proto";



// client.proto
import "old.proto";
// You use definitions from old.proto and new.proto, but not other.proto

Nested Types (nested types)

You can define message types inside other message types, as in the following example:



message SearchResponse {
  message Result {
    string url = 1;
    string title = 2;
    repeated string snippets = 3;
  }
  repeated Result results = 1;
}

To use the nested Result type outside its parent message, refer to it as Parent.Type:



message SomeOtherMessage {
  SearchResponse.Result result = 1;
}

Messages can be nested as deeply as you like, but deep nesting is not recommended because it makes the structure harder to read.

message Outer {                  // Level 0
  message MiddleAA {  // Level 1
    message Inner {   // Level 2
      int64 ival = 1;
      bool  booly = 2;
    }
  }
  message MiddleBB {  // Level 1
    message Inner {   // Level 2
      int32 ival = 1;
      bool  booly = 2;
    }
  }
}

Updating a Message Type (updating a message type)

In practice, a message format often has to change to meet new requirements while some applications that use the old format cannot be upgraded right away. Following the rules below keeps old and new programs interoperable while both message formats are in use:

Do not change the field number of any existing field.
New fields can be added. Old code ignores fields it does not recognize when parsing new messages, and new code sees default values for the new fields when parsing old messages.
Fields can be removed, as long as the field number is never reused. Mark the number as reserved so that later changes cannot reuse it by accident.
int32, uint32, int64, uint64, and bool are all compatible with each other. sint32 and sint64 are compatible with each other, but not with the other integer types. string and bytes are compatible as long as the bytes are valid UTF-8. fixed32 is compatible with sfixed32, and fixed64 with sfixed64. A field's type can therefore be changed only to a compatible type; any other change breaks compatibility between the old and new message formats.
For string, bytes, and message fields, singular is compatible with repeated. This is generally not safe for numeric types, because repeated numeric fields are serialized in packed format by default in proto3.
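
Adding a field stays compatible because every field on the wire starts with a tag that carries its number and wire type, so a reader can skip fields it does not know. The Python sketch below is a simplified model of that idea, not a real protobuf parser: parse_known_fields is a hypothetical helper, it returns raw values only, and the sample bytes assume a SearchRequest that has gained a new varint field numbered 4.

```python
def read_varint(buf: bytes, pos: int):
    """Decode a varint starting at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def parse_known_fields(buf: bytes, known: set) -> dict:
    """Return {field_number: raw_value} for known fields, skipping the rest."""
    fields = {}
    pos = 0
    while pos < len(buf):
        tag, pos = read_varint(buf, pos)
        number, wire_type = tag >> 3, tag & 0x7
        if wire_type == 0:        # varint
            value, pos = read_varint(buf, pos)
        elif wire_type == 1:      # fixed 64-bit
            value, pos = buf[pos:pos + 8], pos + 8
        elif wire_type == 2:      # length-delimited (string, bytes, messages)
            length, pos = read_varint(buf, pos)
            value, pos = buf[pos:pos + length], pos + length
        elif wire_type == 5:      # fixed 32-bit
            value, pos = buf[pos:pos + 4], pos + 4
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
        if number in known:
            fields[number] = value  # for singular fields the last value wins
    return fields


# query = "hi" (field 1), page_number = 2 (field 2), assumed new field 4 = 1
data = b"\x0a\x02hi\x10\x02\x20\x01"

if __name__ == "__main__":
    print(parse_known_fields(data, {1, 2, 3}))     # old reader
    print(parse_known_fields(data, {1, 2, 3, 4}))  # new reader
```

Field 3 (result_per_page) is absent from the bytes, so both readers would fall back to its default value of 0.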

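Any (arbitrary message type)

The Any message type lets you embed a message in another message without having its .proto definition. An Any holds an arbitrary serialized message as bytes, together with a URL that identifies the type of that message. To use Any, import google/protobuf/any.proto.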

import  "google/protobuf/any.proto";

message ErrorStatus  {
  string message =  1;
  repeated google.protobuf.Any details =  2;
}

In C++, the PackFrom() and UnpackTo() methods pack a message into an Any and unpack it again.



// Storing an arbitrary message type in Any.
NetworkErrorDetails details = ...;
ErrorStatus status;
status.add_details()->PackFrom(details);

// Reading an arbitrary message from Any.
ErrorStatus status = ...;
for (const Any& detail : status.details()) {
  if (detail.Is<NetworkErrorDetails>()) {
    NetworkErrorDetails network_error;
    detail.UnpackTo(&network_error);
    ... processing network_error ...
  }
}
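
Conceptually, an Any is just a type URL plus the serialized bytes of the wrapped message. The Python sketch below models packing and the type check under that assumption. AnyValue, pack, is_type, and unpack are illustrative names rather than the protobuf Python API, and the example package name is made up.

```python
from dataclasses import dataclass

# Default type URL prefix used by protobuf runtimes when packing a message.
TYPE_URL_PREFIX = "type.googleapis.com/"


@dataclass
class AnyValue:
    """Illustrative stand-in for google.protobuf.Any."""
    type_url: str
    value: bytes


def pack(full_name: str, serialized: bytes) -> AnyValue:
    """Wrap already-serialized message bytes together with their type URL."""
    return AnyValue(TYPE_URL_PREFIX + full_name, serialized)


def is_type(any_value: AnyValue, full_name: str) -> bool:
    """Only the part after the last '/' identifies the message type."""
    return any_value.type_url.rsplit("/", 1)[-1] == full_name


def unpack(any_value: AnyValue, full_name: str) -> bytes:
    """Return the payload bytes if the Any holds the requested type."""
    if not is_type(any_value, full_name):
        raise TypeError(f"Any holds {any_value.type_url}, not {full_name}")
    return any_value.value


if __name__ == "__main__":
    detail = pack("example.NetworkErrorDetails", b"\x08\x01")  # hypothetical type
    print(detail.type_url, is_type(detail, "example.NetworkErrorDetails"))
```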

Oneof (one of several fields)

A oneof is somewhat like a union in C++: of the fields declared in it, at most one can be set at any time. Depending on the language, a generated case() or WhichOneof() method tells you which field, if any, is currently set.

Using oneof (with oneof)



message SampleMessage  {
  oneof test_oneof {
    string name =  4;
    SubMessage sub_message =  9;
  }
}

You can add fields of any type to a oneof definition, except repeated fields and map fields.

Oneof Features (oneof characteristics)

Setting a oneof field automatically clears all other members of the oneof, so only the most recently set field keeps its value:


SampleMessage message;
message.set_name("name");
CHECK(message.has_name());
message.mutable_sub_message();   // Will clear name field.
CHECK(!message.has_name());

A oneof cannot be repeated.
The reflection APIs work for oneof fields.

In C++, take care to avoid memory errors. Because setting a oneof field clears the previously set member, the object that member owned is destroyed, and any pointer to it is left dangling:


SampleMessage message;
SubMessage* sub_message = message.mutable_sub_message();
message.set_name("name");      // Will delete sub_message
sub_message->set_...            // Crashes here

If you exchange two messages with oneofs using the C++ Swap() method, each message ends up with the other's oneof case. In the example below, msg1 holds sub_message and msg2 holds name afterwards:


SampleMessage msg1;
msg1.set_name("name");
SampleMessage msg2;
msg2.mutable_sub_message();
msg1.Swap(&msg2);
CHECK(msg1.has_sub_message());
CHECK(msg2.has_name());
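
The last-set-wins behaviour described in this section can be modelled in a few lines of Python. This is an illustrative class that mimics SampleMessage and its test_oneof members; it is not generated protobuf code, and the method names are assumptions made for the sketch.

```python
class SampleMessage:
    """Toy model of a message with: oneof test_oneof { name; sub_message; }"""

    _ONEOF_FIELDS = ("name", "sub_message")

    def __init__(self):
        self._case = None   # name of the member currently set, or None
        self._value = None

    def set(self, field, value):
        if field not in self._ONEOF_FIELDS:
            raise KeyError(field)
        # Storing a single (case, value) pair means any earlier member is dropped.
        self._case, self._value = field, value

    def which_oneof(self):
        return self._case

    def has(self, field):
        return self._case == field


if __name__ == "__main__":
    message = SampleMessage()
    message.set("name", "name")
    message.set("sub_message", {})   # clears name
    print(message.which_oneof(), message.has("name"))
```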

Backwards-Compatibility Issues (backwards compatibility)

Take care when adding or removing oneof fields. If checking the value of a oneof returns None/NOT_SET, it could mean either that the oneof has not been set or that it has been set to a field in a different version of the oneof. The two cases cannot be told apart, because there is no way to know whether an unknown field on the wire is a member of the oneof.

Tag Reuse Issues (field number reuse)

Moving fields into or out of a oneof: some information may be lost (some fields will be cleared) after the message is serialized and parsed.
Deleting a oneof field and adding it back: this may clear the currently set oneof field after the message is serialized and parsed.
Splitting or merging oneofs: this has issues similar to moving regular fields.

Maps (map fields)

Protocol buffers provides a shorthand syntax for defining an associative map as part of a message:


map<key_type, value_type> map_field = N;

key_type can be any integral or string type, that is, any scalar type except the floating-point types and bytes. value_type can be any type except another map. For example:


map<string,  Project> projects =  3;

Map fields cannot be repeated.
The wire-format ordering and the iteration ordering of map values are undefined, so you cannot rely on map items being in any particular order.
When text format is generated for a .proto, maps are sorted by key. Numeric keys are sorted numerically.
When parsing from the wire or when merging, the last key seen is used if there are duplicate map keys. When parsing a map from text format, parsing may fail if there are duplicate keys.

Backwards Compatibility (backwards compatibility)

The map syntax is equivalent on the wire to the following definition, so protocol buffers implementations that do not support maps can still handle the data:



message MapFieldEntry  {
  key_type key =  1;
  value_type value =  2;
}
repeated MapFieldEntry map_field = N;
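
Because a map travels as repeated key/value entries, the rules for duplicate keys amount to how those entries are folded into a dictionary. A minimal Python sketch, assuming the entries have already been decoded into (key, value) pairs; entries_to_map and its strict flag are hypothetical and stand in for wire parsing and for a text-format parser that rejects duplicates.

```python
def entries_to_map(entries, strict=False):
    """Fold repeated (key, value) entries into a dict.

    strict=False models parsing from the wire: the last entry for a key wins.
    strict=True models a parser that rejects duplicate keys.
    """
    result = {}
    for key, value in entries:
        if strict and key in result:
            raise ValueError(f"duplicate map key: {key!r}")
        result[key] = value
    return result


if __name__ == "__main__":
    entries = [("a", 1), ("b", 2), ("a", 3)]
    print(entries_to_map(entries))
```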

Package

A package declaration works like a C++ namespace and prevents name clashes between message types:

package foo.bar;
message Open { ... }

You can then use the package name as a qualifier when defining fields of that message type:



message Foo {
  ...
  foo.bar.Open open = 1;
  ...
}

Defining services

To use your message types with an RPC (Remote Procedure Call) system, define an RPC service interface in the .proto file. The protocol buffer compiler then generates the service interface code and stubs in your chosen language.



service SearchService {
  rpc Search (SearchRequest) returns (SearchResponse);
}

JSON mapping

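Proto3 supports a canonical encoding in JSON. If a value is missing from the JSON-encoded data or is null, it is interpreted as the appropriate default value when parsed into a protocol buffer. Conversely, a field that holds its default value is omitted from the JSON output by default, which saves space.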

proto3	JSON	JSON example	Notes
message	object	{"fooBar": v, "g": null, ...}	Generates JSON objects. Message field names are mapped to lowerCamelCase and become JSON object keys. null is accepted and treated as the default value of the corresponding field type.
enum	string	"FOO_BAR"	The name of the enum value as specified in proto is used.
map<K,V>	object	{"k": v, ...}	All keys are converted to strings.
repeated V	array	[v, ...]	null is accepted as the empty list [].
bool	true, false	true, false
string	string	"Hello World!"
bytes	base64 string	"YWJjMTIzIT8kKiYoKSctPUB+"
int32, fixed32, uint32	number	1, -10, 0	JSON value will be a decimal number. Either numbers or strings are accepted.
int64, fixed64, uint64	string	"1", "-10"	JSON value will be a decimal string. Either numbers or strings are accepted.
float, double	number	1.1, -10.0, 0, "NaN", "Infinity"	JSON value will be a number or one of the special string values "NaN", "Infinity", and "-Infinity". Either numbers or strings are accepted. Exponent notation is also accepted.
Any	object	{"@type": "url", "f": v, ...}	If the Any contains a value that has a special JSON mapping, it will be converted as follows: {"@type": xxx, "value": yyy}. Otherwise, the value will be converted into a JSON object, and the "@type" field will be inserted to indicate the actual data type.
Timestamp	string	"1972-01-01T10:00:20.021Z"	Uses RFC 3339, where generated output will always be Z-normalized and uses 0, 3, 6 or 9 fractional digits.
Duration	string	"1.000340012s", "1s"	Generated output always contains 0, 3, 6, or 9 fractional digits, depending on required precision. Any number of fractional digits (also none) is accepted as long as it fits into nanosecond precision.
Struct	object	{ ... }	Any JSON object. See struct.proto.
Wrapper types	various types	2, "2", "foo", true, "true", null, 0, ...	Wrappers use the same representation in JSON as the wrapped primitive type, except that null is allowed and preserved during data conversion and transfer.
FieldMask	string	"f.fooBar,h"	See field_mask.proto.
ListValue	array	[foo, bar, ...]
Value	value		Any JSON value
NullValue	null		JSON null
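
A few rows of the table can be reproduced with the Python standard library alone. The sketch below is a simplified illustration of the mapping for field names, 64-bit integers, and bytes. The helper names and the total_hits field are made up for the example, and a real application would normally rely on the JSON support in its protobuf runtime instead.

```python
import base64
import json


def to_lower_camel(field_name: str) -> str:
    """Convert a snake_case proto field name to a lowerCamelCase JSON key."""
    head, *rest = field_name.split("_")
    return head + "".join(part[:1].upper() + part[1:] for part in rest)


def to_json_value(proto_type: str, value):
    """Map a few scalar proto3 types to JSON-compatible Python values."""
    if proto_type in ("int64", "uint64", "fixed64"):
        return str(value)  # 64-bit integers become decimal strings
    if proto_type == "bytes":
        return base64.b64encode(value).decode("ascii")
    return value  # int32, bool, string and similar types map directly


def message_to_json(fields) -> str:
    """Serialize an iterable of (name, proto_type, value) triples to JSON."""
    return json.dumps(
        {to_lower_camel(name): to_json_value(ptype, value)
         for name, ptype, value in fields}
    )


if __name__ == "__main__":
    print(message_to_json([
        ("page_number", "int32", 2),
        ("total_hits", "int64", 10),  # hypothetical field, for illustration
    ]))
```

Writing 64-bit integers as strings avoids precision loss in JSON consumers that store every number as a double.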

Options

Protocol buffers lets you annotate declarations in a .proto file with options. Options do not change the overall meaning of a declaration, but they can tell the protocol buffer compiler to generate code better suited to the target language. The built-in options apply at the following three levels:

File level: the option affects all messages and enumerations defined in the current file.
Message level: the option affects only one message and all the fields it contains.
Field level: the option affects only the field it is attached to.

Some commonly used protocol buffer options are listed below.

optimize_for (file option): can be set to SPEED, CODE_SIZE, or LITE_RUNTIME, for example option optimize_for = CODE_SIZE;. The setting affects the generated C++ code as follows:
SPEED (default): the compiler generates code for serializing, parsing, and performing other common operations on your message types. This code is highly optimized, at the cost of larger generated code.
CODE_SIZE: the compiler generates minimal classes and relies on shared, reflection-based code to implement serialization, parsing, and other operations. The generated code is much smaller than with SPEED, but operations are slower.
LITE_RUNTIME: the compiler generates classes that depend only on the "lite" runtime library (libprotobuf-lite instead of libprotobuf). The lite runtime is much smaller than the full library but omits features such as descriptors and reflection. This option is particularly useful on constrained platforms such as mobile phones.
cc_enable_arenas (file option): enables arena allocation for the generated C++ code.
deprecated (field option): if set to true, marks the field as deprecated, meaning new code should not use it, for example int32 old_field = 6 [deprecated = true];

