Analysis of Google Protocol Buffers (1)

Source: Internet
Author: User
Tags format definition prototype definition

This article introduces how to use Google Protocol Buffers to serialize and parse your data files. For more detailed information, refer to the Google developer documentation at: http://code.google.com/apis/protocolbuffers/docs/overview.html.

I. Brief Introduction

Before continuing, readers need some basic knowledge of Google Protocol Buffers. Protocol Buffers is a technology for serializing structured data. It supports multiple languages such as C++, Java, and Python. It can be used to persist data or to serialize data transmitted over the network. Compared with XML, it is more space-saving (data is stored as a binary stream), faster, and more flexible.

Generally, writing a protocol buffers application involves the following three steps:

1. Define a message format file. It is best to use .proto as the file extension.

2. Use the Protocol Buffers compiler provided by Google to generate code files, which for C++ are a .h file and a .cc file. They describe the message format in a specific language.

3. Use the API provided by the Protocol Buffers library to write the application.

 

II. Define the proto File

The proto file is the prototype definition file of the message protocol. In this file, we use a descriptive language to define the data format required by our program. To understand it, we start with the phone book example from the Google online documentation, with one slight change.

message Person {
  required string name = 1;
  required int32 id = 2;
  optional string email = 3;

  enum PhoneType {
    MOBILE = 0;
    HOME = 1;
    WORK = 2;
  }

  message PhoneNumber {
    required string number = 1;
    optional PhoneType type = 2 [default = HOME];
  }

  repeated PhoneNumber phone = 4;

  required bytes unsure = 5;  // Add byte array here
}

message AddressBook {
  repeated Person person = 1;
}

As you can see, the message format definition is very simple. Each field has a modifier (required/repeated/optional), a field type (bool/string/bytes/int32, etc.), and a field tag.

The meaning of the three modifiers is straightforward:

1) A required field must be given a value; otherwise the message is considered uninitialized. If the Protocol Buffers library is built in debug mode, serializing an uninitialized message causes an assertion failure, and parsing an uninitialized message always fails. Therefore, set every required field before serialization.

2) If an optional field is not set, it takes a default value. You can also specify the default value yourself, as done for the type field in the preceding proto definition.

3) A repeated field can occur any number of times, including zero. The address book example provided by Google is a good use case for this modifier: each person may have multiple phone numbers. In a high-level language we would implement this with an array; in the proto definition file, the repeated modifier achieves the same purpose.
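The behavior of the three modifiers can be modeled in plain C++. The following is only a hand-written sketch: the names PhoneNumberModel, PersonModel, and is_initialized are hypothetical and are not what the compiler generates. It shows a required field tracked by a flag, an optional field that falls back to its declared default, and a repeated field held in a container.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model of the modifiers; NOT the code produced by protoc.
enum class PhoneType { MOBILE = 0, HOME = 1, WORK = 2 };

struct PhoneNumberModel {
    std::string number;                // required: must be set explicitly
    bool has_number = false;
    PhoneType type = PhoneType::HOME;  // optional: starts at the declared default
    bool has_type = false;

    void set_number(const std::string& v) { number = v; has_number = true; }
    void set_type(PhoneType t) { type = t; has_type = true; }

    // A message counts as initialized only when every required field is set.
    bool is_initialized() const { return has_number; }
};

struct PersonModel {
    std::string name;                     // required
    bool has_name = false;
    std::vector<PhoneNumberModel> phone;  // repeated: zero or more entries

    void set_name(const std::string& v) { name = v; has_name = true; }

    // Nested messages must be initialized as well.
    bool is_initialized() const {
        if (!has_name) return false;
        for (const PhoneNumberModel& p : phone) {
            if (!p.is_initialized()) return false;
        }
        return true;
    }
};
```

In this model, a person with a name and no phone numbers is initialized, while adding a phone entry without a number makes the whole message uninitialized again.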

The field tag is the number that identifies the field in the binary stream, and it must be unique within a message. It is required, and the tag value of a given field must be the same during serialization and deserialization; otherwise, deserialization may cause unexpected problems.
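To see why the tag matters, it helps to look at how a field is written. In the Protocol Buffers wire format, each field is preceded by a key computed as (tag << 3) | wire type, stored as a base-128 varint. The sketch below is a minimal illustration, not the library's implementation; the function names are hypothetical, and only two wire types are listed.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Two of the wire types defined by the encoding.
enum WireType : std::uint32_t {
    kVarint = 0,          // int32, int64, bool, enum
    kLengthDelimited = 2  // string, bytes, embedded messages
};

// Appends value as a base-128 varint: 7 payload bits per byte, lowest
// group first, with the high bit set on every byte except the last.
void append_varint(std::uint64_t value, std::vector<std::uint8_t>& out) {
    while (value >= 0x80) {
        out.push_back(static_cast<std::uint8_t>((value & 0x7F) | 0x80));
        value >>= 7;
    }
    out.push_back(static_cast<std::uint8_t>(value));
}

// Appends the key that precedes every field: (tag << 3) | wire type.
void append_key(std::uint32_t field_tag, WireType wire_type,
                std::vector<std::uint8_t>& out) {
    append_varint((static_cast<std::uint64_t>(field_tag) << 3) | wire_type, out);
}

// Encodes a field declared as "required int32 id = 2" for a
// non-negative value (hypothetical helper for illustration).
std::vector<std::uint8_t> encode_id_field(std::uint32_t id) {
    std::vector<std::uint8_t> out;
    append_key(2, kVarint, out);
    append_varint(id, out);
    return out;
}
```

For example, encode_id_field(150) yields the bytes 0x10 0x96 0x01. A reader that expects id under a different tag would not recognize the 0x10 key, which is why the tags must match on both sides.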

III. Compile the proto file to generate language-specific data definition code

After defining the proto file, pass it to the Protocol Buffers compiler to generate the data definition code in a specific language. This article targets C++, so the generated code files are a .h file and a .cc file. The compiler, which also supports Java and Python, can be downloaded from: http://code.google.com/p/protobuf/downloads/list

After downloading the compiler binary, you can use the following command to compile the file:

protoc.exe --proto_path=SRC --cpp_out=DST SRC/addressbook.proto

--proto_path indicates the directory where the proto file is located, --cpp_out indicates the directory where the generated code files are to be placed, and the last parameter is the path of the proto file. With the command above, addressbook.proto under the SRC directory is compiled, and the files addressbook.pb.h and addressbook.pb.cc are generated in the DST directory (/files/royenhome/addressbook.rar).

By viewing the header file, we can find that the following functions are generated for each field, taking number as an example:

// required string number = 1;
inline bool has_number() const;
inline void clear_number();
inline const ::std::string& number() const;
inline void set_number(const ::std::string& value);
inline void set_number(const char* value);
inline ::std::string* mutable_number();

As shown, a has function (has_number), a clear function (clear_number), set functions (set_number), and get functions (number and mutable_number) are generated for each field. The two get functions differ as follows. The function with the prototype const ::std::string& number() const returns a const reference, so the value cannot be modified through it. When the field does need to be modified, the mutable get function returns a pointer to the field so that its value can be changed.
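The difference between the const getter and the mutable getter is easier to see in a small stand-alone class. The following is a hand-written imitation of the accessor pattern for a single string field; the class name PhoneNumberSketch is hypothetical, and the real generated class is considerably more involved.

```cpp
#include <cassert>
#include <string>

// Hand-written imitation of the accessors for one string field.
// It only mirrors the method names; it is NOT generated code.
class PhoneNumberSketch {
public:
    bool has_number() const { return has_number_; }

    void clear_number() {
        number_.clear();
        has_number_ = false;
    }

    // Read-only access: the caller cannot modify the field through this.
    const std::string& number() const { return number_; }

    void set_number(const std::string& value) {
        number_ = value;
        has_number_ = true;
    }

    void set_number(const char* value) {
        number_ = value;
        has_number_ = true;
    }

    // Read-write access: marks the field as set and exposes it for
    // in-place modification.
    std::string* mutable_number() {
        has_number_ = true;
        return &number_;
    }

private:
    std::string number_;
    bool has_number_ = false;
};
```

With this sketch, calling set_number("555") and then mutable_number()->append("-0100") leaves number() equal to "555-0100", while the same modification through number() would not compile.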

The functions generated for a field with the repeated modifier are slightly different. For the phone field, for example, the compiler generates the following code:

// repeated .Person.PhoneNumber phone = 4;
inline int phone_size() const;
inline void clear_phone();
inline const ::google::protobuf::RepeatedPtrField< ::Person_PhoneNumber >& phone() const;
inline ::google::protobuf::RepeatedPtrField< ::Person_PhoneNumber >* mutable_phone();
inline const ::Person_PhoneNumber& phone(int index) const;
inline ::Person_PhoneNumber* mutable_phone(int index);
inline ::Person_PhoneNumber* add_phone();

As shown, the set function has been replaced by an add function, which is easy to understand. As mentioned above, a repeated field corresponds to an array or a dynamic array in a high-level language, so a new value is added by appending an element. The get functions also change: phone_size() returns the number of elements, phone(index) returns one element, and phone() returns the whole collection.
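The repeated-field accessors can be imitated in the same spirit. This is a rough sketch under stated assumptions: PersonSketch and PhoneEntry are hypothetical names, and std::deque stands in for the library's container only because it keeps pointers to existing elements valid when new elements are appended at the end.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>

// Hypothetical element type standing in for the nested phone message.
struct PhoneEntry {
    std::string number;
};

// Hand-written imitation of the accessors for a repeated field.
// It only mirrors the method names; it is NOT generated code.
class PersonSketch {
public:
    int phone_size() const { return static_cast<int>(phone_.size()); }

    void clear_phone() { phone_.clear(); }

    // Read-only access to one element; at() throws on a bad index.
    const PhoneEntry& phone(int index) const {
        return phone_.at(static_cast<std::size_t>(index));
    }

    // Read-write access to one existing element.
    PhoneEntry* mutable_phone(int index) {
        return &phone_.at(static_cast<std::size_t>(index));
    }

    // Appends a new empty element and returns it for the caller to fill in.
    PhoneEntry* add_phone() {
        phone_.emplace_back();
        return &phone_.back();
    }

private:
    std::deque<PhoneEntry> phone_;
};
```

The usage pattern is to call add_phone() once per new entry and fill in the returned element, rather than passing a value to a setter.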

This article has given a brief introduction to Protocol Buffers. Refer to the official documentation for more details. The next article describes how to use Protocol Buffers to serialize and deserialize data.

You are welcome to reprint this article; please credit the original source: http://www.cnblogs.com/royenhome. Thank you for your cooperation!
