Protocol Buffers Basics: A Guide for C++ Programmers

This tutorial gives C++ programmers a basic introduction to protocol buffers. By walking through a simple example application, it shows you how to:

Define message formats in a .proto file

Use the protocol buffer compiler

Read and write messages using the C++ protocol buffer API

This is not a comprehensive guide to using protocol buffers in C++. For more detailed information, refer to the Protocol Buffer Language Guide and the Encoding Reference.

Why use protocol buffers?

The example we'll use is a very simple "address book" application that can read and write people's contact details to and from a file. Each person in the address book has a name, an ID, an email address, and a contact phone number.

How do you serialize and retrieve structured data like this? Here are some approaches:

Send and receive the raw in-memory data structures in binary form. This is typically a fragile approach, because the receiving or reading code must be compiled with exactly the same memory layout, endianness, and so on. In addition, as files accumulate data in the raw format and copies of the software wired to that format spread around, the format becomes very hard to extend.

Invent an ad-hoc way to encode the data items into a single string, for example encoding 4 integers as "12:3:-23:67". This is a simple and flexible approach, although it requires writing one-off encoding and parsing code, and the parsing imposes a small run-time cost. It works best for encoding very simple data.

Serialize the data to XML. This approach can be very attractive, since XML is (sort of) human readable and binding libraries exist for many languages. It can be a good choice if you want to share data with other applications or projects. However, XML is notoriously space intensive, and encoding and decoding it can impose a significant performance penalty on applications. Navigating an XML DOM tree is also considerably more complicated than accessing simple fields in a class.
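
To make the second approach concrete, here is a minimal sketch of the one-off encoding and parsing code that the "12:3:-23:67" format would need. The function names EncodeInts and DecodeInts are made up for this illustration; nothing here is part of protocol buffers.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Encodes integers as a colon-separated string,
// for example {12, 3, -23, 67} becomes "12:3:-23:67".
std::string EncodeInts(const std::vector<int>& values) {
  std::ostringstream out;
  for (std::size_t i = 0; i < values.size(); ++i) {
    if (i > 0) out << ':';
    out << values[i];
  }
  return out.str();
}

// Decodes the format written by EncodeInts.
// Returns false if any item is empty or is not a plain integer.
bool DecodeInts(const std::string& text, std::vector<int>* values) {
  values->clear();
  if (text.empty()) return true;           // an empty string is an empty list
  if (text.back() == ':') return false;    // trailing separator, item missing
  std::istringstream in(text);
  std::string token;
  while (std::getline(in, token, ':')) {
    std::istringstream field(token);
    int value = 0;
    char extra = 0;
    if (!(field >> value) || (field >> extra)) return false;
    values->push_back(value);
  }
  return true;
}
```

Even this tiny format needs decisions about separators, empty input, and malformed items, and every new field type would need more hand-written code. That is the cost the paragraph above refers to.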

Protocol buffers are a flexible, efficient, automated solution to exactly this problem. With protocol buffers, you write a .proto description of the data structure you wish to store. From that .proto file, the protocol buffer compiler creates a class that implements automatic encoding and parsing of the protocol buffer data in an efficient binary format. The generated class provides getters and setters for the fields that make up a protocol buffer and takes care of the details of reading and writing the protocol buffer as a unit. Importantly, the protocol buffer format supports extending the format over time in such a way that the code can still read data encoded with the old format.

Where can I find the sample code?

The sample code is included in the protocol buffers source code package, in the "examples" directory.

Define your protocol format

To create your address book application, you need to start with a .proto file. The definitions in a .proto file are simple: you add a message for each data structure you want to serialize, then specify a name and a type for each field in the message. Here is addressbook.proto, the .proto file that defines your messages.

    package tutorial;

    message Person {
      required string name = 1;
      required int32 id = 2;
      optional string email = 3;

      enum PhoneType {
        MOBILE = 0;
        HOME = 1;
        WORK = 2;
      }

      message PhoneNumber {
        required string number = 1;
        optional PhoneType type = 2 [default = HOME];
      }

      repeated PhoneNumber phone = 4;
    }

    message AddressBook {
      repeated Person person = 1;
    }

As you can see, the syntax is similar to C++ or Java. Let's go through each part of the file and see what it does.

The .proto file starts with a package declaration, which helps prevent naming conflicts between different projects. In C++, your generated classes are placed in a namespace matching the package name.

Next come the message definitions. A message is just an aggregate containing a set of typed fields. Many standard simple data types are available as field types, including bool, int32, float, double, and string. You can also add further structure to your messages by using other message types as field types: in the example above, the Person message contains PhoneNumber messages, while the AddressBook message contains Person messages. You can even define message types nested inside other messages; as you can see, the PhoneNumber type is defined inside Person. You can also define enum types if you want one of your fields to take one of a predefined list of values; here you specify that a phone number can be one of MOBILE, HOME, or WORK.

The " = 1", " = 2" markers on each element identify the unique "tag" that the field uses in the binary encoding. Tag numbers 1-15 require one less byte to encode than higher numbers, so as an optimization you can use those tags for commonly used or repeated elements, leaving tags 16 and higher for less commonly used optional elements. Each element in a repeated field requires re-encoding the tag number, so repeated fields are particularly good candidates for this optimization.
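
The one-byte saving for tag numbers 1-15 follows from how the wire format writes the key in front of each field value: the key is the tag number shifted left by three bits, combined with a three-bit wire type, and stored as a base-128 varint. The sketch below only computes the size of that key; the helper names VarintSize and FieldKeySize are made up for this illustration and are not library functions.

```cpp
#include <cstdint>

// Number of bytes a base-128 varint needs to represent `value`.
// Each byte carries 7 bits of payload.
int VarintSize(std::uint64_t value) {
  int bytes = 1;
  while (value >= 0x80) {
    value >>= 7;
    ++bytes;
  }
  return bytes;
}

// Size in bytes of the key written before a field value.
// The key is (field_number << 3) | wire_type. The wire type occupies only
// the low three bits, so it never changes the size and is left as zero here.
int FieldKeySize(std::uint32_t field_number) {
  const std::uint64_t key = static_cast<std::uint64_t>(field_number) << 3;
  return VarintSize(key);
}
```

With three bits taken by the wire type, a single byte has four bits left for the tag number, which is why 15 is the largest tag that fits. Tags 16 through 2047 take two bytes.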

Each field must be annotated with one of the following modifiers:

required: a value for the field must be provided, otherwise the message is considered "uninitialized". If libprotobuf is compiled in debug mode, serializing an uninitialized message causes an assertion failure. In optimized builds, the check is skipped and the message is written anyway. However, parsing an uninitialized message always fails (the parse method returns false). Other than this, a required field behaves exactly like an optional field.

optional: the field may or may not be set. If an optional field value is not set, a default value is used. For simple types, you can specify your own default value, as we did for the phone number type in the example. Otherwise a system default is used: zero for numeric types, the empty string for strings, and false for bools. For embedded messages, the default value is always the "default instance" or "prototype" of the message, which has none of its fields set. Calling the accessor to get the value of an optional (or required) field that has not been explicitly set always returns that field's default value.

repeated: the field may be repeated any number of times (including zero). The order of the repeated values is preserved in the protocol buffer. Think of repeated fields as dynamically sized arrays.
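
The "set or not set, with a default" behavior of optional fields can be modeled in a few lines. The class below is a hypothetical, hand-written stand-in (OptionalField is not a protobuf type, and the generated code does not look like this); it only mirrors the observable rules described above.

```cpp
#include <string>
#include <utility>

// Hypothetical model of one optional field: a value, a default,
// and a flag that records whether the value was explicitly set.
template <typename T>
class OptionalField {
 public:
  // With no argument, the default is the type's zero value:
  // 0 for numbers, "" for strings, false for bool.
  explicit OptionalField(T default_value = T())
      : default_(std::move(default_value)), value_(default_), has_(false) {}

  bool has() const { return has_; }

  // Reading an unset field never fails; it yields the default.
  const T& get() const { return has_ ? value_ : default_; }

  void set(T value) {
    value_ = std::move(value);
    has_ = true;
  }

  // Clearing returns the field to the "not set" state.
  void clear() {
    value_ = default_;
    has_ = false;
  }

 private:
  T default_;
  T value_;
  bool has_;
};
```

Note that setting a field to a value equal to its default still counts as "set", which is why a presence check and a value comparison are not interchangeable.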

You'll find a complete guide to writing .proto files, including all the possible field types, in the Protocol Buffer Language Guide. Don't go looking for facilities similar to class inheritance, though; protocol buffers don't do that.

Required is forever

You should be very careful about marking fields as required. If at some point you wish to stop writing or sending a required field, changing the field to optional will be problematic: old readers (code that parses messages using the older definition) will consider messages without this field to be incomplete and may reject or drop them. You should consider writing application-specific custom validation routines for your buffers instead. Some engineers at Google have come to the conclusion that using required does more harm than good; they prefer to use only optional and repeated. However, this view is not universal.

Compile your protocol buffers

Now that you have a .proto file, the next step is to generate the classes you'll need to read and write AddressBook (and hence Person and PhoneNumber) messages. To do this, you need to run the protocol buffer compiler protoc on your .proto file.

If you haven't installed the compiler, download the package and follow the instructions in the README.

Now run the compiler, specifying the source directory (where your application's source code lives; the current directory is used if you don't provide a value), the destination directory (where you want the generated code to go; often the same as $SRC_DIR), and the path to your .proto file. In this case:

    protoc -I=$SRC_DIR --cpp_out=$DST_DIR $SRC_DIR/addressbook.proto

Because you want C++ classes, you use the --cpp_out option; similar options are provided for other supported languages.

This generates the following files in your specified destination directory:

addressbook.pb.h, the header that declares your generated classes.

addressbook.pb.cc, which contains the implementation of your classes.

The Protocol Buffer API

Let's look at some of the generated code and see what classes and functions the compiler has created for you. If you look in addressbook.pb.h, you can see that there is a class for each message you specified in addressbook.proto. Looking closer at the Person class, you can see that the compiler has generated accessors for each field. For example, for the name, id, email, and phone fields, you have these methods:

    // name
    inline bool has_name() const;
    inline void clear_name();
    inline const ::std::string& name() const;
    inline void set_name(const ::std::string& value);
    inline void set_name(const char* value);
    inline ::std::string* mutable_name();

    // id
    inline bool has_id() const;
    inline void clear_id();
    inline int32_t id() const;
    inline void set_id(int32_t value);

    // email
    inline bool has_email() const;
    inline void clear_email();
    inline const ::std::string& email() const;
    inline void set_email(const ::std::string& value);
    inline void set_email(const char* value);
    inline ::std::string* mutable_email();

    // phone
    inline int phone_size() const;
    inline void clear_phone();
    inline const ::google::protobuf::RepeatedPtrField< ::tutorial::Person_PhoneNumber >& phone() const;
    inline ::google::protobuf::RepeatedPtrField< ::tutorial::Person_PhoneNumber >* mutable_phone();
    inline const ::tutorial::Person_PhoneNumber& phone(int index) const;
    inline ::tutorial::Person_PhoneNumber* mutable_phone(int index);
    inline ::tutorial::Person_PhoneNumber* add_phone();

As you can see, the getters have exactly the same name as the field in lowercase, and the setter methods begin with set_. There are also has_ methods for each singular (required or optional) field, which return true if that field has been set. Finally, each field has a clear_ method that resets the field to its empty state.

While the numeric id field has just the basic accessor set described above, the name and email fields have a couple of extra methods because they are strings: a mutable_ getter that gives you a direct pointer to the string, and an extra setter. Note that you can call mutable_email() even if email is not already set; it will be initialized to an empty string automatically. If this example had a singular message field, it would also have a mutable_ method but not a set_ method.

Repeated fields also have some special methods. If you look at the methods for the repeated phone field, you'll see that you can:

Check the repeated field's _size (in other words, how many phone numbers are associated with this Person).

Get a specified phone number using its index.

Update an existing phone number at the specified index.

Add another phone number to the message, which you can then edit (repeated scalar types have an add_ method that just takes the new value).

For more information on exactly which members the protocol compiler generates for any particular field definition, see the C++ generated code reference.

Enumerations and nested classes

The generated code includes a PhoneType enum that corresponds to your .proto enum. You can refer to this type as Person::PhoneType and to its values as Person::MOBILE, Person::HOME, and Person::WORK. (The implementation details are a little more complicated, but you don't need to understand them to use the enum.)

The compiler has also generated a nested class for you called Person::PhoneNumber. If you look at the code, you can see that the "real" class is actually called Person_PhoneNumber, but a typedef defined inside Person allows you to treat it as if it were a nested class. The only case where this makes a difference is if you want to forward-declare the class in another file: you cannot forward-declare nested types in C++, but you can forward-declare Person_PhoneNumber.
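
The forward-declaration point can be sketched as follows. To keep the example compilable on its own, the class body shown here is a hand-written stand-in for what the generated header would provide, and FormatPhone is a hypothetical application function.

```cpp
#include <string>

// In a header of your own, forward-declare the flat class name.
// This is legal because Person_PhoneNumber lives at namespace scope.
namespace tutorial {
class Person_PhoneNumber;
// class Person::PhoneNumber;  // not allowed: a nested name cannot be
//                             // declared before Person itself is defined
}  // namespace tutorial

// The header can now declare functions in terms of the type without
// including the generated header.
std::string FormatPhone(const tutorial::Person_PhoneNumber& phone);

// Stand-in definition, ASSUMED for illustration only. In a real project
// this comes from the generated header, not from hand-written code.
namespace tutorial {
class Person_PhoneNumber {
 public:
  const std::string& number() const { return number_; }
  void set_number(const std::string& value) { number_ = value; }

 private:
  std::string number_;
};
}  // namespace tutorial

// The implementation file includes the full definition before using it.
std::string FormatPhone(const tutorial::Person_PhoneNumber& phone) {
  return "tel:" + phone.number();
}
```

Keeping the generated header out of your own headers in this way can reduce how much code is recompiled when the .proto file changes.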

Standard message methods

Each message class also contains a number of other methods that let you check or manipulate the entire message, including:

bool IsInitialized() const;: checks whether all the required fields have been set.

string DebugString() const;: returns a human-readable representation of the message, particularly useful for debugging.

void CopyFrom(const Person& from);: overwrites the message with the given message's values.

void Clear();: clears all the elements back to the empty state.

These methods and the I/O methods described in the following section implement the Message interface shared by all C++ protocol buffer classes. For more information, see the complete API documentation for Message.

Parsing and serialization

Finally, each protocol buffer class has methods for writing and reading messages of your chosen type using the protocol buffer binary format. These include:

bool SerializeToString(string* output) const;: serializes the message and stores the bytes in the given string. Note that the bytes are binary, not text; the string class is used only as a convenient container.

bool ParseFromString(const string& data);: parses a message from the given string.

bool SerializeToOstream(ostream* output) const;: writes the message to the given C++ ostream.

bool ParseFromIstream(istream* input);: parses a message from the given C++ istream.

These are just a couple of the options provided for parsing and serialization. Again, see the Message API reference for a complete list.
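
The remark that a string is only a container for binary bytes has a practical consequence: serialized data may contain zero bytes, so it must be handled by length and never as a C string. The sketch below uses only the standard library; the helper names RoundTrip and LengthAsCString are made up for this illustration, and the in-memory stream stands in for a file opened with ios::binary.

```cpp
#include <cstddef>
#include <cstring>
#include <ios>
#include <iterator>
#include <sstream>
#include <string>

// Length seen by C-string functions, which stop at the first zero byte.
std::size_t LengthAsCString(const std::string& bytes) {
  return std::strlen(bytes.c_str());
}

// Writes the bytes to a binary stream and reads them all back.
// write() with an explicit size keeps embedded zero bytes intact.
std::string RoundTrip(const std::string& bytes) {
  std::stringstream stream(std::ios::in | std::ios::out | std::ios::binary);
  stream.write(bytes.data(), static_cast<std::streamsize>(bytes.size()));
  return std::string(std::istreambuf_iterator<char>(stream),
                     std::istreambuf_iterator<char>());
}
```

For a six-byte payload whose second byte is zero, std::string reports a size of 6 while LengthAsCString reports 1. Anything that passes serialized data through strlen, printf("%s"), or a text-mode file risks silently truncating or altering it.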

Protocol buffers and object-oriented design

Protocol buffer classes are basically dumb data holders (like structs in C++); they don't make good first-class citizens in an object model. If you want to add richer behavior to a generated class, the best way to do this is to wrap the generated protocol buffer class in an application-specific class. Wrapping protocol buffers is also a good idea if you don't have control over the design of the .proto file (for example, if you are reusing one from another project). In that case, you can use the wrapper class to craft an interface better suited to the unique environment of your application: hiding some data and methods, exposing convenience functions, and so on. You should never add behavior to the generated classes by inheriting from them. Doing so breaks internal mechanisms and is not good object-oriented practice anyway.
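
A wrapper along these lines might look like the sketch below. FakePerson is a hypothetical stand-in for the generated Person class, included only so that the example compiles without the protobuf library, and Contact and DisplayLabel are invented names. The point is the shape: the wrapper owns a message as a member instead of deriving from it.

```cpp
#include <string>

// Hypothetical stand-in for the generated message class. It mimics only
// the accessors that the wrapper below needs.
class FakePerson {
 public:
  FakePerson() : has_email_(false) {}
  const std::string& name() const { return name_; }
  void set_name(const std::string& value) { name_ = value; }
  bool has_email() const { return has_email_; }
  const std::string& email() const { return email_; }
  void set_email(const std::string& value) {
    email_ = value;
    has_email_ = true;
  }

 private:
  std::string name_;
  std::string email_;
  bool has_email_;
};

// Application-level wrapper built by composition, not inheritance.
class Contact {
 public:
  explicit Contact(const FakePerson& message) : message_(message) {}

  // Richer behavior lives in the wrapper, not in the generated class.
  std::string DisplayLabel() const {
    if (message_.has_email()) {
      return message_.name() + " <" + message_.email() + ">";
    }
    return message_.name();
  }

  // Read-only access to the underlying message, for serialization.
  const FakePerson& message() const { return message_; }

 private:
  FakePerson message_;
};
```

Because callers see only Contact, the message definition can change, or be replaced by one from another project, without those changes leaking into the rest of the application.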

Writing a message

Now let's try using your protocol buffer classes. The first thing you want your address book application to be able to do is write personal details to your address book file. To do this, you need to create and populate instances of your protocol buffer classes and then write them to an output stream.

Here is a program that reads an AddressBook from a file, adds one new Person to it based on user input, and writes the new AddressBook back out to the file again. The parts that directly call or reference code generated by the protocol compiler are marked with "// PB".

    #include <iostream>
    #include <fstream>
    #include <string>
    #include "addressbook.pb.h"  // PB
    using namespace std;

    // This function fills in a Person message based on user input.
    void PromptForAddress(tutorial::Person* person) {
      cout << "Enter person ID number: ";
      int id;
      cin >> id;
      person->set_id(id);  // PB
      cin.ignore(256, '\n');  // discard the rest of the input line

      cout << "Enter name: ";
      getline(cin, *person->mutable_name());  // PB

      cout << "Enter email address (blank for none): ";
      string email;
      getline(cin, email);
      if (!email.empty()) {
        person->set_email(email);  // PB
      }

      while (true) {
        cout << "Enter a phone number (or leave blank to finish): ";
        string number;
        getline(cin, number);
        if (number.empty()) {
          break;
        }

        tutorial::Person::PhoneNumber* phone_number = person->add_phone();  // PB
        phone_number->set_number(number);  // PB

        cout << "Is this a mobile, home, or work phone? ";
        string type;
        getline(cin, type);
        if (type == "mobile") {
          phone_number->set_type(tutorial::Person::MOBILE);  // PB
        } else if (type == "home") {
          phone_number->set_type(tutorial::Person::HOME);  // PB
        } else if (type == "work") {
          phone_number->set_type(tutorial::Person::WORK);  // PB
        } else {
          cout << "Unknown phone type.  Using default." << endl;
        }
      }
    }

    // Main function: reads the entire address book from a file,
    // adds one person based on user input, then writes it back out
    // to the same file.
    int main(int argc, char* argv[]) {
      // Verify that the version of the library that we linked against is
      // compatible with the version of the headers we compiled against.
      GOOGLE_PROTOBUF_VERIFY_VERSION;  // PB

      if (argc != 2) {
        cerr << "Usage:  " << argv[0] << " ADDRESS_BOOK_FILE" << endl;
        return -1;
      }

      tutorial::AddressBook address_book;  // PB

      {
        // Read the existing address book.
        fstream input(argv[1], ios::in | ios::binary);
        if (!input) {
          cout << argv[1] << ": File not found.  Creating a new file." << endl;
        } else if (!address_book.ParseFromIstream(&input)) {  // PB
          cerr << "Failed to parse address book." << endl;
          return -1;
        }
      }

      // Add an address.
      PromptForAddress(address_book.add_person());  // PB

      {
        // Write the new address book back to disk.
        fstream output(argv[1], ios::out | ios::trunc | ios::binary);
        if (!address_book.SerializeToOstream(&output)) {  // PB
          cerr << "Failed to write address book." << endl;
          return -1;
        }
      }

      // Optional: delete all global objects allocated by libprotobuf.
      google::protobuf::ShutdownProtobufLibrary();  // PB

      return 0;
    }

Notice the GOOGLE_PROTOBUF_VERIFY_VERSION macro. It is good practice, though not strictly necessary, to execute this macro before using the C++ Protocol Buffer library. It verifies that you have not accidentally linked against a version of the library that is incompatible with the version of the headers you compiled with. If a version mismatch is detected, the program aborts. Note that every .pb.cc file automatically invokes this macro on startup.

Also notice the call to ShutdownProtobufLibrary() at the end of the program. All this does is delete any global objects that were allocated by the Protocol Buffer library. This is unnecessary for most programs, since the process is about to exit anyway and the OS will reclaim all of its memory. However, if you use a memory leak checker that requires every last object to be freed, or if you are writing a library that may be loaded and unloaded multiple times by a single process, then you may want to force Protocol Buffers to clean up everything.

Reading a message

Of course, an address book wouldn't be much use if you couldn't get any information out of it! This example reads the file created by the example above and prints all the information in it.

    #include <iostream>
    #include <fstream>
    #include <string>
    #include "addressbook.pb.h"  // PB
    using namespace std;

    // Iterates through all people in the AddressBook and prints info about them.
    void ListPeople(const tutorial::AddressBook& address_book) {  // PB
      for (int i = 0; i < address_book.person_size(); i++) {  // PB
        const tutorial::Person& person = address_book.person(i);  // PB

        cout << "Person ID: " << person.id() << endl;  // PB
        cout << "  Name: " << person.name() << endl;  // PB
        if (person.has_email()) {  // PB
          cout << "  E-mail address: " << person.email() << endl;  // PB
        }

        for (int j = 0; j < person.phone_size(); j++) {  // PB
          const tutorial::Person::PhoneNumber& phone_number = person.phone(j);  // PB

          switch (phone_number.type()) {  // PB
            case tutorial::Person::MOBILE:  // PB
              cout << "  Mobile phone #: ";
              break;
            case tutorial::Person::HOME:  // PB
              cout << "  Home phone #: ";
              break;
            case tutorial::Person::WORK:  // PB
              cout << "  Work phone #: ";
              break;
          }
          cout << phone_number.number() << endl;  // PB
        }
      }
    }

    // Main function: reads the entire address book from a file and prints
    // all the information inside.
    int main(int argc, char* argv[]) {
      // Verify that the version of the library that we linked against is
      // compatible with the version of the headers we compiled against.
      GOOGLE_PROTOBUF_VERIFY_VERSION;  // PB

      if (argc != 2) {
        cerr << "Usage:  " << argv[0] << " ADDRESS_BOOK_FILE" << endl;
        return -1;
      }

      tutorial::AddressBook address_book;  // PB

      {
        // Read the existing address book.
        fstream input(argv[1], ios::in | ios::binary);
        if (!address_book.ParseFromIstream(&input)) {  // PB
          cerr << "Failed to parse address book." << endl;
          return -1;
        }
      }

      ListPeople(address_book);

      // Optional: delete all global objects allocated by libprotobuf.
      google::protobuf::ShutdownProtobufLibrary();  // PB

      return 0;
    }

Extending a protocol buffer

Sooner or later after you release the code that uses your protocol buffer, you will undoubtedly want to "improve" the protocol buffer's definition. If you want your new buffers to be backwards-compatible and your old buffers to be forward-compatible (and you almost certainly do want this), then there are some rules you need to follow. In the new version of the protocol buffer:

You must not change the tag numbers of any existing fields.

You must not add or delete any required fields.

You may delete optional or repeated fields.

You may add new optional or repeated fields, but you must use fresh tag numbers (that is, tag numbers that were never used in this protocol buffer, not even by deleted fields).

(There are some exceptions to these rules, but they are rarely used.)

If you follow these rules, old code will happily read new messages and simply ignore any new fields. To the old code, optional fields that were deleted will simply have their default value, and deleted repeated fields will be empty. New code will also transparently read old messages. However, keep in mind that new optional fields will not be present in old messages, so you will need to either check explicitly whether they're set with has_, or provide a reasonable default value in your .proto file with [default = value] after the tag number. If the default value is not specified for an optional element, a type-specific default value is used instead: for strings, the default value is the empty string; for booleans, it is false; for numeric types, it is zero. Note also that if you added a new repeated field, your new code will not be able to tell whether it was left empty (by new code) or never set at all (by old code), since there is no has_ flag for it.
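
Old code can skip fields it has never heard of because every field on the wire is preceded by a key that carries its tag number and a wire type, and the wire type alone says how many bytes the value occupies. The toy reader below is a hand-rolled sketch, not how libprotobuf is implemented: it pretends to be an "old" program that knows only name (tag 1) and id (tag 2), and it handles only the varint and length-delimited wire types. OldPerson, ReadVarint, and ParseOldPerson are names invented for this illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Reads one base-128 varint starting at *pos and advances *pos.
// Returns false if the input ends before the varint does.
bool ReadVarint(const std::string& data, std::size_t* pos,
                std::uint64_t* value) {
  *value = 0;
  for (int shift = 0; shift < 64 && *pos < data.size(); shift += 7) {
    const std::uint8_t byte = static_cast<std::uint8_t>(data[(*pos)++]);
    *value |= static_cast<std::uint64_t>(byte & 0x7F) << shift;
    if ((byte & 0x80) == 0) return true;  // high bit clear: last byte
  }
  return false;
}

// What the hypothetical old program knows about a Person.
struct OldPerson {
  std::string name;
  std::int32_t id = 0;
};

// Parses the known fields and skips every other field.
bool ParseOldPerson(const std::string& data, OldPerson* person) {
  std::size_t pos = 0;
  while (pos < data.size()) {
    std::uint64_t key = 0;
    if (!ReadVarint(data, &pos, &key)) return false;
    const std::uint64_t field_number = key >> 3;
    const int wire_type = static_cast<int>(key & 0x7);

    if (wire_type == 0) {  // varint value
      std::uint64_t value = 0;
      if (!ReadVarint(data, &pos, &value)) return false;
      if (field_number == 2) person->id = static_cast<std::int32_t>(value);
      // Any other varint field is unknown here and is simply dropped.
    } else if (wire_type == 2) {  // length-delimited value
      std::uint64_t length = 0;
      if (!ReadVarint(data, &pos, &length)) return false;
      if (length > data.size() - pos) return false;  // truncated input
      const std::size_t n = static_cast<std::size_t>(length);
      if (field_number == 1) person->name = data.substr(pos, n);
      pos += n;  // known or not, step over the payload
    } else {
      return false;  // other wire types are outside this sketch
    }
  }
  return true;
}
```

Given the bytes of a message that also carries an email in tag 3, this reader still recovers name and id and steps over the field it does not recognize. This is also why tag numbers must never be changed or reused: the tag is the only thing that ties bytes on the wire to a field.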

Optimization tips

The C++ Protocol Buffers library is extremely heavily optimized. However, proper usage can improve performance even more. Here are some tips for squeezing every last drop of speed out of the library:

Reuse message objects when possible. Messages try to keep around any memory they allocate for reuse, even when they are cleared. Thus, if you are handling many messages with the same type and similar structure in succession, it is a good idea to reuse the same message object each time to take load off the memory allocator. However, objects can become bloated over time, especially if your messages vary in "shape" or if you occasionally construct a message that is much larger than usual. You should monitor the sizes of your message objects by calling the SpaceUsed method and delete them once they get too big.

Your system's memory allocator may not be well optimized for allocating lots of small objects from multiple threads. Try using Google's tcmalloc instead.
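
The first tip, reusing a message but replacing it once it has grown too large, can be sketched as a small holder class. Everything here is an assumption made for illustration: FakeMessage is a stand-in that offers only Clear() and SpaceUsed() in the spirit of the methods named above, and ReusableMessage and its byte limit are invented for this example.

```cpp
#include <memory>
#include <string>

// Hypothetical stand-in for a generated message. Clear() empties the
// content, and SpaceUsed() reports an approximate memory footprint.
class FakeMessage {
 public:
  void set_payload(const std::string& value) { payload_ = value; }
  const std::string& payload() const { return payload_; }
  void Clear() { payload_.clear(); }
  int SpaceUsed() const {
    return static_cast<int>(sizeof(*this) + payload_.capacity());
  }

 private:
  std::string payload_;
};

// Hands out one message object over and over. The object is cleared
// between uses, and replaced by a fresh one if it has become bloated.
template <typename Message>
class ReusableMessage {
 public:
  explicit ReusableMessage(int max_bytes)
      : max_bytes_(max_bytes), message_(new Message), replacements_(0) {}

  // Returns an empty message ready for the next round of work.
  // Pointers returned by earlier calls must not be used afterwards.
  Message* Next() {
    if (message_->SpaceUsed() > max_bytes_) {
      message_.reset(new Message);  // too big: release the memory
      ++replacements_;
    } else {
      message_->Clear();  // normal case: reuse the same object
    }
    return message_.get();
  }

  int replacements() const { return replacements_; }

 private:
  int max_bytes_;
  std::unique_ptr<Message> message_;
  int replacements_;
};
```

A processing loop would call Next() once per incoming message and parse into the returned object. The right limit depends on the workload, so it is best chosen by measuring typical message sizes rather than by guessing.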

Advanced usage

Protocol buffers have uses that go beyond simple accessors and serialization. Be sure to explore the C++ API reference to see what else you can do with them.

One key feature provided by protocol message classes is reflection. You can iterate over the fields of a message and manipulate their values without writing your code against any specific message type. One very useful way to use reflection is for converting protocol messages to and from other encodings, such as XML or JSON. A more advanced use of reflection might be to find differences between two messages of the same type, or to develop a sort of "regular expressions for protocol messages" in which you write expressions that match certain message contents. If you use your imagination, it's possible to apply protocol buffers to a much wider range of problems than you might initially expect.
