A Google Protobuf network transmission scheme that automatically creates Message objects from their type names via reflection


Chen Shuo (giantchen_AT_gmail)

blog.csdn.net/Solstice  t.sina.com.cn/giantchen

The problem this article solves is: after receiving protobuf data, how to automatically create the corresponding protobuf Message object and deserialize the data into it. "Automatically" means that when a new protobuf Message type is added to the program, this part of the code does not need to be modified, and you do not need to register the message type yourself. In fact, Google Protobuf itself has powerful reflection facilities that can create a Message object of a specific type from its type name, and we can use them directly.

This article assumes that the reader already knows what Google Protocol Buffers is; it is not an introduction to protobuf.

This article uses C++ as the example language. Other languages have similar solutions.

The sample code for this article is in: https://github.com/chenshuo/recipes/tree/master/protobuf

Two Problems of using protobuf in Network Programming

Google Protocol Buffers (Protobuf) is a very good library that defines a compact and extensible binary message format, which is especially suitable for network data transmission. It provides bindings for multiple languages, which greatly facilitates the development of distributed programs, since the system is no longer tied to a single programming language.

To use protobuf in network programming, you need to solve two problems:

  • Length: protobuf-encoded data carries no length information or terminator of its own, so the application must split the byte stream correctly when sending and receiving;
  • Type: protobuf-encoded data carries no type information, so the sender must transmit the type to the receiver, which then creates a protobuf Message object of that type and deserializes the data into it.

The first problem is usually solved by adding a fixed-length header before each message, for example LengthHeaderCodec in "Muduo network programming example 2: Boost.Asio chat server"; for the code, see http://code.google.com/p/muduo/source/browse/trunk/examples/asio/chat/codec.h
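A minimal sketch of length-header framing follows. It assumes a 4-byte length in network byte order (big-endian) placed before each serialized message; the function names frameMessage and tryTakeFrame are hypothetical and are not muduo's LengthHeaderCodec API.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Prepend a 4-byte big-endian length header to an already-serialized payload.
std::string frameMessage(const std::string& payload)
{
  const uint32_t n = static_cast<uint32_t>(payload.size());
  std::string frame;
  frame.reserve(4 + payload.size());
  frame.push_back(static_cast<char>((n >> 24) & 0xFF));
  frame.push_back(static_cast<char>((n >> 16) & 0xFF));
  frame.push_back(static_cast<char>((n >> 8) & 0xFF));
  frame.push_back(static_cast<char>(n & 0xFF));
  frame += payload;
  return frame;
}

// Try to remove one complete frame from the front of `buffer`.
// Returns false (leaving `buffer` untouched) if the header or the body has
// not fully arrived yet; TCP may deliver one frame in several pieces.
bool tryTakeFrame(std::string& buffer, std::string& payload)
{
  if (buffer.size() < 4)
    return false;
  const unsigned char* p = reinterpret_cast<const unsigned char*>(buffer.data());
  const uint32_t n = (uint32_t(p[0]) << 24) | (uint32_t(p[1]) << 16) |
                     (uint32_t(p[2]) << 8) | uint32_t(p[3]);
  if (buffer.size() - 4 < n)
    return false;  // body incomplete, wait for more bytes
  payload.assign(buffer, 4, n);
  buffer.erase(0, 4 + n);
  return true;
}
```

The decoder leaves incomplete data in the buffer so that it can be retried when more bytes arrive. A real decoder would also reject implausibly large lengths instead of waiting for a body that may never come.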

The second problem is actually easy to solve, because protobuf has built-in support for it. Strangely, though, a quick search on the Internet turns up many shanzhai (knock-off, homemade) practices.

Shanzhai practices

These practices add a header before the protobuf data that contains an int length and type information. The type information takes two main forms:

  • Put int typeId in the header, and the receiver uses switch-case to select the corresponding message type and processing function;
  • Put string typeName in the header, and the receiver uses the look-up table to select the corresponding message type and processing function.
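For concreteness, here is a stand-alone sketch of both shanzhai approaches. It does not use libprotobuf: Message, Query and Answer are hypothetical stand-ins for generated classes, and the ids 1 and 2 are made up. Note that every new message type requires another case or another table entry, edited by hand.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>

// Hypothetical stand-ins for generated protobuf message classes.
struct Message { virtual ~Message() = default; virtual std::string name() const = 0; };
struct Query : Message { std::string name() const override { return "muduo.Query"; } };
struct Answer : Message { std::string name() const override { return "muduo.Answer"; } };

// Approach 1: an int typeId in the header, mapped with switch-case.
// The ids must be allocated and kept unique by hand.
std::unique_ptr<Message> createById(int typeId)
{
  switch (typeId)
  {
    case 1: return std::unique_ptr<Message>(new Query);
    case 2: return std::unique_ptr<Message>(new Answer);
    default: return nullptr;  // unknown id
  }
}

// Approach 2: a string typeName in the header, mapped with a look-up table.
// Names are easy to keep unique, but the table is edited for each new type.
std::unique_ptr<Message> createByName(const std::string& typeName)
{
  static const std::map<std::string, std::function<Message*()>> table = {
    {"muduo.Query",  [] { return static_cast<Message*>(new Query); }},
    {"muduo.Answer", [] { return static_cast<Message*>(new Answer); }},
  };
  auto it = table.find(typeName);
  if (it == table.end())
    return nullptr;  // unknown name
  return std::unique_ptr<Message>(it->second());
}
```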

Both methods have problems.

The first method requires typeId to be unique and to correspond one-to-one with protobuf message types. If protobuf messages are used only in a limited scope, for example when both the sender and the receiver are programs you maintain yourself, uniqueness of typeId is not hard to guarantee; version control is enough. If protobuf messages are used widely, for example company-wide, with distributed programs developed by different departments communicating with each other, then a company-wide authority has to allocate typeIds, and every new message type must be registered with it, which is troublesome.

The second method is better. Uniqueness of typeName is easier to guarantee, because the package name can be included (that is, the message's fully qualified type name is used), and each department can divide up the namespace in advance so that names never conflict. However, the initialization code of the look-up table still has to be modified by hand whenever a message type is added, which is also troublesome.

In fact, there is no need to reinvent the wheel: protobuf already comes with a solution.

Automatically creating a Message object from its type name via reflection

Google Protobuf has powerful reflection facilities: you can create a Message object of a specific type from its type name. Strangely, however, this usage is not explicitly mentioned in the official tutorial. I suspect many people do not know about it, which is why I think it is worth a blog post.

The following figure is a Protobuf class diagram drawn by Chen Shuo.

I suspect most people usually care about and use only the left half of the diagram: MessageLite, Message, the generated message types (Person, AddressBook), and so on, and seldom notice the right half: Descriptor, DescriptorPool, MessageFactory.

In the diagram, the key role is played by the Descriptor class: each concrete message type corresponds to one Descriptor object. Although we do not call its functions directly, Descriptor acts as an important bridge in "creating a Message object of a specific type from its type name". The red arrows in the diagram show how a concrete Message object is created from a type name.

Principles

The protobuf Message class uses the prototype pattern: Message defines a virtual function New() that returns a new instance whose type is the same as the actual type of the object it is called on. In other words, given a Message* pointer, you can create a new object of the same message type without knowing its concrete type.
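The prototype pattern itself can be illustrated without libprotobuf. In the sketch below, Message, Query and Answer are hypothetical stand-ins that only mimic the shape of protobuf's virtual New(), and cloneType() is likewise a made-up helper.

```cpp
#include <cassert>
#include <memory>
#include <typeinfo>

// Hypothetical base class mimicking the shape of protobuf's Message::New().
class Message
{
 public:
  virtual ~Message() = default;
  // Returns a new, default-constructed object of the same dynamic type.
  virtual Message* New() const = 0;
};

class Query : public Message
{
 public:
  Query* New() const override { return new Query; }  // covariant return type
};

class Answer : public Message
{
 public:
  Answer* New() const override { return new Answer; }
};

// Works for any Message subclass without knowing the concrete type.
std::unique_ptr<Message> cloneType(const Message& prototype)
{
  return std::unique_ptr<Message>(prototype.New());
}
```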

Each concrete message type has a default instance, which can be obtained through ConcreteMessage::default_instance() or MessageFactory::GetPrototype(const Descriptor*). So the problem reduces to: 1. how to get a MessageFactory; 2. how to get a Descriptor*.

Of course, ConcreteMessage::descriptor() returns exactly the Descriptor* we want. But if we do not know ConcreteMessage, how can we call its static member function? This looks like a chicken-and-egg problem.

The hero of the story is DescriptorPool, which can look up a Descriptor* by type name. All you need is to find the appropriate DescriptorPool and call DescriptorPool::FindMessageTypeByName(const string& type_name). Neat, isn't it?
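Why no manual registration is needed can be pictured with a toy registry. This is an assumed analogy, not protobuf's implementation: each type registers a prototype under its full name from a static initializer (protobuf's generated .pb.cc code registers its descriptors automatically in a comparable way), so the look-up code never changes when types are added. PrototypeRegistry, Query and createByName are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical message interface: prototype pattern plus a full type name.
struct Message
{
  virtual ~Message() = default;
  virtual Message* New() const = 0;
  virtual std::string typeName() const = 0;
};

// Toy analogue of looking up a prototype by full type name.
class PrototypeRegistry
{
 public:
  static PrototypeRegistry& instance()
  {
    static PrototypeRegistry registry;  // built on first use, avoiding init-order issues
    return registry;
  }
  void add(const Message* prototype) { prototypes_[prototype->typeName()] = prototype; }
  const Message* find(const std::string& name) const
  {
    auto it = prototypes_.find(name);
    return it == prototypes_.end() ? nullptr : it->second;
  }

 private:
  std::map<std::string, const Message*> prototypes_;
};

struct Query : Message
{
  Message* New() const override { return new Query; }
  std::string typeName() const override { return "muduo.Query"; }
};

// Registration sits next to the type's definition and runs during static
// initialization, so code that looks types up by name never needs editing.
static const Query kQueryPrototype{};
static const bool kQueryRegistered =
    (PrototypeRegistry::instance().add(&kQueryPrototype), true);

// Returns a new object owned by the caller, or nullptr for an unknown name.
Message* createByName(const std::string& name)
{
  const Message* prototype = PrototypeRegistry::instance().find(name);
  return prototype ? prototype->New() : nullptr;
}
```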

Before solving the problem for real, let's run a simple test to check that this is correct.

Simple Test

The example proto file is query.proto; see https://github.com/chenshuo/recipes/blob/master/protobuf/query.proto

package muduo;

message Query {
  required int64 id = 1;
  required string questioner = 2;
  repeated string question = 3;
}

message Answer {
  required int64 id = 1;
  required string questioner = 2;
  required string answerer = 3;
  repeated string solution = 4;
}

message Empty {
  optional int32 id = 1;
}

Query.questioner and Answer.answerer are the "process identifiers in distributed systems" that I mentioned in a previous article.

The following code verifies the relationship among ConcreteMessage::default_instance(), ConcreteMessage::descriptor(), MessageFactory::GetPrototype(), and DescriptorPool::FindMessageTypeByName():

https://github.com/chenshuo/recipes/blob/master/protobuf/descriptor_test.cc#L15

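  #include <cassert>
  #include <iostream>
  #include <string>
  #include <typeinfo>

  #include <google/protobuf/descriptor.h>
  #include <google/protobuf/message.h>

  #include "query.pb.h"  // generated from query.proto

  using namespace google::protobuf;
  using std::cout;
  using std::endl;

  int main()
  {
    typedef muduo::Query T;

    // Look up the Descriptor by the message's full type name ("muduo.Query").
    std::string type_name = T::descriptor()->full_name();
    cout << type_name << endl;

    const Descriptor* descriptor =
        DescriptorPool::generated_pool()->FindMessageTypeByName(type_name);
    assert(descriptor == T::descriptor());
    cout << "FindMessageTypeByName() = " << descriptor << endl;
    cout << "T::descriptor()         = " << T::descriptor() << endl;
    cout << endl;

    // The factory's prototype for this Descriptor is the default instance.
    const Message* prototype =
        MessageFactory::generated_factory()->GetPrototype(descriptor);
    assert(prototype == &T::default_instance());
    cout << "GetPrototype()        = " << prototype << endl;
    cout << "T::default_instance() = " << &T::default_instance() << endl;
    cout << endl;

    // New() creates a fresh object of the same concrete type.
    T* new_obj = dynamic_cast<T*>(prototype->New());
    assert(new_obj != NULL);
    assert(new_obj != prototype);
    assert(typeid(*new_obj) == typeid(T::default_instance()));
    cout << "prototype->New() = " << new_obj << endl;
    cout << endl;
    delete new_obj;
  }
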
Key code for automatically creating a Message based on type name

With everything in place, here are the steps:

  1. Use DescriptorPool::generated_pool() to get a DescriptorPool object, which contains all protobuf message types linked into the program at compile time.
  2. Use DescriptorPool::FindMessageTypeByName() to find the Descriptor for the type name.
  3. Use MessageFactory::generated_factory() to get a MessageFactory object, which can create every protobuf message type linked into the program at compile time.
  4. Use MessageFactory::GetPrototype() to get the default instance of the concrete message type.
  5. Finally, call prototype->New() to create a new object.

For sample code, see https://github.com/chenshuo/recipes/blob/master/protobuf/codec.h#L69

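#include <string>

#include <google/protobuf/descriptor.h>
#include <google/protobuf/message.h>

using google::protobuf::Descriptor;
using google::protobuf::DescriptorPool;
using google::protobuf::Message;
using google::protobuf::MessageFactory;

// Returns a newly allocated message whose type is named by typeName
// (e.g. "muduo.Query"), or NULL if no such type is linked into the program.
// The caller owns the returned object.
Message* createMessage(const std::string& typeName)
{
  Message* message = NULL;
  const Descriptor* descriptor =
      DescriptorPool::generated_pool()->FindMessageTypeByName(typeName);
  if (descriptor)
  {
    const Message* prototype =
        MessageFactory::generated_factory()->GetPrototype(descriptor);
    if (prototype)
    {
      message = prototype->New();
    }
  }
  return message;
}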

Usage: https://github.com/chenshuo/recipes/blob/master/protobuf/descriptor_test.cc#L49

  Message* newQuery = createMessage("muduo.Query");
  assert(newQuery != NULL);
  assert(typeid(*newQuery) == typeid(muduo::Query::default_instance()));
  cout << "createMessage(\"muduo.Query\") = " << newQuery << endl;
  delete newQuery;  // the caller owns the object returned by createMessage()

It works as advertised.

Note that createMessage() returns a pointer to a dynamically allocated object. The caller is responsible for releasing it, otherwise memory will leak. In muduo, I use shared_ptr<Message> to manage the lifetime of Message objects automatically.
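A minimal sketch of taking ownership immediately with std::shared_ptr follows. The createMessage() here is a hypothetical stand-in that only imitates the contract of the real function above (a new object owned by the caller, or a null pointer), MessagePtr is a name chosen for illustration, and g_live is a counter used only to show that the object gets destroyed.

```cpp
#include <cassert>
#include <memory>
#include <string>

int g_live = 0;  // number of live Message objects, for demonstration only

struct Message
{
  Message() { ++g_live; }
  virtual ~Message() { --g_live; }
};
struct Query : Message {};

// Stand-in with the same contract as createMessage(): returns a new object
// the caller owns, or nullptr for an unknown type name.
Message* createMessage(const std::string& typeName)
{
  return typeName == "muduo.Query" ? new Query : nullptr;
}

typedef std::shared_ptr<Message> MessagePtr;

// Wrap the raw pointer at once, so no code path can forget to delete it.
MessagePtr createMessagePtr(const std::string& typeName)
{
  return MessagePtr(createMessage(typeName));
}
```

Because Message has a virtual destructor, the shared_ptr deletes the concrete Query correctly through the base pointer.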

Thread Safety

According to Google's documentation, the MessageFactory and DescriptorPool we use are thread-safe, and Message::New() is thread-safe as well. Moreover, DescriptorPool::FindMessageTypeByName() and Message::New() are both const member functions.

With the key problem solved, the remaining work is to design a protobuf transmission format that carries both the length and the message type.

Protobuf Transmission Format

Chen Shuo designed a simple format that carries the protobuf data together with its length and type information, with a checksum at the end of the message. The format is shown in the figure, in which each box is 32 bits wide.

Description using C struct pseudocode:

 struct ProtobufTransportFormat __attribute__ ((__packed__))
 {
   int32_t  len;
   int32_t  nameLen;
   char     typeName[nameLen];
   char     protobufData[len-nameLen-8];
   int32_t  checkSum; // adler32 of nameLen, typeName and protobufData
 };

Note that this format does not require 32-bit alignment; our decoder handles unaligned messages automatically.
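The following sketch encodes and decodes one frame in the layout above. It is not the code from codec.h: the byte order of the int32 fields is assumed to be big-endian (network byte order), the payload is an opaque string standing in for the output of SerializeToString(), and adler32 is computed by hand (matching zlib's adler32() started from 1) so that the example needs no extra library. encodeFrame and decodeFrame are hypothetical names; lengths are handled as uint32_t internally, which gives the same bytes on the wire as a non-negative int32_t.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Adler-32 checksum, same result as zlib's adler32() with initial value 1.
uint32_t adler32(const std::string& data)
{
  const uint32_t kMod = 65521;
  uint32_t a = 1, b = 0;
  for (unsigned char c : data)
  {
    a = (a + c) % kMod;
    b = (b + a) % kMod;
  }
  return (b << 16) | a;
}

void appendInt32(std::string& out, uint32_t v)  // big-endian
{
  for (int shift = 24; shift >= 0; shift -= 8)
    out.push_back(static_cast<char>((v >> shift) & 0xFF));
}

uint32_t readInt32(const std::string& in, std::size_t pos)
{
  uint32_t v = 0;
  for (std::size_t i = 0; i < 4; ++i)
    v = (v << 8) | static_cast<unsigned char>(in[pos + i]);
  return v;
}

// Build: len | nameLen | typeName '\0' | protobufData | checkSum
std::string encodeFrame(const std::string& typeName, const std::string& data)
{
  std::string body;
  appendInt32(body, static_cast<uint32_t>(typeName.size() + 1));  // nameLen counts '\0'
  body += typeName;
  body.push_back('\0');
  body += data;
  appendInt32(body, adler32(body));  // checksum covers nameLen, typeName, data
  std::string frame;
  appendInt32(frame, static_cast<uint32_t>(body.size()));  // len excludes itself
  return frame + body;
}

// Parse exactly one complete frame; false on a malformed frame or bad checksum.
bool decodeFrame(const std::string& frame, std::string& typeName, std::string& data)
{
  if (frame.size() < 12) return false;
  const uint32_t len = readInt32(frame, 0);
  if (len != frame.size() - 4 || len < 8) return false;
  const uint32_t nameLen = readInt32(frame, 4);
  if (nameLen < 1 || nameLen > len - 8) return false;
  const std::size_t checksumPos = frame.size() - 4;
  if (adler32(frame.substr(4, checksumPos - 4)) != readInt32(frame, checksumPos))
    return false;
  if (frame[8 + nameLen - 1] != '\0') return false;  // type name must end with '\0'
  typeName.assign(frame, 8, nameLen - 1);
  data.assign(frame, 8 + nameLen, len - nameLen - 8);  // len-nameLen-8 bytes of data
  return true;
}
```

decodeFrame() expects exactly one complete frame; on a real TCP stream the len field would first be used to split the byte stream into frames, and a sanity cap on len would be applied before buffering the body.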
Example

The figure shows the result of encoding a muduo.Query object in this format.

Design Decisions

Here are my considerations when designing this transfer format:

  • Signed int. The length fields in the message use only signed 32-bit int, not unsigned int, for portability, because Java has no unsigned types. Besides, protobuf is generally used for data smaller than 1 MB, so the extra range of unsigned int is of no use.
  • Checksum. TCP is a reliable transport protocol and Ethernet has a CRC-32 check, yet network transmission must still allow for data corruption, and for critical network applications a checksum is essential. For a compact binary format like protobuf, you cannot tell by inspection whether the data is intact, so a checksum is needed.
  • Adler-32 algorithm. I chose adler32 rather than the more common CRC-32 because it requires less computation and is faster, while its strength is close to that of CRC-32. In addition, zlib and java.util.zip both support it directly, so we do not have to implement it ourselves.
  • Type name ends with '\0'. This is for ease of troubleshooting: in packets captured with tcpdump, the type name is easy to spot with the naked eye, without having to count bytes according to nameLen. nameLen is still included for the receiver's convenience: it saves a strlen() call, trading space for time.
  • No version number. A major advantage of protobuf messages is that optional fields make protocol version numbers unnecessary (anyone who puts a version number into a protobuf message does not understand protobuf's design), which lets each side upgrade its program independently and makes the system easier to evolve. Adding a version number to my transmission format would therefore be superfluous, like drawing legs on a snake. For details, see page 57, "Message format selection", of my "Engineering development methods for distributed systems".
Sample Code

For simplicity, std::string is used as the output of encoding; this is for illustration only.

Encoding code: https://github.com/chenshuo/recipes/blob/master/protobuf/codec.h#L35

Decoding code: https://github.com/chenshuo/recipes/blob/master/protobuf/codec.h#L99

Test code: https://github.com/chenshuo/recipes/blob/master/protobuf/codec_test.cc

If the code above compiles but fails at run time with a "cannot open shared object file" error, run sudo ldconfig to fix it, provided that libprotobuf.so is in /usr/local/lib and /etc/ld.so.conf lists that directory.

$ make all   # if boost is installed, use make whole

$ ./codec_test
./codec_test: error while loading shared libraries: libprotobuf.so.6: cannot open shared object file: No such file or directory

$ sudo ldconfig

Integration with muduo

The muduo network library will integrate support for the transmission format described in this article (expected in version 0.1.9). I will also write a short article on converting between protobuf Message and muduo::net::Buffer. Encoding with muduo::net::Buffer is simpler than the std::string code above, since Buffer is a buffer class designed specifically for non-blocking network libraries.

In addition, we can write a codec that performs the conversion automatically, just like asio/chat/codec.h. Client code then receives Message objects directly and sends Message objects directly, without ever dealing with Buffer objects.

Dispatching

So far we have solved the automatic creation of messages. Another common task in network programming is dispatching different types of messages to different handler functions, which can also be done with Descriptor. I have implemented ProtobufDispatcherLite and ProtobufDispatcher in muduo, with which users can register their own handler for each message type. They are expected to be released in version 0.1.9:

Basic version, in which users do their own down-casting: https://github.com/chenshuo/recipes/blob/master/protobuf/dispatcher_lite.cc

Advanced version, which uses template techniques to save users some typing: https://github.com/chenshuo/recipes/blob/master/protobuf/dispatcher.cc
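Here is a stand-alone sketch of the idea behind a dispatcher that spares users the down-cast. It is not the code at the URLs above: to avoid depending on libprotobuf, it keys callbacks by a type-name string instead of by Descriptor, and Message, Query, Answer and Dispatcher are hypothetical.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>

// Hypothetical stand-ins for generated message classes.
struct Message { virtual ~Message() = default; virtual std::string typeName() const = 0; };
struct Query : Message
{
  std::string typeName() const override { return "muduo.Query"; }
  std::string questioner;
};
struct Answer : Message
{
  std::string typeName() const override { return "muduo.Answer"; }
  std::string answerer;
};

class Dispatcher
{
 public:
  // Registers a callback that receives the concrete type T. A temporary T()
  // is built only to ask for its type name, so T must be default-constructible.
  template <typename T>
  void registerCallback(std::function<void(const T&)> cb)
  {
    callbacks_[T().typeName()] = [cb](const Message& m) {
      cb(static_cast<const T&>(m));  // safe: selected by m's own type name
    };
  }

  // Returns false if no callback is registered for the message's type.
  bool dispatch(const Message& m) const
  {
    auto it = callbacks_.find(m.typeName());
    if (it == callbacks_.end())
      return false;
    it->second(m);
    return true;
  }

 private:
  std::map<std::string, std::function<void(const Message&)>> callbacks_;
};
```

The template wrapper does the cast once, in one place, so each user callback is written against its concrete message type.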

Muduo-based Protobuf RPC?

Google Protobuf also supports RPC, but unfortunately it provides only a framework, without any open-source networking code. muduo could fill this gap. I have not yet decided whether to make muduo support RPC with the protobuf message format; muduo still has plenty to do and I have many blog posts to write, so RPC will have to wait.

Note: Remote Procedure Call (RPC) has a narrow and a broad meaning. In the narrow sense it means ONC RPC, the mechanism used to implement NFS. In the broad sense, any network communication that takes the form of a function call can be called RPC, for example Java RMI, .NET Remoting, Apache Thrift, libevent RPC, XML-RPC, and so on.

(To be continued)
