Contents
- Principles
- Simple Test
- Key code for automatically creating a Message based on type name
- Thread Safety
- Example
- Design Decisions
- Message Dispatching
- Muduo-based Protobuf RPC?
Chen Shuo (giantchen_AT_gmail)
blog.csdn.net/Solstice  t.sina.com.cn/giantchen
This article solves the following problem: after receiving protobuf data, how do we automatically create the right protobuf Message object and deserialize into it? "Automatically" means that when a new protobuf message type is added to the program, this code does not need to change, and you do not need to register the message type yourself. Google Protobuf already has powerful reflection: it can create a Message object of a specific type from its type name, and we can use that directly.
This document assumes that the reader understands what Google Protocol Buffers is. This is not an introduction to protobuf.
This article uses C++ as the example language. Other languages have similar solutions.
The sample code for this article is in: https://github.com/chenshuo/recipes/tree/master/protobuf
Two Problems of using protobuf in Network Programming
Google Protocol Buffers (Protobuf) is an excellent library. It defines a compact, extensible binary message format that is especially well suited to network data transmission. It provides bindings for many languages, which greatly eases the development of distributed programs, because the system is no longer tied to a single programming language.
To use protobuf in network programming, you need to solve two problems:
- Length: protobuf-encoded data carries no length information and no terminator of its own, so the application must split the byte stream into messages correctly when sending and receiving.
- Type: protobuf-encoded data carries no type information, so the sender must transmit the type to the receiver, which then creates a Protobuf Message object of that type and deserializes into it.
The first problem is easy to solve: prepend a fixed-length header to each message. See, for example, the LengthHeaderCodec in "Muduo network programming example 2: Boost.Asio chat server"; the code is at http://code.google.com/p/muduo/source/browse/trunk/examples/asio/chat/codec.h
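A minimal sketch of the fixed-length-header idea follows, assuming a 4-byte big-endian (network byte order) length prefix and using std::string in place of a real network buffer. The names encodeFrame and tryDecodeFrame are hypothetical; this is not muduo's LengthHeaderCodec API.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: prepend a 4-byte big-endian length to the payload.
std::string encodeFrame(const std::string& payload)
{
  uint32_t n = static_cast<uint32_t>(payload.size());
  std::string frame;
  frame.push_back(static_cast<char>((n >> 24) & 0xFF));
  frame.push_back(static_cast<char>((n >> 16) & 0xFF));
  frame.push_back(static_cast<char>((n >> 8) & 0xFF));
  frame.push_back(static_cast<char>(n & 0xFF));
  frame += payload;
  return frame;
}

// Hypothetical helper: if `buffer` holds at least one complete frame, copy
// its payload into `payload`, erase the frame from `buffer` and return true.
// Otherwise leave `buffer` untouched and return false (wait for more bytes).
bool tryDecodeFrame(std::string& buffer, std::string& payload)
{
  const size_t kHeaderLen = 4;
  if (buffer.size() < kHeaderLen)
    return false;
  uint32_t n = 0;
  for (size_t i = 0; i < kHeaderLen; ++i)
    n = (n << 8) | static_cast<unsigned char>(buffer[i]);
  if (buffer.size() < kHeaderLen + n)
    return false;
  payload.assign(buffer, kHeaderLen, n);
  buffer.erase(0, kHeaderLen + n);
  return true;
}
```

Because TCP is a byte stream, a receiver would append whatever arrives to its buffer and call tryDecodeFrame in a loop until it returns false.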
The second problem is in fact also easy, because protobuf has built-in support for it. Oddly, though, a quick search online turns up many "shanzhai" (home-grown, knock-off) approaches.
Shanzhai practices
These approaches all prepend a header to the protobuf data that contains an int length and the type information. The type information comes in two main forms:
- Put int typeId in the header, and the receiver uses switch-case to select the corresponding message type and processing function;
- Put string typeName in the header, and the receiver uses the look-up table to select the corresponding message type and processing function.
Both methods have problems.
The first method requires every typeId to be unique and to map one-to-one to a protobuf message type. If protobuf messages are not widely used, for example when the sender and receiver are both programs you maintain yourself, typeId uniqueness is easy to guarantee with a version control system. If protobuf messages are widely used, for example throughout the company, and distributed programs developed by different departments may talk to each other, then a company-wide organization is needed to allocate typeIds, and every new message type must be registered with it, which is a hassle.
The second approach is better. Uniqueness of typeName is easier to guarantee, because you can prefix the package name (that is, use the message's fully qualified type name); as long as departments divide up the namespace in advance, there are no conflicts or duplicates. However, every time a message type is added, the look-up table's initialization code has to be modified by hand, which is troublesome.
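To make the maintenance burden concrete, here is a sketch of such a hand-written look-up table. Message, Query and Answer are toy stand-ins for generated protobuf classes (hypothetical, not the real protobuf types), and createByName is an illustrative name.

```cpp
#include <functional>
#include <map>
#include <memory>
#include <string>

// Toy stand-ins for generated protobuf classes (hypothetical, not protobuf).
struct Message
{
  virtual ~Message() {}
  virtual std::string typeName() const = 0;
};
struct Query : Message
{
  std::string typeName() const override { return "muduo.Query"; }
};
struct Answer : Message
{
  std::string typeName() const override { return "muduo.Answer"; }
};

// The hand-maintained look-up table of the second approach: every new
// message type needs a new entry here, or it cannot be decoded.
std::unique_ptr<Message> createByName(const std::string& typeName)
{
  static const std::map<std::string, std::function<std::unique_ptr<Message>()>> table = {
      {"muduo.Query", [] { return std::unique_ptr<Message>(new Query); }},
      {"muduo.Answer", [] { return std::unique_ptr<Message>(new Answer); }},
      // "muduo.Empty" was never added, so it is silently unsupported.
  };
  auto it = table.find(typeName);
  if (it == table.end())
    return nullptr;
  return it->second();
}
```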
In fact, there is no need to reinvent the wheel: protobuf already comes with a solution.
Automatically creating a Message object from its type name via reflection
Google Protobuf has powerful reflection: it can create a Message object of a specific type from the type name. Strangely, the official tutorial does not mention this usage explicitly. I suspect many people do not know about it, so I think it is worth a blog post.
The following figure shows the Protobuf class diagram drawn by Chen Shuo.
I suspect most people usually care about and use only the left half of the diagram: MessageLite, Message, generated message types (Person, AddressBook), and so on, and seldom notice the right half: Descriptor, DescriptorPool, MessageFactory.
In this diagram, the key role is played by the Descriptor class: each concrete message type corresponds to one Descriptor object. Although we do not call its functions directly, Descriptor acts as the bridge in "creating a Message object of a specific type from its type name". The red arrows in the diagram show how to create a concrete Message object from a type name.
Principles
The Protobuf Message class uses the prototype pattern: Message declares a virtual function New() that returns a new instance whose type is the same as the actual type of the object it is called on. In other words, given a Message* pointer, you can create an object of the same message type without knowing its concrete type.
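The prototype pattern itself can be shown without protobuf. The classes below are a toy model (hypothetical names, not the real protobuf classes): the base declares a virtual New(), each concrete type overrides it, and cloneType() creates an object of the right type without knowing what that type is.

```cpp
#include <string>

// Toy model of the prototype pattern used by protobuf's Message.
class Message
{
 public:
  virtual ~Message() {}
  virtual Message* New() const = 0;   // caller owns the result
  virtual std::string TypeName() const = 0;
};

class Query : public Message
{
 public:
  Query* New() const override { return new Query; }   // covariant return type
  std::string TypeName() const override { return "muduo.Query"; }
};

class Answer : public Message
{
 public:
  Answer* New() const override { return new Answer; }
  std::string TypeName() const override { return "muduo.Answer"; }
};

// Creates a fresh object of the same dynamic type as `m`.
Message* cloneType(const Message& m)
{
  return m.New();
}
```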
Each concrete message type has a default instance, which can be obtained through ConcreteMessage::default_instance() or MessageFactory::GetPrototype(const Descriptor*). So the problem becomes: 1. how to get a MessageFactory; 2. how to get a Descriptor*.
Of course, ConcreteMessage::descriptor() returns exactly the Descriptor* we want. But if we do not know ConcreteMessage, how can we call its static member function? This looks like a chicken-and-egg problem.
Our hero is DescriptorPool, which can find a Descriptor* by type name. We only need to find the appropriate DescriptorPool and call DescriptorPool::FindMessageTypeByName(const string& type_name). Doesn't that look promising?
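The real DescriptorPool and MessageFactory cannot be reproduced in a few lines, so the following is only a self-contained toy model of the lookup chain: type name, then descriptor, then prototype, then New(). Every name here (ToyPool, ToyDescriptor, ToyMessage, ToyQuery, createToyMessage) is hypothetical. The static registrar loosely mirrors how generated code makes its types known at program start-up, which is why no hand-edited table is needed; the real mechanism inside protobuf is more involved.

```cpp
#include <map>
#include <string>

class ToyMessage
{
 public:
  virtual ~ToyMessage() {}
  virtual ToyMessage* New() const = 0;
};

// Stand-in for Descriptor: the full type name plus the default instance.
struct ToyDescriptor
{
  std::string fullName;
  const ToyMessage* prototype;
};

// Stand-in for DescriptorPool::generated_pool() and
// MessageFactory::generated_factory(), merged into one registry.
class ToyPool
{
 public:
  static ToyPool* generated_pool()
  {
    static ToyPool pool;   // constructed on first use
    return &pool;
  }
  const ToyDescriptor* FindMessageTypeByName(const std::string& name) const
  {
    auto it = byName_.find(name);
    return it == byName_.end() ? nullptr : &it->second;
  }
  const ToyMessage* GetPrototype(const ToyDescriptor* d) const
  {
    return d ? d->prototype : nullptr;
  }
  void Register(const std::string& name, const ToyMessage* prototype)
  {
    byName_[name] = ToyDescriptor{name, prototype};
  }

 private:
  std::map<std::string, ToyDescriptor> byName_;
};

// What "generated code" would contain: the class plus a static registrar.
class ToyQuery : public ToyMessage
{
 public:
  ToyQuery* New() const override { return new ToyQuery; }
};

namespace {
const ToyQuery kQueryDefaultInstance{};
const bool kQueryRegistered =
    (ToyPool::generated_pool()->Register("muduo.Query", &kQueryDefaultInstance), true);
}

// Name -> descriptor -> prototype -> New(); nullptr if the name is unknown.
ToyMessage* createToyMessage(const std::string& typeName)
{
  const ToyDescriptor* d = ToyPool::generated_pool()->FindMessageTypeByName(typeName);
  const ToyMessage* prototype = ToyPool::generated_pool()->GetPrototype(d);
  return prototype ? prototype->New() : nullptr;
}
```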
Before solving the problem for good, let us run a simple test to check that I am right.
Simple Test
The example proto file is query.proto; see https://github.com/chenshuo/recipes/blob/master/protobuf/query.proto
syntax = "proto2";

package muduo;

message Query {
  required int64 id = 1;
  required string questioner = 2;

  repeated string question = 3;
}

message Answer {
  required int64 id = 1;
  required string questioner = 2;
  required string answerer = 3;

  repeated string solution = 4;
}

message Empty {
  optional int32 id = 1;
}
Query.questioner and Answer.answerer are the process identifiers in distributed systems that I discussed in a previous article.
The following code verifies the invariants among ConcreteMessage::default_instance(), ConcreteMessage::descriptor(), MessageFactory::GetPrototype() and DescriptorPool::FindMessageTypeByName():
https://github.com/chenshuo/recipes/blob/master/protobuf/descriptor_test.cc#L15
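#include "query.pb.h"   // generated by protoc from query.proto

#include <google/protobuf/descriptor.h>
#include <google/protobuf/message.h>

#include <cassert>
#include <iostream>
#include <string>
#include <typeinfo>

using namespace google::protobuf;
using std::cout;
using std::endl;

int main()
{
  typedef muduo::Query T;

  // The full name of T, i.e. "muduo.Query".
  std::string type_name(T::descriptor()->full_name());
  cout << type_name << endl;

  // Looking the name up in the generated pool yields T's own Descriptor.
  const Descriptor* descriptor =
      DescriptorPool::generated_pool()->FindMessageTypeByName(type_name);
  assert(descriptor == T::descriptor());
  cout << "FindMessageTypeByName() = " << descriptor << endl;
  cout << "T::descriptor() = " << T::descriptor() << endl;
  cout << endl;

  // The generated factory's prototype is T's default instance.
  const Message* prototype =
      MessageFactory::generated_factory()->GetPrototype(descriptor);
  assert(prototype == &T::default_instance());
  cout << "GetPrototype() = " << prototype << endl;
  cout << "T::default_instance() = " << &T::default_instance() << endl;
  cout << endl;

  // New() creates a distinct object of the same concrete type.
  T* new_obj = dynamic_cast<T*>(prototype->New());
  assert(new_obj != NULL);
  assert(new_obj != prototype);
  assert(typeid(*new_obj) == typeid(T::default_instance()));
  cout << "prototype->New() = " << new_obj << endl;
  cout << endl;
  delete new_obj;
}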
Key code for automatically creating a Message based on type name
All right, everything is ready. Let's put it into action:
- Use DescriptorPool::generated_pool() to get a DescriptorPool object that contains all protobuf message types linked into the program at compile time.
- Use DescriptorPool::FindMessageTypeByName() to find the Descriptor for the type name.
- Use MessageFactory::generated_factory() to get the MessageFactory object, which can create every protobuf message type linked into the program at compile time.
- Then use MessageFactory::GetPrototype() to get the default instance of that concrete message type.
- Finally, use prototype->New() to create the object.
For sample code, see https://github.com/chenshuo/recipes/blob/master/protobuf/codec.h#L69
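#include <google/protobuf/descriptor.h>
#include <google/protobuf/message.h>
#include <string>

using google::protobuf::Descriptor;
using google::protobuf::DescriptorPool;
using google::protobuf::Message;
using google::protobuf::MessageFactory;

// Returns a newly allocated message of the named type, or NULL if the type
// is not linked into the program. The caller owns the returned object.
Message* createMessage(const std::string& typeName)
{
  Message* message = NULL;
  const Descriptor* descriptor =
      DescriptorPool::generated_pool()->FindMessageTypeByName(typeName);
  if (descriptor)
  {
    const Message* prototype =
        MessageFactory::generated_factory()->GetPrototype(descriptor);
    if (prototype)
    {
      message = prototype->New();
    }
  }
  return message;
}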
Usage: https://github.com/chenshuo/recipes/blob/master/protobuf/descriptor_test.cc#L49
Message* newQuery = createMessage("muduo.Query");
assert(newQuery != NULL);
assert(typeid(*newQuery) == typeid(muduo::Query::default_instance()));
cout << "createMessage(\"muduo.Query\") = " << newQuery << endl;
delete newQuery;  // createMessage() returns an owning pointer
It works exactly as promised.
Note that createMessage() returns a pointer to a dynamically allocated object. The caller is responsible for releasing it; otherwise memory leaks. In muduo, I use shared_ptr<Message> to manage the lifetime of Message objects automatically.
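A small sketch of the ownership idea: wrap the raw owning pointer in a shared_ptr immediately, so the object is freed on every path. Message, Query and createMessageRaw are toy stand-ins (hypothetical) for protobuf's Message and the createMessage() function above; MessagePtr is an illustrative typedef.

```cpp
#include <memory>
#include <string>

// Toy stand-ins (hypothetical): a polymorphic base, and a factory that,
// like createMessage(), returns a raw owning pointer or NULL.
struct Message
{
  virtual ~Message() {}
};
struct Query : Message {};

Message* createMessageRaw(const std::string& typeName)
{
  return typeName == "muduo.Query" ? new Query : nullptr;
}

typedef std::shared_ptr<Message> MessagePtr;

// Take ownership at once, so the object is released on every path,
// including early returns and exceptions.
MessagePtr createMessagePtr(const std::string& typeName)
{
  return MessagePtr(createMessageRaw(typeName));
}
```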
Thread Safety
According to Google's documentation, the MessageFactory and DescriptorPool functions we use are thread-safe, and so is Message::New(). They are all const member functions.
With the key problem solved, the remaining work is to design a protobuf transmission format that carries the length and the message type.
Protobuf Transmission Format
Chen Shuo designed a simple format that carries the protobuf data together with its length and type information, with a checksum at the end of the message. In the figure, each box is 32 bits wide.
Description using C struct pseudocode:
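struct ProtobufTransportFormat __attribute__ ((__packed__))
{
  int32_t  len;                          // number of bytes following this field
  int32_t  nameLen;                      // length of typeName, including its '\0'
  char     typeName[nameLen];            // e.g. "muduo.Query\0"
  char     protobufData[len-nameLen-8];  // the serialized message bytes
  int32_t  checkSum;                     // adler32 of nameLen, typeName and protobufData
};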
Note that this format does not require 32-bit alignment; our decoder handles unaligned messages automatically.
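A minimal std::string-based sketch of this format follows. It assumes all int32 fields are big-endian (network byte order) and that len counts every byte after the len field itself, which is what protobufData[len-nameLen-8] implies. adler32() is written out by hand only to avoid a zlib dependency in the example; encodeTransport and decodeTransport are hypothetical names, not the functions in codec.h.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Adler-32 as defined in RFC 1950 (the same value zlib's adler32() computes).
uint32_t adler32(const std::string& data, uint32_t adler = 1)
{
  const uint32_t kMod = 65521;
  uint32_t a = adler & 0xFFFF;
  uint32_t b = (adler >> 16) & 0xFFFF;
  for (unsigned char c : data)
  {
    a = (a + c) % kMod;
    b = (b + a) % kMod;
  }
  return (b << 16) | a;
}

void appendInt32(std::string& out, uint32_t v)   // big-endian
{
  for (int shift = 24; shift >= 0; shift -= 8)
    out.push_back(static_cast<char>((v >> shift) & 0xFF));
}

uint32_t readInt32(const std::string& in, size_t pos)   // big-endian
{
  uint32_t v = 0;
  for (size_t i = 0; i < 4; ++i)
    v = (v << 8) | static_cast<unsigned char>(in[pos + i]);
  return v;
}

// Builds len | nameLen | typeName '\0' | protobufData | checkSum.
std::string encodeTransport(const std::string& typeName, const std::string& payload)
{
  std::string body;   // exactly the bytes covered by the checksum
  appendInt32(body, static_cast<uint32_t>(typeName.size() + 1));
  body += typeName;
  body.push_back('\0');
  body += payload;
  std::string out;
  appendInt32(out, static_cast<uint32_t>(body.size() + 4));   // + checkSum
  out += body;
  appendInt32(out, adler32(body));
  return out;
}

// Parses exactly one complete frame; returns false on any inconsistency.
bool decodeTransport(const std::string& frame, std::string& typeName, std::string& payload)
{
  if (frame.size() < 4)
    return false;
  int32_t len = static_cast<int32_t>(readInt32(frame, 0));
  // Smallest valid frame: nameLen (4) + "x\0" (2) + checkSum (4) = 10.
  if (len < 10 || static_cast<size_t>(len) + 4 != frame.size())
    return false;
  int32_t nameLen = static_cast<int32_t>(readInt32(frame, 4));
  if (nameLen < 2 || nameLen > len - 8 || frame[8 + nameLen - 1] != '\0')
    return false;
  std::string body = frame.substr(4, len - 4);
  if (adler32(body) != readInt32(frame, len))   // checkSum sits at offset len
    return false;
  typeName.assign(frame, 8, nameLen - 1);
  payload.assign(frame, 8 + nameLen, len - nameLen - 8);
  return true;
}
```

Because len is signed, the decoder can reject a negative or implausibly small value early, before trusting any offsets derived from it.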
Example
The result of encoding a muduo.Query object in this format is:
Design Decisions
Here are my considerations when designing this transfer format:
- Signed int. The length fields in the message use only signed 32-bit ints, not unsigned ones, for portability: Java has no unsigned types. Besides, protobuf is generally used for data smaller than 1 MB, so the extra range of an unsigned int buys nothing.
- Checksum. TCP is a reliable transport protocol and Ethernet has CRC-32 checks, but network transmission must still allow for data corruption, and for critical network applications a checksum is essential. With a compact binary format like protobuf, you cannot tell by eye whether the data is intact, so a checksum is needed.
- Adler32 algorithm. I chose adler32 rather than the common CRC-32 because it needs less computation and is faster, while its strength is close to that of CRC-32. In addition, zlib and java.util.zip both support it directly, so we do not need to implement it ourselves.
- Type name ends with '\0'. This makes troubleshooting easier: for example, in packets captured with tcpdump the type name is easy to read with the naked eye, without using nameLen to split the bytes into fields. nameLen is still included to make things easier for the receiver, saving a strlen() call by trading space for time.
- No version number. A major advantage of protobuf messages is that optional fields remove the need for protocol version numbers (anyone who puts a version number into a protobuf message does not understand protobuf's design). This lets each side upgrade its program independently and makes the system easier to evolve. Adding a version number to my transmission format would therefore be superfluous. For details, see page 57 of my talk "Engineering Development Methods for Distributed Systems": choosing a message format.
Sample Code
For simplicity, std::string is used as the output buffer; this is only an example.
Encoding (packing) code: https://github.com/chenshuo/recipes/blob/master/protobuf/codec.h#L35
Decoding (unpacking) code: https://github.com/chenshuo/recipes/blob/master/protobuf/codec.h#L99
Test code: https://github.com/chenshuo/recipes/blob/master/protobuf/codec_test.cc
If the code above compiles but fails at runtime with a "cannot open shared object file" error, run sudo ldconfig, provided that libprotobuf.so is in /usr/local/lib and that directory is listed in /etc/ld.so.conf.
$ make all    # if boost is installed, make whole
$ ./codec_test
./codec_test: error while loading shared libraries: libprotobuf.so.6: cannot open shared object file: No such file or directory
$ sudo ldconfig
Integration with muduo
The muduo network library will integrate support for the transmission format described in this article (expected in version 0.1.9). I will also write a short article on converting between protobuf Message and muduo::net::Buffer. Encoding into muduo::net::Buffer is simpler than the std::string code above, because Buffer is a buffer class designed specifically for a non-blocking network library.
In addition, we can write a codec that performs the conversion automatically, just like asio/chat/codec.h. Client code then receives Message objects directly and sends Message objects directly, without dealing with Buffer objects.
Message Dispatching
We have now solved the automatic creation of messages. Another common task in network programming is dispatching different types of messages to different handler functions, which can also be done with Descriptor. I have implemented ProtobufDispatcherLite and ProtobufDispatcher in muduo; users can register their own handler for each message type. They are expected to be released in version 0.1.9:
Basic edition, where users must do their own downcasting: https://github.com/chenshuo/recipes/blob/master/protobuf/dispatcher_lite.cc
Advanced edition, which uses template tricks to save users some typing: https://github.com/chenshuo/recipes/blob/master/protobuf/dispatcher.cc
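The core of the basic edition can be sketched without protobuf. In this toy version (all names hypothetical), handlers are looked up by the message's full type name, whereas the real dispatcher would more naturally key on the Descriptor*; as in the "Lite" variant, each handler receives the base type and downcasts it itself.

```cpp
#include <functional>
#include <map>
#include <memory>
#include <string>

// Toy stand-ins (hypothetical, not protobuf classes).
struct Message
{
  virtual ~Message() {}
  virtual std::string typeName() const = 0;
};
struct Query : Message
{
  std::string typeName() const override { return "muduo.Query"; }
  std::string question;
};
struct Answer : Message
{
  std::string typeName() const override { return "muduo.Answer"; }
};

typedef std::shared_ptr<Message> MessagePtr;

// "Lite" style: callbacks take the base type and must downcast themselves.
class DispatcherLite
{
 public:
  typedef std::function<void(const MessagePtr&)> Callback;

  explicit DispatcherLite(const Callback& defaultCb) : defaultCallback_(defaultCb) {}

  void registerCallback(const std::string& typeName, const Callback& cb)
  {
    callbacks_[typeName] = cb;
  }

  // Route to the registered handler, or to the default one if none exists.
  void onMessage(const MessagePtr& message) const
  {
    auto it = callbacks_.find(message->typeName());
    if (it != callbacks_.end())
      it->second(message);
    else
      defaultCallback_(message);
  }

 private:
  std::map<std::string, Callback> callbacks_;
  Callback defaultCallback_;
};
```

A handler registered for "muduo.Query" would call std::dynamic_pointer_cast<Query>(message) to get at the concrete fields; the advanced edition's templates exist to remove exactly that step.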
Muduo-based Protobuf RPC?
Google Protobuf also supports RPC. Unfortunately, it provides only a framework and no open-source networking code; muduo can fill this gap. I have not decided whether muduo will support RPC in the protobuf message format. muduo still has many things to do, and I have many blog posts to write, so let's talk about RPC later.
Note: Remote Procedure Call (RPC) has a broad and a narrow meaning. In the narrow sense, it means ONC RPC, the thing used to implement NFS. In the broad sense, any network communication that looks like a function call can be called RPC, for example Java RMI, .NET Remoting, Apache Thrift, libevent RPC, XML-RPC and so on.
(To be continued)