Experience with the Two Most Commonly Used C++ Serialization Schemes (Protobuf and Boost.Serialization)


Contents


1. What is serialization?

2. Why serialize? What are the advantages?

3. Four ways to serialize C++ objects

4. Experience using the two most commonly used schemes


Body


1. What is serialization?

When writing applications, programmers often need to take some of a program's in-memory data and write it to a file, or send it over the network to another computer. The process of converting program data into a format that can be stored and transmitted is called serialization, and the reverse process is called deserialization.

In short, serialization converts the state of an object instance into a form that can be persisted or transmitted. Its counterpart, deserialization, reconstructs an object from such a stream. Together, the two make it easy to store and transfer data. For example, an object can be serialized and then sent between a client and a server over the Internet using HTTP.

Summary

Serialization: Turns an object into a byte stream.

Deserialization: Restores the original object from the byte stream.
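To make the two definitions concrete, here is a minimal hand-written sketch that uses only the standard library. The Person type and the helper names are hypothetical, and the byte layout (little-endian 32-bit integers followed by a length-prefixed string) is one arbitrary choice for illustration, not a standard format.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical example type.
struct Person {
    std::uint32_t age = 0;
    std::string name;
};

// Append a 32-bit value in little-endian order, so the bytes do not depend on the host CPU.
void put_u32(std::vector<std::uint8_t>& out, std::uint32_t v) {
    for (int i = 0; i < 4; ++i) out.push_back(static_cast<std::uint8_t>(v >> (8 * i)));
}

// Read a little-endian 32-bit value at `pos` and advance `pos` past it.
std::uint32_t get_u32(const std::vector<std::uint8_t>& in, std::size_t& pos) {
    if (pos + 4 > in.size()) throw std::runtime_error("truncated input");
    std::uint32_t v = 0;
    for (int i = 0; i < 4; ++i) v |= static_cast<std::uint32_t>(in[pos + i]) << (8 * i);
    pos += 4;
    return v;
}

// Serialization: object -> byte stream.
std::vector<std::uint8_t> serialize(const Person& p) {
    std::vector<std::uint8_t> out;
    put_u32(out, p.age);
    put_u32(out, static_cast<std::uint32_t>(p.name.size()));  // length prefix
    out.insert(out.end(), p.name.begin(), p.name.end());
    return out;
}

// Deserialization: byte stream -> object.
Person deserialize(const std::vector<std::uint8_t>& in) {
    std::size_t pos = 0;
    Person p;
    p.age = get_u32(in, pos);
    const std::uint32_t len = get_u32(in, pos);
    if (len > in.size() - pos) throw std::runtime_error("truncated input");
    const auto first = in.begin() + static_cast<std::ptrdiff_t>(pos);
    p.name.assign(first, first + static_cast<std::ptrdiff_t>(len));
    return p;
}
```

Serializing a Person with age 30 and name "Ada" yields 11 bytes (4 + 4 + 3), and deserializing those bytes gives back an equal object.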


2. Why serialize? What are the advantages?

In short, object serialization is commonly used for two purposes:

(1) Storing objects on disk so they can be deserialized later

(2) Transmitting objects over the network as byte sequences


What are the advantages of object serialization? The convenience and flexibility it brings to network transmission go without saying. Here is a situation that comes up often: you have a data structure whose contents are produced from a large amount of other data by a very complex algorithm. Because the data volume is large and the algorithm is complex, generating the data structure can take a long time (possibly hours or even days). If later computations depend on it, every run during debugging must regenerate it first, which is clearly very costly. If you are sure the generating algorithm will not change (or will rarely change), you can use serialization to write the generated data structure to disk once. Every later run then only needs to read the object back from disk, which takes no longer than reading a file. That is far faster, and it saves a great deal of development time.
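The caching idea can be sketched with the standard library alone. build_table() is a hypothetical stand-in for the expensive algorithm, and the cache file simply stores the element count followed by the raw elements. This layout assumes the file is read back on a machine with the same byte order and integer sizes, and the cache must be deleted whenever the generating algorithm changes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical stand-in for an expensive computation that might run for hours.
std::vector<std::int64_t> build_table(std::size_t n) {
    std::vector<std::int64_t> table(n);
    for (std::size_t i = 0; i < n; ++i)
        table[i] = static_cast<std::int64_t>(i) * static_cast<std::int64_t>(i);
    return table;
}

// Write the element count followed by the raw elements (host byte order).
bool save_table(const std::string& path, const std::vector<std::int64_t>& table) {
    std::ofstream out(path, std::ios::binary);
    const std::uint64_t n = table.size();
    out.write(reinterpret_cast<const char*>(&n), sizeof n);
    out.write(reinterpret_cast<const char*>(table.data()),
              static_cast<std::streamsize>(n * sizeof(std::int64_t)));
    return static_cast<bool>(out);
}

// Returns false if the cache file is missing or truncated.
bool load_table(const std::string& path, std::vector<std::int64_t>& table) {
    std::ifstream in(path, std::ios::binary);
    std::uint64_t n = 0;
    if (!in.read(reinterpret_cast<char*>(&n), sizeof n)) return false;
    table.resize(static_cast<std::size_t>(n));
    in.read(reinterpret_cast<char*>(table.data()),
            static_cast<std::streamsize>(n * sizeof(std::int64_t)));
    return static_cast<bool>(in);
}

// Load the table from the cache if possible; otherwise compute it once and save it.
// `n` is only used when the table actually has to be computed.
std::vector<std::int64_t> get_table(const std::string& cache_path, std::size_t n, bool& computed) {
    std::vector<std::int64_t> table;
    if (load_table(cache_path, table)) {
        computed = false;
        return table;
    }
    table = build_table(n);
    save_table(cache_path, table);
    computed = true;
    return table;
}

// Delete the cache, e.g. after the generating algorithm has changed.
void clear_cache(const std::string& path) { std::remove(path.c_str()); }
```

The first call to get_table() pays for the computation; every later call (including later runs of the program) just reads the file.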


3. Four ways to serialize C++ objects

There are generally four ways to serialize C++ objects, described below:


3.1 Google Protocol Buffers (protobuf)

Google Protocol Buffers (GPB) is a data encoding format used internally at Google, designed to replace XML for data exchange. It can be used to serialize and deserialize data. Its main features are:

    • Efficient
    • Language-neutral (C++, Java, Python, and others)
    • Extensible

Official documentation


3.2 Boost.Serialization

Boost.Serialization can save a program's data structures and later reconstruct equivalent structures from them, storing them as binary data, text, XML, or other user-defined formats. The library has the following attractive features:

    • Portable code (the implementation relies only on ANSI C++).
    • Deep pointer save and restore.
    • Serialization of STL containers and other commonly used templates.
    • Portable data.
    • Non-intrusive.

3.3 MFC Serialization

The serialization methods in MFC can be used under the Windows platform. MFC provides built-in support for serialization in the CObject class. Therefore, all classes derived from CObject can take advantage of the CObject serialization protocol.

Introduction on MSDN


3.4 .NET Framework

The .NET runtime provides a mechanism for serializing user-defined types. In this process, the public and private fields of an object, together with the name of its class (including the assembly that contains the class), are converted into a byte stream, which is then written to a data stream. When the object is later deserialized, an exact copy of the original object is created.


3.5 Brief Summary

Each of these serialization schemes has its own strengths, weaknesses, and suitable scenarios. The MFC and .NET Framework approaches are narrow in scope: they work only on Windows, and the .NET approach additionally requires the .NET runtime. Reference 1 compares the first three schemes in terms of serialization time, deserialization time, and the size of the resulting data file, and reaches the following conclusions (for reference only):

    • Google Protocol Buffers is more efficient, but data objects must be defined in advance and compiled with protoc. It suits internal use where efficiency matters and where it is acceptable to define your own message types.
    • Boost.Serialization is flexible and easy to use, and supports the standard C++ containers.
    • MFC is comparatively less efficient, but it is the most convenient choice when working with Visual Studio (MSVC) on Windows.

Considering portability across platforms, applicability, and efficiency, we recommend Google's protobuf and Boost.Serialization. The rest of this article describes my experience with both schemes and the points to watch out for.


4. Experience using the two most commonly used schemes

I will not write up detailed usage instructions and demo programs for these two schemes, because plenty of excellent material already exists (see the references at the end). Here I only share some of my own experience, so that you can make an informed choice of serialization scheme and avoid wasting time on the wrong one.


4.1 Google Protocol Buffers

Protobuf is efficient, both in how easy it is to install and in run-time performance. It can be used for serialization not only in C++ but also in Java and Python, so it has a wide range of applications. However, there are several issues to be aware of when using it:


(1) Protobuf supports a limited set of data types

Protobuf is lightweight, so it does not support many data types. The table below lists the basic (scalar) types it supports. They generally cover common needs, but before choosing this scheme, check that the types you need are supported so you do not waste effort. The table is also worth keeping as a reference when defining your own types.

.proto type | C++ type | Notes
----------- | -------- | -----
double      | double   |
float       | float    |
int32       | int32    | Variable-length encoding. Inefficient for negative numbers; if the field is likely to hold negative values, use sint32 instead.
int64       | int64    | Same as int32.
uint32      | uint32   | Variable-length encoding.
uint64      | uint64   | Same as uint32.
sint32      | int32    | Variable-length encoding for signed values; encodes negative numbers more efficiently than int32.
sint64      | int64    | Same as sint32.
fixed32     | uint32   | Always 4 bytes. More efficient than uint32 if values are often greater than 2^28.
fixed64     | uint64   | Always 8 bytes. More efficient than uint64 if values are often greater than 2^56.
sfixed32    | int32    | Always 4 bytes.
sfixed64    | int64    | Always 8 bytes.
bool        | bool     |
string      | string   | Must contain UTF-8 encoded or 7-bit ASCII text.
bytes       | string   | May contain any arbitrary sequence of bytes.
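The notes about variable-length encoding in the table can be checked with a short sketch of protobuf's base-128 varint and ZigZag encodings. This is a standalone re-implementation written for illustration, not code taken from the protobuf library. It shows why a negative int32 costs 10 bytes while a small negative sint32 stays at 1 byte, and why a value of 2^28 or more takes 5 bytes as a varint but always 4 bytes as fixed32.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Base-128 varint: 7 payload bits per byte, with the high bit set on every byte except the last.
std::vector<std::uint8_t> encode_varint(std::uint64_t v) {
    std::vector<std::uint8_t> out;
    while (v >= 0x80) {
        out.push_back(static_cast<std::uint8_t>(v | 0x80));  // low 7 bits + continuation bit
        v >>= 7;
    }
    out.push_back(static_cast<std::uint8_t>(v));
    return out;
}

// int32 fields: a negative value is sign-extended to 64 bits before varint encoding,
// so it always occupies the maximum of 10 bytes.
std::vector<std::uint8_t> encode_int32(std::int32_t v) {
    return encode_varint(static_cast<std::uint64_t>(static_cast<std::int64_t>(v)));
}

// sint32 fields: ZigZag maps 0, -1, 1, -2, ... to 0, 1, 2, 3, ... so small negatives stay short.
// (v >> 31 is an arithmetic shift: all ones for negative v, all zeros otherwise.)
std::uint32_t zigzag32(std::int32_t v) {
    return (static_cast<std::uint32_t>(v) << 1) ^ static_cast<std::uint32_t>(v >> 31);
}

std::vector<std::uint8_t> encode_sint32(std::int32_t v) {
    return encode_varint(zigzag32(v));
}
```

For example, 300 encodes as the two bytes 0xAC 0x02, -1 takes 10 bytes as int32 but 1 byte as sint32, and 2^28 - 1 fits in 4 varint bytes while 2^28 needs 5.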


(2) Protobuf does not support two-dimensional arrays (or double pointers) or the serialization of STL containers

This limitation is significant, because two-dimensional arrays, double pointers, and STL containers (set, list, map, etc.) appear very frequently in even slightly complex data structures or classes. Protobuf's simple design supports only one-dimensional arrays and pointers (via the repeated modifier); you cannot write repeated repeated to get a two-dimensional array, and STL containers cannot be serialized directly. So before choosing this scheme, make sure your data structures do not contain these unsupported types. (Newer protobuf versions, 3.0 and later, do provide a map<K, V> field type, and a two-dimensional array can be modeled as a repeated field of a nested message that itself contains a repeated field, at the cost of copying the data in and out.)
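Because only one-dimensional repeated fields are available, a common workaround is to flatten a rectangular two-dimensional array into one dimension plus its sizes before copying it into a message. The sketch below does the flattening in plain C++; FlatMatrix and its fields are hypothetical and merely mirror what a message with two integer size fields and one repeated int32 field would hold.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical mirror of a message like { uint32 rows; uint32 cols; repeated int32 data; }.
struct FlatMatrix {
    std::uint32_t rows = 0;
    std::uint32_t cols = 0;
    std::vector<std::int32_t> data;  // row-major, rows * cols entries
};

// 2-D -> 1-D: only rectangular matrices are supported.
FlatMatrix flatten(const std::vector<std::vector<std::int32_t>>& m) {
    FlatMatrix f;
    f.rows = static_cast<std::uint32_t>(m.size());
    f.cols = m.empty() ? 0 : static_cast<std::uint32_t>(m[0].size());
    for (const auto& row : m) {
        if (row.size() != f.cols) throw std::invalid_argument("matrix must be rectangular");
        f.data.insert(f.data.end(), row.begin(), row.end());
    }
    return f;
}

// 1-D -> 2-D: rebuild the rows from the stored dimensions.
std::vector<std::vector<std::int32_t>> unflatten(const FlatMatrix& f) {
    if (f.data.size() != static_cast<std::size_t>(f.rows) * f.cols)
        throw std::invalid_argument("data size does not match rows * cols");
    std::vector<std::vector<std::int32_t>> m(f.rows);
    for (std::uint32_t r = 0; r < f.rows; ++r) {
        const auto begin = f.data.begin() + static_cast<std::ptrdiff_t>(r) * f.cols;
        m[r].assign(begin, begin + f.cols);
    }
    return m;
}
```

Jagged arrays need a per-row length instead of a single cols value, which is where the nested-message approach becomes the simpler option.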


(3) Protobuf renames nested classes

Protobuf supports nested types, that is, defining a message type inside another message. Note, however, that the C++ class generated for a nested type is not named exactly as you defined it; the outer message's name is added as a prefix. Here is a simple example:

    message DFA {
      required int32 _size = 1;
      message Accept_pair {
        required bool is_accept_state = 1;
        required bool is_strict_end = 2;
        optional string app_name = 3;
      }
      repeated Accept_pair accept_states = 2;
    }

The class generated for the nested Accept_pair is not named Accept_pair but DFA_Accept_pair (the generated DFA class also declares Accept_pair as an alias, so DFA::Accept_pair can still be used in C++ code). If you want to keep the class name unchanged, define Accept_pair at the top level, alongside DFA.


4.2 Boost.Serialization

Boost is a very large and feature-rich collection of libraries, and serialization is only a small part of it. Yet to use Boost.Serialization you typically build and install the whole Boost distribution, which costs a lot of disk space and time. In return, its serialization support is very powerful: it handles two-dimensional arrays and pointers as well as STL containers, and you do not need to redefine your classes in a special format. Its non-intrusive design lets you serialize existing classes without modifying them, which is a very nice property. However, given the size and complexity of the installation, it is not worth it if all you need is simple serialization; consider it only when protobuf does not meet your needs.


(1) Problems encountered when installing Boost

Installing Boost is time-consuming, and if errors occur along the way it takes a lot of patience. You can download the Boost source code from the official website and build it; for installation steps, see online guides or the references at the end. Some points to note:

Note 1: Install with root privileges; otherwise the installation fails with a permission error.

Note 2: Building Boost depends on other packages, usually Python, bzip2, and zlib. The corresponding development packages are:

On Ubuntu:

zlib1g-dev
libbz2-dev
libpython2.7-dev (and libpython3.3-dev)

On Fedora/Red Hat:

zlib-devel
bzip2-devel
python-devel (and python3-devel)

Missing dependencies are also the main source of errors during installation.

Error 1: If the Python development files are missing, the build may fail with "fatal error: pyconfig.h: No such file or directory. compilation terminated." To fix it:

Fedora: sudo yum install python-devel

Ubuntu: sudo apt-get install python-dev


Error 2: The build fails with "libs/iostreams/src/bzip2.cpp:20:56: fatal error: bzlib.h: No such file or directory". To fix it:

Fedora: sudo yum install bzip2-devel

Ubuntu or Debian: sudo apt-get install libbz2-dev


On Ubuntu, these problems can usually be avoided by installing the prebuilt packages with sudo apt-get install libboost-all-dev, although this does not always work.


(2) After a successful installation, if no installation prefix was specified, Boost is installed under /usr/local/lib and /usr/local/include. When compiling against it, pass the include and library paths with -I and -L, like this:

g++ -o test boost_test.cpp -I$boost_include -L$boost_lib -lboost_serialization

If typing this every time is too cumbersome, you can copy the libraries and headers you need into the system's default search paths (alternatively, GCC's CPLUS_INCLUDE_PATH and LIBRARY_PATH environment variables can point at these directories):

sudo cp /usr/local/lib/libboost_serialization.* /usr/lib
sudo cp -r /usr/local/include/boost /usr/include

Then you can compile directly with g++ -o test boost_test.cpp -lboost_serialization.

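Note: Boost provides two serialization libraries, libboost_serialization and libboost_wserialization. What is the difference between them?

The 'w' stands for wide characters (wchar_t): libboost_wserialization contains the archives that work on wide-character streams, such as text_woarchive and xml_woarchive, while libboost_serialization contains the ordinary narrow-character archives.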


(3) Where Boost falls short

    • Pointers to primitive types, such as int *array, are hard to serialize. The official documentation says:
      "By default, data types designated primitive by the implementation level class serialization trait are never tracked. If it is desired to track a shared primitive object through a pointer (e.g. a long used as a reference count), it should be wrapped in a class/struct so that it is an identifiable type. The alternative of changing the implementation level of a long would affect all longs serialized in the whole program - probably not what one would intend."
      In other words, to serialize a pointer to a primitive type you must wrap the value in a struct or class so that it becomes a class type, and then serialize that. This is cumbersome, and the need arises often; because of how the serialization mechanism is designed, Boost currently does not handle pointers to primitive types well.
    • Variable-length arrays cannot be serialized; the compiler reports an error because a variable-length array cannot be used as a template argument type.


(4) If you need an array of objects, for example an array of two objects of class A, you must declare it as A a[2] rather than through a pointer, A *a = new A[2]. A pointer is treated by default as pointing to a single A object, so only the first object's value is stored and the remaining ones are lost.


(5) Boost's much-praised non-intrusive design also comes with a condition: if you do not want to modify the original class, its data members must be public. This is easy to understand, since the serialization function is defined outside the class and has to be able to access those members. Of course, this also exposes the class's internal structure to other code, which is a drawback. The condition is often hard to meet, because the data members we define are usually private. In that case, if you still want to change the class as little as possible, you need to add a friend declaration to the class that grants the serialization library access to its private members.

This is better than making the members public.


References
    • Three common C++ serialization schemes in comparison
    • A preliminary discussion on the serialization of C++ objects
    • Official introduction: Google Protocol Buffers
    • Google Protocol Buffers Chinese tutorial
    • Application and analysis of Protocol Buffers
    • Topsy Protocol Buffers
    • Building and installing the Boost library on Linux platforms (Yesky)
    • Building and installing the Boost library on Linux platforms (Sina blog)
    • Boost Serialization Library
    • Boost C++ Libraries: Serialization
    • Boost-Serialization (serialization)
    • Serialization with Boost.Serialization
