Simple analysis of C++ serialization

Last Update:2016-08-14 Source: Internet

Author: User

Tags comparison table soap object serialization


1. What is the serialization of a program?

When we write an application, we often need to take some of the program's in-memory data and write it to a file, or send it to another computer over the network. The process of converting program data into a format that can be stored and transmitted is called serialization, and the inverse process is called deserialization.

In a C++ program, serialization is the process of converting the state information of an object into a form that can be stored or transmitted. During serialization, an object writes its current state to a temporary or persistent store. Later, the object can be recreated by reading, or deserializing, its state from the store.

In simple terms, serialization translates the state of an object instance into a format that can be persisted or transmitted. Its counterpart is deserialization, which reconstructs an object from a stream. Together, these two processes make it easy to store and transfer data. For example, you can serialize an object and then use HTTP to transfer it between a client and a server over the Internet.
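As a minimal sketch of this round trip, the following code uses only the C++ standard library. The `Point` type and the `serialize`, `deserialize`, and `demo_round_trip` functions are hypothetical names chosen for illustration; they are not part of any library discussed in this article.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical example type; not from the original article.
struct Point {
    int x;
    int y;
};

// Serialization: write the object's state as whitespace-separated text.
std::string serialize(const Point& p) {
    std::ostringstream out;
    out << p.x << ' ' << p.y;
    return out.str();
}

// Deserialization: rebuild an object from the text produced above.
Point deserialize(const std::string& text) {
    std::istringstream in(text);
    Point p{};
    in >> p.x >> p.y;
    return p;
}

// Round trip: the restored object has the same state as the original.
void demo_round_trip() {
    const Point original{3, -7};
    const std::string wire = serialize(original);
    const Point copy = deserialize(wire);
    assert(copy.x == original.x && copy.y == original.y);
}
```

The string returned by `serialize` could just as well be written to a file or sent over a socket; the receiving side only needs `deserialize` and the same field order.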

2. Why use serialization?

First, by serializing, you can keep the state of an object on a storage medium and later recreate an exact copy of it. We often need to save an object's field values to disk and retrieve this data at a later time. Although this can be done without serialization, doing so is often cumbersome and error-prone, and it becomes more complex when you need to track a hierarchy of objects. Imagine writing a large business application with a large number of objects, where programmers have to write code for each object to save its fields and properties to disk and restore them from disk. Serialization provides a quick and easy way to achieve this goal.

Second, by serializing, you can pass objects from one place to another. Generally, an object is valid only in the application domain in which it was created. Serialization, however, can send an object by value from one application domain to another. For example, serialization can be used to save session state in ASP.NET and to copy objects to the clipboard in Windows Forms. One of the most important uses of serialization is transferring objects over the network.

3. What situations require serialization?

Persist the custom object in some form of storage.

4. Which methods can be used for serialization?

Before systematic serialization methods appeared, programmers who wanted to persist or transfer objects of a custom class used the following methods:

(1). The programmer implements the saving of object data by hand, writing code for each object to store its data.

(2). The object is cast to char* or void* data, and that data is then transferred.

The following section compares serialization with these two methods in terms of versatility, ease of use, flexibility, and portability.

5. What are the advantages of serialization?

(1). Versatility:

If programmers implement the saving of object data themselves, they have to write different code for each class of objects. The workload is large and the code is not general. Serialization provides a uniform procedure that is essentially the same for every class, which improves the generality of the code.

If you cast an object to char* or void* data for transmission, you must know the size of the object beforehand in order to allocate the array. However, if the object contains a variable-length data structure, the exact size of the object data cannot be known and can only be estimated. If the estimate is too small, the buffer overflows and the program may crash; if the estimate is too large, space is wasted. Serialization handles variable-length data structures well.
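To make the variable-length problem concrete, here is a small sketch that uses only the standard library. It writes a length prefix in front of the characters of a string, so the buffer always has exactly the size the data needs. The names `Record`, `put_i32`, `get_i32`, `save`, and `load` are hypothetical, and the code assumes a well-formed buffer (it performs no bounds checking).

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical record with a variable-length member; sizeof(Record) says
// nothing about how many characters `name` currently holds.
struct Record {
    std::int32_t id;
    std::string name;
};

// Append the raw bytes of a fixed-size value to the buffer.
// Host byte order is used, so this sketch is not portable across CPUs.
void put_i32(std::vector<char>& buf, std::int32_t v) {
    char bytes[sizeof v];
    std::memcpy(bytes, &v, sizeof v);
    buf.insert(buf.end(), bytes, bytes + sizeof v);
}

// Read a fixed-size value back and advance the read position.
std::int32_t get_i32(const std::vector<char>& buf, std::size_t& pos) {
    std::int32_t v = 0;
    std::memcpy(&v, buf.data() + pos, sizeof v);
    pos += sizeof v;
    return v;
}

// Layout: id, length of name, then the characters of name.
std::vector<char> save(const Record& r) {
    std::vector<char> buf;
    put_i32(buf, r.id);
    put_i32(buf, static_cast<std::int32_t>(r.name.size()));
    buf.insert(buf.end(), r.name.begin(), r.name.end());
    return buf;
}

// Assumes the buffer was produced by save(); no bounds checks are done.
Record load(const std::vector<char>& buf) {
    std::size_t pos = 0;
    Record r;
    r.id = get_i32(buf, pos);
    const std::int32_t len = get_i32(buf, pos);
    r.name.assign(buf.data() + pos, static_cast<std::size_t>(len));
    return r;
}

void demo_record_round_trip() {
    const Record original{42, "a name of any length"};
    const Record copy = load(save(original));
    assert(copy.id == original.id && copy.name == original.name);
}
```

Because the length travels with the data, neither side has to guess a buffer size in advance.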

(2). Ease of use:

If programmers implement the saving of object data themselves, they must write save code for each data structure in the class. For simple data structures this is manageable, but for multi-layered data structures the code becomes more and more complex, and therefore cumbersome and error-prone. Serialization provides ready-made persistence for simple data types as well as for strings, STL containers, pointers, and so on. You simply invoke it, which is very convenient.

(3). Flexibility:

Serialization provides several formats for persisting object data, such as saving in a simple text format, saving in XML format, saving in SOAP format, saving in binary format, and so on. It also provides a variety of ways to save persisted objects, such as saving to a string, saving to a file, etc., with great flexibility.

(4). Portability

When an object is cast to char* for transmission, you need to pay attention to CPU byte order. If the byte order of the sending machine differs from that of the destination machine, the data read by the destination machine cannot be restored to the original object. This can be solved by converting the local byte order to network byte order before sending and converting back to the local byte order after receiving, but it is one more thing the programmer has to think about. Serialization hides the difference in byte order, which makes persisted objects more portable.

Serialization is also a good way to move data across platforms.
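The byte-order conversion mentioned above can be sketched without any platform-specific calls. The helper names `store_be32`, `load_be32`, and `demo_byte_order` are hypothetical; the technique shown is simply writing the most significant byte first (network byte order) by shifting the value.

```cpp
#include <cassert>
#include <cstdint>

// Write a 32-bit value in big-endian (network) order, most significant byte
// first. The shifts operate on the value, not on its memory layout, so the
// bytes produced are the same on any CPU.
void store_be32(std::uint32_t v, unsigned char out[4]) {
    out[0] = static_cast<unsigned char>(v >> 24);
    out[1] = static_cast<unsigned char>(v >> 16);
    out[2] = static_cast<unsigned char>(v >> 8);
    out[3] = static_cast<unsigned char>(v);
}

// Rebuild the value from four big-endian bytes.
std::uint32_t load_be32(const unsigned char in[4]) {
    return (static_cast<std::uint32_t>(in[0]) << 24) |
           (static_cast<std::uint32_t>(in[1]) << 16) |
           (static_cast<std::uint32_t>(in[2]) << 8) |
            static_cast<std::uint32_t>(in[3]);
}

void demo_byte_order() {
    unsigned char wire[4];
    store_be32(0x11223344u, wire);
    assert(wire[0] == 0x11 && wire[3] == 0x44);
    assert(load_be32(wire) == 0x11223344u);
}
```

A serialization library that produces a text or XML archive sidesteps the issue in a different way, because numbers are written as characters rather than as raw memory.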

6. Example from our requirements: performance testing of the OTT-based database structure

When we run performance tests on a program that uses an OTT-based database structure, the PNR data being read is a document in XML format. It takes a lot of time to read the XML file into memory, turn it into a DOM tree, and then convert the data in the DOM tree into the object structure required by the OTT database. If this time is counted as part of the program's running time, the performance results will contain a large error. Therefore, the best approach is to convert the PNR data in XML format into objects that the program can use ahead of time, and to read those objects directly when the program runs. This separates the time spent parsing the XML PNR data from the time the program runs, and so keeps the performance test accurate. Saving the PNR data as program-usable objects is object serialization, and reading the file that holds the objects and restoring the original objects is object deserialization.

7. Data transfer only with a specific type

In some cases, constraints mean that data can only be transmitted as a certain type. For example, when using Tuxedo, only the char* type can be used to transmit data from the client to the server; when passing data through shared memory, the data must be a contiguous array. In these cases, transmitting an object of a custom class is a challenge. One approach is to cast the object directly to the required type, send it to the destination, and cast it back to the original type there. This should be the fastest approach, but it requires knowing the exact length of the outgoing data, so it is inconvenient for variable-length data. It also has cross-platform compatibility issues. Another approach is to serialize the object into a byte stream, transfer it to the destination, and deserialize it into an object of the custom class there. This approach is more general, safer, and more standardized, but its performance may not be as good as that of the first method.

8. Ways to serialize an object using C++

There are three ways to serialize objects using C++: a method based on the Boost library, a method based on the .NET Framework, and a method based on MFC. This chapter describes the implementation mechanism, implementation steps, and precautions for each of the three methods.

Because our development environment is Windows and our deployment environment is UNIX, we need a technology that works on both platforms. We verified that the .NET-based and MFC-based methods apply only to Windows, whereas the Boost library has versions for both Windows and UNIX, so the Boost library should be the first choice for object serialization in our project. The .NET and MFC methods are nevertheless described in this article for reference. Code examples for the three methods are attached to the article.

8.1 Using the Boost library

8.1.1 Implementation mechanism

Here, we use the term serialization to mean the reversible deconstruction of an arbitrary set of C++ data structures into a sequence of bytes. Such a system can be used to reconstitute an equivalent data structure in another program context. It can therefore serve as the basis for object persistence, remote parameter passing, or other facilities. In this system, the term archive refers to a specific rendering of this stream of bytes. An archive can be a binary file, a text file, an XML file, or some other user-defined type.

The goals of the Boost serialization library are:

(1). Code portability: rely only on ANSI C++ features.

(2). Code economy: exploit C++ features such as RTTI, templates, and multiple inheritance to make user code short and easy to write.

(3). Independent versioning of each class: when the definition of a class changes, archives written by the old version can still be loaded into the new version of the class.

(4). Deep saving and restoring of pointers: saving or restoring a pointer also saves or restores the data it points to.

(5). Correct handling of multiple pointers that point to the same object.

(6). Direct support for serialization of STL containers and other commonly used templates.

(7). Data portability: a byte stream created on one platform should be readable on any other.

(8). Orthogonality of class serialization and archive format: any archive format can be used without changing the serialization code of the class.

(9). Support for non-intrusive implementations: the class does not need to derive from a particular class or implement a specific member function. This is necessary when we cannot or do not want to modify the definition of a class.

(10). The archive interface should be simple enough to make it easy to create new types of archives.

(11). The archive interface should support XML format.
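Goal (3), class versioning, is perhaps the least obvious item in the list. The following standard-library sketch shows the general idea of storing a version number with the data and consulting it when loading. It is not how Boost implements versioning; the names `Position`, `kPositionVersion`, `save`, and `load` are hypothetical.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical class whose second version gained the `altitude` member.
struct Position {
    int latitude = 0;
    int longitude = 0;
    int altitude = 0;   // added in version 2
};

// Version number written by the current code (an assumption of this sketch).
const unsigned int kPositionVersion = 2;

// The writer records the version number ahead of the data.
std::string save(const Position& p) {
    std::ostringstream out;
    out << kPositionVersion << ' '
        << p.latitude << ' ' << p.longitude << ' ' << p.altitude;
    return out.str();
}

// The reader checks the stored version and reads only the members that
// existed when the data was written.
Position load(const std::string& text) {
    std::istringstream in(text);
    unsigned int version = 0;
    Position p;
    in >> version >> p.latitude >> p.longitude;
    if (version >= 2) {
        in >> p.altitude;
    }
    // Data written by version 1 leaves altitude at its default of 0.
    return p;
}

void demo_versioning() {
    const Position old_data = load("1 48 11");   // written by version 1
    assert(old_data.altitude == 0);
}
```

In Boost, the `version` argument passed to `serialize` plays the role of the stored version number in this sketch.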

In Boost, the two libraries associated with serialization are the archive library and the serialization library.

8.1.2 Implementation Steps

First, implement a serialize(Archive& ar, const unsigned int version) method for the class to be serialized;

Second, construct an object of the boost::archive::text_oarchive class or another output archive class, associate it with an output stream, and use the << operator to write the object to the archive;

Finally, construct an object of the boost::archive::text_iarchive class or another input archive class, associate it with an input stream, and use the >> operator to read the data into the object being restored.

8.1.3 Precautions

Note the following when using this method:

(1). The Serialization library is available only in Boost 1.32 and later, so be sure to use such a version;

(2). The Boost serialization library must be compiled into a library file, which is then added to the project's additional dependencies;

(3). Include the required header files under boost/serialization and boost/archive as needed.

8.2 Using .NET

8.2.1 Implementation mechanism

In .NET, serialization is the mechanism by which the runtime environment supports streaming of user-defined types. In this procedure, the public and private fields of the object and the name of the class (including the assembly where the class resides) are converted to a byte stream, and the byte stream is then written to a data stream. When the object is subsequently deserialized, an identical copy of the original object is created.

The .NET Framework provides very good support for serialization. It provides two namespaces, System.Runtime.Serialization and System.Runtime.Serialization.Formatters, that cover most of the functionality of the serialization mechanism.

Serialization is carried out by a formatter, which is an object of a class that implements the System.Runtime.Serialization.IFormatter interface. The formatter converts program data into a format that can be stored and transmitted, and also converts the data back. The .NET Framework provides programmers with two types of formatters. One, typically used in desktop applications, is an object of the System.Runtime.Serialization.Formatters.Binary.BinaryFormatter class. The other, used more in areas such as .NET Remoting and XML Web services, is an object of the System.Runtime.Serialization.Formatters.Soap.SoapFormatter class. Judging by their names, we can call them the binary formatter and the XML formatter, respectively. They correspond to the two serialization technologies that .NET provides:

Binary serialization preserves type fidelity, which is useful for preserving the state of an object between different invocations of an application. For example, by serializing objects to the clipboard, you can share objects between different applications, and you can serialize objects to streams, disks, memory, networks, and so on. Its advantages are that all object members can be saved and that it performs better than XML serialization.

XML serialization serializes only public properties and fields, and does not preserve type fidelity. This is useful when you want to provide or consume data without restricting the applications that use the data. Because XML is an open standard, it is a good choice for sharing data over the Web. SOAP is also an open standard, which makes it an attractive option as well. Its advantages are good interoperability and readability.

8.2.2 Implementation Steps

The steps for serializing an object with .NET binary serialization are as follows:

First, mark the class of the object with the [Serializable] attribute;

Second, write the object to a file stream using the Serialize method of BinaryFormatter;

Finally, use the Deserialize method of BinaryFormatter to read the file stream and restore the object.

8.2.3 Precautions

Note the following when using this method:

(1). You need to use the System::Runtime::Serialization::Formatters::Binary namespace and the System::Runtime::Serialization namespace;

(2). The serialized class must be marked with the [Serializable] attribute when it is declared;

(3). The class involved must be a managed class: the class declaration requires the ref keyword, the gcnew keyword is used to allocate memory on the managed heap, handles are written with ^ in place of the pointer symbol, and so on.

8.3 Using MFC

8.3.1 Implementation mechanism

Serializing an object ultimately means writing the object's data to a storage medium and later reading it back as an object. MFC has very good support for reading and writing data, which makes it convenient to implement object serialization with MFC's data read/write classes.

MFC provides three basic classes for reading and writing data: CFile (the basic file class), CStdioFile (the standard I/O file class), and CArchive (the archive class). The CStdioFile class provides stream file operations equivalent to those of C; a file can be opened in text or binary mode and can be buffered. The CFile class provides unbuffered binary file input and output. It can be used together with the CArchive class to implement the common file serialization of Visual C++, or it can be used with a storage scheme designed by the programmer to read and write data (with a custom scheme, compatibility problems have to be solved by the designer, but confidentiality is strong). The CArchive class is the most commonly used means of file processing in Visual C++ programming. It can read and write simple data structures, and it can also read and write complex data structures through classes derived from CObject. With the CArchive class, serialization of arbitrary data structures can therefore be implemented easily.

8.3.2 Implementation Steps

A class that implements serialization needs to meet the following conditions:

(1). The class derives from the CObject class (possibly indirectly);

(2). The class declaration contains the DECLARE_SERIAL macro;

(3). The class has a default constructor;

(4). The class implements the Serialize(CArchive&) function and calls the Serialize function of its base class;

(5). The IMPLEMENT_SERIAL macro is used to specify the class name and version number.

Once these conditions are met, objects of the class can be serialized and deserialized.

When serializing, first instantiate an object of the CArchive class and associate it with the output file; then use the overloaded << operator of the CArchive class to save the object to the file.

When deserializing, associate an object of the CArchive class with the file that holds the object, create a new object of the class to be restored, and use the overloaded >> operator of the CArchive class to load the contents of the file into that object.

8.3.3 Precautions

Note the following when using this method:

(1). You need to include the afx.h header file;

(2). Serialization of the std::string type is not supported, but serialization of the CString type is;

(3). In the project properties, the MFC setting must be "Use MFC in a Shared DLL" or "Use MFC in a Static Library"; otherwise compilation fails.

9. Key technologies for object serialization using the Boost library

9.1 Basics

1. Archiving and reading of basic types

For basic types, you can write to or read from an archive directly using the following statements:

(1). Use ar << data; or ar & data; to write to the archive

(2). Use ar >> data; or ar & data; to read from the archive

2. Archiving and reading of custom types

For a custom type, the serialize() function is called, and it stores or loads the data members of the type. This processing is recursive and continues until all the data contained in the class has been stored or loaded.

(1). Intrusive: t.serialize(ar, version)

(2). Non-intrusive: serialize(ar, t, version)
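To show how a single serialize function can serve both directions, here is a toy sketch built only on the standard library. It is not Boost's implementation: `TextOut` and `TextIn` are hypothetical stand-ins for an output and an input archive, `Point` and `Size` are hypothetical user types, and the `version` parameter is left out for brevity. The toy archives handle only types that C++ streams can read and write.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Toy output archive: operator& writes a value followed by a space.
class TextOut {
public:
    template <class T>
    TextOut& operator&(const T& value) {
        stream_ << value << ' ';
        return *this;
    }
    std::string str() const { return stream_.str(); }
private:
    std::ostringstream stream_;
};

// Toy input archive: the same operator& reads a value instead.
class TextIn {
public:
    explicit TextIn(const std::string& text) : stream_(text) {}
    template <class T>
    TextIn& operator&(T& value) {
        stream_ >> value;
        return *this;
    }
private:
    std::istringstream stream_;
};

// Intrusive form: serialize() is a member of the type.
struct Point {
    int x = 0;
    int y = 0;

    template <class Archive>
    void serialize(Archive& ar) {
        ar & x & y;   // writes with TextOut, reads with TextIn
    }
};

// Non-intrusive form: serialize() is a free function taking the object.
struct Size {
    int w = 0;
    int h = 0;
};

template <class Archive>
void serialize(Archive& ar, Size& s) {
    ar & s.w & s.h;
}

void demo_single_serialize() {
    Point p;
    p.x = 3;
    p.y = 4;
    TextOut out;
    p.serialize(out);          // save
    TextIn in(out.str());
    Point q;
    q.serialize(in);           // load, using the same function
    assert(q.x == 3 && q.y == 4);
}
```

The point of the design is that the member list is written once; whether `&` stores or loads is decided by the type of the archive passed in.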

3. Header files to include (under boost/archive):

(1). Simple text archive: text_oarchive.hpp and text_iarchive.hpp

(2). Text archive with wide characters: text_woarchive.hpp and text_wiarchive.hpp

(3). XML archive: xml_oarchive.hpp and xml_iarchive.hpp

(4). XML archive with wide characters (for UTF-8 output): xml_woarchive.hpp and xml_wiarchive.hpp

(5). Binary archive (note that the binary archive is not portable): binary_oarchive.hpp and binary_iarchive.hpp

9.2 Intrusive and non-intrusive

For the class being serialized, there are two ways to implement its serialize method. One is intrusive: the serialize method is implemented as a member function of the serialized class. The other is non-intrusive: the serialize method is a free function placed in another namespace and, if it needs access to private members, declared as a friend of the serialized class. When the code of the serialized class cannot be modified, the non-intrusive approach should be used.

An intrusive example:

#include <boost/serialization/access.hpp>

class MyPoint
{
    int mX;
    int mY;

private:
    friend class boost::serialization::access; // required by the intrusive version

    // Both saving and loading use the serialize() function below.
    // Archive is an input or output archive: on input, & acts as >>;
    // on output, & acts as <<.
    template<class Archive>
    void serialize(Archive& ar, const unsigned int version)
    {
        ar & mX; // serialize the data members
        ar & mY;
    }

public:
    MyPoint() {}
    MyPoint(int x, int y) : mX(x), mY(y) {}
};

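A non-intrusive example:

class MyPoint;

// Declare the function template first so that the class can name it as a friend.
namespace boost {
namespace serialization {
    template<class Archive>
    void serialize(Archive& ar, MyPoint& p, const unsigned int version);
}
}

class MyPoint
{
private:
    // Note the keyword "friend" and the extra parameter: a reference to the class.
    template<class Archive>
    friend void boost::serialization::serialize(Archive& ar, MyPoint& p, const unsigned int version);

    int mX;
    int mY;

public:
    MyPoint() {}
    MyPoint(int x, int y) : mX(x), mY(y) {}
};

// Non-intrusive: the implementation is placed in the boost::serialization namespace.
namespace boost {
namespace serialization {

template<class Archive>
void serialize(Archive& ar, MyPoint& p, const unsigned int version)
{
    ar & p.mX & p.mY; // & can be chained
}

} // namespace serialization
} // namespace boost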

9.3 Serialization of derived classes

Serializing a derived class requires that its parent class also implements the serialize method, that is, that the parent class is itself serializable. If the parent class does not implement the serialize method and only the derived class does, the derived class cannot save the data it inherits from the parent class; only the data that belongs to the derived class itself is saved.

The steps to serialize a derived class are:

1. Include the boost/serialization/base_object.hpp header file;

2. In the serialize template method, use the syntax ar & boost::serialization::base_object<parent class>(*this) to save the data of the parent class. The serialize function of the parent class must not be called directly.

An example is as follows:

#include <boost/serialization/base_object.hpp> // be sure to include this header

class B : public A
{
    friend class boost::serialization::access;

    char c;

    template<class Archive>
    void serialize(Archive& ar, const unsigned int version)
    {
        ar & boost::serialization::base_object<A>(*this); // saves the base-class part
        ar & c;
    }

public:
    ...
};

9.4 Serialization of arrays

Serializing an array means saving every element of the array, so it is equivalent to serializing each element in turn. It can be written in the following form:

// sizeof(array) alone is the size in bytes, so divide by the element size
for (std::size_t i = 0; i < sizeof(array) / sizeof(array[0]); i++)
{
    ar & array[i];
}

In fact, however, the Boost serialization library can detect that the serialized object is an array and generates the equivalent code itself, as in the following example:

class bus_route
{
    friend class boost::serialization::access;

    bus_stop stops[10];

    template<class Archive>
    void serialize(Archive& ar, const unsigned int version)
    {
        ar & stops; // the whole array is serialized element by element
    }

public:
    bus_route() {}
};

9.5 Serialization of pointers

The purpose of serializing an entire object is to be able to reconstruct the original data structure at another place and time. When pointers are involved, storing the value of the pointer is not enough to reconstruct the original data structure; the object the pointer points to must be stored as well. When the member is later loaded, a new object is created, and a pointer to the new object is loaded into the member of the class.
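To illustrate what storing the pointed-to object involves, here is a hand-written sketch that uses only the standard library. It is not Boost's mechanism: `Stop`, `Route`, `save`, and `load` are hypothetical names, `std::shared_ptr` is used in place of raw pointers to keep ownership simple, and null pointers are assumed not to occur. Each distinct object is written once, and every pointer is written as an index, so pointers that shared an object before saving share one again after loading.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <sstream>
#include <string>
#include <vector>

struct Stop { int id; };   // hypothetical pointee type

// Hypothetical owner; several entries may point to the same Stop.
struct Route {
    std::vector<std::shared_ptr<Stop>> stops;
};

// Addresses are meaningless in another process, so each distinct object is
// written once and each pointer is written as an index into that list.
std::string save(const Route& r) {
    std::map<const Stop*, std::size_t> index_of;   // tracks objects already seen
    std::vector<const Stop*> unique;
    std::vector<std::size_t> refs;
    for (const auto& s : r.stops) {
        auto it = index_of.find(s.get());
        if (it == index_of.end()) {
            it = index_of.emplace(s.get(), unique.size()).first;
            unique.push_back(s.get());
        }
        refs.push_back(it->second);
    }
    std::ostringstream out;
    out << unique.size();
    for (const Stop* s : unique) out << ' ' << s->id;
    out << ' ' << refs.size();
    for (std::size_t i : refs) out << ' ' << i;
    return out.str();
}

// Loading creates new objects and points the members at them.
Route load(const std::string& text) {
    std::istringstream in(text);
    std::size_t n = 0;
    in >> n;
    std::vector<std::shared_ptr<Stop>> objects;
    for (std::size_t i = 0; i < n; ++i) {
        auto s = std::make_shared<Stop>();
        in >> s->id;
        objects.push_back(s);
    }
    Route r;
    in >> n;
    for (std::size_t i = 0; i < n; ++i) {
        std::size_t idx = 0;
        in >> idx;
        r.stops.push_back(objects.at(idx));   // at() rejects a bad index
    }
    return r;
}

void demo_shared_pointee() {
    Route r;
    auto a = std::make_shared<Stop>(Stop{7});
    r.stops = {a, a};                       // two pointers, one object
    const Route copy = load(save(r));
    assert(copy.stops[0] == copy.stops[1]); // still one object after loading
    assert(copy.stops[0] != a);             // but a new object, not the old one
}
```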

All of this is done automatically by the Boost serialization library; the programmer simply serializes the pointer directly. (That said, use the examples with caution, because they have not been debugged.) An example is as follows:

class bus_route
{
    friend class boost::serialization::access;

    bus_stop* stops[10];

    template<class Archive>
    void serialize(Archive& ar, const unsigned int version)
    {
        for (int i = 0; i < 10; ++i)
            ar & stops[i];
    }

public:
    bus_route() {}
};

9.6 Serialization of STL containers

For STL containers such as vector or list, you need to include the <boost/serialization/vector.hpp> or <boost/serialization/list.hpp> header file. The container can then be serialized directly. An example is as follows:

#include <list>
#include <boost/serialization/list.hpp>

class bus_route
{
    friend class boost::serialization::access;

    std::list<bus_stop*> stops;

    template<class Archive>
    void serialize(Archive& ar, const unsigned int version)
    {
        ar & stops;
    }

public:
    bus_route() {}
};

9.7 A member of the serialized class is an object of another class

If the serialized class has members that are objects of other classes, it can be serialized only if the classes of those members also implement the serialize method and are themselves serializable.

For example, in the previous examples, a member of the bus_route class is an object of the bus_stop class. The bus_route class can therefore be serialized only if the bus_stop class implements the serialize method.

9.8 Output

The Boost serialization library can write output in three formats: simple text, XML, and binary. Each of these formats can be written to a C++ ostream, for example an ostringstream (string output stream) or an ofstream (file output stream). The following example writes to a string stream in simple text format.

// Serialization: output to a string
std::ostringstream ossOut(std::ostringstream::out); // the object is written to this string output stream
boost::archive::text_oarchive oa(ossOut);
TestClass objTestClass;
oa << objTestClass;
std::string strTrans = ossOut.str();

......

// Deserialization: input from a string
std::istringstream ossIn(strTrans); // data is read from this string input stream
boost::archive::text_iarchive ia(ossIn);
TestClass newObjTestClass;
ia >> newObjTestClass;

Conclusion:

1. In the performance test of the OTT-based database structure, we define a class for each table in the database, and our goal is to serialize objects of these classes. When trying to serialize them, however, we run into a problem: all OTT table classes inherit from oracle::occi::PObject, a class defined in the Oracle library. Serializing a derived class requires that its parent class also implements the serialization interface; otherwise the members that the derived class inherits from the parent class are lost during serialization (see Section 9.3). This would require modifying the library so that PObject also implements the serialization interface. However, hastily modifying the library may set off a chain reaction and cause errors in other programs that use the library, and it also raises intellectual property issues. Therefore, serializing the OTT table classes with the Boost serialization library may not be workable, and other methods should be considered.

2. When shared memory is used to pass object data, it is entirely feasible to serialize the object data in simple text format, write it through an ostringstream into a string, and pass that string.

Attached: list of example programs

CPlusSerializeBoost: serialization using Boost's serialization library;

CPlusSerializeDotNet: serialization using .NET;

CPlusSerializeMFC: serialization using MFC.


This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
