The C++ Source
An Introduction to XML Data Binding in C++
By Boris Kolpackov
May 4, 2007
Original article: http://www.artima.com/cppsource/xml_data_binding.html
A C++ application often needs to process data stored in XML. Generally, there are two kinds of APIs for accessing XML: the Document Object Model (DOM) and the Simple API for XML (SAX). DOM represents XML as a tree-like data structure that the program can traverse and query. SAX is an event-driven API for parsing XML: the application registers for events of interest, such as elements, attributes, and text, and the parser triggers these events as it works through the document.
DOM loads the entire XML document into memory as a tree before the application can work with it, whereas SAX delivers the data to the application as parsing proceeds.
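To make the event-driven model concrete, here is a minimal sketch of a SAX-style handler. The event_handler interface and the replay_person_events() function are hypothetical stand-ins written for this illustration; they are not the Xerces-C++ API. The replay function simply fires, by hand, the events a real parser would produce for a small document containing a name and an age.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical event interface, loosely modeled on SAX-style callbacks.
struct event_handler
{
  virtual ~event_handler () {}
  virtual void start_element (const std::string& name) = 0;
  virtual void characters (const std::string& text) = 0;
  virtual void end_element (const std::string& name) = 0;
};

// Handler that is only interested in the text of the "age" element.
class age_handler: public event_handler
{
public:
  age_handler (): in_age_ (false), age_ (0) {}

  virtual void start_element (const std::string& name)
  {
    in_age_ = (name == "age");
  }

  virtual void characters (const std::string& text)
  {
    if (in_age_)
    {
      std::istringstream iss (text);
      iss >> age_; // manual text-to-number conversion
    }
  }

  virtual void end_element (const std::string&)
  {
    in_age_ = false;
  }

  short age () const { return age_; }

private:
  bool in_age_;
  short age_;
};

// Stand-in for a parser: replays the events a real parser would fire for
// <person><name>John Doe</name><age>32</age></person>.
void replay_person_events (event_handler& h)
{
  h.start_element ("person");
  h.start_element ("name");
  h.characters ("John Doe");
  h.end_element ("name");
  h.start_element ("age");
  h.characters ("32");
  h.end_element ("age");
  h.end_element ("person");
}

void usage ()
{
  age_handler h;
  replay_person_events (h);
  assert (h.age () == 32);
}
```

Note that the handler has to keep its own state (in_age_) between callbacks. A DOM-based program would instead walk a tree that is already fully built in memory.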
Whether you use DOM or SAX, processing any substantial amount of XML data is not easy. Both APIs model the structure of XML itself: they operate in terms of elements, attributes, and text. Programmers have to write a lot of code to translate this raw XML representation into a form that the application logic can use. For example, the following simple XML document describes a person.
<person>
  <name>John Doe</name>
  <gender>male</gender>
  <age>32</age>
</person>
Suppose we want to check that the person's age is greater than some threshold. With either DOM or SAX, we must first find the age element and then convert the text "32" to an integer before we can do the comparison. Another notable disadvantage of generic APIs is string-based flow control. In the example above, finding the age element means comparing element names against the string "age". If we misspell the name, the bug shows up only at runtime. String-based flow control also hurts the readability and maintainability of the code. In addition, generic APIs lack type safety because all data is represented as text. For example, we can compare the content of the gender element with any string at all without the compiler giving a warning:
DOMElement* gender = ...

if (gender->getTextContent () == "man")
{
  ...
}
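The difference between a string comparison and a typed comparison can be shown in isolation. The following is a small sketch with hypothetical names (is_male_text, is_male_typed, and the gender enumeration); it does not use any XML library and only illustrates why the compiler cannot catch the "man" typo in the string-based version.

```cpp
#include <cassert>
#include <string>

// String-based check: any literal compiles, including one that can never match.
bool is_male_text (const std::string& gender_text)
{
  return gender_text == "man"; // bug: the document uses "male"
}

// Typed check: only declared enumerators compile.
enum gender { male, female };

bool is_male_typed (gender g)
{
  return g == male; // writing `man` here would be a compile-time error
}

void usage ()
{
  assert (!is_male_text ("male")); // the typo silently yields false
  assert (is_male_typed (male));
}
```

In the string-based version the mistake survives compilation and only shows up as wrong behavior at runtime. In the typed version the same mistake would not compile.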
In recent years, a new approach to processing XML has emerged, called XML data binding. Its rise has been driven by the development of XML schema languages, which describe the structure and semantics of XML documents. The core idea of XML data binding is to skip the raw XML representation and deliver the data as objects that model the particular vocabulary. As a result, programmers no longer deal with low-level XML operations, because the application logic can use these objects directly. In the example above, instead of searching for the age element and manually converting its text to a number, we simply call the age() function on the person object, which returns the age as an integer. The term "binding" refers to the fact that the object representation is bound to, and can only be used to represent, one particular kind of XML data.
The object representation and the code that supports it, such as the parsing and serialization functions, are generated by a data binding tool from an XML schema. A schema is a formal specification that defines the names of elements and attributes, their content, and the structural relationships between them. Most data binding tools use the W3C XML Schema language, first because it is widely used and second because of its object-oriented features. The following schema, written in W3C XML Schema, describes the person document.
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:simpleType name="gender_t">
    <xs:restriction base="xs:string">
      <xs:enumeration value="male"/>
      <xs:enumeration value="female"/>
    </xs:restriction>
  </xs:simpleType>
  <xs:complexType name="person_t">
    <xs:sequence>
      <xs:element name="name" type="xs:string"/>
      <xs:element name="gender" type="gender_t"/>
      <xs:element name="age" type="xs:short"/>
    </xs:sequence>
  </xs:complexType>
  <xs:element name="person" type="person_t"/>
</xs:schema>
Even if you are not familiar with XML Schema, you can probably follow what the schema above does. gender_t is an enumeration with two possible values: male and female. person_t is defined as a sequence of the nested elements name, gender, and age. Note that in XML Schema the term "sequence" means that the elements appear in the specified order, as opposed to appearing multiple times. Finally, the global person element defines the root element of our documents. For more information about XML Schema, see the W3C specification XML Schema Part 0: Primer.
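To see what the enumeration restriction means in practice, consider how a program might enforce it by hand. The sketch below is only an illustration: parse_gender() and gender_value are hypothetical names, and a real data binding tool or validating parser may perform this check quite differently.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

enum gender_value { male, female };

// Hypothetical helper: maps element text to the enumeration and rejects
// anything the schema's xs:enumeration facets do not list.
gender_value parse_gender (const std::string& text)
{
  if (text == "male")
    return male;

  if (text == "female")
    return female;

  throw std::invalid_argument ("invalid gender value: " + text);
}

void usage ()
{
  assert (parse_gender ("female") == female);

  bool rejected = false;
  try
  {
    parse_gender ("man"); // not one of the enumerated values
  }
  catch (const std::invalid_argument&)
  {
    rejected = true;
  }
  assert (rejected);
}
```

The schema states this constraint once, declaratively, so that tools can derive checks like this one instead of each application writing them by hand.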
Like the APIs that give direct access to XML, XML data binding supports both an in-memory model and an event-driven model. In the following sections, we compare XML data binding with DOM and SAX. The DOM and SAX examples are based on Xerces-C++, the open-source C++ XML parser from Apache. For simplicity, the examples omit type details such as character conversions.
In-memory XML Data Binding
Based on an XML schema, a data binding compiler generates C++ classes that represent the given vocabulary as a tree-like in-memory data structure, as well as parsing and serialization functions. The parsing functions are responsible for creating the in-memory representation from an XML instance, while the serialization functions save the in-memory representation back to XML. For the schema presented in the introduction, a data binding compiler could generate the following code:
class gender_t
{
public:
  enum value {male, female};

  gender_t (value);
  operator value () const;

private:
  ...
};

class person_t
{
public:
  person_t (const string& name,
            gender_t gender,
            short age);

  // name
  //
  string& name ();
  const string& name () const;
  void name (const string&);

  // gender
  //
  gender_t& gender ();
  gender_t gender () const;
  void gender (gender_t);

  // age
  //
  short& age ();
  short age () const;
  void age (short);

private:
  ...
};

std::auto_ptr<person_t> person (std::istream&);
void person (std::ostream&, const person_t&);
Comparing this code with the XML Schema declarations, we can see that the schema compiler maps schema types to C++ classes, local elements to sets of accessor and modifier functions, and global elements to pairs of parsing and serialization functions.
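The generated interface above leaves the private sections out. To show how the accessor and modifier functions behave, here is a hand-written sketch that fills them in with the simplest possible representation. The data members (v_, name_, gender_, age_) are assumptions made for this illustration; a real data binding compiler may store the data differently. The parsing and serialization functions are omitted.

```cpp
#include <cassert>
#include <string>

class gender_t
{
public:
  enum value {male, female};

  gender_t (value v): v_ (v) {}
  operator value () const { return v_; }

private:
  value v_; // assumed representation
};

class person_t
{
public:
  person_t (const std::string& name, gender_t gender, short age)
    : name_ (name), gender_ (gender), age_ (age) {}

  // name
  std::string& name () { return name_; }
  const std::string& name () const { return name_; }
  void name (const std::string& n) { name_ = n; }

  // gender
  gender_t& gender () { return gender_; }
  gender_t gender () const { return gender_; }
  void gender (gender_t g) { gender_ = g; }

  // age
  short& age () { return age_; }
  short age () const { return age_; }
  void age (short a) { age_ = a; }

private:
  std::string name_; // assumed representation
  gender_t gender_;
  short age_;
};

void usage ()
{
  person_t p ("John Doe", gender_t::male, 32);

  p.age ()++;                  // modify through the reference accessor
  p.name ("Jane Doe");         // modify through the modifier function
  p.gender (gender_t::female);

  assert (p.age () == 33);
  assert (p.name () == "Jane Doe");
  assert (p.gender () == gender_t::female);
}
```

Each element gets three functions: a reference accessor for in-place changes, a constant accessor for read-only use, and a modifier that replaces the value.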
Next, we look at three common XML processing tasks, performed with both DOM and XML data binding: accessing XML data, modifying existing data, and creating new data from scratch. These examples demonstrate the advantages of XML data binding over DOM.
The following code uses data binding to read a person record stored in an XML file and prints the person's name if the age is greater than 30. For simplicity, error handling is omitted.
ifstream ifs ("person.xml");
auto_ptr<person_t> p = person (ifs);

if (p->age () > 30)
  cerr << p->name () << endl;
This example is simple and clear. Once the XML file has been read into memory, the code no longer deals with XML at all and works seamlessly with the other object models in the program. Note also that the data exposed by the C++ classes is statically typed.
The following code uses DOM to complete the same task.
ifstream ifs ("person.xml");
DOMDocument* doc = read_dom (ifs);
DOMElement* p = doc->getDocumentElement ();

string name;
short age = 0;

// Walk the children of the root element, skipping non-element nodes.
for (DOMNode* n = p->getFirstChild ();
     n != 0;
     n = n->getNextSibling ())
{
  if (n->getNodeType () != DOMNode::ELEMENT_NODE)
    continue;

  string el_name = n->getNodeName ();
  DOMNode* text = n->getFirstChild ();

  if (el_name == "name")
  {
    name = text->getNodeValue ();
  }
  else if (el_name == "age")
  {
    // Convert the element text to a number by hand.
    istringstream iss (text->getNodeValue ());
    iss >> age;
  }
}

if (age > 30)
  cerr << name << endl;

doc->release ();
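The DOM version converts the age text with a bare stream extraction and does not check whether the conversion succeeded. As a sketch of what a more careful conversion could look like, the hypothetical helper below (parse_short() is not part of Xerces-C++ or of the original example) reports failure for text that is empty, not a number, out of range, or followed by stray characters.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper: converts element text such as " 32 " to a short.
// Returns false, leaving result untouched, if the text is not a valid number.
bool parse_short (const std::string& text, short& result)
{
  std::istringstream iss (text);
  short v;

  if (!(iss >> v))
    return false;      // empty, non-numeric, or out of range

  iss >> std::ws;      // skip trailing whitespace

  if (!iss.eof ())
    return false;      // stray characters after the number

  result = v;
  return true;
}

void usage ()
{
  short age = 0;
  assert (parse_short (" 32 ", age) && age == 32);
  assert (!parse_short ("thirty-two", age));
}
```

With data binding this kind of conversion code lives in the generated parsing functions, so the application never has to write it.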