XML is a meta-language defined by the World Wide Web Consortium (W3C). It has become a universal data interchange format. Because it is independent of platform, programming language, and operating system, it greatly simplifies data integration and exchange. XML is processed in the same way in different programming languages; only the syntax of the API differs.
XML itself is simply a format for encoding data in plain text. To make use of the data encoded in an XML file, you must first parse it out of the plain text, so you need a parser that can interpret the XML document and extract the data from it. Depending on how the data needs to be extracted, there are several parsing approaches, each with its own advantages, disadvantages, and suitable use cases. Choosing the right XML parsing technology can noticeably improve the overall performance of the application.
All XML processing begins with parsing. Whether you use XSLT or the Java language, the first step is to read the XML file, decode its structure, and retrieve the information. Parsing is the process of translating an unstructured character sequence that represents an XML document into structured components that conform to XML syntax.
There are two main ways to parse XML: SAX (Simple API for XML) and DOM (Document Object Model).
SAX parses a document as a stream of events. The advantages of SAX processing are very similar to the advantages of streaming media: analysis can begin immediately, without waiting for all of the data to be processed. Because the application examines data only as it is being read, it does not need to keep the data in memory, which is a huge advantage for large documents. The application does not even have to parse the entire document; it can stop parsing as soon as a condition is met. In general, SAX is considerably faster than its alternative, DOM. A SAX parser uses an event-based model: while parsing an XML document it fires a sequence of events, and when a given tag is found it invokes a callback method to report that the tag has been encountered. The memory requirements of SAX are usually low because the developer decides which tags to process. SAX also scales much better, especially when the developer only needs to work with part of the data in the document. However, coding against a SAX parser is harder, and it is difficult to access several different parts of the same document at the same time.
Advantages:
(1) Analysis can start immediately, without waiting for all of the data to be processed;
(2) Data is examined only as it is read and does not need to be kept in memory;
(3) Parsing can stop as soon as a condition is met, so the entire document does not have to be parsed;
(4) High efficiency and performance.
Disadvantages:
(1) The application itself is responsible for the tag-processing logic (such as maintaining parent/child relationships), so the more complex the document, the more complex the program;
(2) Navigation is one-way: the parser cannot locate a position in the document hierarchy, it is difficult to access different parts of the same document at the same time, and XPath is not supported.
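The event-based model can be illustrated with a small, self-contained sketch. It does not use Xerces-C++ or any real SAX API: EventHandler, scan, FirstProductFinder and firstProduct are hypothetical names invented for this illustration, and the scanner is a toy that only understands start tags, end tags and text (no attributes, comments, entities or validation). It shows two points from the paragraph above: the handler must keep its own parent/child context, and parsing can stop as soon as a condition is met.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical handler interface, loosely modelled on SAX-style callbacks.
// Returning false from a callback asks the scanner to stop early.
struct EventHandler {
    virtual ~EventHandler() = default;
    virtual bool startElement(const std::string& name) = 0;
    virtual bool endElement(const std::string& name) = 0;
    virtual bool characters(const std::string& text) = 0;
};

// Toy scanner (an assumption for illustration, not a conforming XML parser).
// Returns true if the whole input was scanned, false if a handler stopped
// the scan or an unterminated tag was found.
bool scan(const std::string& xml, EventHandler& handler) {
    std::size_t pos = 0;
    while (pos < xml.size()) {
        if (xml[pos] == '<') {
            const std::size_t close = xml.find('>', pos);
            if (close == std::string::npos) return false;  // unterminated tag
            const bool isEnd = (pos + 1 < close && xml[pos + 1] == '/');
            const std::size_t nameStart = pos + (isEnd ? 2 : 1);
            const std::string name = xml.substr(nameStart, close - nameStart);
            const bool keepGoing =
                isEnd ? handler.endElement(name) : handler.startElement(name);
            if (!keepGoing) return false;
            pos = close + 1;
        } else {
            std::size_t next = xml.find('<', pos);
            if (next == std::string::npos) next = xml.size();
            if (!handler.characters(xml.substr(pos, next - pos))) return false;
            pos = next;
        }
    }
    return true;
}

// Captures the text of the first <product> element and then stops the scan.
struct FirstProductFinder : EventHandler {
    std::vector<std::string> openElements;  // context maintained by the application
    std::string product;

    bool startElement(const std::string& name) override {
        openElements.push_back(name);
        return true;
    }
    bool endElement(const std::string& name) override {
        assert(!openElements.empty() && openElements.back() == name);
        openElements.pop_back();
        return true;
    }
    bool characters(const std::string& text) override {
        if (!openElements.empty() && openElements.back() == "product") {
            product = text;
            return false;  // condition met: the rest of the input is never read
        }
        return true;
    }
};

// Convenience wrapper: returns the first product text, or "" if there is none.
std::string firstProduct(const std::string& xml) {
    FirstProductFinder finder;
    scan(xml, finder);
    return finder.product;
}
```

Calling firstProduct on a document whose first product element contains Xerces-C returns that text without scanning any of the elements that follow it. Note that nothing is stored except the stack of open element names, which is exactly the bookkeeping the disadvantages above refer to.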
DOM is the official W3C standard for representing XML documents in a platform- and language-neutral way. A DOM is a collection of nodes, or pieces of information, organized in a hierarchy. This hierarchy allows developers to search the tree for specific information. Analyzing the structure usually requires loading the entire document and constructing the hierarchy before any work can be done. Because it is based on an information hierarchy, DOM is considered tree-based or object-based.
Advantages:
(1) The application can change both the data and the structure;
(2) Access is bidirectional: you can navigate up and down the tree at any time to read and manipulate any part of the data.
Disadvantage: the entire XML document usually has to be loaded to construct the hierarchy, which consumes memory and time.
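The tree-based model can be sketched in the same spirit. The code below is not the Xerces-C++ DOM API: Node, makeNode, countElements, findFirst and buildCompanyTree are hypothetical names used only for this illustration. It builds the whole hierarchy in memory first and then navigates it in both directions, which is the trade-off described above.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical element node: owns its children and knows its parent.
struct Node {
    std::string name;
    std::string text;
    Node* parent = nullptr;                       // upward navigation
    std::vector<std::unique_ptr<Node>> children;  // downward navigation

    explicit Node(std::string n, std::string t = "")
        : name(std::move(n)), text(std::move(t)) {}

    // Takes ownership of the child and links it back to this node.
    Node* appendChild(std::unique_ptr<Node> child) {
        assert(child != nullptr);
        child->parent = this;
        children.push_back(std::move(child));
        return children.back().get();
    }
};

// Small helper so the sketch also compiles as C++11.
std::unique_ptr<Node> makeNode(const std::string& name, const std::string& text = "") {
    return std::unique_ptr<Node>(new Node(name, text));
}

// Counts this element and every element below it.
std::size_t countElements(const Node& node) {
    std::size_t total = 1;
    for (const auto& child : node.children) total += countElements(*child);
    return total;
}

// Depth-first search for the first element with the given name.
Node* findFirst(Node& node, const std::string& name) {
    if (node.name == name) return &node;
    for (const auto& child : node.children) {
        if (Node* hit = findFirst(*child, name)) return hit;
    }
    return nullptr;
}

// Builds a company element with three child elements in memory.
std::unique_ptr<Node> buildCompanyTree() {
    std::unique_ptr<Node> root = makeNode("company");
    root->appendChild(makeNode("product", "Xerces-C"));
    root->appendChild(makeNode("category", "XML Parsing Tools"));
    root->appendChild(makeNode("developedBy", "Apache Software Foundation"));
    return root;
}
```

Once the tree exists, any part of it can be reached in any order: findFirst walks down from the root, and the parent pointer walks back up. The cost is that every node stays in memory for as long as the tree is alive.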
XML parsing libraries for C++ include:
(1) Expat: http://www.libexpat.org/;
(2) die-xml: https://code.google.com/p/die-xml/;
(3) Xerces-C++: http://xerces.apache.org/xerces-c/index.html;
(4) TinyXML: http://www.grinninglizard.com/tinyxml/.
Compiling and using Xerces-C++:
1. Download the xerces-c-3.1.1.zip source archive from http://xerces.apache.org/xerces-c/download.cgi#verify and unzip it;
2. Open xerces-all.sln in the xerces-c-3.1.1\projects\Win32\VC10\xerces-all directory with VS2010;
3. Choose the desired entries under Solution Configurations and Solution Platforms, then right-click Solution 'xerces-all' and select Rebuild Solution. This generates the corresponding dynamic and static libraries under the Build/Win32/VC10 directory. The static libraries Static Debug/xerces-c_static_3D.lib and Static Release/xerces-c_static_3.lib are used for testing here;
4. In the 'xerces-all' solution, create a new project named TestXerces. Select this project and, for both the Debug and Release configurations, set the following project properties: (1) Configuration Properties --> Character Set: Use Unicode Character Set; (2) C/C++ --> General --> Additional Include Directories: ../../../../../src; (3) add the following definitions to C/C++ --> Preprocessor:
_CRT_SECURE_NO_DEPRECATE
_WINDOWS
XERCES_STATIC_LIBRARY
XERCES_BUILDING_LIBRARY
XERCES_USE_TRANSCODER_WINDOWS
XERCES_USE_MSGLOADER_INMEMORY
XERCES_USE_NETACCESSOR_WINSOCK
XERCES_USE_FILEMGR_WINDOWS
XERCES_USE_MUTEXMGR_WINDOWS
XERCES_PATH_DELIMITER_BACKSLASH
HAVE_STRICMP
HAVE_STRNICMP
HAVE_LIMITS_H
HAVE_SYS_TIMEB_H
HAVE_FTIME
HAVE_WCSUPR
HAVE_WCSLWR
HAVE_WCSICMP
HAVE_WCSNICMP
stdafx.h:
#pragma once

#include "targetver.h"
#include <stdio.h>

#include "xercesc/util/PlatformUtils.hpp"
#include "xercesc/util/XMLString.hpp"
#include "xercesc/dom/DOM.hpp"
#include "xercesc/util/OutOfMemoryException.hpp"
#include "xercesc/util/TransService.hpp"
#include "xercesc/parsers/SAXParser.hpp"
#include "xercesc/sax/HandlerBase.hpp"
#include "xercesc/framework/XMLFormatter.hpp"
stdafx.cpp:
#include "stdafx.h"

// TODO: reference any additional headers you need in stdafx.h
// and not in this file

// Link against the static Xerces-C++ library that matches the build configuration.
#ifdef _DEBUG
#pragma comment(lib, "../../../../../Build/Win32/VC10/Static Debug/xerces-c_static_3D.lib")
#else
#pragma comment(lib, "../../../../../Build/Win32/VC10/Static Release/xerces-c_static_3.lib")
#endif
TestXerces.cpp:
#include "stdafx.h"
#include <iostream>

using namespace std;
XERCES_CPP_NAMESPACE_USE

// Transcodes a local code page string to XMLCh and releases it on destruction.
class XStr
{
public:
    // -----------------------------------------------------------------------
    //  Constructors and destructor
    // -----------------------------------------------------------------------
    XStr(const char* const toTranscode)
    {
        // Call the private transcoding method
        fUnicodeForm = XMLString::transcode(toTranscode);
    }

    ~XStr()
    {
        XMLString::release(&fUnicodeForm);
    }

    // -----------------------------------------------------------------------
    //  Getter methods
    // -----------------------------------------------------------------------
    const XMLCh* unicodeForm() const
    {
        return fUnicodeForm;
    }

private:
    // -----------------------------------------------------------------------
    //  Private data members
    //
    //  fUnicodeForm
    //      This is the Unicode XMLCh format of the string.
    // -----------------------------------------------------------------------
    XMLCh* fUnicodeForm;
};

#define X(str) XStr(str).unicodeForm()

/*
 * This sample illustrates how you can create a DOM tree in memory.
 * It then prints the count of elements in the tree.
 */
int CreateDOMDocument()
{
    // Initialize the XML4C2 system.
    try
    {
        XMLPlatformUtils::Initialize();
    }
    catch (const XMLException& toCatch)
    {
        char* pMsg = XMLString::transcode(toCatch.getMessage());
        XERCES_STD_QUALIFIER cerr << "Error during Xerces-c Initialization.\n"
                                  << "  Exception message:" << pMsg;
        XMLString::release(&pMsg);
        return 1;
    }

    // Watch for special case help request
    int errorCode = 0;
    /*
    {
        XERCES_STD_QUALIFIER cout << "\nUsage:\n"
            "    CreateDOMDocument\n\n"
            "    Creates a new DOM document from scratch in memory.\n"
            "    It then prints the count of elements in the tree.\n"
            << XERCES_STD_QUALIFIER endl;
        errorCode = 1;
    }
    */
    if (errorCode)
    {
        XMLPlatformUtils::Terminate();
        return errorCode;
    }

    {
        // Nest entire test in an inner block.
        // The tree we create below is the same that the XercesDOMParser would
        // have created, except that no whitespace text nodes would be created.

        // <company>
        //     <product>Xerces-C</product>
        //     <category idea='great'>XML Parsing Tools</category>
        //     <developedBy>Apache Software Foundation</developedBy>
        // </company>

        DOMImplementation* impl = DOMImplementationRegistry::getDOMImplementation(X("Core"));

        if (impl != NULL)
        {
            try
            {
                DOMDocument* doc = impl->createDocument(
                    0,             // root element namespace URI.
                    X("company"),  // root element name
                    0);            // document type object (DTD).

                DOMElement* rootElem = doc->getDocumentElement();

                DOMElement* prodElem = doc->createElement(X("product"));
                rootElem->appendChild(prodElem);

                DOMText* prodDataVal = doc->createTextNode(X("Xerces-C"));
                prodElem->appendChild(prodDataVal);

                DOMElement* catElem = doc->createElement(X("category"));
                rootElem->appendChild(catElem);

                catElem->setAttribute(X("idea"), X("great"));

                DOMText* catDataVal = doc->createTextNode(X("XML Parsing Tools"));
                catElem->appendChild(catDataVal);

                DOMElement* devByElem = doc->createElement(X("developedBy"));
                rootElem->appendChild(devByElem);

                DOMText* devByDataVal = doc->createTextNode(X("Apache Software Foundation"));
                devByElem->appendChild(devByDataVal);

                //
                // Now count the number of elements in the above DOM tree.
                //
                const XMLSize_t elementCount = doc->getElementsByTagName(X("*"))->getLength();
                XERCES_STD_QUALIFIER cout << "The tree just created contains: " << elementCount
                                          << " elements." << XERCES_STD_QUALIFIER endl;

                doc->release();
            }
            catch (const OutOfMemoryException&)
            {
                XERCES_STD_QUALIFIER cerr << "OutOfMemoryException" << XERCES_STD_QUALIFIER endl;
                errorCode = 5;
            }
            catch (const DOMException& e)
            {
                XERCES_STD_QUALIFIER cerr << "DOMException code is:  " << e.code << XERCES_STD_QUALIFIER endl;
                errorCode = 2;
            }
            catch (...)
            {
                XERCES_STD_QUALIFIER cerr << "An error occurred creating the document" << XERCES_STD_QUALIFIER endl;
                errorCode = 3;
            }
        }
        else
        {
            // (impl == NULL)
            XERCES_STD_QUALIFIER cerr << "Requested implementation is not supported" << XERCES_STD_QUALIFIER endl;
            errorCode = 4;
        }
    }

    XMLPlatformUtils::Terminate();
    return errorCode;
}

// ---------------------------------------------------------------------------
//  This is a simple class that lets us do easy (though not terribly efficient)
//  transcoding of XMLCh data to local code page for display.
// ---------------------------------------------------------------------------
class StrX
{
public:
    // -----------------------------------------------------------------------
    //  Constructors and destructor
    // -----------------------------------------------------------------------
    StrX(const XMLCh* const toTranscode)
    {
        // Call the private transcoding method
        fLocalForm = XMLString::transcode(toTranscode);
    }

    ~StrX()
    {
        XMLString::release(&fLocalForm);
    }

    // -----------------------------------------------------------------------
    //  Getter methods
    // -----------------------------------------------------------------------
    const char* localForm() const
    {
        return fLocalForm;
    }

private:
    // -----------------------------------------------------------------------
    //  Private data members
    //
    //  fLocalForm
    //      This is the local code page form of the string.
    // -----------------------------------------------------------------------
    char* fLocalForm;
};

inline XERCES_STD_QUALIFIER ostream& operator<<(XERCES_STD_QUALIFIER ostream& target, const StrX& toDump)
{
    target << toDump.localForm();
    return target;
}

int SAXPrint()
{
    // ---------------------------------------------------------------------------
    //  Local data
    //
    //  doNamespaces
    //      Indicates whether namespace processing should be enabled or not.
    //      Defaults to disabled.
    //
    //  doSchema
    //      Indicates whether schema processing should be enabled or not.
    //      Defaults to disabled.
    //
    //  schemaFullChecking
    //      Indicates whether full schema constraint checking should be enabled or not.
    //      Defaults to disabled.
    //
    //  encodingName
    //      The encoding we are to output in. If not set on the command line,
    //      then it is defaulted to LATIN1.
    //
    //  xmlFile
    //      The path to the file to parse. Set via command line.
    //
    //  valScheme
    //      Indicates what validation scheme to use. It defaults to 'auto', but
    //      can be set via the -v= command.
    // ---------------------------------------------------------------------------
    static bool doNamespaces = false;
    static bool doSchema = false;
    static bool schemaFullChecking = false;
    static const char* encodingName = "LATIN1";
    static XMLFormatter::UnRepFlags unRepFlags = XMLFormatter::UnRep_CharRef;
    static const char* xmlFile = 0;  // const: it points at a string literal
    static SAXParser::ValSchemes valScheme = SAXParser::Val_Auto;

    // Initialize the XML4C2 system
    try
    {
        XMLPlatformUtils::Initialize();
    }
    catch (const XMLException& toCatch)
    {
        XERCES_STD_QUALIFIER cerr << "Error during initialization! :\n"
                                  << StrX(toCatch.getMessage()) << XERCES_STD_QUALIFIER endl;
        return 1;
    }

    xmlFile = "../../../../../samples/data/personal-schema.xml";
    int errorCount = 0;

    //
    //  Create a SAX parser object. Then, according to what we were told on
    //  the command line, set it to validate or not.
    //
    SAXParser* parser = new SAXParser;
    parser->setValidationScheme(valScheme);
    parser->setDoNamespaces(doNamespaces);
    parser->setDoSchema(doSchema);
    parser->setHandleMultipleImports(true);
    parser->setValidationSchemaFullChecking(schemaFullChecking);

    //
    //  Create the handler object and install it as the document and error
    //  handler for the parser. Then parse the file and catch any exceptions
    //  that propagate out.
    //
    //  SAXPrintHandlers comes from the Xerces SAXPrint sample and is not part
    //  of this project, so the handler lines are left commented out.
    //
    int errorCode = 0;
    try
    {
        //SAXPrintHandlers handler(encodingName, unRepFlags);
        //parser->setDocumentHandler(&handler);
        //parser->setErrorHandler(&handler);
        parser->parse(xmlFile);
        errorCount = parser->getErrorCount();
    }
    catch (const OutOfMemoryException&)
    {
        XERCES_STD_QUALIFIER cerr << "OutOfMemoryException" << XERCES_STD_QUALIFIER endl;
        errorCode = 5;
    }
    catch (const XMLException& toCatch)
    {
        XERCES_STD_QUALIFIER cerr << "\nAn error occurred\n  Error: "
                                  << StrX(toCatch.getMessage()) << "\n" << XERCES_STD_QUALIFIER endl;
        errorCode = 4;
    }

    //
    //  Delete the parser itself. Must be done prior to calling Terminate, below,
    //  on the error path as well.
    //
    delete parser;

    // And call the termination method
    XMLPlatformUtils::Terminate();

    if (errorCode)
        return errorCode;

    return (errorCount > 0) ? 4 : 0;
}

int main(int argc, char* argv[])
{
    CreateDOMDocument();
    SAXPrint();

    cout << "ok!" << endl;
    return 0;
}