Like other databases, the DB2®V8 XML Extender provides two models for storing and accessing XML: XML documents can be stored in CLOB columns as unresolved text, or mapped and decomposed into a set of relational tables. Both of these options have known performance limitations. The new purexml™ technology in DB2®9 attempts to eliminate these limitations by storing and querying XML in its intrinsic hierarchical format. This article describes a series of metrics that determine whether PureXML can provide performance benefits and quantify performance differences between PureXML and Clob or exploded storage.
Brief introduction
The purexml™ technology in DB2®9 is designed to provide the highest level of performance for XML data management. This paper compares the performance of purexml™ technology with character-type large object (CLOB) and decomposed XML storage. Many database systems allow you to store XML data in CLOB format, or to "decompose" data into relational tables. DB2®V8 also supports both options (through XML Extender), and DB2 9 still provides XML Extender for backward compatibility. However, they will be replaced by PureXML characteristics.
The DB2 XML Extender includes a set of stored procedures, user-defined functions (UDF), and user-defined data types (UDT) that add XML functionality to the core DB2 engine. XML extender processes and UDF are equipped with XML parsers and XML-specific logic, enabling the implementation of XML storage and retrieval supported by the traditional DB2 engine features. You can use XML Extender XML Extender Column or XML Extender Collection attributes.
The XML Extender column allows the XML document to be stored completely as unresolved plain text. This process is simple, but ignores the internal structure of the XML document. You can choose to use CLOB columns, varchar columns, or files within the file system as the underlying storage. In the varchar column, the XML Extender can store only the largest 3KB of documents--many applications are hard to guarantee to meet this limit. A high level of database administrator (DBA) can increase this limit to 32k, but the application is generally not guaranteed to be satisfied even with this maximum value. External file storage is more flexible, but cannot benefit from the persistence and integrity of database management. This is why CLOB (which can store the largest 2GB of documents) becomes the most common choice for XML Extender column. This article explores the performance of XML Extender CLOB columns later.
XML Extender collection allows the conversion of XML data into relational format. This requires a fixed mapping from the expected XML structure to the set of relational tables in the database schema. Based on this mapping, the stored procedure extracts the atomic data values from the XML document and inserts them into the traditional relational rows and columns. This process is called decomposition ("shredding" or decomposition). It involves XML parsing and translates a single logical XML document into a series of SQL row inserts. In an actual application, it makes it easy to use dozens of relational tables to represent all the one-to-many relationships in the original XML structure. As a result, mappings quickly become complex, and XML insert performance is affected accordingly. Once data is available in relational format, pure SQL can be used for data access and manipulation. However, the refactoring of the original XML document is also very expensive. It needs to add and generate the appropriate XML tags in a multiple way. These tags can be defined by a standardized Sql/xml publishing function to refactor the original document or a new document. However, XML Extender Collection cannot preserve any digital signatures of the original XML document.
Providing XML data in relational format is still an important requirement. The most common cause is the need to feed data to legacy SQL applications, packaged business applications, and business Intelligence (BI) tools that use only relational data. Therefore, DB2 9 provides a new "decomposition" solution, also known as annotation mode decomposition or "new decomposition", which is 7 to 8 times times the speed of XML Extender collection decomposition. This article will compare the performance of this new high-speed decomposition scenario with IBM purexml™ support in IBM DB2 9.
The new pureXML technology in DB2 9 is very different from CLOB or exploded XML storage. It does not store the document as plain text, nor does it map the XML to a relational or object relational table. Instead, it stores XML using its intrinsic hierarchical format, which matches the XML data model. Each XML document is a well-defined element and attribute tree, and a tree traversal is used to represent an XML query. Therefore, it is natural that the corresponding hierarchical storage and processing formats will make XML data management more efficient. To explain this point in detail, this article compares the pureXML in DB2 9 with the performance of CLOB and decomposition-based XML processing.
Test settings
Table 1 summarizes the comparisons made in this article. In this paper, the key XML operations and corresponding pureXML operations of CLOB and decomposed storage are compared.
Table 1: Comparing CLOB and decomposing XML processing with pureXML
XML in the CLOB |
DB2 9 PureXML |
inserting XML into XML Extender CLOB columns |
Inserting XML into an XML column |
Complete document retrieval of CLOB |
Complete document retrieval of XML columns |
Querying XML in CLOB using the XML Extender extract feature (parsing XML at query time) |
XQuery Operations on XML columns |
|
|
XML that is decomposed into relational tables |
DB2 9 PureXML |
Decompose XML into relational tables using the new decomposition feature of DB2 9 |
Inserting XML into an XML column |
Publish Sql/xml, build XML documents from relational data (for example, XML that was previously decomposed) |
XQuery Operations on XML columns |
All tests are performed using the following data and settings:
A 4-cpu pseries system with ibm®aix®5.2 (64-bit) and single DB2 9 instances installed
1,000 to 100,000 CUSTACC documents (4KB to 20kb size) from the financial scene in the article "DB2 9 XML Performance Features"
Database Managed (DMS) tablespace with a page size of 32KB
All table spaces are defined using the No File system caching option, unless otherwise declared (for some CLOB storage tests)
All table spaces are distributed on 10 physical disks, with database logs in separate isometric columns
All tests use the same database configuration and tuning to ensure the fairness and effectiveness of performance comparisons