Objective
Storing XML data in a relational database provides reliability, manageability, and other benefits of the RDBMS. However, the performance of your application can be problematic if you use clumsy storage methods such as decomposition and CLOB. Two years after IBM introduced pureXML in DB2 9, the problem has been solved.
When IBM launched DB2 9 in 2006, its pureXML technology attracted widespread attention. However, people did not immediately accept the idea of a hybrid "relational/XML" DBMS. The benefits of the idea are obvious: by integrating an XML engine with an off-the-shelf RDBMS, XML data can be easily merged into data operations, and many of the data management features of relational systems, such as security and archiving, can be applied to XML data. But what about the performance of this approach? Given the past performance of object-relational databases, people were skeptical about the performance of the hybrid system.
Two years later, pureXML's performance benefits are no longer merely theoretical; they have been demonstrated under real-world conditions.
What do you want to get from XML?
During the 5 years it spent developing pureXML and DB2 9, IBM studied the nature of XML (which is both highly flexible and complex) and found five areas to improve. For performance, IBM focused on two factors: the storage model and query optimization.
XML data is represented as a hierarchical tree that typically contains multiple levels and a large number of nodes, so the format is difficult to optimize and index. Storing it efficiently also calls for some form of compression. As a result, XML queries can become very complex.
A common method of storing XML data is the character large object (CLOB). Like a binary large object (BLOB), a CLOB is typically stored as a whole and is rarely preprocessed for indexing or query optimization. Another approach is to "decompose" (shred) XML data into columns of relational data, which requires complex parsing techniques and often uses a large amount of storage space.
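The difference between the two legacy approaches can be illustrated with a minimal Python sketch. It is not DB2 code: the sample document, its element names, and the helper functions `query_clob` and `shred` are all hypothetical, and only the standard library is used. The CLOB-style function has to reparse the whole string for every query, while the decomposition-style function flattens the hierarchy into rows and repeats the parent values on every row.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; element and attribute names are illustrative only.
DOC = """<order id="17">
  <customer>Acme</customer>
  <item sku="A1"><qty>2</qty></item>
  <item sku="B7"><qty>5</qty></item>
</order>"""


def query_clob(clob_text, path):
    """CLOB style: the document is an opaque string, so each query reparses it."""
    root = ET.fromstring(clob_text)
    return [element.text for element in root.findall(path)]


def shred(clob_text):
    """Decomposition style: flatten the hierarchy into relational-looking rows."""
    root = ET.fromstring(clob_text)
    order_id = int(root.get("id"))
    customer = root.findtext("customer")
    # One row per <item>; the parent values are repeated on every row.
    return [
        (order_id, customer, item.get("sku"), int(item.findtext("qty")))
        for item in root.findall("item")
    ]


quantities = query_clob(DOC, "item/qty")
rows = shred(DOC)
```

In this toy version, any change to the document structure means rewriting `shred` and the row layout, which hints at why decomposition becomes costly for flexible XML.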
DB2 pureXML stores XML data in a predefined hierarchical format that reflects the underlying nesting structure of the data. This format supports complex indexes and allows the data to be compressed in physical storage. When executing queries, pureXML transforms XQuery and SQL/XML queries into a unified internal format that is optimized in several ways (cross-language optimization, query rewrite, optimized index use, and cost-based optimization). XML compression, query optimization, and a mix of relational and XML processing (with improved insert and update performance in DB2 9.5) improve database processing performance. IBM's tests and real-world projects show operations running 10 to 20 times faster than similar operations in DB2 V8 or other DBMSs (see the "Joy of Success" section of the related content).
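To make the idea of indexing a hierarchical format concrete, here is a toy Python sketch. It is an illustration under stated assumptions, not a description of DB2's actual storage or index structures: the function `build_path_index`, the sample document, and the path notation are all hypothetical. The document is parsed once, and each root-to-node path is mapped to the values found there, so later lookups do not need to reparse the text.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sample document; names are illustrative only.
SAMPLE = (
    '<order id="17"><customer>Acme</customer>'
    '<item sku="A1"><qty>2</qty></item>'
    '<item sku="B7"><qty>5</qty></item></order>'
)


def build_path_index(xml_text):
    """Parse once and map every root-to-node path to the values stored there."""
    root = ET.fromstring(xml_text)
    index = defaultdict(list)

    def walk(node, prefix):
        path = f"{prefix}/{node.tag}"
        text = (node.text or "").strip()
        if text:
            index[path].append(text)
        # Attributes are indexed under an '@name' step, a notation assumed here.
        for name, value in node.attrib.items():
            index[f"{path}/@{name}"].append(value)
        for child in node:
            walk(child, path)

    walk(root, "")
    return dict(index)


index = build_path_index(SAMPLE)
qty_values = index["/order/item/qty"]
```

A lookup such as `index["/order/item/@sku"]` then becomes a dictionary access rather than a tree traversal, which is the general benefit a path-aware index aims for.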
Models that are more appropriate for XML
The performance characteristics of pureXML also include its adaptability. An important point: XML data does not replace relational data. XML is unlikely to outperform a relational database for standard financial operations, but an RDBMS can struggle with publications such as entire books, magazines, or journals. An RDBMS also has difficulty when the data in your application has a complex hierarchy or contains a large amount of unstructured information. For example, life sciences organizations are converting much of their common data from a variety of proprietary formats into XML. In some cases, XML formats provide an optimized access path for data that is difficult to store and extract in a traditional RDBMS, which gives the XML database a fundamental performance advantage.
XML DBMSs also have the advantage that they can communicate directly with XML applications through Web services or other methods. Because XML is widely used on the Internet and much of the communication between applications is in XML format, it makes sense to keep the entire communication chain (application, message, database) in XML format.
IBM's pureXML contains many important technical innovations (68 new patents!), but the real value of the IBM relational/XML model is its performance in mainstream enterprise applications.
The performance of XML in real-world environments
The Beijing Xicheng District Health Bureau, which provides services to about 1 million people, stores its customer list in a large database containing sensitive data. Health systems need to combine complete documents (such as a physician's report) with data from many sources, so taking advantage of the flexibility of XML is very important. "DB2 9 with PureXML features is not only suitable for core database storage models, but also for raw data collection forms and data exchange," said Shu Zhu, CIO at the health bureau. Mr. Zhu believes that XML suits the bureau well because it can handle complex personal health records and provides flexible query capabilities for responding quickly to real-time, on-demand information requests, which is important for certain medical activities. In particular, the Xicheng Health Bureau combines its large DB2 9 database with an IBM service-oriented architecture (SOA) to implement a "service bus" for Web services that provide data to applications.
The German Research Center for Competitive Sports (Das Deutsche Forschungszentrum für Leistungssport) has developed an application called EACTE, which is used to collect and analyze basic and applied research information about sports science. Data collection is the most difficult aspect of the program. The database holds a wide range of content, including large volumes of data generated by monitoring devices, scanned images, and a large amount of manually entered data. For example, 3,000 parameters are collected in 9 forms spanning 63 pages. Data is captured through a dedicated online portal or a client application using Lotus Forms software, then passed through an IBM WebSphere Application Server and stored in both relational and XML format in IBM DB2 9. For the research center, located in Köln, pureXML makes it possible to quickly capture complex motion test results and submit the data in XML format, which other programs can easily analyze.
Another German organization, Douglas Holding AG, uses IBM DB2 9 and pureXML to collect data from 1,600 retail stores and 800 perfumeries to perform a very traditional retail job: reconciling the cash receipts of each store every day and preparing the results for the company's data warehouse. In this case, pureXML is used to compress the data and determine its structure, and Douglas finds it easier to use than the previous system (see the "Joy of Success" section of the related content).
These examples illustrate one point: pureXML stores data efficiently, responds quickly, and queries XML data flexibly.
Best practices
IBM has done a lot to improve XML performance, especially in response to growing data volumes and query complexity, but like all data management systems, pureXML needs good configuration and some tuning. For suggestions on tuning XML performance, see the "Getting the best XML Query Performance" section of the related content.
The XML database market has become more competitive in the last 10 years, but IBM's hybrid approach is very advanced, making XML an integral part of day-to-day data management. To achieve this goal, IBM has enhanced the functionality, reliability, and performance of PureXML. IBM did some research first, and then implemented a number of performance improvements in DB2 9.5.
XML has become the primary data language on the Internet, is the de facto data exchange standard, and is almost certainly part of any Web services or SOA deployment, yet some DBAs (and even entire IT departments) are still hesitant to accept it. DB2 9 and pureXML bring the reliability, scalability, and manageability of relational databases into the XML realm while avoiding the drawbacks of legacy storage methods. As a result, IT personnel who are skeptical about the performance of XML should change their minds.