Two years after IBM introduced pureXML in DB2 9, this problem has been completely solved. When IBM launched the DB2 9 database in 2006, its pureXML technology attracted wide attention. However, people do not immediately accept the idea of "relational/XML" hybrid DBMS.
The benefits of this idea are obvious: by integrating the XML Engine with the ready-made RDBMS, XML data can be easily merged into data operations, you can also apply many data management features of relational systems for XML data, such as security and archiving. However, what is the performance of this approach? Considering the past performance of object-relational databases, people are skeptical about the performance of hybrid systems.
Two years later, the performance of pureXML was not only theoretically influenced, but also presented in the real environment.
What do you want to get through XML?
During the five years of development of pureXML and DB2 9, IBM has thoroughly studied the nature of XML (XML is flexible and cumbersome) and found five aspects to be improved. IBM focuses on two performance factors: storage model and query optimization.
XML data is represented by a hierarchical tree structure that usually contains multiple layers and a large number of nodes. Therefore, this format is difficult to optimize and compile indexes. It also involves some form of compression. Therefore, XML queries may become very complex.
A common method for storing XML data is the character large object (CLOB ). Like a Binary Large Object (BLOB), CLOB is usually used as a whole for storage and is rarely pre-processed for indexing or query optimization. Another method is to "break down" XML data into columns of relational data. This method requires complex parsing techniques and usually uses a large amount of storage space.
DB2 pureXML stores XML data in a pre-Resolved hierarchical format that reflects the underlying nested structure of the data. This format supports compiling complex indexes and compressing data in physical storage. During query execution, pureXML converts XQuery and SQL/XML queries into a unified format optimized in many aspects (cross-language, query rewriting, optimized indexing, and cost. XML compression, query optimization, and the combination of relational and XML Processing (the performance of insert and update is also improved in DB2 9.5) improve the performance of the database in processing DB2 9 XML; IBM tests and real projects show that similar operations in DB2 V8 or other DBMS are 10 to 20 times faster (see the "Joy of success" section in relevant content ).
More suitable for XML models
The performance characteristics of pureXML also include adaptability. Key Point: XML data does not replace relational data. For standard financial businesses, XML data is difficult to surpass relational databases, but RDBMS is difficult to process publications, such as the whole book, magazine, or journal. RDBMS is hard to process if the data in an application has a complex hierarchy or contains a large amount of unstructured information. For example, life sciences organizations are converting many common data from various proprietary formats to XML. In some cases, for data that is difficult to store and extract in traditional RDBMS, the XML format provides an optimized data access path, which makes the XML database fundamentally superior in performance.
Xml dbms has another advantage: it can communicate directly with XML applications through Web services or other methods. Because XML is widely used on the Internet, the communication between applications is also widely used in XML format, so the entire communication chain (Application-message-database) it makes sense to use the XML format.
IBM pureXML has many significant technological innovations (68 new patents !), However, the true value of the IBM relationship/XML model lies in the performance it shows in mainstream enterprise applications.
Performance of XML in real Environments
Beijing Xicheng District Health Bureau provides services for about 1 million people. It stores the customer list in a large database, including sensitive data. The health system needs to combine complete documents (such as doctors' reports) with data from many data sources, so XML flexibility is very necessary. Zhu Shu Hong, CIO of the Health Bureau, pointed out that "DB2 9 with the pureXML feature is not only suitable for the core database storage model, but also for the form of raw data collection and data exchange ". Mr. Zhu believes that XML can process complex personal health records, which makes XML very suitable for them. XML provides flexible query functions to quickly respond to real-time on-demand information, this is important for some medical activities. In particular, Xicheng District Health Bureau combines its large DB2 9 database with IBM's Service-Oriented Architecture (SOA, A "Service Bus" is implemented for Web services that provide data to applications ".
German Research Center for Competitive Sports (Das Deutsche Forschungszentrum für Leistungssport) developed an application called eActe, this program is used to collect and analyze basic information about sports science and application research information. Data collection is the most difficult aspect of this program. This database contains a wide range of content, including a large amount of data generated by monitoring devices, scanned images, and a large amount of manual input data. For example, there are 3,000 parameters collected in 9 forms, a total of 63 pages. Data is captured through a dedicated online portal or client Application using Lotus Forms Software and then transmitted through an IBM WebSphere Application Server, and stored in the relational format and XML format in IBM DB2 9. For a research center in Koln, pureXML can quickly capture complex motion test results and submit data in XML format, which is easy for other programs to analyze.
Another German organization, Douglas Holding AG, uses IBM DB2 9 and pureXML to collect data from 1,600 retail stores and 800 spice stores for a very traditional retail job: organize cash receipts for each store every day and prepare results for the company's data warehouse. In this case, pureXML is used to compress data and determine the data structure. Douglas finds that this system is easier to use than the previous one (see the "Joy of success" section in the relevant content ).
These examples show that the efficiency, response speed, and query flexibility of the pureXML data storage are good.
Best practices
IBM has done a lot of work to improve XML performance, especially to cope with the increase in data volume and query complexity. However, like all data management systems, pureXML also requires good settings and some adjustments. For some suggestions on XML performance tuning, see the section "getting excellent XML query DB2 9 XML performance" in the relevant content.
The XML database market has been increasingly competitive over the past 10 years, but the IBM hybrid approach is very advanced, which makes XML an inherent part of daily data management. To achieve this goal, IBM has enhanced the features, reliability, and performance of pureXML. IBM first made some research and then made many performance improvements in DB2 9.5.
Although XML has become the main data language on the Internet, IT is actually a standard for data exchange, and has almost become a part of Web Services and SOA, but some DBAs (and even the entire IT department) I am still hesitant to accept it. DB2 9 and pureXML introduce the reliability, scalability, and manageability of relational databases into the XML field, while avoiding the disadvantages of other older storage methods. Therefore, IT staff who are skeptical about XML performance should change their mindset.
Related content
Excellent XML query performance
XML applications usually require high performance. The primary responsibility for achieving good performance is the responsibility of Application designers and DBAs. In Matthias Nicla's developerWorks article "DB2 9's 15 best practices for pureXML performance", he explains how to use DB2 9's pureXML technology to produce the best performance. The following summarizes the first five tips.
Rationally select the granularity of the XML document. When designing XML applications and XML document structures, you must determine which business data is included in a single XML document. When selecting between fine granularity, medium granularity, and coarse granularity, you need to consider which granularity the application mainly uses to access and process data. If you are not sure, You should prefer to use smaller granularity and smaller XML documents.
Use DMS and larger pages to achieve better XML performance. Database-managed tablespaces (DMS) provide higher performance than operating system-managed tablespaces (SMS. In DB2 9, the new tablespace is DMS by default. The use of DMS tablespaces and Automatic Storage enables DMS containers to grow as needed without human intervention. The empirical rule is to select a page size that is no less than twice the average expected document size for XML data and does not exceed the maximum 32 KB. If you use a single page size for relational data and XML data (or for data and indexing), the 16 kb or 8 KB page may be a good choice for both data types.
Use XML storage options: inline, compressed, or individual tablespace. If you find that the XML data requires a larger page size, and the relational data or index requires a smaller page size, you can use a separate tablespace. When defining a table, you can convert the "long" data (big objects and XML data) to a separate tablespace with different page sizes. However, we should consider it according to common sense. With fewer buffer pools, tablespaces, and fewer page sizes, you can create simpler physical database designs for easier management, maintenance, and tuning. Therefore, avoid introducing multiple page sizes unless you know that doing so can indeed bring performance benefits. The internal and compression functions should be used to reduce storage overhead and improve I/O performance.
Configure DB2 to quickly insert XML data into blocks. Traditional performance tuning of insertion and logging is still applicable to XML insertion and import. In DB2 9.5, load is used instead of import. If the table requires multiple XML indexes, you should define them before any block insertion. In this way, each XML document can be processed only once to generate index entries for all XML indexes. If multiple create index statements are issued after the table is filled, all documents in the XML column will be accessed multiple times.
Use the new snapshot monitor element to check XML performance. DB2 9 provides new buffer pool snapshot monitor elements for XML data that correspond to existing counters used for relational data and indexing. Since relational data and indexes are stored in different storage objects in a tablespace, they have different read and write counters. PureXML storage in DB2 9 introduces a new storage object (called XDA) for XML data, which also has its own buffer pool counters.
After studying the performance tips of Nicola, be sure to read his article on improving XML query performance through indexes.
Joy of success
DB2 9 pureXML enables spice stores and retail company Douglas Holding to organize and aggregate sales data faster.
Douglas Holding AG owns more than 1,800 retail stores in Europe and the United States. In 2005, it launched a project called Operational Data Stage to solve the main problems of cash receipt sorting and sales information aggregation. Obtaining data in XML format is an important basic step. "Flexibility is the main reason for choosing XML," said Dr. Andreas birkenddorf, project leader at Douglas Holding, a branch of the company's IT needs, "because the data sources are very different and their formats may change, we need a very flexible interface."
This project uses IBM DB2 Version 8 XML Extender at the beginning, which is effective but complicated and difficult to use. In 2007, the emergence of IBM DB2 9 pureXML changed this situation. DB2 9 is easier to use, especially for ing between XML and relational structures. Compared with the DB2 V8 XML Extender, the performance is improved by more than 20 times. This greatly reduces processing time; while birkenddorf's team once worried that they would not be able to afford it if Douglas decided to add hundreds of stores, birkenddorf is confident that "now we know we can handle it ."
According to birkenddorf, the performance improvement of some DB2 9 XML files does not come from physical processing speed improvement, but because it does not need to complete some work. "If the format changes slightly, pureXML makes it easier for us to deal with it. In fact, sometimes we don't even need to make manual changes; the system will handle it ."
Douglas Holding has long used pureXML to implement the XML model system, but this process is not smooth sailing. "People with a relational database background may need to adjust their mindset," birkenddorf said. "We realized the value of XML before the emergence of pureXML. We know that it is exactly what we need."