XML in Java: Document models, Part 1: Performance


Java developers working with in-memory XML documents can choose either the standard DOM representation or one of several Java-specific models. This flexibility has helped make Java an excellent platform for XML work. However, as the number of models grows, it becomes harder to compare them in terms of functionality, performance, and ease of use.

This first article in the "XML in Java" series examines the features and performance of several leading XML document models in Java, and includes the results of a set of performance tests. The second article in the series will look at usability by comparing sample code that implements the same tasks with each model.

Document models

The number of document models available in Java keeps growing. For this article I've covered the most widely used models, plus a few choices that illustrate especially interesting features that may not be widely known or used. Because XML namespaces are becoming increasingly important, I've included only models that support namespaces. The sections below give a brief introduction and version information for each model.

To clarify the terminology used in this article:

A parser is a program that interprets the structure of an XML text document.

A document representation is the data structure a program uses to hold a document in memory.

A document model is the library and API that support working with a document representation.

Some XML applications don't need a document model at all. If an application can collect the information it needs in a single pass through the document, it can probably use a parser directly. That approach may take a little more work, but it always performs better than building a document representation in memory.
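Here is a minimal sketch of the parser-only approach. It uses the SAX parser bundled with the JDK through JAXP, not one of the specific parsers tested in this article, to count element names in a single pass. No tree is built: each start tag is handled as it is read and then discarded. The `ElementCounter` class and its method are illustrative names only.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class: counts element names in one pass, without building a tree.
class ElementCounter {

    static Map<String, Integer> countElements(String xml) {
        Map<String, Integer> counts = new HashMap<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attributes) {
                // Called once per start tag as the parser reads it; nothing else is kept.
                counts.merge(qName, 1, Integer::sum);
            }
        };
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            parser.parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("XML parse failed", e);
        }
        return counts;
    }

    public static void main(String[] args) {
        String xml = "<order><item/><item/><note>rush</note></order>";
        Map<String, Integer> counts = countElements(xml);
        System.out.println("item elements: " + counts.get("item")); // prints "item elements: 2"
    }
}
```

Memory use here is bounded by the size of the result map, not the size of the document. That is the trade-off the paragraph describes.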

DOM

The DOM (Document Object Model) is the official World Wide Web Consortium (W3C) standard for representing XML documents in a platform- and language-independent way. It makes a good baseline for comparing any Java-specific model. To justify departing from the DOM standard, a Java-specific model should offer clear performance and/or usability advantages over the Java DOM implementations.

The DOM definition makes heavy use of interfaces and inheritance for the different components of an XML document. This lets developers work with several kinds of components through a common interface, but it adds complexity to the API. Because the DOM is language-independent, its interfaces do not use common Java components such as the Collections classes.
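The following minimal sketch shows what this looks like in practice. It assumes the JDK's bundled DOM implementation rather than Crimson or Xerces. Every child of the root (element, text, or comment) is handled through the single `Node` interface and told apart by `getNodeType()`. The children come back as an `org.w3c.dom.NodeList`, which is indexed with `item(i)` rather than being a `java.util.List`. `DomWalk` and its methods are hypothetical names.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DomWalk {

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("XML parse failed", e);
        }
    }

    // Describes each child of the root element using only the generic Node interface.
    static List<String> describeChildren(Document doc) {
        List<String> out = new ArrayList<>();
        // NodeList is a DOM interface, not a java.util.List: no iterator, just item(i).
        NodeList children = doc.getDocumentElement().getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node n = children.item(i);
            switch (n.getNodeType()) {
                case Node.ELEMENT_NODE:
                    out.add("element " + n.getNodeName());
                    break;
                case Node.TEXT_NODE:
                    out.add("text '" + n.getNodeValue() + "'");
                    break;
                case Node.COMMENT_NODE:
                    out.add("comment '" + n.getNodeValue() + "'");
                    break;
                default:
                    out.add("other " + n.getNodeName());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Document doc = parse("<root>hi<child/><!--note--></root>");
        describeChildren(doc).forEach(System.out::println);
    }
}
```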

This article covers two DOM implementations: Crimson and Xerces Java. Crimson is an Apache project based on the Sun Project X parser. It incorporates a full validating parser with DTD support. The parser can be accessed through the SAX2 interface, and the DOM implementation can work with other SAX2 parsers. Crimson is open source, released under the Apache license. The version used for the performance comparisons is Crimson 1.1.1 (jar file size 0.2 MB), which includes the SAX2 parser used to build the DOM from a text file.

The other DOM implementation tested, Xerces Java, is another Apache project. Xerces was originally based on the IBM Java parser (usually called XML4J). (Xerces Java 2, currently in early beta, will eventually succeed it; the current version is sometimes called Xerces Java 1.) As with Crimson, the Xerces parser can be accessed through both the SAX2 interface and DOM. However, Xerces provides no way to use the Xerces DOM with a different SAX2 parser. Xerces Java includes validation support for both DTDs and XML Schema (with only minor limitations in its schema support).

Xerces Java also supports a deferred node expansion extension to the DOM (referred to as Xerces deferred, or Xerces def., in this article). With this extension, document components are initially stored in a compact form and expanded into a full DOM representation only when they are used. The approach is meant to allow faster parsing and lower memory use, especially for applications that may use only part of the input document. Like Crimson, Xerces is open source, released under the Apache license. The version used for the performance comparisons is Xerces 1.4.2 (jar file size 1.8 MB).
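Deferred expansion is normally invisible to application code, which sees an ordinary DOM either way. The sketch below assumes a Xerces-derived JAXP implementation and requests the behavior through the Xerces feature ID `http://apache.org/xml/features/dom/defer-node-expansion`. If the parser in use does not recognize that feature, `DocumentBuilderFactory.setFeature` throws `ParserConfigurationException`, and the sketch falls back to the default setting. `DeferredDom` is a hypothetical class name. Whether the copy of Xerces inside a given JDK honors this feature is an assumption you should check against your parser's documentation.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class DeferredDom {

    // Xerces feature ID controlling deferred node expansion (assumed to be recognized).
    static final String DEFER_NODE_EXPANSION =
            "http://apache.org/xml/features/dom/defer-node-expansion";

    static Document parse(String xml, boolean deferred) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            try {
                factory.setFeature(DEFER_NODE_EXPANSION, deferred);
            } catch (ParserConfigurationException e) {
                // Parser does not support the feature: keep its default behavior.
                System.err.println("deferred expansion not configurable: " + e.getMessage());
            }
            DocumentBuilder builder = factory.newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("XML parse failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<library><book id='1'/><book id='2'/></library>";
        for (boolean deferred : new boolean[] {true, false}) {
            Document doc = parse(xml, deferred);
            // Same DOM API either way; with deferral, nodes are expanded as they are accessed.
            int books = doc.getElementsByTagName("book").getLength();
            System.out.println("deferred=" + deferred + " books=" + books);
        }
    }
}
```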

JDOM

JDOM's stated goal is to be a Java-specific document model that simplifies interaction with XML and is faster than using DOM. As the first Java-specific model, JDOM has been heavily publicized and promoted. It is also being considered for eventual adoption as a "Java Standard Extension" through Java Specification Request JSR-102. The final form is still being worked out, though, and the JDOM API changed significantly across the two beta releases. JDOM has been under development since early 2000.

There are two main differences between JDOM and DOM. First, JDOM uses only concrete classes rather than interfaces. This simplifies the API in some ways, but it also limits flexibility. Second, the API makes extensive use of the Collections classes, which makes it easier to use for Java developers already familiar with those classes.
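JDOM is not part of the JDK, so this sketch does not call it. It uses only the JDK's DOM and copies an element's children into a `java.util.List`. This shows the collections-style access that JDOM provides directly (JDOM's `Element.getChildren()` returns a `List`). `ChildList` and its methods are hypothetical helpers, not part of either library.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ChildList {

    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("XML parse failed", e);
        }
    }

    // Copies the element children (skipping text, comments, etc.) into a java.util.List.
    static List<Element> childElements(Element parent) {
        List<Element> result = new ArrayList<>();
        NodeList nodes = parent.getChildNodes();
        for (int i = 0; i < nodes.getLength(); i++) {
            Node n = nodes.item(i);
            if (n.getNodeType() == Node.ELEMENT_NODE) {
                result.add((Element) n);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Element root = parseRoot("<menu><item>tea</item>and<item>coffee</item></menu>");
        // With a real List, for-each loops and streams work directly.
        List<String> items = childElements(root).stream()
                .map(Element::getTextContent)
                .collect(Collectors.toList());
        System.out.println(items); // prints [tea, coffee]
    }
}
```

With DOM, every such helper costs an extra copy. An API built on the collections classes from the start avoids that step.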

The JDOM documentation states that its purpose is "to solve 80% (or more) of Java/XML problems with 20% (or less) of the effort" (the 20% presumably referring to the learning curve). JDOM is certainly usable for most Java/XML applications, and most developers find its API much easier to understand than DOM's. JDOM also performs fairly extensive checks on program behavior to prevent users from doing anything that makes no sense in XML. However, you still need a solid understanding of XML to do anything beyond the basics, or even to understand the errors in some cases. Gaining that understanding is probably a bigger effort than learning either the DOM or the JDOM interface.
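Here is an illustration of this kind of checking. The sketch uses the JDK's DOM rather than JDOM, because JDOM would need an extra jar. The DOM specification requires `createElement` to reject an illegal XML name with a `DOMException` whose code is `INVALID_CHARACTER_ERR`, and JDOM performs similar name checks in its own classes. `NameCheck` is a hypothetical helper name.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.DOMException;
import org.w3c.dom.Document;

class NameCheck {

    // Returns true if the DOM accepts the string as an element name.
    static boolean acceptsElementName(String name) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (Exception e) {
            throw new IllegalStateException("could not create a DOM document", e);
        }
        try {
            doc.createElement(name);
            return true;
        } catch (DOMException e) {
            // The DOM spec uses INVALID_CHARACTER_ERR for names that are not legal XML names.
            if (e.code == DOMException.INVALID_CHARACTER_ERR) {
                return false;
            }
            throw e;
        }
    }

    public static void main(String[] args) {
        for (String name : new String[] {"price", "1price", "unit price"}) {
            System.out.println(name + " -> " + acceptsElementName(name));
        }
    }
}
```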

JDOM itself does not include a parser. It normally uses a SAX2 parser to parse and validate an input XML document, though it can also take a previously constructed DOM representation as input. It includes converters that output a JDOM representation as a SAX2 event stream, a DOM model, or an XML text document. JDOM is open source, released under a variant of the Apache license. The version used for the performance comparisons is JDOM Beta 0.7 (jar file size 0.1 MB), with the Crimson SAX2 parser used to build the JDOM representation from a text file.
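JDOM's own converters require the JDOM jar. This sketch instead shows the analogous conversions for a JDK DOM tree, using the standard TrAX identity transform. The same in-memory document is written out as XML text (`StreamResult`) and replayed as a SAX2 event stream (`SAXResult`). `ModelOutput` and its methods are hypothetical names, and this is not JDOM's API.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.sax.SAXResult;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class ModelOutput {

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("XML parse failed", e);
        }
    }

    // Serializes the in-memory tree back to XML text with an identity transform.
    static String toText(Document doc) {
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("serialization failed", e);
        }
    }

    // Replays the tree as a SAX2 event stream and counts the startElement events.
    static int countStartEvents(Document doc) {
        int[] count = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attributes) {
                count[0]++;
            }
        };
        try {
            TransformerFactory.newInstance().newTransformer()
                    .transform(new DOMSource(doc), new SAXResult(handler));
        } catch (Exception e) {
            throw new IllegalStateException("SAX replay failed", e);
        }
        return count[0];
    }

    public static void main(String[] args) {
        Document doc = parse("<list><entry>1</entry><entry>2</entry></list>");
        System.out.println(toText(doc));
        System.out.println("start events: " + countStartEvents(doc));
    }
}
```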
