Principles and Performance Comparison of Four XML Parsers (DOM, SAX, JDOM, and DOM4J)

Source: Internet
Author: User

1: DOM

 

DOM is the official W3C standard for representing XML documents in a platform- and language-independent way. A DOM is a collection of nodes (pieces of information) organized in a hierarchy. This hierarchy lets developers search the tree for specific information. Analyzing the structure usually requires loading the entire document and building the hierarchy before any work can be done. Because it is built on this hierarchy of information, DOM is considered tree-based or object-based. DOM, and tree-based processing in general, has several advantages.

 

First, because the tree persists in memory, it can be modified, so an application can change both the data and the structure. The application can also navigate up and down the tree at any time, rather than getting a single pass as with SAX. DOM is also much easier to use.
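
As a minimal sketch of this kind of navigation, the following uses the JAXP DOM API that ships with the JDK. The sample document, the class name `DomNavigationSketch`, and the helper method names are assumptions made for illustration; they do not come from the original article.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DomNavigationSketch {
    // Hypothetical sample document, used only for illustration.
    static final String SAMPLE =
        "<library>"
      + "<book id=\"b1\"><title>XML Basics</title></book>"
      + "<book id=\"b2\"><title>Java and XML</title></book>"
      + "</library>";

    // Loads the whole document into an in-memory tree before any work starts.
    static Document parse(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    // Walks down the tree: collects the text of every <title> element.
    static List<String> titles(Document doc) {
        List<String> result = new ArrayList<>();
        NodeList nodes = doc.getElementsByTagName("title");
        for (int i = 0; i < nodes.getLength(); i++) {
            result.add(nodes.item(i).getTextContent());
        }
        return result;
    }

    // Walks back up the tree: from a <title> to the id of its parent <book>.
    static String parentBookId(Document doc, int titleIndex) {
        Element title = (Element) doc.getElementsByTagName("title").item(titleIndex);
        Element book = (Element) title.getParentNode();
        return book.getAttribute("id");
    }

    public static void main(String[] args) {
        Document doc = parse(SAMPLE);
        System.out.println(titles(doc));
        System.out.println(parentBookId(doc, 1));
    }
}
```

Because the tree stays in memory, both helpers can be called in any order and as often as needed on the same `Document`.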

 

On the other hand, parsing and loading a very large document can be slow and resource-intensive, so such data is better handled by other means, such as event-based models like SAX.

 

2: SAX

 

The advantages of SAX processing are much like those of streaming media. Analysis can begin immediately instead of waiting until all the data has been processed. Moreover, because the application only examines data as it is read, it does not need to hold the data in memory, which is a huge advantage for large documents. In fact, the application does not even have to parse the entire document; it can stop parsing once a condition is met. In general, SAX is also much faster than its alternative, DOM.
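
One common way to stop a SAX parse early is to throw an exception from a callback once the needed data has been seen. The sketch below uses the JDK's built-in SAX parser; the sample document, the `StopParsing` exception, and the other class and field names are hypothetical and chosen only for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class SaxEarlyStopSketch {
    // Hypothetical sample document, used only for illustration.
    static final String SAMPLE =
        "<library>"
      + "<book id=\"b1\"><title>XML Basics</title></book>"
      + "<book id=\"b2\"><title>Java and XML</title></book>"
      + "</library>";

    // Signals "we have what we need"; it does not indicate a parse error.
    static class StopParsing extends SAXException {
        StopParsing() { super("stop requested by handler"); }
    }

    static class FirstTitleHandler extends DefaultHandler {
        private final StringBuilder text = new StringBuilder();
        private boolean inTitle;
        String firstTitle;   // text of the first <title> encountered
        int elementsSeen;    // number of start tags seen before stopping

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            elementsSeen++;
            if ("title".equals(qName)) {
                inTitle = true;
                text.setLength(0);
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // Text may arrive in several chunks, so accumulate it.
            if (inTitle) text.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String localName, String qName) throws SAXException {
            if ("title".equals(qName)) {
                firstTitle = text.toString();
                throw new StopParsing(); // abandon the rest of the document
            }
        }
    }

    static FirstTitleHandler findFirstTitle(String xml) {
        FirstTitleHandler handler = new FirstTitleHandler();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (StopParsing expected) {
            // Normal early exit: the condition was met.
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
        return handler;
    }

    public static void main(String[] args) {
        FirstTitleHandler h = findFirstTitle(SAMPLE);
        System.out.println(h.firstTitle + " after " + h.elementsSeen + " start tags");
    }
}
```

The second `<book>` is never visited, and nothing other than the current title text is kept in memory.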

 

3: DOM or SAX?

 

For developers who need to write their own code to process XML documents, choosing between the DOM and SAX parsing models is a very important design decision.

 

DOM uses a tree structure to access XML documents, while SAX uses an event model.

 

The DOM parser converts an XML document into a tree containing its content, which can then be traversed. The advantage of the DOM model is ease of programming: developers only need to invoke the tree-building call and then use the navigation APIs to reach the desired tree nodes. Elements in the tree can easily be added and modified. However, because a DOM parser must process the entire XML file, its performance and memory requirements are relatively high, especially for large XML files. Because of its traversal capability, a DOM parser is often used in services where XML documents need to change frequently.
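
The build, navigate, and modify workflow described above might look like the following sketch, again using only the JAXP classes bundled with the JDK. The element names, the class name `DomEditSketch`, and the `addBook` helper are hypothetical, introduced purely for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class DomEditSketch {
    // Hypothetical sample document, used only for illustration.
    static final String SAMPLE =
        "<library><book id=\"b1\"><title>XML Basics</title></book></library>";

    // Parses the document, appends a new <book> to the root, and serializes it.
    static String addBook(String xml, String id, String titleText) {
        try {
            // Build step: the entire document becomes an in-memory tree.
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

            // Modify step: create new nodes and attach them to the tree.
            Element book = doc.createElement("book");
            book.setAttribute("id", id);
            Element title = doc.createElement("title");
            title.setTextContent(titleText);
            book.appendChild(title);
            doc.getDocumentElement().appendChild(book);

            // Serialize the modified tree back to text.
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("could not edit XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(addBook(SAMPLE, "b2", "Java and XML"));
    }
}
```

The cost noted in the text is visible here too: the whole document is parsed and held in memory even though only one element is added.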

 

The SAX parser uses an event-based model. It fires a series of events as it parses an XML document: when a given tag is found, it invokes a callback method to tell that method the tag has been found. SAX usually has low memory requirements, because it lets developers decide for themselves which tags to process. Its scalability shows best when developers only need to process part of the data in a document. However, coding against a SAX parser is harder, and it is difficult to access several different parts of the same document at the same time.
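
The callback mechanism can be sketched as follows with the JDK's built-in SAX parser. The handler reacts only to start tags and retains nothing except a running count per tag name; the sample document and the names `SaxCallbackSketch` and `countElements` are assumptions for illustration.

```java
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SaxCallbackSketch {
    // Hypothetical sample document, used only for illustration.
    static final String SAMPLE =
        "<library>"
      + "<book id=\"b1\"><title>XML Basics</title></book>"
      + "<book id=\"b2\"><title>Java and XML</title></book>"
      + "</library>";

    // Counts how many times each element name occurs in the document.
    static Map<String, Integer> countElements(String xml) {
        Map<String, Integer> counts = new TreeMap<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                // Called by the parser each time it encounters a start tag.
                counts.merge(qName, 1, Integer::sum);
            }
            // Events we do not override (text, end tags, ...) are simply ignored.
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(countElements(SAMPLE));
    }
}
```

The difficulty mentioned above also shows up in this style: any relationship between elements (for example, which title belongs to which book) has to be tracked by hand in handler state, since no tree exists to query.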

4: JDOM

 

JDOM aims to be a Java-specific document model that simplifies interaction with XML and is faster than a DOM implementation. Because it was the first Java-specific model, JDOM has been heavily promoted. It was being considered for eventual adoption as a "Java Standard Extension" through Java Specification Request JSR-102. JDOM development began in early 2000.

 

JDOM differs from DOM in two main ways. First, JDOM uses only concrete classes rather than interfaces. This simplifies the API in some respects, but it also limits flexibility. Second, the API makes heavy use of the Collections classes, which simplifies its use for Java developers already familiar with them.

 

The JDOM documentation states that its goal is to "solve 80% (or more) of Java/XML problems with 20% (or less) of the effort" (with the 20% judged by the learning curve). JDOM is certainly useful for most Java/XML applications, and most developers find its API much easier to understand than DOM. JDOM also includes fairly extensive checks on program behavior to prevent users from doing anything that makes no sense in XML. However, you still need a solid understanding of XML to do anything beyond the basics (or even to understand the errors in some cases). That is probably more worthwhile than learning either the DOM or JDOM interfaces.

 

JDOM itself does not include a parser. It usually uses a SAX2 parser to parse and validate input XML documents (although it can also take a previously constructed DOM representation as input). It includes converters that output the JDOM representation as a SAX2 event stream, a DOM model, or an XML text document. JDOM is open source, released under a variant of the Apache license.

 

5: DOM4J

 

Although DOM4J is now a completely independent development effort, it began as an intelligent fork of JDOM. It incorporates many features beyond basic XML document representation, including integrated XPath support, XML Schema support, and event-based processing for large or streamed documents. It also offers options for building the document representation, with parallel access through both the DOM4J API and the standard DOM interfaces. It has been under development since the second half of 2000.

 

To support all these features, DOM4J uses interfaces and abstract base classes. DOM4J makes heavy use of the Collections classes in its API, but in many cases it also provides alternatives that allow better performance or a more direct coding style. The direct benefit is that, although DOM4J pays the price of a more complex API, it offers much greater flexibility than JDOM.

 

While adding flexibility, XPath integration, and support for processing large documents, DOM4J shares JDOM's goals: ease of use and intuitive operation for Java developers. It also aims to be a more complete solution than JDOM, with the goal of handling essentially all Java/XML problems. In pursuing that goal, it places less emphasis than JDOM on preventing incorrect application behavior.

 

DOM4J is a very good Java XML API with excellent performance, powerful features, and great ease of use, and it is open source. More and more Java software now uses DOM4J to read and write XML. It is particularly worth noting that Sun's JAXM also uses DOM4J.

 

6: Summary

 

JDOM and DOM performed poorly in performance tests, running out of memory when tested with a 10 MB document. Both are still worth considering for small documents. Although the JDOM developers have stated that they intend to focus on performance before the official release, from a performance standpoint JDOM has little to recommend it. DOM, on the other hand, remains a very good choice. DOM implementations are widely available across programming languages. DOM is also the basis of many other XML-related standards, and because it is an official W3C recommendation (as opposed to a non-standard Java-specific model), it may be required in some kinds of projects (for example, when using DOM in JavaScript).

 

SAX performs well, which follows from its particular parsing approach. A SAX parser detects the incoming XML stream but does not load it into memory (although, of course, parts of the document are temporarily held in memory while the stream is read).

 

Without a doubt, DOM4J is the best of the four. It is now widely used in many open-source projects; the well-known Hibernate, for example, uses DOM4J to read its XML configuration files. If portability is not a concern, use DOM4J.

 
