Four ways to parse XML files in Java

Source: Internet
Author: User
Tags xpath xslt

Summary
Extensible Markup Language (XML) has distinct technical advantages for standardizing, exchanging, and sharing information, and has therefore received wide attention. This article briefly introduces the basics of XML, surveys its applications, and then summarizes the four most common XML parsing methods in Java, with the advantages and disadvantages of each. Finally, it lists a simple example for each of the four parsing methods.

Keywords
XML file, DOM, SAX, JDOM, dom4j

Introduction
XML (Extensible Markup Language) is a subset of the Standard Generalized Markup Language (SGML). It is a markup language used to tag electronic documents so that they have structure. A tag is a symbol that a computer can understand; through tags, computers can process many kinds of information, such as articles. To define these tags, you can choose an internationally common markup language such as HTML, or you can use a language like XML, which lets you define your own tags; this is what makes it extensible. XML was simplified and adapted from SGML, and its main related technologies are XML itself, XSL, and XPath.

Simply put, XML is a language for describing data. Although it is a language, it usually lacks the basic capability of an ordinary programming language: it cannot be recognized and run by the computer directly. It relies on another language to interpret it before it produces the effect you want or can be processed by the computer. There are many ways to parse XML today, but four are the most common in Java: DOM, SAX, JDOM, and dom4j.

First, the applications of XML
XML applications can be divided into two kinds: document-type XML and data-type XML.
Here are some common XML applications:
1, custom XML + XSLT => HTML. This is one of the most common document-type applications. The XML holds the data for the whole document, and an XSLT stylesheet transforms that XML into HTML tags so that the result can be displayed in a browser.

During the transformation, XSLT uses XPath to define the parts of the source document that should match one or more predefined templates. When a match is found, XSLT transforms the matching part of the source document into the result document. Parts of the source that no template matches fall back to XSLT's built-in rules, which by default pass their text content through to the result.

2, XML as a micro-database. This is one of the most common data-type applications. We use the relevant XML APIs (MSXML DOM, Java DOM, and so on) to access and query the XML. Note the word "micro": an XML document can serve as a database when the amount of data is small, users are few, and performance requirements are low, but it is not suited to environments with many users, highly integrated data, or high performance requirements. A typical case is an XML file that plays the role of an .ini file, holding an application's configuration information.

3, as communication data. The most typical case is a web service, which uses XML to pass data. Put simply: one side creates an XML file, adds the information nodes to be stored, and sends the XML to the page that receives the data; that page then parses the XML file and finally displays its node information.

4, as configuration data for applications. A common example is web.xml in Java EE, which configures a web application on the server.

5. XML as the file format of other documents, such as Word and Excel.

6. Storing mapping relationships between data, as Hibernate does.

The six applications above cover most of the main uses of XML.
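Application 1 above (XML + XSLT => HTML) can be sketched with the JDK's built-in javax.xml.transform API. This is only a minimal illustration, not a recommended production setup: the sample document, its element names (books, book, title), and the class name XsltToHtmlSketch are all made up for this example. The match and select attributes in the stylesheet are the XPath expressions that pick out parts of the source document.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class XsltToHtmlSketch {

    // Hypothetical sample data; the element names are assumptions for illustration.
    static final String XML =
        "<books>"
      + "<book><title>Java Basics</title></book>"
      + "<book><title>XML in Practice</title></book>"
      + "</books>";

    // The template matches the root <books> element and emits one <li> per <book>.
    static final String XSLT =
        "<xsl:stylesheet version=\"1.0\""
      + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:output method=\"html\" indent=\"no\"/>"
      + "<xsl:template match=\"/books\">"
      + "<ul><xsl:for-each select=\"book\">"
      + "<li><xsl:value-of select=\"title\"/></li>"
      + "</xsl:for-each></ul>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Applies the stylesheet to the XML and returns the result as a string.
    static String transform(String xml, String xslt) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xslt)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)),
                                  new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            // Wrapped so that callers of this sketch need no throws clause.
            throw new IllegalArgumentException("transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(XML, XSLT));
    }
}
```

In a real application the stylesheet would normally live in its own .xsl file and be loaded with a file-based StreamSource; strings are used here only to keep the sketch self-contained.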

Here are four ways to parse XML files in Java

Second, methods of parsing XML files
1. DOM (Document Object Model)

DOM is the official W3C standard for representing XML documents in a platform- and language-neutral way. A DOM is a collection of nodes, or pieces of information, organized in a hierarchy. This hierarchy lets developers search the tree for specific information. Analyzing the structure usually requires loading the entire document and building the hierarchy before any work can be done. Because it is based on a hierarchy of information, the DOM is considered tree-based or object-based. DOM, and tree-based processing in general, has several advantages. First, because the tree persists in memory, it can be modified, so an application can change both the data and the structure. Second, the tree can be navigated up and down at any time, rather than in the single pass that SAX allows. DOM is also much simpler to use.
Advantages
1) It forms a tree structure that is intuitive and easy to understand, so code is easier to write.
2) The tree structure is held in memory, so it is easy to modify.
Disadvantages
1) When the XML file is large, memory consumption is high, which hurts parsing performance and may cause an out-of-memory error.
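The tree-based approach described above can be sketched with the DOM parser that ships with the JDK (javax.xml.parsers and org.w3c.dom). The sample document, its element and attribute names, and the helper names parse, titles, and addBook are assumptions made for this illustration. The sketch loads the whole document into a tree, navigates it, and then modifies it in memory.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

class DomParseSketch {

    // Hypothetical sample data; names are assumptions for illustration.
    static final String XML =
        "<books>"
      + "<book id=\"1\"><title>Java Basics</title></book>"
      + "<book id=\"2\"><title>XML in Practice</title></book>"
      + "</books>";

    // Builds the complete in-memory tree before returning.
    static Document parse(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            // Wrapped so that callers of this sketch need no throws clause.
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    // Walks the tree and returns "id: title" for every <book> element.
    static List<String> titles(Document doc) {
        List<String> result = new ArrayList<>();
        NodeList books = doc.getElementsByTagName("book");
        for (int i = 0; i < books.getLength(); i++) {
            Element book = (Element) books.item(i);
            Element title = (Element) book.getElementsByTagName("title").item(0);
            result.add(book.getAttribute("id") + ": " + title.getTextContent());
        }
        return result;
    }

    // Modifies the tree in memory by appending a new <book> to the root element.
    static void addBook(Document doc, String id, String titleText) {
        Element book = doc.createElement("book");
        book.setAttribute("id", id);
        Element title = doc.createElement("title");
        title.setTextContent(titleText);
        book.appendChild(title);
        doc.getDocumentElement().appendChild(book);
    }

    public static void main(String[] args) {
        Document doc = parse(XML);
        addBook(doc, "3", "DOM by Example");
        titles(doc).forEach(System.out::println);
    }
}
```

Because the whole tree stays in memory, titles can be called again after addBook and will see the new node; the cost is that memory use grows with document size.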

2. SAX (Simple API for XML)

The benefits of SAX processing are very similar to those of streaming media: analysis can begin immediately instead of waiting for all the data to be processed. And because the application examines data only as it is read, it does not need to store the data in memory, which is a huge advantage for large documents. In fact, the application does not even have to parse the entire document; it can stop parsing when a condition is met. In general, SAX is also much faster than its alternative, DOM.

DOM or SAX? For developers who need to write their own code to process XML documents, choosing between the DOM and SAX parsing models is a very important design decision. DOM accesses an XML document through a tree structure, while SAX uses an event model.
A DOM parser converts an XML document into a tree containing its contents, which can then be traversed. The advantage of the DOM model is ease of programming: developers only need to call the tree-building instructions and then use the navigation APIs to reach the tree nodes they need. Elements in the tree are easy to add and modify. However, because a DOM parser must process the entire XML document, its performance and memory requirements are high, especially for large XML files. Because of its traversal capabilities, a DOM parser is often used in services where XML documents change frequently.

A SAX parser uses an event-based model: it fires a series of events as it parses the XML document, and when it finds a given tag it invokes a callback method to tell the application that the tag has been found. SAX usually needs little memory, because it lets developers decide for themselves which tags to process. SAX therefore scales better, especially when developers need only part of the data in the document. However, coding with a SAX parser is harder, and it is difficult to access several different pieces of data in the same document at the same time.
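The callback model just described can be sketched with the SAX parser bundled with the JDK (javax.xml.parsers.SAXParserFactory and org.xml.sax.helpers.DefaultHandler). The sample document and the names SaxParseSketch and TitleHandler are assumptions made for this illustration. The handler reacts only to title elements and ignores every other event, so nothing else is kept in memory.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class SaxParseSketch {

    // Hypothetical sample data; names are assumptions for illustration.
    static final String XML =
        "<books>"
      + "<book id=\"1\"><title>Java Basics</title></book>"
      + "<book id=\"2\"><title>XML in Practice</title></book>"
      + "</books>";

    // Collects the text of every <title> element as the parser reports it.
    static class TitleHandler extends DefaultHandler {
        final List<String> titles = new ArrayList<>();
        private StringBuilder text; // non-null only while inside a <title>

        @Override
        public void startElement(String uri, String localName,
                                 String qName, Attributes atts) {
            if ("title".equals(qName)) {
                text = new StringBuilder();
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // The parser may deliver one text node in several chunks.
            if (text != null) {
                text.append(ch, start, length);
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if ("title".equals(qName)) {
                titles.add(text.toString());
                text = null;
            }
        }
    }

    // Streams through the document once and returns the collected titles.
    static List<String> titles(String xml) {
        TitleHandler handler = new TitleHandler();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            // Wrapped so that callers of this sketch need no throws clause.
            throw new IllegalArgumentException("could not parse XML", e);
        }
        return handler.titles;
    }

    public static void main(String[] args) {
        titles(XML).forEach(System.out::println);
    }
}
```

Note that the handler has to track its own state (the text field) to know where it is in the document; this bookkeeping is the extra coding effort that SAX demands compared with DOM.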
Advantages
1) Analysis can begin immediately, without waiting for all the data to be processed.
2) Data is examined only as it is read and does not need to be kept in memory.
3) Parsing can stop as soon as a condition is met, without parsing the entire document.
4) It is efficient and fast, and can handle documents larger than system memory.
Disadvantages
1) The application itself is responsible for the tag-processing logic (such as tracking parent-child relationships), so the more complex the document, the more complex the program.
2) Navigation is one-way, so it is difficult to access several different pieces of data in the same XML document at the same time.
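Advantage 3 above (stopping as soon as a condition is met) is commonly implemented by throwing an exception from the handler. SAX has no dedicated "stop" call, so the StopParsing class below is a hypothetical helper invented for this sketch, as are the sample document and the names SaxEarlyStopSketch and FindById. The sketch looks up one book by id and aborts the parse once its title has been read.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class SaxEarlyStopSketch {

    // Hypothetical sample data; names are assumptions for illustration.
    static final String XML =
        "<books>"
      + "<book id=\"1\"><title>Java Basics</title></book>"
      + "<book id=\"2\"><title>XML in Practice</title></book>"
      + "<book id=\"3\"><title>Never Reached</title></book>"
      + "</books>";

    // Hypothetical marker exception: signals "done", not a real parse error.
    static class StopParsing extends SAXException {
        StopParsing() {
            super("stopped on purpose");
        }
    }

    // Finds the title of the book with the wanted id, then aborts the parse.
    static class FindById extends DefaultHandler {
        private final String wantedId;
        private boolean inWantedBook;
        private StringBuilder text; // non-null only inside the wanted <title>
        int booksSeen;              // how many <book> start tags were reported
        String title;               // stays null if the id is never found

        FindById(String wantedId) {
            this.wantedId = wantedId;
        }

        @Override
        public void startElement(String uri, String localName,
                                 String qName, Attributes atts) {
            if ("book".equals(qName)) {
                booksSeen++;
                inWantedBook = wantedId.equals(atts.getValue("id"));
            } else if ("title".equals(qName) && inWantedBook) {
                text = new StringBuilder();
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            if (text != null) {
                text.append(ch, start, length);
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName)
                throws SAXException {
            if ("title".equals(qName) && text != null) {
                title = text.toString();
                throw new StopParsing(); // condition met: skip the rest
            }
        }
    }

    static FindById find(String xml, String id) {
        FindById handler = new FindById(id);
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (StopParsing expected) {
            // Normal outcome: the handler found what it needed.
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
        return handler;
    }

    public static void main(String[] args) {
        FindById result = find(XML, "2");
        System.out.println(result.title + " (books seen: " + result.booksSeen + ")");
    }
}
```

The booksSeen counter is there only to make the early exit visible: when the wanted book is the second one, the third book's start tag is never reported to the handler.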

3. JDOM (Java-based Document Object Model)

The goal of JDOM is to be a Java-specific document model that simplifies interaction with XML and is faster than DOM implementations. As the first Java-specific model, JDOM has been heavily publicized and promoted. It is also being considered, through Java Specification Request JSR-102, for eventual use as a "Java standard extension".

There are two main differences between JDOM and DOM. First, JDOM uses only concrete classes rather than interfaces. This simplifies the API in some ways, but it also limits flexibility. Second, the API makes extensive use of the Java Collections classes, which simplifies its use for Java developers already familiar with them.
The JDOM documentation states its purpose as solving 80% (or more) of Java/XML problems with 20% (or less) of the effort. JDOM is certainly useful for most Java/XML applications, and most developers find the API much easier to understand than DOM. JDOM also includes fairly extensive checks on program behavior to prevent users from doing anything that makes no sense in XML. However, you still need a thorough understanding of XML to do anything beyond the basics.

JDOM itself does not include a parser. It usually uses a SAX2 parser to parse and validate the input XML document (although it can also take a previously built DOM representation as input). It includes converters that output a JDOM representation as a SAX2 event stream, a DOM model, or an XML text document. JDOM is open source, released under a variant of the Apache license.
Advantages
1) It uses concrete classes rather than interfaces, which simplifies the DOM API.
2) It makes extensive use of the Java Collections classes, which is convenient for Java developers.

Disadvantages
1) Limited flexibility.
2) Poor performance.

4. dom4j (Document Object Model for Java)

Although dom4j is a completely independent development effort, it began as a kind of intelligent fork of JDOM. It incorporates many features beyond basic XML document representation, including integrated XPath support, XML Schema support, and event-based processing for large or streamed documents. It also offers the option of building a document representation that can be accessed in parallel through the dom4j API and the standard DOM interfaces.

To support all of these features, dom4j uses interfaces and abstract base classes. It makes extensive use of the Collections classes in its API, but in many cases it also provides alternatives that allow better performance or a more direct coding style. The direct benefit is that, although dom4j pays the price of a more complex API, it offers much greater flexibility than JDOM.
While adding flexibility, XPath integration, and large-document processing, dom4j shares JDOM's goals: ease of use and intuitive operation for Java developers. It also aims to be a more complete solution than JDOM, one that can handle essentially all Java/XML problems. In pursuing that goal, it puts less emphasis than JDOM on preventing incorrect application behavior.
dom4j is a very good open source Java XML API, with excellent performance, powerful features, and great ease of use. More and more Java software uses dom4j to read and write XML; notably, Sun's JAXM also uses dom4j.
Advantages
1) It makes extensive use of the Java Collections classes, which is convenient for Java developers, while providing alternatives that improve performance.
2) Excellent performance, flexibility, powerful features, and ease of use.
Disadvantages
1) It uses interfaces heavily, so the API is more complex.

Third, comparison of the four parsing methods
1, dom4j has the best performance. Sun's JAXM uses dom4j, and many open source projects use it heavily; Hibernate, for example, uses dom4j to read its XML configuration files. If portability is not a concern, use dom4j.

2. JDOM and DOM performed poorly in performance tests, running out of memory when tested with 10 MB documents. For small documents, DOM and JDOM are still worth considering. Although the JDOM developers have said they intend to focus on performance before the formal release, from a performance standpoint it has little to recommend it. DOM, on the other hand, remains a very good choice. DOM implementations are widely used across many programming languages. DOM is also the basis for many other XML-related standards, and because it is an official recommendation (unlike the non-standard Java models), it may be required in certain kinds of projects, such as those that use the DOM in JavaScript.

3, SAX performs well, thanks to its event-driven parsing model. A SAX parser examines the incoming XML stream but does not load it into memory (although parts of the document are, of course, temporarily buffered in memory while the stream is read).

4, if the XML document is large and portability is not a concern, use dom4j; if the XML document is small, JDOM is an option; if the data must be processed promptly without being kept, consider SAX.

Fourth, examples of the four parsing methods
1. Parsing XML with DOM
2. Parsing XML with SAX
3. Parsing XML with JDOM
4. Parsing XML with dom4j
