XML Parsing Method

Source: Internet
Author: User
What are SAX, DOM, JAXP, JDOM, and dom4j?

SAX and DOM are two approaches to parsing XML documents. Both are defined only as interfaces, with no concrete implementation, so on their own they are not parsers and cannot process an XML document. The SAX package is org.xml.sax. The DOM package is org.w3c.dom. The package names are worth remembering because they help you understand how these APIs relate to one another.

JAXP is an API that wraps the SAX and DOM interfaces and gives developers a simpler entry point on top of them. The JAXP package is javax.xml.parsers. If you look at the JAXP source files, you will see that they import the SAX and DOM packages. JAXP is not a concrete implementation either. It is just a set of APIs, and JAXP alone will not work without a parser behind it. (In fact, JAXP only wraps SAX and DOM and provides DocumentBuilderFactory/DocumentBuilder and SAXParserFactory/SAXParser. This is the factory pattern from design patterns: its advantage is that the concrete object, the parser, is created by a subclass.)
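The factory arrangement described above can be observed directly. The following is a minimal sketch, assuming a standard JDK (which ships with a bundled parser implementation); the class name JaxpFactoryDemo and its helper methods are made up for illustration. It asks the two JAXP factories for a parser and prints the concrete class that was actually created. The printed names depend on which parser implementation is available at run time.

```java
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

// Hypothetical demo class: shows that the abstract JAXP types are backed
// by concrete classes chosen by the factory at run time.
class JaxpFactoryDemo {

    /** Returns the name of the concrete class behind DocumentBuilder. */
    static String domBuilderImpl() throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();
        return builder.getClass().getName();
    }

    /** Returns the name of the concrete class behind SAXParser. */
    static String saxParserImpl() throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        SAXParser parser = factory.newSAXParser();
        return parser.getClass().getName();
    }

    public static void main(String[] args) throws Exception {
        // The output varies with the parser implementation in use.
        System.out.println("DocumentBuilder -> " + domBuilderImpl());
        System.out.println("SAXParser       -> " + saxParserImpl());
    }
}
```

Application code only refers to the javax.xml.parsers types, so the underlying parser can be swapped without changing the calling code.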

JDOM and dom4j: the W3C standard DOM API is awkward to use from Java, so a group of developers set out to build a Java-specific XML API that would be easier to use. This was the origin of JDOM. Partway through development, some of the developers split off with ideas of their own and went on to develop dom4j, which is how the two APIs we have today came about. In terms of performance, dom4j came out ahead of JDOM in every respect. JDOM and dom4j can be thought of as equivalent to SAX/DOM + JAXP: you can still choose the concrete parser underneath.

Technical Features of SAX, DOM, JAXP, JDOM, and dom4j

DOM

DOM (Document Object Model) is an official W3C standard API for accessing and modifying the content and structure of a document in a language-independent way. Its interfaces are defined using the IDL of the Object Management Group (OMG), so it can be used from any programming language. A DOM parser stores the entire XML document in memory as a tree, and the application can access and manipulate any part of that tree at any time. This random access gives great flexibility during development and lets you control the content of the whole XML document at will. However, DOM has high memory requirements and is not very efficient.

Advantages:

A rich set of APIs for easy navigation.

The entire DOM tree is loaded into the memory, allowing random access.

Disadvantages:

The entire XML document must be parsed once.

The entire DOM tree is loaded into the memory, with high memory requirements.

Generic DOM nodes are not ideal for binding to typed objects, and an object must be created for every node.
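The random access and in-memory modification described above can be sketched as follows. This is a small illustration rather than a reference implementation; the class DomDemo, its helper methods, and the sample library/book/title document are invented for the example. It uses the JAXP DocumentBuilder to obtain an org.w3c.dom.Document.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class for DOM-style access.
class DomDemo {
    // Sample document, made up for this example.
    static final String XML =
        "<library><book id=\"1\"><title>A</title></book>"
      + "<book id=\"2\"><title>B</title></book></library>";

    /** Parses the whole document into an in-memory tree. */
    static Document parse(String xml) throws Exception {
        DocumentBuilder builder =
            DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    /** Finds the title element of the book at the given position. */
    static Element titleElement(Document doc, int index) {
        NodeList books = doc.getElementsByTagName("book");
        Element book = (Element) books.item(index);
        return (Element) book.getElementsByTagName("title").item(0);
    }

    static String titleOf(Document doc, int index) {
        return titleElement(doc, index).getTextContent();
    }

    static void setTitle(Document doc, int index, String newTitle) {
        titleElement(doc, index).setTextContent(newTitle);
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(XML);
        // Random access: read the second book before the first.
        System.out.println(titleOf(doc, 1)); // prints "B"
        System.out.println(titleOf(doc, 0)); // prints "A"
        // Modify the tree in place, then read it back.
        setTitle(doc, 0, "A (revised)");
        System.out.println(titleOf(doc, 0)); // prints "A (revised)"
    }
}
```

Note that the whole document is parsed before the first title can be read, which is the memory cost listed under the disadvantages.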

SAX

SAX (Simple API for XML) is an event-driven "push" model for processing XML. Although it is not a W3C standard, it is a widely accepted API. Unlike DOM, a SAX parser does not build a complete document tree. Instead, it fires a series of events as it reads the document and pushes them to an event handler, which in turn provides access to the document content. Because events are fired in document order, a SAX parser gives sequential access to an XML document; once a part has been parsed, you cannot go back and process it again. In terms of implementation, the SAX parser simply checks the byte stream of the XML document in order, determines which part of the XML syntax the current bytes belong to, verifies that they conform to the XML syntax, and fires the corresponding event. The event handlers themselves must be implemented by the application. In short, SAX scans the document sequentially and notifies the event handler when it reaches the start or end of the document, the start or end of an element, and so on. The handler performs the corresponding action, and scanning continues in the same way until the document ends.

Advantages:

Compared with DOM, a SAX parser performs better and provides efficient low-level access to the content of an XML document.

The biggest advantage of the SAX model is its low memory consumption: the entire document does not need to be loaded into memory at once, which allows a SAX parser to handle documents larger than system memory.

You do not need to create an object for every node as in DOM.

The SAX "push" model can be used in a broadcast setting: multiple ContentHandler instances can be registered and receive events in parallel, instead of events being processed one at a time in a single pipeline.

Disadvantages:

No built-in document navigation support.

XML documents cannot be randomly accessed.

XML modification in situ is not supported.

Namespace scopes are not supported.
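The event-driven flow described in this section can be sketched with a handler that simply records each callback in the order it arrives. This is a minimal illustration; the class SaxDemo, the nested EventLogger handler, and the sample document in main are invented for the example, while DefaultHandler and SAXParser are the standard SAX and JAXP types.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class for SAX-style parsing.
class SaxDemo {

    /** Application-supplied handler: records every event it is sent. */
    static class EventLogger extends DefaultHandler {
        final List<String> events = new ArrayList<>();

        @Override
        public void startDocument() {
            events.add("startDocument");
        }

        @Override
        public void startElement(String uri, String localName,
                                 String qName, Attributes atts) {
            events.add("start:" + qName);
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // A parser may deliver one text node in several chunks.
            String text = new String(ch, start, length).trim();
            if (!text.isEmpty()) {
                events.add("text:" + text);
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            events.add("end:" + qName);
        }

        @Override
        public void endDocument() {
            events.add("endDocument");
        }
    }

    /** Scans the document once and returns the events in arrival order. */
    static List<String> parse(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        EventLogger handler = new EventLogger();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return handler.events;
    }

    public static void main(String[] args) throws Exception {
        for (String event : parse("<a><b>hi</b><c/></a>")) {
            System.out.println(event);
        }
    }
}
```

No tree is built here: once an event has been delivered, the only record of it is whatever the handler chose to keep, which is why random access and in-place modification are not available.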

JAXP

JAXP (Java API for XML Processing) is an API developed by Sun for working with XML, built on top of the two APIs above. It supports DOM, SAX, XSLT, and other standards. To make JAXP more flexible, its developers designed a pluggability layer. With the pluggability layer, JAXP can work with various XML parsers (for example, Apache Xerces) as well as with XSLT processors that implement the XSLT standard (for example, Apache Xalan). Because of too many problems, this feature was reportedly removed in JDK 1.7.
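As an illustration of the XSLT side of JAXP, here is a minimal sketch that runs a stylesheet through javax.xml.transform. The class XsltDemo, the inline stylesheet, and the sample library document are all invented for the example; which XSLT processor actually does the work is decided by TransformerFactory at run time.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical demo class for XSLT through JAXP.
class XsltDemo {
    // Sample stylesheet: emits each book title followed by a semicolon.
    static final String XSL =
        "<xsl:stylesheet version=\"1.0\""
      + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:output method=\"text\"/>"
      + "<xsl:template match=\"/\">"
      + "<xsl:for-each select=\"library/book\">"
      + "<xsl:value-of select=\"title\"/><xsl:text>;</xsl:text>"
      + "</xsl:for-each>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Sample input document.
    static final String XML =
        "<library><book><title>A</title></book>"
      + "<book><title>B</title></book></library>";

    /** Applies the stylesheet to the document and returns the result. */
    static String transform(String xml, String xsl) throws Exception {
        // The factory picks whichever XSLT processor is available.
        Transformer transformer = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(xsl)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)),
                              new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(transform(XML, XSL)); // prints "A;B;"
    }
}
```

The calling code names only JAXP types, so the same program runs unchanged against any XSLT 1.0 processor that plugs into TransformerFactory.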

JDOM

JDOM is a pure Java API for processing XML that uses concrete classes instead of interfaces. It is tree-based and also follows the Java conventions established by SAX.

JDOM aims to be a Java-specific document model that simplifies interaction with XML and is faster than DOM. As the first Java-specific model, JDOM has been heavily publicized and promoted. JDOM differs from DOM in two main ways. First, JDOM uses only concrete classes rather than interfaces. This simplifies the API in some respects, but it also limits flexibility. Second, the API makes extensive use of the Collections classes, which makes it easier for Java developers who already know them. JDOM is certainly useful for most Java/XML applications, and most developers find the API much easier to understand than DOM. JDOM also includes fairly extensive checks on program behavior to prevent users from doing anything that makes no sense in XML. However, you still need a thorough understanding of XML to do anything beyond the basics (or even, in some cases, to understand the errors). That is probably a more worthwhile effort than learning the DOM or JDOM interfaces.

JDOM itself does not contain a parser. It usually uses a SAX2 parser to parse and validate the input XML document (although it can also take a previously constructed DOM representation as input). It includes converters that output the JDOM representation as a SAX2 event stream, a DOM model, or an XML text document. JDOM is open source, released under a variant of the Apache license.

Advantages:

It is a tree-based Java API that processes XML and loads the tree into the memory.

It has no backward-compatibility constraints, so it is simpler than DOM.

Fast.

Follows the Java conventions of SAX.

Disadvantages:

Documents larger than memory cannot be processed.

JDOM represents the logical model of the XML document and cannot guarantee a byte-for-byte faithful round trip.

It provides no real model of the DTD or schema for instance documents.

It has no equivalent of the DOM traversal package.

dom4j

dom4j began as an intelligent fork of JDOM. It incorporates many features beyond basic XML document representation, including integrated XPath support, XML Schema support, and event-based processing for large or streamed documents. It also gives you the option of building document representations that can be accessed in parallel through the dom4j API and the standard DOM interfaces. To support all of these features, dom4j uses interfaces and abstract base classes. dom4j makes extensive use of the Collections classes in its API, but in many cases it also provides alternatives that allow better performance or a more direct coding style. The direct benefit is that, although dom4j pays the price of a more complex API, it offers much greater flexibility than JDOM. While adding flexibility, XPath integration, and support for large documents, dom4j keeps the same goals as JDOM: ease of use and intuitive operation for Java developers. It also aims to be a more complete solution than JDOM, one that handles essentially all Java/XML problems. In pursuing that goal, it places less emphasis than JDOM on preventing incorrect application behavior. dom4j is a very good Java XML API, with excellent performance, powerful functionality, and great ease of use. It is also open source software. More and more Java software now uses dom4j to read and write XML. It is particularly worth mentioning that Sun's JAXM also uses dom4j.
