Java XML parsing techniques: StAX, SAX, DOM, dom4j, JDOM

Source: Internet
Author: User
Tags: java, web
Java 6.0 adds many XML-related features: StAX, the Java API for XML Web Services (JAX-WS) 2.0, the Java Architecture for XML Binding (JAXB) 2.0, the XML Digital Signature API, and even support for the SQL:2003 'XML' data type. In this article we introduce StAX, because it is the one we are likely to use most often in development.

StAX stands for Streaming API for XML; it is a streaming, pull-parsing API for XML. We are all familiar with XML parsing technologies. Before Java 6.0, there were four of them:
DOM: Document Object Model
SAX: Simple API for XML
JDOM: Java-based Document Object Model
dom4j: Document Object Model for Java
Their parsing principles, performance, and pros and cons are briefly covered at the end of this article. Here we focus on StAX, the new way of parsing.

First, let's look at two concepts: push parsing and pull parsing.

There are generally two models for accessing and manipulating XML files in a program: the DOM (Document Object Model) and the streaming model. Their advantages and disadvantages are as follows:

DOM advantages: it allows editing and updating of XML documents, random access to the data in a document, and queries with XPath (XML Path Language, a query language for locating nodes in an XML document).
DOM disadvantages: the entire document must be loaded into memory at once, which can cause performance problems for large documents.
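To make the DOM trade-off concrete, here is a minimal sketch that loads a whole document into memory and then jumps straight to a node with XPath. The class name DomXPathSketch, the evaluate helper, and the inline sample XML are assumptions made for illustration; only the standard JAXP classes are real API.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;

class DomXPathSketch {
    // Hypothetical sample document, kept inline so the sketch is self-contained.
    static final String XML =
        "<company><depart title=\"Develop group\">"
        + "<user name=\"Tom\">manager</user><user name=\"Lily\"/>"
        + "</depart></company>";

    // Builds the full in-memory tree, then evaluates an XPath expression
    // against it and returns the string value of the result.
    static String evaluate(String xml, String expression) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(expression, doc);
        } catch (Exception e) {
            throw new IllegalStateException("DOM/XPath sketch failed", e);
        }
    }

    public static void main(String[] args) {
        // Random access: ask for the title of the department that contains Lily.
        System.out.println(
            evaluate(XML, "/company/depart[user/@name='Lily']/@title"));
    }
}
```

The query is convenient, but the whole tree had to be built first, which is exactly the memory cost described above.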

Streaming model advantages: access to the XML file is stream-based, so only the current node is held in memory at any time, which avoids the DOM's performance problem.
Streaming model disadvantages: it is read-only and forward-only; you cannot navigate backward in the document.

What the DOM is will be described at the end of the article. Here we only say a few words about what a stream is: an ordered sequence of bytes, which can be thought of as a special object that carries bytes from a source to a destination.

Let's go back to the subject. The streaming model visits one node of an XML document at a time, which makes it suitable for processing large documents with little memory. It has two variants: the push model and the pull model.

Push model: this is what we usually call SAX, an event-driven model. The parser raises an event every time it finds a node, and we have to write handlers for those events. This approach is cumbersome and inflexible.
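A minimal sketch of the push model, assuming we only want the name attribute of every user element. The class name SaxPushSketch, the userNames helper, and the sample XML are hypothetical; note that the parser drives the loop and calls our handler, not the other way around.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

class SaxPushSketch {
    // Hypothetical sample document for the sketch.
    static final String XML =
        "<company><depart title=\"Develop group\">"
        + "<user name=\"Tom\">manager</user><user name=\"Lily\"/>"
        + "</depart></company>";

    // Collects the "name" attribute of every <user> element.
    static List<String> userNames(String xml) {
        final List<String> names = new ArrayList<>();
        // The handler is called back by the parser for every start tag,
        // whether or not we are interested in that element.
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attributes) {
                if ("user".equals(qName)) {
                    names.add(attributes.getValue("name"));
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)),
                handler);
        } catch (Exception e) {
            throw new IllegalStateException("SAX sketch failed", e);
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(userNames(XML));
    }
}
```

Any state that spans several events (for example, which department we are currently inside) has to be tracked in handler fields, which is where the push model starts to feel cumbersome.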

Pull model: while traversing the document, the application pulls the parts it is interested in from the reader. No events need to be raised, and we can process nodes selectively. This greatly improves flexibility as well as overall efficiency.
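The same idea as a pull-style sketch: our code owns the loop, asks the reader for the next token, and stops as soon as it has what it wants. The class name PullSketch, the firstDepartTitle helper, and the sample XML are assumptions for illustration.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class PullSketch {
    // Hypothetical sample document with two departments.
    static final String XML =
        "<company><depart title=\"Develop group\"><user name=\"Tom\"/></depart>"
        + "<depart title=\"Test group\"><user name=\"Frank\"/></depart></company>";

    // Returns the title of the first <depart> element, or null if there is none.
    static String firstDepartTitle(String xml) {
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
            try {
                // We decide when to advance and when to stop.
                while (reader.hasNext()) {
                    if (reader.next() == XMLStreamConstants.START_ELEMENT
                            && "depart".equals(reader.getLocalName())) {
                        // Passing null means the attribute namespace is not checked.
                        return reader.getAttributeValue(null, "title");
                    }
                }
                return null;
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException("StAX sketch failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(firstDepartTitle(XML));
    }
}
```

Because the method returns on the first match, the rest of the document is never read; this is the selective processing that the pull model makes easy.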

With that, we can define "push parsing" and "pull parsing":

Parsing based on the push variant of the streaming model is called push parsing, and parsing based on the pull variant of the streaming model is called pull parsing.

StAX is a pull-parsing XML technique. It also supports generating XML files, but in this article we cover only parsing.

From the outset, JAXP (Java API for XML Processing) provided two ways to handle XML: DOM and SAX. StAX is a new stream-oriented approach that was released in March 2004 and is part of JAXP 1.4 (included in Java 6.0). The StAX implementation is taken from JWSDP (Java Web Services Developer Pack) 1.6 and incorporates SJSXP (Sun Java Streaming XML Parser); the StAX API itself is located in the javax.xml.stream.* packages.

JWSDP is a development package that is used to develop Web Services, Web applications, and Java applications (primarily XML processing). The Java APIs it contains are:
JAXP: Java API for XML Processing
JAXB: Java Architecture for XML Binding
JAX-RPC: Java API for XML-based Remote Procedure Calls
JAX-WS: Java API for XML Web Services
SAAJ: SOAP with Attachments API for Java
JAXR: Java API for XML Registries
Web Services Registry

Earlier versions of JWSDP also included:
Java Servlet
JSP: JavaServer Pages
JSF: JavaServer Faces

Now, JWSDP has been replaced by GlassFish.

StAX includes two sets of XML-processing APIs that provide different levels of abstraction: the cursor-based API and the iterator-based API.

Let's look at the cursor-based API first. It treats XML as a stream of tokens (or events): the application can check the state of the parser, get information about the last token parsed, then move on to the next token, and so on.
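As a rough sketch of that token-by-token behaviour, the following records what the reader reports after each call to next(). The class name CursorTraceSketch, the trace helper, and the one-element sample input are assumptions for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class CursorTraceSketch {
    // Hypothetical one-element sample input.
    static final String XML = "<user name=\"Tom\">manager</user>";

    // Advances the cursor token by token and records what it is positioned on.
    static List<String> trace(String xml) {
        List<String> tokens = new ArrayList<>();
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
            try {
                while (reader.hasNext()) {
                    int type = reader.next(); // move to the next token
                    switch (type) {
                        case XMLStreamConstants.START_ELEMENT:
                            tokens.add("START_ELEMENT " + reader.getLocalName());
                            break;
                        case XMLStreamConstants.CHARACTERS:
                            tokens.add("CHARACTERS " + reader.getText());
                            break;
                        case XMLStreamConstants.END_ELEMENT:
                            tokens.add("END_ELEMENT " + reader.getLocalName());
                            break;
                        default:
                            // Other token types (comments, END_DOCUMENT, ...).
                            tokens.add("OTHER " + type);
                    }
                }
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException("cursor sketch failed", e);
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (String token : trace(XML)) {
            System.out.println(token);
        }
    }
}
```

Which accessor is legal depends on the current token: getLocalName() is valid on element tokens and getText() on character data, which is why the sketch switches on the token type first.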

Before we start the API exploration, we first create an XML document called Users.xml for testing, which reads as follows:

XML code
<?xml version="1.0" encoding="UTF-8"?>
<company>
    <depart title="Develop group">
        <user name="Tom" age="" gender="male">manager</user>
        <user name="Lily" age="" gender="female" />
    </depart>
    <depart title="Test group">
        <user name="Frank" age="" gender="male">Team Leader</user>
        <user name="Bob" age="" gender="male" />
        <user name="Kate" age="25" gender="female" />
    </depart>
</company>

The interface for the cursor-based API is javax.xml.stream.XMLStreamReader. Being an interface, it cannot be instantiated directly; to get an instance, we use the javax.xml.stream.XMLInputFactory class. In keeping with JAXP's traditional style, the Abstract Factory pattern is used here. If you are familiar with this pattern, you can already picture the rough shape of the code we are about to write.

First, get an instance of XMLInputFactory. The method is:

Java code
XMLInputFactory factory = XMLInputFactory.newInstance();

Or:

Java code
XMLInputFactory factory = XMLInputFactory.newFactory();

The two methods are equivalent: both create a new instance, and even the type of the instance is exactly the same, because internally both are implemented as:

Java code
{
    return (XMLInputFactory) FactoryFinder.find(
            "javax.xml.stream.XMLInputFactory",
            "com.sun.xml.internal.stream.XMLInputFactoryImpl");
}
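The equivalence can be checked with a small sketch. The class name FactorySketch and the sameImplementation helper are hypothetical; the implementation class that gets printed depends on the JDK and on any StAX provider on the classpath, so no particular name is assumed here.

```java
import javax.xml.stream.XMLInputFactory;

class FactorySketch {
    // True when both factory methods hand back distinct objects
    // of the same implementation class.
    static boolean sameImplementation() {
        XMLInputFactory viaNewInstance = XMLInputFactory.newInstance();
        XMLInputFactory viaNewFactory = XMLInputFactory.newFactory();
        return viaNewInstance != viaNewFactory
            && viaNewInstance.getClass() == viaNewFactory.getClass();
    }

    public static void main(String[] args) {
        // Prints whichever implementation class the lookup resolved to.
        System.out.println(XMLInputFactory.newInstance().getClass().getName());
        System.out.println(sameImplementation());
    }
}
```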

Next we can create the XMLStreamReader instance. We have the following set of methods to choose from:
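A minimal sketch of this step, assuming the methods in question are the createXMLStreamReader overloads of XMLInputFactory. That class declares overloads taking a Reader, an InputStream, an InputStream plus an encoding name, a system ID plus a Reader or InputStream, and a javax.xml.transform.Source. The class name CreateReaderSketch, its helpers, and the sample XML are hypothetical, and the system ID "Users.xml" is used only as a label.

```java
import java.io.ByteArrayInputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class CreateReaderSketch {
    // Hypothetical sample document.
    static final String XML =
        "<company><depart title=\"Develop group\"/></company>";

    // Reads forward until the first start tag and returns its local name.
    static String rootName(XMLStreamReader reader) throws XMLStreamException {
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    return reader.getLocalName();
                }
            }
            return null;
        } finally {
            reader.close();
        }
    }

    // Creates a reader through four of the overloads and reads the root
    // element name through each one.
    static List<String> rootNamesViaOverloads(String xml) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);
        List<String> names = new ArrayList<>();
        try {
            // From a character stream.
            names.add(rootName(factory.createXMLStreamReader(
                new StringReader(xml))));
            // From a byte stream; the parser detects the encoding.
            names.add(rootName(factory.createXMLStreamReader(
                new ByteArrayInputStream(bytes))));
            // From a byte stream with an explicit encoding.
            names.add(rootName(factory.createXMLStreamReader(
                new ByteArrayInputStream(bytes), "UTF-8")));
            // With a system ID that identifies the source.
            names.add(rootName(factory.createXMLStreamReader(
                "Users.xml", new StringReader(xml))));
        } catch (XMLStreamException e) {
            throw new IllegalStateException("reader creation sketch failed", e);
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(rootNamesViaOverloads(XML));
    }
}
```

For a file on disk, the InputStream overloads are usually the natural choice, since they let the parser honor the encoding declared in the XML prolog.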
