Parsing XML Data with StAX in JDK 6.0

J2EE/XML developers typically use the Document Object Model (DOM) API or the Simple API for XML (SAX) to parse XML documents. However, both APIs have drawbacks. The DOM API consumes a lot of memory, because a complete in-memory structure of the XML document must be built before the document can be navigated. The SAX API implements a push parsing model, in which parsing events are generated by the parser rather than requested by the application. By contrast, StAX is based on a pull parsing model. In this article, you will first create an XML document and then learn to parse it in different ways, all of them based on the pull-style event generation of StAX.

I. Push Parsing Versus Pull Parsing

Compared with push parsing, pull parsing has the following advantages:

1. In pull parsing, events are requested by the parsing application, so the parsing rules are supplied by the client rather than by the parser.

2. Pull parsing code is simpler and needs fewer libraries than push parsing code.

3. A pull parsing client can read multiple XML documents at the same time.

4. Pull parsing lets you filter the XML document and skip parsing events.
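The first and fourth points can be illustrated with a minimal sketch. It assumes an in-memory document and a hypothetical helper named firstText (neither comes from the article's listings); the application drives the loop and simply stops requesting events once it has what it needs.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

final class PullLoopSketch {

    /** Returns the text of the first element named localName, or null if there is none. */
    static String firstText(String xml, String localName) {
        try {
            XMLStreamReader reader =
                    XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            try {
                while (reader.hasNext()) {
                    int event = reader.next(); // the client pulls each event
                    if (event == XMLStreamConstants.START_ELEMENT
                            && reader.getLocalName().equals(localName)) {
                        // Stop here: no further events are requested from the parser.
                        return reader.getElementText();
                    }
                }
                return null;
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException("XML could not be parsed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<catalog><title>First</title><title>Second</title></catalog>";
        System.out.println(firstText(xml, "title"));
    }
}
```

With a push parser the same early exit usually requires throwing an exception out of a callback, which is one reason the pull version tends to be shorter.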

II. Understanding StAX

The Streaming API for XML (StAX), a streaming pull-parsing API for XML, was introduced by the JSR 173 specification in March 2004. StAX is a new feature of JDK 6.0, which is available as a beta release for trial.

A push-model parser keeps generating events until the XML document is completely parsed. Pull parsing, by contrast, is driven by the application, so parsing events are generated only when the application requests them. This means that, with StAX, you can postpone parsing, skip elements while parsing, and parse multiple documents. With the DOM API, you have to parse the entire XML document into a DOM structure, which reduces parsing efficiency. With StAX, parsing events are generated as the XML document is parsed. A detailed comparison between the StAX parser and other parsers is not covered here.
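As a rough illustration of parsing multiple documents, the sketch below alternates between two in-memory documents, pulling one element name from each in turn. The class and method names are hypothetical and are not taken from the article's listings.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

final class InterleavedReadSketch {

    /** Advances to the next START_ELEMENT and returns its local name, or null at the end. */
    private static String nextElementName(XMLStreamReader reader) throws XMLStreamException {
        while (reader.hasNext()) {
            if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                return reader.getLocalName();
            }
        }
        return null;
    }

    /** Alternates between two documents; each reader waits until it is asked again. */
    static List<String> interleave(String xmlA, String xmlB) {
        try {
            XMLInputFactory factory = XMLInputFactory.newInstance();
            XMLStreamReader a = factory.createXMLStreamReader(new StringReader(xmlA));
            XMLStreamReader b = factory.createXMLStreamReader(new StringReader(xmlB));
            List<String> names = new ArrayList<>();
            String nameA = nextElementName(a);
            String nameB = nextElementName(b);
            while (nameA != null || nameB != null) {
                if (nameA != null) {
                    names.add("A:" + nameA);
                    nameA = nextElementName(a);
                }
                if (nameB != null) {
                    names.add("B:" + nameB);
                    nameB = nextElementName(b);
                }
            }
            a.close();
            b.close();
            return names;
        } catch (XMLStreamException e) {
            throw new IllegalStateException("XML could not be parsed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(interleave("<x><y/></x>", "<p/>"));
    }
}
```

Because neither parser produces an event until next() is called, a single thread can hold both documents partly read.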

An implementation of the StAX API, the Sun Java Streaming XML Parser (SJSXP), ships with the Java Web Services Developer Pack (JWSDP) 1.6; the API itself is in the javax.xml.stream package. The XMLStreamReader interface is used to parse an XML document, and the XMLStreamWriter interface is used to generate an XML document. XMLEventReader parses XML events by iterating over event objects, in contrast to the cursor mechanism used by XMLStreamReader. This tutorial parses an XML document with the StAX implementation in JDK 6.0.

In fact, StAX is just one of the new XML features in JDK 6.0. JDK 6.0 also adds support for the Java Architecture for XML Binding (JAXB) 2.0, the XML Digital Signature API, the Java API for XML-Based Web Services (JAX-WS) 2.0, and even the SQL:2003 'XML' data type.

III. Initial Setup

If you are using JDK 6.0, the StAX API is on the classpath by default. If you are using JWSDP 1.6, add the JWSDP 1.6 StAX API to the classpath by adding <jwsdp-1.6>\sjsxp\lib\jsr173_api.jar and <jwsdp-1.6>\sjsxp\lib\sjsxp.jar to the CLASSPATH variable, where <jwsdp-1.6> is the directory in which JWSDP 1.6 is installed. jsr173_api.jar is the JSR 173 API JAR, and sjsxp.jar is the SJSXP implementation JAR.
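One way to confirm the setup is a small check like the following. It is only a sketch (the class name is made up for illustration): it asks the two StAX factories for an instance and reports which implementation classes were found.

```java
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;

final class StaxAvailabilityCheck {

    /** Returns the implementation class names of the input and output factories. */
    static String[] factoryClassNames() {
        return new String[] {
            XMLInputFactory.newInstance().getClass().getName(),
            XMLOutputFactory.newInstance().getClass().getName()
        };
    }

    public static void main(String[] args) {
        String[] names = factoryClassNames();
        System.out.println("XMLInputFactory implementation:  " + names[0]);
        System.out.println("XMLOutputFactory implementation: " + names[1]);
    }
}
```

If the API JAR is missing, the class will not load at all; if only the implementation is missing, newInstance() is expected to fail with a FactoryConfigurationError.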

IV. Writing with XMLStreamWriter

First, you create the XML document that will be parsed. The XML is generated with the StAX XMLStreamWriter. One limitation of XMLStreamWriter is that it does not necessarily generate a well-formed document, and the generated document is not necessarily valid. You need to make sure that the generated XML document is well-formed. Listing 1 shows an example XML document generated with XMLStreamWriter.
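Since the writer does not guarantee well-formedness, one simple safeguard is to read the generated text back through a StAX parser. The helper below is a sketch under that assumption (isWellFormed is a hypothetical name); it checks well-formedness only, not validity against a schema or DTD.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

final class WellFormedCheck {

    /** Returns true if the whole document can be parsed without a parsing error. */
    static boolean isWellFormed(String xml) {
        try {
            XMLStreamReader reader =
                    XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                reader.next(); // a malformed document makes next() throw
            }
            reader.close();
            return true;
        } catch (XMLStreamException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<catalog><journal/></catalog>"));
        System.out.println(isWellFormed("<catalog><journal></catalog>"));
    }
}
```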

Here, you use the XMLStreamWriter API to generate the catalog.xml of Listing 1. The code snippets in this section are excerpted from the XMLWriter.java application, which is shown in Listing 2. First, import the StAX classes (the java.io import is needed for the File and FileWriter classes used below):

import java.io.*;
import javax.xml.stream.*;
import javax.xml.stream.events.*;
import javax.xml.stream.XMLOutputFactory;
You obtain an XMLStreamWriter from an XMLOutputFactory, so first create a new XMLOutputFactory:

XMLOutputFactory outputFactory = XMLOutputFactory.newInstance();
Next, create a FileWriter to which the XML document is written; the output is an XML file:

FileWriter output = new FileWriter(new File("C:/stax/catalog.xml"));
Next, create an XMLStreamWriter:

XMLStreamWriter xmlStreamWriter = outputFactory.createXMLStreamWriter(output);
Now, use the writeStartDocument() method to write the start of the document, passing the encoding and version to be declared in the XML declaration. Note that the encoding specified here is not the encoding of the generated XML document; to set the actual encoding of the document, specify it when you create the XMLStreamWriter object from the XMLOutputFactory object. Write the declaration as follows:

xmlStreamWriter.writeStartDocument("UTF-8", "1.0");
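To make the distinction concrete, here is a minimal sketch that sets the actual output encoding when the writer is created, using the createXMLStreamWriter(OutputStream, String) overload of XMLOutputFactory. The in-memory ByteArrayOutputStream and the class name are assumptions for illustration; the article itself writes to a file.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

final class EncodingSketch {

    /** Writes a tiny document as UTF-8 bytes and returns it decoded as a string. */
    static String writeUtf8() {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            // The second argument determines how characters are encoded into bytes.
            XMLStreamWriter writer =
                    XMLOutputFactory.newInstance().createXMLStreamWriter(bytes, "UTF-8");
            // This call only records the encoding name in the XML declaration.
            writer.writeStartDocument("UTF-8", "1.0");
            writer.writeStartElement("catalog");
            writer.writeEndElement();
            writer.writeEndDocument();
            writer.flush();
            writer.close();
            return new String(bytes.toByteArray(), StandardCharsets.UTF_8);
        } catch (XMLStreamException e) {
            throw new IllegalStateException("XML could not be written", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(writeUtf8());
    }
}
```

Keeping the two encoding names identical avoids a declaration that disagrees with the bytes actually written; some implementations may reject a mismatch outright.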
Use the writeComment() method to output a comment:

xmlStreamWriter.writeComment("A OReilly Journal Catalog");
Use the writeProcessingInstruction() method to output a processing instruction:

xmlStreamWriter.writeProcessingInstruction("catalog", "journal='OReilly'");
Use the writeStartElement() method to output the start of the 'catalog' element (the element prefix and namespace URI can also be specified in this method):

xmlStreamWriter.writeStartElement("journal", "catalog", "http://OnJava.com/Journal");
Use the writeNamespace() method to add the 'journal' namespace declaration (the namespace prefix and namespace URI are specified in this method):

xmlStreamWriter.writeNamespace("journal", "http://OnJava.com/Journal");
Add the xsi namespace, again using the writeNamespace() method:

xmlStreamWriter.writeNamespace("xsi", "http://www.w3.org/2001/XMLSchema-instance");
Add the xsi:noNamespaceSchemaLocation attribute using the writeAttribute() method:

xmlStreamWriter.writeAttribute("xsi:noNamespaceSchemaLocation", "file://c:/Schemas/catalog.xsd");
Add the publisher attribute using the writeAttribute() method:

xmlStreamWriter.writeAttribute("publisher", "OReilly");
Output the start of the 'journal' element. When a new element is started, the closing '>' of the previous element's start tag is written:

xmlStreamWriter.writeStartElement("journal", "journal", "http://OnJava.com/Journal");
Use the writeAttribute() method to add the 'date' and 'title' attributes. Then, use the writeStartElement() method to add the 'article' and 'title' elements, and use the writeCharacters() method to output the text of the 'title' element:

xmlStreamWriter.writeCharacters("Data Binding with XMLBeans");
Any element that contains text or child elements must have an end tag. Use the writeEndElement() method to add the end tag of the 'title' element:

xmlStreamWriter.writeEndElement();
Add the end tags of the 'author' element and the 'journal' element. The writeEndElement() method does not take an element prefix or namespace URI. Add another 'journal' element in a similar way. Then, add the end tag of the 'catalog' element. Finally, flush the buffered data:

xmlStreamWriter.flush();
As the last step, close the XMLStreamWriter:

xmlStreamWriter.close();
This completes the generation of catalog.xml.

Listing 2 in the source code shows the full Java application, XMLWriter.java. This application can be run as a command-line application or in an IDE such as Eclipse.
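The steps of this section can be assembled into a compact sketch. This is not Listing 2: it writes to an in-memory StringWriter rather than C:/stax/catalog.xml, writes a single journal with one article title, and leaves out the xsi, 'date', 'title' and 'author' parts because their values are not given in the text. The class name is hypothetical, and prefixing 'article' and 'title' with 'journal' is an assumption.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

final class CatalogWriterSketch {

    private static final String NS = "http://OnJava.com/Journal";

    /** Generates a reduced catalog document and returns it as a string. */
    static String writeCatalog() {
        StringWriter output = new StringWriter();
        try {
            XMLStreamWriter w = XMLOutputFactory.newInstance().createXMLStreamWriter(output);
            w.writeStartDocument("UTF-8", "1.0");
            w.writeComment("A OReilly Journal Catalog");
            w.writeProcessingInstruction("catalog", "journal='OReilly'");
            w.writeStartElement("journal", "catalog", NS);
            w.writeNamespace("journal", NS);          // declares xmlns:journal on the root
            w.writeAttribute("publisher", "OReilly");
            w.writeStartElement("journal", "journal", NS);
            w.writeStartElement("journal", "article", NS);
            w.writeStartElement("journal", "title", NS);
            w.writeCharacters("Data Binding with XMLBeans");
            w.writeEndElement(); // title
            w.writeEndElement(); // article
            w.writeEndElement(); // journal
            w.writeEndElement(); // catalog
            w.writeEndDocument();
            w.flush();
            w.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("XML could not be written", e);
        }
        return output.toString();
    }

    public static void main(String[] args) {
        System.out.println(writeCatalog());
    }
}
```

Every writeStartElement() call is paired with exactly one writeEndElement() call, which is what keeps the output well-formed.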

V. Parsing with XMLStreamReader

Now parse the document of Listing 1 with the XMLStreamReader API to see how it works. XMLStreamReader uses a cursor to parse an XML document. Its interface provides a next() method, which advances to the next parsing event, and a getEventType() method, which returns the type of the current event. The following code snippets come from the XMLParser.java application, shown in Listing 3.

In the XMLParser.java application, first import the StAX classes (the java.io import is needed for the File and FileInputStream classes used below):

import java.io.*;
import javax.xml.stream.*;
import javax.xml.stream.events.*;
import javax.xml.stream.XMLInputFactory;
Then, create an XMLInputFactory, from which you will obtain an XMLStreamReader:

XMLInputFactory inputFactory = XMLInputFactory.newInstance();
Now, create an InputStream for the file that will be parsed, and create an XMLStreamReader from the XMLInputFactory object that you created earlier:

InputStream input = new FileInputStream(new File("C:/stax/catalog.xml"));
XMLStreamReader xmlStreamReader = inputFactory.createXMLStreamReader(input);
The hasNext() method returns true if more parsing events are available. Use the next() method to obtain the next parsing event:

int event = xmlStreamReader.next();
Compared with SAX parsing, an advantage of StAX parsing is that a parsing event can be skipped by calling the next() method, as shown in the following code. For example, if the event type is ENTITY_DECLARATION, the developer can decide whether to get the event information from the current event or to retrieve the next event:

if (event == XMLStreamConstants.ENTITY_DECLARATION) {
    // Skip this event by pulling the next one; 'event' is the int returned by next()
    event = xmlStreamReader.next();
}
Parsing can also be deferred by not calling the next() method. The next() method returns an int that identifies the parsing event, expressed as an XMLStreamConstants constant.
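The same idea extends from a single event to a whole subtree. The sketch below is an illustration rather than part of XMLParser.java: it skips an element together with all of its content by counting nesting depth. The helper names and the sample document are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

final class SkipElementSketch {

    /** Expects the cursor on a START_ELEMENT; advances past its matching END_ELEMENT. */
    private static void skipElement(XMLStreamReader reader) throws XMLStreamException {
        int depth = 1;
        while (depth > 0 && reader.hasNext()) {
            int event = reader.next();
            if (event == XMLStreamConstants.START_ELEMENT) {
                depth++;
            } else if (event == XMLStreamConstants.END_ELEMENT) {
                depth--;
            }
        }
    }

    /** Collects element names, ignoring the element named 'skipped' and everything inside it. */
    static List<String> namesOutside(String xml, String skipped) {
        try {
            XMLStreamReader reader =
                    XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            List<String> names = new ArrayList<>();
            while (reader.hasNext()) {
                if (reader.next() != XMLStreamConstants.START_ELEMENT) {
                    continue; // events of no interest are simply not examined
                }
                if (reader.getLocalName().equals(skipped)) {
                    skipElement(reader);
                } else {
                    names.add(reader.getLocalName());
                }
            }
            reader.close();
            return names;
        } catch (XMLStreamException e) {
            throw new IllegalStateException("XML could not be parsed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<catalog><journal><article/></journal><index/></catalog>";
        System.out.println(namesOutside(xml, "journal"));
    }
}
```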

The event types returned by XMLStreamReader are listed in Table 1.

Event Type	Description
START_DOCUMENT	Start of a document
START_ELEMENT	Start of an element
ATTRIBUTE	An element attribute
NAMESPACE	A namespace declaration
CHARACTERS	Characters, which may be text or white space
COMMENT	A comment
SPACE	Ignorable white space
PROCESSING_INSTRUCTION	A processing instruction
DTD	A DTD
ENTITY_REFERENCE	An entity reference
CDATA	A CDATA section
END_ELEMENT	End of an element
END_DOCUMENT	End of a document
ENTITY_DECLARATION	An entity declaration
NOTATION_DECLARATION	A notation declaration

Table 1. XMLStreamReader events
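Because next() returns a plain int, a small lookup helper can make debugging output easier to read. The following is one possible sketch; the class and method names are made up for illustration, and the constants are those defined by XMLStreamConstants.

```java
import javax.xml.stream.XMLStreamConstants;

final class EventNames {

    /** Maps an XMLStreamConstants event code to the name used in Table 1. */
    static String eventName(int eventType) {
        switch (eventType) {
            case XMLStreamConstants.START_DOCUMENT:         return "START_DOCUMENT";
            case XMLStreamConstants.START_ELEMENT:          return "START_ELEMENT";
            case XMLStreamConstants.ATTRIBUTE:              return "ATTRIBUTE";
            case XMLStreamConstants.NAMESPACE:              return "NAMESPACE";
            case XMLStreamConstants.CHARACTERS:             return "CHARACTERS";
            case XMLStreamConstants.COMMENT:                return "COMMENT";
            case XMLStreamConstants.SPACE:                  return "SPACE";
            case XMLStreamConstants.PROCESSING_INSTRUCTION: return "PROCESSING_INSTRUCTION";
            case XMLStreamConstants.DTD:                    return "DTD";
            case XMLStreamConstants.ENTITY_REFERENCE:       return "ENTITY_REFERENCE";
            case XMLStreamConstants.CDATA:                  return "CDATA";
            case XMLStreamConstants.END_ELEMENT:            return "END_ELEMENT";
            case XMLStreamConstants.END_DOCUMENT:           return "END_DOCUMENT";
            case XMLStreamConstants.ENTITY_DECLARATION:     return "ENTITY_DECLARATION";
            case XMLStreamConstants.NOTATION_DECLARATION:   return "NOTATION_DECLARATION";
            default:                                        return "UNKNOWN(" + eventType + ")";
        }
    }

    public static void main(String[] args) {
        System.out.println(eventName(XMLStreamConstants.START_ELEMENT));
    }
}
```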

These parsing events let you obtain the data and metadata of an XML document. If the event type is START_DOCUMENT, use the getEncoding() method to get the encoding specified in the XML document and the getVersion() method to get the XML version of the document.
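A short sketch of reading the declaration follows; it assumes an in-memory document and a hypothetical class name. Besides the two methods named above, it also calls getCharacterEncodingScheme(), which the XMLStreamReader interface defines for the encoding declared in the XML declaration; getEncoding() reports the input encoding and may be null when the input is a character stream, as it is here.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

final class DocumentInfoSketch {

    /** Returns { version, declared encoding, input encoding } for the document. */
    static String[] declaration(String xml) {
        try {
            XMLStreamReader reader =
                    XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            try {
                // A freshly created reader is positioned on START_DOCUMENT.
                if (reader.getEventType() != XMLStreamConstants.START_DOCUMENT) {
                    throw new IllegalStateException("expected START_DOCUMENT first");
                }
                return new String[] {
                    reader.getVersion(),
                    reader.getCharacterEncodingScheme(),
                    reader.getEncoding()
                };
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException("XML could not be parsed", e);
        }
    }

    public static void main(String[] args) {
        String[] info = declaration("<?xml version=\"1.0\" encoding=\"UTF-8\"?><catalog/>");
        System.out.println("version=" + info[0]
                + ", declared encoding=" + info[1]
                + ", input encoding=" + info[2]);
    }
}
```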

Similarly, for a START_ELEMENT event, use the getPrefix() method to return the element prefix and the getNamespaceURI() method to return the namespace of the element prefix or the default namespace. To get the local name of the element, use the getLocalName() method, and use the getAttributeCount() method to get the number of attributes. Use the getAttributePrefix(i) method to get the attribute prefix for the attribute at index i, and the getAttributeNamespace(i) method to get the attribute namespace. Use the getAttributeLocalName(i) method to get the attribute local name and the getAttributeValue(i) method to get the attribute value. If the event type is CHARACTERS or COMMENT, use the getText() method to obtain the text.
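Put together, the accessors for a START_ELEMENT event might be used as in the sketch below. The class name, output format and sample document are assumptions for illustration and are not taken from Listing 3.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

final class ElementDetailsSketch {

    /** Describes each start element as "prefix:name attr=value ...". */
    static List<String> describeElements(String xml) {
        try {
            XMLStreamReader reader =
                    XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            List<String> lines = new ArrayList<>();
            while (reader.hasNext()) {
                if (reader.next() != XMLStreamConstants.START_ELEMENT) {
                    continue;
                }
                StringBuilder line = new StringBuilder();
                String prefix = reader.getPrefix();
                if (prefix != null && !prefix.isEmpty()) {
                    line.append(prefix).append(':');
                }
                line.append(reader.getLocalName());
                // Namespace declarations are not reported as attributes here.
                for (int i = 0; i < reader.getAttributeCount(); i++) {
                    line.append(' ')
                        .append(reader.getAttributeLocalName(i))
                        .append('=')
                        .append(reader.getAttributeValue(i));
                }
                lines.add(line.toString());
            }
            reader.close();
            return lines;
        } catch (XMLStreamException e) {
            throw new IllegalStateException("XML could not be parsed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<j:catalog xmlns:j=\"http://OnJava.com/Journal\" publisher=\"OReilly\">"
                + "<j:journal/></j:catalog>";
        System.out.println(describeElements(xml));
    }
}
```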

Listing 4 shows the sample XML document, catalog.xml, and the output of parsing it.

Listing 3 shows the Java application used to parse the XML document. You can run the application from the command line or in an IDE such as Eclipse. Remember: if you run XMLParser.java without first running the XMLWriter.java application (see Listing 2 in the source code), you need to copy catalog.xml (see Listing 1 in the source code) to the C:/stax directory.

VI. Parsing with XMLEventReader

This section shows how to use XMLEventReader to parse catalog.xml. The XMLEventReader interface parses an XML document with an iterator over event objects; each XML event is represented by an XMLEvent object. XMLEventReader is similar to XMLStreamReader in that the parsing events are generated by the StAX parser. However, XMLEventReader has one advantage over XMLStreamReader: with XMLEventReader, an application can "peek" at the next event using the peek() method without reading the event from the stream. This lets the client application decide whether it needs to parse the next event. The code snippets in this section are excerpted from the XMLEventParser.java application; see Listing 5.
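As an illustration of peek(), the sketch below finds elements with no content by looking one event ahead without consuming it. The class name and sample document are hypothetical and do not come from Listing 5.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

final class PeekSketch {

    /** Returns the names of elements whose start tag is immediately followed by their end tag. */
    static List<String> emptyElements(String xml) {
        try {
            XMLEventReader reader =
                    XMLInputFactory.newInstance().createXMLEventReader(new StringReader(xml));
            List<String> names = new ArrayList<>();
            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent();
                if (!event.isStartElement()) {
                    continue;
                }
                // Look ahead; the peeked event stays in the stream for the next nextEvent().
                XMLEvent upcoming = reader.peek();
                if (upcoming != null && upcoming.isEndElement()) {
                    names.add(event.asStartElement().getName().getLocalPart());
                }
            }
            reader.close();
            return names;
        } catch (XMLStreamException e) {
            throw new IllegalStateException("XML could not be parsed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<catalog><journal></journal><title>text</title><index/></catalog>";
        System.out.println(emptyElements(xml));
    }
}
```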

First, import the StAX classes (the java.io and java.util.Iterator imports are needed for the file and iterator classes used below):

import java.io.*;
import java.util.Iterator;
import javax.xml.stream.*;
import javax.xml.stream.events.*;
import javax.xml.stream.XMLInputFactory;
Next, create an XMLInputFactory and use it to obtain an XMLEventReader object:

XMLInputFactory inputFactory = XMLInputFactory.newInstance();
InputStream input = new FileInputStream(new File("C:/stax/catalog.xml"));
XMLEventReader xmlEventReader = inputFactory.createXMLEventReader(input);
In StAX, the events of an XML document are represented by XMLEvent objects. Use the nextEvent() method of the XMLEventReader object to get the next event:

XMLEvent event = xmlEventReader.nextEvent();
Use the getEventType() method to get the event type (see Table 1). The XMLEvent interface also provides boolean methods for testing the event type. For example, isStartDocument() returns true if the event is a start-document event. In the following code, the event is tested for being a start element, in which case a StartElement object is obtained from the XMLEvent:

if (event.isStartElement()) {
    StartElement startElement = event.asStartElement();
}
Use the getAttributes() method to get the attributes of the element:

Iterator attributes = startElement.getAttributes();
The iterator yields javax.xml.stream.events.Attribute objects. Use the next() method to traverse the iterator:

Attribute attribute = (javax.xml.stream.events.Attribute) attributes.next();
Finally, get the attribute name with the getName() method and the attribute value with the getValue() method.
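The attribute steps above can be combined into a self-contained sketch. The class name, the TreeMap (used only to get a stable ordering) and the sample attribute values are assumptions for illustration; Listing 5 may be organized differently.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.Attribute;
import javax.xml.stream.events.StartElement;
import javax.xml.stream.events.XMLEvent;

final class EventAttributesSketch {

    /** Returns the attributes of the first element with the given local name, sorted by name. */
    static Map<String, String> attributesOf(String xml, String elementName) {
        try {
            XMLEventReader reader =
                    XMLInputFactory.newInstance().createXMLEventReader(new StringReader(xml));
            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent();
                if (!event.isStartElement()) {
                    continue;
                }
                StartElement start = event.asStartElement();
                if (!start.getName().getLocalPart().equals(elementName)) {
                    continue;
                }
                Map<String, String> result = new TreeMap<>();
                Iterator<?> attributes = start.getAttributes();
                while (attributes.hasNext()) {
                    Attribute attribute = (Attribute) attributes.next();
                    result.put(attribute.getName().getLocalPart(), attribute.getValue());
                }
                return result;
            }
            return Collections.emptyMap();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("XML could not be parsed", e);
        }
    }

    public static void main(String[] args) {
        // The attribute values here are placeholders, not values from catalog.xml.
        System.out.println(attributesOf("<journal date=\"d1\" title=\"t1\"/>", "journal"));
    }
}
```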

Listing 5 shows the Java application that parses the XML document. The XMLEventParser.java application can be run as a command-line application or in an IDE such as Eclipse. Remember: if you run the XMLEventParser.java application without first running the XMLWriter.java or XMLParser.java application, you need to copy catalog.xml to the C:/stax directory.

In summary, with pull-based event generation, the parsing rules are provided by the parsing application rather than by the parser.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
