[Reprint] Use StAX to Parse XML

Source: Internet
Author: User

StAX Overview

From the very beginning, the Java API for XML Processing (JAXP) has provided two ways to process XML. The Document Object Model (DOM) method uses a standard object model to represent XML documents. The Simple API for XML (SAX) method uses event handlers supplied by the application to process XML. JSR-173 proposes a new, stream-oriented method: the Streaming API for XML (StAX). Its final version was released in March 2004, and it will become part of JAXP 1.4, which will be included in the forthcoming Java 6 release.

As the name implies, StAX focuses on streams. What distinguishes StAX from the other approaches is that the application processes XML as a stream of events. Treating XML as a sequence of events is not a new idea (SAX already does this). The difference is that StAX lets application code pull the events one at a time, so you do not have to supply a handler that receives events from the parser whenever it suits the parser.

StAX actually includes two APIs for processing XML, which offer different levels of abstraction. The cursor-based API lets applications process XML as a stream of tokens (or events). The application checks the state of the parser to obtain information about the most recently parsed token, then moves on to the next token, and so on. This is a low-level API: it is very efficient, but it provides no abstraction of the underlying XML structure. The higher-level iterator-based API lets applications process XML as a series of event objects, each of which hands a piece of the XML structure to the application. The application only needs to determine the type of each parsed event, cast it to the corresponding concrete type, and then use that type's methods to obtain information about the event.
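To make the difference between the two APIs concrete, here is a minimal sketch that counts the start tags in a small in-memory document twice: once with the cursor-based XMLStreamReader and once with the iterator-based XMLEventReader. The class name, method names, and sample string are illustrative assumptions, not code from the original article.

```java
import java.io.StringReader;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.stream.events.XMLEvent;

public class CursorVsIterator {

    // Cursor API: the reader itself represents the current token; we query its state.
    static int countWithCursor(String xml) {
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            int count = 0;
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    count++;
                }
            }
            reader.close();
            return count;
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }

    // Iterator API: every call hands back a separate XMLEvent object.
    static int countWithIterator(String xml) {
        try {
            XMLEventReader reader = XMLInputFactory.newInstance()
                    .createXMLEventReader(new StringReader(xml));
            int count = 0;
            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent();
                if (event.isStartElement()) {
                    count++;
                }
            }
            reader.close();
            return count;
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<catalog><journal/><journal/></catalog>";
        System.out.println(countWithCursor(xml));   // 3
        System.out.println(countWithIterator(xml)); // 3
    }
}
```

The cursor version allocates no per-event objects, while the iterator version yields event objects that can be kept, passed around, or inspected later.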

J2EE/XML developers usually use the Document Object Model (DOM) API or the Simple API for XML (SAX) to parse XML documents. However, each of these APIs has its own shortcomings. One disadvantage of the DOM API is that it consumes a lot of memory, because a complete in-memory structure of the XML document must be built before the document can be navigated. A disadvantage of the SAX API is that it is a push-parsing API: parsing events are generated by the parser and pushed to the application. StAX, in comparison, is based on a pull-parsing model. In this article, you will first create your own XML document and then learn different ways to parse it. Finally, you will parse it with the event-based StAX pull approach.
1. From Push Parsing to Pull Parsing
Compared with push parsing, pull parsing has the following advantages:
1. In pull parsing, events are generated at the request of the parsing application, so the parsing rules are controlled by the client rather than by the parser.
2. Pull-parsing code is simpler and needs fewer libraries than push-parsing code.
3. A pull-parsing client can read multiple XML documents at the same time.
4. Pull parsing allows you to filter XML documents and skip parsing events.
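Point 3 can be illustrated with a short sketch. Because the application decides when to pull, it can keep two XMLStreamReader instances open and alternate between them. The class name InterleavedReaders, the helper nextTitle(), and the assumption that each document contains text-only title elements are all hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class InterleavedReaders {

    // Advances one reader to its next <title> and returns its text, or null at end of document.
    static String nextTitle(XMLStreamReader reader) throws XMLStreamException {
        while (reader.hasNext()) {
            if (reader.next() == XMLStreamConstants.START_ELEMENT
                    && "title".equals(reader.getLocalName())) {
                return reader.getElementText(); // leaves the reader on </title>
            }
        }
        return null;
    }

    // Pulls titles from two documents in turn: A, B, A, B, ...
    static List<String> interleave(String xmlA, String xmlB) {
        try {
            XMLInputFactory factory = XMLInputFactory.newInstance();
            XMLStreamReader a = factory.createXMLStreamReader(new StringReader(xmlA));
            XMLStreamReader b = factory.createXMLStreamReader(new StringReader(xmlB));
            List<String> titles = new ArrayList<>();
            String nextA = nextTitle(a);
            String nextB = nextTitle(b);
            while (nextA != null || nextB != null) {
                if (nextA != null) {
                    titles.add(nextA);
                    nextA = nextTitle(a);
                }
                if (nextB != null) {
                    titles.add(nextB);
                    nextB = nextTitle(b);
                }
            }
            a.close();
            b.close();
            return titles;
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(interleave(
                "<l><title>A1</title><title>A2</title></l>",
                "<l><title>B1</title></l>")); // [A1, B1, A2]
    }
}
```

With a push parser, the same result would require running each parse to completion (or on separate threads) and merging the collected results afterwards.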
2. Learn About StAX
The Streaming API for XML (StAX) was introduced in JSR 173 in March 2004. It is a streaming pull-parsing API for XML. StAX is a new feature provided by JDK 6.0. You can download a beta version of it from here.
A push-model parser continuously generates events until the XML document has been completely parsed. Pull parsing, by contrast, is driven by the application, so parsing events are generated at the application's request. This means that with StAX you can postpone parsing, skip elements during parsing, and parse multiple documents. With the DOM API, you must parse the entire XML document into a DOM structure, which reduces parsing efficiency. With StAX, parsing events are generated as the XML document is being parsed. A comparison between the StAX parser and other parsers is not covered here.
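As a sketch of what application-driven parsing means in practice, the following method pulls events only until it has read the first title element and then stops. The application never asks for the events that follow. The class and method names are hypothetical, and in-memory strings stand in for real files.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class EarlyStop {

    // Returns the text of the first <title> element, or null if there is none.
    static String firstTitle(String xml) {
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            try {
                while (reader.hasNext()) {
                    if (reader.next() == XMLStreamConstants.START_ELEMENT
                            && "title".equals(reader.getLocalName())) {
                        // Stop here: no further events are requested from the parser.
                        return reader.getElementText();
                    }
                }
                return null;
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(firstTitle(
                "<catalog><title>First</title><title>Second</title></catalog>")); // First
    }
}
```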
The StAX API is implemented in the Java Web Services Developer Pack (JWSDP) 1.6, together with the Sun Java Streaming XML Parser (SJSXP), in the javax.xml.stream package. The XMLStreamReader interface is used to parse an XML document, and the XMLStreamWriter interface is used to generate one. XMLEventReader parses XML events with an event-object iterator, in contrast to the cursor mechanism used by XMLStreamReader. This tutorial parses an XML document with the StAX implementation in JDK 6.0.
In fact, StAX is only one of the new XML features provided by JDK 6.0. JDK 6.0 also provides the Java API for XML Web Services (JAX-WS) 2.0, the Java Architecture for XML Binding (JAXB) 2.0, the XML Digital Signature API, and support for the SQL:2003 'XML' data type.
3. Initial Installation
If you are using JDK 6.0, the StAX API is on the classpath by default. If you are using JWSDP 1.6, add the JWSDP 1.6 StAX API to the classpath by adding <jwsdp-1.6>\sjsxp\lib\jsr173_api.jar and <jwsdp-1.6>\sjsxp\lib\sjsxp.jar to the CLASSPATH variable, where <jwsdp-1.6> is the directory in which JWSDP 1.6 is installed. jsr173_api.jar is the JSR-173 API JAR, and sjsxp.jar is the SJSXP implementation JAR.
4. Use XMLStreamWriter for Write Operations
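Before writing any XML, you may want a quick check that the StAX API set up in the previous section is visible on the classpath. This sketch asks both factories for an instance and reports the implementation classes. The class name StaxCheck is hypothetical, and the exact class names printed depend on your JDK or JWSDP setup.

```java
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;

public class StaxCheck {

    // Fails (at compile time or with a FactoryConfigurationError) if StAX is not available.
    static String inputFactoryClass() {
        return XMLInputFactory.newInstance().getClass().getName();
    }

    static String outputFactoryClass() {
        return XMLOutputFactory.newInstance().getClass().getName();
    }

    public static void main(String[] args) {
        System.out.println("XMLInputFactory implementation:  " + inputFactoryClass());
        System.out.println("XMLOutputFactory implementation: " + outputFactoryClass());
    }
}
```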
First, you need to create the XML document to be parsed. The XML is generated with the StAX XMLStreamWriter. One limitation of XMLStreamWriter, however, is that it does not necessarily produce a well-formed document, and the generated document is not necessarily valid either. You must make sure yourself that the generated XML document is well-formed. List 1 shows the XML document generated by XMLStreamWriter.
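Because XMLStreamWriter does not check well-formedness, keeping start and end tags balanced is up to the caller. One safeguard is writeEndDocument(), which closes any start tags that are still open. The sketch below, using a hypothetical WriterBalance class and an in-memory StringWriter, compares the output with and without that call.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

public class WriterBalance {

    // Writes two nested elements but never calls writeEndElement() explicitly.
    static String write(boolean callEndDocument) {
        try {
            StringWriter out = new StringWriter();
            XMLStreamWriter writer = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            writer.writeStartElement("catalog");
            writer.writeStartElement("journal");
            writer.writeCharacters("ONJava");
            if (callEndDocument) {
                writer.writeEndDocument(); // closes every start tag that is still open
            }
            writer.flush();
            writer.close();
            return out.toString();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(write(false)); // end tags missing: not well-formed
        System.out.println(write(true));  // end tags added by writeEndDocument()
    }
}
```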
Here, you use the XMLStreamWriter API to generate catalog.xml, shown in List 1. The code snippets in this section are excerpted from the XMLWriter.java application shown in List 2. First, import the StAX classes (and the java.io classes used for file output):

import java.io.*;
import javax.xml.stream.*;
import javax.xml.stream.events.*;
import javax.xml.stream.XMLOutputFactory;


You obtain an XMLStreamWriter from an XMLOutputFactory, so first create a new XMLOutputFactory:

XMLOutputFactory outputFactory = XMLOutputFactory.newInstance();


Next, create a FileWriter for the output; the XML document will be written to this file:

FileWriter output = new FileWriter(new File("C:/Stax/catalog.xml"));


Next, create an XMLStreamWriter:

XMLStreamWriter xmlStreamWriter = outputFactory.createXMLStreamWriter(output);


Now, use the writeStartDocument() method to write the start of the document, passing the encoding and version to be declared in the XML declaration. Remember that this encoding only appears in the declaration; it is not the encoding actually used for the generated document. To set the document's actual encoding, specify it when you create the XMLStreamWriter from the XMLOutputFactory, for example with createXMLStreamWriter(OutputStream, String).

xmlStreamWriter.writeStartDocument("UTF-8", "1.0");
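To see the two encodings side by side, the sketch below sets the real output encoding with createXMLStreamWriter(OutputStream, String) and passes the same value to writeStartDocument(). Writing to a ByteArrayOutputStream and the class name EncodingDemo are assumptions made to keep the example self-contained.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

public class EncodingDemo {

    static String writeUtf8(String title) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            // This argument decides how characters are actually encoded as bytes.
            XMLStreamWriter writer = XMLOutputFactory.newInstance()
                    .createXMLStreamWriter(bytes, "UTF-8");
            // These arguments are only written into the <?xml ...?> declaration.
            writer.writeStartDocument("UTF-8", "1.0");
            writer.writeStartElement("title");
            writer.writeCharacters(title);
            writer.writeEndElement();
            writer.writeEndDocument();
            writer.flush();
            writer.close();
            // Decode with the same charset the writer used for the bytes.
            return new String(bytes.toByteArray(), StandardCharsets.UTF_8);
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(writeUtf8("Caf\u00e9"));
    }
}
```

Keeping both values identical matters: a declaration that claims one encoding while the bytes use another produces a document that other parsers may misread.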


Use the writeComment() method to output a comment:

xmlStreamWriter.writeComment("A OReilly Journal Catalog");


Use the writeProcessingInstruction() method to output a processing instruction:

xmlStreamWriter.writeProcessingInstruction("catalog", "journal='OReilly'");


Use the writeStartElement() method to output the start of the 'catalog' element (the element prefix and namespace URI can also be specified in this method):

xmlStreamWriter.writeStartElement("journal", "catalog", "http://OnJava.com/Journal");


Use the writeNamespace() method to add the 'journal' namespace declaration (the namespace prefix and namespace URI are also specified in this method):

xmlStreamWriter.writeNamespace("journal", "http://OnJava.com/Journal");


Use writeNamespace() again to add the xsi namespace:

xmlStreamWriter.writeNamespace("xsi", "http://www.w3.org/2001/XMLSchema-instance");


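Use the writeAttribute() method to add the xsi:noNamespaceSchemaLocation attribute, passing the prefix, namespace URI, and local name separately:

xmlStreamWriter.writeAttribute("xsi", "http://www.w3.org/2001/XMLSchema-instance", "noNamespaceSchemaLocation", "file://C:/schemas/catalog.xsd");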


Use the writeAttribute() method to add the 'publisher' attribute:

xmlStreamWriter.writeAttribute("publisher", "OReilly");


Output the start of the 'journal' element. When a new element is started, the closing '>' of the previous start tag is also written:

xmlStreamWriter.writeStartElement("journal", "journal", "http://OnJava.com/Journal");


Use the writeAttribute() method to add the 'date' and 'title' attributes. Then use the writeStartElement() method to add the 'article' and 'title' elements, and use the writeCharacters() method to output the text of the 'title' element:

xmlStreamWriter.writeCharacters("Data Binding with XMLBeans");


Any element that contains text or child elements must have an end tag. Use the writeEndElement() method to add the end tag of the 'title' element:

xmlStreamWriter.writeEndElement();


Add the end tags of the 'author' and 'journal' elements in the same way; writeEndElement() does not require the element prefix or namespace URI. Add another 'journal' element in a similar way, and then add the end tag of the 'catalog' element. Finally, flush the buffered data:

xmlStreamWriter.flush();


In the last step, close the XMLStreamWriter:

xmlStreamWriter.close();


This completes the generation of catalog.xml.

List 2 in the source code shows the complete Java application, XMLWriter.java. It can be run as a command-line application or in an IDE such as Eclipse.
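Since List 2 is not reproduced here, the following is a condensed sketch assembled from the snippets above, not the original XMLWriter.java. It writes to a StringWriter instead of C:/Stax/catalog.xml so it can run anywhere, emits only one 'journal' entry, and uses hypothetical placeholder values for the 'date' and 'title' attributes, because the article does not give them.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

public class CatalogWriterSketch {

    static final String JOURNAL_NS = "http://OnJava.com/Journal";
    static final String XSI_NS = "http://www.w3.org/2001/XMLSchema-instance";

    // Writes a catalog document to a string, following the steps described above.
    static String writeCatalog() {
        try {
            StringWriter output = new StringWriter();
            XMLStreamWriter w = XMLOutputFactory.newInstance().createXMLStreamWriter(output);

            w.writeStartDocument("UTF-8", "1.0");
            w.writeComment("A OReilly Journal Catalog");
            w.writeProcessingInstruction("catalog", "journal='OReilly'");

            // Root element, its namespace declarations, and its attributes
            w.writeStartElement("journal", "catalog", JOURNAL_NS);
            w.writeNamespace("journal", JOURNAL_NS);
            w.writeNamespace("xsi", XSI_NS);
            w.writeAttribute("xsi", XSI_NS, "noNamespaceSchemaLocation",
                    "file://C:/schemas/catalog.xsd");
            w.writeAttribute("publisher", "OReilly");

            // One journal entry; the two attribute values are hypothetical placeholders
            w.writeStartElement("journal", "journal", JOURNAL_NS);
            w.writeAttribute("date", "DATE-PLACEHOLDER");
            w.writeAttribute("title", "TITLE-PLACEHOLDER");
            w.writeStartElement("journal", "article", JOURNAL_NS);
            w.writeStartElement("journal", "title", JOURNAL_NS);
            w.writeCharacters("Data Binding with XMLBeans");
            w.writeEndElement(); // </journal:title>
            w.writeEndElement(); // </journal:article>
            w.writeEndElement(); // </journal:journal>

            w.writeEndElement(); // </journal:catalog>
            w.writeEndDocument();
            w.flush();
            w.close();
            return output.toString();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(writeCatalog());
    }
}
```

To write the file as the article does, replace the StringWriter with the FileWriter shown earlier and close the FileWriter after closing the XMLStreamWriter, since XMLStreamWriter.close() does not close the underlying output.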

5. Use XMLStreamReader for Parsing
In this section, we parse the document in List 1 with the XMLStreamReader API and examine in detail how it works. XMLStreamReader parses XML documents with a cursor. Its interface contains a next() method that advances to the next parsing event, and its getEventType() method returns the type of the current event. The code snippets are from the XMLParser.java application; for details, see List 3.
In the XMLParser.java application, first import the StAX classes (and the java.io classes used for file input):

import java.io.*;
import javax.xml.stream.*;
import javax.xml.stream.events.*;
import javax.xml.stream.XMLInputFactory;


Then, create an XMLInputFactory, from which you will obtain an XMLStreamReader:

XMLInputFactory inputFactory = XMLInputFactory.newInstance();


Now, create an InputStream for the file to be parsed, and create an XMLStreamReader from the XMLInputFactory object created earlier:

InputStream input = new FileInputStream(new File("C:/Stax/catalog.xml"));
XMLStreamReader xmlStreamReader = inputFactory.createXMLStreamReader(input);


If more parsing events are available, the hasNext() method returns true. Use the next() method to obtain the next parsing event:

int event = xmlStreamReader.next();


Compared with SAX parsing, StAX parsing has the advantage that a parsing event can be skipped simply by calling the next() method, as in the following code. For example, if the parsing event type is ENTITY_DECLARATION, the developer can decide whether to obtain information from the current event or to move on to the next event:

if (event == XMLStreamConstants.ENTITY_DECLARATION) { event = xmlStreamReader.next(); }
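Skipping a single event generalizes to skipping a whole element. This sketch collects title text but pulls straight past the contents of any element with a given name, using a depth counter to find the matching end tag. The element name "draft" in the usage example, as well as the class and method names, are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class SkipSubtree {

    // Moves the reader from a START_ELEMENT to its matching END_ELEMENT.
    static void skipElement(XMLStreamReader reader) throws XMLStreamException {
        int depth = 1;
        while (depth > 0) {
            int event = reader.next();
            if (event == XMLStreamConstants.START_ELEMENT) {
                depth++;
            } else if (event == XMLStreamConstants.END_ELEMENT) {
                depth--;
            }
        }
    }

    // Collects <title> text, ignoring everything inside elements named skipName.
    static List<String> titlesSkipping(String xml, String skipName) {
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            List<String> titles = new ArrayList<>();
            while (reader.hasNext()) {
                if (reader.next() != XMLStreamConstants.START_ELEMENT) {
                    continue;
                }
                if (reader.getLocalName().equals(skipName)) {
                    skipElement(reader);
                } else if (reader.getLocalName().equals("title")) {
                    titles.add(reader.getElementText());
                }
            }
            reader.close();
            return titles;
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(titlesSkipping(
                "<c><title>A</title><draft><title>X</title></draft><title>B</title></c>",
                "draft")); // [A, B]
    }
}
```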


Parsing can also be postponed simply by not calling the next() method. The next() method returns an int that identifies the parsing event as one of the XMLStreamConstants constants.

The different event types returned by XMLStreamReader are listed in Table 1.

Event type               Description
START_DOCUMENT           Start of a document
START_ELEMENT            Start of an element
ATTRIBUTE                An element attribute
NAMESPACE                A namespace declaration
CHARACTERS               Character data (text or whitespace)
COMMENT                  A comment
SPACE                    Ignorable whitespace
PROCESSING_INSTRUCTION   A processing instruction
DTD                      A DTD
ENTITY_REFERENCE         An entity reference
CDATA                    A CDATA section
END_ELEMENT              End of an element
END_DOCUMENT             End of a document
ENTITY_DECLARATION       An entity declaration
NOTATION_DECLARATION     A notation declaration

Table 1. XMLStreamReader event types
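The following sketch maps the int values in Table 1 back to their constant names and tallies the events the cursor reports for a small document. A freshly created reader is already positioned on START_DOCUMENT, so that event is read with getEventType() before the first call to next(). The class and method names are illustrative.

```java
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class EventTally {

    // Maps the int returned by next() or getEventType() to its constant name.
    static String eventName(int type) {
        switch (type) {
            case XMLStreamConstants.START_DOCUMENT:         return "START_DOCUMENT";
            case XMLStreamConstants.START_ELEMENT:          return "START_ELEMENT";
            case XMLStreamConstants.ATTRIBUTE:              return "ATTRIBUTE";
            case XMLStreamConstants.NAMESPACE:              return "NAMESPACE";
            case XMLStreamConstants.CHARACTERS:             return "CHARACTERS";
            case XMLStreamConstants.COMMENT:                return "COMMENT";
            case XMLStreamConstants.SPACE:                  return "SPACE";
            case XMLStreamConstants.PROCESSING_INSTRUCTION: return "PROCESSING_INSTRUCTION";
            case XMLStreamConstants.DTD:                    return "DTD";
            case XMLStreamConstants.ENTITY_REFERENCE:       return "ENTITY_REFERENCE";
            case XMLStreamConstants.CDATA:                  return "CDATA";
            case XMLStreamConstants.END_ELEMENT:            return "END_ELEMENT";
            case XMLStreamConstants.END_DOCUMENT:           return "END_DOCUMENT";
            case XMLStreamConstants.ENTITY_DECLARATION:     return "ENTITY_DECLARATION";
            case XMLStreamConstants.NOTATION_DECLARATION:   return "NOTATION_DECLARATION";
            default:                                        return "UNKNOWN(" + type + ")";
        }
    }

    // Counts how many events of each type the cursor reports for a document.
    static Map<String, Integer> tally(String xml) {
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            Map<String, Integer> counts = new TreeMap<>();
            counts.merge(eventName(reader.getEventType()), 1, Integer::sum);
            while (reader.hasNext()) {
                counts.merge(eventName(reader.next()), 1, Integer::sum);
            }
            reader.close();
            return counts;
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(tally("<a><!--c--><b/>t</a>"));
    }
}
```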

These different parsing events let you obtain both the data and the metadata in the XML document. If the parsing event type is START_DOCUMENT, you can use the getEncoding() method to obtain the input encoding (or getCharacterEncodingScheme() to obtain the encoding declared in the XML declaration), and the getVersion() method to return the XML version declared for the document.

Similarly, for a START_ELEMENT event, you use the getPrefix() method to return the element prefix and getNamespaceURI() to return the namespace bound to that prefix (or the default namespace). To get the local name of an element, use the getLocalName() method, and use getAttributeCount() to get the number of attributes. Use getAttributePrefix(i) to get the prefix of the attribute at index i, and getAttributeNamespace(i) to get its namespace. Use getAttributeLocalName(i) to obtain the attribute's local name and getAttributeValue(i) to obtain its value. If the event type is CHARACTERS or COMMENT, use the getText() method to obtain the corresponding text.
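Putting these accessor methods together, this sketch records the version from START_DOCUMENT; the qualified name, namespace URI, and attributes of each START_ELEMENT; and any non-blank CHARACTERS or COMMENT text. The output format and class name are hypothetical and do not reproduce List 4.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class ElementDetails {

    static List<String> describe(String xml) {
        try {
            XMLStreamReader r = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            List<String> lines = new ArrayList<>();
            if (r.getEventType() == XMLStreamConstants.START_DOCUMENT) {
                lines.add("document version=" + r.getVersion());
            }
            while (r.hasNext()) {
                int event = r.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    String prefix = r.getPrefix();
                    String qname = (prefix == null || prefix.isEmpty())
                            ? r.getLocalName() : prefix + ":" + r.getLocalName();
                    StringBuilder sb = new StringBuilder("element ").append(qname)
                            .append(" ns=").append(r.getNamespaceURI());
                    // getAttributeCount() excludes namespace declarations such as xmlns:j
                    for (int i = 0; i < r.getAttributeCount(); i++) {
                        sb.append(" @").append(r.getAttributeLocalName(i))
                          .append('=').append(r.getAttributeValue(i));
                    }
                    lines.add(sb.toString());
                } else if (event == XMLStreamConstants.CHARACTERS
                        || event == XMLStreamConstants.COMMENT) {
                    String text = r.getText().trim();
                    if (!text.isEmpty()) {
                        lines.add("text " + text);
                    }
                }
            }
            r.close();
            return lines;
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        describe("<?xml version=\"1.0\"?><j:catalog xmlns:j=\"urn:j\" publisher=\"OReilly\">"
                + "<j:title>T</j:title></j:catalog>").forEach(System.out::println);
    }
}
```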

List 4 shows the parsing output for the sample XML document, catalog.xml.

List 3 shows the Java application used to parse the XML document. You can run it from the command line or in an IDE such as Eclipse. Remember: if you run XMLParser.java without first running XMLWriter.java (see List 2 in the source code), copy catalog.xml (see List 1 in the source code) to the C:/Stax directory.

6. Use XMLEventReader for Parsing
This section describes how to use XMLEventReader to parse catalog.xml. The XMLEventReader interface parses an XML document with an event-object iterator: each XML event produces an XMLEvent object. XMLEventReader is similar to XMLStreamReader in that parsing events are pulled from the StAX parser. However, XMLEventReader has one advantage over XMLStreamReader: with XMLEventReader, an application can use the peek() method to look at the next event without reading it from the stream. This lets an application client decide whether the next event needs to be parsed. The code snippets in this section are excerpted from the XMLEventParser.java application; see List 5.
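The peek() advantage can be shown with a small sketch: after a title start tag, the reader peeks at the following event and consumes it only if it is character data, so an empty title element is detected without reading past it. The class name and the "(empty)" marker are illustrative assumptions.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

public class PeekTitles {

    static List<String> titles(String xml) {
        try {
            XMLEventReader reader = XMLInputFactory.newInstance()
                    .createXMLEventReader(new StringReader(xml));
            List<String> titles = new ArrayList<>();
            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent();
                if (event.isStartElement()
                        && event.asStartElement().getName().getLocalPart().equals("title")) {
                    // Look at the following event without removing it from the stream.
                    XMLEvent upcoming = reader.peek();
                    if (upcoming != null && upcoming.isCharacters()) {
                        titles.add(reader.nextEvent().asCharacters().getData());
                    } else {
                        titles.add("(empty)"); // e.g. <title/>: the next event is END_ELEMENT
                    }
                }
            }
            reader.close();
            return titles;
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(titles("<c><title>A</title><title/></c>")); // [A, (empty)]
    }
}
```

For long text, a parser may split character data into several CHARACTERS events unless coalescing is enabled, so production code may need to accumulate consecutive character events.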
First, import the StAX classes (plus the java.io and java.util classes used below):

import java.io.*;
import java.util.Iterator;
import javax.xml.stream.*;
import javax.xml.stream.events.*;
import javax.xml.stream.XMLInputFactory;


Next, create an XMLInputFactory and obtain an XMLEventReader object from it:

XMLInputFactory inputFactory = XMLInputFactory.newInstance();
InputStream input = new FileInputStream(new File("C:/Stax/catalog.xml"));
XMLEventReader xmlEventReader = inputFactory.createXMLEventReader(input);


In StAX, XML document events are represented by XMLEvent objects. Use the nextEvent() method to iterate over the XMLEventReader and obtain the next event:

XMLEvent event = xmlEventReader.nextEvent();


Use the getEventType() method to obtain the event type (see Table 1). The XMLEvent interface also provides boolean methods for testing the event type; for example, isStartDocument() returns true if the event is of the START_DOCUMENT type. In the following code, the event is of the START_ELEMENT type, so a StartElement object can be obtained from the XMLEvent:

if (event.isStartElement()) { StartElement startElement = event.asStartElement(); }


Use the getAttributes() method to obtain the element's attributes:

Iterator attributes = startElement.getAttributes();


This iterator returns javax.xml.stream.events.Attribute objects. Use the next() method to traverse the iterator:

Attribute attribute = (javax.xml.stream.events.Attribute) attributes.next();


Finally, use the getName() method to get the attribute name and the getValue() method to get the attribute value.
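The steps above can be combined into a short sketch that lists the attributes of every element with a given local name. It casts each iterator element to Attribute, which compiles whether getAttributes() returns a raw or a generic Iterator. Because StAX does not guarantee attribute order, the result is sorted. The class and method names are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.Attribute;
import javax.xml.stream.events.StartElement;
import javax.xml.stream.events.XMLEvent;

public class AttributeLister {

    // Returns "name=value" for every attribute on elements named elementName, sorted.
    static List<String> attributesOf(String xml, String elementName) {
        try {
            XMLEventReader reader = XMLInputFactory.newInstance()
                    .createXMLEventReader(new StringReader(xml));
            List<String> result = new ArrayList<>();
            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent();
                if (!event.isStartElement()) {
                    continue;
                }
                StartElement start = event.asStartElement();
                if (!start.getName().getLocalPart().equals(elementName)) {
                    continue;
                }
                Iterator<?> attributes = start.getAttributes();
                while (attributes.hasNext()) {
                    Attribute attribute = (Attribute) attributes.next();
                    result.add(attribute.getName().getLocalPart() + "=" + attribute.getValue());
                }
            }
            reader.close();
            Collections.sort(result); // attribute order is not guaranteed by the API
            return result;
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(attributesOf(
                "<c><journal date='d1' title='ONJava'/><journal title='X'/></c>", "journal"));
    }
}
```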

List 5 shows the Java application that parses the XML document. The XMLEventParser application can be run as a command-line application or in an IDE such as Eclipse. Remember: if you have not run the XMLWriter.java or XMLParser.java application before running XMLEventParser.java, you need to copy catalog.xml to the C:/Stax directory.

To sum up, because StAX generates events on a pull basis, the parsing rules are controlled by the parsing application rather than by the parser.
