The Geronimo renegade: Using integrated packages: Codehaus's Woodstox

Last Update:2018-12-03 Source: Internet

Author: User

Tags representational state transfer java se

From: http://www.ibm.com/developerworks/cn/opensource/os-ag-renegade15/

The importance of XML

XML was introduced by Tim Bray and Michael Sperberg-McQueen in 1996. Its potential was widely recognized, but it is hard to imagine that anyone foresaw just what XML would become. Enterprise Java developers use XML for configuration, for data storage, and, most commonly, as a format for data exchange. It is the basis of Web services and SOAP, and therefore of the modern Service-Oriented Architecture (SOA) design pattern. XML did not stop there, though. It puts the X in Ajax (Asynchronous JavaScript + XML), which makes it key to the unprecedentedly rich experience that modern web applications provide.

However, XML is not a cure-all; it has shortcomings too. XML documents are usually large. They all share a common tree structure, but XML's extensibility means that their schemas can vary endlessly. These traits make efficient XML parsing a challenge. There are two traditional approaches to that challenge: DOM and SAX.

XML processing: DOM and SAX

DOM and SAX are the two classic XML parsing strategies, and in many respects they are opposites. DOM provides a simple object model for XML documents. A DOM parser converts an XML document into an easy-to-use object that represents all the data in the document. That convenience comes at a cost, however: DOM parsing often requires a lot of memory.

Memory is not a problem for SAX. A SAX parser generates a series of parsing events. A handler registers callbacks for those events and then runs some logic on the data they carry. SAX is fast and efficient, but it requires a more complex programming model.

The simplest way to appreciate the differences between DOM and SAX, and to understand the motivation for StAX and its advantages, is to look at a concrete example.

Example: Parsing Flickr

It is not hard to find some XML to parse; XML is everywhere. Most web sites now offer XML-based Web services. Flickr, a popular photo-sharing site owned by Yahoo, has a powerful and flexible API. Let's look at some simple code for accessing Flickr's "interesting" photos (to get all the source code used in this article, see Download, and make sure to put Woodstox on the classpath or use JDK 1.6). The code is shown in Listing 1:

Listing 1. Using the Flickr API

    String apiKey = "c4579586f41a90372f762cb65c78be5d";
    String urlStr = "http://api.flickr.com/services/rest/?"
        + "method=flickr.interestingness.getList&per_page=20&api_key=" + apiKey;
    URL request = new URL(urlStr);
    InputStream input = request.openStream();

This code uses Flickr's Representational State Transfer (REST) API (for more information about the Flickr API and the REST format, see the References section). Sample output of the call above is shown in Listing 2:

Listing 2. XML from Flickr

    <?xml version="1.0" encoding="utf-8" ?>
    <rsp stat="ok">
    <photos page="1" pages="25" per_page="20" total="500">
        <photo id="469774979" owner="35373726@N00" secret="c8a1be2012" server="183"
            farm="1" title="Where will it lead me......?" ispublic="1" isfriend="0" isfamily="0" />
        <photo id="470281793" owner="73955226@N00" secret="49612a2794" server="212"
            farm="1" title="Island Beauty" ispublic="1" isfriend="0" isfamily="0" />
        <photo id="469808244" owner="43568064@N00" secret="26b71544a3" server="227"
            farm="1" title="" ispublic="1" isfriend="0" isfamily="0" />
    </photos>
    </rsp>

Note that Listing 2 shows only three photos; the API call actually returns 20 (the per_page parameter). The result is quite simple, so let's look at how to parse this XML. In this example, you will extract the title and ID of each photo. The ID can be used to build the photo's URL, so it is not hard to imagine a Web application (probably a mashup) that uses only this information. First, use DOM to extract the data.

DOM example

To use DOM, you parse the document into a Document object, which is an in-memory tree representing the parsed XML document. You then walk the DOM tree to find the title and ID of each photo and put that data into a simple map. The code for this is shown in Listing 3:

Listing 3. Using DOM for parsing

    Map<String,String> map = new HashMap<String,String>();
    DocumentBuilder builder =
        DocumentBuilderFactory.newInstance().newDocumentBuilder();
    Document dom = builder.parse(input);
    Element root = dom.getDocumentElement();
    NodeList childNodes = root.getChildNodes();
    Node photosNode = null;
    for (int i = 0; i < childNodes.getLength(); i++) {
        Node node = childNodes.item(i);
        if (node.getNodeName().equalsIgnoreCase("photos")) {
            photosNode = node;
            break;
        }
    }
    childNodes = photosNode.getChildNodes();
    for (int i = 0; i < childNodes.getLength(); i++) {
        Node node = childNodes.item(i);
        if (node.getNodeName().equalsIgnoreCase("photo")) {
            String title = node.getAttributes().getNamedItem("title").getTextContent();
            String id = node.getAttributes().getNamedItem("id").getTextContent();
            map.put(id, title);
        }
    }

DOM is popular because it is very easy to use. You just pass the input source to the parser, and the parser gives you a Document object. You then walk the child nodes until you find the photos node. Each photo node is a child of the photos node, so you iterate over the photo nodes, read each one's title and id attributes, and store them in the map.
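
To try the DOM approach without a network call or an API key, a minimal sketch follows. It parses a trimmed copy of the Listing 2 response held in a string. The class name DomPhotoSketch, the SAMPLE constant, and the parse helper are illustrative names, not part of the article's download. It also swaps the manual child-node walk of Listing 3 for getElementsByTagName, which is a shortcut rather than the same traversal.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DomPhotoSketch {
    // Trimmed copy of the Listing 2 response (hypothetical constant for this sketch).
    static final String SAMPLE =
        "<rsp stat=\"ok\"><photos page=\"1\" pages=\"25\" per_page=\"20\" total=\"500\">"
        + "<photo id=\"469774979\" owner=\"35373726@N00\" title=\"Where will it lead me......?\" />"
        + "<photo id=\"470281793\" owner=\"73955226@N00\" title=\"Island Beauty\" />"
        + "<photo id=\"469808244\" owner=\"43568064@N00\" title=\"\" />"
        + "</photos></rsp>";

    /** Builds the whole document tree, then collects id -> title for every photo element. */
    static Map<String, String> parse(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document dom = builder.parse(new InputSource(new StringReader(xml)));
            Map<String, String> map = new LinkedHashMap<>();
            // getElementsByTagName searches the whole tree, so no manual walk is needed.
            NodeList photos = dom.getElementsByTagName("photo");
            for (int i = 0; i < photos.getLength(); i++) {
                Element photo = (Element) photos.item(i);
                map.put(photo.getAttribute("id"), photo.getAttribute("title"));
            }
            return map;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse photo list", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parse(SAMPLE));
    }
}
```

Note that the owner attribute of every photo is still held in the tree even though the sketch never reads it.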

However, DOM has some obvious inefficiencies. You end up storing a lot of data you may not care about, such as the owner of each photo. You also go through all the data twice: once to read it into the Document object, and again when you traverse that object. The traditional way to avoid these inefficiencies is to use SAX.

SAX example

A SAX parser does not hand back a convenient Document object. Instead, it emits a series of events as it traverses the XML document. You create a handler for these events by implementing the handler interface or by extending the DefaultHandler class and overriding its methods as needed. Listing 4 demonstrates SAX parsing of the Flickr XML document.

Listing 4. Using SAX for parsing

    SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
    DefaultHandler handler = new DefaultHandler() {
        @Override
        public void startElement(String uri, String localName,
                String qName, Attributes attributes) throws SAXException {
            if (qName.equalsIgnoreCase("photo")) {
                String title = attributes.getValue("title");
                String id = attributes.getValue("id");
                // map is static so we can access it here
                map.put(id, title);
            }
        }
    };
    parser.parse(input, handler);

The code in Listing 4 is clearly harder to follow than the DOM code in Listing 3. You need a ContentHandler to process the SAX events; here, you extend DefaultHandler and override its startElement callback. The callback checks whether the element is a photo element and, if it is, reads its title and id attributes.
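
The handler in Listing 4 writes into a static map. One alternative, sketched below under the same assumptions as before (a trimmed copy of the Listing 2 response in a string; SaxPhotoSketch, PhotoHandler, SAMPLE, and parse are illustrative names), is a named handler class that owns its map, so the results travel with the handler instance.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SaxPhotoSketch {
    // Trimmed copy of the Listing 2 response (hypothetical constant for this sketch).
    static final String SAMPLE =
        "<rsp stat=\"ok\"><photos page=\"1\" pages=\"25\" per_page=\"20\" total=\"500\">"
        + "<photo id=\"469774979\" owner=\"35373726@N00\" title=\"Where will it lead me......?\" />"
        + "<photo id=\"470281793\" owner=\"73955226@N00\" title=\"Island Beauty\" />"
        + "<photo id=\"469808244\" owner=\"43568064@N00\" title=\"\" />"
        + "</photos></rsp>";

    /** Receives pushed events; keeps only the id and title of each photo element. */
    static class PhotoHandler extends DefaultHandler {
        final Map<String, String> photos = new LinkedHashMap<>();

        @Override
        public void startElement(String uri, String localName,
                String qName, Attributes attributes) {
            // The default factory is not namespace-aware, so qName carries the element name.
            if ("photo".equals(qName)) {
                photos.put(attributes.getValue("id"), attributes.getValue("title"));
            }
        }
    }

    static Map<String, String> parse(String xml) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            PhotoHandler handler = new PhotoHandler();
            // The parser drives the loop and calls back into the handler.
            parser.parse(new InputSource(new StringReader(xml)), handler);
            return handler.photos;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse photo list", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parse(SAMPLE));
    }
}
```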

The code is very concise and efficient at run time. It stores only the data you care about and traverses the document only once. But the code is more complex, and it requires you to extend a class in order to register an event listener. It would be great to parse XML this efficiently with a more intuitive programming model. That is where StAX comes in.

The StAX alternative

SAX's complexity comes from the observer design pattern it implements. It is a push model: the parser pushes events to an observer, which then acts on them. The StAX model is similar to SAX in that it streams data and events from the XML document, so it can be just as fast and efficient. The big difference is that StAX uses a pull model, in which the application code pulls events from the parser.

That may sound like a subtle difference, but it allows a simpler programming model. Look at Listing 5 to see how StAX works.

Listing 5. Using StAX for parsing

    Map<String,String> map = new HashMap<String,String>();
    XMLInputFactory inputFactory = XMLInputFactory.newInstance();
    QName qId = new QName("id");
    QName qTitle = new QName("title");
    QName qPhoto = new QName("photo");
    XMLEventReader reader = inputFactory.createXMLEventReader(input);
    while (reader.hasNext()) {
        XMLEvent event = reader.nextEvent();
        if (event.isStartElement()) {
            StartElement element = event.asStartElement();
            if (element.getName().equals(qPhoto)) {
                String id = element.getAttributeByName(qId).getValue();
                String title = element.getAttributeByName(qTitle).getValue();
                map.put(id, title);
            }
        }
    }
    reader.close();

First, you do not need to extend any class, because you do not need to register for events. With StAX, you control the event stream, because the events are pulled from the parser. You can use a familiar iterator-like syntax to search the document for the data you need. You still store only the data you need and make only one pass over the XML document. You get the same efficiency as SAX, but with much more intuitive code.
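
Because the application drives the loop, it can also decide to stop pulling. The sketch below illustrates that with a hypothetical firstPhotos helper that reads at most a given number of photos from a trimmed copy of the Listing 2 response and then stops. StaxPullSketch, SAMPLE, and the limit parameter are illustrative names, not part of the article's download.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.namespace.QName;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.StartElement;
import javax.xml.stream.events.XMLEvent;

class StaxPullSketch {
    // Trimmed copy of the Listing 2 response (hypothetical constant for this sketch).
    static final String SAMPLE =
        "<rsp stat=\"ok\"><photos page=\"1\" pages=\"25\" per_page=\"20\" total=\"500\">"
        + "<photo id=\"469774979\" owner=\"35373726@N00\" title=\"Where will it lead me......?\" />"
        + "<photo id=\"470281793\" owner=\"73955226@N00\" title=\"Island Beauty\" />"
        + "<photo id=\"469808244\" owner=\"43568064@N00\" title=\"\" />"
        + "</photos></rsp>";

    /** Pulls events until `limit` photos have been collected or the document ends. */
    static Map<String, String> firstPhotos(String xml, int limit) {
        QName qId = new QName("id");
        QName qTitle = new QName("title");
        QName qPhoto = new QName("photo");
        Map<String, String> map = new LinkedHashMap<>();
        try {
            XMLEventReader reader =
                XMLInputFactory.newInstance().createXMLEventReader(new StringReader(xml));
            try {
                // The loop condition is ordinary application logic: stop when we have enough.
                while (reader.hasNext() && map.size() < limit) {
                    XMLEvent event = reader.nextEvent();
                    if (event.isStartElement()) {
                        StartElement element = event.asStartElement();
                        if (element.getName().equals(qPhoto)) {
                            map.put(element.getAttributeByName(qId).getValue(),
                                    element.getAttributeByName(qTitle).getValue());
                        }
                    }
                }
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException("could not parse photo list", e);
        }
        return map;
    }

    public static void main(String[] args) {
        System.out.println(firstPhotos(SAMPLE, 2));
    }
}
```

With a push parser, stopping early usually means throwing an exception out of the handler; here it is just a loop condition.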

Using Woodstox as the StAX provider for Geronimo

Now you have seen the advantages of StAX parsing. It is widely regarded as a significant improvement in XML technology, so it is no surprise that it became part of the Java EE 5 specification (it is even included in Java Platform, Standard Edition [Java SE] 6). Because it is part of Java EE 5, Geronimo 2.0 must implement it.

The Geronimo team was fortunate to have several open source StAX implementations to choose from, and selected Woodstox as the StAX parser that ships with Geronimo. Woodstox is considered one of the best-performing StAX implementations (for comparisons of various StAX parsers, see References). In addition, Woodstox is dual-licensed under the Lesser General Public License (LGPL) and the Apache 2.0 license, so Woodstox and its source code can be integrated into Geronimo without restriction.
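
Since the StAX API hides the implementation behind XMLInputFactory, it can be useful to check which provider your code actually gets. The sketch below prints the class name of the factory that XMLInputFactory.newInstance() returns. StaxProviderCheck and providerClassName are illustrative names. The assumption, which you should verify in your own environment, is that a Woodstox jar on the classpath is picked up through the standard factory lookup and yields a class in Woodstox's com.ctc.wstx package, while a plain JDK 6 or later yields the JDK's built-in implementation.

```java
import javax.xml.stream.XMLInputFactory;

class StaxProviderCheck {
    /** Returns the concrete class behind the StAX factory lookup. */
    static String providerClassName() {
        return XMLInputFactory.newInstance().getClass().getName();
    }

    public static void main(String[] args) {
        // The printed name depends on what is on the classpath (assumption: a
        // com.ctc.wstx class when Woodstox is present, a JDK class otherwise).
        System.out.println("StAX input factory: " + providerClassName());
    }
}
```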

Optimizing your application: Getting the most out of Woodstox

Performance is clearly one of the benefits that Woodstox brings to Geronimo. As with any high-performance technology, it is important to know how to use Woodstox to get the best performance. The code in Listing 5 uses the XMLEventReader interface, which is the high-level API in the StAX specification. For higher performance there is a lower-level API: the XMLStreamReader interface. Listing 6 shows StAX parsing that uses this interface.

Listing 6. Using XMLStreamReader for StAX parsing

    Map<String,String> map = new HashMap<String,String>();
    XMLInputFactory inputFactory = XMLInputFactory.newInstance();
    QName qId = new QName("id");
    QName qTitle = new QName("title");
    QName qPhoto = new QName("photo");
    XMLStreamReader reader = inputFactory.createXMLStreamReader(input);
    while (reader.hasNext()) {
        int event = reader.next();
        if (event == START_ELEMENT) { // statically imported constant from XMLStreamConstants
            if (reader.getName().equals(qPhoto)) {
                String id = reader.getAttributeValue(null, qId.getLocalPart());
                String title = reader.getAttributeValue(null, qTitle.getLocalPart());
                map.put(id, title);
            }
        }
    }
    reader.close();

The code in Listing 6 is similar to Listing 5. Although it is clearly a bit lower-level, it can give you a substantial performance improvement.
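
For a version of the cursor API you can run as-is, here is a minimal sketch that parses a trimmed copy of the Listing 2 response from a string. StaxCursorSketch, SAMPLE, and parse are illustrative names, not part of the article's download. Unlike Listing 6, it compares the element's local name directly instead of building QName objects, which is a small simplification and not something the article measures.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class StaxCursorSketch {
    // Trimmed copy of the Listing 2 response (hypothetical constant for this sketch).
    static final String SAMPLE =
        "<rsp stat=\"ok\"><photos page=\"1\" pages=\"25\" per_page=\"20\" total=\"500\">"
        + "<photo id=\"469774979\" owner=\"35373726@N00\" title=\"Where will it lead me......?\" />"
        + "<photo id=\"470281793\" owner=\"73955226@N00\" title=\"Island Beauty\" />"
        + "<photo id=\"469808244\" owner=\"43568064@N00\" title=\"\" />"
        + "</photos></rsp>";

    static Map<String, String> parse(String xml) {
        Map<String, String> map = new LinkedHashMap<>();
        try {
            XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            try {
                while (reader.hasNext()) {
                    // next() returns an int event code rather than an XMLEvent object;
                    // the reader itself exposes the state of the current event.
                    int event = reader.next();
                    if (event == XMLStreamConstants.START_ELEMENT
                            && "photo".equals(reader.getLocalName())) {
                        // A null namespace URI means the namespace is not checked.
                        map.put(reader.getAttributeValue(null, "id"),
                                reader.getAttributeValue(null, "title"));
                    }
                }
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException("could not parse photo list", e);
        }
        return map;
    }

    public static void main(String[] args) {
        System.out.println(parse(SAMPLE));
    }
}
```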

Conclusion

You have seen some of the advantages of using a StAX parser to parse XML documents. StAX offers a good compromise between SAX and DOM. You can use StAX right away as part of Geronimo 2.0. You not only get StAX's intuitive pull API, but also the added benefit of the high-performance StAX implementation in Woodstox.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.