From: http://www.ibm.com/developerworks/cn/opensource/os-ag-renegade15/
The importance of XML
XML was introduced by Tim Bray and Michael Sperberg-McQueen in 1996. Its potential was widely recognized, but it is hard to imagine that anyone foresaw just how pervasive XML would become. Enterprise Java developers use XML for configuration, for data storage, and, most commonly, as a data exchange format. It is the basis of Web services and SOAP, and thus the basis of the modern Service-Oriented Architecture (SOA) design pattern. XML does not stop there, however. It puts the X in Ajax (Asynchronous JavaScript + XML) and has become a key ingredient in the rich experience that modern web applications provide.
XML is not a cure-all, though; it has shortcomings too. XML documents are usually large. They all share a common tree structure, but the extensibility of XML means that their schemas can vary endlessly. These traits make it challenging to parse XML efficiently. There are two traditional approaches to meeting the challenges of XML parsing: DOM and SAX.
XML processing: DOM and SAX
DOM and SAX are the two classic XML parsing strategies. In many respects they are polar opposites. DOM provides a simple object model for an XML document. A DOM parser converts an XML document into an easy-to-use object that represents all the data in the document. This convenience comes at a cost, however: DOM parsing often requires a lot of memory.
Memory is not a problem for SAX. A SAX parser generates a series of parsing events. A handler registers callbacks for these events and then executes some logic on the data they carry. SAX is fast and efficient, but it requires a more complex programming model.
The easiest way to gauge the differences between DOM and SAX, and to understand the motivation for and advantages of StAX, is to look at a concrete example.
Example: parsing Flickr data
It is not hard to find some XML to parse. XML is used everywhere, and most web sites now provide XML-based Web services. Flickr is a popular photo sharing site owned by Yahoo, and it has a powerful and flexible API. Let's look at some simple code for accessing Flickr's "interesting" photos (to obtain all the source code used in this article, see the Download section, and make sure to put Woodstox on the classpath or use JDK 1.6). The code is shown in Listing 1:
Listing 1. Using the Flickr API
String apiKey = "c4579586f41a90372f762cb65c78be5d";
String urlStr = "http://api.flickr.com/services/rest/?"
        + "method=flickr.interestingness.getList&per_page=20&api_key=" + apiKey;
URL request = new URL(urlStr);
InputStream input = request.openStream();
This code uses Flickr's Representational State Transfer (REST) API (for more information about Flickr's API and the REST format, see the References section). Sample output from the call above is shown in Listing 2:
Listing 2. XML from Flickr
<?xml version="1.0" encoding="utf-8" ?>
<rsp stat="ok">
<photos page="1" pages="25" per_page="20" total="500">
  <photo id="469774979" owner="35373726@N00" secret="c8a1be2012" server="183" farm="1"
         title="Where will it lead me......?" ispublic="1" isfriend="0" isfamily="0" />
  <photo id="470281793" owner="73955226@N00" secret="49612a2794" server="212" farm="1"
         title="Island Beauty" ispublic="1" isfriend="0" isfamily="0" />
  <photo id="469808244" owner="43568064@N00" secret="26b71544a3" server="227" farm="1"
         title="" ispublic="1" isfriend="0" isfamily="0" />
</photos>
</rsp>
Note that Listing 2 shows only three photos. The API call actually returns 20 (the per_page parameter). The result is very simple, so let's look at how to parse this XML. In this example, you extract the title and ID of each photo. The ID can be used to build the URL of the photo, so it is easy to imagine a web application (perhaps a mashup) that needs only this information. First, use DOM to extract the data.
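As a rough illustration of how the parsed values might be turned into a photo URL, here is a minimal sketch. The URL pattern is an assumption based on the scheme Flickr documented around the time of this article, and it needs the farm, server, and secret attributes from Listing 2 in addition to the ID. The class and method names are hypothetical; check the current Flickr documentation before relying on the pattern.

```java
final class FlickrPhotoUrl {

    // Assumed pattern: http://farm{farm}.static.flickr.com/{server}/{id}_{secret}.jpg
    // This is not taken from the article; verify it against the Flickr documentation.
    static String build(String farm, String server, String id, String secret) {
        return "http://farm" + farm + ".static.flickr.com/"
                + server + "/" + id + "_" + secret + ".jpg";
    }

    public static void main(String[] args) {
        // Attribute values copied from the first photo element in Listing 2.
        System.out.println(build("1", "183", "469774979", "c8a1be2012"));
    }
}
```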
DOM example
To use DOM, you parse the document into a Document object. This is an in-memory tree representation of the parsed XML document. You then traverse the DOM tree to find the title and ID of each photo, and put this data into a simple map. The code for this is shown in Listing 3:
Listing 3. Parsing with DOM
Map<String,String> map = new HashMap<String,String>();
DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
Document dom = builder.parse(input);
Element root = dom.getDocumentElement();
NodeList childNodes = root.getChildNodes();
Node photosNode = null;
for (int i = 0; i < childNodes.getLength(); i++) {
    Node node = childNodes.item(i);
    if (node.getNodeName().equalsIgnoreCase("photos")) {
        photosNode = node;
        break;
    }
}
childNodes = photosNode.getChildNodes();
for (int i = 0; i < childNodes.getLength(); i++) {
    Node node = childNodes.item(i);
    if (node.getNodeName().equalsIgnoreCase("photo")) {
        String title = node.getAttributes().getNamedItem("title").getTextContent();
        String id = node.getAttributes().getNamedItem("id").getTextContent();
        map.put(id, title);
    }
}
DOM is very popular because it is very easy to use. You only need to pass the input source to the parser, and the parser hands you a Document object. You then traverse the child nodes until you find the photos node. Each photo node is a child of the photos node, so you iterate over the photo nodes, read the title and id attributes of each one, and store them in the map.
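To try the DOM approach without calling Flickr, the following sketch parses a small in-memory document shaped like Listing 2. It is a simplified variant rather than the article's code: it uses getElementsByTagName instead of walking the child nodes by hand, and the class name and sample string are assumptions made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

final class DomPhotoSketch {

    // Trimmed-down stand-in for the Flickr response in Listing 2.
    static final String SAMPLE =
            "<rsp stat=\"ok\"><photos page=\"1\">"
            + "<photo id=\"469774979\" title=\"Where will it lead me......?\"/>"
            + "<photo id=\"470281793\" title=\"Island Beauty\"/>"
            + "</photos></rsp>";

    // Builds the whole tree in memory, then collects id -> title for each photo.
    static Map<String, String> parse(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document dom = builder.parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Map<String, String> map = new LinkedHashMap<>();
            NodeList photos = dom.getElementsByTagName("photo");
            for (int i = 0; i < photos.getLength(); i++) {
                Element photo = (Element) photos.item(i);
                map.put(photo.getAttribute("id"), photo.getAttribute("title"));
            }
            return map;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parse(SAMPLE));
    }
}
```

Note that the entire document is still loaded before any photo is inspected, so this shortcut changes the amount of code, not the memory profile.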
However, DOM also has some obvious inefficiencies. You store a lot of data you may not care about, such as the owner of each photo. You also pass over all the data twice: once to read it into the Document object, and again when you traverse that object. The traditional way to avoid these inefficiencies is to use SAX.
SAX example
A SAX parser does not return a handy Document object. Instead, it fires a series of events as it traverses the XML document. You create a handler for these events by implementing an interface or by extending the DefaultHandler class and overriding its methods as needed. Listing 4 demonstrates SAX parsing of the Flickr XML document.
Listing 4. Parsing with SAX
SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
DefaultHandler handler = new DefaultHandler() {
    @Override
    public void startElement(String uri, String localName, String qName,
            Attributes attributes) throws SAXException {
        if (qName.equalsIgnoreCase("photo")) {
            String title = attributes.getValue("title");
            String id = attributes.getValue("id");
            // map is static so we can access it here
            map.put(id, title);
        }
    }
};
parser.parse(input, handler);
Clearly, the code in Listing 4 is harder to follow than the DOM code in Listing 3. You need a ContentHandler to process the SAX events. To get one, you extend DefaultHandler and override its startElement callback method. In the callback, you check whether the element is a photo element and, if it is, read its title and id attributes.
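The static map in Listing 4 can be avoided by letting the handler own its results. The sketch below is one way to do that, under the assumption that a named handler class is acceptable; PhotoHandler, SaxPhotoSketch, and the sample string are hypothetical names introduced here, not part of the article's download.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

final class SaxPhotoSketch {

    // Trimmed-down stand-in for the Flickr response in Listing 2.
    static final String SAMPLE =
            "<rsp stat=\"ok\"><photos page=\"1\">"
            + "<photo id=\"469774979\" title=\"Where will it lead me......?\"/>"
            + "<photo id=\"470281793\" title=\"Island Beauty\"/>"
            + "</photos></rsp>";

    // Hypothetical handler: the parser pushes events in, the handler keeps the state.
    static final class PhotoHandler extends DefaultHandler {
        final Map<String, String> photos = new LinkedHashMap<>();

        @Override
        public void startElement(String uri, String localName, String qName,
                Attributes attributes) {
            // The default factory is not namespace-aware, so qName is the plain tag name.
            if ("photo".equals(qName)) {
                photos.put(attributes.getValue("id"), attributes.getValue("title"));
            }
        }
    }

    static Map<String, String> parse(String xml) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            PhotoHandler handler = new PhotoHandler();
            parser.parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)),
                    handler);
            return handler.photos;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parse(SAMPLE));
    }
}
```

Even in this form, the parsing logic lives in a callback that the parser invokes, which is the part of the model that many developers find awkward.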
The code is very concise and efficient at run time. It stores only the data you care about and traverses the document only once. On the other hand, it is more complex code, and it requires you to extend a class in order to register an event listener. It would be great to parse XML just as efficiently but with a more intuitive programming model. This is where StAX comes in.
The StAX alternative
The complexity of SAX comes from the observer design pattern it implements. It is a push model: the parser pushes events to an observer, which then acts on them. The StAX model is similar to SAX. It streams data and events from the XML document, so it can be just as fast and efficient as SAX. The biggest difference is that it uses a pull model, which lets the application code pull events from the parser.
This may sound like a subtle difference, but it allows a much simpler programming model. Look at Listing 5 to see how StAX works.
Listing 5. Parsing with StAX
Map<String,String> map = new HashMap<String,String>();
XMLInputFactory inputFactory = XMLInputFactory.newInstance();
QName qId = new QName("id");
QName qTitle = new QName("title");
QName qPhoto = new QName("photo");
XMLEventReader reader = inputFactory.createXMLEventReader(input);
while (reader.hasNext()) {
    XMLEvent event = reader.nextEvent();
    if (event.isStartElement()) {
        StartElement element = event.asStartElement();
        if (element.getName().equals(qPhoto)) {
            String id = element.getAttributeByName(qId).getValue();
            String title = element.getAttributeByName(qTitle).getValue();
            map.put(id, title);
        }
    }
}
reader.close();
First, you do not need to extend any class, because you do not need to register for events. With StAX, you control the event stream, because the events are pulled from the parser. You can use a familiar iterator-like syntax to walk through the document looking for the data you need. You still store only the data you need and still pass over the XML document only once. You get the same efficiency as with SAX, but the code is much more intuitive.
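One practical consequence of controlling the event stream is that the application can simply stop pulling once it has what it needs. The following sketch collects only the first few titles and then leaves the loop. The class name, method name, limit parameter, and sample string are assumptions made for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.namespace.QName;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.Attribute;
import javax.xml.stream.events.StartElement;
import javax.xml.stream.events.XMLEvent;

final class StaxFirstTitles {

    // Trimmed-down stand-in for the Flickr response in Listing 2.
    static final String SAMPLE =
            "<rsp stat=\"ok\"><photos page=\"1\">"
            + "<photo id=\"469774979\" title=\"Where will it lead me......?\"/>"
            + "<photo id=\"470281793\" title=\"Island Beauty\"/>"
            + "<photo id=\"469808244\" title=\"\"/>"
            + "</photos></rsp>";

    // Pulls events only until 'limit' photo titles have been collected.
    static List<String> firstTitles(String xml, int limit) {
        List<String> titles = new ArrayList<>();
        QName qPhoto = new QName("photo");
        QName qTitle = new QName("title");
        try {
            XMLEventReader reader = XMLInputFactory.newInstance()
                    .createXMLEventReader(new StringReader(xml));
            try {
                // The loop condition is ordinary application logic: no callbacks involved.
                while (titles.size() < limit && reader.hasNext()) {
                    XMLEvent event = reader.nextEvent();
                    if (event.isStartElement()) {
                        StartElement element = event.asStartElement();
                        if (element.getName().equals(qPhoto)) {
                            Attribute title = element.getAttributeByName(qTitle);
                            titles.add(title == null ? "" : title.getValue());
                        }
                    }
                }
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException("could not parse XML", e);
        }
        return titles;
    }

    public static void main(String[] args) {
        System.out.println(firstTitles(SAMPLE, 2));
    }
}
```

Doing the same with SAX typically means throwing an exception out of the handler to abort the parse, which is one reason the pull model tends to read more naturally.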
Woodstox as the StAX provider for Geronimo
You have now seen the advantages of StAX parsing. It is widely regarded as a significant improvement in XML technology, so it is no surprise that it became part of the Java EE 5 specification (it is even included in Java Platform, Standard Edition [Java SE] 6). Because it is part of Java EE 5, Geronimo 2.0 must provide an implementation.
The Geronimo team was fortunate to have several open source StAX implementations to choose from. The team selected Woodstox as the StAX parser that ships with Geronimo. Woodstox is considered one of the best-performing StAX implementations (for a comparison of various StAX parsers, see the References section). In addition, Woodstox is dual-licensed under the GNU Lesser General Public License (LGPL) and the Apache 2.0 license, so Geronimo can integrate Woodstox and its source code without licensing problems.
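Because application code talks only to the standard javax.xml.stream API, it is not always obvious which implementation is actually in use. A quick way to check is to print the class of the factory that XMLInputFactory.newInstance() returns, as in the sketch below. The class name StaxProviderCheck is hypothetical, and the result depends entirely on your classpath and runtime.

```java
import javax.xml.stream.XMLInputFactory;

final class StaxProviderCheck {

    // Returns the class name of whichever StAX implementation the runtime resolves.
    static String providerClassName() {
        return XMLInputFactory.newInstance().getClass().getName();
    }

    public static void main(String[] args) {
        // With Woodstox on the classpath this is expected to be a class in the
        // com.ctc.wstx package; a plain JDK reports its built-in implementation.
        System.out.println("StAX provider: " + providerClassName());
    }
}
```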
Application tuning: getting the most out of Woodstox
Performance is clearly one of the advantages that Woodstox brings to Geronimo. As with any high-performance technology, it is important to know how to use Woodstox to get the best results. The code in Listing 5 uses the XMLEventReader interface, which is the high-level API in the StAX specification. For higher performance there is a lower-level API: the XMLStreamReader interface. Listing 6 shows a StAX parser that uses this interface.
Listing 6. StAX parsing with XMLStreamReader
Map<String,String> map = new HashMap<String,String>();
XMLInputFactory inputFactory = XMLInputFactory.newInstance();
QName qId = new QName("id");
QName qTitle = new QName("title");
QName qPhoto = new QName("photo");
XMLStreamReader reader = inputFactory.createXMLStreamReader(input);
while (reader.hasNext()) {
    int event = reader.next();
    // statically included constant from XMLStreamConstants
    if (event == START_ELEMENT) {
        if (reader.getName().equals(qPhoto)) {
            String id = reader.getAttributeValue(null, qId.getLocalPart());
            String title = reader.getAttributeValue(null, qTitle.getLocalPart());
            map.put(id, title);
        }
    }
}
reader.close();
The code in Listing 6 is similar to Listing 5. It is clearly a little lower-level, but in return you can gain a significant performance improvement, because the cursor API does not create an event object for every parsing event.
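Before switching APIs for performance reasons, it can be reassuring to confirm that both produce the same result for your data. The sketch below runs the event-based and cursor-based approaches over the same in-memory document so the two maps can be compared. The class name, method names, and sample string are assumptions made for illustration, and the sketch makes no claim about actual timings.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.namespace.QName;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.stream.events.StartElement;
import javax.xml.stream.events.XMLEvent;

final class StaxApiComparison {

    // Trimmed-down stand-in for the Flickr response in Listing 2.
    static final String SAMPLE =
            "<rsp stat=\"ok\"><photos page=\"1\">"
            + "<photo id=\"469774979\" title=\"Where will it lead me......?\"/>"
            + "<photo id=\"470281793\" title=\"Island Beauty\"/>"
            + "<photo id=\"469808244\" title=\"\"/>"
            + "</photos></rsp>";

    // High-level API: every parsing event is delivered as an XMLEvent object.
    static Map<String, String> viaEvents(String xml) {
        Map<String, String> map = new LinkedHashMap<>();
        QName qPhoto = new QName("photo");
        try {
            XMLEventReader reader = XMLInputFactory.newInstance()
                    .createXMLEventReader(new StringReader(xml));
            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent();
                if (event.isStartElement()) {
                    StartElement element = event.asStartElement();
                    if (element.getName().equals(qPhoto)) {
                        map.put(element.getAttributeByName(new QName("id")).getValue(),
                                element.getAttributeByName(new QName("title")).getValue());
                    }
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("could not parse XML", e);
        }
        return map;
    }

    // Low-level API: the reader is a cursor and events are plain int constants.
    static Map<String, String> viaCursor(String xml) {
        Map<String, String> map = new LinkedHashMap<>();
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && "photo".equals(reader.getLocalName())) {
                    // A null namespace URI means the namespace is not checked.
                    map.put(reader.getAttributeValue(null, "id"),
                            reader.getAttributeValue(null, "title"));
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("could not parse XML", e);
        }
        return map;
    }

    public static void main(String[] args) {
        System.out.println(viaEvents(SAMPLE).equals(viaCursor(SAMPLE)));
    }
}
```

Any real performance comparison should be measured on representative documents with the provider you intend to deploy.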
Conclusion
You have seen some of the advantages of using a StAX parser to parse XML documents. StAX offers a good compromise between SAX and DOM. You can start using StAX right away, because it is part of Geronimo 2.0. You not only get StAX's intuitive pull API, but also the added benefit of a high-performance StAX implementation in Woodstox.