"Collection"--don't reprint the three mainstream technologies and introduction of Java processing XML


Original address: http://www.ibm.com/developerworks/cn/xml/dm-1208gub/
XML (Extensible Markup Language) has become the data-transmission format chosen by most programmers and vendors in the software industry. This article summarizes and introduces several mainstream technologies for processing XML in Java, in the hope of helping developers with different needs choose the XML processing technology that suits them best.


Initially, XML was intended only as a replacement for HTML. As the language evolved and matured, its advantages became increasingly apparent: an extensible set of tags, strict syntax rules, meaningful tag names, and the separation of content from presentation. After XML became a standard it entered a period of rapid development, and those same advantages won it the favor of the major technology vendors. Java, as one of the software industry's development technologies, responded quickly, and a variety of tools supporting XML appeared. This article introduces several mainstream technologies for processing XML in Java from that angle. It covers the following:


    1. What good libraries and tools does Java offer programmers for handling XML?
    2. Once you have DOM, are other libraries still necessary?
    3. Several small examples to help you quickly understand these three parsing approaches


What good libraries and tools does Java offer that make it easy for programmers to handle XML?


    • The famous DOM
    • Eco-friendly SAX
    • The little-known Digester
Introduction to the three XML parsing methods


The famous DOM



Calling DOM famous is no exaggeration. DOM is the W3C standard API for processing XML and the foundation of many other XML-related standards. It is supported not only in Java but also in other languages and platforms such as JavaScript, PHP, and MS .NET, and it has become the most widely used way of processing XML. To provide more powerful features, Java also has many tools that build directly on the DOM approach, such as JDOM and dom4j, which are familiar to many Java programmers. They are essentially functional extensions of DOM and retain many characteristics of the DOM API, so many programmers who started with DOM can pick up the other two without difficulty. Their intuitive, easy-to-use style has made them popular with Java programmers.



Eco-friendly SAX



SAX emerged to meet a particular need. It is called green because it uses the fewest system resources and the fastest parsing approach to process XML. However, its cumbersome way of locating data causes programmers a great deal of trouble and frequent headaches, and SAX itself offers no XPath query facility (in the JDK, javax.xml.xpath evaluates expressions against an in-memory tree such as DOM), so people both love it and hate it.



The little-known Digester: JavaBeans for XML



Digester is an open source project under the Apache Software Foundation; the author came to know it while studying the Struts framework. Many programmers who want to understand the design of a large open source framework, or even to write a powerful framework of their own, run into the same question: what technology does the framework use underneath to parse all those XML-tagged configuration files? DOM parsing is time-consuming, SAX parsing is too cumbersome, and the overhead of parsing the file on every access would be too high. The natural idea is to load the information into JavaBeans whose structure corresponds to the XML, and Digester was created for this purpose. It provides a convenient interface for converting XML into JavaBean objects, so that this kind of requirement has a more complete solution and programmers no longer need to write such tedious parsing code themselves. Sun also released JAXB, a tool for converting between XML and JavaBeans, which interested readers can explore on their own.











Comparison of the three parsing methods





DOM



Advantages and disadvantages: DOM implements the W3C standard and is supported in many programming languages. The approach itself is simple and quick to work with, and very easy for beginners to master. It reads the XML into memory as a tree structure for manipulation and parsing, so the application can modify both the content and the structure of the XML data. However, because the entire XML file must be read into memory at the start of processing, parsing XML files that hold large volumes of data carries the risk of running out of memory and crashing the program.



Scope of application: parsing small XML files; cases where all or most of the XML must be parsed; cases where the XML tree content must be modified to generate your own object model.



SAX



SAX fundamentally solves the resource consumption problem that DOM has when parsing XML documents. It is implemented as a stream-like parser that reads through the entire XML document and delivers the data the programmer needs through event handlers. Because it does not need to read the entire XML document into memory, its savings in system resources are very noticeable, and it plays an important role where large XML documents must be handled and performance requirements are high. Note that SAX itself does not define an XPath API; in the JDK, XPath queries (javax.xml.xpath) are evaluated against an in-memory tree such as DOM. SAX also has shortcomings that trouble developers. First, its API is complex and can be daunting. Second, because it scans the file as a stream, it does not let the application modify the content or structure of the XML tree, which can be inconvenient.



Scope of application: parsing large XML files; cases where only part of the document needs to be parsed or only part of the XML tree content is wanted; cases where you need to build your own specific object model from the data. (XPath queries are not part of SAX itself and need an in-memory tree such as DOM.)
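
Where XPath queries are needed, the JDK's javax.xml.xpath API can evaluate an expression against a parsed DOM tree. The following is only a minimal sketch under that assumption: the class name XPathSketch and the helper evaluate() are made up for illustration, and the sample expression refers to the books.xml layout used in this article.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of the original article.
class XPathSketch {

    // Parses the XML text into a DOM tree and returns the string value
    // of the given XPath expression evaluated against that tree.
    static String evaluate(String xml, String expression) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        return xpath.evaluate(expression, doc);
    }

    public static void main(String[] args) throws Exception {
        String xml = "<books>"
                + "<book id=\"001\"><title>Harry Potter</title></book>"
                + "<book id=\"002\"><title>Learning XML</title></book>"
                + "</books>";
        // Select the title of the book whose id attribute is 002.
        System.out.println(evaluate(xml, "/books/book[@id='002']/title"));
    }
}
```

Because the whole document is loaded into a DOM tree first, this sketch inherits DOM's memory cost and is best suited to small or medium documents.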



Digester/JAXB



Advantages and disadvantages: because these are tool classes built on the two approaches above to meet the special need of converting XML into JavaBeans, they have no particularly notable advantages or disadvantages of their own. Digester, the XML parsing tool used by the well-known open source framework Struts, offers a reliable way to convert XML to JavaBeans.



Scope of application: cases where XML documents need to be converted directly into JavaBeans.











Application examples





An XML fragment for parsing is given below:


Listing 1. XML Fragment

<?xml version="1.0" encoding="UTF-8"?>
 <books>
   <book id="001">
      <title>Harry Potter</title>
      <author>J K. Rowling</author>
   </book>
   <book id="002">
      <title>Learning XML</title>
      <author>Erik T. Ray</author>
   </book>
 </books>
DOM parsing XML

Introduction to the DOM interface in Java: The DOM API in the JDK follows the W3C DOM specification. The org.w3c.dom package provides interfaces such as Document, DocumentType, Node, NodeList, and Element. These interfaces are all necessary to access DOM documents. We can use these interfaces to create, traverse, and modify DOM documents.

The DocumentBuilder and DocumentBuilderFactory classes in the javax.xml.parsers package are used to parse XML documents and generate the corresponding DOM Document objects.

The DOMSource class in the javax.xml.transform.dom package and the StreamResult class in the javax.xml.transform.stream package are used, together with a Transformer, to write updated DOM documents to XML files.
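
As an illustration of the write path, the sketch below modifies a DOM tree and serializes it with a Transformer, using a DOMSource as input and a StreamResult as output. It is a minimal sketch, not the article's own listing: the class name DomWriteSketch, the helper addBook(), and the sample values are hypothetical, and it writes to a string rather than to a file to stay self-contained.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of the original article.
class DomWriteSketch {

    // Parses XML text into a DOM tree, appends one <book> element with a
    // <title> child to the root, and serializes the tree back to text.
    static String addBook(String xml, String id, String title) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        Element book = doc.createElement("book");
        book.setAttribute("id", id);
        Element titleElement = doc.createElement("title");
        titleElement.setTextContent(title);
        book.appendChild(titleElement);
        doc.getDocumentElement().appendChild(book);

        // A Transformer with no stylesheet copies its source to its result.
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // "003" and "Sample Title" are made-up values for illustration.
        System.out.println(addBook("<books></books>", "003", "Sample Title"));
    }
}
```

To write to a file instead, a StreamResult can be constructed from a java.io.File in place of the StringWriter.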

Here is an example of using DOM to parse XML:

Listing 2. DOM parsing XML
 import java.io.File;
 import java.io.IOException;
 import javax.xml.parsers.DocumentBuilder;
 import javax.xml.parsers.DocumentBuilderFactory;
 import javax.xml.parsers.ParserConfigurationException;
 import org.w3c.dom.Document;
 import org.w3c.dom.Element;
 import org.w3c.dom.Node;
 import org.w3c.dom.NodeList;
 import org.xml.sax.SAXException;

 public class DOMParser {
    // Factory that creates DOM parser (DocumentBuilder) instances
    DocumentBuilderFactory builderFactory = DocumentBuilderFactory.newInstance();

    // Load and parse XML file into DOM; returns null if parsing fails
    public Document parse(String filePath) {
       Document document = null;
       try {
          // DOM parser instance
          DocumentBuilder builder = builderFactory.newDocumentBuilder();
          // parse an XML file into a DOM tree
          document = builder.parse(new File(filePath));
       } catch (ParserConfigurationException e) {
          e.printStackTrace();
       } catch (SAXException e) {
          e.printStackTrace();
       } catch (IOException e) {
          e.printStackTrace();
       }
       return document;
    }

    public static void main(String[] args) {
          DOMParser parser = new DOMParser();
          Document document = parser.parse("books.xml");
          if (document == null) {
             // parsing failed; the error has already been printed
             return;
          }
          // get root element
          Element rootElement = document.getDocumentElement();

          // traverse child nodes, skipping anything that is not an element
          NodeList nodes = rootElement.getChildNodes();
          for (int i = 0; i < nodes.getLength(); i++)
          {
             Node node = nodes.item(i);
             if (node.getNodeType() == Node.ELEMENT_NODE) {
                Element child = (Element) node;
                // process child element
             }
          }

          // look up all <book> elements by tag name and read their id attribute
          NodeList nodeList = rootElement.getElementsByTagName("book");
          if (nodeList != null)
          {
             for (int i = 0; i < nodeList.getLength(); i++)
             {
                Element element = (Element) nodeList.item(i);
                String id = element.getAttribute("id");
             }
          }
    }
 }
In the above example, the parse() method of DOMParser is responsible for parsing the XML file and generating the corresponding DOM Document object. The DocumentBuilderFactory is used to create a DOM parser that parses XML documents. After obtaining the Document object corresponding to the XML file, we can call a series of APIs to conveniently access and process the elements in the document object model. Note that calling getChildNodes() on an Element returns all of its child nodes, including whitespace-only text nodes, so the node type must be checked before a child is processed as an Element.
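
The point about whitespace nodes can be checked with a small sketch that counts all child nodes of the root against the child nodes that are elements. This is only an illustration: the class name ChildNodeSketch and the helper countChildren() are hypothetical, and the XML is passed in as a string instead of being read from books.xml.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of the original article.
class ChildNodeSketch {

    // Returns { total child nodes, child nodes that are elements }
    // for the root element of the given XML text.
    static int[] countChildren(String xml) throws Exception {
        Element root = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
        NodeList children = root.getChildNodes();
        int elements = 0;
        for (int i = 0; i < children.getLength(); i++) {
            if (children.item(i).getNodeType() == Node.ELEMENT_NODE) {
                elements++;
            }
        }
        return new int[] { children.getLength(), elements };
    }

    public static void main(String[] args) throws Exception {
        // The line breaks and indentation between the tags become text nodes.
        String xml = "<books>\n  <book id=\"001\"/>\n  <book id=\"002\"/>\n</books>";
        int[] counts = countChildren(xml);
        System.out.println("child nodes: " + counts[0] + ", elements: " + counts[1]);
    }
}
```

With indented input the total is larger than the element count, which is why the listing filters on Node.ELEMENT_NODE before casting.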

As the example shows, parsing XML with DOM is easy to develop. You only need to have the parser build the DOM tree corresponding to the XML, and you can then use the API to access and process nodes, including deleting and modifying them. However, when DOM parses an XML file, the content of the entire file is parsed into a tree structure and held in memory, so DOM is not suitable for parsing large XML files.

SAX parsing XML

Unlike DOM, which builds a tree structure, SAX uses an event model to parse XML documents, which is a faster and more lightweight approach. With SAX, an XML document can be parsed and accessed selectively without loading the entire document as DOM does, so it requires less memory. However, SAX reads the XML document in a single pass and creates no document object, so it is difficult to access several pieces of data in the document at the same time.

Here is an example of SAX parsing XML:

Listing 3. SAX parsing XML
 import java.io.IOException;
 import java.util.ArrayList;
 import java.util.List;
 import org.xml.sax.Attributes;
 import org.xml.sax.SAXException;
 import org.xml.sax.XMLReader;
 import org.xml.sax.helpers.DefaultHandler;
 import org.xml.sax.helpers.XMLReaderFactory;

 public class SAXParser {

    class BookHandler extends DefaultHandler {
       private List<String> nameList;
       // true while the parser is inside a <title> element
       private boolean title = false;

       public List<String> getNameList() {
          return nameList;
       }
       // Called at start of an XML document
       @Override
       public void startDocument() throws SAXException {
          System.out.println("Start parsing document ...");
          nameList = new ArrayList<String>();
       }
       // Called at end of an XML document
       @Override
       public void endDocument() throws SAXException {
          System.out.println("End");
       }

       /**
        * Start processing of an element.
        * @param uri Namespace URI
        * @param localName The local name, without prefix
        * @param qName The qualified name, with prefix
        * @param atts The attributes of the element
        */
       @Override
       public void startElement(String uri, String localName, String qName,
             Attributes atts) throws SAXException {
          // Using qualified name because we are not using xmlns prefixes here.
          if (qName.equals("title")) {
             title = true;
          }
       }

       @Override
       public void endElement(String namespaceURI, String localName, String qName)
          throws SAXException {
          // End of processing current element
          if (title) {
             title = false;
          }
       }

       @Override
       public void characters(char[] ch, int start, int length) {
          // Processing character data inside an element
          if (title) {
             String bookTitle = new String(ch, start, length);
             System.out.println("Book title:" + bookTitle);
             nameList.add(bookTitle);
          }
       }

    }

    public static void main(String[] args) throws SAXException, IOException {
       XMLReader parser = XMLReaderFactory.createXMLReader();
       // BookHandler is an inner class, so it is created from a SAXParser instance
       BookHandler bookHandler = (new SAXParser()).new BookHandler();
       parser.setContentHandler(bookHandler);
       parser.parse("books.xml");
       System.out.println(bookHandler.getNameList());
    }
 }
The SAX parser interface and the event handler interfaces are defined in the org.xml.sax package. The main interfaces include ContentHandler, DTDHandler, EntityResolver, and ErrorHandler. ContentHandler is the main handler interface for basic document parsing events; the DTDHandler and EntityResolver interfaces handle events related to DTD declarations and entity resolution; ErrorHandler is the basic error handling interface. The DefaultHandler class implements all four of these event handling interfaces. In the above example, BookHandler extends DefaultHandler and overrides five callback methods, startDocument(), endDocument(), startElement(), endElement(), and characters(), to add its own event processing logic.
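
One detail of the callback model is worth a sketch: a SAX parser is allowed to deliver the text of a single element through several characters() calls, so a handler that needs the complete text can buffer it and read the buffer in endElement(). The variant below is a minimal sketch along those lines, not the article's listing. The class name TitleCollector and the helper collect() are hypothetical, and it obtains the parser through javax.xml.parsers.SAXParserFactory rather than XMLReaderFactory.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler, not part of the original article.
class TitleCollector extends DefaultHandler {
    private final List<String> titles = new ArrayList<>();
    // Non-null only while the parser is inside a <title> element.
    private StringBuilder text;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if ("title".equals(qName)) {
            text = new StringBuilder();
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // May be called more than once per element, so append instead of assigning.
        if (text != null) {
            text.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("title".equals(qName) && text != null) {
            titles.add(text.toString().trim());
            text = null;
        }
    }

    // Parses the XML text and returns the content of every <title> element.
    static List<String> collect(String xml) throws Exception {
        TitleCollector handler = new TitleCollector();
        SAXParserFactory.newInstance()
                .newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.titles;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<books>"
                + "<book id=\"001\"><title> Harry Potter </title></book>"
                + "<book id=\"002\"><title> Learning XML </title></book>"
                + "</books>";
        System.out.println(collect(xml));
    }
}
```

Trimming in endElement() is a design choice of this sketch; a handler that must preserve surrounding whitespace would store the buffer unchanged.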

Digester parsing XML

To meet the special need of converting XML into JavaBeans, the Apache tool Digester offers one option. Since the XML is ultimately converted into JavaBeans held in memory, parsing performance and similar concerns matter little to users. The key to parsing lies in the patterns and rules used to match the XML. Because the tool is fairly complex and space is limited, the author can give only a brief introduction.

Here is an example fragment of Digester parsing XML:

Listing 4. Digester parsing XML
 // Requires the Apache Commons Digester library. The import below assumes
 // Digester 1.x/2.x (Digester 3 moved the class to org.apache.commons.digester3).
 // Books and Book are the user's own JavaBeans in the test.myBean package.
 import java.io.File;
 import org.apache.commons.digester.Digester;
 import test.myBean.Books;

 // Define the path of the XML to be parsed and initialize the tool class
 File input = new File("books.xml");
 Digester digester = new Digester();

 // When the <books> tag is encountered, create the JavaBean test.myBean.Books and fill in its properties from the tag's attributes
 digester.addObjectCreate("books", "test.myBean.Books");
 digester.addSetProperties("books");
 // When the <books/book> tag is encountered, create the JavaBean test.myBean.Book in the same way
 digester.addObjectCreate("books/book", "test.myBean.Book");
 digester.addSetProperties("books/book");
 // Add each <books/book> to a collection by calling the addBook() method of the Books JavaBean created above
 digester.addSetNext("books/book", "addBook", "test.myBean.Book");

 // After defining the above parsing rules, you can start parsing
 Books books = (Books) digester.parse(input);
The above code shows only some of the main points of how Digester processes XML, chiefly the matching of patterns and rules. In short, Digester converts an XML document into JavaBeans whose structure mirrors that of the XML. You can think of the XML root element as a JavaBean whose fields are the attributes of the root element. When the root element has child tags, each child tag is treated as a new XML document in its own right: it becomes a new JavaBean and is added to the parent bean as a field. The entire XML document is parsed by repeating this process.
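
The listing refers to two JavaBeans, test.myBean.Books and test.myBean.Book, that the article does not show. The sketch below is one hypothetical shape they could take, guessed from the rules used above: addSetProperties maps the id attribute of <book> to a setId() method, and addSetNext calls addBook() on the parent bean. In a real project each would be a public class with a public no-argument constructor, in its own file under the test.myBean package; they are shown together and without a package here only to keep the sketch self-contained. The title property is included on the assumption that a further rule, such as Digester's addBeanPropertySetter for the pattern books/book/title, would populate it, because addSetProperties handles attributes only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical bean for one <book> element; not shown in the original article.
class Book {
    private String id;
    private String title;

    public String getId() {
        return id;
    }

    // Would be called for the id attribute of <book>.
    public void setId(String id) {
        this.id = id;
    }

    public String getTitle() {
        return title;
    }

    // Assumes an extra rule maps the <title> child element to this setter.
    public void setTitle(String title) {
        this.title = title;
    }
}

// Hypothetical bean for the <books> root element.
class Books {
    private final List<Book> books = new ArrayList<>();

    // Would be called once per <book> after the child bean is built.
    public void addBook(Book book) {
        books.add(book);
    }

    public List<Book> getBooks() {
        return books;
    }
}
```

Once parsing has finished, the caller would read the result through getBooks() and never touch the XML again.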
