Techniques for parsing XML in Java

Source: Internet
Author: User
Tags xpath

XML was initially intended only as a substitute for HTML, but as the language evolved and matured, its advantages became increasingly apparent: the markup is extensible, the syntax rules are strict, the tags are meaningful, and content is stored separately from presentation. These strengths destined the language for success from the day it was born. After becoming a standard, XML entered a period of rapid development, and the same strengths won it the favor of the major technology vendors. Java, as one of the software industry's development technologies, responded quickly with a variety of tools for XML support. This article looks at several mainstream techniques for processing XML in Java from that angle, in the hope that it helps you. In this article, you will learn:

    1. What good libraries and tools does Java offer programmers for handling XML?
    2. Once you have DOM, are other tool libraries still necessary?
    3. A few small examples to help you quickly understand these three parsing approaches

What are some of the best libraries and tools in Java that make it easy for programmers to handle XML?

    • The famous DOM
    • Eco-friendly SAX
    • The little-known Digester

Introduction to the three XML parsing methods

The famous DOM

Calling it famous is no exaggeration. DOM is the W3C standard API for processing XML and the basis of many other XML-related standards. It is supported not only in Java but also in other languages and platforms such as JavaScript, PHP, and MS .NET, which has made it the most widely used way to process XML. To provide more powerful features, Java also has many tools modeled on DOM, such as JDOM and dom4j, which are familiar to many Java programmers. They essentially extend the DOM approach and preserve many characteristics of the DOM API, so programmers who already know DOM can pick them up with little difficulty. Their intuitive, easy-to-use style has made them popular with Java programmers.

Eco-friendly SAX

SAX arose to meet a particular need. Why call it eco-friendly? Because SAX processes XML with the smallest use of system resources and the fastest parsing speed. However, its cumbersome way of locating data gives programmers plenty of trouble and frequent headaches, and XPath-style queries are available only through additional libraries because the SAX API itself defines no query facility. As a result, people both love and hate it.

The little-known Digester: JavaBeans for XML

Digester is an open source project under the Apache Software Foundation. The author came to know it while studying the Struts framework. Many programmers who want to design a large open source framework, or even just write a powerful framework of their own, run into the same question: what technology should the framework use underneath to parse all its configuration files written as XML tags? DOM parsing is time-consuming, SAX parsing is too cumbersome, and parsing the files anew each time costs too much. The natural idea is to load the information into JavaBeans that correspond to the XML structure, and that is how Digester came into being. It provides a convenient interface for converting XML into JavaBean objects, gives such requirements a more complete solution, and spares programmers from writing this kind of tedious parsing code themselves. Sun also released JAXB, a tool for converting between XML and JavaBeans, which interested readers can explore on their own.


Comparison of the three parsing methods


Advantages and disadvantages of DOM: it implements the W3C standard, many programming languages support this parsing approach, and the method itself is simple and quick to use, which makes it easy for beginners to master. It works by reading the XML into memory as a tree structure for manipulation, so an application can modify both the content and the structure of the XML data. However, because the entire XML file must be read into memory before processing begins, parsing a large XML file carries the risk of running out of memory and crashing the program.

Scope of application: parsing small XML files; parsing all or most of an XML document; modifying the content of the XML tree to build your own object model.


SAX fundamentally solves the resource consumption problem that DOM has when parsing XML documents. It uses a stream-like parsing technique: the parser reads through the whole document from start to finish and hands the data to the programmer through event handler callbacks. Because it does not need to read the entire XML document into memory, its savings in system resources are very noticeable, and it plays an important role where large XML documents must be handled under high performance requirements. XPath-style querying over SAX is possible only through additional libraries, since the SAX API itself defines no query facility. SAX also has shortcomings that trouble many developers. First, its fairly complex API is daunting. Second, because it scans the file as a stream, it does not let the application modify the content or structure of the XML tree, which can be inconvenient.

Scope of application: parsing large XML files; parsing only part of a document or extracting only part of the XML tree content; XPath-style queries (through additional libraries); building your own specific object model from the XML.
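
Since XPath querying comes up above, the following is a minimal sketch of how an XPath query can be run with the standard JDK API, javax.xml.xpath. Note that this API evaluates the expression against an in-memory tree (a DOM Document here), not against a SAX event stream. The class name XPathOverDomSketch, the method titleOf, and the inline sample XML are assumptions made for illustration only.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class XPathOverDomSketch {

    // Hypothetical inline sample that mirrors the books XML used in this article.
    static final String XML =
            "<books>"
            + "<book id=\"001\"><title>Harry Potter</title></book>"
            + "<book id=\"002\"><title>Learning XML</title></book>"
            + "</books>";

    // Returns the title of the book with the given id, or an empty string if none matches.
    static String titleOf(String id) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(XML)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        // The expression is evaluated against the in-memory DOM tree.
        // The id is concatenated directly for brevity; real code should validate it first.
        return xpath.evaluate("/books/book[@id='" + id + "']/title", doc);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(titleOf("002"));
    }
}
```

Because the whole tree must exist before the query runs, this style of querying inherits the memory cost of DOM described earlier.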


Pros and cons of Digester: because it is a tool class built on top of the parsing techniques above and designed for the specific need of converting XML into JavaBeans, it has no particularly obvious advantages or disadvantages of its own. As the XML parsing tool of the well-known open source framework Struts, Digester gives us a reliable way to convert XML into JavaBeans.

Scope of application: cases where XML documents need to be converted directly into JavaBeans.


Application examples

An XML fragment for parsing is given below:

Listing 1. XML Fragment
<?xml version="1.0" encoding="UTF-8"?>
<books>
   <book id="001">
      <title>Harry Potter</title>
      <author>J. K. Rowling</author>
   </book>
   <book id="002">
      <title>Learning XML</title>
      <author>Erik T. Ray</author>
   </book>
</books>

DOM parsing XML

Introduction to the DOM interface in Java: the DOM API in the JDK follows the W3C DOM specification. The org.w3c.dom package provides interfaces such as Document, DocumentType, Node, NodeList, and Element, which are required to access a DOM document. We can use these interfaces to create, traverse, and modify DOM documents.

The DocumentBuilder and DocumentBuilderFactory classes in the javax.xml.parsers package are used to parse an XML document and generate the corresponding DOM Document object.

The DOMSource class in the javax.xml.transform.dom package and the stream classes in the javax.xml.transform.stream package (StreamSource for input, StreamResult for output) are used to write the updated DOM document to an XML file.

Here is an example of using DOM to parse XML:

Listing 2. DOM parsing XML
import java.io.File;
import java.io.IOException;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

public class DOMParser {
    DocumentBuilderFactory builderFactory = DocumentBuilderFactory.newInstance();

    // Load and parse an XML file into a DOM document
    public Document parse(String filePath) {
        Document document = null;
        try {
            // DOM parser instance
            DocumentBuilder builder = builderFactory.newDocumentBuilder();
            // Parse an XML file into a DOM tree
            document = builder.parse(new File(filePath));
        } catch (ParserConfigurationException e) {
            e.printStackTrace();
        } catch (SAXException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
        return document;
    }

    public static void main(String[] args) {
        DOMParser parser = new DOMParser();
        Document document = parser.parse("books.xml");
        // Get the root element
        Element rootElement = document.getDocumentElement();

        // Traverse child nodes, skipping anything that is not an element
        NodeList nodes = rootElement.getChildNodes();
        for (int i = 0; i < nodes.getLength(); i++) {
            Node node = nodes.item(i);
            if (node.getNodeType() == Node.ELEMENT_NODE) {
                Element child = (Element) node;
                // Process child element
            }
        }

        // Look up all <book> elements by tag name and read their id attribute
        NodeList nodeList = rootElement.getElementsByTagName("book");
        if (nodeList != null) {
            for (int i = 0; i < nodeList.getLength(); i++) {
                Element element = (Element) nodeList.item(i);
                String id = element.getAttribute("id");
            }
        }
    }
}

In the example above, the parse() method of DOMParser is responsible for parsing the XML file and generating the corresponding DOM Document object. DocumentBuilderFactory is used to create the DOM parser (a DocumentBuilder) that parses the XML document. Once we have the Document object that corresponds to the XML file, we can call a series of APIs to access and manipulate the elements in the document object model. Note that the getChildNodes() method of an Element returns all of its child nodes, including whitespace text nodes, so the node type must be checked before a child is processed as an element.
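
The whitespace behavior described above can be seen in a small, self-contained sketch. The class name ChildNodesSketch and the inline sample XML are assumptions for illustration. The sketch counts all child nodes of the root element and then only those that are elements.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ChildNodesSketch {

    // Hypothetical inline sample: two <book> elements separated by line breaks and indentation.
    static final String XML =
            "<books>\n  <book id=\"001\"/>\n  <book id=\"002\"/>\n</books>";

    // Returns {total child nodes, child nodes that are elements} for the root element.
    static int[] countChildren(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        NodeList nodes = doc.getDocumentElement().getChildNodes();
        int elements = 0;
        for (int i = 0; i < nodes.getLength(); i++) {
            // Whitespace between tags shows up as TEXT_NODE children, not ELEMENT_NODE
            if (nodes.item(i).getNodeType() == Node.ELEMENT_NODE) {
                elements++;
            }
        }
        return new int[] { nodes.getLength(), elements };
    }

    public static void main(String[] args) throws Exception {
        int[] counts = countChildren(XML);
        System.out.println("all child nodes: " + counts[0] + ", elements: " + counts[1]);
    }
}
```

With a default DocumentBuilderFactory and no DTD, the runs of whitespace between the tags are kept as text nodes, which is why the total is larger than the number of book elements.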

As you can see, parsing XML with DOM is easy to develop: once the parser has built the DOM tree, the API can be used to access and process nodes, including deleting and modifying them. However, DOM parses the contents of the entire XML file into a tree structure held in memory, so it is not suitable for parsing large XML files.
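
As a hedged illustration of modifying a DOM tree and writing it back out, the sketch below appends a book element and serializes the result with the JDK's Transformer, using DOMSource as the input and StreamResult as the output target. The class name DomWriteSketch and the method addBook are assumptions, and the output goes to a string rather than a file to keep the example self-contained.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class DomWriteSketch {

    // Parses the XML, appends a <book id="..."> element to the root, and serializes the tree.
    static String addBook(String xml, String id) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        // Modify the in-memory tree
        Element book = doc.createElement("book");
        book.setAttribute("id", id);
        doc.getDocumentElement().appendChild(book);

        // An identity transform copies the DOM tree to the output unchanged
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(addBook("<books><book id=\"001\"/></books>", "003"));
    }
}
```

To write to a file instead, a StreamResult can be constructed from a java.io.File in place of the StringWriter.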

SAX parsing XML

Unlike DOM, SAX uses an event model to parse XML documents, which is a faster and lighter way to parse XML than building a tree structure. With SAX, you can selectively parse and access an XML document without loading the whole document as DOM does, so its memory requirements are lower. However, SAX reads the document in a single pass and does not create any document object, which makes it difficult to access several pieces of data in the document at the same time.

Here is an example of SAX parsing XML:

Listing 3. SAX parsing XML
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLReaderFactory;

public class SAXParser {

    class BookHandler extends DefaultHandler {
        private List<String> nameList;
        private boolean title = false;

        public List<String> getNameList() {
            return nameList;
        }

        // Called at start of an XML document
        @Override
        public void startDocument() throws SAXException {
            System.out.println("Start parsing document...");
            nameList = new ArrayList<String>();
        }

        // Called at end of an XML document
        @Override
        public void endDocument() throws SAXException {
            System.out.println("End");
        }

        /**
         * Start processing of an element.
         * @param namespaceURI  Namespace URI
         * @param localName  The local name, without prefix
         * @param qName  The qualified name, with prefix
         * @param atts  The attributes of the element
         */
        @Override
        public void startElement(String namespaceURI, String localName,
                String qName, Attributes atts) throws SAXException {
            // Using qualified name because we are not using xmlns prefixes here.
            if (qName.equals("title")) {
                title = true;
            }
        }

        @Override
        public void endElement(String namespaceURI, String localName, String qName)
                throws SAXException {
            // End of processing current element
            if (title) {
                title = false;
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // Processing character data inside an element
            if (title) {
                String bookTitle = new String(ch, start, length);
                System.out.println("Book title: " + bookTitle);
                nameList.add(bookTitle);
            }
        }
    }

    public static void main(String[] args) throws SAXException, IOException {
        // Note: XMLReaderFactory is deprecated since Java 9 but still available
        XMLReader parser = XMLReaderFactory.createXMLReader();
        BookHandler bookHandler = (new SAXParser()).new BookHandler();
        parser.setContentHandler(bookHandler);
        parser.parse("books.xml");
        System.out.println(bookHandler.getNameList());
    }
}

The SAX parser interface and the event handler interfaces are defined in the org.xml.sax package. The main interfaces are ContentHandler, DTDHandler, EntityResolver, and ErrorHandler. ContentHandler is the primary handler interface and deals with the basic document parsing events; the DTDHandler and EntityResolver interfaces handle events related to DTD declarations and entity resolution; ErrorHandler is the basic error-handling interface. The DefaultHandler class implements all four of these interfaces. In the example above, BookHandler extends DefaultHandler and overrides five of its callback methods, startDocument(), endDocument(), startElement(), endElement(), and characters(), to add its own event-handling logic.
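
As a variation on the handler described above, here is a minimal sketch that obtains the parser through javax.xml.parsers.SAXParserFactory and buffers character data until the closing tag, since a parser is allowed to deliver the text of one element in several characters() calls. The names SaxTitlesSketch, TitleHandler, and titles are assumptions for illustration, and the XML is passed in as a string rather than read from books.xml.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class SaxTitlesSketch {

    // Collects the text content of every <title> element.
    static class TitleHandler extends DefaultHandler {
        final List<String> titles = new ArrayList<>();
        private StringBuilder text; // non-null only while inside a <title> element

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            if ("title".equals(qName)) {
                text = new StringBuilder();
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // May be called more than once per element, so append instead of assigning
            if (text != null) {
                text.append(ch, start, length);
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if ("title".equals(qName)) {
                titles.add(text.toString());
                text = null;
            }
        }
    }

    // Parses the XML string and returns the titles in document order.
    static List<String> titles(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        TitleHandler handler = new TitleHandler();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return handler.titles;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<books><book id=\"001\"><title>Harry Potter</title></book>"
                + "<book id=\"002\"><title>Learning XML</title></book></books>";
        System.out.println(titles(xml));
    }
}
```

The handler keeps only the current title text and the result list, so memory use stays small no matter how large the rest of the document is.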

Digester parsing XML

For the specific need of converting XML into JavaBeans, the Apache tool Digester gives us an option. Since the XML ends up as JavaBeans held in memory, parsing performance matters little to the user. The key to parsing lies in the patterns that match the XML and the rules attached to them. Because the tool is fairly complex and space is limited, only a brief introduction is given here.

Here is an example fragment of Digester parsing XML:

Listing 4. Digester parsing XML
// Define the path of the XML to parse and initialize the tool class
File input = new File("books.xml");
Digester digester = new Digester();

// When the <books> tag is encountered, create the test.myBean.Books JavaBean and fill in its properties
digester.addObjectCreate("books", "test.myBean.Books");
digester.addSetProperties("books");
// When the <books/book> tag is encountered, likewise create the test.myBean.Book JavaBean
digester.addObjectCreate("books/book", "test.myBean.Book");
digester.addSetProperties("books/book");
// Add each <books/book> to a collection by calling the addBook() method of the Books JavaBean created above
digester.addSetNext("books/book", "addBook", "test.myBean.Book");

// Once the parsing rules above are defined, parsing can begin
Books books = (Books) digester.parse(input);

The code above only shows the reader some of the main points of processing XML with Digester, chiefly the patterns and matching rules. In short, Digester turns an XML document into JavaBeans whose structure mirrors that of the XML. You can think of the XML root element as a JavaBean whose fields are the attributes of the root element. When the root element has a child tag, treat that child tag as a new XML document and therefore as a new JavaBean, which is added to the parent bean as a field. Repeating this step for every nested tag parses the entire XML.
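
To make the mapping more concrete, the sketch below shows what the JavaBeans assumed by the rules might look like. The original article does not show test.myBean.Books or test.myBean.Book, so these nested classes are hypothetical stand-ins. The method simulate() performs by hand, without the Digester library, the steps the rules would trigger for each book element: create a bean, set its id property from the attribute, and pass it to addBook() on the parent.

```java
import java.util.ArrayList;
import java.util.List;

public class DigesterBeansSketch {

    // Hypothetical stand-in for test.myBean.Book: one property per XML attribute.
    public static class Book {
        private String id;

        public String getId() { return id; }

        public void setId(String id) { this.id = id; }
    }

    // Hypothetical stand-in for test.myBean.Books: collects the child beans.
    public static class Books {
        private final List<Book> books = new ArrayList<>();

        // The method named in the addSetNext rule; called once per <book> element
        public void addBook(Book book) { books.add(book); }

        public List<Book> getBooks() { return books; }
    }

    // Performs by hand what the rules would do for a <books> element with the given book ids.
    static Books simulate(String... ids) {
        Books root = new Books();   // object-create rule for "books"
        for (String id : ids) {
            Book book = new Book(); // object-create rule for "books/book"
            book.setId(id);         // set-properties rule: attribute id -> setId()
            root.addBook(book);     // set-next rule: hand the child to the parent
        }
        return root;
    }

    public static void main(String[] args) {
        Books books = simulate("001", "002");
        System.out.println(books.getBooks().size() + " books, first id = "
                + books.getBooks().get(0).getId());
    }
}
```

Note that the rules in Listing 4 map only attributes; child elements such as title and author would need additional rules, which are outside the scope of this sketch.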
