The XPath API for the Java language

Source: Internet
Author: User
Tags xpath xpath functions

If you want to tell someone to buy a gallon of milk, what do you say? "Please go buy a gallon of milk," or "Go out the front door, turn left, walk three blocks, turn right, walk half a block, and turn right into the store. Go to aisle four, walk five meters, turn left, pick up a gallon of milk, and pay at the cashier. Then walk home the way you came"? That would be ridiculous. Given little more than "please go buy a gallon of milk," most adults can fetch the milk on their own.

Query languages and computer search are similar. Saying "find a copy of Cryptonomicon" is much easier than writing the detailed logic to search a database. Because the logic of search operations is so similar from one case to the next, you can invent a general-purpose language that accepts commands such as "find all the books by Neal Stephenson," and then write an engine that executes such queries against particular data stores.

XPath

Among the many query languages, Structured Query Language (SQL) is designed and optimized for querying certain kinds of relational databases. Other, less common query languages include Object Query Language (OQL) and XQuery. The subject of this article, however, is XPath, a query language designed for querying XML documents. For example, the following simple XPath query finds the titles of all the books in a document whose author is Neal Stephenson:

//book[author="Neal Stephenson"]/title

As a comparison, the pure DOM search code that queries the same information is shown in Listing 1:

Listing 1. DOM code to find the title elements of all books by Neal Stephenson
        ArrayList result = new ArrayList();
        NodeList books = doc.getElementsByTagName("book");
        for (int i = 0; i < books.getLength(); i++) {
            Element book = (Element) books.item(i);
            NodeList authors = book.getElementsByTagName("author");
            boolean stephenson = false;
            for (int j = 0; j < authors.getLength(); j++) {
                Element author = (Element) authors.item(j);
                NodeList children = author.getChildNodes();
                StringBuffer sb = new StringBuffer();
                for (int k = 0; k < children.getLength(); k++) {
                    Node child = children.item(k);
                    // really should do this recursively
                    if (child.getNodeType() == Node.TEXT_NODE) {
                        sb.append(child.getNodeValue());
                    }
                }
                if (sb.toString().equals("Neal Stephenson")) {
                    stephenson = true;
                    break;
                }
            }
            if (stephenson) {
                NodeList titles = book.getElementsByTagName("title");
                for (int j = 0; j < titles.getLength(); j++) {
                    result.add(titles.item(j));
                }
            }
        }

Believe it or not, the DOM code in Listing 1 is still not as generic or robust as the simple XPath expression. Which would you rather write, debug, and maintain? I think the answer is obvious.

But for all its expressive power, XPath is not the Java language; in fact, XPath is not a complete programming language. There are many things you cannot say in XPath, including some queries. For example, XPath cannot find all the books whose International Standard Book Number (ISBN) check digit does not match, or consult an external accounts database to find all the authors who are owed royalties. Fortunately, you can integrate XPath into a Java program and get the best of both: Java for what Java is good at, and XPath for what XPath is good at.

Until recently, the application programming interface (API) that a Java program used to execute XPath queries varied from one XPath engine to the next. Xalan had one API, Saxon had another, and other engines had others. This meant your code tended to lock you into one product. Ideally, you would like to experiment with engines that have different performance characteristics without undue hassle or rewriting code.

For this reason, Java 5 introduced the javax.xml.xpath package, which provides an XPath library that is independent of both the engine and the object model. The package is also available for Java 1.3 and later if you install Java API for XML Processing (JAXP) 1.3 separately. Xalan 2.7 and Saxon 8, among other products, include implementations of this library.


A simple example

I'll begin by demonstrating how this works in practice, and then go into the details. Suppose you want to query a list of books to find those written by Neal Stephenson. Specifically, the book list takes the form shown in Listing 2:

Listing 2. An XML document that contains book information
<inventory>
    <book year="$">
        <title>Snow Crash</title>
        <author>Neal Stephenson</author>
        <publisher>Spectra</publisher>
        <isbn>0553380958</isbn>
        <price>14.95</price>
    </book>

    <book year="2005">
        <title>Burning Tower</title>
        <author>Larry Niven</author>
        <author>Jerry Pournelle</author>
        <publisher>Pocket</publisher>
        <isbn>0743416910</isbn>
        <price>5.99</price>
    </book>

    <book year="1995">
        <title>Zodiac</title>
        <author>Neal Stephenson</author>
        <publisher>Spectra</publisher>
        <isbn>0553573862</isbn>
        <price>7.50</price>
    </book>

    <!-- more books ... -->
</inventory>

Abstract Factory

XPathFactory is an abstract factory. The Abstract Factory design pattern lets this API support different object models, such as DOM, JDOM, and XOM. To choose a different model, you pass a Uniform Resource Identifier (URI) that identifies the object model to the XPathFactory.newInstance() method. For example, http://xom.nu/ could select XOM. In practice, however, DOM is so far the only object model the API supports.
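As a rough sketch of that factory lookup, the following probes whether the installed JAXP implementation can supply a factory for a given object-model URI. The class and method names are made up for illustration. XPathFactory.DEFAULT_OBJECT_MODEL_URI is the constant that identifies the DOM model, and the sketch assumes no third-party factory for the XOM URI is on the classpath.

```java
import javax.xml.xpath.XPathFactory;
import javax.xml.xpath.XPathFactoryConfigurationException;

// Hypothetical helper class, not part of the article's listings.
class FactorySelectionSketch {

    // Returns true if an XPathFactory is available for the object model
    // identified by the given URI, false if the lookup is rejected.
    static boolean supports(String objectModelUri) {
        try {
            XPathFactory.newInstance(objectModelUri);
            return true;
        } catch (XPathFactoryConfigurationException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The DOM object model, which the no-argument newInstance() also uses.
        System.out.println(supports(XPathFactory.DEFAULT_OBJECT_MODEL_URI));
        // The XOM URI mentioned above; assumed to have no registered factory.
        System.out.println(supports("http://xom.nu/"));
    }
}
```

Asking for an object model by URI keeps the calling code unchanged if a factory for another model is ever installed.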

The XPath query that finds all the books by Neal Stephenson is simple: //book[author="Neal Stephenson"] . To find the titles of those books, add one more step, and the expression becomes //book[author="Neal Stephenson"]/title . Finally, what you really want are the text-node children of the title elements. This requires one more step, so the complete expression is //book[author="Neal Stephenson"]/title/text() .
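To make the effect of each added step concrete, here is a minimal sketch that evaluates the three expressions against a cut-down, made-up inventory held in a string. The class name, the describe helper, and the sample data are assumptions for illustration only.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of the article's listings.
class StepwiseQuerySketch {

    // Cut-down, illustrative stand-in for the inventory document.
    static final String XML =
          "<inventory>"
        + "<book><title>Snow Crash</title><author>Neal Stephenson</author></book>"
        + "<book><title>Burning Tower</title><author>Larry Niven</author></book>"
        + "</inventory>";

    // Evaluates the expression against XML and describes each selected
    // node as "nodeName=nodeValue".
    static List<String> describe(String expression) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        Document doc = dbf.newDocumentBuilder()
                          .parse(new InputSource(new StringReader(XML)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes =
            (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
        List<String> out = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            Node n = nodes.item(i);
            out.add(n.getNodeName() + "=" + n.getNodeValue());
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(describe("//book[author='Neal Stephenson']"));
        System.out.println(describe("//book[author='Neal Stephenson']/title"));
        System.out.println(describe("//book[author='Neal Stephenson']/title/text()"));
    }
}
```

Element nodes report a null node value in DOM, so only the third expression, which selects text nodes, yields the title string through getNodeValue().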

Here is a simple program that executes this query from the Java language and prints the titles of all the books it finds. First, load the document into a DOM Document object. For simplicity, assume the document is in the file books.xml in the current working directory. The following code fragment parses the document and builds the corresponding Document object:

Listing 3. Parsing a document with JAXP
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // never forget this!
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse("books.xml");

So far, this is just the standard JAXP and DOM, nothing new.

Next, create an XPathFactory:

        XPathFactory factory = XPathFactory.newInstance();

Then use this factory to create an XPath object:

        XPath xpath = factory.newXPath();

The XPath object compiles the XPath expression:

        XPathExpression expr = xpath.compile("//book[author='Neal Stephenson']/title/text()");

Direct evaluation

If you use an XPath expression only once, you can skip the compile step and call the evaluate() method directly on the XPath object. However, if the same expression is reused many times, compiling it first is likely to be faster.
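The two styles might look like the following sketch, which keeps its made-up sample documents in strings so that it runs on its own. The class name, method names, and sample data are illustrative assumptions. The two-argument XPath.evaluate() and the one-argument XPathExpression.evaluate() both return the result converted to a String.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of the article's listings.
class DirectEvaluationSketch {

    static final String QUERY = "//book[author='Neal Stephenson']/title";

    // Two tiny illustrative documents.
    static final String DOC_A = "<inventory><book><title>Snow Crash</title>"
        + "<author>Neal Stephenson</author></book></inventory>";
    static final String DOC_B = "<inventory><book><title>Zodiac</title>"
        + "<author>Neal Stephenson</author></book></inventory>";

    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        return dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    // One-off query: no compile step, the expression is passed as a string.
    static String firstTitle(String xml) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        return xpath.evaluate(QUERY, parse(xml));
    }

    // Repeated query: compile once, then evaluate against each document.
    static List<String> firstTitles(List<String> xmlDocs) throws Exception {
        XPathExpression expr = XPathFactory.newInstance().newXPath().compile(QUERY);
        List<String> titles = new ArrayList<>();
        for (String xml : xmlDocs) {
            titles.add(expr.evaluate(parse(xml)));
        }
        return titles;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(firstTitle(DOC_A));
        System.out.println(firstTitles(Arrays.asList(DOC_A, DOC_B)));
    }
}
```

When a node set is converted to a string, XPath uses the string value of the first node in document order, which is why each call yields a single title.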

Finally, evaluate the XPath expression to get the result. The expression is evaluated with respect to a particular context node, which in this case is the entire document. You must also specify the return type. Here we ask for a node set:

        Object result = expr.evaluate(doc, XPathConstants.NODESET);

You can then cast the result to a DOM NodeList and iterate through the list to find all the titles:

        NodeList nodes = (NodeList) result;
        for (int i = 0; i < nodes.getLength(); i++) {
            System.out.println(nodes.item(i).getNodeValue());
        }

Listing 4 combines the fragments above into a single program. Note also that these methods can throw several checked exceptions, which I glossed over in the fragments but which must be declared in a throws clause:

Listing 4. A complete program that queries an XML document with a fixed XPath expression
import java.io.IOException;
import org.w3c.dom.*;
import org.xml.sax.SAXException;
import javax.xml.parsers.*;
import javax.xml.xpath.*;

public class XPathExample {

  public static void main(String[] args)
   throws ParserConfigurationException, SAXException,
          IOException, XPathExpressionException {

    DocumentBuilderFactory domFactory = DocumentBuilderFactory.newInstance();
    domFactory.setNamespaceAware(true); // never forget this!
    DocumentBuilder builder = domFactory.newDocumentBuilder();
    Document doc = builder.parse("books.xml");

    XPathFactory factory = XPathFactory.newInstance();
    XPath xpath = factory.newXPath();
    XPathExpression expr
     = xpath.compile("//book[author='Neal Stephenson']/title/text()");

    Object result = expr.evaluate(doc, XPathConstants.NODESET);
    NodeList nodes = (NodeList) result;
    for (int i = 0; i < nodes.getLength(); i++) {
        System.out.println(nodes.item(i).getNodeValue());
    }
  }
}

XPath Data Model

Whenever you mix two different languages such as XPath and Java, expect some visible seams where the two are glued together. Not everything fits perfectly. XPath and the Java language do not have the same type system. XPath 1.0 has only four basic data types:

    • Node-set
    • Number
    • Boolean
    • String

Of course, the Java language has more data types, including user-defined object types.

Most XPath expressions, especially location paths, return node sets. But there are other possibilities. For example, the XPath expression count(//book) returns the number of books in the document. The XPath expression count(//book[author="Neal Stephenson"]) > 10 returns a Boolean: true if the document contains more than 10 books by Neal Stephenson, false otherwise.

The evaluate() method is declared to return Object. What it actually returns depends on the result of the XPath expression and on the type you request. In general, an XPath

    • number maps to a java.lang.Double
    • string maps to a java.lang.String
    • boolean maps to a java.lang.Boolean
    • node-set maps to an org.w3c.dom.NodeList
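The mapping can be checked with a small sketch like the one below, which evaluates one expression of each XPath type against a made-up two-book document. The class name, the eval helper, and the sample data are assumptions for illustration.

```java
import java.io.StringReader;
import javax.xml.namespace.QName;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of the article's listings.
class ReturnTypeSketch {

    // Cut-down, illustrative stand-in for the inventory document.
    static final String XML =
          "<inventory>"
        + "<book><title>Snow Crash</title><author>Neal Stephenson</author></book>"
        + "<book><title>Zodiac</title><author>Neal Stephenson</author></book>"
        + "</inventory>";

    // Evaluates the expression against XML, requesting the given return type.
    static Object eval(String expression, QName returnType) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        Document doc = dbf.newDocumentBuilder()
                          .parse(new InputSource(new StringReader(XML)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        return xpath.evaluate(expression, doc, returnType);
    }

    public static void main(String[] args) throws Exception {
        Object number = eval("count(//book)", XPathConstants.NUMBER);
        Object flag = eval("count(//book[author='Neal Stephenson']) > 10",
                           XPathConstants.BOOLEAN);
        Object text = eval("string(//book[1]/title)", XPathConstants.STRING);
        Object nodes = eval("//book", XPathConstants.NODESET);

        System.out.println(number instanceof Double);   // XPath number
        System.out.println(flag instanceof Boolean);    // XPath boolean
        System.out.println(text instanceof String);     // XPath string
        System.out.println(nodes instanceof NodeList);  // XPath node-set
    }
}
```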
XPath 2

The preceding discussion assumes that you are using XPath 1.0. XPath 2 greatly expands and revises the type system. The main change the Java XPath API needs in order to support XPath 2 is additional constants for returning the new XPath 2 data types.

When you evaluate an XPath expression in Java, the second argument specifies the return type you expect. There are five possibilities, all named as constants in the javax.xml.xpath.XPathConstants class:

    • XPathConstants.NODESET
    • XPathConstants.BOOLEAN
    • XPathConstants.NUMBER
    • XPathConstants.STRING
    • XPathConstants.NODE

The last one, XPathConstants.NODE, does not actually match an XPath type. Use it only when you know that the XPath expression returns a single node, or when you need only one node. If the XPath expression returns more than one node and you specified XPathConstants.NODE, evaluate() returns the first node in document order. If the XPath expression selects an empty set and you specified XPathConstants.NODE, evaluate() returns null.

If the requested conversion cannot be made, evaluate() throws an XPathException.
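A minimal sketch of these three cases follows, again against a made-up document held in a string. The class and method names are illustrative assumptions. The sketch catches XPathExpressionException, the XPathException subclass that evaluate() is declared to throw, and the failing conversion shown (a number requested as a node set) is what I would expect the JDK's built-in engine to reject; other engines may behave differently.

```java
import java.io.StringReader;
import javax.xml.namespace.QName;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of the article's listings.
class NodeResultSketch {

    // Cut-down, illustrative stand-in for the inventory document.
    static final String XML =
          "<inventory>"
        + "<book><title>Snow Crash</title></book>"
        + "<book><title>Zodiac</title></book>"
        + "</inventory>";

    static Document doc() throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        return dbf.newDocumentBuilder().parse(new InputSource(new StringReader(XML)));
    }

    // Requests a single node; returns its text content, or null if the
    // expression selected nothing.
    static String firstText(String expression) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        Node node = (Node) xpath.evaluate(expression, doc(), XPathConstants.NODE);
        return node == null ? null : node.getTextContent();
    }

    // Returns true if the engine refuses to convert the result of the
    // expression to the requested return type.
    static boolean conversionFails(String expression, QName returnType) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            xpath.evaluate(expression, doc(), returnType);
            return false;
        } catch (XPathExpressionException e) {
            return true;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(firstText("//title"));      // first of two matches
        System.out.println(firstText("//magazine"));   // no match
        System.out.println(conversionFails("count(//book)", XPathConstants.NODESET));
    }
}
```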


Namespace context

If the elements in an XML document are in a namespace, the XPath expression that queries the document must use the same namespace. The XPath expression does not need to use the same prefixes, only the same namespace URIs. Indeed, when the XML document uses the default namespace, the XPath expression must use a prefix even though the target document does not.

However, a Java program is not an XML document, so the usual namespace resolution does not apply. Instead, you must provide an object that maps prefixes to namespace URIs. This object is an instance of the javax.xml.namespace.NamespaceContext interface. For example, suppose the books document is placed in the http://www.example.org/books namespace, as shown in Listing 5:

Listing 5. XML document that uses the default namespace
<inventory xmlns="http://www.example.org/books">
    <book year="$">
        <title>Snow Crash</title>
        <author>Neal Stephenson</author>
        <publisher>Spectra</publisher>
        <isbn>0553380958</isbn>
        <price>14.95</price>
    </book>
    <!-- more books ... -->
</inventory>

The XPath expression that finds the titles of all of Neal Stephenson's books now becomes //pre:book[pre:author="Neal Stephenson"]/pre:title/text() . However, you have to map the prefix pre to the URI http://www.example.org/books. It seems a little silly that neither the Java Development Kit (JDK) nor JAXP includes a default implementation of the NamespaceContext interface, but that is the case. It is not hard to implement yourself, however. Listing 6 shows a simple implementation for this one namespace. You also need to map the xml prefix.

Listing 6. A simple context that binds one namespace plus the default
import java.util.Iterator;
import javax.xml.*;
import javax.xml.namespace.NamespaceContext;

public class PersonalNamespaceContext implements NamespaceContext {

    public String getNamespaceURI(String prefix) {
        // The NamespaceContext contract calls for IllegalArgumentException here.
        if (prefix == null) throw new IllegalArgumentException("Null prefix");
        else if ("pre".equals(prefix)) return "http://www.example.org/books";
        else if ("xml".equals(prefix)) return XMLConstants.XML_NS_URI;
        return XMLConstants.NULL_NS_URI;
    }

    // This method isn't necessary for XPath processing.
    public String getPrefix(String uri) {
        throw new UnsupportedOperationException();
    }

    // This method isn't necessary for XPath processing either.
    public Iterator getPrefixes(String uri) {
        throw new UnsupportedOperationException();
    }
}

It is not hard to make a namespace context reusable by storing the bindings in a map and adding setter methods.
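One possible shape for such a reusable context is sketched below. The class name MapNamespaceContext and the setNamespace method are made up for illustration; only the three NamespaceContext methods come from the API. For a null prefix the sketch throws IllegalArgumentException, which is what the NamespaceContext documentation specifies.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;

// Hypothetical reusable context: bindings live in a map and can be
// added after construction.
class MapNamespaceContext implements NamespaceContext {

    private final Map<String, String> bindings = new HashMap<>();

    MapNamespaceContext() {
        // The xml prefix is always bound to the same URI.
        bindings.put(XMLConstants.XML_NS_PREFIX, XMLConstants.XML_NS_URI);
    }

    // Setter for additional prefix-to-URI bindings.
    void setNamespace(String prefix, String uri) {
        bindings.put(prefix, uri);
    }

    @Override
    public String getNamespaceURI(String prefix) {
        if (prefix == null) throw new IllegalArgumentException("null prefix");
        String uri = bindings.get(prefix);
        return uri != null ? uri : XMLConstants.NULL_NS_URI;
    }

    @Override
    public String getPrefix(String uri) {
        Iterator<String> prefixes = getPrefixes(uri);
        return prefixes.hasNext() ? prefixes.next() : null;
    }

    @Override
    public Iterator<String> getPrefixes(String uri) {
        if (uri == null) throw new IllegalArgumentException("null URI");
        List<String> matches = new ArrayList<>();
        for (Map.Entry<String, String> e : bindings.entrySet()) {
            if (e.getValue().equals(uri)) matches.add(e.getKey());
        }
        return matches.iterator();
    }

    public static void main(String[] args) {
        MapNamespaceContext context = new MapNamespaceContext();
        context.setNamespace("pre", "http://www.example.org/books");
        System.out.println(context.getNamespaceURI("pre"));
        System.out.println(context.getPrefix("http://www.example.org/books"));
    }
}
```

An instance configured this way can be passed to XPath.setNamespaceContext() in place of a hard-coded context.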

After you create a NamespaceContext object, install it on the XPath object before compiling the expression. From then on, you can query with those prefixes just as before. For example:

Listing 7. XPath query that uses namespaces
  XPathFactory factory = XPathFactory.newInstance();
  XPath xpath = factory.newXPath();
  xpath.setNamespaceContext(new PersonalNamespaceContext());
  XPathExpression expr
    = xpath.compile("//pre:book[pre:author='Neal Stephenson']/pre:title/text()");

  Object result = expr.evaluate(doc, XPathConstants.NODESET);
  NodeList nodes = (NodeList) result;
  for (int i = 0; i < nodes.getLength(); i++) {
      System.out.println(nodes.item(i).getNodeValue());
  }
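The point about prefixes and default namespaces can be seen in a self-contained sketch like the following, which counts matches for an unprefixed and a prefixed query against a made-up one-book document in a default namespace. The class name, the count helper, and the inline anonymous context are illustrative assumptions.

```java
import java.io.StringReader;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of the article's listings.
class NamespaceQuerySketch {

    static final String NS = "http://www.example.org/books";

    // Illustrative document whose elements are all in the default namespace.
    static final String XML =
          "<inventory xmlns='" + NS + "'>"
        + "<book><title>Snow Crash</title><author>Neal Stephenson</author></book>"
        + "</inventory>";

    // Returns how many nodes the expression selects in XML.
    static int count(String expression) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true); // without this, namespaces are ignored
        Document doc = dbf.newDocumentBuilder()
                          .parse(new InputSource(new StringReader(XML)));

        XPath xpath = XPathFactory.newInstance().newXPath();
        xpath.setNamespaceContext(new NamespaceContext() {
            public String getNamespaceURI(String prefix) {
                if (prefix == null) throw new IllegalArgumentException("null prefix");
                return "pre".equals(prefix) ? NS : XMLConstants.NULL_NS_URI;
            }
            public String getPrefix(String uri) {
                throw new UnsupportedOperationException();
            }
            public Iterator<String> getPrefixes(String uri) {
                throw new UnsupportedOperationException();
            }
        });

        NodeList nodes =
            (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
        return nodes.getLength();
    }

    public static void main(String[] args) throws Exception {
        // Unprefixed names match only elements in no namespace.
        System.out.println(count("//book"));
        // The prefix pre is mapped to the document's default namespace.
        System.out.println(count("//pre:book[pre:author='Neal Stephenson']/pre:title/text()"));
    }
}
```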


Function resolvers

Sometimes it is useful to define extension functions in the Java language for use in XPath expressions. These functions can perform tasks that are difficult or impossible in pure XPath. However, they must be true functions, not arbitrary methods; that is, they must have no side effects. (XPath functions can be evaluated in any order and any number of times.)

An extension function accessed through the Java XPath API must implement the javax.xml.xpath.XPathFunction interface. This interface declares a single method, evaluate:

        public Object evaluate(List args) throws XPathFunctionException

This method must return one of the five types that the Java language can convert to XPath:

    • String
    • Double
    • Boolean
    • NodeList
    • Node

For example, Listing 8 shows an extension function that verifies the checksum of an ISBN and returns a Boolean. The basic rule for this checksum is that each of the first nine digits is multiplied by its position (the first digit by 1, the second digit by 2, and so on). The products are added together, and the sum is divided by 11; the remainder is the check digit. If the remainder is 10, the last digit is X.
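The rule can be worked through in plain Java before it is wrapped as an XPath function. The sketch below uses made-up class and method names and assumes the first nine characters are ASCII digits. For the ISBN 0553380958 the weighted sum is 0·1 + 5·2 + 5·3 + 3·4 + 3·5 + 8·6 + 0·7 + 9·8 + 5·9 = 217, and 217 mod 11 = 8, which matches the final digit.

```java
// Hypothetical helper class, not part of the article's listings.
class IsbnCheckDigitSketch {

    // Computes the check character for the first nine digits of an ISBN-10.
    // Assumes firstNine holds exactly nine ASCII digits.
    static char checkDigit(String firstNine) {
        int sum = 0;
        for (int i = 0; i < 9; i++) {
            // Each digit is weighted by its 1-based position.
            sum += (i + 1) * (firstNine.charAt(i) - '0');
        }
        int remainder = sum % 11;
        return remainder == 10 ? 'X' : (char) ('0' + remainder);
    }

    // Returns true if the tenth character matches the computed check character.
    static boolean isValid(String isbn) {
        return isbn.length() == 10
            && checkDigit(isbn.substring(0, 9)) == isbn.charAt(9);
    }

    public static void main(String[] args) {
        System.out.println(checkDigit("055338095"));   // 8
        System.out.println(isValid("0553380958"));     // true
        System.out.println(isValid("0553380957"));     // false
        System.out.println(isValid("080442957X"));     // true
    }
}
```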

Listing 8. An XPath extension function that checks ISBNs
import java.util.List;
import javax.xml.xpath.*;
import org.w3c.dom.*;

public class ISBNValidator implements XPathFunction {

  // This class could easily be implemented as a Singleton.

  public Object evaluate(List args) throws XPathFunctionException {

    if (args.size() != 1) {
      throw new XPathFunctionException("Wrong number of arguments to valid-isbn()");
    }

    String isbn;
    Object o = args.get(0);

    // perform conversions
    if (o instanceof String) isbn = (String) args.get(0);
    else if (o instanceof Boolean) isbn = o.toString();
    else if (o instanceof Double) isbn = o.toString();
    else if (o instanceof NodeList) {
        NodeList list = (NodeList) o;
        Node node = list.item(0);
        // An empty node set has no ISBN to validate.
        if (node == null) return Boolean.FALSE;
        // getTextContent is available in Java 5 and DOM 3.
        // In Java 1.4 and DOM 2, you'd need to recursively
        // accumulate the content.
        isbn = node.getTextContent();
    }
    else {
        throw new XPathFunctionException("Could not convert argument type");
    }

    char[] data = isbn.toCharArray();
    if (data.length != 10) return Boolean.FALSE;
    int checksum = 0;
    for (int i = 0; i < 9; i++) {
        checksum += (i+1) * (data[i]-'0');
    }
    int checkdigit = checksum % 11;

    if (checkdigit + '0' == data[9] || (data[9] == 'X' && checkdigit == 10)) {
      return Boolean.TRUE;
    }
    return Boolean.FALSE;
  }
}

The next step is to make this extension function available to the Java program. To do this, install a javax.xml.xpath.XPathFunctionResolver on the XPath object before compiling the expression. A function resolver maps the XPath name and namespace URI of a function to the Java class that implements it. Listing 9 is a simple function resolver that maps the extension function valid-isbn in the namespace http://www.example.org/books to the class in Listing 8. For example, the XPath expression //book[not(pre:valid-isbn(isbn))] finds all the books whose ISBN checksum does not match.

Listing 9. A function context that recognizes the valid-isbn extension function
import javax.xml.namespace.QName;
import javax.xml.xpath.*;

public class ISBNFunctionContext implements XPathFunctionResolver {

  private static final QName name
   = new QName("http://www.example.org/books", "valid-isbn");

  public XPathFunction resolveFunction(QName name, int arity) {
      if (name.equals(ISBNFunctionContext.name) && arity == 1) {
          return new ISBNValidator();
      }
      return null;
  }
}

Because extension functions must be in a namespace, you must use a NamespaceContext when evaluating an expression that contains an extension function, even if the document being queried does not use namespaces at all. Because XPathFunctionResolver, XPathFunction, and NamespaceContext are all interfaces, you can put them all in the same class if that is convenient.
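Putting the pieces together might look like the sketch below, which installs a namespace context and a function resolver and then runs the query against a made-up two-book document with no namespaces. All class, method, and field names are illustrative assumptions, and for brevity the function and the resolver are written as lambdas rather than as the separate classes of Listings 8 and 9. The sketch assumes the default configuration of the JDK's built-in engine, which permits functions supplied through a resolver.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.namespace.QName;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import javax.xml.xpath.XPathFunction;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of the article's listings.
class ExtensionFunctionSketch {

    static final String NS = "http://www.example.org/books";
    static final QName VALID_ISBN = new QName(NS, "valid-isbn");

    // Illustrative data: the second ISBN has a deliberately wrong check digit.
    static final String XML =
          "<inventory>"
        + "<book><title>Snow Crash</title><isbn>0553380958</isbn></book>"
        + "<book><title>Made Up</title><isbn>0553380957</isbn></book>"
        + "</inventory>";

    // Same checksum rule as described in the text.
    static boolean validIsbn(String isbn) {
        if (isbn.length() != 10) return false;
        int sum = 0;
        for (int i = 0; i < 9; i++) sum += (i + 1) * (isbn.charAt(i) - '0');
        int r = sum % 11;
        return isbn.charAt(9) == (r == 10 ? 'X' : (char) ('0' + r));
    }

    // Converts whatever the engine passes as an argument to a string.
    static String text(Object arg) {
        if (arg instanceof NodeList) {
            NodeList list = (NodeList) arg;
            return list.getLength() == 0 ? "" : list.item(0).getTextContent();
        }
        if (arg instanceof Node) return ((Node) arg).getTextContent();
        return String.valueOf(arg);
    }

    // Returns the titles of all books whose ISBN fails the checksum.
    static List<String> invalidTitles() throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        Document doc = dbf.newDocumentBuilder()
                          .parse(new InputSource(new StringReader(XML)));

        XPath xpath = XPathFactory.newInstance().newXPath();
        // The function's prefix needs a binding even though the document
        // itself uses no namespaces.
        xpath.setNamespaceContext(new NamespaceContext() {
            public String getNamespaceURI(String prefix) {
                if (prefix == null) throw new IllegalArgumentException("null prefix");
                return "pre".equals(prefix) ? NS : XMLConstants.NULL_NS_URI;
            }
            public String getPrefix(String uri) {
                throw new UnsupportedOperationException();
            }
            public Iterator<String> getPrefixes(String uri) {
                throw new UnsupportedOperationException();
            }
        });
        XPathFunction validator = args -> Boolean.valueOf(validIsbn(text(args.get(0))));
        xpath.setXPathFunctionResolver((name, arity) ->
            (VALID_ISBN.equals(name) && arity == 1) ? validator : null);

        NodeList nodes = (NodeList) xpath.evaluate(
            "//book[not(pre:valid-isbn(isbn))]/title", doc, XPathConstants.NODESET);
        List<String> titles = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            titles.add(nodes.item(i).getTextContent());
        }
        return titles;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(invalidTitles());
    }
}
```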


Conclusion

It is much easier to write queries in declarative languages such as SQL and XPath than in imperative languages such as Java and C. However, it is much easier to write complex logic in Turing-complete languages such as Java and C than in declarative languages such as SQL and XPath. Fortunately, you can combine the two by using APIs such as Java Database Connectivity (JDBC) and javax.xml.xpath. As more and more of the world's data moves to XML, javax.xml.xpath will become as important as java.sql.
