(IBM) XPath API in Java

Last Update:2018-12-05 Source: Internet

Author: User

Tags xpath functions


XPath expressions are much easier to write than tedious Document Object Model (DOM) navigation code. The quickest and easiest way to extract information from an XML document is to embed an XPath expression in a Java program. Java 5 introduced the javax.xml.xpath package, a library for querying documents with XPath that is independent of the XML object model.

How would you ask someone to buy a gallon of milk? Would you say "Please go buy a gallon of milk," or "Go out the front door, turn left, walk three blocks, turn right, walk half a block, and turn right into the store. Go to aisle four, walk five meters down the aisle, turn left, pick up a one-gallon jug of milk, and pay at the cashier. Then retrace your steps home"? The second version is ridiculous. Given little more than "Please go buy a gallon of milk," most adults can manage to buy the milk on their own.

Query languages and computer search are similar. It is much easier to say "find a copy of Cryptonomicon" than to write out the detailed logic for searching a database. Because the logic of search operations is so similar from one case to the next, you can invent a general language that lets you issue commands such as "find all the books by Neal Stephenson", and then write an engine that executes such queries against a particular data store.

XPath

Among the many query languages, Structured Query Language (SQL) is designed and optimized for querying relational databases. Other, less common query languages include OQL and XQuery. The subject of this article, however, is XPath, a query language designed for querying XML documents. For example, the following simple XPath query finds the titles of all the books in a document whose author is Neal Stephenson:

//book[author="Neal Stephenson"]/title

For comparison, the pure DOM code that searches for the same information is shown in Listing 1:

Listing 1. DOM code to find all the title elements of books by Neal Stephenson

        ArrayList result = new ArrayList();
        NodeList books = doc.getElementsByTagName("book");
        for (int i = 0; i < books.getLength(); i++) {
            Element book = (Element) books.item(i);
            NodeList authors = book.getElementsByTagName("author");
            boolean stephenson = false;
            for (int j = 0; j < authors.getLength(); j++) {
                Element author = (Element) authors.item(j);
                NodeList children = author.getChildNodes();
                StringBuffer sb = new StringBuffer();
                for (int k = 0; k < children.getLength(); k++) {
                    Node child = children.item(k);
                    // really should do this recursively
                    if (child.getNodeType() == Node.TEXT_NODE) {
                        sb.append(child.getNodeValue());
                    }
                }
                if (sb.toString().equals("Neal Stephenson")) {
                    stephenson = true;
                    break;
                }
            }
            if (stephenson) {
                NodeList titles = book.getElementsByTagName("title");
                for (int j = 0; j < titles.getLength(); j++) {
                    result.add(titles.item(j));
                }
            }
        }

Believe it or not, the DOM code in Listing 1 is still not as general or as robust as the simple XPath expression. Which would you rather write, debug, and maintain? I think the answer is obvious.

Expressive as it is, though, XPath is not the Java language. In fact, XPath is not a complete programming language at all. There are many things XPath cannot express, including some queries. For example, XPath cannot find all the books whose International Standard Book Number (ISBN) check digit doesn't match, or find all the authors whom an external accounts database shows as being owed money. Fortunately, you can embed XPath in a Java program and get the best of both: use Java for what Java is good at, and XPath for what XPath is good at.

Until recently, the application programming interface (API) a Java program needed to execute XPath queries differed from one XPath engine to the next. Xalan had one API, Saxon had another, and other engines had others still. This meant that your code tended to lock you into one product. Ideally, you should be able to experiment with engines that have different performance characteristics without undue hassle or rewriting code.

For this reason, Java 5 introduced the javax.xml.xpath package, which provides an XPath library that is independent of both the engine and the object model. The package can also be used in Java 1.3 and later if you install Java API for XML Processing (JAXP) 1.3 separately. Xalan 2.7, Saxon 8, and other products include implementations of this library.


A simple example

I'll begin with a demonstration of how this works in practice, and then look at some of the details. Suppose you want to query a list of books to find those written by Neal Stephenson. Specifically, the book list has the form shown in Listing 2:

Listing 2. XML document containing book information

<inventory>
    <book year="2000">
        <title>Snow Crash</title>
        <author>Neal Stephenson</author>
        <publisher>Spectra</publisher>
        <isbn>0553380958</isbn>
        <price>14.95</price>
    </book>

    <book year="2005">
        <title>Burning Tower</title>
        <author>Larry Niven</author>
        <author>Jerry Pournelle</author>
        <publisher>Pocket</publisher>
        <isbn>0743416910</isbn>
        <price>5.99</price>
    </book>

    <book year="1995">
        <title>Zodiac</title>
        <author>Neal Stephenson</author>
        <publisher>Spectra</publisher>
        <isbn>0553573862</isbn>
        <price>7.50</price>
    </book>

    <!-- more books... -->
</inventory>

Abstract Factory

XPathFactory is an abstract factory. The abstract factory design pattern lets this one API support different object models, such as DOM, JDOM, and XOM. To choose a different model, pass a Uniform Resource Identifier (URI) that identifies the object model to the XPathFactory.newInstance() method. For example, http://xom.nu/ might select XOM. In practice, however, DOM is the only object model this API supports so far.
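As a minimal sketch of how the object-model URI is passed, the following probe asks whether a factory exists for a given model. The class and method names (ObjectModelDemo, isSupported) are hypothetical; XPathFactory.DEFAULT_OBJECT_MODEL_URI is the constant the API defines for DOM. On a stock JDK without extra libraries, only the DOM model would be expected to succeed.

```java
import javax.xml.xpath.XPathFactory;
import javax.xml.xpath.XPathFactoryConfigurationException;

// Hypothetical helper class, not part of the original article.
class ObjectModelDemo {

    // Returns true if an XPathFactory can be created for the given object-model URI.
    static boolean isSupported(String objectModelUri) {
        try {
            XPathFactory.newInstance(objectModelUri);
            return true;
        } catch (XPathFactoryConfigurationException e) {
            // No implementation for this object model is available on the classpath.
            return false;
        }
    }

    public static void main(String[] args) {
        // The DOM object model that the no-argument newInstance() uses.
        System.out.println(isSupported(XPathFactory.DEFAULT_OBJECT_MODEL_URI));
        // XOM's URI; the result depends on whether a matching factory is installed.
        System.out.println(isSupported("http://xom.nu/"));
    }
}
```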

The XPath expression that finds all the books by Neal Stephenson is simple: //book[author="Neal Stephenson"]. To find the titles of those books, add one more step, and the expression becomes //book[author="Neal Stephenson"]/title. Finally, what you really want is the text node children of the title elements. This requires one more step, so the complete expression is //book[author="Neal Stephenson"]/title/text().

Now I'll present a simple program that executes this query from the Java language and prints the titles of all the books it finds. First, you need to load the document into a DOM Document object. For simplicity, assume the document is in the file books.xml in the current working directory. The following code fragment parses the document and builds the corresponding Document object:

Listing 3. Parsing a document with JAXP

        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // never forget this!
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse("books.xml");

So far, this is just standard JAXP and DOM, nothing new.

Next, create an XPathFactory:

XPathFactory factory = XPathFactory.newInstance();

Then use this factory to create an XPath object:

XPath xpath = factory.newXPath();

The XPath object compiles the XPath expression:

XPathExpression expr = xpath.compile("//book[author='Neal Stephenson']/title/text()");

Evaluate directly

If you use an XPath expression only once, you can skip the compilation step and call the evaluate() method on the XPath object directly. However, if you reuse the same expression many times, compiling it is likely to be faster.
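A minimal sketch of the uncompiled form follows. It assumes a small inline inventory instead of books.xml, and the class and method names (DirectEvaluate, firstStephensonTitle) are hypothetical. It uses the two-argument XPath.evaluate(String, InputSource) overload, which returns the result converted to a string.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Hypothetical example class; the inline XML stands in for books.xml.
class DirectEvaluate {

    static final String XML =
        "<inventory>"
      + "<book><title>Snow Crash</title><author>Neal Stephenson</author></book>"
      + "<book><title>Burning Tower</title><author>Larry Niven</author></book>"
      + "</inventory>";

    // Evaluates the expression once, with no separate compile() step.
    // The string value of a node-set is the string value of its first node.
    static String firstStephensonTitle() throws XPathExpressionException {
        XPath xpath = XPathFactory.newInstance().newXPath();
        return xpath.evaluate("//book[author='Neal Stephenson']/title",
                              new InputSource(new StringReader(XML)));
    }

    public static void main(String[] args) throws XPathExpressionException {
        System.out.println(firstStephensonTitle());
    }
}
```

This form is convenient for one-off queries; for an expression evaluated in a loop, the compiled XPathExpression shown in the main text avoids re-parsing the expression each time.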

Finally, evaluate the XPath expression to get the result. An expression is evaluated relative to a particular context node, which in this example is the entire document. You must also specify the return type. Here, a node-set is requested:

Object result = expr.evaluate(doc, XPathConstants.NODESET);

You can then cast the result to a DOM NodeList and iterate through the list to get all the titles:

        NodeList nodes = (NodeList) result;
        for (int i = 0; i < nodes.getLength(); i++) {
            System.out.println(nodes.item(i).getNodeValue());
        }

Listing 4 combines the fragments above into a single program. Note that these methods can throw several checked exceptions, which must be declared in a throws clause, although I glossed over them above:

Listing 4. A complete program that queries an XML document with a fixed XPath expression

import java.io.IOException;
import org.w3c.dom.*;
import org.xml.sax.SAXException;
import javax.xml.parsers.*;
import javax.xml.xpath.*;

public class XPathExample {

  public static void main(String[] args)
    throws ParserConfigurationException, SAXException,
           IOException, XPathExpressionException {

    DocumentBuilderFactory domFactory = DocumentBuilderFactory.newInstance();
    domFactory.setNamespaceAware(true); // never forget this!
    DocumentBuilder builder = domFactory.newDocumentBuilder();
    Document doc = builder.parse("books.xml");

    XPathFactory factory = XPathFactory.newInstance();
    XPath xpath = factory.newXPath();
    XPathExpression expr
      = xpath.compile("//book[author='Neal Stephenson']/title/text()");

    Object result = expr.evaluate(doc, XPathConstants.NODESET);
    NodeList nodes = (NodeList) result;
    for (int i = 0; i < nodes.getLength(); i++) {
        System.out.println(nodes.item(i).getNodeValue());
    }
  }
}

XPath Data Model

Whenever you mix two different languages such as XPath and Java, expect some visible seams where the two are joined. Not everything fits perfectly. XPath and Java do not have the same type system. XPath 1.0 has only four basic data types:

Node-Set
Number
Boolean
String

Of course, the Java language has more data types, including user-defined object types.

Most XPath expressions, especially location paths, return node-sets. But there are other possibilities. For example, the XPath expression count(//book) returns the number of books in the document. The XPath expression count(//book[author="Neal Stephenson"]) > 10 returns a Boolean: true if the document contains more than 10 books by Neal Stephenson, false otherwise.

The evaluate() method is declared to return Object. What it actually returns depends on the result of the XPath expression and on the type you request. In general:

An XPath number maps to java.lang.Double
An XPath string maps to java.lang.String
An XPath Boolean maps to java.lang.Boolean
An XPath node-set maps to org.w3c.dom.NodeList

XPath 2

So far I have assumed that you are using XPath 1.0. XPath 2 significantly expands and revises the type system. The main change the Java XPath API needs in order to support XPath 2 is additional constants for returning the new XPath 2 data types.

When you evaluate an XPath expression in Java, the second argument specifies the return type you expect. There are five possibilities, all named constants in the javax.xml.xpath.XPathConstants class:

XPathConstants.NODESET
XPathConstants.BOOLEAN
XPathConstants.NUMBER
XPathConstants.STRING
XPathConstants.NODE
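The mapping between the requested constant and the Java type that comes back can be sketched as follows. The inline three-book inventory and the names ReturnTypes and eval are assumptions made for illustration; the evaluate(String, InputSource, QName) overload is part of the XPath interface.

```java
import java.io.StringReader;
import javax.xml.namespace.QName;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical example class; the inline XML stands in for books.xml.
class ReturnTypes {

    static final String XML =
        "<inventory>"
      + "<book><title>Snow Crash</title><author>Neal Stephenson</author></book>"
      + "<book><title>Burning Tower</title><author>Larry Niven</author></book>"
      + "<book><title>Zodiac</title><author>Neal Stephenson</author></book>"
      + "</inventory>";

    // Evaluates an expression against the inline document, asking for the given type.
    static Object eval(String expression, QName returnType)
            throws XPathExpressionException {
        XPath xpath = XPathFactory.newInstance().newXPath();
        return xpath.evaluate(expression,
                              new InputSource(new StringReader(XML)),
                              returnType);
    }

    public static void main(String[] args) throws XPathExpressionException {
        Object number = eval("count(//book)", XPathConstants.NUMBER);
        Object bool = eval("count(//book[author='Neal Stephenson']) > 10",
                           XPathConstants.BOOLEAN);
        Object string = eval("string(//book[1]/title)", XPathConstants.STRING);
        Object nodes = eval("//title", XPathConstants.NODESET);

        // Print the runtime type of each result next to its value.
        System.out.println(number.getClass().getName() + ": " + number);
        System.out.println(bool.getClass().getName() + ": " + bool);
        System.out.println(string.getClass().getName() + ": " + string);
        System.out.println("NodeList? " + (nodes instanceof NodeList)
                           + ", length " + ((NodeList) nodes).getLength());
    }
}
```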

The last one, XPathConstants.NODE, doesn't actually match an XPath type. Use it only when you know the XPath expression returns a single node, or when you want no more than one node. If the XPath expression returns more than one node and you specify XPathConstants.NODE, then evaluate() returns the first node in document order. If the XPath expression selects an empty set and you specify XPathConstants.NODE, then evaluate() returns null.

If the requested conversion can't be made, evaluate() throws an XPathException (specifically, its subclass XPathExpressionException).
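The behavior of XPathConstants.NODE and of a failed conversion can be sketched as follows. The inline document and the names NodeAndErrors, eval, and conversionFails are hypothetical. The failing case asks for a node-set from an expression that yields a number, which is expected to be rejected by the JDK's built-in engine; other engines may report the failure differently.

```java
import java.io.StringReader;
import javax.xml.namespace.QName;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical example class; the inline XML stands in for books.xml.
class NodeAndErrors {

    static final String XML =
        "<inventory>"
      + "<book><title>Snow Crash</title></book>"
      + "<book><title>Zodiac</title></book>"
      + "</inventory>";

    static Object eval(String expression, QName returnType)
            throws XPathExpressionException {
        XPath xpath = XPathFactory.newInstance().newXPath();
        return xpath.evaluate(expression,
                              new InputSource(new StringReader(XML)),
                              returnType);
    }

    // Returns true if evaluating with the requested type is rejected.
    static boolean conversionFails(String expression, QName returnType) {
        try {
            eval(expression, returnType);
            return false;
        } catch (XPathExpressionException e) {
            return true;
        }
    }

    public static void main(String[] args) throws XPathExpressionException {
        // Two titles match, but NODE yields only the first in document order.
        Node first = (Node) eval("//title", XPathConstants.NODE);
        System.out.println(first.getTextContent());

        // Nothing matches, so NODE yields null.
        System.out.println(eval("//magazine", XPathConstants.NODE));

        // A number cannot be converted to a node-set.
        System.out.println(conversionFails("count(//book)", XPathConstants.NODESET));
    }
}
```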


Namespace Context

If the elements in the XML document are in a namespace, then the XPath expression that queries the document must use the same namespace. The XPath expression does not need to use the same prefixes, only the same namespace URIs. In fact, when the XML document uses the default namespace, the XPath expression must use a prefix even though the target document does not.
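To see why the prefix matters, here is a small sketch that counts book elements with the unprefixed expression //book, first in a document without a namespace and then in one that declares a default namespace. The class name, method name, and inline documents are hypothetical. With a namespace-aware parser, the unprefixed name matches only elements that are in no namespace.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical example class, not part of the original article.
class UnprefixedQuery {

    // Counts the elements selected by the unprefixed expression //book.
    static double countBooks(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true); // required for namespace-correct matching
        Document doc = dbf.newDocumentBuilder()
                          .parse(new InputSource(new StringReader(xml)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        return (Double) xpath.evaluate("count(//book)", doc, XPathConstants.NUMBER);
    }

    public static void main(String[] args) throws Exception {
        // No namespace: the book element is found.
        System.out.println(countBooks("<inventory><book/></inventory>"));
        // Default namespace: the same expression selects nothing.
        System.out.println(countBooks(
            "<inventory xmlns='http://www.example.org/books'><book/></inventory>"));
    }
}
```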

However, a Java program is not an XML document, so normal namespace resolution does not apply. Instead, you must provide an object that maps prefixes to namespace URIs. This object is an instance of the javax.xml.namespace.NamespaceContext interface. For example, suppose the books document is placed in the http://www.example.org/books namespace, as shown in Listing 5:

Listing 5. XML document using the default namespace

<inventory xmlns="http://www.example.org/books">
    <book year="2000">
        <title>Snow Crash</title>
        <author>Neal Stephenson</author>
        <publisher>Spectra</publisher>
        <isbn>0553380958</isbn>
        <price>14.95</price>
    </book>
    <!-- more books... -->
</inventory>

The XPath expression that finds the titles of all of Neal Stephenson's books now becomes //pre:book[pre:author="Neal Stephenson"]/pre:title/text(). However, the prefix pre must be mapped to the URI http://www.example.org/books. The NamespaceContext interface has no default implementation in the Java software development kit (JDK) or JAXP. That seems a little silly, but it's true. However, it isn't hard to implement yourself. Listing 6 shows a simple implementation for a single namespace. You also need to map the xml prefix.

Listing 6. A simple context for binding a single namespace plus the default

import java.util.Iterator;
import javax.xml.*;
import javax.xml.namespace.NamespaceContext;

public class PersonalNamespaceContext implements NamespaceContext {

    public String getNamespaceURI(String prefix) {
        if (prefix == null) throw new NullPointerException("Null prefix");
        else if ("pre".equals(prefix)) return "http://www.example.org/books";
        else if ("xml".equals(prefix)) return XMLConstants.XML_NS_URI;
        return XMLConstants.NULL_NS_URI;
    }

    // This method isn't necessary for XPath processing.
    public String getPrefix(String uri) {
        throw new UnsupportedOperationException();
    }

    // This method isn't necessary for XPath processing either.
    public Iterator getPrefixes(String uri) {
        throw new UnsupportedOperationException();
    }

}

It is not hard to make the namespace context reusable by storing the bindings in a map and adding setter methods.
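One way such a reusable context might look is sketched below. The class names MapNamespaceContext and NamespaceQueryDemo and the bind method are hypothetical, and the inline document mirrors Listing 5. Unlike Listing 6, this sketch throws IllegalArgumentException for a null prefix, which is what the NamespaceContext documentation specifies.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical reusable context: prefix-to-URI bindings are kept in a map.
class MapNamespaceContext implements NamespaceContext {

    private final Map<String, String> bindings = new HashMap<>();

    MapNamespaceContext() {
        // The xml prefix is always bound to the XML namespace.
        bindings.put(XMLConstants.XML_NS_PREFIX, XMLConstants.XML_NS_URI);
    }

    // Setter: binds a prefix to a namespace URI and returns this for chaining.
    MapNamespaceContext bind(String prefix, String uri) {
        bindings.put(prefix, uri);
        return this;
    }

    @Override
    public String getNamespaceURI(String prefix) {
        if (prefix == null) throw new IllegalArgumentException("Null prefix");
        String uri = bindings.get(prefix);
        return uri != null ? uri : XMLConstants.NULL_NS_URI;
    }

    @Override
    public String getPrefix(String uri) {
        Iterator<String> it = getPrefixes(uri);
        return it.hasNext() ? it.next() : null;
    }

    @Override
    public Iterator<String> getPrefixes(String uri) {
        List<String> prefixes = new ArrayList<>();
        for (Map.Entry<String, String> e : bindings.entrySet()) {
            if (e.getValue().equals(uri)) prefixes.add(e.getKey());
        }
        return prefixes.iterator();
    }
}

class NamespaceQueryDemo {

    static final String NS = "http://www.example.org/books";

    static final String XML =
        "<inventory xmlns='" + NS + "'>"
      + "<book><title>Snow Crash</title><author>Neal Stephenson</author></book>"
      + "</inventory>";

    // Queries the default-namespace document using the prefix "pre".
    static String firstStephensonTitle() throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        Document doc = dbf.newDocumentBuilder()
                          .parse(new InputSource(new StringReader(XML)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        xpath.setNamespaceContext(new MapNamespaceContext().bind("pre", NS));
        return xpath.evaluate("//pre:book[pre:author='Neal Stephenson']/pre:title", doc);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(firstStephensonTitle());
    }
}
```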

After you create a NamespaceContext object, install it on the XPath object before compiling the expression. From then on, you can use those prefixes in queries just as before. For example:

Listing 7. An XPath query that uses namespaces

  XPathFactory factory = XPathFactory.newInstance();
  XPath xpath = factory.newXPath();
  xpath.setNamespaceContext(new PersonalNamespaceContext());
  XPathExpression expr
    = xpath.compile("//pre:book[pre:author='Neal Stephenson']/pre:title/text()");
  Object result = expr.evaluate(doc, XPathConstants.NODESET);
  NodeList nodes = (NodeList) result;
  for (int i = 0; i < nodes.getLength(); i++) {
      System.out.println(nodes.item(i).getNodeValue());
  }


Function resolvers

Sometimes it is useful to define extension functions in Java for use in XPath expressions. These functions can perform tasks that are difficult or impossible in pure XPath. However, they must be true functions, not arbitrary methods. That is, they must have no side effects. (XPath functions can be evaluated in any order and any number of times.)

Extension functions accessed through the Java XPath API must implement the javax.xml.xpath.XPathFunction interface. This interface declares a single method, evaluate:

public Object evaluate(List args) throws XPathFunctionException

This method must return one of the five types that Java can convert to XPath:

String
Double
Boolean
NodeList
Node

For example, Listing 8 shows an extension function that verifies the ISBN checksum and returns a Boolean. The basic rule for this checksum is that each of the first nine digits is multiplied by its position (that is, the first digit is multiplied by 1, the second by 2, and so on). These products are added together, and the remainder after dividing the sum by 11 is the check digit, which must match the tenth character. If the remainder is 10, the last character is X.

Listing 8. An XPath extension function for checking ISBNs

import java.util.List;
import javax.xml.xpath.*;
import org.w3c.dom.*;

public class ISBNValidator implements XPathFunction {

  // This class could easily be implemented as a Singleton.

  public Object evaluate(List args) throws XPathFunctionException {

    if (args.size() != 1) {
      throw new XPathFunctionException("Wrong number of arguments to valid-isbn()");
    }

    String isbn;
    Object o = args.get(0);

    // perform conversions
    if (o instanceof String) isbn = (String) args.get(0);
    else if (o instanceof Boolean) isbn = o.toString();
    else if (o instanceof Double) isbn = o.toString();
    else if (o instanceof NodeList) {
        NodeList list = (NodeList) o;
        Node node = list.item(0);
        // getTextContent is available in Java 5 and DOM 3.
        // In Java 1.4 and DOM 2, you'd need to recursively
        // accumulate the content.
        isbn = node.getTextContent();
    }
    else {
        throw new XPathFunctionException("Could not convert argument type");
    }

    char[] data = isbn.toCharArray();
    if (data.length != 10) return Boolean.FALSE;
    int checksum = 0;
    for (int i = 0; i < 9; i++) {
        checksum += (i+1) * (data[i]-'0');
    }
    int checkdigit = checksum % 11;

    if (checkdigit + '0' == data[9] || (data[9] == 'X' && checkdigit == 10)) {
        return Boolean.TRUE;
    }
    return Boolean.FALSE;

  }

}
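To make the checksum rule concrete, here is a small worked sketch that computes the check digit from the first nine digits alone. The class and method names are hypothetical. For Snow Crash (0553380958), the weighted sum is 1*0 + 2*5 + 3*5 + 4*3 + 5*3 + 6*8 + 7*0 + 8*9 + 9*5 = 217, and 217 mod 11 = 8, which matches the final digit.

```java
// Hypothetical helper class illustrating the position-weighted ISBN-10 checksum.
class IsbnCheckDigit {

    // Computes the check character for the first nine digits of an ISBN-10.
    static char checkDigit(String firstNine) {
        int sum = 0;
        for (int i = 0; i < 9; i++) {
            // Each digit is weighted by its 1-based position.
            sum += (i + 1) * (firstNine.charAt(i) - '0');
        }
        int remainder = sum % 11;
        // A remainder of 10 is written as the letter X.
        return remainder == 10 ? 'X' : (char) ('0' + remainder);
    }

    public static void main(String[] args) {
        System.out.println(checkDigit("055338095")); // prints 8
    }
}
```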

The next step is to make the extension function available to the Java program. To do this, install a javax.xml.xpath.XPathFunctionResolver on the XPath object. The function resolver maps the function's XPath name and namespace URI to the Java class that implements the function. Listing 9 is a simple function resolver that maps the extension function valid-isbn in the namespace http://www.example.org/books to the class in Listing 8. For example, the XPath expression //book[not(pre:valid-isbn(isbn))] finds all the books whose ISBN checksum doesn't match.

Listing 9. A function context that recognizes the valid-isbn extension function

import javax.xml.namespace.QName;
import javax.xml.xpath.*;

public class ISBNFunctionContext implements XPathFunctionResolver {

  private static final QName name
    = new QName("http://www.example.org/books", "valid-isbn");

  public XPathFunction resolveFunction(QName name, int arity) {
      if (name.equals(ISBNFunctionContext.name) && arity == 1) {
          return new ISBNValidator();
      }
      return null;
  }

}

Because extension functions must be in a namespace, you must use a NamespaceContext when evaluating expressions that call them, even if the document being queried doesn't use namespaces at all. Because XPathFunctionResolver, XPathFunction, and NamespaceContext are all interfaces, you can implement them all in the same class if that's convenient.
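Putting the pieces together might look like the sketch below. It is not the article's Listing 8 or 9: the class ExtensionFunctionDemo, its helper methods, the second book, and its ISBN are hypothetical, and the validator is a simplified stand-in written with lambdas (Java 8 or later). It assumes the XPath engine permits extension functions, which some security configurations disable.

```java
import java.io.StringReader;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.namespace.QName;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import javax.xml.xpath.XPathFunction;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical end-to-end example; the document itself uses no namespace.
class ExtensionFunctionDemo {

    static final String NS = "http://www.example.org/books";

    static final String XML =
        "<inventory>"
      + "<book><title>Snow Crash</title><isbn>0553380958</isbn></book>"
      + "<book><title>Misprinted</title><isbn>0553380959</isbn></book>"
      + "</inventory>";

    // Simplified ISBN-10 check using the position-weighted rule.
    static boolean validIsbn(String isbn) {
        if (isbn == null || isbn.length() != 10) return false;
        int sum = 0;
        for (int i = 0; i < 9; i++) {
            char c = isbn.charAt(i);
            if (c < '0' || c > '9') return false;
            sum += (i + 1) * (c - '0');
        }
        int remainder = sum % 11;
        char last = isbn.charAt(9);
        return remainder == 10 ? last == 'X' : last == (char) ('0' + remainder);
    }

    // Converts a function argument to text: first node of a node-set, else its string form.
    static String textOf(Object arg) {
        if (arg instanceof NodeList) {
            NodeList list = (NodeList) arg;
            return list.getLength() == 0 ? "" : list.item(0).getTextContent();
        }
        return String.valueOf(arg);
    }

    static String firstInvalidTitle() throws XPathExpressionException {
        XPath xpath = XPathFactory.newInstance().newXPath();

        // The function's prefix must resolve even though the document has no namespace.
        xpath.setNamespaceContext(new NamespaceContext() {
            public String getNamespaceURI(String prefix) {
                return "pre".equals(prefix) ? NS : XMLConstants.NULL_NS_URI;
            }
            public String getPrefix(String uri) {
                throw new UnsupportedOperationException();
            }
            public Iterator<String> getPrefixes(String uri) {
                throw new UnsupportedOperationException();
            }
        });

        // Map {NS}valid-isbn with one argument to the Java implementation.
        QName functionName = new QName(NS, "valid-isbn");
        XPathFunction validator = fnArgs -> validIsbn(textOf(fnArgs.get(0)));
        xpath.setXPathFunctionResolver((name, arity) ->
            functionName.equals(name) && arity == 1 ? validator : null);

        return xpath.evaluate("//book[not(pre:valid-isbn(isbn))]/title",
                              new InputSource(new StringReader(XML)));
    }

    public static void main(String[] args) throws XPathExpressionException {
        System.out.println(firstInvalidTitle());
    }
}
```

Both the namespace context and the function resolver are installed before the expression is evaluated, so the prefix and the function name can be resolved when the engine encounters them.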
