XPath expressions are much easier to write than tedious Document Object Model (DOM) navigation code. When you need to extract information from an XML document, the quickest and simplest way is to embed an XPath expression in your Java program. Java 5 introduces the javax.xml.xpath package, a library for querying documents with XPath that is independent of the XML object model.
What do you do when you want someone to buy a gallon of milk? Do you say "Please go buy a gallon of milk," or do you say "Go out the front door. Turn left at the sidewalk. Walk three blocks, then turn right. Walk half a block, then turn right into the store. Go to aisle 4. Walk five meters down the aisle and turn left. Pick up a one-gallon jug of milk and pay for it at the cashier. Then go home the way you came"? That would be ridiculous. Given little more than "Please go buy a gallon of milk," most adults can manage to buy the milk on their own.
Query languages and computer searches work the same way. It is much easier to say "find a copy of Cryptonomicon" than to write the detailed logic for searching a database. Because search operations have very similar logic, you can define a general language that accepts commands such as "find all the books by Neal Stephenson" and then write an engine that executes such queries against a particular data store.
XPath
Among the many query languages, Structured Query Language (SQL) is designed and optimized for querying certain kinds of relational databases. Other, less familiar query languages include Object Query Language (OQL) and XQuery. The subject of this article, however, is XPath, a query language designed for querying XML documents. For example, the following simple XPath query finds the titles of all the books in a document whose author is Neal Stephenson:
//book[author="Neal Stephenson"]/title
For comparison, the pure DOM code that finds the same information is shown in Listing 1:
Listing 1. DOM code to find all the title elements of books by Neal Stephenson
ArrayList result = new ArrayList();
NodeList books = doc.getElementsByTagName("book");
for (int i = 0; i < books.getLength(); i++) {
    Element book = (Element) books.item(i);
    NodeList authors = book.getElementsByTagName("author");
    boolean stephenson = false;
    for (int j = 0; j < authors.getLength(); j++) {
        Element author = (Element) authors.item(j);
        NodeList children = author.getChildNodes();
        StringBuffer sb = new StringBuffer();
        for (int k = 0; k < children.getLength(); k++) {
            Node child = children.item(k);
            // really should do this recursively
            if (child.getNodeType() == Node.TEXT_NODE) {
                sb.append(child.getNodeValue());
            }
        }
        if (sb.toString().equals("Neal Stephenson")) {
            stephenson = true;
            break;
        }
    }
    if (stephenson) {
        NodeList titles = book.getElementsByTagName("title");
        for (int j = 0; j < titles.getLength(); j++) {
            result.add(titles.item(j));
        }
    }
}
Believe it or not, the DOM code in Listing 1 is still not as general or as robust as the simple XPath expression. Which would you rather write, debug, and maintain? I think the answer is obvious.
However, expressive as it is, XPath is not the Java language. In fact, XPath is not a complete programming language at all. There are many things it cannot express, including some queries. For example, XPath cannot find all the books whose International Standard Book Number (ISBN) check digit does not match, or find all the authors who, according to an external accounts database, have an outstanding balance. Fortunately, you can integrate XPath into Java programs and get the best of both: Java for what Java is good at, and XPath for what XPath is good at.
Until recently, the application programming interface (API) a Java program used to execute XPath queries differed from one XPath engine to the next. Xalan had one API, Saxon had another, and other engines had others again. This meant that your code tended to lock you into one product. Ideally, you would be able to experiment with engines that have different performance characteristics without undue trouble or rewriting code.
For this reason, Java 5 introduces the javax.xml.xpath package, an XPath library that is independent of both the engine and the object model. The package is also usable in Java 1.3 and later if you install Java API for XML Processing (JAXP) 1.3 separately. Xalan 2.7, Saxon 8, and other products include implementations of this library.
A simple example
I will first demonstrate how the API is used and then go through some of the details. Suppose you want to query a list of books to find those written by Neal Stephenson. Specifically, the book list has the form shown in Listing 2:
Listing 2. XML document containing book information
<inventory>
    <book year="2000">
        <title>Snow Crash</title>
        <author>Neal Stephenson</author>
        <publisher>Spectra</publisher>
        <isbn>0553380958</isbn>
        <price>14.95</price>
    </book>
    <book year="2005">
        <title>Burning Tower</title>
        <author>Larry Niven</author>
        <author>Jerry Pournelle</author>
        <publisher>Pocket</publisher>
        <isbn>0743416910</isbn>
        <price>5.99</price>
    </book>
    <book year="1995">
        <title>Zodiac</title>
        <author>Neal Stephenson</author>
        <publisher>Spectra</publisher>
        <isbn>0553573862</isbn>
        <price>7.50</price>
    </book>
    <!-- more books... -->
</inventory>
Abstract factories
XPathFactory is an abstract factory. The Abstract Factory design pattern lets this one API support different object models, such as DOM, JDOM, and XOM. To choose a different model, pass a Uniform Resource Identifier (URI) that identifies the object model to the XPathFactory.newInstance() method. For example, http://xom.nu/ could select XOM. So far, however, DOM is the only object model this API supports.
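As a rough illustration of the sidebar, the sketch below asks for a factory by object model URI. It uses the XPathFactory.DEFAULT_OBJECT_MODEL_URI constant, which identifies the DOM object model; the class name FactoryByObjectModel is only an illustrative choice, and the assumption that no XOM factory is registered holds for a plain JDK but not necessarily for every classpath.

```java
import javax.xml.xpath.XPathFactory;
import javax.xml.xpath.XPathFactoryConfigurationException;

public class FactoryByObjectModel {

    // Requests a factory for an explicitly named object model (here, W3C DOM).
    static XPathFactory domFactory() throws XPathFactoryConfigurationException {
        return XPathFactory.newInstance(XPathFactory.DEFAULT_OBJECT_MODEL_URI);
    }

    public static void main(String[] args) {
        try {
            System.out.println("DOM factory: " + domFactory().getClass().getName());
            // Assumption: no XOM-aware factory is on the classpath, so this
            // request is expected to be rejected.
            XPathFactory.newInstance("http://xom.nu/");
        } catch (XPathFactoryConfigurationException e) {
            System.out.println("No factory for that object model: " + e.getMessage());
        }
    }
}
```

Calling XPathFactory.newInstance() with no argument is equivalent to asking for the DOM object model.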
The XPath query that finds all the books is simple enough: //book[author="Neal Stephenson"]. To find the titles of those books, add one more step so that the expression becomes //book[author="Neal Stephenson"]/title. Finally, what you really want are the text node children of the title elements. That requires one more step, so the complete expression is //book[author="Neal Stephenson"]/title/text().
Now I'll provide a simple program that executes this query from the Java language and prints the titles of all the books it finds. First, you need to load the document into a DOM Document object. To keep things simple, assume that the document is in the books.xml file in the current working directory. The following code fragment parses the document and builds the corresponding Document object:
Listing 3. Parsing the document with JAXP
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setNamespaceAware(true); // never forget this!
DocumentBuilder builder = factory.newDocumentBuilder();
Document doc = builder.parse("books.xml");
So far, this is just standard JAXP and DOM, nothing really new.
Next, create an XPathFactory:
XPathFactory factory = XPathFactory.newInstance();
Then use this factory to create an XPath object:
XPath xpath = factory.newXPath();
The XPath object compiles the XPath expression:
XPathExpression expr = xpath.compile("//book[author='Neal Stephenson']/title/text()");
Evaluate directly
If you use an XPath expression only once, you can skip the compilation step and call the evaluate() method on the XPath object directly. However, if the same expression is reused many times, compiling it is likely to be faster.
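The shortcut described in the sidebar might look like the following sketch. It passes the expression string straight to XPath.evaluate(String, Object, QName) with no compile() step. The class name DirectEvaluate and the inline XML string (a cut-down stand-in for books.xml) are assumptions made so that the example is self-contained.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class DirectEvaluate {

    // Hypothetical inline sample standing in for books.xml.
    static final String XML =
        "<inventory>"
        + "<book><title>Snow Crash</title><author>Neal Stephenson</author></book>"
        + "<book><title>Burning Tower</title><author>Larry Niven</author></book>"
        + "<book><title>Zodiac</title><author>Neal Stephenson</author></book>"
        + "</inventory>";

    static List<String> stephensonTitles() throws Exception {
        DocumentBuilderFactory domFactory = DocumentBuilderFactory.newInstance();
        domFactory.setNamespaceAware(true);
        Document doc = domFactory.newDocumentBuilder()
            .parse(new InputSource(new StringReader(XML)));

        XPath xpath = XPathFactory.newInstance().newXPath();
        // No compile() step: the expression string goes straight to evaluate().
        NodeList nodes = (NodeList) xpath.evaluate(
            "//book[author='Neal Stephenson']/title/text()",
            doc, XPathConstants.NODESET);

        List<String> titles = new ArrayList<String>();
        for (int i = 0; i < nodes.getLength(); i++) {
            titles.add(nodes.item(i).getNodeValue());
        }
        return titles;
    }

    public static void main(String[] args) throws Exception {
        for (String title : stephensonTitles()) {
            System.out.println(title);
        }
    }
}
```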
Finally, evaluate the XPath expression to get the result. An expression is evaluated against a particular context node, which in this case is the entire document. You must also specify the return type. Here, a node-set is requested:
Object result = expr.evaluate(doc, XPathConstants.NODESET);
You can then cast the result to a DOM NodeList and iterate through it to find all the titles:
NodeList nodes = (NodeList) result;
for (int i = 0; i < nodes.getLength(); i++) {
    System.out.println(nodes.item(i).getNodeValue());
}
Listing 4 combines these fragments into a single program. Note that these methods can throw several checked exceptions that must be declared in a throws clause, although I glossed over them above:
Listing 4. A complete program that queries an XML document with a fixed XPath expression
import java.io.IOException;
import org.w3c.dom.*;
import org.xml.sax.SAXException;
import javax.xml.parsers.*;
import javax.xml.xpath.*;

public class XPathExample {

    public static void main(String[] args)
            throws ParserConfigurationException, SAXException,
                   IOException, XPathExpressionException {

        DocumentBuilderFactory domFactory = DocumentBuilderFactory.newInstance();
        domFactory.setNamespaceAware(true); // never forget this!
        DocumentBuilder builder = domFactory.newDocumentBuilder();
        Document doc = builder.parse("books.xml");

        XPathFactory factory = XPathFactory.newInstance();
        XPath xpath = factory.newXPath();
        XPathExpression expr
            = xpath.compile("//book[author='Neal Stephenson']/title/text()");

        Object result = expr.evaluate(doc, XPathConstants.NODESET);
        NodeList nodes = (NodeList) result;
        for (int i = 0; i < nodes.getLength(); i++) {
            System.out.println(nodes.item(i).getNodeValue());
        }
    }
}
XPath Data Model
Whenever you mix two different languages such as XPath and Java, expect some noticeable seams where the two are glued together. Not everything fits perfectly. XPath and Java do not have identical type systems. XPath 1.0 has only four basic data types:
- Node-Set
- Number
- Boolean
- String
Of course, the Java language has more data types, including user-defined object types.
Most XPath expressions, especially location paths, return node-sets. However, there are other possibilities. For example, the XPath expression count(//book) returns the number of books in the document. The XPath expression count(//book[author="Neal Stephenson"]) > 10 returns a boolean: true if the document contains more than 10 books by Neal Stephenson, and false otherwise.
The evaluate() method is declared to return Object. What it actually returns depends on the result of the XPath expression and on the type you request. In general:
- Number maps to java.lang.Double
- String maps to java.lang.String
- Boolean maps to java.lang.Boolean
- Node-set maps to org.w3c.dom.NodeList
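The mapping above can be checked with a small sketch such as the following, which evaluates one expression per XPath type and prints the Java class that comes back. The class name ReturnTypes, the eval helper, and the inline sample document are illustrative assumptions, not part of the API.

```java
import java.io.StringReader;
import javax.xml.namespace.QName;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class ReturnTypes {

    // Hypothetical inline sample standing in for books.xml.
    static final String XML =
        "<inventory>"
        + "<book><title>Snow Crash</title><author>Neal Stephenson</author></book>"
        + "<book><title>Burning Tower</title><author>Larry Niven</author></book>"
        + "<book><title>Zodiac</title><author>Neal Stephenson</author></book>"
        + "</inventory>";

    // Evaluates an expression against the sample and returns the raw result.
    static Object eval(String expression, QName returnType) throws Exception {
        DocumentBuilderFactory domFactory = DocumentBuilderFactory.newInstance();
        domFactory.setNamespaceAware(true);
        Document doc = domFactory.newDocumentBuilder()
            .parse(new InputSource(new StringReader(XML)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        return xpath.evaluate(expression, doc, returnType);
    }

    public static void main(String[] args) throws Exception {
        Object number = eval("count(//book)", XPathConstants.NUMBER);
        Object string = eval("//book[1]/title", XPathConstants.STRING);
        Object bool = eval("count(//book[author='Neal Stephenson']) > 10",
                           XPathConstants.BOOLEAN);
        Object nodes = eval("//book", XPathConstants.NODESET);

        // Print each value together with the Java type it arrived as.
        System.out.println(number + " : " + number.getClass().getName());
        System.out.println(string + " : " + string.getClass().getName());
        System.out.println(bool + " : " + bool.getClass().getName());
        System.out.println("node-set is a NodeList: "
            + (nodes instanceof org.w3c.dom.NodeList));
    }
}
```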
XPath 2
So far, I have assumed that you are working with XPath 1.0. XPath 2 significantly extends and revises the type system. The main change the Java XPath API needs in order to support XPath 2 is additional constants for returning the new XPath 2 data types.
When you evaluate an XPath expression in Java, the second argument specifies the return type you expect. There are five possibilities, all named constants in the javax.xml.xpath.XPathConstants class:
XPathConstants.NODESET
XPathConstants.BOOLEAN
XPathConstants.NUMBER
XPathConstants.STRING
XPathConstants.NODE
The last one, XPathConstants.NODE, does not actually correspond to an XPath type. Use it only when you know that the XPath expression returns a single node, or when you want no more than one node. If the XPath expression returns more than one node and you have specified XPathConstants.NODE, evaluate() returns the first node in document order. If the XPath expression selects an empty set and you have specified XPathConstants.NODE, evaluate() returns null.
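A short sketch of the XPathConstants.NODE behavior just described: the first query matches two title elements but only the first in document order comes back, and the second query matches nothing. The class name FirstNode, the firstMatch helper, and the inline sample document are assumptions made for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class FirstNode {

    // Hypothetical inline sample standing in for books.xml.
    static final String XML =
        "<inventory>"
        + "<book><title>Snow Crash</title><author>Neal Stephenson</author></book>"
        + "<book><title>Zodiac</title><author>Neal Stephenson</author></book>"
        + "</inventory>";

    // Returns the first node the expression selects, or null if it selects none.
    static Node firstMatch(String expression) throws Exception {
        DocumentBuilderFactory domFactory = DocumentBuilderFactory.newInstance();
        domFactory.setNamespaceAware(true);
        Document doc = domFactory.newDocumentBuilder()
            .parse(new InputSource(new StringReader(XML)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        return (Node) xpath.evaluate(expression, doc, XPathConstants.NODE);
    }

    public static void main(String[] args) throws Exception {
        Node first = firstMatch("//book[author='Neal Stephenson']/title");
        System.out.println("First match: " + first.getTextContent());

        Node none = firstMatch("//book[author='Nobody']/title");
        System.out.println("Empty selection gives null: " + (none == null));
    }
}
```

Because an empty selection yields null rather than an exception, callers should check the result before dereferencing it.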
If the requested conversion cannot be made, evaluate() throws an XPathExpressionException, which is a subclass of XPathException.
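As a hedged sketch of a failed conversion, the code below asks for a node-set from count(), which produces a number. With the XPath engine bundled in the JDK I would expect this request to be rejected with an XPathExpressionException; other engines may word or wrap the failure differently. The class name FailedConversion and the inline sample document are illustrative assumptions.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class FailedConversion {

    // Returns true if asking for a node-set from a numeric expression fails.
    static boolean conversionFails() throws Exception {
        DocumentBuilderFactory domFactory = DocumentBuilderFactory.newInstance();
        domFactory.setNamespaceAware(true);
        // Hypothetical minimal sample document.
        Document doc = domFactory.newDocumentBuilder().parse(
            new InputSource(new StringReader("<inventory><book/><book/></inventory>")));

        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            // count() yields a number, which has no node-set representation.
            xpath.evaluate("count(//book)", doc, XPathConstants.NODESET);
            return false;
        } catch (XPathExpressionException e) {
            return true;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("Conversion rejected: " + conversionFails());
    }
}
```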
Namespace Context
If the elements in the XML document are in a namespace, the XPath expression that queries the document must use the same namespace. The XPath expression does not need to use the same prefixes, only the same namespace URIs. Indeed, when the XML document uses the default namespace, the XPath expression must still use a prefix even though the target document does not.
However, Java programs are not XML documents, so normal namespace resolution does not apply. Instead, you provide an object that maps prefixes to namespace URIs. This object is an instance of the javax.xml.namespace.NamespaceContext interface. For example, suppose the books document is placed in the http://www.example.org/books namespace, as shown in Listing 5:
Listing 5. XML document that uses a default namespace
<inventory xmlns="http://www.example.org/books">
    <book year="2000">
        <title>Snow Crash</title>
        <author>Neal Stephenson</author>
        <publisher>Spectra</publisher>
        <isbn>0553380958</isbn>
        <price>14.95</price>
    </book>
    <!-- more books... -->
</inventory>
The XPath expression that finds the titles of all of Neal Stephenson's books now becomes something like //pre:book[pre:author="Neal Stephenson"]/pre:title/text(). However, the prefix pre must be mapped to the URI http://www.example.org/books. The NamespaceContext interface has no default implementation in the Java software development kit (JDK) or JAXP. That is a little silly, but true. However, it is not hard to implement yourself. Listing 6 shows a simple implementation for a single namespace. It also maps the xml prefix, as an implementation should.
Listing 6. A simple context for binding a single namespace plus the default
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;

public class PersonalNamespaceContext implements NamespaceContext {

    public String getNamespaceURI(String prefix) {
        // The NamespaceContext contract calls for IllegalArgumentException here
        if (prefix == null) throw new IllegalArgumentException("Null prefix");
        else if ("pre".equals(prefix)) return "http://www.example.org/books";
        else if ("xml".equals(prefix)) return XMLConstants.XML_NS_URI;
        return XMLConstants.NULL_NS_URI;
    }

    // This method isn't necessary for XPath processing.
    public String getPrefix(String uri) {
        throw new UnsupportedOperationException();
    }

    // This method isn't necessary for XPath processing either.
    public Iterator<String> getPrefixes(String uri) {
        throw new UnsupportedOperationException();
    }
}
It would not be hard to make this namespace context reusable by storing the bindings in a map and adding setter methods.
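One possible shape for such a reusable context is sketched below. The class name MapNamespaceContext, its bind method, and the inline sample document are my own illustrative choices rather than anything defined by JAXP; only the three NamespaceContext methods come from the interface.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class MapNamespaceContext implements NamespaceContext {

    private final Map<String, String> uriByPrefix = new HashMap<String, String>();

    public MapNamespaceContext() {
        // The xml prefix always has this fixed binding.
        uriByPrefix.put(XMLConstants.XML_NS_PREFIX, XMLConstants.XML_NS_URI);
    }

    // Setter-style method (hypothetical name); returns this to allow chaining.
    public MapNamespaceContext bind(String prefix, String uri) {
        uriByPrefix.put(prefix, uri);
        return this;
    }

    public String getNamespaceURI(String prefix) {
        if (prefix == null) throw new IllegalArgumentException("Null prefix");
        String uri = uriByPrefix.get(prefix);
        return uri != null ? uri : XMLConstants.NULL_NS_URI;
    }

    public String getPrefix(String uri) {
        Iterator<String> prefixes = getPrefixes(uri);
        return prefixes.hasNext() ? prefixes.next() : null;
    }

    public Iterator<String> getPrefixes(String uri) {
        if (uri == null) throw new IllegalArgumentException("Null URI");
        List<String> matches = new ArrayList<String>();
        for (Map.Entry<String, String> entry : uriByPrefix.entrySet()) {
            if (entry.getValue().equals(uri)) matches.add(entry.getKey());
        }
        return matches.iterator();
    }

    // Queries a small namespaced document using a prefix bound at run time.
    static List<String> stephensonTitles() throws Exception {
        String xml = "<inventory xmlns='http://www.example.org/books'>"
            + "<book><title>Snow Crash</title><author>Neal Stephenson</author></book>"
            + "<book><title>Burning Tower</title><author>Larry Niven</author></book>"
            + "</inventory>";
        DocumentBuilderFactory domFactory = DocumentBuilderFactory.newInstance();
        domFactory.setNamespaceAware(true);
        Document doc = domFactory.newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));

        XPath xpath = XPathFactory.newInstance().newXPath();
        xpath.setNamespaceContext(
            new MapNamespaceContext().bind("pre", "http://www.example.org/books"));
        NodeList nodes = (NodeList) xpath.evaluate(
            "//pre:book[pre:author='Neal Stephenson']/pre:title/text()",
            doc, XPathConstants.NODESET);

        List<String> titles = new ArrayList<String>();
        for (int i = 0; i < nodes.getLength(); i++) {
            titles.add(nodes.item(i).getNodeValue());
        }
        return titles;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(stephensonTitles());
    }
}
```

Unlike Listing 6, this version also implements getPrefix() and getPrefixes(), which costs little once the bindings live in a map.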
After you create a NamespaceContext object, install it on the XPath object before compiling the expression. From then on, you can query using those prefixes just as before. For example:
Listing 7. XPath query that uses namespaces
XPathFactory factory = XPathFactory.newInstance();
XPath xpath = factory.newXPath();
xpath.setNamespaceContext(new PersonalNamespaceContext());
XPathExpression expr
    = xpath.compile("//pre:book[pre:author='Neal Stephenson']/pre:title/text()");
Object result = expr.evaluate(doc, XPathConstants.NODESET);
NodeList nodes = (NodeList) result;
for (int i = 0; i < nodes.getLength(); i++) {
    System.out.println(nodes.item(i).getNodeValue());
}
Function resolvers
Sometimes it is useful to define extension functions in the Java language for use in XPath expressions. These functions perform tasks that are difficult or impossible in pure XPath. However, they must be true functions, not arbitrary methods. That is, they must have no side effects. (XPath functions can be evaluated in any order and any number of times.)
Extension functions accessed through the Java XPath API must implement the javax.xml.xpath.XPathFunction interface. This interface declares a single method, evaluate:
public Object evaluate(List args) throws XPathFunctionException
This method must return one of the five types that the Java language can convert to XPath:
String
Double
Boolean
NodeList
Node
For example, Listing 8 shows an extension function that verifies the checksum in an ISBN and returns a Boolean. The basic rule for this checksum is that each of the first nine digits is multiplied by its position (that is, the first digit times 1, the second digit times 2, and so on). These products are added together, and the remainder after dividing the sum by 11 is taken. That remainder must match the tenth character; if the remainder is 10, the last character is X.
Listing 8. An XPath extension function for checking ISBNs
import java.util.List;
import javax.xml.xpath.*;
import org.w3c.dom.*;

public class ISBNValidator implements XPathFunction {

    // This class could easily be implemented as a Singleton.

    public Object evaluate(List args) throws XPathFunctionException {

        if (args.size() != 1) {
            throw new XPathFunctionException("Wrong number of arguments to valid-isbn()");
        }

        String isbn;
        Object o = args.get(0);

        // perform conversions
        if (o instanceof String) isbn = (String) args.get(0);
        else if (o instanceof Boolean) isbn = o.toString();
        else if (o instanceof Double) isbn = o.toString();
        else if (o instanceof NodeList) {
            NodeList list = (NodeList) o;
            Node node = list.item(0);
            // An empty node-set has no ISBN to validate.
            if (node == null) return Boolean.FALSE;
            // getTextContent is available in Java 5 and DOM 3.
            // In Java 1.4 and DOM 2, you'd need to recursively
            // accumulate the content.
            isbn = node.getTextContent();
        }
        else {
            throw new XPathFunctionException("Could not convert argument type");
        }

        char[] data = isbn.toCharArray();
        if (data.length != 10) return Boolean.FALSE;
        int checksum = 0;
        for (int i = 0; i < 9; i++) {
            checksum += (i + 1) * (data[i] - '0');
        }
        int checkdigit = checksum % 11;

        if (checkdigit + '0' == data[9] || (data[9] == 'X' && checkdigit == 10)) {
            return Boolean.TRUE;
        }
        return Boolean.FALSE;
    }
}
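The checksum rule used in Listing 8 can also be worked through on its own. The sketch below computes the expected tenth character from the first nine digits; the class and method names are illustrative. For 0553380958, the ISBN of Snow Crash in Listing 2, the weighted sum is 0 + 10 + 15 + 12 + 15 + 48 + 0 + 72 + 45 = 217, and 217 mod 11 is 8, which matches the final digit.

```java
public class IsbnCheckDigit {

    // Computes the expected tenth character for the first nine ISBN digits.
    static char checkDigit(String firstNine) {
        if (firstNine.length() != 9) {
            throw new IllegalArgumentException("Expected exactly nine digits");
        }
        int sum = 0;
        for (int i = 0; i < 9; i++) {
            int digit = Character.digit(firstNine.charAt(i), 10);
            if (digit < 0) {
                throw new IllegalArgumentException("Not a digit: " + firstNine.charAt(i));
            }
            sum += (i + 1) * digit;  // weight each digit by its 1-based position
        }
        int remainder = sum % 11;
        return remainder == 10 ? 'X' : (char) ('0' + remainder);
    }

    public static void main(String[] args) {
        System.out.println(checkDigit("055338095")); // weighted sum 217, 217 % 11 = 8
        System.out.println(checkDigit("080442957")); // weighted sum 230, 230 % 11 = 10, so X
    }
}
```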
The next step is to make the extension function available to the Java program. To do this, install a javax.xml.xpath.XPathFunctionResolver in the XPath object. The function resolver maps the XPath name and namespace URI of the function to the Java class that implements it. Listing 9 is a simple function resolver that maps the extension function valid-isbn in the namespace http://www.example.org/books to the class in Listing 8. For example, the XPath expression //book[not(pre:valid-isbn(isbn))] finds all the books whose ISBN checksum does not match.
Listing 9. A function context that recognizes the valid-isbn extension function
import javax.xml.namespace.QName;
import javax.xml.xpath.*;

public class ISBNFunctionContext implements XPathFunctionResolver {

    private static final QName name
        = new QName("http://www.example.org/books", "valid-isbn");

    public XPathFunction resolveFunction(QName name, int arity) {
        if (name.equals(ISBNFunctionContext.name) && arity == 1) {
            return new ISBNValidator();
        }
        return null;
    }
}
Because extension functions must be in a namespace, you must use a NamespaceContext when evaluating an expression that contains extension functions, even if the document being queried does not use any namespaces at all. Because XPathFunctionResolver, XPathFunction, and NamespaceContext are all interfaces, you can implement them all in the same class if that is convenient.
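To illustrate that last point, here is a minimal sketch in which one class plays all three roles and is installed on the XPath object twice, once as the namespace context and once as the function resolver. The class name IsbnExtension, the simplified argument handling, and the sample document (including the deliberately wrong ISBN on the made-up second book) are assumptions for illustration, and it relies on the default JDK configuration permitting extension functions.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.namespace.QName;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.*;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class IsbnExtension
        implements NamespaceContext, XPathFunctionResolver, XPathFunction {

    static final String NS = "http://www.example.org/books";
    static final QName VALID_ISBN = new QName(NS, "valid-isbn");

    // NamespaceContext: maps the pre prefix used for the extension function.
    public String getNamespaceURI(String prefix) {
        if (prefix == null) throw new IllegalArgumentException("Null prefix");
        if ("pre".equals(prefix)) return NS;
        if ("xml".equals(prefix)) return XMLConstants.XML_NS_URI;
        return XMLConstants.NULL_NS_URI;
    }
    public String getPrefix(String uri) { throw new UnsupportedOperationException(); }
    public Iterator<String> getPrefixes(String uri) { throw new UnsupportedOperationException(); }

    // XPathFunctionResolver: hands back this object for pre:valid-isbn(arg).
    public XPathFunction resolveFunction(QName name, int arity) {
        return VALID_ISBN.equals(name) && arity == 1 ? this : null;
    }

    // XPathFunction: a simplified version of the check in Listing 8.
    public Object evaluate(List args) throws XPathFunctionException {
        if (args.size() != 1) throw new XPathFunctionException("valid-isbn() takes one argument");
        Object arg = args.get(0);
        String isbn;
        if (arg instanceof NodeList) {
            Node first = ((NodeList) arg).item(0);
            isbn = first == null ? "" : first.getTextContent();
        } else {
            isbn = String.valueOf(arg);
        }
        if (isbn.length() != 10) return Boolean.FALSE;
        int sum = 0;
        for (int i = 0; i < 9; i++) {
            int digit = Character.digit(isbn.charAt(i), 10);
            if (digit < 0) return Boolean.FALSE;
            sum += (i + 1) * digit;
        }
        int remainder = sum % 11;
        char expected = remainder == 10 ? 'X' : (char) ('0' + remainder);
        return Boolean.valueOf(isbn.charAt(9) == expected);
    }

    // Returns the titles of the books whose ISBN fails the checksum.
    static List<String> titlesWithBadIsbn() throws Exception {
        String xml = "<inventory>"
            + "<book><title>Snow Crash</title><isbn>0553380958</isbn></book>"
            + "<book><title>Made-Up Title</title><isbn>0553380959</isbn></book>"
            + "</inventory>";
        DocumentBuilderFactory domFactory = DocumentBuilderFactory.newInstance();
        domFactory.setNamespaceAware(true);
        Document doc = domFactory.newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));

        IsbnExtension extension = new IsbnExtension();
        XPath xpath = XPathFactory.newInstance().newXPath();
        xpath.setNamespaceContext(extension);
        xpath.setXPathFunctionResolver(extension);
        NodeList nodes = (NodeList) xpath.evaluate(
            "//book[not(pre:valid-isbn(isbn))]/title/text()", doc, XPathConstants.NODESET);

        List<String> titles = new ArrayList<String>();
        for (int i = 0; i < nodes.getLength(); i++) {
            titles.add(nodes.item(i).getNodeValue());
        }
        return titles;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(titlesWithBadIsbn());
    }
}
```

Folding everything into one class keeps a small example compact; separate classes, as in Listings 6, 8, and 9, are easier to reuse independently.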