XPath path expressions
XPath uses path expressions to select a node or a set of nodes in an XML document. These path expressions look very much like the paths used in an ordinary computer file system. A node is selected by following a path, which is made up of one or more steps.
Nodes
In XPath, there are seven kinds of nodes: element, attribute, text, namespace, processing-instruction, comment, and document (root) nodes. An XML document is treated as a tree of nodes. The root of the tree is called the document node or root node.
Take a look at the following XML document:
<?xml version="1.0" encoding="ISO-8859-1"?>
<bookstore>
  <book>
    <title lang="en">Harry Potter</title>
    <author>J K. Rowling</author>
    <year>2005</year>
    <price>29.99</price>
  </book>
</bookstore>
Examples of nodes in the above XML document:
<bookstore> (root element node; the document node is its parent)
<author>J K. Rowling</author> (element node)
lang="en" (attribute node)
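The node kinds above can be inspected from Python. The following is a minimal sketch that assumes the standard-library xml.etree.ElementTree module (not something the original article uses) and a copy of the sample document without its XML declaration.

```python
import xml.etree.ElementTree as ET

# The sample document from above, minus the XML declaration.
XML = """<bookstore>
  <book>
    <title lang="en">Harry Potter</title>
    <author>J K. Rowling</author>
    <year>2005</year>
    <price>29.99</price>
  </book>
</bookstore>"""

root = ET.fromstring(XML)           # the root element, <bookstore>
author = root.find("book/author")   # an element node
title = root.find("book/title")

root_tag = root.tag                 # name of the root element
author_text = author.text           # content of the text node inside <author>
lang = title.get("lang")            # value of the lang attribute node

print(root_tag, author_text, lang)
```

ElementTree exposes text and attributes as properties of an element rather than as separate node objects, so this only approximates the XPath data model.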
Relationships between nodes
There are five kinds of relationships between nodes: parent, children, siblings, ancestors, and descendants. In the XML document above, book is the parent of title, and title is a child of book; author and title are siblings; bookstore and book are ancestors of author and title; conversely, author is a descendant of bookstore.
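These relationships can be checked programmatically. The sketch below again assumes Python's xml.etree.ElementTree; because its elements keep no pointer to their parent, the parent_of dictionary is a helper introduced here purely for illustration.

```python
import xml.etree.ElementTree as ET

XML = """<bookstore>
  <book>
    <title lang="en">Harry Potter</title>
    <author>J K. Rowling</author>
    <year>2005</year>
    <price>29.99</price>
  </book>
</bookstore>"""

root = ET.fromstring(XML)
book = root.find("book")
title = book.find("title")
author = book.find("author")

# Hypothetical helper: map every element to its parent.
parent_of = {child: parent for parent in root.iter() for child in parent}

children = [child.tag for child in book]                          # children of book
siblings = [e.tag for e in parent_of[title] if e is not title]    # siblings of title
descendants = [e.tag for e in root.iter() if e is not root]       # descendants of bookstore

# Walk up the parent map to collect the ancestors of author.
ancestors = []
node = author
while node in parent_of:
    node = parent_of[node]
    ancestors.append(node.tag)

print(children, siblings, descendants, ancestors)
```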
Selecting nodes
XPath uses a path expression to select a node in the XML document. A node is selected by a path or step. The most useful path expressions are listed below:
Expression | Description
nodename | Selects all child elements of the current node that are named nodename.
/ | Selects from the root node.
// | Selects nodes anywhere at or below the current node that match the selection, regardless of their location.
. | Selects the current node.
.. | Selects the parent of the current node.
@ | Selects attributes.
In the table below, we have listed some path expressions and the results of the expressions:
Path expression | Result
bookstore | Selects all child elements of the current node that are named bookstore.
/bookstore | Selects the root element bookstore. Note: a path that starts with a forward slash (/) is always an absolute path to an element.
bookstore/book | Selects all book elements that are children of bookstore.
//book | Selects all book elements, regardless of their position in the document.
bookstore//book | Selects all book elements that are descendants of the bookstore element, regardless of where they are located under bookstore.
//@lang | Selects all attributes named lang.
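As a rough illustration of these path expressions, the sketch below uses Python's xml.etree.ElementTree, which implements only a subset of XPath: paths are evaluated relative to an element, and a bare attribute step such as //@lang is not available. The second and third book entries are made-up sample data added so the results are more interesting.

```python
import xml.etree.ElementTree as ET

# First book is from the article; the other two are hypothetical sample data.
XML = """<bookstore>
  <book>
    <title lang="en">Harry Potter</title>
    <author>J K. Rowling</author>
    <year>2005</year>
    <price>29.99</price>
  </book>
  <book>
    <title lang="eng">Learning XML</title>
    <price>39.95</price>
  </book>
  <book>
    <title>XQuery Kick Start</title>
    <price>49.99</price>
  </book>
</bookstore>"""

root = ET.fromstring(XML)   # root is the bookstore element

books = root.findall("book")                  # like /bookstore/book
all_titles = root.findall(".//title")         # like //title
current = root.find(".")                      # the current node
parent = root.find("book/title/..")           # parent of the first matching title

# Attributes are read from the selected elements instead of via //@lang.
langs = [t.get("lang") for t in root.findall(".//title[@lang]")]

print(len(books), [t.text for t in all_titles], parent.tag, langs)
```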
Predicates
Predicates are used to find a particular node, or a node that contains a specified value.
A predicate is always written inside square brackets.
In the table below, we list some path expressions with predicates, as well as the results of expressions:
Path expression | Result
/bookstore/book[1] | Selects the first book element that is a child of the bookstore element.
/bookstore/book[last()] | Selects the last book element that is a child of the bookstore element.
/bookstore/book[last()-1] | Selects the second-to-last book element that is a child of the bookstore element.
/bookstore/book[position()<3] | Selects the first two book elements that are children of the bookstore element.
//title[@lang] | Selects all title elements that have an attribute named lang.
//title[@lang='eng'] | Selects all title elements whose lang attribute has the value eng.
/bookstore/book[price>35.00] | Selects all book elements of the bookstore element that have a price element with a value greater than 35.00.
/bookstore/book[price>35.00]/title | Selects all title elements of the book elements of the bookstore element that have a price element with a value greater than 35.00.
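A hedged sketch of these predicates in Python: xml.etree.ElementTree supports the positional and attribute predicates directly, but not position()<3 or numeric comparisons such as price>35.00, so those two are approximated with ordinary Python filtering. The extra books are hypothetical sample data.

```python
import xml.etree.ElementTree as ET

# First book is from the article; the other two are hypothetical sample data.
XML = """<bookstore>
  <book>
    <title lang="en">Harry Potter</title>
    <price>29.99</price>
  </book>
  <book>
    <title lang="eng">Learning XML</title>
    <price>39.95</price>
  </book>
  <book>
    <title>XQuery Kick Start</title>
    <price>49.99</price>
  </book>
</bookstore>"""

root = ET.fromstring(XML)

first = root.find("book[1]/title").text              # /bookstore/book[1]
last = root.find("book[last()]/title").text          # /bookstore/book[last()]
second_last = root.find("book[last()-1]/title").text # /bookstore/book[last()-1]
with_lang = [t.text for t in root.findall(".//title[@lang]")]
eng = [t.text for t in root.findall(".//title[@lang='eng']")]

# Not in ElementTree's XPath subset, so emulated in Python:
first_two = [b.findtext("title") for b in root.findall("book")[:2]]   # position()<3
expensive = [b.findtext("title") for b in root.findall("book")
             if float(b.findtext("price")) > 35.00]                   # price>35.00

print(first, last, second_last, with_lang, eng, first_two, expensive)
```

A full XPath 1.0 engine would evaluate all of the table's expressions as written; the Python filtering here is only a stand-in.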
Selecting unknown nodes
XPath wildcard characters can be used to select unknown XML elements.
Wildcard | Description
* | Matches any element node.
@* | Matches any attribute node.
node() | Matches any node of any kind.
In the table below, we list some path expressions and the results of these expressions:
Path expression | Result
/bookstore/* | Selects all child elements of the bookstore element.
//* | Selects all elements in the document.
//title[@*] | Selects all title elements that have at least one attribute.
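The wildcard selections can be sketched in Python as follows. This assumes xml.etree.ElementTree, which understands * as a step but not the [@*] predicate, so the attribute check is done on each element's attrib dictionary. The extra books are hypothetical sample data.

```python
import xml.etree.ElementTree as ET

# First book is from the article; the other two are hypothetical sample data.
XML = """<bookstore>
  <book>
    <title lang="en">Harry Potter</title>
    <author>J K. Rowling</author>
    <year>2005</year>
    <price>29.99</price>
  </book>
  <book>
    <title lang="eng">Learning XML</title>
    <price>39.95</price>
  </book>
  <book>
    <title>XQuery Kick Start</title>
    <price>49.99</price>
  </book>
</bookstore>"""

root = ET.fromstring(XML)

child_tags = [e.tag for e in root.findall("*")]   # like /bookstore/*
all_tags = [e.tag for e in root.iter()]           # like //* (includes the root element)

# Emulates //title[@*]: keep the titles that carry at least one attribute.
titles_with_attrs = [t.text for t in root.iter("title") if t.attrib]

print(child_tags, len(all_tags), titles_with_attrs)
```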
Selecting several paths
By using the | operator in a path expression, you can select several paths.
In the table below, we list some path expressions and the results of these expressions:
Path expression | Result
//book/title | //book/price | Selects all the title and price elements of all book elements.
//title | //price | Selects all the title and price elements in the document.
/bookstore/book/title | //price | Selects all the title elements of the book elements of the bookstore element, and all the price elements in the document.
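Python's xml.etree.ElementTree does not implement the | operator, so the following is only an approximation: a hypothetical union helper runs each path separately and returns the combined matches in document order, which is how an XPath union is ordered. The extra books are made-up sample data.

```python
import xml.etree.ElementTree as ET

# First book is from the article; the other two are hypothetical sample data.
XML = """<bookstore>
  <book>
    <title lang="en">Harry Potter</title>
    <author>J K. Rowling</author>
    <year>2005</year>
    <price>29.99</price>
  </book>
  <book>
    <title lang="eng">Learning XML</title>
    <price>39.95</price>
  </book>
  <book>
    <title>XQuery Kick Start</title>
    <price>49.99</price>
  </book>
</bookstore>"""


def union(root, *paths):
    """Hypothetical helper: emulate path1 | path2 | ... for ElementTree paths."""
    selected = set()
    for path in paths:
        selected.update(root.findall(path))
    # root.iter() walks the tree in document order and removes duplicates for free.
    return [elem for elem in root.iter() if elem in selected]


root = ET.fromstring(XML)

# Roughly //book/title | //book/price
texts = [e.text for e in union(root, "book/title", "book/price")]
print(texts)
```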
Selecting nodes
Unfortunately, Internet Explorer handles XPath differently from other browsers.
Our examples include code that works in most major browsers.
Internet Explorer uses the selectNodes() method to select nodes from an XML document:
xmlDoc.selectNodes(xpath);
Firefox, Chrome, Opera, and Safari use the evaluate() method to select nodes from an XML document:
xmlDoc.evaluate(xpath, xmlDoc, null, XPathResult.ANY_TYPE, null);
Select all titles
The following example selects all the title nodes:
Example: /bookstore/book/title
Select the title of the first book
The following example selects the title of the first book node under the bookstore element:
Example: /bookstore/book[1]/title
Select all prices
The following example selects the text of all the price nodes:
Example: /bookstore/book/price/text()
Select the price nodes with a price above 35
The following example selects all the price nodes with a price above 35:
Example: /bookstore/book[price>35]/price
Select the title nodes with a price above 35
The following example selects the title nodes of all books with a price above 35:
Example: /bookstore/book[price>35]/title
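For readers working in Python rather than in a browser, the five examples can be approximated as below. This is a sketch under assumptions: it uses xml.etree.ElementTree, reads the .text property in place of text(), filters on price in Python because the price>35 predicate is outside that module's XPath subset, and relies on made-up books beyond the first.

```python
import xml.etree.ElementTree as ET

# First book is from the article; the other two are hypothetical sample data.
XML = """<bookstore>
  <book>
    <title lang="en">Harry Potter</title>
    <price>29.99</price>
  </book>
  <book>
    <title lang="eng">Learning XML</title>
    <price>39.95</price>
  </book>
  <book>
    <title>XQuery Kick Start</title>
    <price>49.99</price>
  </book>
</bookstore>"""

root = ET.fromstring(XML)

titles = [t.text for t in root.findall("book/title")]    # /bookstore/book/title
first_title = root.findtext("book[1]/title")              # /bookstore/book[1]/title
prices = [p.text for p in root.findall("book/price")]     # /bookstore/book/price/text()

# Stand-in for the [price>35] predicate.
costly = [b for b in root.findall("book") if float(b.findtext("price")) > 35]
costly_prices = [b.findtext("price") for b in costly]     # .../book[price>35]/price
costly_titles = [b.findtext("title") for b in costly]     # .../book[price>35]/title

print(titles, first_title, prices, costly_prices, costly_titles)
```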
XPath operators
Operator | Description | Example | Return value
| | Computes the union of two node-sets | //book | //cd | A node-set containing all book and cd elements
+ | Addition | 6 + 4 | 10
- | Subtraction | 6 - 4 | 2
* | Multiplication | 6 * 4 | 24
div | Division | 8 div 4 | 2
= | Equal to | price=9.80 | true if price is 9.80; false if price is 9.90
!= | Not equal to | price!=9.80 | true if price is 9.90; false if price is 9.80
< | Less than | price<9.80 | true if price is 9.00; false if price is 9.90
<= | Less than or equal to | price<=9.80 | true if price is 9.00; false if price is 9.90
> | Greater than | price>9.80 | true if price is 9.90; false if price is 9.80
>= | Greater than or equal to | price>=9.80 | true if price is 9.90; false if price is 9.70
or | Logical or | price=9.80 or price=9.70 | true if price is 9.80; false if price is 9.50
and | Logical and | price>9.00 and price<9.90 | true if price is 9.80; false if price is 8.50
mod | Modulus (remainder of a division) | 5 mod 2 | 1
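To make the arithmetic and comparison rows concrete, here is a small Python sketch that mirrors each example with the nearest Python operator. It is an analogy rather than an XPath evaluation: price is assumed to be a single number (XPath compares node-sets differently), div corresponds to floating-point division, and mod is mapped to math.fmod because both keep the sign of the dividend.

```python
import math

price = 9.80  # assumed single numeric value standing in for a price node

results = {
    "6 + 4": 6 + 4,
    "6 - 4": 6 - 4,
    "6 * 4": 6 * 4,
    "8 div 4": 8 / 4,              # div is floating-point division
    "5 mod 2": math.fmod(5, 2),    # remainder of a truncating division
    "price=9.80": price == 9.80,
    "price!=9.80": price != 9.80,
    "price<9.80": price < 9.80,
    "price<=9.80": price <= 9.80,
    "price>9.80": price > 9.80,
    "price>=9.80": price >= 9.80,
    "price=9.80 or price=9.70": price == 9.80 or price == 9.70,
    "price>9.00 and price<9.90": price > 9.00 and price < 9.90,
}

for expression, value in results.items():
    print(f"{expression:28} {value}")
```

The choice of math.fmod over Python's % matters only for negative operands, where % follows the sign of the divisor instead.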
Scrapy Crawler Essentials: XPath Learning