XPath Introduction
XPath is a W3C standard designed to locate nodes in the node tree of an XML 1.0 or XML 1.1 document. At the time of writing, two versions are available: XPath 1.0 became a W3C Recommendation in 1999, and XPath 2.0 followed in 2007. For more information about XPath, see http://www.w3.org/TR/xpath20.
XPath is an expression language. An expression can return a node, a node set, an atomic value, or a mixture of nodes and atomic values. XPath 2.0 is a superset of XPath 1.0 that supports a wider range of data types while remaining largely backward compatible: almost every XPath 1.0 expression returns the same result under XPath 2.0. XPath 2.0 is also the main expression language used to query and locate nodes in XSLT 2.0 and XQuery 1.0, and XQuery 1.0 is itself an extension of XPath 2.0. The examples later in this article show how XPath expressions locate nodes in XSLT and XQuery.
Before learning XPath, you should be familiar with the kinds of XML nodes (elements, attributes, atomic values (text), processing instructions, comments, root nodes (document nodes), and namespaces) and with the relationships between nodes, such as parent, child, sibling, ancestor, and descendant. These concepts are not described here.
XPath path expression
This section covers:
- Path expression syntax
- Relative/absolute path
- Expression context
- Concepts of predicates (filter expressions) and axes
- Operators and special characters
- Common expressions
- Functions and descriptions
The descriptions and examples below are based on a sample XML file, messages.xml.
Path expression syntax:
- Path = relative path | absolute path
- Relative path = step expression | relative path "/" step expression
- Step expression = axis, node test, zero or more predicates
Note:
- The axis indicates the tree relationship (hierarchy) between the nodes selected by the step expression and the current context node. The node test specifies the name or type of the nodes that the step selects. A predicate is a filter expression that further narrows the selected node set.
- A step can have zero or more predicates. Conditions inside a predicate can be combined with the logical operators "and" and "or"; logical negation uses the not() function.
Consider a typical XPath query expression: /messages/message//child::node()[@id=0]. Here /messages/message is the path (an absolute path, because it starts with "/"), child:: is the axis, which selects among child nodes, node() is the node test, which matches any node, and [@id=0] is the predicate, which keeps only the nodes that have an id attribute whose value is 0.
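The step structure described above can be tried out with a minimal sketch in Python. It assumes the standard-library xml.etree.ElementTree module, which implements only a subset of XPath (no named axes such as child::, and attribute values in predicates must be quoted). The sample document and its addresses are hypothetical stand-ins, not the article's messages.xml.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data; the real messages.xml is not reproduced here.
SAMPLE = """<messages>
  <message id="1"><sender>alice@example.com</sender><subject>Hello</subject></message>
  <message id="0"><sender>bob@example.com</sender><subject>Lunch</subject></message>
</messages>"""

root = ET.fromstring(SAMPLE)  # root is the <messages> element

# "message"   : node test on the implicit child axis
# "[@id='0']" : predicate that filters the selected message nodes
# "sender"    : next step, selecting sender children of what is left
senders = root.findall("./message[@id='0']/sender")
sender_texts = [s.text for s in senders]
print(sender_texts)
```

Each "/" separates one step from the next, and the predicate only filters the step it is attached to.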
Relative path and absolute path: An XPath expression that starts with "/" is an absolute path and is evaluated from the document root. (Elsewhere in the expression, "/" is the separator between step expressions.) For example, /messages/message/subject is an absolute path: the search starts from the document root. If the current node is the first message node, /messages/message[1], then the path expression subject (with no leading "/") is a relative path: the search starts from the current node. For more information, see "Expression context".
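A short sketch of how a relative path depends on its starting node, again assuming Python's xml.etree.ElementTree and a hypothetical sample document. ElementTree does not accept absolute paths on an element, so the full path is written relative to the root element instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data, for illustration only.
SAMPLE = """<messages>
  <message id="1"><subject>Hello</subject></message>
  <message id="0"><subject>Lunch</subject></message>
</messages>"""

root = ET.fromstring(SAMPLE)
first_message = root.find("./message[1]")

# Relative path "subject", evaluated with the first message as the current node.
relative_hits = [e.text for e in first_message.findall("subject")]

# The same relative path evaluated at <messages> finds nothing,
# because subject is not a direct child of messages.
from_root_hits = [e.text for e in root.findall("subject")]

# Spelling out every step from the root element reaches all subjects.
full_path_hits = [e.text for e in root.findall("./message/subject")]

print(relative_hits, from_root_hits, full_path_hits)
```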
Expression context: The context is the environment in which an XPath path expression is evaluated. The same path expression can return completely different results when it is evaluated against the root node and when it is evaluated against a specific child node. In other words, the result of an XPath path expression depends on its context.
The basic forms of XPath context are:
- Current node (./):
For example, ./sender selects the sender nodes under the current node (equivalent to the "specific element" form below, such as sender).
- Parent node (../):
For example, ../sender selects the sender nodes under the parent of the current node.
- Document root (/):
For example, /messages selects the messages node under the document root.
- Root element (/*):
* matches all element nodes, but a document has only one root element, so /* selects the root element. The result of /* is the same as that of /messages.
- Recursive descent (//):
For example, searching for sender nodes below the messages node and below the first message node returns the following results:
/messages//sender:
<sender>[email protected]</sender>
<sender>[email protected]</sender>
<sender>[email protected]</sender>
/messages/message[1]//sender:
<sender>[email protected]</sender>
<sender>[email protected]</sender>
The expression recursively searches all descendants of the current node and returns the nodes that match.
- Specific element:
For example, sender selects the sender nodes under the current node and is equivalent to ./sender.
Note: Pay attention to the context when executing XPath, that is, the node against which the expression is evaluated. This matters in XMLDOM: the parameters of the selectNodes and selectSingleNode methods are XPath expressions, and the execution context of such an expression is the node on which the method is called. For more information, see http://www.w3.org/TR/xpath20/.
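The context forms listed above can be sketched in Python with xml.etree.ElementTree. The sample document is hypothetical: it nests one message inside another only so that the recursive descent results differ between the two starting nodes.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data with one nested message.
SAMPLE = """<messages>
  <message id="1">
    <sender>alice@example.com</sender>
    <body>
      <message id="0"><sender>bob@example.com</sender></message>
    </body>
  </message>
  <message id="2"><sender>carol@example.com</sender></message>
</messages>"""

root = ET.fromstring(SAMPLE)
first = root.find("./message[1]")

# Recursive descent: the result depends on the node the search starts from.
all_senders = [e.text for e in root.findall(".//sender")]
under_first = [e.text for e in first.findall(".//sender")]

# "./sender" and "sender" are equivalent: direct children of the current node.
direct = [e.text for e in first.findall("./sender")]
direct_short = [e.text for e in first.findall("sender")]

# ".." steps up to the parent of each selected sender.
parent_ids = [e.get("id") for e in root.findall("./message/sender/..")]

print(all_senders, under_first, direct, parent_ids)
```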
Concepts of predicates (filter expressions) and axes: An XPath predicate is a filter expression, similar to the SQL WHERE clause. An axis defines which nodes, relative to the current node, a step selects from:
| Axis name | Result |
|---|---|
| ancestor | Selects all ancestors of the current node (parent, grandparent, and so on). |
| ancestor-or-self | Selects all ancestors of the current node (parent, grandparent, and so on) and the current node itself. |
| attribute | Selects all attributes of the current node. |
| child | Selects all children of the current node. |
| descendant | Selects all descendants of the current node (children, grandchildren, and so on). |
| descendant-or-self | Selects all descendants of the current node (children, grandchildren, and so on) and the current node itself. |
| following | Selects all nodes that come after the end tag of the current node in the document. |
| following-sibling | Selects all sibling nodes after the current node. |
| namespace | Selects all namespace nodes of the current node. |
| parent | Selects the parent node of the current node. |
| preceding | Selects all nodes that come before the start tag of the current node in the document, excluding its ancestors. |
| preceding-sibling | Selects all sibling nodes before the current node. |
| self | Selects the current node. |
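Python's xml.etree.ElementTree does not implement named axes, so the sketch below emulates three of them by hand to show what they select. The helper names (ancestors, following_siblings, preceding_siblings) and the sample document are hypothetical. A full XPath engine returns the same nodes, though it orders ancestor results in document order rather than nearest first.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data, for illustration only.
SAMPLE = """<messages>
  <message id="1">
    <sender>alice@example.com</sender>
    <subject>Hello</subject>
    <body>See you at noon.</body>
  </message>
</messages>"""

root = ET.fromstring(SAMPLE)

# ElementTree elements do not know their parent, so build a child -> parent map.
parent_of = {child: parent for parent in root.iter() for child in parent}

def ancestors(node):
    """Roughly ancestor::* (nearest ancestor first, up to the root element)."""
    found = []
    while node in parent_of:
        node = parent_of[node]
        found.append(node)
    return found

def following_siblings(node):
    """Roughly following-sibling::* (later children of the same parent)."""
    siblings = list(parent_of[node])
    return siblings[siblings.index(node) + 1:]

def preceding_siblings(node):
    """Roughly preceding-sibling::* (earlier children of the same parent)."""
    siblings = list(parent_of[node])
    return siblings[:siblings.index(node)]

subject = root.find("./message/subject")
ancestor_tags = [e.tag for e in ancestors(subject)]
following_tags = [e.tag for e in following_siblings(subject)]
preceding_tags = [e.tag for e in preceding_siblings(subject)]
print(ancestor_tags, following_tags, preceding_tags)
```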
Operators and special characters:
| Operator/Special character | Description |
|---|---|
| / | Path operator that separates steps. At the start of a pattern, it indicates that selection starts from the root node. |
| // | Recursive descent from the current node. At the start of a pattern, it indicates recursive descent from the root node. |
| . | The current context node. |
| .. | The parent of the current context node. |
| * | Wildcard; selects all element nodes regardless of element name. Text, comment, processing-instruction, and other nodes are excluded; use node() to include them. |
| @ | Prefix of an attribute name. |
| @* | Selects all attributes, regardless of name. |
| : | Namespace separator; separates the namespace prefix from the element name or attribute name. |
| () | Parentheses (highest precedence); force the order of evaluation. |
| [] | Applies a filter pattern (that is, a predicate, including filter expressions and forward/reverse axis steps). |
| [] | Subscript operator; selects an item by index in a collection. |
| \| | Union of two node sets, such as //messages/message/to \| //messages/message/cc |
| - | Subtraction. |
| div | Floating-point division. |
| and, or | Logical operations. |
| mod | Remainder. |
| not() | Logical negation. |
| = | Equal. |
| != | Not equal. |
| Special comparison operators | < or &lt;, <= or &lt;=, > or &gt;, >= or &gt;=. Use the escaped form where escaping is required, for example in XSLT; no escaping is needed in XMLDOM scripting. |
Examples of common expressions:
| Expression | Description |
|---|---|
| / | The document root. |
| /* | All element nodes under the document root, that is, the root element (an XML document has only one root element). |
| /node() | All child nodes of the document root (the root element plus any comments and processing instructions). |
| /text() | Text nodes that are direct children of the document root. |
| /messages/message | All message nodes under the messages node. |
| /messages/message[1] | The first message node under the messages node. |
| /messages/message[1]/self::node() | The first message node itself (the self axis refers to the node itself; node() matches any node). |
| /messages/message[1]/node() | All child nodes of the first message node. |
| /messages/message[1]/*[last()] | The last child element of the first message node. |
| /messages/message[1]/[last()] | Error: a predicate must be attached to a step that yields a node or node set. |
| /messages/message[1]/node()[last()] | The last child node of the first message node. |
| /messages/message[1]/text() | All text child nodes of the first message node. |
| /messages/message[1]//text() | All text nodes under the first message node, found by recursive descent (unlimited depth). |
| /messages/message[1]/child::node(), /messages/message[1]/node(), /messages/message[position()=1]/node(), //message[@id=1]/node() | All child nodes of the first message node. |
| //message[@id=1]//child::node() | All descendant nodes, found recursively (unlimited depth). |
| //message[position()=1]/node() | The child nodes of every message node that is the first message under its parent (here, the message with id=1 and the message with id=0). |
| /messages/message[1]/parent::* | The messages node. |
| /messages/message[1]/body/attachments/parent::node(), /messages/message[1]/body/attachments/parent::*, /messages/message[1]/body/attachments/.. | The parent of the attachments node. A node has only one parent, so node() and * return the same result. (.. also denotes the parent node; . denotes the node itself.) |
| //message[@id=0]/ancestor::* | The ancestor axis selects all ancestors (parent, grandparent, and so on), moving upward recursively. |
| //message[@id=0]/ancestor-or-self::* | Moves upward recursively and includes the node itself. |
| //message[@id=0]/ancestor::node() | Compared with *, the result also includes the document root (document node). |
| /messages/message[1]/descendant::node(), /messages/message[1]//node() | All descendant nodes of the message node, found by recursive descent. |
| /messages/message[1]/sender/following::* | All element nodes that come after the sender node of the first message node in the document, including its following siblings and their descendants. |
| //message[@id=1]/sender/following-sibling::* | All following siblings of the sender node of the message node with id=1. |
| //message[@id=1]/datetime/@date | The date attribute of the datetime node of the message node with id=1. |
| //message[@id=1]/datetime[@date], //message/datetime[attribute::date] | The datetime nodes that have a date attribute (in the first form, only under the message node with id=1). |
| //message[datetime] | All message nodes that have a datetime child node. |
| //message/datetime/attribute::*, //message/datetime/attribute::node(), //message/datetime/@* | All attribute nodes of the datetime nodes under message nodes. |
| //message/datetime[attribute::*], //message/datetime[attribute::node()], //message/datetime[@*], //message/datetime[@node()] | All datetime nodes that have at least one attribute. |
| //attribute::* | All attribute nodes in the document. |
| //message[@id=0]/body/preceding::node() | All nodes that come before the body node in the document, excluding its ancestors. One way to picture it: start at the top-level ancestor of body and collect the nodes that precede it, then move down one level and collect the nodes that precede the next ancestor, and so on down to body itself. The descendants of the collected nodes are included as well. |
| //message[@id=0]/body/preceding-sibling::node() | All sibling nodes before the body node. Unlike the previous example, the search does not walk level by level from the top of the tree down to body; only the nodes at the same level that precede the current node are returned. |
| //message[@id=1]//*[namespace::amazon] | All element nodes under the message node with id=1 that have a namespace node for the amazon prefix. |
| //namespace::* | All namespace nodes in the document (including the built-in xmlns:xml namespace). |
| //message[@id=0]//books/*[local-name()='book'] | All book nodes under books. Note: the book node is namespace-qualified (<amazon:book>), so //message[@id=0]//books/book finds no nodes. |
| //message[@id=0]//books/*[local-name()='book' and namespace-uri()='http://www.amazon.com/books/schema'] | All book nodes under books (both the node name and the namespace must match). |
| //message[@id=0]//books/*[local-name()='book'][year>2006] | The book nodes whose year value is greater than 2006. |
| //message[@id=0]//books/*[local-name()='book'][1]/year>2006 | Tests whether the year value of the first book node is greater than 2006. Returns xs:boolean: true. |
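A few of the expressions above can be tried out with Python's xml.etree.ElementTree, within the XPath subset it supports. In this sketch the sample document, dates, and book titles are hypothetical; only the namespace URI is taken from the examples. The {*}book wildcard plays the role of local-name()='book' and assumes Python 3.8 or later.

```python
import xml.etree.ElementTree as ET

NS = "http://www.amazon.com/books/schema"

# Hypothetical sample data, for illustration only.
SAMPLE = f"""<messages xmlns:amazon="{NS}">
  <message id="1">
    <datetime date="2008-12-11"/>
    <books>
      <amazon:book><name>Book A</name><year>2008</year></amazon:book>
      <amazon:book><name>Book B</name><year>2005</year></amazon:book>
    </books>
  </message>
  <message id="0"><subject>No date</subject></message>
</messages>"""

root = ET.fromstring(SAMPLE)

# Positional predicates count from 1, unlike Python list indexes.
first_id = root.find("./message[1]").get("id")
last_id = root.find("./message[last()]").get("id")
same_node = root.find("./message[1]") is root.findall("./message")[0]

# Existence predicates: a child element, then an attribute.
with_datetime = [m.get("id") for m in root.findall("./message[datetime]")]
dated = root.findall(".//datetime[@date]")

# A namespace-qualified element is not matched by its bare name.
unqualified = root.findall(".//books/book")
by_prefix = root.findall(".//books/amazon:book", {"amazon": NS})
by_local_name = root.findall(".//books/{*}book")  # any namespace, local name "book"

print(first_id, last_id, with_datetime, len(unqualified), len(by_prefix))
```

Comparisons such as [year>2006] are outside the ElementTree subset; they need a full XPath engine or a filter written in Python.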
Functions and descriptions: XPath, XSLT, and XQuery share a function library that provides a wide range of functions, and you can also define your own. The individual functions are not explained here. Readers comfortable with English can go directly to the W3C function reference: http://www.w3.org/TR/xquery-operators. A Chinese-language reference is available at http://www.w3school.com.cn/xpath/xpath_functions.asp
Application of XPath in DOM, XSLT, and XQuery
DOM:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>XPath test</title>
</head>
<body>
<script language="JavaScript" type="text/javascript">
// Load the sample document synchronously with the MSXML DOM (Internet Explorer only).
var xmldoc = new ActiveXObject("Microsoft.XMLDOM");
xmldoc.async = false;
xmldoc.load("messages.xml");
// Switch the query language to XPath before calling selectNodes.
xmldoc.setProperty("SelectionLanguage", "XPath");
var sPath = "/messages/message[1]//books/*[local-name()='book']";
var bookNodes = xmldoc.selectNodes(sPath);
// Write the text of the first child of each book node as a list item.
document.write("<ul>");
for (var i = 0; i < bookNodes.length; i++) {
document.write("<li>" + bookNodes[i].childNodes[0].text + "</li>");
}
document.write("</ul>");
</script>
</body>
</html>
Note: When new ActiveXObject("Microsoft.XMLDOM") is used, the SelectionLanguage property of XMLDOM defaults to the older XSLPattern syntax rather than XPath, so you must call xmldoc.setProperty("SelectionLanguage", "XPath") before XPath query expressions are supported. If SelectionLanguage is not set to XPath, note the following:
- Subscripts start from 0 (in an XPath query expression, they start from 1).
- XPath functions cannot be used in the query expression.
XSLT:
For how to use XSLT, see my other small demo: http://www.cnblogs.com/ktgu/archive/2008/12/14/1354890.html
XQuery:
xquery version "1.0";
<ul>
{
let $i := 0
for $x in doc("C:\Users\Administrator\Desktop\messages.xml")//message[@id=0]//books/*[local-name()='book']
where $x/year > 2006
order by $x/year descending
return <li>{data($x/name)}</li>
}
</ul>
Returned result:
<ul>
<li>Microsoft Visual C# 2008 Step by Step</li>
<li>Professional C# 2008</li>
</ul>
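For comparison, the same select, filter, sort, and return steps can be approximated in Python. This is only a sketch, assuming the standard-library xml.etree.ElementTree module rather than an XQuery engine. The document structure, the year values, and the third book title are hypothetical; the two C# titles are the ones from the result above.

```python
import xml.etree.ElementTree as ET

NS = "http://www.amazon.com/books/schema"

# Hypothetical sample data; the year values are assumptions.
SAMPLE = f"""<messages xmlns:amazon="{NS}">
  <message id="0">
    <books>
      <amazon:book><name>Professional C# 2008</name><year>2007</year></amazon:book>
      <amazon:book><name>An Older Title</name><year>2005</year></amazon:book>
      <amazon:book><name>Microsoft Visual C# 2008 Step by Step</name><year>2008</year></amazon:book>
    </books>
  </message>
</messages>"""

root = ET.fromstring(SAMPLE)

# for $x in ...//message[@id=0]//books/*[local-name()='book']
books = root.findall(".//message[@id='0']//books/amazon:book", {"amazon": NS})

# where $x/year > 2006
recent = [b for b in books if int(b.findtext("year")) > 2006]

# order by $x/year descending
recent.sort(key=lambda b: int(b.findtext("year")), reverse=True)

# return <li>{data($x/name)}</li>
names = [b.findtext("name") for b in recent]
ul = ET.Element("ul")
for name in names:
    ET.SubElement(ul, "li").text = name

html = ET.tostring(ul, encoding="unicode")
print(html)
```

The filter and sort are done in Python because ElementTree predicates do not support numeric comparisons or ordering.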