I have been using HtmlAgilityPack in recent development work, so here is a record of what I have learned about XPath.
Introduction to XPath
XPath is a language for finding information in an XML document, and it can be used to navigate the elements and attributes of a document. XPath is a major element of the XSLT standard, and XQuery and XPointer are also built on XPath expressions, so understanding XPath is the foundation for many advanced XML applications. XPath should actually feel familiar, because its closest relative is the CSS selector. CSS uses selectors to choose the elements a style applies to, while XSLT uses XPath, which is at least as powerful as CSS selectors. Here are some CSS selectors and their XPath counterparts:
- CSS body p, XPath body//p: all p elements that are descendants of body
- CSS body > p, XPath body/p: the p elements that are direct children of body
- CSS *, XPath //*: all elements
You may not understand these XPath expressions yet, but they clearly resemble CSS selectors. XPath goes further, though: for example, it can select the p element at a specific position under body, or the first N p elements:
- body/p[position()=4]: selects the fourth p child of body; note that positions are counted from 1
- body/p[position()<3]: selects the first two p children of body
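To try these CSS/XPath pairs outside a browser, here is a minimal sketch using Python's standard xml.etree.ElementTree module. This is an illustration only: ElementTree implements just a subset of XPath and evaluates paths relative to an element (here body). The sample page is hypothetical, and because ElementTree has no position() function, the position()<3 case is emulated by slicing.

```python
import xml.etree.ElementTree as ET

# Hypothetical page: four <p> children of <body>, plus one <p> nested in a <div>.
BODY = ET.fromstring(
    "<body><p>one</p><p>two</p><div><p>nested</p></div><p>three</p><p>four</p></body>"
)

# CSS "body p" / XPath "body//p": every <p> descendant of <body>, in document order.
descendants = [p.text for p in BODY.findall(".//p")]

# CSS "body > p" / XPath "body/p": only the direct <p> children.
children = [p.text for p in BODY.findall("p")]

# XPath "body/p[4]" (short for p[position()=4]); positions count from 1.
fourth = BODY.find("p[4]").text

# XPath "body/p[position()<3]": ElementTree lacks position(), so slice the children.
first_two = [p.text for p in BODY.findall("p")[:2]]
```

Note that the nested p is found by .//p but not by p, which is exactly the difference between body//p and body/p.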
XPath uses path expressions to select nodes or node sets in an XML document, and these expressions look very much like the paths of an ordinary computer file system. XPath also has a library of more than 100 built-in functions for string values, numeric values, date and time comparison, node and QName manipulation, sequence manipulation, Boolean values, and more. Most of these, including the date, time, and sequence functions, arrived with XPath 2.0; XPath 1.0, which browsers and many libraries implement, has a much smaller core function library.
Writing XPath
XPath uses path expressions to select nodes in an XML document. A node is reached by following a path, one step at a time. For example, "/" denotes the document node, "." the current node, and ".." the parent of the current node. Examples:
- /: selects the document node (its nodeType is 9)
- /root: selects the document's root element; as with a Unix file-system path, a path that starts with / is absolute
- /root/child/..: selects the parent of root's child elements, that is, root itself
Here are some common path expressions:
- nodename: selects all child nodes named nodename
- /: selects from the root node
- //: selects descendants of the current node; it must be followed by a node test such as a node name
- .: selects the current node
- ..: selects the parent of the current node
- @: selects attribute nodes (@ is short for attribute)
```xml
<?xml version="1.0"?>
<root>
  <child attr="attr"/>
  <child><a><desc/></a></child>
</root>
```
XPath results for the XML document above, with the document node as the current node:
- /root: selects root
- root: selects root
- child: selects nothing, because child is not a child of the document node
- //child: selects both child elements, since // selects descendants
- //@attr: selects the attr attribute node
- /root/child//desc: selects the desc elements that are descendants of child
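The same queries on this sample document can be sketched with Python's xml.etree.ElementTree, used here only as a convenient way to try them (it supports part of XPath, not all of it). One difference matters: ElementTree always evaluates paths relative to an element, never the document node, so with the root element as context the relative path child does find both child elements. It also cannot return attribute nodes, so //@attr is approximated by reading the attribute value.

```python
import xml.etree.ElementTree as ET

# The sample document from above (the XML declaration is optional here).
ROOT = ET.fromstring('<root><child attr="attr"/><child><a><desc/></a></child></root>')

# The context is <root>, not the document node, so "child" matches both children.
children = ROOT.findall("child")                   # like /root/child
descendant_children = ROOT.findall(".//child")     # like //child
# No attribute nodes in ElementTree: select the elements carrying the attribute,
# then read its value.
attr_values = [e.get("attr") for e in ROOT.findall(".//*[@attr]")]  # like //@attr
descs = ROOT.findall("child//desc")                # like /root/child//desc
parents = ROOT.findall("child/..")                 # like /root/child/.. (root listed once)
```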
Predicates
Predicates add further conditions when selecting nodes; a predicate is written inside square brackets. Here are some XPath expressions with predicates:
- /root/child[3]: selects the third child element of the root element; unlike array subscripts, positions are counted from 1
- //child[@attr]: selects all child elements that have an attr attribute
- //child[@attr="Val"]/desc: selects the desc children of every child element whose attr attribute equals "Val"
- //child[desc]: selects all child elements that have a desc child element
- //child[position()>3]: position() is an XPath function that returns a node's position; this selects, under each parent, the child elements after the third
- //child[@attr>12]: predicates can also compare numbers; this selects the child elements whose attr value is greater than 12
- //child[last()]: the last() function returns the position of the last node in the list, so this selects the last child element under each parent
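Here is a sketch of these predicates in Python's xml.etree.ElementTree, on a hypothetical document with the root element as context. ElementTree supports [n], [@attr], [@attr='value'], [tag], and [last()], but not position() or numeric comparisons, so those two cases are emulated in plain Python; as_number is a hypothetical, rough stand-in for XPath's number() conversion.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; the id attributes only make the results easy to check.
ROOT = ET.fromstring(
    "<root>"
    '<child id="c1" attr="5"/>'
    '<child id="c2" attr="Val"><desc id="d1"/></child>'
    '<child id="c3" attr="20"/>'
    '<child id="c4"><desc id="d2"/></child>'
    '<child id="c5" attr="13"/>'
    "</root>"
)


def ids(elements):
    """Return the id attribute of each element, keeping the order."""
    return [e.get("id") for e in elements]


def as_number(value):
    """Rough stand-in for XPath number(): NaN when missing or non-numeric."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return float("nan")


third = ids(ROOT.findall("child[3]"))                        # /root/child[3]
with_attr = ids(ROOT.findall(".//child[@attr]"))             # //child[@attr]
val_desc = ids(ROOT.findall(".//child[@attr='Val']/desc"))   # //child[@attr="Val"]/desc
with_desc = ids(ROOT.findall(".//child[desc]"))              # //child[desc]
last = ids(ROOT.findall(".//child[last()]"))                 # //child[last()]

# Not supported by ElementTree, so emulated in plain Python:
after_third = ids(ROOT.findall("child")[3:])                 # /root/child[position()>3]
# NaN compares false, so missing or non-numeric attr values never match.
over_12 = ids(c for c in ROOT.iter("child") if as_number(c.get("attr")) > 12)  # //child[@attr>12]
```

As in XPath, a missing attribute or a value such as "Val" is simply not greater than 12, so c2 and c4 drop out of over_12.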
Wildcard characters
XPath wildcard characters can be used to select unknown XML elements.
- *: as in CSS, matches any element node
- @*: matches any attribute node
- node(): matches a node of any type
Examples:
- /root/*: selects all child elements of the root element
- /root/node(): selects all child nodes of the root element, including text nodes
- //*: selects all elements in the document
- //child[@*]: selects all child elements that have at least one attribute
- //@*: selects all attribute nodes
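A sketch of the wildcard examples with Python's xml.etree.ElementTree, on a hypothetical document. ElementTree understands * in paths, but it has no [@*] test and no text nodes, so //child[@*] is emulated by checking the attrib dictionary, and node() has no direct equivalent: text is stored in the .text and .tail properties instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document.
ROOT = ET.fromstring('<root><child attr="x">text</child><child/><other k="1"/></root>')

star = [e.tag for e in ROOT.findall("*")]   # /root/*  (element children only)
every = [e.tag for e in ROOT.iter()]         # //*      (iter() includes <root> itself)

# No [@*] in ElementTree: an element has attributes when its attrib dict is non-empty.
child_with_attrs = [e.get("attr") for e in ROOT.iter("child") if e.attrib]  # //child[@*]

# //@*: every (element tag, attribute name) pair in the document.
all_attrs = [(e.tag, name) for e in ROOT.iter() for name in e.attrib]

# node() has no counterpart: the text "text" lives on the element, not in a text node.
first_text = ROOT.find("child").text
```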
Combining paths
Just as CSS combines several selectors with commas, XPath combines several paths with the | operator:
- /root | /root/child: selects the root element and its child elements
- //child | //desc: selects all child elements and all desc elements
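In XPath 1.0 the result of | is a node set: each node appears once, and node sets are processed in document order. Python's ElementTree does not support | at all, so the sketch below is a hypothetical union helper that emulates it by merging two findall results, dropping duplicates, and sorting by document order.

```python
import xml.etree.ElementTree as ET

ROOT = ET.fromstring("<root><child><desc/></child><child/><desc/></root>")


def union(context, *paths):
    """Rough emulation of XPath "a | b" on top of ElementTree: merge the node sets,
    drop duplicates, and return the nodes in document order."""
    position = {elem: i for i, elem in enumerate(context.iter())}
    merged = {}  # dict keys keep each element only once
    for path in paths:
        for elem in context.findall(path):
            merged[elem] = None
    return sorted(merged, key=position.__getitem__)


root_and_children = [e.tag for e in union(ROOT, ".", "child")]       # /root | /root/child
child_or_desc = [e.tag for e in union(ROOT, ".//child", ".//desc")]  # //child | //desc
```

The result of //child | //desc interleaves child and desc elements, because the order comes from the document rather than from the order of the two operands.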
XPath operators
The following operators can be used in XPath expressions:
- |, union of two node sets
- +, addition
- -, subtraction
- *, multiplication
- div, division (/ is already used as the path separator, so it cannot also mean division)
- mod, remainder
- =, equal to
- !=, not equal to
- <, less than
- <=, less than or equal to
- >, greater than
- >=, greater than or equal to
- or, logical or
- and, logical and
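The arithmetic operators follow their own rules, which differ from many programming languages. Per the XPath 1.0 specification, div is IEEE 754 floating-point division, and mod returns the remainder of a truncating division, so the result takes the sign of the dividend. The minimal Python sketch below reproduces those rules; xpath_div and xpath_mod are hypothetical names, and Python is used only because its own / and % behave differently (Python raises on division by zero, and its % follows the sign of the divisor).

```python
import math


def xpath_div(a: float, b: float) -> float:
    """XPath 1.0 "div": IEEE 754 division, so dividing by zero yields
    Infinity or NaN instead of raising an error."""
    if b == 0:
        if a == 0 or math.isnan(a):
            return math.nan
        # The infinity's sign depends on both operands (b may be -0.0).
        return math.copysign(math.inf, a) * math.copysign(1.0, b)
    return a / b


def xpath_mod(a: float, b: float) -> float:
    """XPath 1.0 "mod": remainder of a truncating division, so the result has
    the sign of the dividend (Python's % follows the divisor instead)."""
    if b == 0 or math.isinf(a) or math.isnan(a) or math.isnan(b):
        return math.nan
    return math.fmod(a, b)
```

With these rules, 5 mod 2 and 5 mod -2 both give 1, while -5 mod 2 and -5 mod -2 both give -1, matching the examples in the XPath 1.0 specification.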
XPath axes
An axis defines a node set relative to the current node. The available axes are:
- ancestor selects all ancestors of the current node (parent, grandparent, and so on)
- ancestor-or-self selects all ancestors of the current node and the current node itself
- attribute selects all attributes of the current node
- child selects all children of the current node
- descendant selects all descendants of the current node (children, grandchildren, and so on)
- descendant-or-self selects all descendants of the current node and the current node itself
- following selects all nodes that come after the end tag of the current node in the document
- following-sibling selects all sibling nodes after the current node
- namespace selects all namespace nodes of the current node
- parent selects the parent of the current node
- preceding selects all nodes that come before the start tag of the current node in the document, excluding its ancestors
- preceding-sibling selects all sibling nodes before the current node
- self selects the current node
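To make the axis definitions concrete, here is a minimal sketch that computes several axes by hand over a Python xml.etree.ElementTree tree. The function names are hypothetical helpers, not an ElementTree API. Only element nodes are considered, the document node is not represented (so ancestor stops at the root element), and because ElementTree elements do not know their parent, a parent lookup table is built first.

```python
import xml.etree.ElementTree as ET

ROOT = ET.fromstring("<root><a><b/><c><d/></c><e/></a><f/></root>")
# ElementTree elements do not point to their parent, so build the lookup once.
PARENT = {child: parent for parent in ROOT.iter() for child in parent}
DOC_ORDER = list(ROOT.iter())  # every element in document order


def ancestor(node):
    """ancestor axis: parent, grandparent, ... (nearest first)."""
    result = []
    while node in PARENT:
        node = PARENT[node]
        result.append(node)
    return result


def descendant(node):
    """descendant axis: children, grandchildren, ... in document order."""
    return [e for e in node.iter() if e is not node]


def following_sibling(node):
    """following-sibling axis: later children of the same parent."""
    siblings = list(PARENT[node]) if node in PARENT else []
    return siblings[siblings.index(node) + 1:] if siblings else []


def preceding_sibling(node):
    """preceding-sibling axis: earlier children of the same parent."""
    siblings = list(PARENT[node]) if node in PARENT else []
    return siblings[: siblings.index(node)] if siblings else []


def following(node):
    """following axis: after the node's end tag, so its descendants are excluded."""
    inside = set(node.iter())
    return [e for e in DOC_ORDER[DOC_ORDER.index(node) + 1:] if e not in inside]


def preceding(node):
    """preceding axis: before the node's start tag, excluding its ancestors."""
    ancestors = set(ancestor(node))
    return [e for e in DOC_ORDER[: DOC_ORDER.index(node)] if e not in ancestors]


def tags(nodes):
    return [n.tag for n in nodes]


C = ROOT.find(".//c")  # the current node for the examples
```

Taking c as the current node, following yields e and f but not c's own child d, and preceding yields b but not the ancestors a and root, which is exactly the start-tag/end-tag distinction in the list above.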
In fact, a complete XPath expression is a sequence of steps separated by "/", and each step consists of an axis, a node test, and optional predicates:
- step/step/...: an XPath expression
- axisname::nodetest[predicate]: the form of a single step
When no further condition is needed, the predicate is simply omitted, and when the axis name is omitted, child is used by default, so "abc" and "child::abc" are equivalent. Here are some abbreviated expressions and their equivalents with explicit axis names:
- child::abc is equivalent to abc (child elements named abc)
- root/attribute::id is equivalent to root/@id (the id attribute of root)
- self::node() is equivalent to . (the current node itself)
- parent::node() is equivalent to .. (the parent node)
- child::* is equivalent to * (all child elements)
- child::text() is equivalent to text() (child text nodes)
- descendant::tag selects the same nodes as .//tag (descendant tag elements)
XPath also includes a function library; position() and last(), for example, are functions. Functions are usually used in predicates, and they are used much more extensively in XSLT and XQuery.
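The abbreviations above are purely syntactic, so they can be expanded mechanically. The sketch below applies the abbreviation rules from the XPath 1.0 specification (child:: is the default axis, @ means attribute::, . means self::node(), .. means parent::node(), and // means /descendant-or-self::node()/). The helpers expand_step and expand are hypothetical and deliberately simplified: they leave predicates untouched and assume no "/" appears inside a predicate or string literal.

```python
def expand_step(step: str) -> str:
    """Expand one abbreviated step into axisname::nodetest[predicate] form.
    Predicates are left as they are."""
    if step == ".":
        return "self::node()"
    if step == "..":
        return "parent::node()"
    if step.startswith("@"):
        return "attribute::" + step[1:]
    if "::" in step.split("[", 1)[0]:
        return step  # the step already names its axis
    return "child::" + step  # no axis given: child is the default


def expand(path: str) -> str:
    """Expand an abbreviated relative or absolute path (not the bare "/").
    Simplification: assumes no "/" occurs inside predicates or string literals."""
    path = path.replace("//", "/descendant-or-self::node()/")
    absolute = path.startswith("/")
    steps = [expand_step(s) for s in path.strip("/").split("/")]
    return ("/" if absolute else "") + "/".join(steps)
```

For example, expand(".//tag") gives self::node()/descendant-or-self::node()/child::tag, which is why .//tag selects the same elements as descendant::tag.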
XPath in the browser
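IE's XPath implementation is simple. An XML DOM object (and every node in it) has selectSingleNode and selectNodes methods that take an XPath expression: selectNodes returns a list of matching nodes, and selectSingleNode returns only the first item of that list.
```javascript
var xmlDom = getXmlDom();          // cross-browser XML DOM factory written earlier (not shown here)
loadXmlFile(xmlDom, "text.xml");   // helper written earlier that loads text.xml into xmlDom
// MSXML 3.0 defaults to the older XSLPattern syntax, which lacks last();
// selecting XPath explicitly is harmless where XPath is already the default.
xmlDom.setProperty("SelectionLanguage", "XPath");
var root = xmlDom.selectSingleNode("/*");                   // the document root element
root = xmlDom.selectNodes("/*").item(0);                    // same result via the node list
var lastChild = xmlDom.selectSingleNode("/*/*[last()]");    // last child element of the root
```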
Mozilla supports XPath through the DOM standard: DOM Level 3 XPath, a supplement to DOM Level 3, defines interfaces for evaluating XPath expressions in the DOM. Unfortunately, this standard is considerably more complex than Microsoft's intuitive approach.
Although there are many XPath-related objects, the two most important are XPathEvaluator and XPathResult. XPathEvaluator evaluates an XPath expression with its evaluate() method.
The evaluate() method takes five parameters: the XPath expression, the context node, a namespace resolver, the type of result to return, and an existing XPathResult object in which to store the result (usually null).
The namespace resolver is needed only when the XML document uses XML namespaces; otherwise it is usually passed as null. The result type can be one of the following 10 constants:
- XPathResult.ANY_TYPE: returns whatever type the expression naturally produces
- XPathResult.ANY_UNORDERED_NODE_TYPE: returns a single matching node, which is not necessarily the first one in document order
- XPathResult.BOOLEAN_TYPE: returns a Boolean value
- XPathResult.FIRST_ORDERED_NODE_TYPE: returns a single node, the first matching node in document order
- XPathResult.NUMBER_TYPE: returns a number
- XPathResult.ORDERED_NODE_ITERATOR_TYPE: returns an iterator over the matching nodes in the order they appear in the document. This is the most commonly used result type
- XPathResult.ORDERED_NODE_SNAPSHOT_TYPE: returns a snapshot of the matching nodes that is held independently of the document, so later changes to the document do not change which nodes are in the list; the nodes are in document order
- XPathResult.STRING_TYPE: returns a string
- XPathResult.UNORDERED_NODE_ITERATOR_TYPE: returns an iterator over the matching nodes, not necessarily in document order
- XPathResult.UNORDERED_NODE_SNAPSHOT_TYPE: returns a snapshot of the matching nodes, like ORDERED_NODE_SNAPSHOT_TYPE, but not necessarily in document order
Here is an example that uses ORDERED_NODE_ITERATOR_TYPE:
```javascript
var xmlDom = getXmlDom();          // cross-browser XML DOM factory written earlier (not shown here)
loadXmlFile(xmlDom, "text.xml");   // helper written earlier that loads text.xml into xmlDom
var evaluator = new XPathEvaluator();
try {
    // evaluate() throws on an invalid expression rather than returning null
    var result = evaluator.evaluate("/root", xmlDom, null,
                                    XPathResult.ORDERED_NODE_ITERATOR_TYPE, null);
    var node;
    // an iterator result must be walked with iterateNext(), which returns null at the end
    while ((node = result.iterateNext())) {
        alert(node.tagName);
    }
} catch (e) {
    alert("XPath evaluation failed: " + e.message);
}
```