XPath was introduced to locate nodes precisely when matching against the tree structure of an XML document. It can be compared to a file system path: just as a file path lets you find the file you need by following certain rules, an XPath expression lets you find any node in the XML document tree.
Because XPath is used by more than one standard, the W3C published it as a separate specification that accompanies XSLT. It is an important part of both XSLT and XPointer, which we will discuss later.
Before we introduce the matching rules of XPath, let's look at some of its basic concepts.
The first is the XPath data types. XPath has four types of data:
- Node set (node-set)
A node set is the set of nodes that a path expression returns because they match its criteria. No other type of data can be converted to a node set.
- Boolean (boolean)
The value returned by a condition, whether a function or a Boolean expression. As in general-purpose languages, it has two values, true and false. Booleans can be converted to and from numbers and strings.
- String (string)
A string is a sequence of characters, and XPath provides a set of string functions for working with it. Strings can be converted to and from numbers and Booleans.
- Number (number)
Numbers in XPath are double-precision 64-bit floating-point numbers. They include some special values, such as NaN (not-a-number), positive infinity (Infinity), negative infinity (-Infinity), and positive and negative zero. The integer part of a number can be obtained with a function, and numbers can also be converted to and from Booleans and strings.
The last three data types are similar to the corresponding types in other programming languages; only the first, the node set, is specific to the XML document tree.
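The conversions between these types follow fixed rules. As a rough illustration, here is a minimal Python sketch of how the XPath 1.0 boolean() and number() conversions behave. The function names `to_boolean` and `to_number` are hypothetical, and a node set is modelled simply as a Python list; this is not how any particular XPath engine is implemented.

```python
import math
import re

# XPath 1.0 number syntax: digits with an optional fraction, an optional
# leading minus sign and optional surrounding whitespace. A leading "+"
# and exponent notation are not part of the syntax.
_NUMBER = re.compile(r"^[ \t\r\n]*-?([0-9]+(\.[0-9]*)?|\.[0-9]+)[ \t\r\n]*$")


def to_boolean(value):
    """Sketch of boolean(): accepts bool, number, str or list (node set)."""
    if isinstance(value, bool):
        return value
    if isinstance(value, (int, float)):
        # A number is true unless it is zero or NaN.
        return not (value == 0 or math.isnan(value))
    # Strings and node sets are true when they are not empty.
    return len(value) > 0


def to_number(value):
    """Sketch of number() for bool, number and str arguments."""
    if isinstance(value, bool):
        return 1.0 if value else 0.0
    if isinstance(value, (int, float)):
        return float(value)
    # Any string that does not match the number syntax becomes NaN.
    return float(value) if _NUMBER.match(value) else math.nan


print(to_boolean("false"))   # a non-empty string is true, whatever it says
print(to_boolean(0.0))
print(to_number(" 12.5 "))
print(to_number("abc"))
```

Note that the string "false" converts to true, because only the length of the string matters.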
In addition, because XPath consists of operations on the document tree, we also need to be clear about the XPath node types. Recall the logical structure of an XML document described in Chapter II: an XML file can contain elements, CDATA, comments, processing instructions and other logical components; elements can carry attributes, and attributes can be used to declare namespaces. Accordingly, XPath divides nodes into seven types:
- Root node
The root node is the top of the tree, and it is unique. All other element nodes in the tree are its children or descendants. The root node is processed in the same way as other nodes. In XSLT, matching against the tree always starts from the root node.
- Element node (element nodes)
There is an element node for each element in the document. The children of an element node can be element nodes, comment nodes, processing instruction nodes and text nodes. An element node can be given a unique identifier (ID).
An element node has an expanded-name, which is made up of two parts: a namespace URI and a local name.
- Text node (text nodes)
A text node holds a run of character data, including the characters inside CDATA sections. A text node never has an adjacent sibling text node, and it has no expanded-name.
- Attribute node (attribute nodes)
Each element node has an associated set of attribute nodes. The element is the parent of each of these attribute nodes, but an attribute node is not a child of its parent element. In other words, the relationship is one-way: you can get from an attribute node to its element through the parent relationship, but looking through the child nodes of an element will not find its attributes. Attribute nodes are also not shared, meaning that different element nodes never share the same attribute node.
A defaulted attribute is treated the same as a specified one. If an attribute is declared in the DTD as #IMPLIED and is not specified on the element, the attribute set of the element does not contain a node for it.
In addition, there are no attribute nodes for the attributes that declare namespaces. A namespace declaration corresponds to a node of another type.
- Namespace node (namespace nodes)
Each element node has an associated set of namespace nodes. In XML documents, namespaces are declared with reserved attributes, so in XPath namespace nodes closely resemble attribute nodes: their relationship to the parent element is likewise one-way, and they are not shared.
- Processing instruction node (processing instruction nodes)
There is a processing instruction node for each processing instruction in the XML document. It also has an expanded-name: the local part is the target of the processing instruction, and the namespace part is empty.
- Comment node (comment nodes)
There is a comment node for each comment in the document.
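To get a feel for these node types, the following Python sketch parses a tiny document with the standard xml.dom.minidom module and lists the kinds of nodes it finds. The document string is made up for illustration. Keep in mind that the DOM is a different model from XPath's (for example, the DOM exposes the owning element of an attribute as `ownerElement` rather than as its parent), so this is only an approximation of the XPath view.

```python
from xml.dom import Node, minidom

# A made-up document with a processing instruction, a comment, an element
# with an attribute, some text and a child element.
SOURCE = '<?target data?><!-- note --><a id="A1">hello<b/></a>'
doc = minidom.parseString(SOURCE)

NAMES = {
    Node.ELEMENT_NODE: "element",
    Node.TEXT_NODE: "text",
    Node.COMMENT_NODE: "comment",
    Node.PROCESSING_INSTRUCTION_NODE: "processing instruction",
}

# Nodes directly under the document (the counterpart of the root node).
top_level = [NAMES[n.nodeType] for n in doc.childNodes]

a = doc.documentElement
children_of_a = [NAMES[n.nodeType] for n in a.childNodes]

# The attribute belongs to <a>, yet it is not among the children of <a>.
id_attr = a.getAttributeNode("id")
attr_is_child = any(n is id_attr for n in a.childNodes)

print(top_level)
print(children_of_a)
print(id_attr.ownerElement is a, attr_is_child)
print(doc.firstChild.target)   # the target of the processing instruction
```

The attribute check mirrors the one-way relationship described above: the attribute knows its element, but the element's child list does not contain the attribute.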
Next, we construct an XML document tree, which the following examples are based on:

    <a id="A1">
      <b id="B1">
        <c id="C1">
          <b name="B"/>
          <d id="D1"/>
          <e id="E1"/>
          <e id="E2"/>
        </c>
      </b>
      <b id="B2"/>
      <c id="C2">
        <b/>
        <d id="D2"/>
        <f/>
      </c>
      <e/>
    </a>
Now, let's introduce some basic methods of node matching in XPath.
- Path matching
Path matching resembles file path notation and is the easiest to understand. It uses the following symbols:
| Symbol | Meaning | Example | Match result |
| --- | --- | --- | --- |
| / | Separates the steps of a node path | /a/c/d | The child d of the child c of node a, that is, the d element with ID D2 |
| | | / | The root node |
| // | All elements whose path ends with the sub-path given after "//" | //e | All e elements; the result is all three e elements |
| | | //c/e | All e elements whose parent is a c element; the result is the two e elements with IDs E1 and E2 |
| * | Path wildcard | /a/b/c/* | All child elements under a → b → c, that is, the b element whose name is B, the d element with ID D1, and the two e elements with IDs E1 and E2 |
| | | /*/*/d | The d elements that have two levels of nodes above them; the result is the d element with ID D2 |
| | | //* | All elements |
| \| | Logical OR | //b \| //c | All b elements and all c elements |
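These path expressions can be tried out with Python's standard xml.etree.ElementTree module. It implements only a subset of XPath and evaluates paths relative to an element, so the sketch below makes two assumptions: the sample document is written with lowercase element names, and its root is placed inside a hypothetical holder element so that `./a/...` can stand in for `/a/...`. The helper names `describe` and `select` are also made up for this example.

```python
import xml.etree.ElementTree as ET

# The sample document, written with lowercase element names (an assumption).
XML = (
    '<a id="A1"><b id="B1"><c id="C1"><b name="B"/><d id="D1"/>'
    '<e id="E1"/><e id="E2"/></c></b><b id="B2"/>'
    '<c id="C2"><b/><d id="D2"/><f/></c><e/></a>'
)

# ElementTree rejects absolute paths on an element, so the root <a> is put
# inside a holder element; "./a/..." then plays the role of "/a/...".
doc = ET.Element("holder")
doc.append(ET.fromstring(XML))


def describe(elem):
    """Return a short label such as 'd[id=D2]', or just the tag name."""
    attrs = ",".join(f"{k}={v}" for k, v in elem.attrib.items())
    return f"{elem.tag}[{attrs}]" if attrs else elem.tag


def select(path):
    """Evaluate an ElementTree path on the holder and label the matches."""
    return [describe(e) for e in doc.findall(path)]


print(select("./a/c/d"))      # stands in for /a/c/d
print(select(".//e"))         # stands in for //e
print(select(".//c/e"))       # stands in for //c/e
print(select("./a/b/c/*"))    # stands in for /a/b/c/*
print(select("./*/*/d"))      # stands in for /*/*/d
print(len(select(".//*")))    # stands in for //*

# ElementTree has no "|" operator. Joining two result lists gives the same
# elements as //b | //c, although not in document order.
print(select(".//b") + select(".//c"))
```

A full XPath 1.0 engine would accept the expressions from the table directly; the rewriting here is only needed because of the limits of ElementTree.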
- Position matching
The child elements of every element are ordered, so they can be matched by position. For example:
| Example | Meaning | Match result |
| --- | --- | --- |
| /a/b/c/*[1] | The first child element of a → b → c | The b element whose name is B |
| /a/b/c/*[last()] | The last child element of a → b → c | The e element with ID E2 |
| /a/b/c/*[position() > 1] | The child elements of a → b → c whose position is greater than 1 | The d element with ID D1 and the two e elements with IDs E1 and E2 |
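The position tests can be imitated in Python with list indexing. The sketch below assumes the sample document with lowercase element names. It uses the standard xml.etree.ElementTree module, whose positional predicates are only reliable after a named step such as `e[1]`, so the wildcard cases are handled by slicing the list of children instead.

```python
import xml.etree.ElementTree as ET

# The sample document, written with lowercase element names (an assumption).
XML = (
    '<a id="A1"><b id="B1"><c id="C1"><b name="B"/><d id="D1"/>'
    '<e id="E1"/><e id="E2"/></c></b><b id="B2"/>'
    '<c id="C2"><b/><d id="D2"/><f/></c><e/></a>'
)
root = ET.fromstring(XML)   # root is the <a> element

# All child elements of a -> b -> c, in document order.
children = root.findall("./b/c/*")

first = children[0]     # /a/b/c/*[1]   (XPath positions start at 1, lists at 0)
last = children[-1]     # /a/b/c/*[last()]
rest = children[1:]     # /a/b/c/*[position() > 1]

print(first.attrib)
print(last.attrib)
print([e.attrib for e in rest])

# After a named step, ElementTree's own predicate can be used directly.
last_e = root.findall("./b/c/e[last()]")   # /a/b/c/e[last()]
print([e.get("id") for e in last_e])
```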
- Attributes and attribute values
You can use attributes and attribute values in XPath to match an element. Note that the name of an attribute must be preceded by the "@" prefix. For example:
| Example | Meaning | Match result |
| --- | --- | --- |
| //b[@id] | All b elements that have an id attribute | The two b elements with IDs B1 and B2 |
| //b[@*] | All b elements that have any attribute | The two b elements with an id attribute and the one b element with a name attribute |
| //b[not(@*)] | All b elements that have no attributes | The b element under a → c |
| //b[@id = "B1"] | The b element whose id is B1 | The b element with ID B1, directly under a |
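Here is a small Python sketch of the attribute tests, again assuming the sample document with lowercase element names. The standard xml.etree.ElementTree module understands `[@id]` and `[@id='B1']`, but not `[@*]` or `not()`, so those two cases are emulated by filtering on each element's attribute dictionary.

```python
import xml.etree.ElementTree as ET

# The sample document, written with lowercase element names (an assumption).
XML = (
    '<a id="A1"><b id="B1"><c id="C1"><b name="B"/><d id="D1"/>'
    '<e id="E1"/><e id="E2"/></c></b><b id="B2"/>'
    '<c id="C2"><b/><d id="D2"/><f/></c><e/></a>'
)
root = ET.fromstring(XML)   # root is the <a> element

with_id = root.findall(".//b[@id]")       # //b[@id]
b1 = root.findall(".//b[@id='B1']")       # //b[@id = "B1"]

# Not supported by ElementTree, so filter in Python instead.
any_attr = [b for b in root.iter("b") if b.attrib]       # //b[@*]
no_attr = [b for b in root.iter("b") if not b.attrib]    # //b[not(@*)]

print([b.attrib for b in with_id])
print([b.attrib for b in b1])
print([b.attrib for b in any_attr])
print(len(no_attr))
```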
- Family relationship matching
An XML document forms a tree, so no node stands alone. We usually describe the relationships between nodes in family terms, such as parent, child, ancestor, descendant and sibling. These concepts can also be used when matching elements. For example:
| Example | Meaning | Match result |
| --- | --- | --- |
| //e/parent::* | The parent elements of all e elements | The a element with ID A1 and the c element with ID C1 |
| //f/ancestor::* | The ancestor elements of all f elements | The a element with ID A1 and the c element with ID C2 |
| /a/child::* | The child elements of a | The b elements with IDs B1 and B2, the c element with ID C2, and the e element without attributes |
| /a/descendant::* | All descendant elements of a | Every element except a itself |
| //f/self::* | The f elements themselves | The f element itself |
| //f/ancestor-or-self::* | All f elements and their ancestor elements | The f element, its parent c element, and the a element |
| /a/c/descendant-or-self::* | The a → c element and all of its descendant elements | The c element with ID C2 and its child elements b, d and f |
| /a/c/following-sibling::* | All sibling elements that come after the a → c element | The e element without attributes |
| /a/c/preceding-sibling::* | All sibling elements that come before the a → c element | The two b elements with IDs B1 and B2 |
| /a/b/c/following::* | All elements that come after the a → b → c element in the document, excluding its descendants | The b element with ID B2, the c element with ID C2, the b element without attributes, the d element with ID D2, the f element, and the e element without attributes |
| /a/c/preceding::* | All elements that come before the a → c element in the document, excluding its ancestors | The b element with ID B2, the e element with ID E2, the e element with ID E1, the d element with ID D1, the b element whose name is B, the c element with ID C1, and the b element with ID B1 |
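To make the family relationships concrete, the following Python sketch emulates a few of these axes by hand. Python's standard xml.etree.ElementTree module does not implement named axes, so the sketch builds a child-to-parent map and a document-order list and works from those. The helper names (`ancestors`, `following_siblings`, `following`, `preceding`, `label`) are hypothetical, and the sample document is assumed to use lowercase element names.

```python
import xml.etree.ElementTree as ET

# The sample document, written with lowercase element names (an assumption).
XML = (
    '<a id="A1"><b id="B1"><c id="C1"><b name="B"/><d id="D1"/>'
    '<e id="E1"/><e id="E2"/></c></b><b id="B2"/>'
    '<c id="C2"><b/><d id="D2"/><f/></c><e/></a>'
)
root = ET.fromstring(XML)                            # the <a> element
order = list(root.iter())                            # elements in document order
parent = {child: p for p in order for child in p}    # child -> parent element


def label(elem):
    """Return a short label such as 'c[id=C2]', or just the tag name."""
    attrs = ",".join(f"{k}={v}" for k, v in elem.attrib.items())
    return f"{elem.tag}[{attrs}]" if attrs else elem.tag


def ancestors(elem):
    """Emulate ancestor::* (nearest ancestor first)."""
    found = []
    while elem in parent:
        elem = parent[elem]
        found.append(elem)
    return found


def following_siblings(elem):
    """Emulate following-sibling::*."""
    if elem not in parent:
        return []
    siblings = list(parent[elem])
    return siblings[siblings.index(elem) + 1:]


def following(elem):
    """Emulate following::* (later in the document, descendants excluded)."""
    inside = set(elem.iter())
    return [e for e in order[order.index(elem) + 1:] if e not in inside]


def preceding(elem):
    """Emulate preceding::* (earlier in the document, ancestors excluded)."""
    skip = set(ancestors(elem))
    return [e for e in order[:order.index(elem)] if e not in skip]


f = root.find(".//f")
c1 = root.find("./b/c")    # the c element with ID C1
c2 = root.find("./c")      # the c element with ID C2

# //e/parent::*  (dict.fromkeys drops duplicates while keeping the order)
parents_of_e = list(dict.fromkeys(parent[e] for e in root.iter("e")))

print([label(e) for e in parents_of_e])             # //e/parent::*
print([label(e) for e in ancestors(f)])             # //f/ancestor::*
print([label(e) for e in following_siblings(c2)])   # /a/c/following-sibling::*
print([label(e) for e in following(c1)])            # /a/b/c/following::*
print([label(e) for e in preceding(c2)])            # /a/c/preceding::*
```

The helpers return elements in document order, except `ancestors`, which lists the nearest ancestor first.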
- Conditional matching
Conditional matching uses the Boolean result of certain functions to select the nodes that satisfy a condition. The functions commonly used in conditional matching fall into four classes: node functions, string functions, numeric functions and Boolean functions. Examples include last() and position(), mentioned earlier; we will not repeat them here.
Of the matching methods above, path matching is the most widely used.