Use XPath to read XML files
XPath exists so that a node can be located precisely within the tree structure of an XML document. You can compare an XPath expression to a file system path: just as a file path leads to a particular file by following certain rules, an XPath expression leads to any node in the XML document tree by following the rules that XPath defines. Because XPath is used by more than one standard, the W3C published it as a specification of its own; it is an essential part of both XSLT and XPointer, which we will discuss later. Before introducing XPath's matching rules, let's look at some basic concepts, starting with data types. In XPath 1.0, an expression evaluates to one of four data types:
- Node-set
A node-set is the collection of nodes returned by a path match. No other data type can be converted to a node-set.
- Boolean
A Boolean is the value produced by a condition, that is, by a Boolean expression or by a function that tests one. As in general-purpose languages, it is either true or false. A Boolean can be converted to a number or a string.
- String
A string is a sequence of characters, and XPath provides a set of string functions for working with it. A string can be converted to a number or a Boolean.
- Number
In XPath, a number is a 64-bit double-precision floating-point value. Besides ordinary values, it includes the special values NaN (Not-a-Number), positive infinity, negative infinity, and positive and negative zero. Integer values can be obtained with functions such as floor(), ceiling(), and round(). A number can be converted to a Boolean or a string.
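The conversions between these types can be tried out directly. The following is a minimal sketch, assuming .NET's `XPathNavigator.Evaluate`, which returns a `double`, `bool`, `string`, or node iterator depending on the type of the expression. The tiny document, the class name `XPathTypesDemo`, and the helper `Eval` are made up for this illustration.

```csharp
using System;
using System.Xml;
using System.Xml.XPath;

public static class XPathTypesDemo
{
    // Hypothetical helper: evaluates an expression against a tiny made-up document.
    public static object Eval(string expression)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml("<A id='a1'><B id='b1'/><B id='b2'/></A>");
        return doc.CreateNavigator().Evaluate(expression);
    }

    public static void Main()
    {
        string[] expressions =
        {
            "boolean(//B)",     // node-set to Boolean: true when the set is not empty
            "string(//B/@id)",  // node-set to string: value of the first node in document order
            "number('12.5')",   // string to number
            "number(true())",   // Boolean to number
            "boolean(0)",       // number to Boolean
            "floor(3.7)"        // integer value of a number
        };
        foreach (string expression in expressions)
        {
            object result = Eval(expression);
            Console.WriteLine("{0} -> {1} ({2})", expression, result, result.GetType().Name);
        }

        // A bare path yields a node-set, which .NET exposes as an iterator.
        Console.WriteLine(Eval("//B") is XPathNodeIterator);
    }
}
```

Note that there is no function that turns a string, number, or Boolean back into a node-set, which matches the rule stated for node-sets above.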
The last three data types resemble their counterparts in other programming languages, but the node-set is unique to the XML document tree. Because XPath operates on the document tree, you also need to understand its node types. Recall the logical structure of an XML document described in chapter 2: an XML file can contain elements, CDATA sections, comments, processing instructions, and other logical components. Elements can carry attributes, and attributes can also be used to declare namespaces. Correspondingly, XPath distinguishes seven node types:
- Root node
The root node is the top of the tree, and it is unique. Every element node in the tree is its child or descendant (the document element itself is a child of the root node). The root node is processed in the same way as any other node. In XSLT, tree matching always starts from the root node.
- Element node
There is an element node for every element in the document. Its children can be element nodes, comment nodes, processing instruction nodes, and text nodes. An element node may have a unique ID. Every element node has an expanded-name made up of two parts: a namespace URI and a local name.
- Text node
A text node holds a run of character data, including the characters inside CDATA sections. A text node never has another text node as an immediately adjacent sibling, and it has no expanded-name.
- Attribute node
Every element node has an associated set of attribute nodes. The element is the parent of each of its attribute nodes, but an attribute node is not a child of its parent element. The relationship is therefore one-way: you can reach the element from the attribute, but a match on the element's children never returns its attributes. Attribute nodes are not shared: no two element nodes have the same attribute node. An attribute that takes a default value is treated exactly like one that is specified explicitly. If an attribute is declared in the DTD as #IMPLIED and the element does not specify it, the element has no attribute node for it. Finally, no attribute node corresponds to a namespace declaration; namespace declarations are represented by another node type.
- Namespace node
Every element node has an associated set of namespace nodes. In XML documents, namespaces are declared with reserved attributes (xmlns), so in XPath these nodes closely resemble attribute nodes: the relationship with the parent element is one-way, and the nodes are not shared.
- Processing instruction node
There is a processing instruction node for every processing instruction in the XML document. It also has an expanded-name: the local name is the target of the processing instruction, and the namespace part is empty.
- Comment node
There is a comment node for every comment in the document.
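The one-way relationship between an element and its attributes is easier to see in code. Here is a small sketch, assuming .NET's `XmlDocument.SelectNodes`. The document, the class name `NodeTypesDemo`, and the helper `Select` are invented for this illustration.

```csharp
using System;
using System.Xml;

public static class NodeTypesDemo
{
    // Hypothetical document: one attribute, one comment, one element with text,
    // and one processing instruction.
    const string Xml = "<A id='a1'><!-- note --><B>text</B><?target data?></A>";

    // Hypothetical helper: runs an XPath expression against the document above.
    public static XmlNodeList Select(string xpath)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(Xml);
        return doc.SelectNodes(xpath);
    }

    public static void Main()
    {
        // Children of A: the comment, the B element, and the processing instruction.
        // The id attribute is not among them.
        Console.WriteLine(Select("/A/node()").Count);

        // Attributes are reached with @, never as children.
        Console.WriteLine(Select("/A/@*").Count);

        // Going the other way works: the parent of the attribute is element A.
        Console.WriteLine(Select("/A/@id/..")[0].Name);

        // Node tests for the other node types.
        Console.WriteLine(Select("/A/comment()").Count);
        Console.WriteLine(Select("/A/processing-instruction()").Count);
        Console.WriteLine(Select("/A/B/text()").Count);
    }
}
```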
Next, we will construct an XML document tree, which will be used as an example below:

    <A id="a1">
      <B id="b1">
        <C id="c1">
          <B name="B"/>
          <D id="d1"/>
          <E id="e1"/>
          <E id="e2"/>
        </C>
      </B>
      <B id="b2"/>
      <C id="c2">
        <B/>
        <D id="d2"/>
        <F/>
      </C>
      <E/>
    </A>

Now we will introduce some basic methods for node matching in XPath.
- Path Matching
Path matching works much like a file path, so it is easy to understand. It uses the following symbols:

| Symbol | Meaning | Example | Matching result |
| --- | --- | --- | --- |
| `/` | Separates the steps of a node path | `/A/C/D` | The D child of the C child of A, that is, the D element whose id is d2 |
| `/` | On its own, the root node | | |
| `//` | All elements whose paths end with the sub-path that follows `//` | `//E` | All E elements; here, all three of them |
| | | `//C/E` | All E elements whose parent is C, that is, the E elements whose ids are e1 and e2 |
| `*` | Path wildcard | `/A/B/C/*` | Element A → element B → all child elements of element C, that is, the B element whose name is B, the D element whose id is d1, and the two E elements whose ids are e1 and e2 |
| | | `/*/*/D` | D elements that have two levels of elements above them; the match is the D element whose id is d2 |
| | | `//*` | All elements |
| `\|` | Logical or (union) | `//B \| //C` | All B and C elements |

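To check these path expressions yourself, you can run them against the sample tree. The following is a rough sketch, assuming .NET's `XmlDocument.SelectNodes`. The class name `PathMatchingDemo` and the helpers `Describe` and `Select` are hypothetical names introduced only for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

public static class PathMatchingDemo
{
    // The sample tree from this article (single quotes keep the C# string readable).
    const string SampleXml =
        "<A id='a1'>" +
          "<B id='b1'><C id='c1'><B name='B'/><D id='d1'/><E id='e1'/><E id='e2'/></C></B>" +
          "<B id='b2'/>" +
          "<C id='c2'><B/><D id='d2'/><F/></C>" +
          "<E/>" +
        "</A>";

    // Hypothetical helper: labels an element by its name plus its id or name attribute.
    static string Describe(XmlNode node)
    {
        XmlElement element = node as XmlElement;
        if (element == null) return node.NodeType.ToString();
        if (element.HasAttribute("id")) return element.Name + "#" + element.GetAttribute("id");
        if (element.HasAttribute("name")) return element.Name + "[name=" + element.GetAttribute("name") + "]";
        return element.Name;
    }

    // Hypothetical helper: runs an XPath expression against the sample tree.
    public static List<string> Select(string xpath)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(SampleXml);
        List<string> labels = new List<string>();
        foreach (XmlNode node in doc.SelectNodes(xpath))
        {
            labels.Add(Describe(node));
        }
        return labels;
    }

    public static void Main()
    {
        string[] paths = { "/A/C/D", "//E", "//C/E", "/A/B/C/*", "/*/*/D", "//*", "//B | //C" };
        foreach (string path in paths)
        {
            List<string> matches = Select(path);
            Console.WriteLine("{0} ({1}): {2}", path, matches.Count, string.Join(", ", matches));
        }
    }
}
```

Each output line shows the expression, the number of matches, and a label for every matched element, so the results can be compared with the table row by row.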
- Position matching
The child elements of every element are ordered, so they can be matched by position. For example:

| Example | Meaning | Matching result |
| --- | --- | --- |
| `/A/B/C/*[1]` | Element A → element B → the first child element of element C | The B element whose name is B |
| `/A/B/C/*[last()]` | Element A → element B → the last child element of element C | The E element whose id is e2 |
| `/A/B/C/*[position() > 1]` | Element A → element B → the child elements of element C whose position is greater than 1 | The D element whose id is d1 and the two E elements whose ids are e1 and e2 |

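A position predicate applies to the step it is attached to: `/A/B/C/*[1]` selects the first child of C, whereas `/A/B/C[1]` selects the first C child of B. The sketch below runs both forms against the sample tree, assuming .NET's `XmlDocument.SelectNodes`. The class name `PositionDemo` and the helper `Select` are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

public static class PositionDemo
{
    // The sample tree from this article.
    const string SampleXml =
        "<A id='a1'>" +
          "<B id='b1'><C id='c1'><B name='B'/><D id='d1'/><E id='e1'/><E id='e2'/></C></B>" +
          "<B id='b2'/>" +
          "<C id='c2'><B/><D id='d2'/><F/></C>" +
          "<E/>" +
        "</A>";

    // Hypothetical helper: returns "Name#id" (or "Name[name=...]", or just "Name")
    // for every element matched by the expression.
    public static List<string> Select(string xpath)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(SampleXml);
        List<string> labels = new List<string>();
        foreach (XmlElement element in doc.SelectNodes(xpath))
        {
            string label = element.Name;
            if (element.HasAttribute("id")) label += "#" + element.GetAttribute("id");
            else if (element.HasAttribute("name")) label += "[name=" + element.GetAttribute("name") + "]";
            labels.Add(label);
        }
        return labels;
    }

    public static void Main()
    {
        string[] paths =
        {
            "/A/B/C/*[1]",              // first child of C
            "/A/B/C/*[last()]",         // last child of C
            "/A/B/C/*[position() > 1]", // every child of C after the first
            "/A/B/C[1]"                 // for contrast: the first C child of B
        };
        foreach (string path in paths)
        {
            Console.WriteLine("{0}: {1}", path, string.Join(", ", Select(path)));
        }
    }
}
```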
- Attribute and attribute-value matching
In XPath, elements can be matched by their attributes and attribute values. Note that an attribute name must be prefixed with "@". For example:

| Example | Meaning | Matching result |
| --- | --- | --- |
| `//B[@id]` | All B elements that have an id attribute | The two B elements whose ids are b1 and b2 |
| `//B[@*]` | All B elements that have any attribute | The two B elements with an id attribute and the B element with a name attribute |
| `//B[not(@*)]` | All B elements that have no attributes | Element A → the B element under element C |
| `//B[@id="b1"]` | The B element whose id is b1 | The B element under element A |

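The attribute predicates can be verified the same way. This is a minimal sketch against the sample tree, again assuming .NET's `XmlDocument.SelectNodes`. The class name `AttributeDemo` and the helper `Select` are hypothetical, and the last expression is an extra one added here to show where the attribute-less B element sits.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

public static class AttributeDemo
{
    // The sample tree from this article.
    const string SampleXml =
        "<A id='a1'>" +
          "<B id='b1'><C id='c1'><B name='B'/><D id='d1'/><E id='e1'/><E id='e2'/></C></B>" +
          "<B id='b2'/>" +
          "<C id='c2'><B/><D id='d2'/><F/></C>" +
          "<E/>" +
        "</A>";

    // Hypothetical helper: returns "Name#id" (or "Name[name=...]", or just "Name")
    // for every element matched by the expression.
    public static List<string> Select(string xpath)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(SampleXml);
        List<string> labels = new List<string>();
        foreach (XmlElement element in doc.SelectNodes(xpath))
        {
            string label = element.Name;
            if (element.HasAttribute("id")) label += "#" + element.GetAttribute("id");
            else if (element.HasAttribute("name")) label += "[name=" + element.GetAttribute("name") + "]";
            labels.Add(label);
        }
        return labels;
    }

    public static void Main()
    {
        string[] paths =
        {
            "//B[@id]",        // B elements with an id attribute
            "//B[@*]",         // B elements with any attribute
            "//B[not(@*)]",    // B elements with no attributes
            "//B[@id='b1']",   // the B element whose id is b1
            "//B[not(@*)]/.."  // parent of the attribute-less B element
        };
        foreach (string path in paths)
        {
            Console.WriteLine("{0}: {1}", path, string.Join(", ", Select(path)));
        }
    }
}
```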
- Kinship matching
An XML document forms a tree, so no node is isolated. The relationships between nodes are usually described in terms of kinship: parent, child, ancestor, descendant, and sibling. These relationships, which XPath calls axes, can also be used to match elements. For example:

| Example | Meaning | Matching result |
| --- | --- | --- |
| `//E/parent::*` | The parent elements of all E elements | The A element whose id is a1 and the C element whose id is c1 |
| `//F/ancestor::*` | The ancestor elements of all F elements | The A element whose id is a1 and the C element whose id is c2 |
| `/A/child::*` | The child elements of A | The B elements whose ids are b1 and b2, the C element whose id is c2, and the E element without attributes |
| `/A/descendant::*` | All descendant elements of A | Every element except A |
| `//F/self::*` | The F elements themselves | The F element itself |
| `//F/ancestor-or-self::*` | All F elements and their ancestor elements | The F element, its parent C element, and the A element |
| `/A/C/descendant-or-self::*` | Element A → element C and all of its descendant elements | The C element whose id is c2 and its child elements B, D, and F |
| `/A/C/following-sibling::*` | Element A → all sibling elements that follow element C | The E element without attributes |
| `/A/C/preceding-sibling::*` | Element A → all sibling elements that precede element C | The two B elements whose ids are b1 and b2 |
| `/A/B/C/following::*` | Element A → element B → all elements that come after element C in document order | The B element whose id is b2, the C element whose id is c2, the B element without attributes, the D element whose id is d2, the F element without attributes, and the E element without attributes |
| `/A/C/preceding::*` | Element A → all elements that come before element C in document order, excluding its ancestors | The B element whose id is b2, the E element whose id is e2, the E element whose id is e1, the D element whose id is d1, the B element whose name is B, the C element whose id is c1, and the B element whose id is b1 |

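The axis expressions are the easiest ones to get wrong by hand, so it helps to run them. Below is a sketch against the sample tree, assuming .NET's `XmlDocument.SelectNodes`. The class name `AxisDemo` and the helper `Select` are hypothetical. The order in which matches are listed is left to the XPath implementation, so only the set of matches should be compared with the table.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

public static class AxisDemo
{
    // The sample tree from this article.
    const string SampleXml =
        "<A id='a1'>" +
          "<B id='b1'><C id='c1'><B name='B'/><D id='d1'/><E id='e1'/><E id='e2'/></C></B>" +
          "<B id='b2'/>" +
          "<C id='c2'><B/><D id='d2'/><F/></C>" +
          "<E/>" +
        "</A>";

    // Hypothetical helper: returns "Name#id" (or "Name[name=...]", or just "Name")
    // for every element matched by the expression.
    public static List<string> Select(string xpath)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(SampleXml);
        List<string> labels = new List<string>();
        foreach (XmlElement element in doc.SelectNodes(xpath))
        {
            string label = element.Name;
            if (element.HasAttribute("id")) label += "#" + element.GetAttribute("id");
            else if (element.HasAttribute("name")) label += "[name=" + element.GetAttribute("name") + "]";
            labels.Add(label);
        }
        return labels;
    }

    public static void Main()
    {
        string[] paths =
        {
            "//E/parent::*", "//F/ancestor::*", "/A/child::*", "/A/descendant::*",
            "//F/self::*", "//F/ancestor-or-self::*", "/A/C/descendant-or-self::*",
            "/A/C/following-sibling::*", "/A/C/preceding-sibling::*",
            "/A/B/C/following::*", "/A/C/preceding::*"
        };
        foreach (string path in paths)
        {
            List<string> matches = Select(path);
            Console.WriteLine("{0} ({1}): {2}", path, matches.Count, string.Join(", ", matches));
        }
    }
}
```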
- Condition matching
Conditional matching uses the Boolean results of functions to select the nodes that satisfy a condition. Four groups of functions are commonly used for it: node functions, string functions, numeric functions, and Boolean functions. The last() and position() functions used above are examples. These functions help pinpoint the desired node.

| Function | Description |
| --- | --- |
| count() | Returns the number of nodes that meet the conditions |
| number() | Converts the text of an attribute value to a number |
| substring() | Syntax: substring(value, start, length). Extracts part of a string |
| sum() | Returns the sum of the values |

These functions are only part of the XPath syntax. Many more are not covered here, and XPath itself is still evolving. With them, you can build more complex queries and operations. Of the matching methods above, path matching is used most often: it locates nodes from a sub-path given relative to the current node.

Use SelectSingleNode() and SelectNodes() to search for nodes

    using System;
    using System.Xml;

    namespace ConsoleApplication1
    {
        class Program
        {
            static void Main(string[] args)
            {
                XmlDocument doc = new XmlDocument(); // create a document object
                try
                {
                    doc.Load("http://www.cnblogs.com/myOrder.xml");
                    XmlNode root = doc.DocumentElement; // get the root element of the document
                    XmlNode temp;

                    // NOTE: myOrder.xml is not shown in this article. The element names
                    // below are assumed English renderings of the names in that file.

                    // A relative path with a single step only looks at direct children
                    // of the root; temp is null if there is no such child.
                    temp = root.SelectSingleNode("Name");
                    Console.WriteLine("(search 1)" + temp);

                    temp = root.SelectSingleNode("OrdererInformation/Name");
                    Console.WriteLine("(search 2)" + temp.Name + ":" + temp.InnerText);

                    // SelectSingleNode returns only the first match.
                    temp = root.SelectSingleNode("OrderInformation/ProductName");
                    Console.WriteLine("(search 3)" + temp.Name + ":" + temp.InnerText);

                    // SelectNodes returns every match.
                    XmlNodeList templist = root.SelectNodes("OrderInformation/ProductName");
                    Console.WriteLine("(search 4)");
                    foreach (XmlNode nodeinlist in templist)
                    {
                        Console.WriteLine(nodeinlist.Name + ":" + nodeinlist.InnerText);
                    }
                }
                catch (Exception ex)
                {
                    Console.WriteLine(ex.Message);
                }
                Console.ReadLine(); // auxiliary code, used to keep the console window open
            }
        }
    }

Search for nodes in XML (two methods)

    using System;
    using System.Xml;

    namespace ConsoleApplication1
    {
        class Program
        {
            static void Main(string[] args)
            {
                XmlDocument doc = new XmlDocument(); // create a document object
                try
                {
                    doc.Load("http://www.cnblogs.com/myOrder.xml");

                    // NOTE: the element names are assumed English renderings of the
                    // names in myOrder.xml, which is not shown in this article.

                    // Method 1: search for elements in the XmlDocument object
                    Console.WriteLine("Searching for elements in an XmlDocument object");
                    XmlNodeList myNodeList = doc.GetElementsByTagName("ProductName");
                    for (int i = 0; i < myNodeList.Count; i++)
                    {
                        Console.WriteLine(myNodeList[i].Name + ":" + myNodeList[i].InnerText);
                    }

                    // Method 2: search for elements in an XmlElement object
                    Console.WriteLine("Searching for elements in an XmlElement object");
                    XmlElement myElement = doc.DocumentElement;
                    myElement = (XmlElement)myElement.LastChild; // last child of the root element
                    myNodeList = myElement.GetElementsByTagName("Name");
                    for (int i = 0; i < myNodeList.Count; i++)
                    {
                        Console.WriteLine(myNodeList[i].Name + ":" + myNodeList[i].InnerText);
                    }
                }
                catch (Exception ex)
                {
                    Console.WriteLine(ex.Message);
                }
                Console.ReadLine(); // auxiliary code, used to keep the console window open
            }
        }
    }
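
The functions listed under condition matching (count(), number(), substring(), and sum()) do not appear in the listings above, so here is a short sketch of them. It assumes .NET's `XPathNavigator.Evaluate` for expressions that return a value and `SelectNodes` for a function used inside a predicate. The order data, the class name `XPathFunctionsDemo`, and the helpers `Evaluate` and `CountMatches` are made up for this illustration and are not taken from myOrder.xml.

```csharp
using System;
using System.Xml;
using System.Xml.XPath;

public static class XPathFunctionsDemo
{
    // Hypothetical order data, used only for this illustration.
    const string OrderXml =
        "<Order>" +
          "<Item name='Notebook' price='2.5'/>" +
          "<Item name='Pencil case' price='4'/>" +
        "</Order>";

    static XmlDocument Load()
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(OrderXml);
        return doc;
    }

    // Hypothetical helper: evaluates an expression that returns a number, string, or Boolean.
    public static object Evaluate(string expression)
    {
        return Load().CreateNavigator().Evaluate(expression);
    }

    // Hypothetical helper: counts the nodes selected by a path expression.
    public static int CountMatches(string xpath)
    {
        return Load().SelectNodes(xpath).Count;
    }

    public static void Main()
    {
        Console.WriteLine(Evaluate("count(/Order/Item)"));                    // number of Item elements
        Console.WriteLine(Evaluate("sum(/Order/Item/@price)"));               // total of all prices
        Console.WriteLine(Evaluate("number(/Order/Item[1]/@price)"));         // attribute text as a number
        Console.WriteLine(Evaluate("substring(/Order/Item[1]/@name, 1, 4)")); // first four characters

        // A function inside a predicate turns it into a condition match:
        // only items whose price is greater than 3 are selected.
        Console.WriteLine(CountMatches("/Order/Item[number(@price) > 3]"));
    }
}
```

Note that substring() counts positions from 1, not from 0.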