1. XPath Basics
1.1 What Is XPath
XPath is a language for finding information (nodes) in an XML document. It can be used to navigate the elements and attributes of an XML document.
1.2 Nodes
A node is the smallest unit of information that XPath extracts from an XML document. There are seven kinds of nodes:
(1) element node (element)
(2) attribute node (attribute)
(3) text node (text)
(4) namespace node (namespace)
(5) processing-instruction node (processing-instruction)
(6) comment node (comment)
(7) root node
1.3 Node Relationships
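The relationships in this section correspond to XPath axes: `parent::`, `child::`, `ancestor::`, and `descendant::`. Siblings are split into two axes, `preceding-sibling::` and `following-sibling::`. Here is a minimal sketch that walks each relationship from one element. It assumes the lxml library and a small made-up document, and the tag and variable names are hypothetical.

```python
from lxml import etree

# A small, made-up document used only for illustration.
doc = etree.fromstring(
    "<library>"
    "<shelf id='s1'>"
    "<book><title>A</title></book>"
    "<book><title>B</title></book>"
    "<book><title>C</title></book>"
    "</shelf>"
    "</library>"
)

# Use the second <book> (the one titled "B") as the context node.
book = doc.xpath("//book")[1]

# (1) parent: the single node directly above
parent = book.xpath("parent::*")[0]
# (2) child: element nodes directly below
children = book.xpath("child::*")
# (3) sibling: XPath splits siblings into preceding and following axes
siblings = book.xpath("preceding-sibling::* | following-sibling::*")
# (4) ancestor: parent, grandparent, ... up to the document element
ancestors = book.xpath("ancestor::*")
# (5) descendant: children, grandchildren, ...
descendants = book.xpath("descendant::*")

print(parent.tag)                               # shelf
print([c.tag for c in children])                # ['title']
print([s.findtext("title") for s in siblings])  # ['A', 'C']
print([a.tag for a in ancestors])
print([d.tag for d in descendants])             # ['title']
```

Each call returns a Python list of matching nodes in document order. Combining the two sibling axes with `|` gives the complete set of nodes that share the context node's parent.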
(1) parent node (parent): Every element and attribute has one parent node.
(2) child node (child): An element node can have zero, one, or more child nodes.
(3) sibling node (sibling): Nodes that have the same parent.
(4) ancestor node (ancestor): A node's parent, the parent's parent, and so on up to the root.
(5) descendant node (descendant): A node's children, the children's children, and so on.
1.4 Basic XPath Usage
1.4.1 Basic Syntax
(1) // (double slash): Selects matching nodes at any depth below the current node. At the start of an expression it searches the whole document from the root node. lxml returns the matches as a list.
(2) / (single slash): Selects the direct children (the next level down) of the current path that match the next step.
(3) /text(): Gets the text content directly under the current path.
(4) /@xxx: Extracts the value of attribute xxx from the element(s) at the current path.
(5) | (union): Combines several paths. For example, //p | //div selects all p and div elements in the document.
(6) . (dot): Selects the current node.
(7) .. (double dot): Selects the parent of the current node.
(8) * (wildcard): Matches any element node.
(9) @* (wildcard): Matches any attribute.
(10) node() (wildcard): Matches a node of any type.
1.4.2 Element Extraction Example
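This self-contained sketch exercises each item of the syntax in 1.4.1. It assumes the lxml library and a small HTML snippet invented for this illustration. The markup, attribute values, and variable names are hypothetical.

```python
from lxml import etree

# Hypothetical HTML snippet, made up for this sketch.
text = """
<div class="wrapper">
  <ul>
    <li class="item"><a href="link1.html">first item</a></li>
    <li class="item"><a href="link2.html">second item</a></li>
  </ul>
  <p id="note">a paragraph</p>
  <!-- a comment -->
</div>
"""

# Parse the HTML; lxml wraps the fragment in <html><body> automatically.
html = etree.HTML(text)

items = html.xpath("//li")                # (1) //: every <li> in the document
link_texts = html.xpath("//li/a/text()")  # (2) / and (3) text(): text of each <a> under an <li>
hrefs = html.xpath("//li/a/@href")        # (4) /@href: value of the href attribute
p_or_a = html.xpath("//p | //a")          # (5) |: union, returned in document order

first_li = items[0]
same = first_li.xpath(".")                # (6) .: the current node itself
parent_tag = first_li.xpath("..")[0].tag  # (7) ..: parent of the current node

ul_children = html.xpath("//ul/*")        # (8) *: any element child of <ul>
p_attrs = html.xpath("//p/@*")            # (9) @*: every attribute value of <p>
div_nodes = html.xpath("//div/node()")    # (10) node(): children of any type, incl. the comment

print(len(items))
print(link_texts)
print(hrefs)
print([e.tag for e in p_or_a])
print(parent_tag)
print(p_attrs)
```

Text and attribute results come back as lists of strings. Element results are `Element` objects, which support `.tag`, `.get()`, and further relative `.xpath()` calls such as the `.` and `..` lookups above.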
```python
# -*- coding: utf-8 -*-
"""
Created on Tue June 10:23:19 2017
@author: Administrator
"""
from lxml import etree

text = ' '  # HTML source to be queried with XPath
```
The results of the execution are shown in the following illustration: