1. Basic XPath syntax
Installation and use:
1. Download: pip install lxml
2. Import the package: from lxml import etree
3. Convert the HTML or XML document into an etree object, then call the object's xpath() method to find the specified nodes:
3.1 Local file: tree = etree.parse(file_name), then tree.xpath("XPath expression"). For an HTML file, pass etree.HTMLParser() as the second argument, because etree.parse() uses an XML parser by default.
3.2 Network data: tree = etree.HTML(page_source_string), then tree.xpath("XPath expression")
4. Install an XPath plug-in to check XPath expressions in the browser. The plug-in can run an XPath expression directly against the current page. To install it, drag the plug-in into Chrome's Extensions page (More tools > Extensions). Once it is installed, press Ctrl + Shift + X to turn it on and off.
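Here is a minimal sketch of the two entry points, assuming lxml is installed. The page content, the `PAGE` variable, and the file name `demo.html` are hypothetical. The local-file case passes `etree.HTMLParser()` so that ordinary HTML parses cleanly.

```python
import os
import tempfile

from lxml import etree

# Hypothetical page content used for both cases
PAGE = """<html><body>
<div class="song"><p>first</p></div>
</body></html>"""

# 3.1 Local file: write the page to a temporary file, then parse it.
# etree.parse() defaults to an XML parser, so pass an HTMLParser for HTML.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.html")
    with open(path, "w", encoding="utf-8") as f:
        f.write(PAGE)
    file_tree = etree.parse(path, etree.HTMLParser())
    from_file = file_tree.xpath('//div[@class="song"]/p/text()')

# 3.2 Network data: etree.HTML() takes the page source as a string
# (for example, the text of an HTTP response) and returns the root element.
root = etree.HTML(PAGE)
from_string = root.xpath('//div[@class="song"]/p/text()')

print(from_file, from_string)
```

Both calls return a list of matching text nodes. With `text()` or `@attr` in the expression you get a list of strings, and without them you get a list of elements.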
Test page data
<HTML lang = "en">
XPath expressions:
/ locates from the root node and represents one level of the hierarchy; // represents any number of levels and can start locating from any position.
Attribute positioning:
    # Find the div tag whose class attribute value is "song"
    //div[@class="song"]
Level and index positioning:
    # Find the a tag that is a direct child of the second li, under the ul that is a direct child of the div whose class attribute value is "tang" (XPath indexes start at 1)
    //div[@class="tang"]/ul/li[2]/a
Logical operations:
    # Find the a tag whose href attribute is empty and whose class attribute value is "du"
    //a[@href="" and @class="du"]
Fuzzy matching:
    //div[contains(@class, "ng")]
    //div[starts-with(@class, "ta")]
Retrieving text:
    # /text() returns only the text directly under a tag
    # //text() returns the text under a tag plus the text under all of its descendants
    //div[@class="song"]/p[1]/text()
    //div[@class="tang"]//text()
Retrieving attributes:
    //div[@class="tang"]//li[2]/a/@href
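To try the expressions, the following sketch runs each one with lxml against a small page. The markup is hypothetical: it was written only so that every expression above has something to match, and it is not the original test page. Note that XPath is case-sensitive, so tag names, attribute values, and the `and` operator must be lowercase as written here.

```python
from lxml import etree

# Hypothetical test page, built to match the expressions above
PAGE = """<html lang="en"><body>
<div class="song">
  <p>Song paragraph one</p>
  <p>Song paragraph two</p>
</div>
<div class="tang">
  <ul>
    <li><a href="http://example.com/1">Li Bai</a></li>
    <li><a href="http://example.com/2">Du Fu</a></li>
    <li><a href="" class="du">Du Mu</a></li>
  </ul>
</div>
</body></html>"""

tree = etree.HTML(PAGE)

# Attribute positioning: div elements whose class is exactly "song"
song_divs = tree.xpath('//div[@class="song"]')

# Level and index positioning: li[2] is the second li (indexes start at 1)
second_link = tree.xpath('//div[@class="tang"]/ul/li[2]/a')[0]

# Logical operation: empty href AND class "du"
du_links = tree.xpath('//a[@href="" and @class="du"]')

# Fuzzy matching on the class attribute
ng_divs = tree.xpath('//div[contains(@class, "ng")]')     # "song" and "tang"
ta_divs = tree.xpath('//div[starts-with(@class, "ta")]')  # "tang" only

# /text(): only the direct text nodes of the first p
first_p = tree.xpath('//div[@class="song"]/p[1]/text()')
# //text(): text from the div and all its descendants (blank nodes dropped)
tang_text = [t.strip() for t in tree.xpath('//div[@class="tang"]//text()') if t.strip()]

# @href returns the attribute values as strings
href = tree.xpath('//div[@class="tang"]//li[2]/a/@href')

print(len(song_divs), second_link.text, len(du_links))
print(len(ng_divs), len(ta_divs), first_p, tang_text, href)
```

The `//text()` result includes whitespace-only text nodes from the indentation, which is why the sketch strips and filters them before use.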
2. Scraping job postings from BOSS Zhipin
Parsing the data with XPath
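Here is a minimal sketch of the parsing step only, assuming the listing page's source has already been fetched as a string. The markup, the class names (`job-list`, `job-name`, `salary`, `company`), and the `parse_jobs` helper are hypothetical placeholders. They do not describe BOSS Zhipin's actual page structure, which would need to be inspected (for example with the XPath plug-in) before writing real expressions.

```python
from lxml import etree

# Hypothetical listing markup; class names are placeholders only
LISTING_HTML = """<html><body>
<ul class="job-list">
  <li>
    <span class="job-name">Python Developer</span>
    <span class="salary">15-25K</span>
    <a class="company" href="/company/1">Company A</a>
  </li>
  <li>
    <span class="job-name">Data Analyst</span>
    <span class="salary">10-15K</span>
    <a class="company" href="/company/2">Company B</a>
  </li>
</ul>
</body></html>"""


def parse_jobs(page_text):
    """Hypothetical helper: return one dict per posting."""
    tree = etree.HTML(page_text)
    jobs = []
    # Select every posting first, then query each one with relative paths ("./")
    for li in tree.xpath('//ul[@class="job-list"]/li'):
        jobs.append({
            "title": str(li.xpath('./span[@class="job-name"]/text()')[0]),
            "salary": str(li.xpath('./span[@class="salary"]/text()')[0]),
            "company": str(li.xpath('./a[@class="company"]/text()')[0]),
            "company_url": str(li.xpath('./a[@class="company"]/@href')[0]),
        })
    return jobs


jobs = parse_jobs(LISTING_HTML)
for job in jobs:
    print(job["title"], job["salary"], job["company"])
```

The loop first selects each posting and then queries it with paths that start with `./`. Writing `//` inside the loop would search the whole document again, so every posting would pick up the first matching field on the page.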