First, install and configure XPath
XPath support is provided by the lxml library. Download it from a Python third-party package repository, or run pip install lxml
Second, learn to use XPath
Import the etree module:
from lxml import etree
Use XPath to extract the parts of the page source you are interested in:
selector = etree.HTML(html)  (html is the page source, obtained with html = requests.get(...).text)
This converts the downloaded source into a tree that XPath can query.
content = selector.xpath(an XPath expression, the "magical symbol")
This expression can be obtained in the browser's Inspect Element view by right-clicking a node and choosing Copy XPath.
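The parse-then-query steps above can be sketched as follows. The page source is an inline string so the snippet runs offline; in a real crawl it would come from requests.get(...).text. The sample markup and the expression are illustrative assumptions, written in the style that a browser's Copy XPath typically produces.

```python
from lxml import etree

# Stand-in for html = requests.get(...).text; this markup is made up for illustration.
html = """
<html>
  <body>
    <div id="content">
      <h1>Course list</h1>
    </div>
  </body>
</html>
"""

# Parse the source into an element tree that XPath can query.
selector = etree.HTML(html)

# Query the tree; appending /text() returns the text instead of the element.
content = selector.xpath('//*[@id="content"]/h1/text()')
print(content)
```

Note that xpath() always returns a list here, even when only one node matches.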
The idea behind XPath matching:
It follows the structure of the HTML:
1. The document is a tree
2. Expand it layer by layer
3. Locate the target layer by layer
4. Find a uniquely identifiable node
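To make the tree idea concrete, the sketch below prints the absolute path of every element in a small, made-up HTML fragment, indented by depth. It relies on getroottree() and getpath() from lxml; the markup itself is an assumption chosen for illustration.

```python
from lxml import etree

# Made-up fragment; etree.HTML wraps it in <html> and <body> automatically.
html = """
<div id="content">
  <ul id="useful">
    <li>first item</li>
    <li>second item</li>
  </ul>
</div>
"""

selector = etree.HTML(html)
tree = selector.getroottree()

# Visit every element in document order, indenting by its depth in the tree.
for element in selector.iter():
    depth = len(list(element.iterancestors()))
    print('  ' * depth + tree.getpath(element))

# The path to one specific node, built level by level from the root.
first_li = selector.xpath('//ul/li')[0]
path_to_first_li = tree.getpath(first_li)
print(path_to_first_li)
```

Each printed path is one more step down from the root, which is the same layer-by-layer positioning an XPath expression performs.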
To extract content using XPath:
//  locates the starting node (it matches anywhere in the document)
/  steps down one level
/text()  extracts text content
/@xxxx  extracts the value of the attribute xxxx
As an example:
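A minimal sketch of the four pieces of syntax listed above, run against a made-up fragment. The ids, link target, and text are assumptions for illustration only.

```python
from lxml import etree

# Made-up markup for illustration.
html = """
<div id="content">
  <ul id="useful">
    <li>first item</li>
    <li>second item</li>
  </ul>
  <a href="http://example.com/course" title="course page">Course</a>
</div>
"""

selector = etree.HTML(html)

# // starts the match anywhere in the document, / steps down one level,
# and text() returns the text of the matched nodes.
items = selector.xpath('//ul[@id="useful"]/li/text()')

# /@href and /@title return attribute values instead of text.
links = selector.xpath('//a/@href')
titles = selector.xpath('//a/@title')

print(items)
print(links)
print(titles)
```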
A special use of XPath: attribute values that start with the same characters
For example, the starts-with() function can extract all the tags whose attribute value begins with test.
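A sketch of this prefix match, assuming the shared prefix lives in the id attribute. The ids and text are made up; starts-with() is a standard XPath 1.0 function.

```python
from lxml import etree

# Made-up markup: three ids share the prefix "test", one does not.
html = """
<div id="test-1">content one</div>
<div id="test-2">content two</div>
<div id="testfault">content three</div>
<div id="other">not matched</div>
"""

selector = etree.HTML(html)

# starts-with(@id, "test") keeps only the divs whose id begins with "test".
matched = selector.xpath('//div[starts-with(@id, "test")]/text()')
print(matched)
```

One expression replaces three separate queries, one per id.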
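Another special use: tags nested inside tags
For example, string(.) is used together with two XPath calls: the first selects the outer tag, and the second extracts all the text inside it, nested tags included.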
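A sketch of the two-step approach on a made-up nested fragment. The tag names and text are assumptions; the whitespace clean-up at the end is an added convenience, not part of XPath itself.

```python
from lxml import etree

# Made-up markup: text is spread over the outer div and tags nested inside it.
html = """
<div id="outer">
  outer text,
  <span>span text, <b>bold text</b></span>
  trailing text
</div>
"""

selector = etree.HTML(html)

# First XPath call: select the outer tag itself.
outer = selector.xpath('//div[@id="outer"]')[0]

# For contrast, text() alone returns only the div's own text nodes.
direct = [t.strip() for t in outer.xpath('text()')]

# Second XPath call: string(.) concatenates all text inside the tag,
# including text held by nested tags.
raw = outer.xpath('string(.)')

# Collapse the newlines and indentation left over from the source layout.
text = ' '.join(raw.split())

print(direct)
print(text)
```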
URL for learning XPath: http://search.jikexueyuan.com/course/?q=Python%E7%88%AC%E8%99%AB
Understanding XPath for crawling web data