Contents
- 1. Installing XPath
- 2. XPath basic use
- 3. XPath syntax
- 3.1. Commonly used path expressions
- 3.2. Finding specific elements based on predicates
- 3.3. Wildcard characters
- 3.4. Select multiple paths
1. Installing XPath
pip install lxml
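To confirm the installation worked, a quick check like the following can help. It only assumes that lxml was installed into the interpreter you are running; the printed version tuple will differ from machine to machine.

```python
from lxml import etree

# LXML_VERSION is a tuple of integers describing the installed lxml version
print("lxml version:", etree.LXML_VERSION)
```

If the import fails, lxml was most likely installed for a different Python interpreter than the one running the script.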
2. XPath basic use
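from lxml import etree

html = '''
<p class="drink">Drink set
    <a class="drink" id="coffee">Coffee</a>
    <a class="drink" id="milk">Milk</a>
</p>
'''
# Parse the HTML string into an element tree
selector = etree.HTML(html)
# Select the text of the <a> element whose id is "coffee"
content = selector.xpath('//a[@id="coffee"]/text()')
print(content)  # ['Coffee']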
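As a small extension of the example above, the sketch below shows that xpath() always returns a list, whether it matches several nodes, attribute values, or nothing at all. The HTML string is a shortened, made-up variant of the drink menu, used only for illustration.

```python
from lxml import etree

# Hypothetical sample markup, modeled on the drink menu above
html = ('<p class="drink">Drink set '
        '<a class="drink" id="coffee">Coffee</a> '
        '<a class="drink" id="milk">Milk</a></p>')
selector = etree.HTML(html)

# Text of every <a> element whose class is "drink"
names = selector.xpath('//a[@class="drink"]/text()')
# Values of the id attribute of every <a> element
ids = selector.xpath('//a/@id')
# No match gives an empty list rather than an error
missing = selector.xpath('//a[@id="tea"]/text()')

print(names)    # ['Coffee', 'Milk']
print(ids)      # ['coffee', 'milk']
print(missing)  # []
```

Because the result is a list, code that expects a single match usually checks that the list is non-empty before indexing it.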
3. XPath syntax
3.1. Commonly used path expressions
| Expression | Description |
| --- | --- |
| nodename | Selects the child elements of the current node that are named nodename. |
| / | Selects from the root node. |
| // | Selects matching nodes at any depth below the current node, regardless of their location in the document. |
| . | Selects the current node. |
| .. | Selects the parent of the current node. |
| @ | Selects attributes. |
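The relative expressions in this table (., .. and @) are easiest to see when xpath() is called on an element rather than on the whole document. The following is a minimal sketch; the one-book XML string is invented for illustration.

```python
from lxml import etree

# Hypothetical one-book document used only for this sketch
root = etree.fromstring(
    '<bookstore><book><title lang="eng">Harry Potter</title></book></bookstore>'
)

# Start from the first <title> element found anywhere in the document
title = root.xpath("//title")[0]

current = title.xpath(".")[0]   # "."  selects the current node (the title itself)
parent = title.xpath("..")[0]   # ".." selects its parent (the book element)
lang = title.xpath("@lang")     # "@"  selects an attribute of the current node

print(current.tag)  # title
print(parent.tag)   # book
print(lang)         # ['eng']
```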

| Path Expression | Result |
| --- | --- |
| bookstore | Selects all bookstore elements that are children of the current node. |
| /bookstore | Selects the root element bookstore. Note: a path that starts with a forward slash (/) is always an absolute path to an element. |
| bookstore/book | Selects all book elements that are children of bookstore. |
| //book | Selects all book elements, regardless of their position in the document. |
| bookstore//book | Selects all book elements that are descendants of the bookstore element, regardless of where they are located under bookstore. |
| //@lang | Selects all attributes named lang. |
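The path expressions above can be tried out with lxml as in the sketch below. The bookstore document is made-up sample data; one book is nested inside a hypothetical section element so that the difference between / and // becomes visible.

```python
from lxml import etree

# Hypothetical sample data: two books directly under bookstore, one nested deeper
xml = """
<bookstore>
  <book><title lang="eng">Harry Potter</title></book>
  <book><title lang="eng">Learning XML</title></book>
  <section>
    <book><title lang="fr">Le Petit Prince</title></book>
  </section>
</bookstore>
"""
root = etree.fromstring(xml)  # root is the <bookstore> element

root_elems = root.xpath("/bookstore")         # absolute path to the root element
children = root.xpath("/bookstore/book")      # only direct children of bookstore
relative = root.xpath("book")                 # same, relative to the current node
anywhere = root.xpath("//book")               # every book in the document
descendants = root.xpath("/bookstore//book")  # every book below bookstore
langs = root.xpath("//@lang")                 # all lang attribute values

print(len(root_elems), len(children), len(relative))  # 1 2 2
print(len(anywhere), len(descendants))                # 3 3
print(langs)                                          # ['eng', 'eng', 'fr']
```

Note that element and attribute names are case-sensitive in XPath, so bookstore and Bookstore are different names.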
3.2. Finding specific elements based on predicates
| Path Expression | Result |
| --- | --- |
| /bookstore/book[1] | Selects the first book element that is a child of bookstore. |
| /bookstore/book[last()] | Selects the last book element that is a child of bookstore. |
| /bookstore/book[last()-1] | Selects the second-to-last book element that is a child of bookstore. |
| /bookstore/book[position()<3] | Selects the first two book elements that are children of bookstore. |
| //title[@lang] | Selects all title elements that have an attribute named lang. |
| //title[@lang='eng'] | Selects all title elements whose lang attribute has the value eng. |
| /bookstore/book[price>35.00] | Selects all book elements of bookstore whose price child element has a value greater than 35.00. |
| /bookstore/book[price>35.00]/title | Selects the title elements of all book elements of bookstore whose price child element has a value greater than 35.00. |
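The predicates in this table can be checked against a small document, as sketched below. The three books, their prices, and their lang attributes are invented sample data. Keep in mind that XPath positions start at 1, not 0.

```python
from lxml import etree

# Hypothetical sample data: the third title deliberately has no lang attribute
xml = """
<bookstore>
  <book><title lang="eng">Harry Potter</title><price>29.99</price></book>
  <book><title lang="fr">Le Petit Prince</title><price>39.95</price></book>
  <book><title>XQuery Kick Start</title><price>49.99</price></book>
</bookstore>
"""
root = etree.fromstring(xml)

first = root.xpath("/bookstore/book[1]/title/text()")
last = root.xpath("/bookstore/book[last()]/title/text()")
second_to_last = root.xpath("/bookstore/book[last()-1]/title/text()")
first_two = root.xpath("/bookstore/book[position()<3]/title/text()")
with_lang = root.xpath("//title[@lang]/text()")
eng_only = root.xpath("//title[@lang='eng']/text()")
expensive = root.xpath("/bookstore/book[price>35.00]/title/text()")

print(first)           # ['Harry Potter']
print(last)            # ['XQuery Kick Start']
print(second_to_last)  # ['Le Petit Prince']
print(first_two)       # ['Harry Potter', 'Le Petit Prince']
print(with_lang)       # ['Harry Potter', 'Le Petit Prince']
print(eng_only)        # ['Harry Potter']
print(expensive)       # ['Le Petit Prince', 'XQuery Kick Start']
```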
3.3. Wildcard characters
| Wildcard | Description |
| --- | --- |
| * | Matches any element node. |
| @* | Matches any attribute node. |
| node() | Matches a node of any type. |

| Path Expression | Result |
| --- | --- |
| /bookstore/* | Selects all child elements of the bookstore element. |
| //* | Selects all elements in the document. |
| //title[@*] | Selects all title elements that have at least one attribute. |
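A short sketch of the wildcards in action follows. The XML is made-up sample data, written without whitespace between elements so that node() does not pick up whitespace-only text nodes.

```python
from lxml import etree

# Hypothetical sample data: only the first title carries an attribute
root = etree.fromstring(
    '<bookstore>'
    '<book><title lang="eng">Harry Potter</title><price>29.99</price></book>'
    '<book><title>Learning XML</title><price>39.95</price></book>'
    '</bookstore>'
)

child_elems = root.xpath("/bookstore/*")    # all child elements of bookstore
all_elems = root.xpath("//*")               # every element in the document
with_attrs = root.xpath("//title[@*]")      # titles with at least one attribute
attr_values = root.xpath("//title/@*")      # all attribute values on titles
title_nodes = root.xpath("//title/node()")  # any node type; here, the text nodes

print([el.tag for el in child_elems])  # ['book', 'book']
print(len(all_elems))                  # 7
print(len(with_attrs))                 # 1
print(attr_values)                     # ['eng']
print(title_nodes)                     # ['Harry Potter', 'Learning XML']
```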
3.4. Select multiple paths
- By using the | operator in a path expression, you can select several paths at once.
| Path Expression | Result |
| --- | --- |
| //book/title \| //book/price | Selects all the title and price elements of all book elements. |
| //title \| //price | Selects all the title and price elements in the document. |
| /bookstore/book/title \| //price | Selects all the title elements of the book elements of bookstore, and all the price elements in the document. |
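The union operator can be sketched with lxml as follows. The two-book document is invented sample data; the example counts the matches and inspects their tag names rather than relying on a particular result order.

```python
from lxml import etree

# Hypothetical sample data
root = etree.fromstring(
    '<bookstore>'
    '<book><title>Harry Potter</title><price>29.99</price></book>'
    '<book><title>Learning XML</title><price>39.95</price></book>'
    '</bookstore>'
)

# Title and price elements of every book, combined into one result list
both = root.xpath("//book/title | //book/price")
tags = sorted(el.tag for el in both)

# An absolute path and a "//" path can be combined in the same way
mixed = root.xpath("/bookstore/book/title | //price")

print(len(both))   # 4
print(tags)        # ['price', 'price', 'title', 'title']
print(len(mixed))  # 4
```

Each node appears at most once in the combined result, even if more than one of the paths matches it.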