First, selecting nodes
Common path expressions:
Expression | Description | Example | Result
nodename | Selects the child nodes named nodename of the current node | xpath('div') | Selects the div child nodes of the current node
/ | Selects from the root node | xpath('/div') | Selects the div node at the root of the document
// | Selects matching nodes anywhere in the document, regardless of their location | xpath('//div') | Selects all div nodes
. | Selects the current node | xpath('./div') | Selects the div nodes directly under the current node
.. | Selects the parent of the current node | xpath('..') | Goes back up to the parent node
@ | Selects attributes | xpath('//@class') | Selects all class attributes
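The path expressions above can be tried out without Scrapy. Below is a minimal sketch using Python's standard-library xml.etree.ElementTree, which supports only a subset of XPath: paths are always relative to an element, and there is no `@attr` step, so attributes are read with `.get()` instead. The sample document and its class/id values are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for illustration.
DOC = """
<html>
  <body>
    <div class="main" id="top">
      <div class="inner">nested</div>
    </div>
    <div class="side">sidebar</div>
  </body>
</html>
"""

root = ET.fromstring(DOC)   # root is the <html> element
body = root.find("body")

# './div': div elements that are direct children of the current node
direct = [d.get("class") for d in body.findall("./div")]
print(direct)     # ['main', 'side']

# './/div': div elements anywhere below the current node
anywhere = [d.get("class") for d in body.findall(".//div")]
print(anywhere)   # ['main', 'inner', 'side']

# '..': step up from a matched node to its parent
parents = [p.get("id") for p in root.findall(".//div[@class='inner']/..")]
print(parents)    # ['top']

# ElementTree cannot evaluate '//@class'; collect the attribute values by hand
classes = [d.get("class") for d in root.iter("div")]
print(classes)    # ['main', 'inner', 'side']
```

A full XPath engine (for example the one behind Scrapy selectors) also accepts absolute paths such as '/div' and attribute steps such as '//@class'.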
Second, predicates
A predicate is written inside square brackets and is used to find a specific node, or a node that contains a specific value.
Examples:
Expression | Result
xpath('/body/div[1]') | Selects the first div node under body
xpath('/body/div[last()]') | Selects the last div node under body
xpath('/body/div[last()-1]') | Selects the second-to-last div node under body
xpath('/body/div[position()<3]') | Selects the first two div nodes under body
xpath('/body/div[@class]') | Selects the div nodes under body that have a class attribute
xpath('/body/div[@class="main"]') | Selects the div nodes under body whose class attribute is "main"
xpath('/body/div[price>35.00]') | Selects the div nodes under body whose price child element has a value greater than 35.00
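Most of the predicates above can be checked with the standard library alone. The sketch below assumes a small made-up body document. ElementTree handles `[1]`, `[last()]`, `[last()-1]`, `[@class]` and `[@class='main']`, but not `position()<3` or numeric comparisons such as `price>35.00`, so those two rows are approximated in plain Python and marked as such.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for illustration.
DOC = """
<body>
  <div class="main"><price>40.00</price></div>
  <div><price>20.00</price></div>
  <div class="footer"><price>36.50</price></div>
</body>
"""

body = ET.fromstring(DOC)   # body is the root element here

first = body.findall("./div[1]")              # first div under body
last = body.findall("./div[last()]")          # last div under body
second_last = body.findall("./div[last()-1]") # second-to-last div under body
with_class = body.findall("./div[@class]")    # divs that have a class attribute
main = body.findall("./div[@class='main']")   # divs whose class is "main"

# Not XPath: plain-Python stand-ins for div[position()<3] and div[price>35.00],
# which ElementTree's limited XPath support cannot evaluate.
first_two = body.findall("./div")[:2]
expensive = [d for d in body.findall("./div")
             if float(d.findtext("price")) > 35.00]

print([d.get("class") for d in with_class])   # ['main', 'footer']
print([d.findtext("price") for d in expensive])  # ['40.00', '36.50']
```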
Third, wildcards
Wildcards select XML elements whose names are not known in advance.
Expression | Result
xpath('/div/*') | Selects all child elements of div
xpath('/div[@*]') | Selects the div nodes that have at least one attribute
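A small sketch of the two wildcards, assuming the third-party lxml package is installed (it is not part of the standard library; Scrapy's selectors are built on the same XPath engine). The sample document is made up for illustration.

```python
from lxml import etree  # third-party; assumed to be installed

# Hypothetical sample document, invented for illustration.
DOC = """
<div>
  <p>text</p>
  <span id="s1">more</span>
  <div class="box">plain</div>
</div>
"""

root = etree.fromstring(DOC)

# '/div/*': every child element of the root div, whatever its name
child_tags = [c.tag for c in root.xpath("/div/*")]
print(child_tags)        # ['p', 'span', 'div']

# '//div[@*]': div elements that carry at least one attribute
divs_with_attrs = [d.get("class") for d in root.xpath("//div[@*]")]
print(divs_with_attrs)   # ['box']
```

The outer div has no attributes, so only the inner one matches `[@*]`.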
Fourth, selecting multiple paths
The "|" operator selects several paths in a single expression.
Expression | Result
xpath('//div|//table') | Selects all div and table nodes
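The union operator can be sketched the same way, again assuming lxml is installed. The document and its id values are hypothetical.

```python
from lxml import etree  # third-party; assumed to be installed

# Hypothetical sample document, invented for illustration.
DOC = """
<body>
  <div id="a"/>
  <table id="t"/>
  <p id="p"/>
  <div id="b"/>
</body>
"""

root = etree.fromstring(DOC)

# '//div|//table': the combined set of all div and all table elements
nodes = root.xpath("//div|//table")
found = sorted((n.tag, n.get("id")) for n in nodes)
print(found)   # [('div', 'a'), ('div', 'b'), ('table', 't')]
```

The p element is left out because it matches neither path.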
Fifth, XPath axes
An axis defines a node set relative to the current node.
Axis name | Expression | Description
ancestor | xpath('./ancestor::*') | Selects all ancestors of the current node (parent, grandparent, and so on)
ancestor-or-self | xpath('./ancestor-or-self::*') | Selects all ancestors of the current node and the node itself
attribute | xpath('./attribute::*') | Selects all attributes of the current node
child | xpath('./child::*') | Selects all child nodes of the current node
descendant | xpath('./descendant::*') | Selects all descendants of the current node (children, grandchildren, and so on)
following | xpath('./following::*') | Selects all nodes in the document after the end tag of the current node
following-sibling | xpath('./following-sibling::*') | Selects the sibling nodes after the current node
parent | xpath('./parent::*') | Selects the parent of the current node
preceding | xpath('./preceding::*') | Selects all nodes in the document before the start tag of the current node
preceding-sibling | xpath('./preceding-sibling::*') | Selects the sibling nodes before the current node
self | xpath('./self::*') | Selects the current node
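To see what each axis returns, here is a minimal sketch that evaluates several of them from one "current" node. It assumes lxml is installed. The document, the id values and the `names` helper are all made up for illustration.

```python
from lxml import etree  # third-party; assumed to be installed

# Hypothetical sample document, invented for illustration.
DOC = """
<html>
  <body>
    <div id="first"/>
    <div id="current"><p id="child"><span id="grandchild"/></p></div>
    <div id="last"/>
  </body>
</html>
"""

root = etree.fromstring(DOC)
current = root.xpath("//div[@id='current']")[0]


def names(nodes):
    """Label each element by its id, or by its tag when it has no id."""
    return [n.get("id") or n.tag for n in nodes]


ancestors = names(current.xpath("./ancestor::*"))        # html and body
parent = names(current.xpath("./parent::*"))             # body
children = names(current.xpath("./child::*"))            # child
descendants = names(current.xpath("./descendant::*"))    # child and grandchild
before = names(current.xpath("./preceding-sibling::*"))  # first
after = names(current.xpath("./following-sibling::*"))   # last
itself = names(current.xpath("./self::*"))               # current
attributes = current.xpath("./attribute::*")             # attribute values as strings

print(parent, children, before, after, itself)
```

Calling `xpath()` on an element rather than on the whole tree is what makes "." refer to that element.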
Sixth, functions
Functions make fuzzy (partial-match) searches possible.
Function | Usage | Description
starts-with | xpath('//div[starts-with(@id, "ma")]') | Selects the div nodes whose id value starts with "ma"
contains | xpath('//div[contains(@id, "ma")]') | Selects the div nodes whose id value contains "ma"
and | xpath('//div[contains(@id, "ma") and contains(@id, "in")]') | Selects the div nodes whose id value contains both "ma" and "in"
text() | xpath('//div[contains(text(), "ma")]') | Selects the div nodes whose text contains "ma"
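The four functions can be compared side by side on one small document. This is a sketch that assumes lxml is installed. The ids and text values are invented so that each expression matches a different set of nodes.

```python
from lxml import etree  # third-party; assumed to be installed

# Hypothetical sample document, invented for illustration.
DOC = """
<body>
  <div id="main">main text</div>
  <div id="domain">other</div>
  <div id="mall">shops</div>
  <div id="side">see main page</div>
</body>
"""

root = etree.fromstring(DOC)


def ids(expr):
    """Return the sorted id values of the div nodes matched by expr."""
    return sorted(d.get("id") for d in root.xpath(expr))


starts = ids('//div[starts-with(@id, "ma")]')
has_ma = ids('//div[contains(@id, "ma")]')
ma_and_in = ids('//div[contains(@id, "ma") and contains(@id, "in")]')
text_main = ids('//div[contains(text(), "main")]')

print(starts)      # ['main', 'mall']
print(has_ma)      # ['domain', 'main', 'mall']
print(ma_and_in)   # ['domain', 'main']
print(text_main)   # ['main', 'side']
```

Note the quoting: the XPath string is wrapped in single quotes in Python, so the string literals inside it use double quotes.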
Scrapy XPath Document: http://doc.scrapy.org/en/0.14/topics/selectors.html
Python crawler: XPath syntax notes