1. Selecting nodes
Common path expressions:
Expression | Description | Example | Result of the example
nodename | Selects the child elements named nodename of the current node | xpath('div') | Selects the div children of the current node
/ | Selects from the root node | xpath('/div') | Selects the div element at the root of the document
// | Selects matching descendants at any depth, regardless of their location (a leading // searches the whole document) | xpath('//div') | Selects all div nodes in the document
. | Selects the current node | xpath('./div') | Selects the div children of the current node
.. | Selects the parent of the current node | xpath('..') | Moves up to the parent node
@ | Selects attributes | xpath('//@class') | Selects all class attributes
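Scrapy evaluates these expressions with a full XPath 1.0 engine (its selectors are built on lxml). As a dependency-free sketch, the code below tries the same path expressions with Python's standard-library xml.etree.ElementTree, which implements only a subset of XPath: paths are relative to the element you call find()/findall() on, and attribute nodes cannot be returned, so //@class is emulated. The sample page is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample page. ElementTree needs well-formed XML; real-world
# HTML is usually parsed by lxml (as Scrapy does) instead.
PAGE = """
<html>
  <body>
    <div id="main" class="main"><p>first</p></div>
    <div id="menu"><p>second</p></div>
    <div class="footer"><p>third</p></div>
  </body>
</html>
"""

root = ET.fromstring(PAGE)  # the <html> element acts as the current node

# nodename: "body" selects the body child of the current node
# (find() returns the first match, findall() returns all of them)
body = root.find("body")

# "body/div": div children of body, relative to the current node
body_divs = root.findall("body/div")

# ".//div": div elements at any depth below the current node (like //div)
all_divs = root.findall(".//div")

# "./body": "." is the current node, so this is the same as "body"
same_body = root.find("./body")

# "..": the parent of a matched node; ElementTree only lets ".." climb back
# up within the search, never above the element find() was called on
parent = root.find("body/div/..")

# ElementTree cannot return attribute nodes, so //@class is emulated by
# finding elements that have a class attribute and reading its value
classes = [el.get("class") for el in root.findall(".//*[@class]")]

print(len(body_divs), len(all_divs), parent.tag, classes)
# 3 3 body ['main', 'footer']
```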
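2. Predicates
Predicates are written inside square brackets and narrow a selection down to a specific node, or to nodes that contain a given value. The examples below treat body as the root element; in a full HTML document the paths would start with /html/body.
Examples: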
Expression | Result
xpath('/body/div[1]') | Selects the first div child of body (positions start at 1)
xpath('/body/div[last()]') | Selects the last div child of body
xpath('/body/div[last()-1]') | Selects the second-to-last div child of body
xpath('/body/div[position()<3]') | Selects the first two div children of body
xpath('/body/div[@class]') | Selects the div children of body that have a class attribute
xpath('/body/div[@class="main"]') | Selects the div children of body whose class attribute is "main"
xpath('/body/div[price>35.00]') | Selects the div children of body that have a price child element with a value greater than 35.00
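Python's standard-library xml.etree.ElementTree understands several of these predicates directly: [1], [last()], [last()-1], [@class] and [@class='main']. It does not support [position()<3] or numeric comparisons such as [price>35.00], so this sketch emulates those two with list slicing and a float comparison. The sample page is made up, and paths are relative to its <html> root, so body/div plays the role of /html/body/div.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample page with three div children of body.
PAGE = """
<html><body>
  <div class="main"><price>40.00</price></div>
  <div><price>20.50</price></div>
  <div class="side"><price>35.00</price></div>
</body></html>
"""
root = ET.fromstring(PAGE)

def prices(elems):
    """Show each matched div by its <price> text, for easy comparison."""
    return [e.findtext("price") for e in elems]

first = prices(root.findall("body/div[1]"))               # ['40.00']
last = prices(root.findall("body/div[last()]"))           # ['35.00']
second_last = prices(root.findall("body/div[last()-1]"))  # ['20.50']
has_class = prices(root.findall("body/div[@class]"))      # ['40.00', '35.00']
main = prices(root.findall("body/div[@class='main']"))    # ['40.00']

# position()<3 and numeric comparisons are outside ElementTree's subset,
# so these two predicates are emulated in plain Python.
divs = root.findall("body/div")
first_two = prices(divs[:2])                              # ['40.00', '20.50']
over_35 = prices([d for d in divs
                  if float(d.findtext("price", "0")) > 35.00])  # ['40.00']
```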
3. Wildcards
XPath wildcards select elements (or attributes) whose names are not known in advance.
Expression | Result
xpath('/div/*') | Selects all child elements of the root div
xpath('/div[@*]') | Selects the root div if it has at least one attribute (use //div[@*] to select every div that has attributes)
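Python's standard-library xml.etree.ElementTree supports the * wildcard in paths but not the @* attribute wildcard. This sketch uses * directly and emulates [@*] by checking whether an element's attrib dictionary is non-empty. The paths are relative to a made-up wrapper element, because Element.findall() rejects absolute paths that start with /.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: one div with an attribute, one without.
root = ET.fromstring(
    "<root>"
    '<div id="a"><p/><span/></div>'
    "<div><ul/></div>"
    "</root>"
)

# "div/*": every element child of each div, whatever its name
children = [el.tag for el in root.findall("div/*")]   # ['p', 'span', 'ul']

# [@*] is not part of ElementTree's subset; an element has at least one
# attribute exactly when its .attrib dict is non-empty
with_attrs = [d.get("id") for d in root.findall("div") if d.attrib]  # ['a']
```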
4. Selecting multiple paths
The "|" operator combines several path expressions and selects the nodes matched by any of them.
Expression | Result
xpath('//div|//table') | Selects all div and table nodes
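Python's standard-library xml.etree.ElementTree does not implement the | operator. A union can be emulated, as in this sketch, by collecting the matches of each path into a set and reading them back in document order, which mirrors XPath returning each node only once. The union() helper and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<html><body>"
    '<div id="a"/>'
    '<table id="t"/>'
    '<div id="b"><table id="inner"/></div>'
    "</body></html>"
)

def union(*paths):
    """Hypothetical helper emulating XPath's "|" for ElementTree: merge the
    matches of several paths, drop duplicates, and keep document order."""
    wanted = set()
    for path in paths:
        wanted.update(root.findall(path))
    return [el for el in root.iter() if el in wanted]

ids = [el.get("id") for el in union(".//div", ".//table")]
print(ids)  # ['a', 't', 'b', 'inner']
```

Overlapping paths do not produce duplicates: union(".//div", ".//div[@id='a']") still yields each div once.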
5. XPath axes
An axis defines a node set relative to the current node. Axis names must be written in lowercase, since XPath is case-sensitive.
Axis | Expression | Description
ancestor | xpath('./ancestor::*') | Selects all ancestors of the current node (parent, grandparent, and so on)
ancestor-or-self | xpath('./ancestor-or-self::*') | Selects all ancestors of the current node and the node itself
attribute | xpath('./attribute::*') | Selects all attributes of the current node
child | xpath('./child::*') | Selects all child elements of the current node
descendant | xpath('./descendant::*') | Selects all descendants of the current node (children, grandchildren, and so on)
following | xpath('./following::*') | Selects all nodes that come after the end tag of the current node in the document
following-sibling | xpath('./following-sibling::*') | Selects all siblings after the current node
parent | xpath('./parent::*') | Selects the parent of the current node
preceding | xpath('./preceding::*') | Selects all nodes that come before the start tag of the current node, excluding its ancestors
preceding-sibling | xpath('./preceding-sibling::*') | Selects all siblings before the current node
self | xpath('./self::*') | Selects the current node
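Python's standard-library xml.etree.ElementTree has no axis syntax, and its elements carry no parent pointer, so this sketch computes several axes by hand from a child-to-parent map and the document order of elements. The function names are hypothetical helpers, not an existing API; each comment names the XPath axis it imitates. The child, self and attribute axes need no helper: they correspond to list(el), [el] and el.attrib.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; the ids make the results easy to read.
root = ET.fromstring(
    "<html><body>"
    '<div id="a"/>'
    '<div id="b"><p id="p"/></div>'
    '<div id="c"/>'
    "</body></html>"
)

# Elements do not know their parent, so build a child -> parent map,
# plus a list of all elements in document order.
parent_of = {child: parent for parent in root.iter() for child in parent}
order = list(root.iter())

def parent(el):                  # parent::*
    return [parent_of[el]] if el in parent_of else []

def ancestors(el):               # ancestor::*, nearest ancestor first
    out = []
    while el in parent_of:
        el = parent_of[el]
        out.append(el)
    return out

def ancestors_or_self(el):       # ancestor-or-self::*
    return [el] + ancestors(el)

def descendants(el):             # descendant::* (children, grandchildren, ...)
    return list(el.iter())[1:]   # iter() yields el itself first

def siblings(el):
    """Element children of el's parent, el included (just [el] for the root)."""
    return list(parent_of[el]) if el in parent_of else [el]

def following_siblings(el):      # following-sibling::*
    sibs = siblings(el)
    return sibs[sibs.index(el) + 1:]

def preceding_siblings(el):      # preceding-sibling::*, in document order
    sibs = siblings(el)
    return sibs[:sibs.index(el)]

def following(el):               # following::* : everything after el's end tag
    inside = set(el.iter())      # el and its descendants
    return [n for n in order[order.index(el) + 1:] if n not in inside]

def preceding(el):               # preceding::* : everything before el, minus ancestors
    above = set(ancestors(el))
    return [n for n in order[:order.index(el)] if n not in above]

def ids(elems):
    return [e.get("id") or e.tag for e in elems]

p = root.find(".//p")
b = root.find(".//div[@id='b']")
print(ids(ancestors(p)))             # ['b', 'body', 'html']
print(ids(following_siblings(b)))    # ['c']
print(ids(preceding_siblings(b)))    # ['a']
print(ids(descendants(b)))           # ['p']
print(ids(following(p)))             # ['c']
print(ids(preceding(p)))             # ['a']
```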
6. Functions
Functions make fuzzy matching easier. String matching is case-sensitive, so "ma" does not match "MA".
Function | Usage | Explanation
starts-with | xpath('//div[starts-with(@id, "ma")]') | Selects div nodes whose id value starts with "ma"
contains | xpath('//div[contains(@id, "ma")]') | Selects div nodes whose id value contains "ma"
and | xpath('//div[contains(@id, "ma") and contains(@id, "in")]') | Combines conditions: selects div nodes whose id value contains both "ma" and "in"
text() | xpath('//div[contains(text(), "ma")]') | Selects div nodes whose text contains "ma"
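Python's standard-library xml.etree.ElementTree has no XPath string functions, so this sketch applies the same tests with str.startswith() and the in operator to the div elements found by .//div. The text() case is approximated with el.text, the text that appears before an element's first child. The sample ids and texts are made up; the "Marker" div shows that matching is case-sensitive.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<body>"
    '<div id="main">main content</div>'
    '<div id="mark">Marker</div>'
    '<div id="domain">other</div>'
    "</body>"
)
divs = root.findall(".//div")

def ids(elems):
    return [e.get("id") for e in elems]

# //div[starts-with(@id, "ma")]
starts = [d for d in divs if d.get("id", "").startswith("ma")]
# //div[contains(@id, "ma")]
has_ma = [d for d in divs if "ma" in d.get("id", "")]
# //div[contains(@id, "ma") and contains(@id, "in")]
both = [d for d in divs if "ma" in d.get("id", "") and "in" in d.get("id", "")]
# //div[contains(text(), "ma")]: approximated with el.text, the text before
# the element's first child; "Marker" does not match because case matters
text_ma = [d for d in divs if "ma" in (d.text or "")]

print(ids(starts), ids(has_ma), ids(both), ids(text_ma))
# ['main', 'mark'] ['main', 'mark', 'domain'] ['main', 'domain'] ['main']
```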
Scrapy XPath documentation: http://doc.scrapy.org/en/0.14/topics/selectors.html
Python crawler: XPath syntax notes