First, selecting nodes
Common path expressions:
| Expression | Description | Example | Result |
| nodename | Selects the child elements named nodename of the current node | xpath('div') | Selects the div children of the current node |
| / | Selects from the root node | xpath('/div') | Selects the div node at the document root |
| // | Selects matching nodes anywhere in the document, regardless of their position | xpath('//div') | Selects all div nodes |
| . | Selects the current node | xpath('./div') | Selects the div nodes directly under the current node |
| .. | Selects the parent of the current node | xpath('..') | Moves up to the parent node |
| @ | Selects attributes | xpath('//@class') | Selects all class attributes |
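The path expressions in the table can be tried without Scrapy. The following is a minimal sketch using Python's standard-library xml.etree.ElementTree, which implements only a subset of XPath: paths must be relative and attribute nodes cannot be selected directly. The sample document and all variable names are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, made up for illustration.
DOC = """
<html>
  <body>
    <div class="main">
      <p>first</p>
      <div class="inner">nested</div>
    </div>
    <div class="side">second</div>
  </body>
</html>
"""

root = ET.fromstring(DOC.strip())  # root is the <html> element
body = root.find("body")

# ".//div": every div below the current node, at any depth
all_divs = root.findall(".//div")

# "./div": only the div elements that are direct children of the current node
child_divs = body.findall("./div")

# "..": step from a matched node up to its parent
parents_of_p = root.findall(".//p/..")

# ElementTree cannot evaluate "//@class", so the attribute is read
# from each matched element instead
classes = [d.get("class") for d in all_divs]

print(len(all_divs), len(child_divs))
print(parents_of_p[0].get("class"), classes)
```

In Scrapy the same expressions would be passed to response.xpath(...), which accepts the full XPath 1.0 syntax, including absolute paths and //@class.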
Second, predicates
Predicates are written inside square brackets and select a specific node, or nodes that contain a given value.
Examples:
| Expression | Result |
| xpath('/body/div[1]') | Selects the first div node under body |
| xpath('/body/div[last()]') | Selects the last div node under body |
| xpath('/body/div[last()-1]') | Selects the second-to-last div node under body |
| xpath('/body/div[position()<3]') | Selects the first two div nodes under body |
| xpath('/body/div[@class]') | Selects the div nodes under body that have a class attribute |
| xpath('/body/div[@class="main"]') | Selects the div nodes under body whose class attribute is "main" |
| xpath('/body/div[price>35.00]') | Selects the div nodes under body that have a price child element with a value greater than 35 |
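As a rough illustration of the predicates above, here is a sketch that again uses only the standard-library xml.etree.ElementTree. It supports the positional and attribute predicates, but not position()<3 or numeric comparisons such as price>35.00, so those two are approximated in plain Python. The sample document is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, made up for illustration.
DOC = """
<body>
  <div class="main"><price>40.00</price></div>
  <div><price>20.00</price></div>
  <div class="side"><price>36.50</price></div>
  <div>no price</div>
</body>
"""

body = ET.fromstring(DOC.strip())

first = body.findall("./div[1]")               # positions start at 1
last = body.findall("./div[last()]")           # last div
second_last = body.findall("./div[last()-1]")  # second-to-last div
with_class = body.findall("./div[@class]")     # has a class attribute
main = body.findall("./div[@class='main']")    # class equals "main"

# position()<3 and price>35.00 are outside ElementTree's XPath subset,
# so the same filters are written in plain Python here.
first_two = body.findall("./div")[:2]
expensive = [
    d for d in body.findall("./div[price]")    # divs with a price child
    if float(d.findtext("price")) > 35.00
]

print([d.get("class") for d in expensive])
```

With a full XPath 1.0 engine, such as the one behind Scrapy's selectors, the last two filters can be written directly as div[position()<3] and div[price>35.00].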
Third, wildcards
XPath wildcards select XML elements whose names are not known in advance.
| Expression | Result |
| xpath('/div/*') | Selects all child elements of div |
| xpath('/div[@*]') | Selects all div nodes that have at least one attribute |
Fourth, selecting multiple paths
The "|" operator selects several paths at once.
| Expression | Result |
| xpath('//div|//table') | Selects all div and table nodes |
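The wildcard and "|" expressions can be checked with a short script. The standard library's ElementTree supports neither [@*] nor "|", so this sketch assumes the third-party lxml package is installed; the sample document and variable names are made up for illustration.

```python
from lxml import etree  # third-party; assumed installed (pip install lxml)

# Hypothetical sample document, made up for illustration.
DOC = """
<root>
  <div id="a"><p>one</p><span>two</span></div>
  <div><p>three</p></div>
  <table><tr><td>cell</td></tr></table>
</root>
"""

tree = etree.fromstring(DOC.strip())

# "*" matches any element: all children of every top-level div
children = tree.xpath("/root/div/*")

# "[@*]" keeps only the elements that carry at least one attribute
divs_with_attrs = tree.xpath("/root/div[@*]")

# "|" returns the union of both node sets
divs_and_tables = tree.xpath("//div | //table")

print([e.tag for e in children])
print([e.get("id") for e in divs_with_attrs])
print([e.tag for e in divs_and_tables])
```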
Fifth, XPath axes
An axis defines a node set relative to the current node.
| Axis name | Expression | Description |
| ancestor | xpath('./ancestor::*') | Selects all ancestors of the current node (parent, grandparent, and so on) |
| ancestor-or-self | xpath('./ancestor-or-self::*') | Selects all ancestors of the current node and the node itself |
| attribute | xpath('./attribute::*') | Selects all attributes of the current node |
| child | xpath('./child::*') | Selects all children of the current node |
| descendant | xpath('./descendant::*') | Selects all descendants of the current node (children, grandchildren, and so on) |
| following | xpath('./following::*') | Selects all nodes in the document after the end tag of the current node |
| following-sibling | xpath('./following-sibling::*') | Selects all sibling nodes after the current node |
| parent | xpath('./parent::*') | Selects the parent of the current node |
| preceding | xpath('./preceding::*') | Selects all nodes in the document before the start tag of the current node |
| preceding-sibling | xpath('./preceding-sibling::*') | Selects all sibling nodes before the current node |
| self | xpath('./self::*') | Selects the current node |
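To see what each axis returns, it helps to evaluate them from one fixed node. The standard library's ElementTree does not implement axes, so this sketch assumes the third-party lxml package; the sample document, the tags helper, and the variable names are hypothetical.

```python
from lxml import etree  # third-party; assumed installed (pip install lxml)

# Hypothetical sample document, made up for illustration.
DOC = """
<html>
  <body>
    <div id="first"/>
    <div id="current" class="box"><p><b>text</b></p></div>
    <div id="last"/>
  </body>
</html>
"""

tree = etree.fromstring(DOC.strip())
current = tree.xpath("//div[@id='current']")[0]


def tags(path):
    """Evaluate an XPath relative to `current` and return the element tags."""
    return [e.tag for e in current.xpath(path)]


ancestors = tags("./ancestor::*")
anc_or_self = tags("./ancestor-or-self::*")
children = tags("./child::*")
descendants = tags("./descendant::*")
parent = tags("./parent::*")

# Sibling axes only look at nodes that share the same parent
next_ids = [e.get("id") for e in current.xpath("./following-sibling::*")]
prev_ids = [e.get("id") for e in current.xpath("./preceding-sibling::*")]

# The attribute axis yields the attribute values as strings in lxml
attrs = current.xpath("./attribute::*")

print(ancestors, children, descendants, parent)
print(prev_ids, next_ids, sorted(attrs))
```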
Sixth, functions
Functions make fuzzy (partial) matching easier.
| Function | Usage | Explanation |
| starts-with | xpath('//div[starts-with(@id, "ma")]') | Selects the div nodes whose id value starts with "ma" |
| contains | xpath('//div[contains(@id, "ma")]') | Selects the div nodes whose id value contains "ma" |
| and | xpath('//div[contains(@id, "ma") and contains(@id, "in")]') | Selects the div nodes whose id value contains both "ma" and "in" |
| text() | xpath('//div[contains(text(), "ma")]') | Selects the div nodes whose text contains "ma" |
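The functions in the table can be compared side by side on a small document. This is a sketch that assumes the third-party lxml package, since the standard library's ElementTree does not support XPath functions; the sample document and the ids helper are made up for illustration. Note that the matching is case-sensitive.

```python
from lxml import etree  # third-party; assumed installed (pip install lxml)

# Hypothetical sample document, made up for illustration.
DOC = """
<root>
  <div id="main">main text</div>
  <div id="format">other</div>
  <div id="sidebar">has ma inside</div>
</root>
"""

tree = etree.fromstring(DOC.strip())


def ids(path):
    """Return the id attribute of every element matched by `path`."""
    return [e.get("id") for e in tree.xpath(path)]


# id begins with "ma"
starts = ids('//div[starts-with(@id, "ma")]')

# id contains "ma" anywhere
contains = ids('//div[contains(@id, "ma")]')

# both conditions must hold
both = ids('//div[contains(@id, "ma") and contains(@id, "in")]')

# the element's text contains "ma"
by_text = ids('//div[contains(text(), "ma")]')

print(starts, contains, both, by_text)
```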
Scrapy XPath documentation: http://doc.scrapy.org/en/0.14/topics/selectors.html
Python crawler: XPath syntax notes