Python crawler: XPath syntax notes

Source: Internet
Author: User
Tags: xpath

First, selecting nodes
Common path expressions:

Expression  Description                                         Example
nodename    Selects the current node's children named nodename  xpath('div') selects the div children of the current node
/           Selects from the root node                          xpath('/div') selects the div element at the document root
//          Selects matching nodes anywhere in the document     xpath('//div') selects all div nodes
.           Selects the current node                            xpath('./div') selects the div children of the current node
..          Selects the parent of the current node              xpath('..') selects the parent of the current node
@           Selects attributes                                  xpath('//@class') selects all class attributes
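
As a rough illustration of the path expressions above, the sketch below uses Python's standard-library xml.etree.ElementTree. That module implements only a subset of XPath: paths must be relative (start with "."), and a path cannot end in an attribute step such as @class. The sample markup and variable names are invented for this example; Scrapy or lxml selectors accept the full expressions from the table.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for this illustration.
DOC = """<html>
  <body>
    <div class="main"><p>first</p></div>
    <div class="side"><p>second</p></div>
  </body>
</html>"""

root = ET.fromstring(DOC)  # root is the <html> element

# './/div' : div elements anywhere below the current node
all_divs = root.findall(".//div")

# './body' : a child of the current node named body
body = root.find("./body")

# './div' : div elements directly under the current node (here: body)
direct_divs = body.findall("./div")

# '..' : step up to the parent; here, the parents of all p elements
p_parents = root.findall(".//p/..")

# ElementTree cannot end a path with '@class', so read the attribute in Python
classes = [div.get("class") for div in all_divs]

print(len(all_divs), len(direct_divs), [e.tag for e in p_parents], classes)
```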

Second, predicates

Predicates are written inside square brackets. They are used to find a specific node or a node that contains a given value.

Examples:

Expression                         Result
xpath('/body/div[1]')              Selects the first div node under body
xpath('/body/div[last()]')         Selects the last div node under body
xpath('/body/div[last()-1]')       Selects the second-to-last div node under body
xpath('/body/div[position()<3]')   Selects the first two div nodes under body
xpath('/body/div[@class]')         Selects the div nodes under body that have a class attribute
xpath('/body/div[@class="main"]')  Selects the div nodes under body whose class attribute is "main"
xpath('/body/div[price>35.00]')    Selects the div nodes under body whose price child element has a value greater than 35
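
The positional and attribute predicates above can be tried out with the standard-library xml.etree.ElementTree, as in the minimal sketch below. ElementTree supports [n], [last()], [last()-n], [@attr] and [@attr='value'], but not position() comparisons or numeric tests such as price>35.00, which need a full XPath engine (lxml or Scrapy, for instance). The sample document and names are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document with three div elements under body.
DOC = """<html><body>
<div class="main">a</div>
<div>b</div>
<div class="side">c</div>
</body></html>"""

root = ET.fromstring(DOC)  # root is the <html> element


def texts(path):
    """Return the text of every element matched by a relative path."""
    return [elem.text for elem in root.findall(path)]


first = texts("./body/div[1]")                  # first div under body
last = texts("./body/div[last()]")              # last div under body
second_last = texts("./body/div[last()-1]")     # second-to-last div
with_class = texts("./body/div[@class]")        # divs that have a class attribute
main_only = texts("./body/div[@class='main']")  # divs whose class is "main"

print(first, last, second_last, with_class, main_only)
```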

Third, wildcards

XPath wildcards select XML elements whose names are not known in advance.

Expression         Result
xpath('/div/*')    Selects all child elements of div
xpath('/div[@*]')  Selects div nodes that have at least one attribute
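
A small sketch of the element wildcard, again with the standard-library xml.etree.ElementTree. ElementTree understands * for elements but not the attribute wildcard @*, so the "has any attribute" check is approximated here in plain Python rather than in XPath. The markup is a made-up sample.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample fragment whose root element is a div.
DOC = """<div id="box">
  <p>text</p>
  <span>more</span>
  <a href="#">link</a>
</div>"""

root = ET.fromstring(DOC)  # root is the <div> element

# './*' : every child element of the current node, whatever its name
child_tags = [child.tag for child in root.findall("./*")]

# Approximation of '[@*]': keep elements whose attribute dict is not empty.
# root.iter() walks the root element and all of its descendants.
tags_with_attrs = [elem.tag for elem in root.iter() if elem.attrib]

print(child_tags, tags_with_attrs)
```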

Fourth, selecting multiple paths

The "|" operator selects several paths in one expression.

Expression              Result
xpath('//div|//table')  Selects all div and table nodes
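
The union operator is not part of ElementTree's XPath subset, so the sketch below assumes the third-party lxml package is installed (pip install lxml). The sample document and variable names are invented for this illustration.

```python
from lxml import etree  # third-party package, assumed to be installed

# Hypothetical sample document containing div and table elements.
DOC = """<html><body>
<div id="a">one</div>
<table id="t"><tr><td>cell</td></tr></table>
<div id="b">two</div>
</body></html>"""

root = etree.fromstring(DOC)

# '//div | //table' : every div and every table in the document
nodes = root.xpath("//div | //table")

tags = sorted(node.tag for node in nodes)
ids = sorted(node.get("id") for node in nodes)

print(tags, ids)
```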

Fifth, XPath axes

An axis defines a node set relative to the current node.

Axis name          Expression                       Description
ancestor           xpath('./ancestor::*')           Selects all ancestors (parent, grandparent, and so on) of the current node
ancestor-or-self   xpath('./ancestor-or-self::*')   Selects all ancestors of the current node and the current node itself
attribute          xpath('./attribute::*')          Selects all attributes of the current node
child              xpath('./child::*')              Selects all child elements of the current node
descendant         xpath('./descendant::*')         Selects all descendants (children, grandchildren, and so on) of the current node
following          xpath('./following::*')          Selects all nodes in the document after the end tag of the current node
following-sibling  xpath('./following-sibling::*')  Selects all siblings after the current node
parent             xpath('./parent::*')             Selects the parent of the current node
preceding          xpath('./preceding::*')          Selects all nodes in the document before the start tag of the current node
preceding-sibling  xpath('./preceding-sibling::*')  Selects all siblings before the current node
self               xpath('./self::*')               Selects the current node
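
To make the axes more concrete, here is a minimal sketch that evaluates several of them from one context node. It assumes the third-party lxml package is installed, since the standard-library ElementTree does not support axes. The document, the ids and the helper name are hypothetical.

```python
from lxml import etree  # third-party package, assumed to be installed

# Hypothetical sample document; p2 is used as the current (context) node.
DOC = """<html><body>
<div id="outer">
  <p id="p1">one</p>
  <p id="p2">two</p>
  <p id="p3">three</p>
</div>
<span id="s1">tail</span>
</body></html>"""

root = etree.fromstring(DOC)
p2 = root.xpath("//p[@id='p2']")[0]


def ids(expr):
    """Evaluate expr relative to p2 and return the ids of matched elements."""
    return {elem.get("id") for elem in p2.xpath(expr)}


ancestor_tags = {elem.tag for elem in p2.xpath("./ancestor::*")}
parent_id = p2.xpath("./parent::*")[0].get("id")
before_siblings = ids("./preceding-sibling::*")  # siblings before p2
after_siblings = ids("./following-sibling::*")   # siblings after p2
following_all = ids("./following::*")            # everything after </p> of p2
attribute_values = p2.xpath("./attribute::*")    # attribute values of p2

print(ancestor_tags, parent_id, before_siblings, after_siblings,
      following_all, attribute_values)
```

Note the difference between following-sibling, which stays inside the parent div, and following, which also reaches the span that comes after the div.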

Sixth, functions

Functions make fuzzy (partial) matching easier.

Function     Usage                                                        Description
starts-with  xpath('//div[starts-with(@id, "ma")]')                       Selects div nodes whose id value starts with "ma"
contains     xpath('//div[contains(@id, "ma")]')                          Selects div nodes whose id value contains "ma"
and          xpath('//div[contains(@id, "ma") and contains(@id, "in")]')  Selects div nodes whose id value contains both "ma" and "in"
text()       xpath('//div[contains(text(), "ma")]')                       Selects div nodes whose text contains "ma"
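
The function calls in the table can be checked with a short sketch like the one below. It assumes the third-party lxml package is installed, because the standard-library ElementTree does not implement XPath functions. The ids and text values are made up so that each expression matches a different set of nodes.

```python
from lxml import etree  # third-party package, assumed to be installed

# Hypothetical sample document; ids and text chosen to separate the cases.
DOC = """<html><body>
<div id="main">main content</div>
<div id="remark">note</div>
<div id="side">sitemap</div>
</body></html>"""

root = etree.fromstring(DOC)


def ids(expr):
    """Return the id attribute of every element matched by expr."""
    return [elem.get("id") for elem in root.xpath(expr)]


starts = ids('//div[starts-with(@id, "ma")]')                    # id begins with "ma"
contains = ids('//div[contains(@id, "ma")]')                     # id contains "ma"
both = ids('//div[contains(@id, "ma") and contains(@id, "in")]')  # both conditions
by_text = ids('//div[contains(text(), "ma")]')                   # text contains "ma"

print(starts, contains, both, by_text)
```

Writing the Python string with single quotes and the XPath literals with double quotes (or the other way round) avoids having to escape quotes inside the expression.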

Scrapy XPath documentation: http://doc.scrapy.org/en/0.14/topics/selectors.html
