A simple instance of the Scrapy framework element selector XPath in Python

Source: Internet
Author: User
Tags: xpath, python, web crawler

Original title: "Python web crawler: Scrapy's XPath selector". The original text has been modified and annotated.

Advantages

XPath is more convenient than CSS selectors when selecting:

    • Tags with no id, class, or name attribute
    • Tags with no distinctive attributes or text
    • Tags nested at extremely complex levels

XPath paths

Positioning method

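/    Absolute path: selection starts from the root node.
//   Relative path: matches nodes anywhere in the document, regardless of their position.

Basic node positioning

# Find all input nodes under html > body > form
/html/body/form/input
# Find all input nodes
//input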
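The difference between the two path styles can be tried outside a spider. The following is a minimal sketch that uses lxml directly (Scrapy's selectors are built on lxml; in a spider the same expressions would be passed to `response.xpath(...)`). The page content and the `name` values are hypothetical and exist only for illustration.

```python
from lxml import html

# Hypothetical page, used only to illustrate the two expressions.
PAGE = """
<html>
  <body>
    <form>
      <input name="user">
      <input name="password">
    </form>
    <div><input name="search"></div>
  </body>
</html>
"""

root = html.fromstring(PAGE)

# Absolute path: only the input nodes that sit directly under /html/body/form.
form_inputs = [node.get("name") for node in root.xpath("/html/body/form/input")]

# Relative path: every input node, wherever it appears in the document.
all_inputs = [node.get("name") for node in root.xpath("//input")]

print(form_inputs)
print(all_inputs)
```

The absolute path misses the search box because it lives under a div rather than the form, while `//input` finds all three.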

Positioning with the wildcard *

# Find all child elements of every form node
//form/*
# Find all elements
//*
# Find all input nodes that have a parent element
//*/input
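A short sketch of the wildcard expressions, again using lxml on a hypothetical page. Note that `*` matches element nodes only, not text or comments.

```python
from lxml import html

# Hypothetical page, used only for illustration.
PAGE = """<html><body>
<form><input name="q"><button>Go</button></form>
</body></html>"""

root = html.fromstring(PAGE)

# Every element that is a direct child of a form node.
form_children = [node.tag for node in root.xpath("//form/*")]

# Every element in the document.
every_element = [node.tag for node in root.xpath("//*")]

# Every input whose parent is an element (any tag).
inputs_with_parent = root.xpath("//*/input")

print(form_children)
print(every_element)
print(len(inputs_with_parent))
```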
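
Using index positioning

# Locate the 1st a node under the 7th td (XPath positions start at 1, not 0)
//*/td[7]/a[1]
# Locate the 2nd span node under the 7th td
//*/td[7]/span[2]
# Locate the last a node under the last td
//*/td[last()]/a[last()]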
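Because XPath positions are counted from 1, it is easy to be off by one when coming from Python lists. The sketch below checks this on a small hypothetical table with lxml; the cell contents are made up for illustration.

```python
from lxml import html

# Hypothetical table: two cells, with two and three links respectively.
PAGE = """<html><body><table><tr>
<td><a>a1</a><a>a2</a></td>
<td><a>b1</a><a>b2</a><a>b3</a></td>
</tr></table></body></html>"""

root = html.fromstring(PAGE)

# td[2] is the second td, a[1] is its first link.
first_link_in_second_cell = root.xpath("//td[2]/a[1]/text()")

# last() picks the final node at each step.
last_link_in_last_cell = root.xpath("//td[last()]/a[last()]/text()")

# There is no position 0 in XPath, so this matches nothing.
position_zero = root.xpath("//td[0]")

print(first_link_in_second_cell)
print(last_link_in_last_cell)
print(position_zero)
```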
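
Using attributes

# Locate all input nodes that have a name attribute
//input[@name]
# Locate all input nodes that have at least one attribute
//input[@*]
# Locate all input nodes whose value attribute equals 2
//input[@value='2']
# Combine several attributes (the two forms are equivalent)
//input[@value='2'][@id='3']
//input[@value='2' and @id='3']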
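The attribute predicates can be compared side by side by counting their matches. This is a sketch with lxml on a hypothetical form; the `count` helper is introduced here for illustration and is not part of any library.

```python
from lxml import html

# Hypothetical form with four inputs carrying different attributes.
PAGE = """<html><body><form>
<input name="a" value="1">
<input id="3" value="2">
<input value="2">
<input>
</form></body></html>"""

root = html.fromstring(PAGE)


def count(expression):
    """Illustrative helper: number of nodes matched by an XPath expression."""
    return len(root.xpath(expression))


has_name = count("//input[@name]")               # inputs with a name attribute
has_any_attribute = count("//input[@*]")         # inputs with at least one attribute
value_is_2 = count("//input[@value='2']")        # inputs whose value is "2"
chained = count("//input[@value='2'][@id='3']")  # both predicates, chained
with_and = count("//input[@value='2' and @id='3']")  # both predicates, joined by and

print(has_name, has_any_attribute, value_is_2, chained, with_and)
```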

Positioning with functions

Function             Meaning
contains(a, b)       True if string a contains string b
text()               Selects the text nodes of the current node
starts-with(a, b)    True if string a starts with string b

<a class="menu_hot" href="/ads/auth/promote.html">应用推广</a>

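# Locate all a nodes whose href attribute contains "promote.html"
//a[contains(@href,'promote.html')]
# Locate all a nodes whose text is "应用推广" (App Promotion)
//a[text()='应用推广']
# Locate all a nodes whose href value starts with "/ads"
//a[starts-with(@href,'/ads')]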
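The three functions can be exercised against the sample link above. In this sketch the link is wrapped in a hypothetical page together with a second, unrelated link (an assumption added so that a non-match is visible), and lxml stands in for Scrapy's `response.xpath(...)`.

```python
from lxml import html

# The sample link from the text plus one hypothetical link that should not match.
PAGE = """<html><body>
<a class="menu_hot" href="/ads/auth/promote.html">应用推广</a>
<a href="/help/index.html">Help</a>
</body></html>"""

root = html.fromstring(PAGE)

# href contains "promote.html"
by_contains = root.xpath("//a[contains(@href, 'promote.html')]/@href")

# The link's text node equals the given string
by_text = root.xpath("//a[text()='应用推广']/@class")

# href starts with "/ads"
by_prefix = root.xpath("//a[starts-with(@href, '/ads')]/@href")

print(by_contains)
print(by_text)
print(by_prefix)
```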
Using the XPath axis

XPath axes play a role similar to the sibling, parent, and children navigation in BeautifulSoup.

Axis name             Meaning
ancestor              Selects all ancestors of the current node
ancestor-or-self      Selects all ancestors of the current node and the current node itself
attribute             Selects all attributes of the current node
child                 Selects all children of the current node
descendant            Selects all descendants of the current node
descendant-or-self    Selects all descendants of the current node and the current node itself
following             Selects every node in the document after the closing tag of the current node
parent                Selects the parent of the current node
preceding-sibling     Selects all siblings before the current node
self                  Selects the current node itself
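
An axis is written as `axis::nodetest` and is evaluated relative to a context node. The sketch below picks one paragraph in a hypothetical page and walks several axes from it with lxml; the ids are made up for illustration.

```python
from lxml import html

# Hypothetical page: a div holding three paragraphs.
PAGE = """<html><body>
<div id="box">
  <p id="first">one</p>
  <p id="second">two</p>
  <p id="third">three</p>
</div>
</body></html>"""

root = html.fromstring(PAGE)

# Context node: the middle paragraph.
second = root.xpath("//p[@id='second']")[0]

ancestors = [node.tag for node in second.xpath("ancestor::*")]
parent_id = second.xpath("parent::*/@id")
earlier_siblings = second.xpath("preceding-sibling::p/@id")
later_paragraphs = second.xpath("following::p/@id")
own_id = second.xpath("self::p/@id")
box_children = root.xpath("//div[@id='box']/child::p/@id")

print(ancestors)
print(parent_id, earlier_siblings, later_paragraphs, own_id)
print(box_children)
```

Calling `xpath` on an element rather than on the document root is what makes the relative axes meaningful here; Scrapy selectors allow the same pattern by calling `.xpath(...)` on a previously selected node.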
Original address: Http://mp.weixin.qq.com/s/UT4UFDpgo2ER300zq_uqsQ
