XPath's advanced application in Python

Source: Internet
Author: User
Tags: xpath

XPath plays a central role when learning to write web crawlers in Python. It does much the same job as the regular-expression module re, but for parsing web pages XPath is considerably more convenient, which makes re the second choice for this kind of work.

XPath Introduction:
What is it? Its full name is XML Path Language, a small query language.
As a language, XPath has the following advantages:
1) it can find information in XML documents
2) it supports lookups in HTML
3) it navigates through elements and attributes

Requirements for using XPath in Python development:
In Python, XPath support is provided by the lxml library, so you need to install lxml first. It can be installed with either easy_install or pip; see the blog for the detailed installation steps.
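
A quick way to confirm that the installation worked is to import the module and print its version. This is only a minimal sketch; it assumes lxml was installed (for example with pip install lxml) into the same interpreter that runs the script.

```python
from lxml import etree

# LXML_VERSION is a tuple of integers; the exact numbers depend on your install.
version = ".".join(str(part) for part in etree.LXML_VERSION)
print("lxml version:", version)
```

If the import fails with ModuleNotFoundError, lxml is not installed for that interpreter.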

Basic way to call XPath:

    from lxml import etree
    selector = etree.HTML(source_code)      # convert the page source into a tree that XPath can query
    result = selector.xpath(expression)     # returns a list
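
Because xpath() returns a list, a query that matches nothing gives back an empty list rather than raising an error. The sketch below illustrates this; the one-line document and the variable names are made up for the illustration.

```python
from lxml import etree

# Hypothetical one-line document, used only for illustration.
source = "<html><body><p>hello</p></body></html>"
selector = etree.HTML(source)

found = selector.xpath("//p/text()")        # one <p>, so one match
missing = selector.xpath("//span/text()")   # no <span> in the document

print(found)     # ['hello']
print(missing)   # []

# Guard before indexing, since an empty list has no element 0.
first = found[0] if found else None
print(first)     # hello
```

Checking the list before taking element 0 avoids an IndexError on pages that do not contain the expected tag.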

How to use XPath:
First, the basic syntax of XPath.
The common path operators are used as follows:
1) // A double slash searches the whole document starting from the root node, selects every node that matches, and returns the results as a list.
2) / A single slash steps down one level from the current path to a child tag, or applies an operation to the tag at the current path.
3) /text() Gets the text content of the tag at the current path.
4) /@xxxx Extracts the value of the attribute xxxx from the tag at the current path.
5) | Selects several paths at once. For example, //p | //div selects all matching p tags and all matching div tags.
6) . A single dot selects the current node.
7) .. Two dots select the parent node of the current node.
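
The last three operators (|, . and ..) can be tried out with a short sketch like the one below. The HTML snippet and the variable names are invented for the illustration and are not part of the article's example.

```python
from lxml import etree

# Hypothetical snippet for illustration only.
source = """
<div id="box">
  <p>first paragraph</p>
  <span>a span</span>
  <p>second paragraph</p>
</div>
"""
selector = etree.HTML(source)

# | combines the results of both paths into a single list.
texts = selector.xpath("//p/text() | //span/text()")
print(texts)

# Start from one element, then query relative to it with . and ..
span = selector.xpath("//span")[0]
same_node = span.xpath(".")[0]    # the span itself
parent = span.xpath("..")[0]      # the enclosing div

print(same_node.tag)              # span
print(parent.tag, parent.get("id"))   # div box
```

Calling xpath() on an element rather than on the whole tree is what makes . and .. meaningful, since they are evaluated relative to that element.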
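Two further functions are especially useful: starts-with(@attribute_name, prefix), which matches tags whose attribute values begin with the same characters, and string(.), which returns all the text inside the current node, including the text of nested tags.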
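
As a rough sketch of how starts-with and string(.) behave, consider the following. The HTML and the id values are made up for the illustration.

```python
from lxml import etree

# Hypothetical snippet for illustration only.
source = """
<div id="item-1">apple</div>
<div id="item-2">banana</div>
<div id="other">skip me</div>
<p id="mixed">Hello <b>bold</b> world</p>
"""
selector = etree.HTML(source)

# starts-with: keep only the divs whose id begins with "item-".
items = selector.xpath('//div[starts-with(@id, "item-")]/text()')
print(items)    # ['apple', 'banana']

# text() returns only the direct text nodes of <p>, so "bold" is missing.
direct = selector.xpath('//p[@id="mixed"]/text()')
print(direct)   # ['Hello ', ' world']

# string(.) evaluated on the element returns all nested text as one string.
paragraph = selector.xpath('//p[@id="mixed"]')[0]
full = paragraph.xpath("string(.)")
print(full)     # Hello bold world
```

starts-with is handy when a page numbers its ids or classes, and string(.) is handy when the text you want is broken up by inline tags.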

The following example shows how XPath is used:

    from lxml import etree

    html = """
    <!DOCTYPE html>
    <html>
        <head lang="en">
            <title>Test</title>
            <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
        </head>
        <body>
            <div id="content">
                <ul id="ul">
                    <li>No.1</li>
                    <li>No.2</li>
                    <li>No.3</li>
                </ul>
                <ul id="ul2">
                    <li>One</li>
                    <li>Both</li>
                </ul>
            </div>
            <div id="url">
                <a href="Http:www.58.com" title="58">58</a>
                <a href="Http:www.icnlogs.com" title="Cnblog">Cnblog</a>
            </div>
        </body>
    </html>
    """

    selector = etree.HTML(html)
    # The id attributes pin down which div and ul to match; text() then extracts the text content.
    content = selector.xpath('//div[@id="content"]/ul[@id="ul"]/li/text()')
    for item in content:
        print(item)

Output:

No.1
No.2
No.3

    # // locates every matching a tag in the whole document; @href gets the value of its href attribute.
    con = selector.xpath('//a/@href')
    for each in con:
        print(each)

Output:

Http:www.58.com
Http:www.icnlogs.com

    con = selector.xpath('/html/body/div/a/@title')   # locate the title of each a tag with an absolute path
    con = selector.xpath('//a/@title')                # locate it with a relative path; both give the same result here
    print(len(con))
    print(con[0], con[1])

Output:

2
58 Cnblog
