XPath for the Python parsing library

Source: Internet
Author: User
Tags: xpath

1. XPath stands for XML Path Language, a language for locating nodes in XML (and HTML) documents.

2. Common XPath rules:

    nodename    Selects the child nodes of the current node that have this name
    /           Selects direct child nodes of the current node
    //          Selects descendant nodes of the current node
    .           Selects the current node
    ..          Selects the parent node of the current node
    @           Selects attributes
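The rules listed here can be tried on a tiny document. The following is a minimal sketch using lxml; the sample markup and the variable names are made up for illustration and are not part of the original article.

```python
from lxml import etree

# Hypothetical sample markup, used only to illustrate the rules.
doc = etree.fromstring(
    '<ul id="menu"><li class="a"><a href="x.html">X</a></li>'
    '<li class="b"><a href="y.html">Y</a></li></ul>'
)

# nodename: children of the context node (the <ul> root) named "li"
li_children = doc.xpath('li')

# /: step down to direct children only
direct_links = doc.xpath('li/a')

# //: descendants at any depth
all_links = doc.xpath('//a')

# . and ..: the current node and its parent
first_a = all_links[0]
same_node = first_a.xpath('.')[0]
parent_tag = first_a.xpath('..')[0].tag

# @: attribute values
hrefs = doc.xpath('//a/@href')

print(len(li_children), len(direct_links), len(all_links))
print(same_node.tag, parent_tag, hrefs)
```

Each call to xpath() returns a list, even when only one node matches, which is why the sketch indexes with [0] before reading a single node.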

3. Example

    from lxml import etree

    text = '''
    <div>
    <ul>
    <li class="item-0"><a href="link1.html">first item</a></li>
    <li class="item-1"><a href="link2.html">second item</a></li>
    <li class="item-inactive"><a href="link3.html">third item</a></li>
    <li class="item-1"><a href="link4.html">fourth item</a></li>
    <li class="item-0"><a href="link5.html">fifth item</a>
    </ul>
    </div>
    '''
    # Initialize: build an object that can be queried with XPath.
    # etree.HTML() automatically repairs the HTML (the last <li> is not closed).
    html = etree.HTML(text)
    # tostring() outputs the repaired HTML; the result is of type bytes.
    result = etree.tostring(html)
    print(result.decode('utf-8'))
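To see what the automatic repair does, it can help to inspect a smaller fragment. This is a sketch with made-up markup; it shows that the parser wraps the fragment in html and body elements and closes the open li.

```python
from lxml import etree

# Hypothetical fragment: the second <li> is never closed.
fragment = '<ul><li>one</li><li>two</ul>'
root = etree.HTML(fragment)

# The parser adds the missing <html> and <body> wrappers.
root_tag = root.tag
child_tags = [child.tag for child in root]

# Serialize the repaired tree back to text.
serialized = etree.tostring(root).decode('utf-8')

print(root_tag, child_tags)
print(serialized)
```

Because of the added wrappers, an XPath expression that starts from the root with a single / has to account for html and body; expressions that start with // are not affected.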

You can also parse the HTML directly from a file:

    from lxml import etree

    # etree.parse() reads the file; HTMLParser() makes the parser tolerant of HTML.
    html = etree.parse(r'C:\Users\Administrator\Desktop\test.txt', etree.HTMLParser())
    result = etree.tostring(html)
    print(result.decode('utf-8'))
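The file-based variant can be tried without a file on the desktop. The sketch below writes a small snippet to a temporary file first; the file contents and the use of tempfile are assumptions made for the example, not part of the original article.

```python
import os
import tempfile
from lxml import etree

# Write a small HTML snippet to a temporary file (a stand-in for test.txt).
with tempfile.NamedTemporaryFile(
    'w', suffix='.html', delete=False, encoding='utf-8'
) as f:
    f.write('<div><ul><li class="item-0">'
            '<a href="link1.html">first item</a></li></ul></div>')
    path = f.name

try:
    # etree.parse() returns an ElementTree, which also supports xpath().
    tree = etree.parse(path, etree.HTMLParser())
    links = tree.xpath('//li/a/@href')
    serialized = etree.tostring(tree).decode('utf-8')
finally:
    os.remove(path)

print(links)
print(serialized)
```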

4. Use XPath rules that start with // to select all nodes that meet the requirements

    from lxml import etree

    text = '''
    <div>
        <ul>
            <li class="item-0"><a href="link1.html">first item</a></li>
            <li class="item-1"><a href="link2.html">second item</a></li>
            <li class="item-inactive"><a href="link3.html">Love me China</a></li>
            <li class="item-1"><a href="link4.html">fourth item</a></li>
            <li class="item-0"><a href="link5.html">fifth item</a>
        </ul>
    </div>
    '''

    # --- Matching nodes ---
    html = etree.HTML(text)
    result1 = html.xpath('//*')  # use * to match all nodes
    print(result1)
    result2 = html.xpath('//li')  # get all li nodes
    print(result2)
    print(result2[0])
    result3 = html.xpath('//li/a')  # get the direct a child nodes of all li nodes
    print(result3)

    # First select the a node whose href attribute is link3.html, then move to
    # its parent node and read the value of the parent's class attribute.
    # result4 is ['item-inactive'], a list with a single element.
    result4 = html.xpath('//a[@href="link3.html"]/../@class')
    print(result4[0])
    # The parent:: axis can also be used to get the parent node:
    result5 = html.xpath('//a[@href="link3.html"]/parent::*/@class')

    # --- Attribute matching (filter on an attribute with @ when selecting nodes) ---
    # The li node whose class attribute is "item-inactive"
    result6 = html.xpath('//li[@class="item-inactive"]')
    print(result6)

    # --- Text retrieval (use text() in XPath to get the text inside a node) ---
    result7 = html.xpath('//li[@class="item-inactive"]/a[@href="link3.html"]/text()')
    print(result7)  # prints ['Love me China']

    # --- Attribute retrieval (use @ to get an attribute value) ---
    # The class attribute of the parent of the a node whose href is "link3.html"
    result8 = html.xpath('//a[@href="link3.html"]/../@class')
    print(result8)  # prints ['item-inactive']

    # --- Attribute multi-value matching ---
    html_test = '''
    <li class="li item-inactive"><a href="link3.html">Love me China</a></li>
    '''
    # Here the class attribute of the li tag has two values, so the exact
    # attribute match used above no longer matches; use contains() instead.
    html_test = etree.HTML(html_test)
    # contains(): the first argument is the attribute, the second is a value to
    # look for; the node matches if the attribute contains that value.
    result9 = html_test.xpath('//li[contains(@class, "li")]/a/text()')
    print(result9)

    # --- Multi-attribute matching (identify a node by several attributes) ---
    html_test2 = '''
    <li class="li item-inactive" name="item"><a href="link3.html">hello world</a></li>
    '''
    # Combine the conditions with the "and" operator. Note that the second
    # argument of contains() must be a quoted string.
    html_test = etree.HTML(html_test2)
    result10 = html_test.xpath('//li[contains(@class, "li") and @name="item"]/a[@href="link3.html"]/text()')
    print(result10)  # prints ['hello world']
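One caveat about contains(): it is a plain substring test, so contains(@class, "li") also matches a class such as "link-item". A common XPath 1.0 idiom is to pad the attribute with spaces and search for the padded token. The sketch below uses made-up markup to show the difference.

```python
from lxml import etree

# Hypothetical markup: only the first li really has the class token "li".
html = etree.HTML(
    '<ul>'
    '<li class="li item-inactive"><a href="a.html">wanted</a></li>'
    '<li class="link-item"><a href="b.html">not wanted</a></li>'
    '</ul>'
)

# Substring test: "li" is also found inside "link-item".
loose = html.xpath('//li[contains(@class, "li")]/a/text()')

# Token test: pad the normalized attribute with spaces and look for " li ".
strict = html.xpath(
    '//li[contains(concat(" ", normalize-space(@class), " "), " li ")]/a/text()'
)

print(loose)
print(strict)
```

The stricter form is longer, but it only matches when "li" appears as a whole space-separated class name.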

5. XPath Operators
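XPath 1.0 provides, among others, the operators or, and, =, !=, <, <=, >, >=, +, -, *, div, mod, and | (node-set union). The following sketch exercises a few of them; the markup, including the data-n attribute, is invented for the example.

```python
from lxml import etree

# Hypothetical markup; data-n exists only to demonstrate numeric comparison.
html = etree.HTML(
    '<ul>'
    '<li class="item-0" data-n="1"><a href="link1.html">first</a></li>'
    '<li class="item-1" data-n="2"><a href="link2.html">second</a></li>'
    '<li class="item-inactive" data-n="3"><a href="link3.html">third</a></li>'
    '</ul>'
)

# or: either condition may hold
either = html.xpath('//li[@class="item-0" or @class="item-1"]/a/text()')

# !=: attribute value differs
not_inactive = html.xpath('//li[@class!="item-inactive"]/a/text()')

# >: the attribute value is converted to a number for the comparison
greater = html.xpath('//li[@data-n > 1]/a/text()')

# mod: li nodes at odd positions
odd = html.xpath('//li[position() mod 2 = 1]/a/text()')

# |: union of two node sets, returned in document order
union = [a.text for a in html.xpath('//li[1]/a | //li[3]/a')]

print(either, not_inactive, greater, odd, union)
```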

6. Selecting by position (when several nodes match but only one of them is wanted)

    from lxml import etree

    text = '''
    <div>
        <ul>
            <li class="item-0"><a href="link1.html">first item</a></li>
            <li class="item-1"><a href="link2.html">second item</a></li>
            <li class="item-inactive"><a href="link3.html">Love me China</a></li>
            <li class="item-1"><a href="link4.html">fourth item</a></li>
            <li class="item-0"><a href="link5.html">fifth item</a>
        </ul>
    </div>
    '''

    # --- Select by position after matching nodes ---
    html = etree.HTML(text)
    result1 = html.xpath('//li[1]/a/text()')  # the first of the matched li nodes
    print(result1)
    result2 = html.xpath('//li[last()]/a/text()')  # the last of the matched li nodes
    print(result2)
    result3 = html.xpath('//li[position()<3]/a/text()')  # li nodes at a position less than 3, i.e. the 1st and 2nd
    print(result3)
    result4 = html.xpath('//li[last()-2]/a/text()')  # the third-to-last of the matched li nodes
    print(result4)

    # --- Node axis selection ---
    html = etree.HTML(text)
    result5 = html.xpath('//li[1]/ancestor::*')  # all ancestor nodes of the first li node
    print(result5)
    result6 = html.xpath('//li[1]/attribute::*')  # all attribute values of the first li node
    print(result6)
    result7 = html.xpath('//li[1]/child::a')  # the direct a child nodes of the first li node
    print(result7)
    result8 = html.xpath('//li[1]/descendant::a')  # the a descendant nodes of the first li node
    print(result8)
    result9 = html.xpath('//li[1]/following::*')  # all nodes that come after the first li node in the document
    print(result9)
    result10 = html.xpath('//li[1]/following-sibling::*')  # all sibling nodes after the first li node
    print(result10)
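One detail about positional predicates: //li[1] selects every li that is the first li child of its own parent, not the first li of the whole document. With a single ul, as in the sample text, the two coincide. The sketch below uses made-up markup with two lists to show the difference, and also contrasts the following-sibling and following axes.

```python
from lxml import etree

# Hypothetical markup with two separate lists.
html = etree.HTML(
    '<div>'
    '<ul><li>a1</li><li>a2</li></ul>'
    '<ul><li>b1</li><li>b2</li></ul>'
    '</div>'
)

# //li[1]: the first li within each parent
first_in_each = html.xpath('//li[1]/text()')

# (//li)[1]: the first li in the whole document
first_overall = html.xpath('(//li)[1]/text()')

# following-sibling:: stays inside the same parent
siblings = html.xpath('(//li)[1]/following-sibling::li/text()')

# following:: continues through the rest of the document
following = html.xpath('(//li)[1]/following::li/text()')

print(first_in_each, first_overall)
print(siblings, following)
```

Wrapping the path in parentheses applies the predicate to the combined result, which is usually what is wanted when exactly one node should be returned.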
