Python crawlers: using lxml etree and XPath (with examples)

Source: Internet
Author: User
Tags: xml parser

This article shows how to use lxml's etree module together with XPath in a Python crawler, with worked examples.

lxml: Python's HTML/XML parser

Official documentation: https://lxml.de/

Before using it, you need to install the lxml package.

Features:

1. Parsing HTML: etree.HTML(text) parses an HTML fragment given as a string into an HTML document

2. Reading XML files

3. Using etree together with XPath

Installing lxml

In PyCharm, follow these steps:

"File" > "Settings" > "Project Interpreter" > "+" > "lxml" > "Install"
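To confirm that the installation worked, a minimal check such as the following can be run in the project interpreter. It only imports the package and prints its version tuple, so it makes no assumptions beyond lxml being installed.

```python
# Quick check that lxml is importable after installation.
from lxml import etree

# LXML_VERSION is a tuple of integers describing the installed lxml release.
print("lxml version:", etree.LXML_VERSION)
```

If the import fails with ModuleNotFoundError, the package was most likely installed into a different interpreter than the one the project uses.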

Using lxml etree

    • Example v25 file: https://xpwi.github.io/py/py%E7%88%AC%E8%99%AB/py25etree.py

    • Parsing HTML code with lxml

    # Install lxml first
    # Use lxml to parse HTML code
    from lxml import etree

    text = '''
    <p>
        <ul>
            <li class="item-0"><a href="0.html">item 0</a></li>
            <li class="item-1"><a href="1.html">item 1</a></li>
            <li class="item-2"><a href="2.html">item 2</a></li>
            <li class="item-3"><a href="3.html">item 3</a></li>
            <li class="item-4"><a href="4.html">item 4</a></li>
            <li class="item-5"><a href="5.html">item 5</a></li>
        </ul>
    </p>
    '''

    # etree.HTML parses the string into an HTML document
    html = etree.HTML(text)
    # tostring returns bytes; decode() turns them into a str for printing
    s = etree.tostring(html).decode()
    print(s)
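The following is a small sketch of what etree.HTML does with a fragment: it returns the root element of a complete document, which can then be queried with XPath. The markup is a shortened version of the list used in the case above, and the variable names are chosen only for illustration.

```python
from lxml import etree

# A fragment with no <html> or <body> wrapper (shortened sample markup).
fragment = '''
<ul>
    <li class="item-0"><a href="0.html">item 0</a></li>
    <li class="item-1"><a href="1.html">item 1</a></li>
</ul>
'''

# etree.HTML returns the root element of the parsed document.
root = etree.HTML(fragment)

# The parser supplies the missing <html> and <body> wrappers.
print(root.tag)
print(root.find('body') is not None)

# XPath can pull out attribute values and text nodes directly.
hrefs = root.xpath('//li/a/@href')
labels = root.xpath('//li/a/text()')
print(hrefs)
print(labels)
```

This is why the serialized output of a parsed fragment is longer than the input: the parser fills in the document structure that the fragment left out.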


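Using lxml etree to read a file

    • Example v26etree2 file: https://xpwi.github.io/py/py%E7%88%AC%E8%99%AB/py26etree2.py

    • Reading an XML file:

    # lxml etree: read a file
    from lxml import etree

    # Parse the XML file into an element tree
    xml = etree.parse("./py24.xml")
    # tostring returns bytes; pretty_print=True adds indentation and line breaks
    sxml = etree.tostring(xml, pretty_print=True)
    print(sxml)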
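The contents of py24.xml are not shown in this article, so here is a self-contained sketch of the same steps that can be run without that file. The bookstore data below is hypothetical and only assumed to resemble py24.xml; it is written to a temporary file first so that etree.parse has something to read.

```python
import os
import tempfile

from lxml import etree

# Hypothetical sample data; the real py24.xml may differ.
sample = b'''<?xml version="1.0" encoding="utf-8"?>
<bookstore>
    <book category="sport">
        <title>Running Basics</title>
        <year>2018</year>
    </book>
</bookstore>
'''

# Write the sample to a temporary file for etree.parse to read.
with tempfile.NamedTemporaryFile(suffix=".xml", delete=False) as f:
    f.write(sample)
    path = f.name

try:
    tree = etree.parse(path)   # an ElementTree object for the whole file
    root = tree.getroot()      # the root element, here <bookstore>
    # decode() converts the serialized bytes to str so that print shows
    # real line breaks instead of a bytes literal.
    pretty = etree.tostring(tree, pretty_print=True).decode()
    print(root.tag)
    print(pretty)
finally:
    os.remove(path)
```

Decoding the result of tostring is optional, but without it print shows a bytes literal with escaped newlines, which hides the effect of pretty_print.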


Using etree and XPath together

    • Example v26expath file: https://xpwi.github.io/py/py%E7%88%AC%E8%99%AB/py26expath.py

    • Using etree together with XPath:

    # lxml etree: read a file, then query it with XPath
    from lxml import etree

    xml = etree.parse("./py24.xml")
    print(type(xml))

    # Find all book nodes
    rst = xml.xpath('//book')
    print(rst)

    # Find the book elements whose category attribute is "sport"
    rst2 = xml.xpath('//book[@category="sport"]')
    print(type(rst2))
    print(rst2)

    # Find the year element under the book element whose category attribute is "sport"
    rst3 = xml.xpath('//book[@category="sport"]/year')
    # xpath returns a list, so take the first match
    rst3 = rst3[0]
    print('-------------\n', type(rst3))
    print(rst3.tag)
    print(rst3.text)
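The same kinds of queries can be tried without py24.xml by parsing a string. The sketch below uses hypothetical bookstore data (an assumption, since the real file is not shown) and adds two common variations: selecting text nodes with text() and attribute values with @category.

```python
from lxml import etree

# Hypothetical bookstore data standing in for py24.xml.
doc = etree.fromstring('''
<bookstore>
    <book category="sport">
        <title>Running Basics</title>
        <year>2018</year>
    </book>
    <book category="cooking">
        <title>Simple Soups</title>
        <year>2005</year>
    </book>
</bookstore>
''')

# xpath always returns a list for node-set queries, even for a single match.
books = doc.xpath('//book')
print(len(books))

# Appending /text() returns the text content instead of the elements.
sport_years = doc.xpath('//book[@category="sport"]/year/text()')
print(sport_years)

# Selecting @category returns the attribute values as strings.
categories = doc.xpath('//book/@category')
print(categories)

# Each matched element can be inspected further with the element API.
for book in books:
    print(book.get('category'), book.findtext('title'))
```

Because the result is a list, indexing with [0] as in the case above raises IndexError when nothing matches, so checking that the list is non-empty first is a reasonable precaution in a crawler.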

