Python processing Xml__python

Source: Internet
Author: User
Tags tagname xml parser xpath

With Python development, as Python's open source ecosystem is very powerful, there is often a lot of class libraries in this area to implement the same functionality, and developers are also faced with the option of choosing the best class Library as a helper development tool. This article will record the class libraries that people have tested when using Python to process XML-formatted data, some of which are inherently inadequate to support features, and the class libraries or modules involved are XML (Python-band), LIBXML2, lxml, and XPath.

Note: This article deals with the structure of XML-formatted data as follows:

input_xml_string = "" " 
                   <root> 
                        <item> 
                            <data version=" 1.0 "url=" http://*** "/> 
                            < Data version= "2.0" url= "http://***"/> 
                         </item> 
                         <other> 
                             <data version= "1.0" url= "http ://*** "/> <data version=" 2.0 "url=" http://*** "/> 
                          </other> 
                     </root> 
                     " ""  

Python's self-contained XML processing module

You can use the "getElementsByTagName" interface provided by this module to find the desired node, and the instance "Get_tagname" is as follows:

Import Xml.dom.minidom  
def get_tagname ():  
    doc = xml.dom.minidom.parseString (input_xml_string) for  
    node In Doc.getelementsbytagname ("Data"):  
        Print (node, node.tagname, Node.getattribute ("version"))  

The results of the program operation are as follows:

(<dom element:data at 0x89884cc>, u ' data ', U ' 1.0 ')

(<dom element:data at 0x898860c>, u ' data ', U ' 2.0 ')

(<dom element:data at 0x89887cc>, u ' data ', U ' 1.0 ')

(<dom element:data at 0x898890c>, u ' data ', U ' 2.0 ')

Looking at the results above, the "getElementsByTagName" interface looks for all nodes named data, and sometimes the program needs to complete only the data node below a node, such as the data node under the other node. Perhaps you immediately thought, we can determine whether the data node's parent node is other to satisfy the function, the instance "Get_tagname_other" is as follows:

Import Xml.dom.minidom  
def get_tagname_other ():  
    doc = xml.dom.minidom.parseString (input_xml_string)  
    For node in doc.getelementsbytagname ("Data"):  
        if Node.parentNode.tagName = = "Other":  
            Print (node, node.tagname , Node.getattribute ("version"))  
The results of the program operation are as follows:

(<dom element:data at 0x936b7cc>, u ' data ', U ' 1.0 ')

(<dom element:data at 0x936b90c>, u ' data ', U ' 2.0 ')

Look at the results of the above, well, well, the problem is solved, but if I want to look up the data node under the other node and the node version equals 1.0, then we need to add more strategies to filter out what we need, which is obviously not flexible enough, So we thought of using XPath to search for the nodes we need. The instance "Get_xpath" is as follows:

Import Xml.etree.ElementTree from  
stringio import stringio  
file = Stringio (input_xml_string)  
def get_ XPath ():  
    doc = xml.etree.ElementTree.parse (file)  
    for node in Doc.findall ("//item/data"):  
        Print (node, Node.tag, (Node.items ()))  
The results of the program operation are as follows:

(<element data at 90c4dcc>, ' Data ', [(' url ', ' http://*** '), (' Version ', ' 1.0 ')]

(<element data at 90c4e8c>, ' Data ', [(' url ', ' http://*** '), (' Version ', ' 2.0 ')]

Looking at the results above, the XPath approach obviously improves the readability of the program, but it still doesn't solve the problem, because Python's XML module has an inherent lack of support for XPath, and if you want to satisfy both readability and function correctness, We need to use a third party XML processing class library for Python.

Libxml2

LIBXML2 is an XML parser developed using the C language, a free open source software based on MIT license, which has a variety of programming languages based on its implementation, such as the lxml module that this article will introduce. The instance "Get_xpath_1" is as follows:

Import LIBXML2  
def get_xpath_1 ():  
    doc = Libxml2.parsefile ("Data.xml") #data. xml file structure is the same as the input_xml_string above For  
    node in Doc.xpatheval ("//item/data[@version = ' 1.0 ']"):  
        Print (node, node.name, (Node.properties.name, node.properties.content))  
    Doc.freedoc ()  
The results of the program operation are as follows:

(<xmlnode (data) object at 0x9326c6c>, ' Data ', (' Version ', ' 1.0 '))

Observation of the above operating results, to meet our needs, a little less than the "Xpatheval ()" Interface does not support the use of similar templates, but does not affect usage, because the LIBXML2 adopted the C language development, so in the use of API interface will inevitably a little "acclimatized" (writing or habitual usage )

lxml

lxml is developed using the Python language based on the LIBXML2 described above, and is more suitable for Python developers (I feel) than libxml2 in terms of usage, and the XPath interface supports the use of similar templates, instance "get_xpath_2" as follows:

Import lxml.etree  
def get_xpath_2 ():  
   doc = lxml.etree.parse (file) for  
   node in Doc.xpath ("//item/data[@ Version = $name] "", Name = "1.0"):  
       Print (node, Node.tag, (Node.items ()))  
The results of the program operation are as follows:

(<element data at A1f784c>, ' Data ', [(' Version ', ' 1.0 '), (' url ', ' http://*** ')]

Xpath

XPath is an official Python recommendation to support XPath processing modules, based on the Python XML processing module introduced in this article is extended, can be a good combination, while the "Find" interface also supports the use of similar templates, instance "Get_xpath_3" As follows:

Import XPath  
def get_xpath_3 ():  
   doc = xml.dom.minidom.parseString (input_xml_string) for  
   node in Xpath.find ("//item/data[@version = $name]", doc, name = "1.0"):  
       Print (node, node.tagname, Node.getattribute (" Version "))  
The results of the program operation are as follows:

(<dom element:data at 0x89934cc>, u ' data ', U ' 1.0 ')

Summarize

Through the practice of these class libraries, we have learned that Python has a variety of choices in processing XML-formatted data, and that these libraries are good at dealing with those aspects and using the various class libraries, and can choose the right class library to complete the development work according to the actual requirements.


Come from http://hi.baidu.com/heelenyc/blog/item/4062fd0b57c75294d1581b09.html

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.