Read and Write XML files using Python

Source: Internet
Author: User
Read and Write XML in Python
(Original converted from http://zc0604.iteye.com/blog/1520703) <a python read XML File> converted from external

When using python for development, because Python's open-source ecosystem is very powerful, many class libraries are often used to implement the same function, developers are also faced with how to choose the best class library as a tool to assist development. This article will record the class libraries I tested when using python to process XML format data. Some Class Libraries cannot support some features due to inherent limitations. The class libraries or modules involved include XML (Python built-in), libxml2, lxml, and XPath.

Note: The structure of the data processed in XML format in this document is as follows:

Python code

Input_xml_string = """

<Root>

<Item>

<Data version = "1.0" url = "http: // ***"/>

<Data version = "2.0" url = "http: // ***"/>

</Item>

<Other>

<Data version = "1.0" url = "http: // ***"/>

<Data version = "2.0" url = "http: // ***"/>

</Other>

</Root>

"""

XML processing module of Python

You can use the "getelementsbytagname" interface provided by this module to find the desired node. The instance "get_tagname" is as follows:

Python code

Import XML. Dom. minidom

Def get_tagname ():

Doc = xml. Dom. minidom. parsestring (input_xml_string)

For node in Doc. getelementsbytagname ("data "):

Print (node, node. tagname, node. getattribute ("version "))

The program running result is as follows:

Python code

(<Dom element: data at 0x89884cc>, u 'data', u '1. 0 ')

(<Dom element: data at 0x898860c>, u 'data', u '2. 0 ')

(<Dom element: data at 0x89887cc>, u 'data', u '1. 0 ')

(<Dom element: data at 0x898890c>, u 'data', u '2. 0 ')

Observe the preceding running result. The "getelementsbytagname" interface is used to search for all nodes named data. Sometimes, the function required by the program is to only use the data nodes under a node, such as the data node under the other node. You may think about it immediately. We can determine whether the parent node of the data node is other to meet the function. The instance "get_tagname_other" is as follows:

Python code

Import XML. Dom. minidom

Def get_tagname_other ():

Doc = xml. Dom. minidom. parsestring (input_xml_string)

For node in Doc. getelementsbytagname ("data "):

If node. parentnode. tagname = "other ":

Print (node, node. tagname, node. getattribute ("version "))

The program running result is as follows:

Python code

(<Dom element: data at 0x936b7cc>, u 'data', u '1. 0 ')

(<Dom element: data at 0x936b90c>, u 'data', u '2. 0 ')

Observe the above running results. Well, the problem is solved. But if I want to find the data node under the other node and the attribute node version is 1.0, then we need to add more policies to filter out the data we need. Obviously, this method is not flexible enough. Therefore, we thought of using XPath to search for the nodes we need. The instance "get_xpath" is as follows:

Python code

Import XML. etree. elementtree

From stringio import stringio

File = stringio (input_xml_string)

Def get_xpath ():

Doc = xml. etree. elementtree. parse (file)

For node in Doc. findall ("// item/Data "):

Print (node, node. Tag, (node. Items ()))

The program running result is as follows:

Python code

(<Element data at 90c4dcc>, 'data', [('url', 'HTTP: // *** '), ('version', '1. 0 ')])

(<Element data at 90c4e8c>, 'data', [('url', 'HTTP: // *** '), ('version', '2. 0 ')])

Looking at the above running results, the use of xpath obviously improves the readability of the program, but it still does not solve the problem above. This is due to the inherent lack of support for the XPath method by the XML module of Python, to ensure both readability and functional correctness, we need to use a third-party XML processing class library for python.

Libxml2

Libxml2 is an XML Parser developed in C language. It is a free open-source software based on MIT license. Multiple programming languages are implemented based on it. This section describes the lxml module. The instance "get_xpath_1" is as follows:

Python code

Mport libxml2

Def get_xpath_1 ():

Doc = libxml2.parsefile ("data. xml") # The structure of the Data. xml file is the same as that of input_xml_string.

For node in Doc. xpatheval ("// item/data [@ version = '1. 0']"):

Print (node, node. Name, (node. properties. Name, node. properties. Content ))

Doc. freedoc ()

The program running result is as follows:

Python code

(<Xmlnode (data) object at 0x9326c6c>, 'data', ('version', '1. 0 '))

Observe the above running results to meet our needs. The "xpatheval ()" interface does not support the usage of similar templates, but does not affect the use. Because libxml2 is developed in C language, therefore, the method of using the API interface will inevitably be somewhat "unacceptable" (written or habitually used)

Lxml

Lxml is developed using the Python language based on libxml2 introduced above. It is more suitable for python developers than libxml2 ), the "XPath" interface supports template-like usage. The example "get_xpath_2" is as follows:

Python code

Import lxml. etree

Def get_xpath_2 ():

Doc = lxml. etree. parse (file)

For node in Doc. XPath ("// item/data [@ version = $ name]", name = "1.0 "):

Print (node, node. Tag, (node. Items ()))

The program running result is as follows:

Python code

(<Element data at a1f784c>, 'data', [('version', '1. 0 '), ('url', 'HTTP: // ***')])

Xpath

XPath is a officially recommended Python module that supports processing such as XPath. It is extended based on the python built-in XML processing module introduced in this article and can be used in combination, the "find" interface also supports usage similar to the template. The example "get_xpath_3" is as follows:

Python code

Import xpath

Def get_xpath_3 ():

Doc = xml. Dom. minidom. parsestring (input_xml_string)

For node in XPath. Find ("// item/data [@ version = $ name]", Doc, name = "1.0 "):

Print (node, node. tagname, node. getattribute ("version "))

The program running result is as follows:

Python code

(<Dom element: data at 0x89934cc>, u 'data', u '1. 0 ')

Summary

Through the practices of these class libraries, we have learned that python has various options when processing XML format data, we also learned that these class libraries are good at processing those aspects and usage methods of various class libraries. We can select appropriate class libraries based on actual needs to complete the development work.

<II. Writing XML files in Python> conversion from

Http://lulinbest.blog.sohu.com/75921823.html

I used to write a program to generate an XML file using minidom in Python. Now I need to read the content in the XML file. The first thing I think of is the mini Dom module. after writing and testing, I learned how to use the functions as expected, which is no different from Dom operations in Ajax.

We have known that elementtree is widely welcomed by Python programmers when processing XML files. It has also installed the installation package of elementtree. Now it is included in python2.5. now that I want to process XML files, I also need to learn to use more efficient and easy-to-use modules. I tried it for a long time. Except for the functions related to the namespace, I tried other functions. you can process XML files in the future.

The following is a simple example to show how to use each function:

from xml.etree.ElementTree import ElementTreefrom xml.etree.ElementTree import Elementfrom xml.etree.ElementTree import SubElementfrom xml.etree.ElementTree import dumpfrom xml.etree.ElementTree import Commentfrom xml.etree.ElementTree import tostring'''<?xml version="1.0"?><PurchaseOrder>  <account refnum="2390094"/>  <item sku="33-993933" qty="4">    <name>Potato Smasher</name>    <description>Smash Potatoes like never before.</description>  </item></PurchaseOrder>'''## Writing the content to xml documentbook = ElementTree()purchaseorder = Element('PurchaseOrder')book._setroot(purchaseorder)SubElement(purchaseorder,  'account', {'refnum' : "2390094"})item = Element("item", {'sku' : '33-993933', 'qty' : '4'})purchaseorder.append(item)print item.items()       # [('sku', '33-993933'), ('qty', '4')]print item.attrib        # {'sku': '33-993933', 'qty': '4'}print item.get('sku')    # 33-993933SubElement(item, 'name').text = "Potato Smasher"SubElement(item, 'description').text = "Smash Potatoes like never before."#book.write('book.xml',"utf-8")#print tostring(purchaseorder)#import sys#book.write(sys.stdout)#dump(book)## Displaying the content of the xml documentprint purchaseorder.find('account')print purchaseorder.find('account').get('refnum')print purchaseorder.findall('account')[0].get('refnum')print purchaseorder.find('item/name')print purchaseorder.find('item/name').text## How to use ElementTree([element,] )## 1. From standard XML element, it becomes root elementprint ElementTree(item).getroot().find('name').text## 2. From XML fileprint ElementTree(file='book.xml').getroot().find('item/description').text## Create an iteratorfor element in purchaseorder.getiterator():    print element.tag## Get pretty lookdef indent(elem, level=0):    i = "\n" + level*"  "    if len(elem):        if not elem.text or not elem.text.strip():            elem.text = i + "  "        for e in elem:            indent(e, level+1)        if not e.tail or not e.tail.strip():            e.tail = i    if level and (not elem.tail or not elem.tail.strip()):        elem.tail = i    return elemif __name__=="__main__":    dump(indent(purchaseorder))    book.write('book.xml',"utf-8")

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.