Python's open source ecosystem is so rich that there are often several libraries implementing the same functionality, and developers have to choose the one that best fits their work. This article records the libraries I tested when using Python to process XML data, some of which have inherent gaps in feature support. The libraries and modules covered are xml (bundled with Python), libxml2, lxml, and xpath.
Note: the XML data used throughout this article has the following structure:
input_xml_string = """
<root>
    <item>
        <data version="1.0" url="http://***" />
        <data version="2.0" url="http://***" />
    </item>
    <other>
        <data version="1.0" url="http://***" />
        <data version="2.0" url="http://***" />
    </other>
</root>
"""
Python's built-in XML processing module
You can use the getElementsByTagName interface provided by this module to find the nodes you want. The example get_tagname is as follows:
import xml.dom.minidom

def get_tagname():
    doc = xml.dom.minidom.parseString(input_xml_string)
    # Visit every <data> element in the document, wherever it appears
    for node in doc.getElementsByTagName("data"):
        print(node, node.tagName, node.getAttribute("version"))
Running the program prints the following (the u'' prefixes show that this output comes from Python 2):
(<DOM Element: data at 0x89884cc>, u'data', u'1.0')
(<DOM Element: data at 0x898860c>, u'data', u'2.0')
(<DOM Element: data at 0x89887cc>, u'data', u'1.0')
(<DOM Element: data at 0x898890c>, u'data', u'2.0')
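For readers on Python 3, the same lookup can be sketched as follows. This is a minimal sketch rather than the original program: the sample document is repeated as SAMPLE_XML so the block runs on its own, and the function name data_versions is made up for illustration. It returns the version attributes instead of printing node objects, which makes the result easier to compare.

```python
import xml.dom.minidom

# Copy of the sample document from this article, repeated so the block is self-contained.
SAMPLE_XML = """
<root>
    <item>
        <data version="1.0" url="http://***" />
        <data version="2.0" url="http://***" />
    </item>
    <other>
        <data version="1.0" url="http://***" />
        <data version="2.0" url="http://***" />
    </other>
</root>
"""


def data_versions(xml_text):
    """Return the version attribute of every <data> element, in document order."""
    doc = xml.dom.minidom.parseString(xml_text)
    return [node.getAttribute("version") for node in doc.getElementsByTagName("data")]


if __name__ == "__main__":
    print(data_versions(SAMPLE_XML))
```

All four data nodes are returned, regardless of whether they sit under item or other.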
As the results show, getElementsByTagName finds every node named data. Sometimes, however, the program only needs the data nodes below a particular node, for example the data nodes under the other node. The obvious idea is to check whether the parent of each data node is other. The example get_tagname_other is as follows:
import xml.dom.minidom

def get_tagname_other():
    doc = xml.dom.minidom.parseString(input_xml_string)
    for node in doc.getElementsByTagName("data"):
        # Keep only the <data> elements whose direct parent is <other>
        if node.parentNode.tagName == "other":
            print(node, node.tagName, node.getAttribute("version"))
Running the program prints the following:
(<DOM Element: data at 0x936b7cc>, u'data', u'1.0')
(<DOM Element: data at 0x936b90c>, u'data', u'2.0')
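The parent check can also be wrapped in a small reusable function. The following is a Python 3 sketch under my own assumptions: SAMPLE_XML is a copy of the sample document, and data_under is a hypothetical helper name, not something provided by minidom. It also checks the node type first, because the document node itself has no tagName.

```python
import xml.dom.minidom

# Copy of the sample document from this article, repeated so the block is self-contained.
SAMPLE_XML = """
<root>
    <item>
        <data version="1.0" url="http://***" />
        <data version="2.0" url="http://***" />
    </item>
    <other>
        <data version="1.0" url="http://***" />
        <data version="2.0" url="http://***" />
    </other>
</root>
"""


def data_under(xml_text, parent_tag):
    """Return (tag, version) pairs for <data> elements whose direct parent is parent_tag."""
    doc = xml.dom.minidom.parseString(xml_text)
    matches = []
    for node in doc.getElementsByTagName("data"):
        parent = node.parentNode
        # Only element nodes carry a tagName; the document node does not.
        if parent.nodeType == parent.ELEMENT_NODE and parent.tagName == parent_tag:
            matches.append((node.tagName, node.getAttribute("version")))
    return matches


if __name__ == "__main__":
    print(data_under(SAMPLE_XML, "other"))
```

Every additional condition (a version value, a deeper ancestor, and so on) would need another hand-written check inside the loop.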
This solves the problem. But if I want the data nodes under the other node whose version equals 1.0, more filtering logic has to be added to pick out what we need, which is clearly not flexible enough. So we turn to XPath to search for the nodes we need. The example get_xpath is as follows:
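import xml.etree.ElementTree
from StringIO import StringIO  # Python 2 module; on Python 3 use "from io import StringIO"

# Wrap the XML string in a file-like object so that parse() can read it
file = StringIO(input_xml_string)

def get_xpath():
    doc = xml.etree.ElementTree.parse(file)
    for node in doc.findall("//item/data"):
        print(node, node.tag, (node.items()))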
Running the program prints the following:
(<Element data at 90c4dcc>, 'data', [('url', 'http://***'), ('version', '1.0')])
(<Element data at 90c4e8c>, 'data', [('url', 'http://***'), ('version', '2.0')])
Looking at the results above, the XPath approach clearly improves the readability of the program, but it still does not solve the problem, because the XPath support in Python's built-in xml module is inherently limited. To get both readability and correct behavior, we need a third-party XML processing library for Python.
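How limited the built-in support is depends on the ElementTree version. As far as I know, ElementTree 1.3 (bundled with Python 2.7 and Python 3) accepts simple attribute predicates such as [@version='1.0'], although it still offers neither variables like $name nor XPath functions. A minimal Python 3 sketch, in which SAMPLE_XML and find_data are illustrative names of my own:

```python
import xml.etree.ElementTree as ET

# Copy of the sample document from this article, repeated so the block is self-contained.
SAMPLE_XML = """
<root>
    <item>
        <data version="1.0" url="http://***" />
        <data version="2.0" url="http://***" />
    </item>
    <other>
        <data version="1.0" url="http://***" />
        <data version="2.0" url="http://***" />
    </other>
</root>
"""


def find_data(xml_text, parent_tag, version):
    """Return (tag, sorted attributes) for <data> under parent_tag with the given version."""
    root = ET.fromstring(xml_text)
    # Paths on an element must be relative, hence the leading ".".
    # Plain string formatting is only safe here because the values contain no quotes.
    path = ".//{}/data[@version='{}']".format(parent_tag, version)
    return [(node.tag, sorted(node.items())) for node in root.findall(path)]


if __name__ == "__main__":
    print(find_data(SAMPLE_XML, "other", "1.0"))
```

On interpreters that ship an older ElementTree, the predicate part of the path is the piece that is not understood, which is the limitation described above.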
libxml2
libxml2 is an XML parser written in C. It is free, open source software released under the MIT license, and libraries for many programming languages are built on top of it, such as the lxml module introduced later in this article. The example get_xpath_1 is as follows:
import libxml2

def get_xpath_1():
    # data.xml has the same structure as input_xml_string above
    doc = libxml2.parseFile("data.xml")
    for node in doc.xpathEval("//item/data[@version = '1.0']"):
        print(node, node.name, (node.properties.name, node.properties.content))
    # The document is allocated by the C library and must be freed explicitly
    doc.freeDoc()
Running the program prints the following:
(<xmlNode (data) object at 0x9326c6c>, 'data', ('version', '1.0'))
The result above meets our needs. One small shortcoming is that the xpathEval() interface does not support template-like usage, that is, placeholders in the expression that are filled in by arguments, but this does not prevent it from being used. Because libxml2 is developed in C, its API inevitably feels a little foreign to Python developers in terms of naming and usage conventions.
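Since xpathEval() takes only the expression string, one workaround is to build that string in Python before passing it in. The sketch below is an assumption of mine and not part of libxml2: xpath_literal and data_by_version_expr are hypothetical helper names, and the block only builds the expression without calling libxml2 at all.

```python
def xpath_literal(value):
    """Quote a Python string as an XPath 1.0 string literal."""
    if "'" not in value:
        return "'" + value + "'"
    if '"' not in value:
        return '"' + value + '"'
    # XPath 1.0 has no escape characters, so a value containing both
    # quote styles has to be assembled with concat().
    parts = value.split("'")
    return "concat(" + ", \"'\", ".join("'" + part + "'" for part in parts) + ")"


def data_by_version_expr(parent_tag, version):
    """Build the expression for <data> nodes under parent_tag with the given version."""
    return "//{}/data[@version = {}]".format(parent_tag, xpath_literal(version))


if __name__ == "__main__":
    print(data_by_version_expr("item", "1.0"))
```

The resulting string can then be handed to doc.xpathEval() in the same way as the literal expression in get_xpath_1.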
lxml
lxml is a Python library built on the libxml2 described above. In terms of usage it feels (to me) better suited to Python developers than libxml2, and its XPath interface supports template-like usage. The example get_xpath_2 is as follows:
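import lxml.etree

def get_xpath_2():
    # file is the StringIO object created for get_xpath above
    file.seek(0)  # rewind in case the buffer has already been read
    doc = lxml.etree.parse(file)
    # $name in the expression is filled in from the keyword argument
    for node in doc.xpath("//item/data[@version = $name]", name="1.0"):
        print(node, node.tag, (node.items()))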
Running the program prints the following:
(<Element data at a1f784c>, 'data', [('version', '1.0'), ('url', 'http://***')])
xpath
xpath is a module that adds XPath processing to Python and, as far as I know, is officially recommended. It extends the built-in XML processing module introduced earlier in this article, so the two combine well, and its find interface also supports template-like usage. The example get_xpath_3 is as follows:
import xml.dom.minidom
import xpath

def get_xpath_3():
    doc = xml.dom.minidom.parseString(input_xml_string)
    # $name in the expression is filled in from the keyword argument
    for node in xpath.find("//item/data[@version = $name]", doc, name="1.0"):
        print(node, node.tagName, node.getAttribute("version"))
Running the program prints the following:
(<DOM Element: data at 0x89934cc>, u'data', u'1.0')
Summary
Through practice with these libraries, we have learned that Python offers a variety of choices for processing XML data, what each library is good at, and how each one is used. With that knowledge, the right library can be chosen to match the actual requirements of the development work.
Source: http://hi.baidu.com/heelenyc/blog/item/4062fd0b57c75294d1581b09.html